Friday, April 29, 2011

Historian switching from Spreadsheet to MySQL

She has a new Mac and assumes that it may not run her spreadsheet app:

My reply:

But surely there are spreadsheets for the Mac that function much like Lotus or Excel, UNLESS you write complex VBA (Visual Basic for Applications) macros to execute in the background. VBA is very powerful, yet counterintuitive, and it requires much training and practice (which I never achieved). VBA is the main difference between the Microsoft spreadsheet and other spreadsheets. Plus, the old MacBook I received has Microsoft Excel on it. It is also said that Macs CAN run MS software.

I must say I am impressed by the possibilities of MySQL as I begin to consider its abilities, especially that there are small, medium and LONG text fields (TEXT, MEDIUMTEXT and LONGTEXT; their binary counterparts are called BLOBs), as well as the indexing of fields for searches. I think that a PHP routine could be written to read all the documents in a given folder, analyze them, and insert them into MySQL records. It could be driven by commands which you put in the document, such as $!Author $!Publication $!Date $!ArchivedAt. (I am just making stuff up, inventing a trigger sequence to tell the program that it has encountered another command to act upon.) Some people would call $! "dollar bang," bang being the exclamation point.

I bet you could find some online group of programmers around the world and start your own small community of volunteers who would help you build an open source application designed for the kind of work you need to do.
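To make the "dollar bang" idea concrete, here is a minimal sketch of the parsing step only. It is written in Python rather than the PHP mentioned above, and it stops short of inserting anything into MySQL. The function name `parse_document`, the rule that a command occupies its own line, and the sample text are all assumptions made up for illustration.

```python
import re

# Assumed syntax: a command sits on its own line, e.g. "$!Author Edward Gibbon".
COMMAND = re.compile(r"^\$!(\w+)\s+(.*)$")

def parse_document(text):
    """Split a document into its $! command fields and the remaining body."""
    fields = {}
    body_lines = []
    for line in text.splitlines():
        match = COMMAND.match(line.strip())
        if match:
            # group(1) is the command name, group(2) is its value
            fields[match.group(1)] = match.group(2).strip()
        else:
            body_lines.append(line)
    return fields, "\n".join(body_lines).strip()

# Invented sample document for the sketch.
sample = """$!Author Edward Gibbon
$!Publication The Decline and Fall of the Roman Empire
$!Date 1776

The body of the document starts here."""

fields, body = parse_document(sample)
print(fields)
print(body)
```

The resulting dictionary of fields could then become the columns of one database record, with the body going into a text field.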

IF you could get documents scanned in machine-readable format, then you could have an analyzer program that would plow through 800 pages, build an INDEX of key words, and select those PAGES which are crucial to the study at hand. THEN that subset of data could be fed into the primary application.
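The page-selection step might look something like the following sketch in Python. It assumes the scanned text has already been split into one string per page. The function name `select_pages` and the sample pages are invented for illustration.

```python
import re

def select_pages(pages, keywords):
    """Return the 1-based numbers of the pages that mention any keyword."""
    wanted = {k.lower() for k in keywords}
    selected = []
    for number, text in enumerate(pages, start=1):
        # Reduce the page to its set of lowercase words
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & wanted:
            selected.append(number)
    return selected

# Invented three-page sample.
pages = [
    "The defendant was found guilty of the charge.",
    "Weather was fair and the harvest was good.",
    "A debt of ten pounds remained unpaid.",
]

crucial = select_pages(pages, ["guilty", "debt"])
print(crucial)
```

Only the selected page numbers would then be passed along to the main application.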

I do not have handy the character lengths of the field types, but LONGTEXT seems like it could hold the Britannica, MEDIUMTEXT seems like it could hold War and Peace, and plain TEXT would hold a senior essay (I am just guessing). But one interesting question is WHAT sort of tables, containing what sort of data types, would be the minimum needed to do the job you do and yet achieve compact database size and efficiency.
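For reference, MySQL's four text types hold up to 255 bytes (TINYTEXT), 65,535 bytes (TEXT), 16,777,215 bytes (MEDIUMTEXT) and 4,294,967,295 bytes (LONGTEXT). The small Python sketch below picks the smallest type that fits a given piece of text. The helper name `smallest_text_type` is made up, and the sketch assumes the text is stored as UTF-8, so accented characters count as more than one byte.

```python
# Maximum sizes, in bytes, of MySQL's four TEXT types.
TEXT_LIMITS = [
    ("TINYTEXT", 2**8 - 1),     # 255 bytes
    ("TEXT", 2**16 - 1),        # 65,535 bytes
    ("MEDIUMTEXT", 2**24 - 1),  # 16,777,215 bytes
    ("LONGTEXT", 2**32 - 1),    # 4,294,967,295 bytes
]

def smallest_text_type(content):
    """Return the name of the smallest TEXT type that can hold content."""
    size = len(content.encode("utf-8"))  # the limits are in bytes, not characters
    for name, limit in TEXT_LIMITS:
        if size <= limit:
            return name
    raise ValueError("content is too large for any MySQL TEXT type")

print(smallest_text_type("a short note"))
print(smallest_text_type("x" * 70000))
```

A check like this, run over a sample of real documents, is one rough way to decide which column type a table actually needs.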

As to test data, I was thinking that IF I get MySQL to the stage of some prototype, then I would simply grab the public-domain online text of something like a Dickens novel, or the Bible, or Gibbon; there is no shortage of free online texts at Project Gutenberg (gutenberg.org) and other sites. What might be MORE interesting is testing on a mixture of legal contracts, deeds, commercial invoices, inventories, tax assessments, census data, and student records, the idea being that an application which can get a handle on such diverse and technical material would be more useful to an historian than one which has been tested only on a novel or a history. Such an application would have a list of throw-away words like "a" "and" "the" "those" and would build an index of significant words like "guilty" "debt" "born" "purchased" "credit," with a tally of the number of occurrences, together with page, volume, and storage box/warehouse/library numbers to get back to the original source.
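The index described above could be sketched roughly as follows in Python. The stop-word list, the function name `build_index`, the location labels and the sample sentences are all invented for illustration. A real application would need a much longer throw-away list and would store the result in database tables rather than in memory.

```python
import re
from collections import defaultdict

# A tiny, made-up list of throw-away words.
STOP_WORDS = {"a", "and", "the", "those", "of", "on", "was"}

def build_index(pages):
    """Build word -> {"count": n, "locations": [...]} from (location, text) pairs."""
    index = defaultdict(lambda: {"count": 0, "locations": []})
    for location, text in pages:
        for word in re.findall(r"[a-z]+", text.lower()):
            if word in STOP_WORDS:
                continue
            entry = index[word]
            entry["count"] += 1  # tally every occurrence
            if location not in entry["locations"]:
                entry["locations"].append(location)  # where to find the source
    return dict(index)

# Invented sample: each page carries a label pointing back to the original.
pages = [
    ("Box 12, vol. 3, p. 41", "The debt was purchased on credit."),
    ("Box 12, vol. 3, p. 42", "Found guilty of the debt."),
]

index = build_index(pages)
print(index["debt"])
```

Each entry carries both the tally and the list of places to look, which is what lets the researcher get back to the box and page of the original.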

We see all the time how Google is just such an engine of search in the hands of people such as you, me, Ruth Johnston and any number of people with a scholarly bent either amateur or professional. 

I want to close by stressing that, to my knowledge, the Mac should be able to run all sorts of "PC" programs with no difficulty.

