There is Always, Always, Always an Easier Way
Even before we had gotten hold of the sample database, we knew that data export, import, and manipulation somewhere in between was going to be a requirement. Dr. Kreie made export and import a requirement for our course, in fact, but it was up to each group to figure out how best to do it. Most groups settled on the tried-and-true method of search-and-replace in Excel.
Here’s where I’ll be honest: I hate manipulating data by hand. Search-and-replace? Too archaic. Plus, unlike our peers, I was well aware of Excel’s inherent limitations. Recall from earlier on (page one, to be exact) where I described the extent of the data we were given. This presented a conundrum: Excel’s internal limit at the time was around 64,000 rows (65,536, to be exact), and our sample data had two tables that exceeded it. The obvious solution was to trim down the data set, but this would mean that the database we created would only be a sample. Our group decided that we didn’t want to discard anything; we wanted our database to be a full representation of all the data we were given. I was probably the one to blame for imposing this requirement, though I still haven’t any idea who was nuttier: myself for having thought up such an outrageous notion, or my group mates for having listened to me in the first place.
Some things are better left as an exercise to the reader.
The initial concern was that exporting, manipulating, and importing the sample data would create a significantly increased workload for our team. I reassured everyone that it wasn’t terribly difficult, and that the additional work would earn our project higher ratings for having gone the extra mile. Besides, I had already discovered, when Anthony and I were playing around with the source data, that Access was capable of exporting individual tables in comma-separated value (CSV) format. I also had other tools available that I hold a strong affinity for, not the least of which is Python. (It’s a beautiful language, and translating data from one specification to another takes so few lines it’s almost a sin.)
I Play with Snakes
Python is a great language for rapid prototyping or for writing quick and dirty scripts that have broad requirements. Most classical sysadmin-types would probably resort to Perl, however. Perl is an interesting language (and useful!), but when it’s used for small scripts that slowly balloon in both size and features, it tends toward becoming a write-only implementation. Python, by the nature of its design, tends to remain infinitely readable, even when an application’s feature set creeps far beyond what was originally intended. For these reasons, and for the sake of maintainability (I really didn’t know whether I would need to import more data in the future), I chose Python for manipulating the data we exported from the sample database.
Several transformations were required before the exported data could be imported into the final database structure. Tables had to be normalized; data had to be extracted and placed into holding for new tables; primary keys had to be added or altered; and the new keys had to be mapped into existing tables, into new tables, or into a convoluted mix of the two. The great thing was that Python’s standard library already comes with tools for manipulating CSV files, which we had created from Access during export. (Again, we couldn’t use Excel to manipulate the data; there were too many rows.) The Python scripts I wrote would open the exported CSVs, perform the transformations required by the destination tables, then create new CSVs to use for import. The process was far from automated, but I think it was the best approach given our requirements and the time constraints we were under.
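To give a flavor of what those scripts looked like, here is a minimal sketch of one such transformation using Python’s standard `csv` module. The file and column names (`Location`, `ObservationID`, and so on) are hypothetical stand-ins, not the actual schema from our project: it splits a repeated text column out of an exported table into its own lookup table, assigns new primary keys, and remaps those keys back into the original rows.

```python
import csv

def normalize(source_path, obs_path, loc_path):
    """Split the denormalized Location column into a lookup table.

    Reads the CSV exported from Access, writes a new observations CSV
    that references locations by key, and a new locations CSV holding
    the generated primary keys. Column names here are illustrative.
    """
    location_ids = {}  # location name -> newly assigned primary key
    with open(source_path, newline="") as src, \
         open(obs_path, "w", newline="") as obs_out, \
         open(loc_path, "w", newline="") as loc_out:
        reader = csv.DictReader(src)
        obs_writer = csv.writer(obs_out)
        loc_writer = csv.writer(loc_out)
        obs_writer.writerow(["ObservationID", "Species", "LocationID"])
        loc_writer.writerow(["LocationID", "Location"])
        for row in reader:
            name = row["Location"].strip()
            if name not in location_ids:
                # First time we've seen this location: assign a key.
                location_ids[name] = len(location_ids) + 1
                loc_writer.writerow([location_ids[name], name])
            obs_writer.writerow(
                [row["ObservationID"], row["Species"], location_ids[name]])
```

The resulting CSVs can then be imported into Access as separate, properly related tables. The real scripts chained several passes like this one, but each pass followed the same read-transform-write shape.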
As far as I am aware, no other group in the class bothered with changing the fundamental structure of the source data. We did, and I think that the extra work we performed up front restructuring the entire database ultimately saved us time. Throughout the project, I recall more than one group asking Dr. Kreie about the meaning of a particular column. We never had to ask such questions once our initial design was finished; we had asked them early in the design process, precisely because we wanted to understand how the data was structured so we could improve upon that structure and fix the design mistakes that had been made.
Here’s a brief aside. I think it’s important to ask questions. Not only does this improve your understanding of the system you’re intending to replace, but it also helps you work out minor details if you do need to start over. In many cases, the source data is going to be so severely tainted that using it as-is simply isn’t an option; you have to make the time sacrifice before doing any real work on other aspects of the database. I sincerely believe that our project was chosen because we went “out of our way” to do the right thing. I didn’t see it as an inconvenience, mind you. I saw the additional work as part of the project, something that had to be done before we could continue.
Anthony and I were also surprised by one significant advantage to restructuring the database: we discovered that it made creating the user forms for data entry, search, and display much easier.
The scripts I used for importing data will be posted to the appendix. There will be links in this section in addition to links at the beginning of the document.
To Build an Interface
While I was hashing out the details of importing data into our database, Anthony spent his weekend working out the structure for our project’s user interface. He must have sent me dozens of iterations of the design for review (though I hadn’t a clue about best practices). I remember feeling guilty that he had been dumping so much time into the user interface while I was doing something “fun” with data importation. I’m sure he still considers me a headcase to this day!
I’ll confess. I hadn’t the first clue about designing a useful interface when we started the project. In fact, I still don’t think I do! (My .NET course did help, but I didn’t take that until the semester following this one.) Anthony was a natural at it, and he busted his tail working on the design of our application. Good thing, too, because he cleaned up a number of mistakes I made when I started playing with the underlying code.
When we started the project, I had little experience with any dialect of Visual Basic. Worse, Visual Basic for Applications is a bastardized version of VB. I remember that I wasn’t looking forward to the requirement of writing VBA for our backend, but I also saw it as a challenge. So, while Anthony was polishing up about a dozen different forms and menus (I’m not kidding, he worked insanely fast on creating a beautiful UI), I began the tedious task of learning Visual Basic.
VBA is an oddly mixed bag of assorted fruits and nuts. The syntax is awful, but it’s similar enough to BASIC that it’s possible to muck through it without much experience at all with the “Visual” or “Applications” aspect of it. There were some things I couldn’t figure out at first, of course, and generating queries wasn’t very enjoyable. It could be said that there are always two ways of gathering data from the database: the right way and the Microsoft way (LINQ is really nice, I admit).
Much of our initial work with VBA involved tying the forms Anthony had created to the database. He had conjured up a really impressive and unique system for integrating individual observations of a species into the species form itself, where double-clicking on an observation would open up the editor for observations. It was a brilliant design! The only problem was a matter of implementation. We needed to provide simple filtering of data and tie the forms to the backend such that changes could be saved by clicking a button or by browsing to the next record. (That latter bit wasn’t as difficult as it sounds; tooling around with the various onload/onunload events in Access was enough to do what we wanted.)
We were also required to implement a search system for our project, but neither Anthony nor I wanted to stick with the conventional approach of providing a single text field for search. While we did provide a conventional term-based search feature, one weekend he and I came up with an innovative idea for finding a particular species: categorical search.
The categorical search was a brilliant epiphany brought on by the late-night haze we were probably suffering after having worked without fail on the project for a couple of weeks. I think the epiphany hit us when we realized that species classifications (phylum, genus, and so forth) are, guess what, categorical! Thus, we implemented a search system that provided what I verbosely coined the “categorical search reduction” form. It was composed of a series of drop-down menus that started from a broad category of living creatures (plants or animals) and reduced those categories further into their phylum, genus, and so forth. Each drop-down menu’s available choices were reduced based upon what had been selected in the menus before it. Thus, by the very last menu, where the actual species name was listed, only those species matching all of the criteria selected above would appear.
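The filtering logic behind that reduction is simple to express, so here is a minimal sketch of it in Python. (The real forms were built in Access with VBA, and the taxonomy column names below are illustrative, not our actual schema.) Each drop-down offers only the values consistent with the selections already made above it:

```python
# Hypothetical taxonomy levels, broadest to most specific.
LEVELS = ["kingdom", "phylum", "genus", "species"]

def choices(records, level, selections):
    """Values available at `level`, given the picks made so far.

    `records` is a list of dicts (one per species row); `selections`
    maps already-chosen levels to their selected values. Only records
    matching every selection contribute choices to the next menu.
    """
    matching = [r for r in records
                if all(r[k] == v for k, v in selections.items())]
    return sorted({r[level] for r in matching})
```

In the form, each drop-down's change event would call something like `choices()` with the selections above it and repopulate the menus below, so that by the final menu only the species satisfying every prior pick remain.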
They say that college students tend to drown their semester sorrows during spring break and spend much of their time partying hearty. I never remember having much of a break from homework during “spring break,” so I honestly can’t imagine what classes those sorts of students take! We were so desperate to finish the project that we weren’t about to squander our time with social cruft, and I honestly don’t think Anthony slept during that week. He was glued to Access. Of course, I didn’t sleep much either; we spent the entirety of our “break” polishing up the final versions of the project. Anthony would tweak a UI here or some code there, I’d make some changes to the backend, we’d compare versions, merge where necessary, and so the process repeated until Friday. We were fairly confident that the version we’d attained was close enough to finished.