Sticker Shock?
I’m sure there exists a proper term for the combined feelings of surprise, dismay, and mild shock when one first encounters a database that violates nearly every principle of proper database design you’ll learned over several years of schooling. I’m sure such a term exists, because I’ve heard it before. It also isn’t a term I could appropriately share without imposing a degree of offense on the reader. Thus, I think “sticker shock” is the closest phrase that can invoke imagery reasonably close to what we collectively thought when we gazed upon the abhorrent smattering of tables, unused columns, and other violations of normalization wherever we looked. The data set was also very, very large. I don’t recall which was more intimidating: The volume of information, the inconsistencies, or the rather horrible design of each table. Maybe it was all three.
I don’t think there was ever any question whether we could actually fulfill the project goals. The true question was whether we could fulfill the project goals in a reasonable period of time. We each agreed that we could only do our best. Anything else was simply fluff. After all, the database clearly needed a lot of work. I also had an idea.
Aside from moderate intimidation, I think we were also a little frightened when we examined the sample data. I knew we’d have to redo the entire database, but I recall that the prospect of what seemed to be extra (and unnecessary) work would be a difficult sell to make. However, Anthony and I had already discussed this potential issue the week prior to receiving the sample database, and we agreed that we were probably going to recreate the entire database from scratch if it exceeded two or three tables. I believe Kohl and Hope were already quite willing to recreate the database and simply import the data we needed, but it was a subject I had been preoccupied with for a few days. We overheard several other groups discussing the possibility of simply adding a few tables to what they were given if the database were simple enough. In retrospect, I think that’s why our project was chosen as the winner. Everyone else in the class followed the path of least resistance, which was to make only minimal changes and turn in a product that could meet the project requirements with as little work as possible.
We Can Rebuild It
There wasn’t any question whether to rebuild the database or not after we examined what we were given. Frankly, selling the notion of doing everything from scratch was simply a matter of pointing out how much extra work would have to be done to the database we were given before it would even come close to to matching our desired quality.
We spent the first week structuring the tables and relationships of the new database. Much of the time was spent examining the contents of the sample, tweaking the new database, examining the old, and tweaking the new ad nauseum until we were satisfied with the near-final design. The process was tedious, but I remember that it was only made worse when we discovered the grotesque and far-reaching inconsistencies. One of the greatest nuisances to data migration occurs when the previous system was maintained by multiple people and no controls were in place to limit the extent of data entry. For example: data entries weren’t simply specified by a month, a year, and a day; we encountered ranges of a year (Apr 2001-May 2002), seasons (Spring 2002), double and quadruple digit years (88 versus 1995), and situations where the original importer had confused MS Access and the date field was imported as “######”.
Since the sample database we were given had between two and three fields in use as keys, we decided that the best course of action was to leave primary key generation to the database. The source database was designed such that technicians in the field could write a single, meaningful letter to represent anything about the species on a printed form. Unfortunately, since the data was transposed from forms to spreadsheets and later to Access, these “keys” (I use that term loosely) were dragged along with them. This design created the greatest imposition upon our team requirement that individual attributes could be altered or expanded, and we elected to resolve the issue by pulling out as much of the meaningless data as possible and replacing them with automatically incremented keys. This would facilitate adding new attributes, for example, and make editing existing ones possible. Had this been attempted with the database as it was given to us, a single change to the species’ population characteristics, for example, would have required modification of as many as 30,000 records. Unacceptable.
When we finally settled on a design that appeared to work for the greatest number of test cases and was flexible enough that attribute modifications occurred only on the attribute tables, we decided that the best course of action was to export the data from the old database, modify it to our needs, and then re-import the translated data into the new database. This became not only an exercise in patience but also a task intended exclusively for the domain of the clinically insane.
This is where my expertise stepped in.
Leave a comment