It's all in my head

Usually I try to avoid writing about the technical details of my work because it is basically talking to myself; I doubt most of you are very interested in the obscure quandaries of the trade. But I am going to indulge in those details today.

My database died. The error it was producing was no longer fixed by running a compact & repair. Microsoft Access databases have a tendency to develop permanent fatal errors, especially when they are undergoing heavy structural change. Deleting an entire table of data is a more drastic change than deleting all the records from the table; creating new queries is more drastic than running existing queries. A database is meant to continually accumulate more data, of course, but it is least stressful to do this by simply adding information into an existing table, or perhaps changing some existing data. Creating and deleting the places and rules for the data, and the ways it is manipulated, always risks loose ends, orphaned bits of data, gaps, and leftovers. As usual for Microsoft, Access is far easier for most people to use but a lot less resilient and powerful than industry-grade software.

Microsoft Word is great for writing letters, memos, and short documents. It’s bad for writing book-length manuscripts and completely incapable of producing publication-ready electronic document files. Microsoft Access is great for small-scale data management, like maybe a Mom-and-Pop floral shop, but not up to tracking the volume of data that describes the operation of a factory.

Realizing this, I tried to make the last version of my workhorse database complete in itself and concerned with only a limited slice of the data available. There were a few things I still relied on temporary tables to do, which meant some “high trauma” was built into the database from the beginning. On top of that, I kept putting the wealth of information in that database to new purposes, both one-time offbeat requests and new regular data reports. Constantly adding new pieces, and usually not knowing in advance whether I would need to reuse them, left me with a mess of tables and queries, tied together in such overlapping ways that I could not tell what was critically necessary and what was long since obsolete.

This is why you are supposed to carefully document the reasons, design, and function of your original database, and continue to document all changes. When any work you do on your database is considered superfluous, or at best accessory to your job, though, time is too scarce to spend explaining what you are trying to do.

When an Access database does fail, you can usually fix it by copying all the pieces into a new database. This can be a pain, but it is better than the whole thing becoming absolutely worthless. With my ever-expanding chores for the database, though, I wanted to rebuild it from the ground up to be more robust, able to support more demands.
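For what it’s worth, the salvage step can be scripted rather than done table by table through the menus. This is only a sketch, meant to run from a fresh blank database; the path is made up, and DoCmd.TransferDatabase does the actual copying:

```vba
' A sketch of the salvage routine, run from inside a new blank database.
' "C:\Data\OldDB.mdb" is a hypothetical path to the broken file.
Public Sub SalvageFromBroken()
    Const strOld As String = "C:\Data\OldDB.mdb"
    Dim db As DAO.Database
    Dim tdf As DAO.TableDef
    Dim qdf As DAO.QueryDef

    Set db = DBEngine.OpenDatabase(strOld)

    ' Pull every local table across; system tables and links are skipped.
    For Each tdf In db.TableDefs
        If Left$(tdf.Name, 4) <> "MSys" And Len(tdf.Connect) = 0 Then
            DoCmd.TransferDatabase acImport, "Microsoft Access", strOld, _
                acTable, tdf.Name, tdf.Name
        End If
    Next tdf

    ' Then the queries, skipping Access's hidden temporary ones.
    For Each qdf In db.QueryDefs
        If Left$(qdf.Name, 1) <> "~" Then
            DoCmd.TransferDatabase acImport, "Microsoft Access", strOld, _
                acQuery, qdf.Name, qdf.Name
        End If
    Next qdf

    db.Close
End Sub
```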

With the eventual demise of this database certain from the start, I planned to separate its basic building blocks into separate database files, using one database to tie all the pieces together. I also plan to do all my ad-hoc work in yet another database, so that the central coordinating piece can churn out reports automatically without tying up my work time or being disrupted by my innovations.
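The tie that binds the pieces is ordinary table linking. Something like this, with invented paths and table names, is roughly how the coordinating database attaches to the split files:

```vba
' A sketch of how the coordinating database might link to the split pieces.
' All file paths and table names here are made up for illustration.
Public Sub LinkBackEnds()
    ' acLink leaves the data in the source file and creates a pointer here,
    ' so the coordinating database stays small and the data stays put.
    DoCmd.TransferDatabase acLink, "Microsoft Access", "C:\Data\Demand.mdb", _
        acTable, "tblDemand", "tblDemand"
    DoCmd.TransferDatabase acLink, "Microsoft Access", "C:\Data\Inventory.mdb", _
        acTable, "tblInventory", "tblInventory"
    DoCmd.TransferDatabase acLink, "Microsoft Access", "C:\Data\Shipments.mdb", _
        acTable, "tblShipments", "tblShipments"
End Sub
```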

Because my databases work parasitically off of the official factory software, combining the data the official system holds in disparate, unrelated pieces into coherent, meaningful information, the first challenge for me is actually getting the data. I have one database siphoning off the demand information, one the inventory information, one the transactional information, and one the shipment information. Order and shipment information has to be augmented with data not available from our factory database, data sent nightly to our site from the hub site. I have also developed some information that never reaches our official database at all, some from the shipping software and some collected as direct input, regarding claims and productivity. I will be able to tie this into my new database, but I haven’t gotten that far yet.
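Each siphon is, at bottom, just an append query run against a linked copy of the official table. A sketch, with made-up table and field names:

```vba
' One of the "siphon" routines, sketched with invented names:
' copy today's rows from the linked official table into a local clone.
Public Sub SiphonShipments()
    Dim strSql As String
    strSql = "INSERT INTO tblShipLocal " & _
             "SELECT * FROM tblShipOfficial " & _
             "WHERE ShipDate = Date();"
    ' dbFailOnError makes a broken link fail loudly instead of silently.
    CurrentDb.Execute strSql, dbFailOnError
End Sub
```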

Coordinating this information is the real trick. A shipment fulfills an order by way of a transaction, depleting inventory. Since my database does not own any of this information but clones it from the official system (a live data connection is too slow and times out), I have to decide when to refresh my information. The larger the data pool I hold, the longer it takes to add new data to it (if I use indexing, or data rules that ensure the same information is not captured twice, which necessarily involves checking the existing data when adding). Larger amounts of data are also more likely to develop corruption. But part of the necessity of my database is that the system database does not keep adequately detailed or related information, so if I don’t archive the data for an adequately long time I have gained nothing.
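The duplicate-checking rule I mean is the classic unmatched-records append: only rows whose key is not already held locally get added, which is exactly why an index on the key matters as the pool grows. Sketched with hypothetical names:

```vba
' Append only the rows we don't already hold, matching on a key field.
' Table and field names are hypothetical; TransID should be indexed
' locally, or this check slows down as the data pool grows.
Public Sub AppendNewOnly()
    Dim strSql As String
    strSql = "INSERT INTO tblTransLocal " & _
             "SELECT o.* FROM tblTransOfficial AS o " & _
             "LEFT JOIN tblTransLocal AS l ON o.TransID = l.TransID " & _
             "WHERE l.TransID Is Null;"
    CurrentDb.Execute strSql, dbFailOnError
End Sub
```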

The approach I am taking this time is to keep the most detailed data, which requires the most frequent cloning or sampling from the official system, for a relatively short time, and roll some of that into a long-term archive that will still be short by serious data-retention standards, but long enough for useful comparisons. I have mulled this over in my head for months and months, trying to find an ideal set of numbers (two different time periods, or three, or four? Short data daily, weekly, or monthly? Long data for years, quarters, or months?), and I can’t remember for certain what I finally wrote down and committed to, for better or worse. I believe I decided on ten days for short term (a week, plus buffer time for “dud” data days like holidays) and 100 days for long term (about one quarter plus one week). One hundred days is not a long term, but at that point I realized I have to just admit that I am not building a real industrial data system, and refer all questions back to the official database, however disappointing it may be.
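The rollover itself is just a pair of append-and-delete passes keyed off the date. Here is roughly what I mean, with invented table and field names, and sketched as a straight copy even though in practice the archive keeps only some of the detail:

```vba
' A sketch of the ten-day / hundred-day rollover, run daily.
' Names are invented; the real archive would hold a reduced selection.
Public Sub RollOver()
    Dim db As DAO.Database
    Set db = CurrentDb

    ' Move detail older than 10 days into the long-term archive.
    db.Execute "INSERT INTO tblArchive SELECT * FROM tblShortTerm " & _
               "WHERE RecordDate < DateAdd('d', -10, Date());", dbFailOnError
    db.Execute "DELETE FROM tblShortTerm " & _
               "WHERE RecordDate < DateAdd('d', -10, Date());", dbFailOnError

    ' Trim the archive itself at 100 days.
    db.Execute "DELETE FROM tblArchive " & _
               "WHERE RecordDate < DateAdd('d', -100, Date());", dbFailOnError
End Sub
```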

Just as I do not have time to document my system, I likewise cannot afford to build it as a coded, error-trapped, well-tested system. I am using quick and dirty macros to run queries that all presume the data is there: that the connection to the official system is working, that the supplementary data has been sent up on schedule, that nothing has corrupted one of my databases, and that the macros to update their information have also run. This is about like planning that it won’t rain a single day in a given two-week period; in our climate that is possible, and during a dry spell it very well might not, but you also know that at some point within a year the assumption will not hold.
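For contrast, here is roughly what even minimal error trapping would look like; the macros I actually use just fire the queries and hope. The query names are invented:

```vba
' What even a bare-minimum error-trapped routine would look like,
' for contrast with the macros I actually run. Query names are invented.
Public Sub MorningRoutine()
    On Error GoTo Failed
    Dim db As DAO.Database
    Set db = CurrentDb

    ' dbFailOnError makes each step fail loudly instead of
    ' half-running when a link is down or a table has gone bad.
    db.Execute "qappSiphonDemand", dbFailOnError
    db.Execute "qappSiphonInventory", dbFailOnError
    db.Execute "qappSiphonShipments", dbFailOnError
    Exit Sub

Failed:
    ' At minimum, leave a trail instead of a half-updated mess.
    MsgBox "Morning routine failed: " & Err.Description, vbCritical
End Sub
```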

Thus, even though I need my routines to run automatically, they have to run within my realm of awareness so that I can fix the problems and get them going again when something breaks down. But while an automatic routine is running, I cannot use the database doing the work or even open another database; and the morning routine opens all kinds of windows that get in the way of what I am trying to do. I solved this by taking advantage of a computer that is used for little besides printing off extra shipping labels as needed. The computer used to sit in the break room, but P.B. had it moved into the office, which, although he had his own reasons for doing it, provided me with the perfect vehicle for doing my slave work.
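I won’t swear this is exactly how mine is wired up, but the standard way to fire an Access routine unattended is to have the Windows scheduler launch Access with the /x switch, which runs a named macro on startup. The paths and macro name here are placeholders:

```
"C:\Program Files\Microsoft Office\Office\msaccess.exe" "C:\Data\Coordinator.mdb" /x mcrMorningRoutine
```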

The dangerous thing about this setup is that it uses a generic login. Almost anyone in the factory could get on the computer while it was doing whatever, and get into my database. So far I have never secured my databases, because it is incredibly easy to lock yourself out of one (as I have done several times); but one thing I would like to do with the new database system is let other people in the factory run some reports themselves, providing their own parameters, so that my time is not taken up with requests for ordinary data. I would use another, locked-down database to accomplish this, but it would still behoove me to have every database in my system secured against accidental or incompetent corruption of a fundamental part of my data network. Of course this is completely undermined if a generic account is running my most sensitive administrative tasks.
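The simplest lock, well short of the user-level security that keeps locking me out, is a plain database password, which can be set over DAO once the file is opened exclusively. A sketch, with a made-up path and password:

```vba
' A sketch of the simplest lock: a database password, set through DAO.
' The file must be opened exclusively first; the path is made up.
Public Sub LockDown()
    Dim db As DAO.Database
    ' Second argument True means exclusive access.
    Set db = DBEngine.OpenDatabase("C:\Data\Demand.mdb", True)
    db.NewPassword "", "s3cret"   ' old password (none), new password
    db.Close
End Sub
```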

The most essential pieces of my new system are now operational. Without any bother from me, the information is being stowed away, available to be analyzed, compared, and referenced at any later time. Now I need to begin opening that up into actual useful reports, and that is where the system I built is exposed to abuse, damage, and ruin. By separating key data into distinct databases, less of the system should go bad at any one time, and it should take less time to rebuild. But the keystone database that coordinates all the information, using layers upon layers of queries to do it, is already turning into a maze.

I wanted to have several coordinating databases, so that each long-term stable report could run on its own, but that is not possible without sophisticated coding, because besides relying on the same base tables of information, many of my reports rely on the same sub-queries that have to be tweaked and updated. It is hard enough to maintain version consistency between the queries used in the basic databases and their clones in the coordinating database (because, with ordinary Access means, you cannot share queries between databases the same way you can share tables).
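My workaround amounts to copying a query’s SQL from the basic database into its clone, either by hand or with a scrap of code like the sketch below (the path is invented). It keeps the text in sync, but somebody still has to remember to run it:

```vba
' Copy a query's SQL from a basic database into its local clone,
' since queries cannot be linked the way tables can.
' The source path is illustrative; the clone must already exist here.
Public Sub SyncQuery(ByVal strName As String)
    Dim dbSource As DAO.Database
    Dim strSql As String

    Set dbSource = DBEngine.OpenDatabase("C:\Data\Demand.mdb")
    strSql = dbSource.QueryDefs(strName).SQL
    dbSource.Close

    ' Overwrite the local clone with the source's current definition.
    CurrentDb.QueryDefs(strName).SQL = strSql
End Sub
```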

All of this could be greatly improved on, but it would take more time, to be coded, or more money, for a better system, or (heaven forbid) a central, official system properly designed for data storage and open for retrieval, so that none of this fooling around with Access hacks would be necessary. But I shouldn’t complain, as it is my ticket to developing my skills in database concepts.

In this overview of the larger structure of my database, I haven’t gotten into the considerations of storing away the data, how it is divided up and kept most efficiently, and the various compromises on those principles for the sake of usefulness. But this is what I like to do.