Friday, December 14, 2007

Flat File Foes - Relational Database Management Systems

Flat Files are Dead - Long live RDMS!
This has been the battle cry of SQL, Oracle, SAP and other database admins - and almost every consultant - for many years. Since the first publication of the idea of using a Relational Database came about in the late 70s, people have touted the benefits of them: less storage space, more efficient (less data moved internally) and centralized look at the data. However, they never ever talk about the downsides of RDMS - to get ANY data in or out you MUST write an application. All the data is never in a single spot - so you cannot just copy 1 table (file) and have everything - you must copy EVERYTHING in order to backup ANYTHING (you just tripled your storage needs if lucky, you never have just 1 backup!) 
Unlike flat files where if a physical file corruption occurs the whole file (data) is gone (reasons to have backups!), in RDMS if a table is corrupted only the items in that table are gone - but then realizing that it IS corrupted is a lot harder and a person has to look at EVERY table in them to actually find out if one is bad - and then figure out how to roll back EVERY table to the same point in time to restore to.
For those who do not know how a RDBMS works here is a base example with some history.
A flat file is like the old Hollerith cards - all the data fits on 80 columns where holes can be punched. If you use only the first 20 columns the other 60 take up space and is of no use - but you still must try and read all 80 columns each time. When this got moved onto 2400 foot reels of tape the same was true - even if you did not use a field it still had to be read.
Now imagine if you had every address of every user in the USA - all 310 million people - in the above flat file you would have 310 million data fields with the name of a city in it. For Portland Oregon that means 537081 duplicate entries of Portland in the database - lots of space used up. A RDB saves space by only have 1 entry in a table with the word Portland, and in the records of each person a pointer - which takes up way less space - points to the entry in that table where the word Portland is stored. So when a record is read the pointer is read which tells the system which table where  the actual data is at and then read the value and shows it for a report, screen - whatever. Course since this is just a word - the same entry could be used for those people living in Portland Maine - but their pointer for the state would point to their state name in a table when retrieved.
A person looking at the RDBM has a hard time to reading all this - because often a table of pointers points to other tables of pointers which is within a program used to decipher the cipher in the fields to get to the real value - thus you need a program to get into and out of it on even how it is set up. There is no way to ever see a single record without a full fledged application written to even DO that. Compare this to a single Excel Spreadsheet where 1 line contains everything about that one record.
To me the flat file is a lot more functional - you can always export it to a CSV file, all the records are there logically and readable by a human, and it is self contained when the number of distinct records is below 100,000. A file cabinet of data. With a RDBM there is a whole lot more to it than what I described above even with the benefit - the knowledge - and number of people required to run it - goes up a WHOLE lot once you jump into a true RDBMS.

0 Comments:

Post a Comment

<< Home