Database 501

Back in my university days none of us were very enthusiastic about database class.  We were future programmers, not data administrators.  We understood that school is full of pointless challenges set up by “clueless” administrators.  But, most of us persevered and made it though such drivel as database class.  Little did we know!

 

During my first summer internship I was introduced to data driven web programming.  Two or three properly designed tables with the appropriate data could generate hundreds of web pages.  I was hooked on data.  And I’ve been hooked every since.  During the years I’ve picked up a few basic principles.  Most of these ideas can be found in database classes, but perhaps not described exactly this way.

 

A)  Make it Easy to Remember Field/Table Names.  (Looking up objects takes a lot of time and can easily derail thought processes)

1)       Use full names unless for everything except for extremely common abbreviations.  (Like ID for identification)   It will take at least a minute every time you need to look up the name of a field you’ve forgotten, but only milliseconds to type in some extra characters.  Optimize for remembering not typing.  (Oracle’s 30 character limit is a sad relic of a previous age)

2)       Use the same name for the same piece of data.  For example, if everyone has used date_added, don’t name your field added_date.

3)       The same applies to work breaks.  If the convention is to use underlines, use underlines.  CustomerNumber may be more “pure” than customer_number, but making everything consistent is more important.

 

B) Make it Easy to Identify Fields/Tables

1)       Identify keys with the table they belong to.  For example account_id is a good key for the account table.  sequence_key is a really bad field name.

2)       Preface tables with system names.  For example sales_accounts, sales_transactions etc.  However—I’ve seen this one misused.  Don’t go overboard.  Many systems are highly integrated and prefaces can be more confusing that beneficial.  If you will find yourself asking “Now was “items” in manufacturing or sales?” then use the same preface for both systems.

 

C) Be Prepared for Change

1)       Never believe anyone when they say a field will never change.  Use auto-sequence keys rather than natural keys for foreign key references.  When the natural key changes (notice I use the word when not if) the only thing that will need to change is the natural key’s index. The foreign keys can (and should) remain unchanged.

 

D) Appropriately Use Normalization

First and foremost–a transactional system should be normalized.  Normalization is not “academic”.  Nothing is more practical than a well designed database structure.  Allowing duplicate data to sprout across a system may seem “easy”, but in the end it will results in hundreds (even thousands) of lines of code merely enforcing data consistency.  There’s no reason  that code should handle what a database was already designed to handle.

However, there is an exception to the normalization rule.  Non-transactional reporting data should not be normalized.  For example, consider a typical transactional system with an inventory table, an item master table and a sales history table.  In a typical environment, when the item master is changed, we would want to see queries against the inventory table reflecting those changes.  (i.e. using a join between inventory and items)  For example, say a price change or minor description correction of a certain part/product.

On the other hand, if we are querying against the sales history table, we do not want to see the changes in the item master.  We want instead to see the sales data at the time it took place.  Even if it’s inside a typical transactional database driven system, the sales history table is actually a data-mart/data-warehousing table.  In data warehousing the rules totally change.  There are still rules—just different rules.  A data warehouse is not normalized.  Instead, you snapshot all the data needed for reporting.  The result is a wonderful (for reporting) set of completely static de-normalized tables.  Unfortunately, most software developers have not had any experience with data warehousing (mine own is extremely shallow) so we often try to apply normalization rules and fail miserably.

This is not revolutionary—data-marts and data-warehouses have de-normalized forever.  What is new is the recognition that even small systems often have data-warehouse style tables mixed into the typical set of “normal” tables.  (A common example of a data-warehousing table would be an event log)  The key to properly designing such tables is recognizing what they are and applying the appropriate design principles.  (And above all don’t try to have one table accomplish both—it won’t work)

 

E) Leverage the Power of the Database.  These should be obvious. . .

1)       Always define datatypes – varchar aren’t a substitute for any kind of data.  Generic expansion fields (attribute1, attribute2, etc) are equally evil.  (An unavoidable one for those of us working with Oracle applications.  L)

2)       Don’t put comma separated values inside database fields.  (Yes, I’ve known this to actually happen)

3)       Leverage the power of database programming—PL/SQL, T-SQL, etc.

 

If any of ya’ll have other suggestions please comment below. 

Ps. I reserve the right to delete “religious” arguments if they get out of hand!      

Advertisements

13 Responses to “Database 501”

  1. Richard Druce Says:

    Hi Jeff, Good post, full of that good wisdom that when you read sounds so intuitive that its just common sense. I was just wandering about your last point with regards to database programming. I’m no database expert so I put a lot of my code that could be moved into a database language into my business logic. Currently i use RoR, so quite nice and clean. Whats my motivation for dropping stuff out of there and into the database languages? (as an actor would say).
    cheers,
    Richard.

  2. Jonas Says:

    So what’s the suggestion for “solving” the Oracle name limit?

  3. jeffspost Says:

    Great questions
    Richard: I think there are three main benefits for using db languages. The first it forces separation between UI and BL. Sounds like you already have that mastered. The second is that DB languages are usually faster–not only because it’s close to the data, but usually the database is on the best hardware. The third is that the upgrade path is usually quite clean. The lifecycle of DB languages tends to be extremely long so even if your UI becomes obsolete, BL routines on the DB may still be usable.

    However, I won’t argue with success. If RoR is fulfilling your requirements (present and anticipated) who can say it’s the wrong choice? However, you may want to experiment with creating a few stored procedures and see how they play in your application environment.

    Jonas: It would be nice if Oracle increased the 32 character limit. (Anyone from Oracle listening out there?) In the meantime, my personal solution is to try to come up with very descriptive, yet short names. It’s not always easy, but the way I look at it is if I’m going to live with a table or field name for the next 5 years–I can afford spending 5 minutes brainstorming different options. The beauty of English is there are so many synonyms all with different spellings. 🙂

  4. Jonas Says:

    I’m all with you on that single item wish list for Oracle, but we can’t ignore reality. If I’m to expect change then a move to Oracle is never far off (been there, done that) so naming should (I really do hate myself for having to say this) be restricted to 32 characters. The real pain is indexes as one tends to make them up using table AND column names …

    One solution, the one I’m using currently, is to have an alias feature “in between” where truncated table names, or renamed to whatever, in the database is only a configuration issue (code still uses “full” names). I’d drop that in an instance though, if Oracle just entered the modern age.

  5. Andrew Wood Says:

    Hi

    Like the list a lot. Kind of putting it into words for how we do it here.

    As part of the DB design we describe tables (within 30 chars) as best we can and also create an alias to the table (this is just a recorded alias in the modelling tool). We then use this alias for all foreign key references.

    Example

    BOOKINGS – BKG
    BOOKING_ITEMS – BIT

    So in BOOKING_ITEMS we have a FK to bookings BKG_ID. If you follow this approach you then get very readable SQL

    ..
    from
    bookings bkg,
    booking_items bit
    where
    bit.bkg_id = bkg.id
    ..

    You also keep the foreign key names short which allows the 30 char limit not to be exceeded. The alias is either 3 chars or 7 (2 lots of 3 with an _ between)

    PACKAGE_PRODUCTS – PKG_PRD
    BOOKING_PACKAGES – BKG_PKG

    etc.

    You can extend it to FK naming so

    BOOKING_ITEMS.BKG_ID would have a FK name of BIT_BKG_FK

    any indexes for the FK have an I appended i.e. BIT_BKG_FK_I

    The project I am on at the momment has around 250 tables and this approach gets more sensible the larger number of tables.

    I agree with ORACLE name lengths it must be a real issue for them not to change it. A company that has produced what they have done surely would have made that change long ago if they could. From the outside you imagine it is like falling off a log.

  6. wls Says:

    why is it not good to save comma separated values inside database fields?

  7. Andrew Wood Says:

    wls

    If the data is a string that happens to have commas then it is ok. It is when the field is being used to represent a sequence of values that would otherwise occupy a seperate table that is foreign keyed to the table in which you have put the csv values in.

    Example

    Table of screen sizes and screen types (field:value)

    Type:Plasma size:32,37,42,50,60 resolution 766×355,766×512…
    Type:LCD size 28,32,37,40,42,47,50 Resolution 1024×768,1280,960…

    Should be a table for screen type

    ID Type
    1 Plasma
    2 LCD

    and a table for size

    ID SCREEN Size Resolution
    1 1 32 766×355
    2 1 37 766×512
    3 1 42 ..
    ..
    ..
    n 2 28 1024×768

  8. Bobby Says:

    wls

    Also, the problem is, you have to write a program/procedure outside the database language to extract useful information from the comma separated list items (the list items could separated in other ways). Say you were recording sales of pies, your non normalized table might be:

    Day Sales
    Mon Apple 2, Blackberry 2, Cherry 3
    Tue Blackberry 2, Apple 2, Cherry 3
    Wed Apple 1, Blackberry 2, Apple 1, Cherry 3

    The sales for each day are the same, but you can not easily tell. How would you count Apple sales each day or in total? (Partial) Normalizing might give:

    Day Product Qty
    Mon Apple 2
    Mon Blackberry 2
    Mon Cherry 3
    Tue Apple 2
    Tue Blackberry 2
    Tue Cherry 3
    Wed Apple 2
    Wed Blackberry 2
    Wed Cherry 3

    Using something like:
    SELECT Product, COUNT(Qty) AS TotalQty
    FROM DailyProductSales
    GROUP BY Product ;

    Giving:

    Product TotalQty
    Apple 6
    Blackberry 6
    Cherry 9

  9. D Conlon Says:

    Another tip: Know when *not* to use a database. Recently I wrote a system to report on global revenue across a handful of multi-level “regions”. Needless to say these regions were arbitrary and would be subject to change.

    Sounds like they need to be coerced into a hierarchical DB structure right? Well, that’s what I did, until a few months later when my colleague realised that no one was changing this info, and it was really just adding unnecessary complexity and latency.

    We moved the hierarchy into the code itself, where it’s better-documented, checked into source control, and much faster.

  10. PL-it Says:

    This should be database 101. I agree with much said, however, please don’t do things like …

    PACKAGE_PRODUCTS – PKG_PRD
    BOOKING_PACKAGES – BKG_PKG

    I’d like to point out two issues with the above. 1) This is not intuitive for newer people working on your db. You are adding in an unnecessary learning curve, which, despite a persons skills, can present a considerable road block to “getting things done”. How much more difficult is it to type out 6 more characters, in each object name? 2) Please avoid UPPER CASE object names. Not only does it tend to make reading complex queries or scanning a high number of db objects more difficult, it also can cause confusion in case sensitive db’s (e.g. Postgres). Rather, I advocate creating db objects in all lower case, while allowing mixed case of those object names within queries. The advantages you’ll have are: easy readability of objects in queries, easy portability/usage in case sensitive db’s, and scanning db objects in a visual db tool won’t be nearly as difficult as it would be if all objects were UPPER CASE.

  11. Andrew Says:

    PL-it

    You misunderstand. The alias is not used as the table name but as a standard identifier for field names used as foreign keys. In the ideal world we would use full table names for fields but ORACLE impose a 30 char limit and so the alias approach is the way to go.

    It does not increase the learning curve and makes the FK fields stand out when you look at a field list as they are of the form PKG_PRD_ID.

    In fact a list of alias and table names is available so it makes reference very easy and we also have copious model diagrams and an excellent modelling tool.

    Well each to their own I guess but when you go from 10 table DB’s to 261 at the last count you start to appreciate how those initial standards pay off.

  12. chet Says:

    Jeff,

    Just to be clear, the limit on object names in Oracle is 30 characters, not 32 as mentioned in a couple of different places (someone else made the point but didn’t specifically refute the original assertion).

    I think we are on the same page on most of your points though we would probably have minor disagreements on specifics. 😉

    As to schooling, I have found the same, most schools that I have investigated don’t have a very comprehensive database side of things until you get into the graduate level courses. Just about every job out there is data driven and for those that do pure programming, there is still some level of persistent data (usually in the form of config files or something).

    chet

  13. jeffspost Says:

    Thanks for the correction. Caught me working from memory instead of checking my facts! I’ve corrected the post. I guess my school (http://computing.southern.edu/) is unusual in offering (in fact requiring) a very solid database class as part of the CS major. Unfortunately, most students (including me) don’t really realize the gold mine of info a class like that provides. . . hopefully some of you younger ones out there reading this will benifit from the experience of us “old” (is early 30’s old?) folks!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


%d bloggers like this: