Should I have a log table?

This should be asked as “Should I have a _______ table?”
Usually, if it involves a high volume of traffic or data, I say yes.

Some of the most efficient database designs come from tried-and-true, large-volume accounting packages. They mix the normalized with the de-normalized. This is done to freeze history, and to reduce the number of lookup calls and joins for reporting.

“This sounds like you are questioning if I should normalize or not.”

Yes, I am. Some of this comes from the question of normalization.

Here is the pattern. Working data is normalized. Historical data is not. The reason for this is to keep history from being changed without a record of the change.
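Here is a minimal sketch of that pattern (the table and column names are mine, invented for illustration, not from any particular package): the working `parts` table stays normalized and editable, while each `sales_log` row carries its own copy of the values as they were at the time of sale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Working data: normalized, freely editable.
conn.execute("""
    CREATE TABLE parts (
        part_id     INTEGER PRIMARY KEY,
        sku         TEXT NOT NULL,
        description TEXT NOT NULL
    )""")

# Historical data: de-normalized. The sku and description are
# copied into each row, so later edits to parts cannot rewrite
# what actually happened.
conn.execute("""
    CREATE TABLE sales_log (
        sale_id     INTEGER PRIMARY KEY,
        part_id     INTEGER,
        sku         TEXT NOT NULL,
        description TEXT NOT NULL,
        sold_at     TEXT NOT NULL DEFAULT (datetime('now'))
    )""")
```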

Case 1: The retail merchant.

Customer buys part number 123, which is a red widget.
Purchasing goes in and edits part number 123, because the vendor re-used the sku for a blue widget.
Customer wants to return the red widget, but can’t because his record on the computer says he bought a blue widget, and he is trying to return a red one.
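Continuing the sketch above, here is that scenario played out. Because the log row froze the description at the time of sale, the return desk still sees the truth:

```python
# Customer buys part number 123, which is (today) a red widget.
conn.execute("INSERT INTO parts VALUES (123, 'WID-123', 'Red widget')")
conn.execute("""
    INSERT INTO sales_log (part_id, sku, description)
    SELECT part_id, sku, description FROM parts WHERE part_id = 123
""")

# Purchasing edits part 123 because the vendor re-used the sku.
conn.execute("UPDATE parts SET description = 'Blue widget' WHERE part_id = 123")

# The return desk still sees what was actually sold.
print(conn.execute(
    "SELECT description FROM sales_log WHERE part_id = 123").fetchone())
# prints: ('Red widget',)
```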

This happens more often than you may imagine. Especially if it was the son/daughter/cousin/neighbor of the shop owner who wrote the software. We, however, are the professionals of this industry. So if you are reading this, then I know YOU have never caused this problem. Actually, yes you have, and so have I, but not to this degree. Many times the customer adapts to the situation and we either never hear about it, or it is put so low on the priority list that it is forgotten.

Case 2: The high-volume site keeps getting bogged down when the simultaneous user count exceeds n.

Who knows or cares what n is? All your customer cares about is that, with all of the optimizations and the web farm, the application is still bogging down.

I am assuming that you have already put in the connection pooling and data/application/form caching. You are already using the NOLOCK hint, and you are properly disposing of all of your objects so that the GC does not need to kick in. (Yes, I know: “Oops, I didn’t think of that.” Use FxCop; it will help you fix some of the misses.)

What is next? Well, look at your SQL queries. I bet you are wasting a lot of time on those joins just to run a report or populate a grid. This is a sign that you need to de-normalize your beautifully sculpted DB design. Not completely, though. Be intelligent in your queries and your reporting, and that should resolve the issue.
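To give you a rough idea of what intelligent de-normalization can look like (a sketch with invented names, not a prescription): pay the join cost once when refreshing a flattened reporting table, instead of on every report or grid request.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (order_id INTEGER PRIMARY KEY,
                            customer_id INTEGER, total REAL);

    -- Flattened copy, maintained purely for reporting.
    CREATE TABLE order_report (
        order_id      INTEGER PRIMARY KEY,
        customer_name TEXT,
        total         REAL
    );
""")

def refresh_order_report(conn):
    """Pay the join cost once here, not on every report request."""
    conn.executescript("""
        DELETE FROM order_report;
        INSERT INTO order_report (order_id, customer_name, total)
        SELECT o.order_id, c.name, o.total
        FROM orders AS o JOIN customers AS c
          ON c.customer_id = o.customer_id;
    """)

# The grid or report now reads a single table, no joins:
#   SELECT customer_name, total FROM order_report;
```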

That takes us back to “Do I need a log table?” Yes. Yes you do. But only if there is something to track. In most business applications, except perhaps word processing, there is.

OK, I know that is not completely right. In word processing you need to track the edits, but not in a log table, unless you have to run a report on the details of the edits.

Let me give you a little-known fact about log tables. Did you know that MMORPGs like World of Warcraft and Star Wars Galaxies use log files extensively? For them, it is about auditing for cheaters and finding out what users respond well to versus what they respond adversely to.

“So, what should I log?”

Logs are history. Record enough data that, if your lookup tables get edited, you do not lose your history.
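For example (another sketch, names invented): when someone edits a lookup row, write the old and new values into the log, so reports can still explain what history meant at the time.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE status_lookup (status_id INTEGER PRIMARY KEY, label TEXT);

    CREATE TABLE change_log (
        log_id     INTEGER PRIMARY KEY,
        table_name TEXT,
        row_id     INTEGER,
        old_value  TEXT,  -- enough data to reconstruct history
        new_value  TEXT,
        changed_at TEXT DEFAULT (datetime('now'))
    );
""")

def update_label(conn, status_id, new_label):
    # Capture the old value before it disappears.
    row = conn.execute("SELECT label FROM status_lookup WHERE status_id = ?",
                       (status_id,)).fetchone()
    conn.execute("UPDATE status_lookup SET label = ? WHERE status_id = ?",
                 (new_label, status_id))
    conn.execute("""INSERT INTO change_log
                        (table_name, row_id, old_value, new_value)
                    VALUES ('status_lookup', ?, ?, ?)""",
                 (status_id, row[0] if row else None, new_label))
```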

“But my lookup tables never get deleted or edited. Should I still be concerned about the loss of history?”

You never enter data incorrectly into a lookup, only to find the problem after you have used it? Are you concerned about reports matching the history after lookup data has been changed? What does your CPA say about this?

Also, there is a data management issue here. Many companies that I know keep only about three to five years of data in their database. If you don’t de-normalize, you may find yourself in a case where you cannot do a clean wipe of old data.

Look at your lookup data. How old are some of the discontinued items? You can speed up your queries by reducing the amount of data that you are filtering through when running a select. Contrary to common thinking, WHERE statements in a select take processing time to manage. What is faster: querying 10,000 items to get 20, or 100 for the same 20? Keep your data small and fast, and you will go far.
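Here is one way that clean wipe can look (invented names again): because the history rows carry their own copies, discontinued items can be swept into an archive table without breaking anything downstream.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parts         (part_id INTEGER PRIMARY KEY,
                                description TEXT, discontinued INTEGER);
    CREATE TABLE parts_archive (part_id INTEGER PRIMARY KEY,
                                description TEXT, discontinued INTEGER);
""")

def archive_discontinued(conn):
    # After this runs, the everyday selects filter through the
    # 100 live items instead of all 10,000.
    conn.executescript("""
        INSERT INTO parts_archive
            SELECT * FROM parts WHERE discontinued = 1;
        DELETE FROM parts WHERE discontinued = 1;
    """)
```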

This is also why some programs use a product’s sku as their DB row ID. That way you don’t have to store both; one is universal. The downfall is that now you need a way to make sure that the sku is truly unique, and that your description flipping between the red widget and the blue widget does not cause an inventory or customer service headache.

You do not want to see an inventory description like “PN 123, Blue widget; may be a red widget if sold before such and such date.”
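For illustration (a sketch, not anyone’s production schema), making the sku the primary key forces the database itself to catch the re-used sku:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The sku *is* the row ID, so nothing is stored twice...
conn.execute("""
    CREATE TABLE parts (
        sku         TEXT PRIMARY KEY,  -- ...but it must be truly unique
        description TEXT NOT NULL
    )""")

conn.execute("INSERT INTO parts VALUES ('WID-123', 'Red widget')")
try:
    # The vendor re-using the sku now fails loudly, instead of
    # quietly turning the red widget into a blue one.
    conn.execute("INSERT INTO parts VALUES ('WID-123', 'Blue widget')")
except sqlite3.IntegrityError as err:
    print("duplicate sku rejected:", err)
```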

“So, should I even have normalized data?”

Yes! That was a stupid question!

Actually, it wasn’t. Think of it this way: non-committed working data, which is subject to change, should be normalized so that it is always up to date with changes to its parts. Historical data should be frozen, so it should be de-normalized enough to prevent incidental changes.

Yes, I know I keep bouncing between logging and other issues, but I hope you can see that they are all interrelated.

Side note: some clients will volunteer to pay extra if you can run meaningful reports off of those log tables.
