Knowledge Management Magazine


Tools of the Trade: Legacy databases: don't let your inheritance drag you down


A Campbell McCracken is a writer and independent consultant based in Fife



It has been estimated that, on average, more than two thirds of a company's IT budget is spent on the maintenance of legacy databases. This money is used for a variety of things, including the computers that the databases reside on, and the patching of software to cater for circumstances that were unforeseen when the software was originally written. One topical example of the latter, of course, is the millennium bug, caused by software systems that used a two-digit representation for the year remaining in service far longer than originally intended.

It has also been estimated by the Gartner Group that more than 35 per cent of all programming effort is taken up in developing and maintaining programs that link databases and transfer information between them. So, if legacy databases are such a huge drain on financial resources, what benefits do they give back in return and what are the problems that you'll encounter on the way to reaping these benefits?

Legacy databases for operational use

Although relational databases have been around for a number of years, the majority of companies, especially large companies, still run their operations using flat-file databases known as legacy databases. The term legacy is apt for two reasons. Many of these databases are passed on, like an inheritance, from one management regime to the next. Others are inherited when one company takes over another, or when two companies merge.

One problem with legacy databases is that they are usually geared up for operational business and transactional processing, handling short transactions hundreds or thousands of times a day. This means that although they work well on the operational side of businesses, in manufacturing systems, order processing and billing systems, and so on, they are not suited to the decision-support processing required by business intelligence applications such as data warehousing.

Although many companies would like to replace their legacy databases and systems, simply moving the data to a more modern structure does not add value in itself. Many have taken the opportunity presented by the millennium bug to upgrade to modern enterprise resource planning (ERP) systems. However, there are alternatives, such as consultants SEEC and Progeni, who offer Year 2000 conversion solutions, especially for COBOL applications.

Legacy databases can contain valuable information either on their own, or when several databases are brought together. One reason for linking the disparate databases from separate business units of a company or from newly merged companies into a data warehouse is that hitherto unseen customer buying trends can be identified and used as a marketing tool.

For example, Somerfield (one of the UK's largest supermarket chains) has built an information warehouse based on Oracle's Database Server which lets them know what a customer bought, what time they bought it, how they paid for it, and what other items they bought with it. It helps them to understand their customers, their habits, what they don't buy and so on (see the PMPL and Norwich Direct case study on page 31).

Hidden Problems

There are a number of problems that come to light when you try to merge or migrate several legacy databases to a new database. The first problem is that legacy databases tend to be fragmented within a company's structure and rarely talk to each other. Because of this they are rarely co-ordinated. You would generally find that the same customer is represented differently in each database. For example, a well-known company could be recorded in one database as 'Digital Equipment Corporation' and in another simply as 'DEC'.

This means that, before the data from the various databases can be brought together, it has to be cleaned to improve the quality: it is extremely hard to look for interesting elements within your data unless you clean up the inconsistencies. There are tools available that will help tie these records together. For example, in addition to uncovering information buried in free-form fields, the INTEGRITY Data Re-engineering Environment from Vality identifies relationships between data values, and discovers metadata. For the cases where the match is not exact, INTEGRITY includes data analysis, cleansing and unique probabilistic and fuzzy matching capabilities and would indicate that, say, there was a 95 per cent probability that the two records were the same.
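To give a flavour of what such matching involves, here is a minimal sketch in Python using only the standard library's difflib; it is not Vality's INTEGRITY, and the company-name variants and the 0.8 threshold are purely illustrative assumptions. Note how a plain similarity score links the full name to a close variant but flags the acronym for manual review, which is exactly where the probabilistic and contextual rules of specialist tools earn their keep.

```python
# Minimal sketch of customer-record matching (not Vality's INTEGRITY).
# The names and the 0.8 threshold are illustrative assumptions.
from difflib import SequenceMatcher

def normalise(name: str) -> str:
    """Lower-case and strip punctuation so 'D.E.C.' and 'dec' compare equally."""
    return "".join(ch for ch in name.lower() if ch.isalnum() or ch.isspace()).strip()

def similarity(a: str, b: str) -> float:
    """Crude similarity score between two customer names (0.0 to 1.0)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()

records = ["Digital Equipment Corporation", "Digital Equipment Corp.", "DEC"]
for i in range(len(records)):
    for j in range(i + 1, len(records)):
        score = similarity(records[i], records[j])
        verdict = "probably the same" if score > 0.8 else "needs review"
        print(f"{records[i]!r} vs {records[j]!r}: {score:.2f} -> {verdict}")
```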

A second problem arises because some of the inconsistencies within the database could have been put there deliberately. These are known as 'planned inconsistencies'. Although IT departments frequently have to update their database systems because their business needs have changed or because they need to store a different kind of information about their customers, they rarely want to rebuild the database when they make the changes. So they trick the database into thinking that nothing has happened. For example, they might have declared that a department record is 200 bytes long. By switching different copybooks in and out, they can make these 200 bytes mean different things. However, these tricks are rarely documented adequately and they generally only come to light when, for example, the legacy database is loaded into a data warehouse and bogus data appears.
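As a rough illustration of that trick, the sketch below decodes the same fixed-length 200-byte record under two different layouts, which is essentially the role the swapped copybooks play; the field names, offsets and sample record are invented for the example, not taken from any real system.

```python
# Hedged illustration of a "planned inconsistency": the same fixed-length
# record is decoded with two different layouts. Field names, offsets and
# the sample record are invented for this example.
import struct

RECORD_LEN = 200

def decode_as_domestic(record: bytes) -> dict:
    """Original layout: 30-byte name, 8-byte account code, rest unused."""
    name, account = struct.unpack("30s8s", record[:38])
    return {"name": name.decode().rstrip(), "account": account.decode().rstrip()}

def decode_as_export(record: bytes) -> dict:
    """Later layout reusing the same bytes: 30-byte name, 3-byte currency
    and 5-byte country code squeezed into the old account field."""
    name, currency, country = struct.unpack("30s3s5s", record[:38])
    return {"name": name.decode().rstrip(),
            "currency": currency.decode().rstrip(),
            "country": country.decode().rstrip()}

record = (b"ACME TRADING".ljust(30) + b"GBP" + b"UK".ljust(5)).ljust(RECORD_LEN)

print(decode_as_domestic(record))  # misreads 'GBPUK' as an account code
print(decode_as_export(record))    # only the matching layout yields sense
```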

Other problems arise from the use of keys to represent information in databases. If two or more databases are being merged into a data warehouse then it is likely that different keys have been used to represent, say, vendors. So, for example, there could be several different vendors each with key 100. The default approach is that new keys are created for everything. If the data warehouse is updated on a batch basis from the legacy databases then the key correspondence will have to be kept around to allow the conversion to take place each time. "The whole issue of getting the data in the databases sorted and consistent is extremely computer-intensive if you are dealing with anything but a really simple environment," says ETI's Kay Hammer.
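A minimal sketch of that surrogate-key approach might look like the Python below; the source-system names and sample keys are invented, but the point is that the correspondence table is kept between batch loads so that a given legacy key always translates to the same warehouse key.

```python
# Minimal sketch of surrogate-key assignment for a data warehouse load.
# Source names and sample keys are illustrative; in a real batch process
# the correspondence table would be persisted between runs.
class KeyMapper:
    def __init__(self):
        self._mapping = {}   # (source system, legacy key) -> warehouse key
        self._next_key = 1

    def warehouse_key(self, source: str, legacy_key: int) -> int:
        """Return the existing warehouse key for this pair, or allocate one."""
        pair = (source, legacy_key)
        if pair not in self._mapping:
            self._mapping[pair] = self._next_key
            self._next_key += 1
        return self._mapping[pair]

mapper = KeyMapper()
# Two different vendors both carry key 100 in their own legacy systems.
print(mapper.warehouse_key("billing_db", 100))    # -> 1
print(mapper.warehouse_key("ordering_db", 100))   # -> 2
# A later batch run sees the same legacy key again and gets the same answer.
print(mapper.warehouse_key("billing_db", 100))    # -> 1
```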

What are the pitfalls?

Most consultants and data migration specialist organisations predictably warn of the dangers of attempting to perform a migration project either manually or unaided. But perhaps they have good reason to. Both the Gartner Group and the Standish Group put the rate of failure in large data migration projects at around 30 per cent. Some of the pitfalls already mentioned could account for some of the failures. Other reasons include companies rushing into the projects with poor planning. Steve Moran of IT consultancy CITL says: "Far too many IT managers are simply unaware that migrating data is not just another development project. It demands specific skills, processes and methods." Simon Jennings, Managing Director of ETI in the UK, says: "Data migration projects tend to be very complex and, even if a company is aware of automated migration techniques, managers often underestimate costs and risks." However, even where the initial approach was wrong, a project can sometimes be rescued by applying the right tools.

Danny Lonighan, Senior Sales Manager at ETI, also suggests that if a customer is planning to handle the project in-house, then the training of the implementation team should not be overlooked. "To avoid failure of a data warehouse, you should get the project started off in the right environment," says Lonighan. "This typically means 20 to 30 days training plus the services of a consultant for six months to a year." This is a small investment when you consider that large migration projects can have a budget of several hundred thousand pounds, if not millions.

Conclusion

Although legacy databases may appear to be just a massive drain on financial resources, they should not be regarded as something to get rid of. They can, either on their own or when merged with other databases, be made to yield valuable information. If you are contemplating a migration or merging project, don't let the apparent simplicity of the underlying database structure fool you. A large percentage of such projects fail because they are not treated with the seriousness they deserve. Make sure that either you employ skilled consultants or that your implementation team gets the necessary training.

Metadata

Integrating or migrating a database will be a difficult process unless you have some sense of the context of the information, known as the metadata. A database's metadata describes, for example, how fields A and B are related to field C. It has been described as being like the addressing and postmark information on an envelope, with the envelope contents being the data generated by an application or tool. Usually a programming language such as SQL, C++, COBOL or VB Script is used to describe the metadata. However, if different databases and tools use different languages, there is no standard way of representing this form of functional logic.
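As a loose illustration, and not any particular vendor's format, the snippet below shows the kind of field-level record such metadata might hold for a single derived warehouse column; the table, field and rule names are invented.

```python
# Hedged sketch of field-level metadata for one derived warehouse column:
# which source fields feed it and what rule was applied. All names invented.
metadata = {
    "target": "warehouse.customer.total_spend",
    "sources": ["billing_db.invoice.net_amount", "billing_db.invoice.vat"],
    "rule": "total_spend = net_amount + vat",
    "cleansing": "amounts converted to GBP before summing",
}

# A query tool that can read this record need not rediscover how
# total_spend was built; the mapping travels with the data.
for key, value in metadata.items():
    print(f"{key:10}: {value}")
```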

This lack of a consistent approach makes database tool inter-operability hard. "Unless you have some sense of the context, just grabbing a piece of code and understanding what it is all about is not that easy," says ETI's CEO Kay Hammer. "If you used an ETI tool to build a data warehouse then it would give you all the information about the source-to-target mapping, and all the business rules about what's been done to the data. You don't want to have to re-enter that data into your end-user query tools. You'd really like them to suck the metadata into their system."

With this aim in mind, a group of approximately 50 vendors and users got together in 1995 and formed the Meta Data Coalition (MDC) to push for a metadata inter-operability standard. ETI's Kay Hammer is also the co-chair of the MDC. In July of this year, the MDC accepted the Open Information Standard, a technology-independent and vendor-neutral metadata standard based on technology developed by Microsoft with over 20 leading industrial partners.

In addition, ETI have made the grammars of their software tools available to the Meta Data Coalition, and given them ownership of extending the grammars. The grammars consist of a menu system that defines the legal choices for describing metadata; because a programmer is led through this set of choices, only valid options can be selected. It also means that the programmer doesn't need to know any particular programming language.

An added benefit of this approach is that the tools will create a metadata audit trail that is 'English-like', so that an average person can read it and see what has been done to the data values. "If you don't learn from history, you're destined to repeat it," says Hammer, "and metadata is your history." She advocates that when you are choosing an infrastructure product to solve one of your problems you should look for the vendors' strategies for metadata exchange, even if they are not the MDC standard. "You've got to have a strategy because you'll need an audit trail of what you've done."

Contact the author at freelance@madscotsman.freeserve.co.uk

Case Studies

PMPL at Norwich Direct
ETI and The Complexities of Database Mapping
Cognos and Using Business Intelligence to Improve Customer Service & Profitability
Oracle at Conoco

Top Ten Tips

How To Run A Successful Integration Project

Contacts

CITL  http://www.citl.co.uk/
Cognos  http://www.cognos.com/
ETI  http://www.eti.com/
Meta Data Coalition http://www.mdcinfo.com/
Oracle  http://www.oracle.com/
PMPL  http://www.pmpl.co.uk/
Progeni  http://www.progeni.com/
SEEC  http://www.seec.com/

 © 1999 Learned Information Europe Ltd
