26 May 2011

A small history of Data Warehousing

Small history of Data Warehousing Architecture:

Some of the notable events/papers/books/definitions of the different stages of evolution of the Kimball architectural approach are:

• 1992 – Kimball Stage 1 – simple dimensional model phase
• Formation of Ralph Kimball Associates
• “a data warehouse is a union of all its data marts”
• THE DATA WAREHOUSE TOOLKIT, 1998


• 2002 – Kimball Stage 2 – conformed dimension/master conformed dimension phase
• DATA WAREHOUSE TOOLKIT: THE COMPLETE GUIDE TO DIMENSIONAL MODELLING, 2002
• Kimball Group/Kimball University: Kimball Design tip #48, De-Cluster with Junk (Dimension), Aug 7, 2003


• 2007 – Kimball Stage 3 – MDM phase
• Intelligent Enterprise: Kimball University, Pick The Right Approach To MDM – Feb 2007
• The Need For Master Data
• The Conformed Data Warehouse
• The MDM Integration Hub
• The Enterprise MDM System
• Four Steps to MDM

THE EVOLVING KIMBALL ARCHITECTURE

There is a certain irony here. Compare the predicted Kimball Stage 4 hub and spoke architecture with the corporate information factory architecture that Inmon published a decade earlier, and the two turn out to be essentially the same. The emphasis of the predicted Kimball Stage 4 hub and spoke architecture is now on integrated data, not on speed of development.
The next irony is that the predicted Kimball Stage 4 hub and spoke architecture cannot be created quickly and easily. In Kimball Stage 1 the emphasis was on speed of development and on a few legacy systems. In the predicted Kimball Stage 4 the emphasis is on the enterprise: with the need for true enterprise development and the creation of the “golden record”, building the environment is no longer speedy. That need for integration across the enterprise is the one Inmon recognized 10 years earlier.

PREDICTED KIMBALL STAGE 4 = CORPORATE INFORMATION FACTORY

The predicted Kimball Stage 4 architecture has evolved (and is still evolving) toward the Inmon Corporate Information Factory. The Kimball Stage 3 architecture and the predicted Kimball Stage 4 hub and spoke architecture are being discussed in 2010, yet the Inmon Corporate Information Factory was created in the 1990s, more than a decade earlier.
Over time, the basic Kimball dimensional architecture has undergone several major intellectual revolutions, all prompted by the realization that the basic dimensional architecture did not work in the face of large scale systems and that the simple dimensional model was not a true enterprise solution. That intellectual evolution is depicted in the figure below.

First there was the dimensional architecture. Then there was the conformed dimension. Then there was the master conformed dimension. Then there was MDM. Finally there is the predicted Kimball Stage 4 hub and spoke architecture.
Throughout the renditions of the Kimball Stage 1 – Stage 4 approach to data warehousing, the Kimball approach has been particularly popular with software vendors. In particular, the Business Intelligence and data mart software vendors have been drawn to the original Kimball Stage 1 simple dimensional architecture, and the reason is that these vendors care most of all about making a sale. Consider the sales cycle for the data mart vendor in the face of an Inmon style corporate information factory architecture. In the Inmon architecture, a data warehouse has to be built before the data mart can be built, and building the Inmon style data warehouse takes a while. Building an Inmon style data warehouse therefore gets in the way of the data mart vendor making a fast sale. With a Kimball dimensional model approach, on the other hand, the data mart is needed almost immediately.
Is it any wonder then that the data mart and Business Intelligence vendors gave all their support to Kimball? It was in their own best interest to do so. Stated differently, these vendors cared nothing for the long term architectural interests of their customers. All they cared for was their own immediate bottom line: making a quick sale, at the expense of their customers' long term architecture. The Kimball Stage 1 simple dimensional architecture was a natural fit for the fast building of data marts.

FITTING THE TWO ARCHITECTURES TOGETHER

There is a significant architectural difference between the Inmon corporate information factory “single version of the truth” architecture and the Kimball Stage 1 simple dimensional architecture. Despite the differences, there is a juxtaposition of the two architectures that makes sense. The figure below shows this arrangement.


The figure shows that in the center of the hub is the Inmon corporate information factory. In the Inmon corporate information factory is the “single version of the truth”. The data here is granular, historical and integrated. The data here is cast in the form of the relational model.
Surrounding the “single version of the truth” are the data marts. The data marts are cast in the form of the Kimball star schema architecture. In the star schema architecture, each data mart is optimized to meet the analytical needs of the end user. The source of data for each data mart is the data warehouse.
The basic architecture seen in the figure meets the need for a single version of the truth and the different analytical needs of the different departments. It blends the Inmon and Kimball architectures, taking the best features of each.
However, the architecture seen in the figure has been extended over the years into a much more robust, much more sophisticated architecture, one that can be called DW 2.0.
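
The hub and spoke arrangement described in this section can be illustrated with a small sketch. It is only an illustration under stated assumptions, not something taken from the Inmon or Kimball material: the table names (customer, sale, mart_dim_region, mart_fact_monthly_sales), the sample rows and the helper functions are all hypothetical, and an in-memory SQLite database stands in for both the data warehouse and the data mart. The hub holds granular, integrated data in relational form; the mart is a star schema (one dimension table, one fact table) whose only source is the hub.

```python
import sqlite3


def build_demo():
    """Build a toy hub (relational warehouse) and one spoke (star schema mart)."""
    con = sqlite3.connect(":memory:")
    cur = con.cursor()

    # Hub: granular, historical, integrated data in relational form.
    # Table names and rows are hypothetical sample data.
    cur.executescript("""
        CREATE TABLE customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            region      TEXT
        );
        CREATE TABLE sale (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customer(customer_id),
            sale_date   TEXT,
            amount      REAL
        );
        INSERT INTO customer VALUES (1, 'Acme', 'North'), (2, 'Globex', 'South');
        INSERT INTO sale VALUES
            (1, 1, '2010-01-15', 100.0),
            (2, 1, '2010-02-03',  50.0),
            (3, 2, '2010-01-20',  75.0);
    """)

    # Spoke: a data mart cast as a star schema, loaded only from the hub
    # and shaped for one analytical need (monthly sales by region).
    cur.executescript("""
        CREATE TABLE mart_dim_region (
            region_key INTEGER PRIMARY KEY,
            region     TEXT UNIQUE
        );
        CREATE TABLE mart_fact_monthly_sales (
            region_key   INTEGER REFERENCES mart_dim_region(region_key),
            month        TEXT,
            total_amount REAL
        );
        INSERT INTO mart_dim_region (region)
            SELECT DISTINCT region FROM customer ORDER BY region;
        INSERT INTO mart_fact_monthly_sales
            SELECT d.region_key, substr(s.sale_date, 1, 7), SUM(s.amount)
            FROM sale s
            JOIN customer c        ON c.customer_id = s.customer_id
            JOIN mart_dim_region d ON d.region = c.region
            GROUP BY d.region_key, substr(s.sale_date, 1, 7);
    """)
    con.commit()
    return con


def monthly_sales_by_region(con):
    """Query the mart the way an end user's analytical tool might."""
    return con.execute("""
        SELECT d.region, f.month, f.total_amount
        FROM mart_fact_monthly_sales f
        JOIN mart_dim_region d ON d.region_key = f.region_key
        ORDER BY d.region, f.month
    """).fetchall()
```

Because the mart is derived entirely from the hub, a second mart for another department could be built from the same customer and sale tables and would agree with this one on the underlying figures, which is the point of keeping the “single version of the truth” in the center.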

DW 2.0

Over the decade between the creation of the corporate information factory and DW 2.0, the Inmon corporate information factory architecture has evolved as well. Today the Inmon architecture is best described by the body of work known as DW 2.0. Written in 2007, DW 2.0 is described in a book entitled DW 2.0 – ARCHITECTURE FOR THE NEXT GENERATION OF DATA WAREHOUSING. The essence of the DW 2.0 architecture is depicted in the figure below.


The DW 2.0 architecture contains many different architectural components that have been added on to the basic corporate information factory. Some of the more salient aspects of the DW 2.0 architecture include:

- Unstructured data as an essential and granular ingredient in the data warehouse
- An exploration warehouse
- Near line (or alternate) storage
- An archival component
- Oper marts
- An ODS
- Metadata as an essential component of the architecture
- Taxonomies
- Changed data capture
- Recognition of the life cycle of data within the data warehouse

The DW 2.0 architecture then represents the evolving architecture for data warehousing. It shows that the best features of the Inmon architecture and the Kimball architecture can be combined very adroitly. DW 2.0 represents a long term architectural blueprint to meet the needs of modern corporations and modern organizations.