In this modern era with the continuous and very fast growth of technology, each one of us are now surrounded by devices (e.g. cellphone,laptop,computer etc) which is affecting the world of data. I am sure that we all might have heard this terminology Data Warehouse which we are going to see in detail today.
- In technical terminology we may define a Data Warehouse (DWH) basically as system which is a central repository of very well organized, integrated, historical (and current) data which is isolated from several disparate sources at a place with the purpose to support decision making by creating analytical reports(reporting and data analysis) for management (knowledge workers) for the benefit of an enterprise.
- Data warehousing concept was intended to provide an architectural model for the flow of data from operational systems(OLTP) to decision support environments(DSS).
- A data warehouse maintains a copy of information from the source transaction systems.
The concept of warehouse has been originated at early 1960s. Below are important years in history which became milestones for data warehouse concept.
- 1960s — General Mills and Dartmouth College, in a joint research project, develop the terms Dimensions and Facts.
- 1970s —Bill Inmon begins to define and discuss the term: Data Warehouse.
- 1975 — Sperry Univac introduces MAPPER (Maintain, Prepare and Produce Executive Reports) is a database management and reporting system that includes the world’s first 4GL. First platform designed for building Information Centers (a forerunner of contemporary Enterprise Data Warehousing platforms)
Why do we need a Data Warehouse?
Below are couple of thoughts coming in my mind to answer this question.
- Provides a central view of data of interest regardless of data sources across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
- Mitigate the problem of database isolation level lock contention in transaction processing systems(OLTP) caused by attempts to run large, long running, analysis queries in transaction processing databases.
- Integrate data from multiple sources into a single database so a single query engine can be used to present data.
- Maintains data history, even if the source transaction systems do not.
- Restructure data to provide more generic view and analyze so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems and it makes more sense to the business users.
- Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
- Present the organization’s information consistently.
- Make decision–support queries easier to write and hence analyze.
Humm, after going through this article I am sure that there would be a question in your mind that if this concept is going to help you in taking decision in your business then how can we implement the same in our organization. Below are few requirements that you need to fulfill before implementing Data Warehouse.
- Data source systems(OLTP) that provide data to the warehouse.
- Data integration technology and processes that are needed to prepare the data for use.
- Different architectures for storing data in an organization’s data warehouse.
- Different tools and applications for the variety of users.
- Metadata, data quality, and governance processes must be in place to ensure that the warehouse meets its purposes.
In modern days only those organizations are successful which can respond quickly and flexibly to market changes and opportunities. A key to this response is the effective and efficient use of data and information by analysts and managers.
Components Of Data Warehouse:
There are basically 3 main core components of Data Warehouse which are described in detail below.
- Staging Layer: The staging layer or staging database stores raw data extracted (E of ETL process) from each of the disparate source data systems.
- Data Integration Layer: This layer mainly Integrates the disparate data sets by transforming the data from the staging layer often storing this transformed data in an operational data store (ODS) database. We perform Transformation(T) of ETL process in this section.
- Access Layer: The integrated data are then moved/loaded to another database, often called the data warehouse database, where the data is arranged into hierarchical groups often called dimensions and into facts and aggregate facts. We perform L(loading) of ETL process in this section.The access layer helps users retrieve data.The combination of facts and dimensions is sometimes called a star schema.
This definition of the data warehouse focuses on data storage. The main source of the data is cleaned, transformed, cataloged and made available for use by business professionals to provide insite to enterprise management using data mining, online analytical processing, market research and decision support. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system.
Feedback and suggestions are always welcome.