Many new technologies are emerging in the data world as the volume and complexity of data within organizations grow rapidly. As the volume of data grows, the way we handle that data has to change as well. The concept of the data warehouse came into the picture to manage very large amounts of data and put them to use.
A data warehouse can be defined in simple terms as follows:
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.
Each term used in this definition is explained below:
- Subject-Oriented: A data warehouse is used to analyze a particular subject area. For example, “sales” can be a particular subject.
- Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.
- Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, or 12 months ago, or even older data, from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold only the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.
- Non-volatile: Once data is in the data warehouse, it does not change. Historical data in a data warehouse should therefore never be altered; new data is added alongside it.
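The integrated, time-variant, and non-volatile properties described above can be sketched in a few lines of Python. This is only an illustration, not how a real warehouse is built: the product identifiers, the `PRODUCT_KEY` mapping, and the `AddressHistory` class are all hypothetical names invented for the example.

```python
from dataclasses import dataclass
from datetime import date

# Integrated: hypothetical mapping from (source, source-specific id)
# to a single warehouse-wide product key.
PRODUCT_KEY = {
    ("source_a", "SKU-001"): 1,
    ("source_b", "P-17"): 1,  # same product, identified differently in source B
}


def integrate(source, source_product_id):
    """Translate a source-specific product id into the warehouse key."""
    return PRODUCT_KEY[(source, source_product_id)]


@dataclass(frozen=True)  # frozen: a loaded row cannot be modified (non-volatile)
class AddressRecord:
    customer_id: int
    address: str
    valid_from: date


class AddressHistory:
    """Append-only store: new addresses are added, old ones are never overwritten."""

    def __init__(self):
        self._rows = []

    def load(self, customer_id, address, valid_from):
        self._rows.append(AddressRecord(customer_id, address, valid_from))

    def all_addresses(self, customer_id):
        # Time-variant: every address ever loaded for the customer is still available.
        return [r.address for r in self._rows if r.customer_id == customer_id]

    def address_as_of(self, customer_id, as_of):
        # Pick the latest address that was already valid on the given date.
        candidates = [
            r for r in self._rows
            if r.customer_id == customer_id and r.valid_from <= as_of
        ]
        if not candidates:
            return None
        return max(candidates, key=lambda r: r.valid_from).address


history = AddressHistory()
history.load(42, "1 Old Street", date(2023, 1, 1))
history.load(42, "9 New Avenue", date(2024, 6, 1))

print(integrate("source_a", "SKU-001") == integrate("source_b", "P-17"))  # True
print(history.all_addresses(42))  # ['1 Old Street', '9 New Avenue']
print(history.address_as_of(42, date(2023, 12, 31)))  # 1 Old Street
```

A transaction system would typically overwrite the customer's address in place; here the move to a new address adds a second row and leaves the first one untouched.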
Ralph Kimball is one of the original architects of data warehousing and an author on data warehousing and business intelligence. He is best known for his long-held conviction that data warehouses must be designed to be understandable and fast. His methodology, also known as dimensional modeling or the Kimball methodology, has become a standard in the area of decision support.
Ralph Kimball provided a more concise definition of a data warehouse:
A data warehouse is a copy of transaction data specifically structured for query and analysis.
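As a rough illustration of this definition, the sketch below copies rows from a transactional table into a small dimensional layout (one fact table and one dimension table) and then runs an analytical query against the copy. It uses Python's built-in `sqlite3` module with an in-memory database; the table names (`orders`, `dim_product`, `fact_sales`) and the sample rows are assumptions made up for the example, not part of any particular methodology or product.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Transactional table, as an operational system might keep it (sample data).
    CREATE TABLE orders (order_id INTEGER, product TEXT, order_date TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'widget', '2024-01-15', 10.0),
        (2, 'gadget', '2024-01-20', 25.0),
        (3, 'widget', '2024-02-03', 10.0);

    -- Dimension table: one row per product.
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT UNIQUE);
    INSERT INTO dim_product (product_name)
        SELECT DISTINCT product FROM orders ORDER BY product;

    -- Fact table: a copy of the transactions, restructured for analysis.
    CREATE TABLE fact_sales (product_key INTEGER, sale_month TEXT, amount REAL);
    INSERT INTO fact_sales
        SELECT d.product_key, substr(o.order_date, 1, 7), o.amount
        FROM orders o JOIN dim_product d ON d.product_name = o.product;
""")

# Analytical query: total sales per product per month, read from the copy.
rows = conn.execute("""
    SELECT d.product_name, f.sale_month, SUM(f.amount)
    FROM fact_sales f JOIN dim_product d USING (product_key)
    GROUP BY d.product_name, f.sale_month
    ORDER BY d.product_name, f.sale_month
""").fetchall()

print(rows)
# [('gadget', '2024-01', 25.0), ('widget', '2024-01', 10.0), ('widget', '2024-02', 10.0)]
```

The `orders` table stays as it is for day-to-day processing, while the query runs against the restructured copy.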