Data Warehouse Tuning: The Differences of Data Warehouses
Hits: 1681
Description: Data warehouses differ from transactional databases in three main ways.
+ They are bigger terabytes instead of megabytes or gigabytes.
+ They change less often, often daily or at most hourly. If online changes are allowed, they are normally appends.
+ Queries use aggregation or complex WHERE clauses.
The implications of these three points are surprising.
+ Scanning all the data is too slow. Redundant information is needed in the form of special indexes (such as bitmaps or R-trees) or in the form of structures that hold aggregate information. In Chapter 4, we discussed holding aggregate information about total store sales in one table and total sales per vendor in another table. Data warehouses raise such tricks to high art or try to.
+ There is time to build data structures because the data changes slowly or large parts of the data (e.g., all old data) change slowly.
+ Queries that perform aggregates benefit from structures that hold aggregate information. Queries having complex WHERE clauses benefit from query processing engines that can exploit multiple indexes for a single table.
+ Because of the large variety of data warehouse applications (aggregate rich, complex WHERE clause, massive joins), many technologies have survived and have found niches.