What is the difference between a database and a data warehouse?

168 views

So I understand that a data warehouse is used for historic analysis and performance reporting but I don’t really know what it actually is or what the difference is to a database

In: 6

3 Answers

Anonymous 0 Comments

Source: am idiot.
Think of it like a Costco. A database is like a pallet of records. While the whole Costco is the data warehouse.

Anonymous 0 Comments

A database in the very generic sense means any system for storing and retrieving data. So, a data warehouse is technically a kind of database.

In common usage, database refers to the central store of data that an application uses to be able to run. So Reddit, for example, stores posts, comments, users, upvotes and everything else into its database and constantly updates it in real time as people use the site. When you browse the front page, the content is pulled from that database.

Data warehouse is a storage system that is specifically used for data analysis. It takes inputs from various different sources, which could range from application data to logs to telemetry to just about any other part of the business. End users will never directly interact with this system. Instead employees can run large scale queries against the warehouse to generate reports and charts for their own use. So if a Reddit executive wants to know the average daily users the site had over the last year, they will turn to the warehouse.

Why do you need two different systems at all? A database is optimized for very fast insert and update operations. This is necessary because you don’t want your site’s performance to be slow. A data warehouse on the other hand is able to handle much larger quantities of data, but queries take a lot longer to run.

Anonymous 0 Comments

There’s considerable overlap but:

Databases are often for production purposes, where they get lots of small queries and inserts. For example:

Query: Get me order history for this customer

Insert: Create order for this customer

This database will have been designed around delivering the transaction performance required for the application which may be high frequency small transactions.

You *could* ask this database a big question like “what are the total sales by year?” – however that requires a lot of processing and may affect the performance of the system perhaps even crashing it, so such a question is often forbidden since the production system requires consistent performance. However you still need to answer these questions and that’s where a warehouse comes in.

A data warehouse on the other hand, is all about *querying* the data in *bulk* for analysis purposes – and may not be the absolute newest data. Generally you’d take your production data from many databases and perhaps upload new data to a single warehouse every midnight (when the load on the production servers is low.) Most databases don’t allow you to query across databases, but warehouses do, allowing you to ask things like “what are total yearly sales and join that to the locations of our shops (usually in a different database) to get sales by year by location”. It doesn’t matter if this takes an hour and it can’t affect the performance of the production system.