What Is a Data Warehouse?

A data warehouse is a system designed to store an organisation’s data in one place. It connects and integrates data from numerous sources, and it exists to supply data for business intelligence (BI) reporting, regulatory compliance and analytics. Organisations use data warehouses to draw insights from their data, which helps them to make smarter, faster decisions.

Data warehouses today can use both structured and unstructured data (including images, videos, and data from sensors). Some also work with in-memory database technology (where the dataset is kept in main memory rather than on disk) so that they can provide real-time access to important data. Data warehousing makes it possible to combine data from diverse sources, in the right format, enabling both short-term and long-term views of data.

The sources that feed into a data warehouse are many and varied. Internal sources may include operational systems (such as Customer Relationship Management or Enterprise Resource Planning systems) and databases. External sources could be devices connected to the Internet of Things, social media, apps and other partner systems. Whereas in the past the data for these warehouses was stored on an organisation’s premises, nowadays the data is likely to be stored in a combination of private and public clouds as well as physical premises.

Data Warehousing – the Advantages

The most successful organisational analytics and BI reports are built on well-designed data warehouses. These warehouses quickly become indispensable, helping you to make informed decisions across your business and to produce the reports that back up those choices. With data warehousing you can:

  • Interrogate data more quickly: you won’t need support from IT or other departments to query large amounts of data – data warehouses are designed to retrieve and analyse datasets quickly.
  • Spot trends over time: learn from an array of historical data – predictions become easier, and better-informed decisions help your business to improve continually.
  • Improve business analytics: decision makers have access to data from a range of sources, so they are far less likely to be working from incomplete information.
  • Increase quality of data: you can ensure that only data which has been cleansed and processed gets loaded into the data warehouse – meaning data is in a uniform format and decisions are based on high-quality, easily readable data.
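
As a rough illustration of spotting trends over time, the sketch below totals historical sales by month. It uses Python's built-in sqlite3 module as a stand-in for a warehouse. The sales table, its columns and the sample figures are all invented for this example.

```python
import sqlite3

# Invented sample data: ISO dates and whole-currency amounts.
SAMPLE_SALES = [
    ("2023-01-15", 100),
    ("2023-01-28", 50),
    ("2023-02-03", 200),
    ("2023-03-10", 120),
    ("2023-03-22", 130),
]


def monthly_totals(rows):
    """Return (month, total) pairs, oldest first, for (iso_date, amount) rows."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        # Hypothetical table: one row per sale.
        conn.execute("CREATE TABLE sales (sold_on TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT substr(sold_on, 1, 7) AS sale_month, SUM(amount) "
            "FROM sales GROUP BY sale_month ORDER BY sale_month"
        )
        return cursor.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for month, total in monthly_totals(SAMPLE_SALES):
        print(month, total)
```

A real warehouse would hold far more history and use its own SQL dialect, but the idea is the same: group historical rows by period and compare the totals.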

What to Store in a Data Warehouse

Early data warehouses mainly stored structured data about transactions, products and people. As business technology developed, unstructured data became more and more valuable. This includes images, emails, social media data and data from sensors. Businesses now needed to store this data, retrieve it quickly and easily, analyse it and draw conclusions from it.

Today, businesses can combine structured and unstructured data in their data warehouses, giving them a complete overview of their operations and processes – and enabling them to make better judgements.

What Does a Data Warehouse Consist Of?

The four component parts of a data warehouse work together to enable you to get results quickly and analyse them easily. They are: (1) the central database, (2) ETL tools (extract, transform, load), (3) access tools, and (4) metadata. Let’s look at them in more detail:

  • Central database: in the past this would have been a standard relational database housed on the premises. With the advent of Big Data, it is now often an in-memory database, which gives real-time insights and takes advantage of increasingly cheap RAM.
  • ETL tools: data is extracted, transformed and then loaded. In other words, data is retrieved from source systems, reshaped into a consistent format that is easy to consume, and written to the central database. Other approaches, such as bulk-load processing, real-time data replication and data enrichment services, may also be used.
  • Access tools: tools such as application development tools, query/reporting tools, OLAP tools (Online Analytical Processing) and data mining tools allow users to interact with data in their warehouses.
  • Metadata: this can be defined as ‘data about data’. There are two main types: business metadata, useful for providing context for your data; and technical metadata, which tells you where your data is kept and in what form or structure.
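
The way these components fit together can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: an in-memory SQLite database plays the central database, three small functions play the ETL steps, and a load_log table holds a piece of technical metadata. The source records, table names and column names are all hypothetical.

```python
import sqlite3
from datetime import datetime, timezone

# Invented source records: a CRM export and an ERP export with different shapes.
CRM_ROWS = [
    {"Email": " Ann@Example.com ", "Name": "Ann Lee"},
    {"Email": "", "Name": "No Email"},
]
ERP_ROWS = [("bob@example.com", "Bob Ray")]


def extract():
    """Extract: yield (source, email, name) from each hypothetical source."""
    for row in CRM_ROWS:
        yield "crm", row["Email"], row["Name"]
    for email, name in ERP_ROWS:
        yield "erp", email, name


def transform(records):
    """Transform: normalise emails and drop records that have none."""
    for source, email, name in records:
        email = email.strip().lower()
        if email:
            yield source, email, name.strip()


def load(conn, records):
    """Load: write cleaned records to the central table and log metadata."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers (source TEXT, email TEXT, name TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_log (loaded_at TEXT, row_count INTEGER)"
    )
    rows = list(records)
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", rows)
    # Technical metadata: when the load ran and how many rows it wrote.
    conn.execute(
        "INSERT INTO load_log VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), len(rows)),
    )
    conn.commit()
    return len(rows)


def run_etl():
    """Run the three steps against a fresh in-memory database."""
    conn = sqlite3.connect(":memory:")
    load(conn, transform(extract()))
    return conn
```

An access tool would then query the result, for example with run_etl().execute("SELECT email FROM customers"). Note that the record with no email address never reaches the warehouse, because the transform step filters it out before loading.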

Data Warehousing Best Practices

To save the most time and money, you need to follow best practice guidelines when planning and setting up your data warehouse. Here we look at business best practices and IT best practices. You’ll also develop your own best practices as you work with your data and service partners.

Business Best Practices:

  1. Develop a project plan: Create a realistic blueprint and a schedule with your team – and make sure you regularly communicate status updates.
  2. Team building: map out which executives and managers will use the information and note which key performance indicators they’ll need.
  3. Define your information requirements: what data do you need, and which sources will you use to find it? Suppliers, customers or trade bodies may be able to suggest helpful data.
  4. Record where and how your data is stored: this helps you pinpoint gaps in the data and any restrictions which could impact your transformation of data.
  5. Prioritise data warehouse applications: begin with a pilot project which has achievable requirements and tangible value to the business.
  6. Choose the right data tech partner: make sure they have the experience and implementation capabilities required for your project. They should support both your cloud service and on-premise options.

IT Best Practices:

  1. Ensure quality of data, governance and structure: as new sources of data become available, it’s important to manage them all consistently and stick to implemented procedures such as those for defining data and cleaning it.
  2. Agile architecture: a flexible platform can support your increasing data warehouse needs, whereas a limited platform will restrict you.
  3. Monitor security and performance: balance ease of access with protection of data. Ensure that the data warehouse system is performing well by monitoring its usage.
  4. Be strategic when using the cloud: recognise that different departments have different needs. Use on-premise systems where they suit the workload, and use cloud warehousing to reduce costs and to give access from phones and tablets.
  5. Automate processes: machine learning doesn’t just add value to business intelligence – it can also automate warehouse maintenance to keep the system fast and reduce costs.
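
One simple way to approach the usage monitoring mentioned in point 3 is to time every query and keep a log that can be reviewed later. The sketch below assumes a SQLite connection as a stand-in for the warehouse. The MonitoredConnection class and its method names are hypothetical, introduced only for this example.

```python
import sqlite3
import time


class MonitoredConnection:
    """Wrap a sqlite3 connection and record how long each query takes."""

    def __init__(self, conn):
        self._conn = conn
        self.log = []  # one (sql, seconds, row_count) tuple per query

    def query(self, sql, params=()):
        """Run a query, log its duration and row count, and return the rows."""
        started = time.perf_counter()
        rows = self._conn.execute(sql, params).fetchall()
        elapsed = time.perf_counter() - started
        self.log.append((sql, elapsed, len(rows)))
        return rows

    def slow_queries(self, threshold_seconds):
        """Return the log entries that took at least threshold_seconds."""
        return [entry for entry in self.log if entry[1] >= threshold_seconds]
```

Many warehouse platforms expose their own query history, which would normally be preferable. A wrapper like this only shows the kind of information worth collecting: what was asked, how long it took and how much data came back.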

Cloud Data Warehouses – the Top Benefits

Why should you choose warehouses based in the cloud, rather than on-premise data warehouses?

  • Security: Often, data is held more securely in the cloud than it would be when held physically on the premises. It’s also more likely to be backed up so you don’t lose anything in the event of a disaster.
  • Flexibility: cloud data warehouses allow you to be agile and scale up or down as required. These cloud-based, highly distributed environments combine agility with the ability to manage huge amounts of data.
  • Low cost: You only pay for what you need, when you need it, which is how a data warehouse-as-a-service (DWaaS) model is typically billed. In this way, you avoid paying upfront for pricey hardware, maintenance or server rooms, and you can have computing and storage priced separately, so that you pay for each only as you use it.
  • Speed of deployment: you can buy massive computing power and storage capacity very quickly – have your own data marts (separate subsections within a data warehouse for a specific part of the business), sandboxes and warehouses set up in minutes.
  • Users are empowered: with a single view of data across the organisation from a variety of sources, employees can use a diverse set of tools to work with data easily. New sources or apps can often be connected without help from IT.
  • Latest technology: New technology like machine learning is easily integrated, meaning business users can have a guided experience, with decision support via suggested questions.
  • Real-time data: In-memory databases mean you can access data in real time for immediate situational awareness.
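
The data marts mentioned above can be pictured as a filtered slice of the warehouse. The sketch below models one as a SQL view over an in-memory SQLite database. This is only one possible way to build a data mart, and the orders table, the sales_mart view and the sample rows are all invented for the example.

```python
import sqlite3


def build_warehouse():
    """Create a tiny warehouse with one table and one data mart view."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical warehouse table holding orders from every department.
    conn.execute(
        "CREATE TABLE orders (department TEXT, customer TEXT, amount INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("sales", "Acme", 300),
            ("sales", "Globex", 150),
            ("support", "Acme", 40),
        ],
    )
    # Data mart modelled as a view: only the rows the sales team needs.
    conn.execute(
        "CREATE VIEW sales_mart AS "
        "SELECT customer, amount FROM orders WHERE department = 'sales'"
    )
    conn.commit()
    return conn
```

Querying sales_mart returns only the sales rows, so the sales team works with its own subsection while the underlying warehouse table stays complete.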

Conclusion

Modern data warehouse technologies allow businesses to improve both their profitability and their decision making, but businesses should start small and expand as necessary.

Data warehouses offer data mining at speed, and need not disrupt other business systems or their performance. Features such as KPI tracking, alerts and dashboards can serve staff at all levels, as well as customers and suppliers.

Increasingly, data warehousing is becoming a vital part of successful digital transformation projects – especially when it combines internal data with new and insightful information from external organisations.