When’s the right time for startups to get a data warehouse?

We cover the quantitative (the IPO model) and qualitative factors to help with your team’s data warehouse decision.

Narayan Subramoniam
Dropbase
Published in
4 min readNov 8, 2021

--

Consider the following scenarios:

  • You’re in a fast-growing startup fresh out of funding and you now have to tighten up on reporting to your angel investors.
  • Your company is thinking of an expansion to a new location and wants to consolidate its data. Or,
  • The lean startup you’re in spends way too much time manually collecting, cleaning, and visualizing data from your various analytics platforms … valuable time away from polishing your product

In which of these situations do you need a data warehouse and when can you continue using your workflow?

In this article, we’re going to help you identify quantitative and qualitative reasons that’ll let you answer these questions for your specific company. It will allow you to diagnose major pain points and justify the reasons for implementing a data warehouse.

And if you decide that you do want to go the data warehouse route, we’ve got you covered here to help pick which one.

Quantitative factors for the data warehouse decision — IPO Model

We’ll be using a variation of the Input-Process-Output model for analyzing team performance to help guide the decision.

Input factors

Any data storage technology has an obvious input: data. What separates a data warehouse from a database is that a warehouse integrates data from many different sources. This, in the long run, saves your team time, especially if you have a high update frequency. This gives us two quantifiable input factors:

  • Data source count: If your team solely uses your application’s database as an input, then you likely don’t need a warehouse. But if you use your app’s database, CRM data, and product/marketing analytics then you have a high data source count. The higher the data source count, the higher potential need for a data warehouse.
  • Total time spent updating: This factor is a combination of the time it takes to update one data source times how often you update it within some time period (usually a year). e.g., let’s say you load data from Salesforce monthly and your suppliers once a week to your database. They each take 10 mins and 20 mins to give a total of 19.3 hours/year. You can then multiply this with your wage rate to understand the cost of manual data updates.

Having overall high input factors increases the need for a data warehouse. Of course, “high” depends on your company’s budget/bandwidth.

Process factors

Once your data is in and you’ve spent time loading it in, process factors help break down important dimensions to consider in your decision.

  • Query complexity: If your business is relatively simple and the answers you want from your data are not complicated, then you have low query complexity. For example, if you want to know the number of units shipped by supplier.

However, if you want to answer questions that require wrangling data from many sources, transforming them, and then creating new metrics from them, then you have high query complexity. A good heuristic is the processing time behind your queries. If it takes you hours, or even days to find business answers then you likely have high query complexity.

  • Data type: For our purposes, there are two types of data. Transactional data are your everyday, operations related data. e.g. Sales broken down by channel, supply confirmations, number of log-ins, etc. Analytical data are an aggregation of transactional data to help drive insights. This could be linking sales broken by channel to user log-ins on mobile or anything that helps connect your data.

The reason why this is important is that data warehouses excel at allowing you to create/update analytical data without slowing down your transactional data backups. Analytical queries take up processing power and time. Time away from carefully maintaining your transactional data!

So, the more you create analytical data, the more you need a data warehouse.

Output factor

This is the factor to consider in your data warehouse decision. Who is the end user of your reporting? If the end users are just your small team, you can perhaps get away with using your current stack. On the other hand, if you’re just out of Series B funding (congrats!) and have to report to investors then you most likely need a data warehouse to consistently create professional reports.

Qualitative factors for the data warehouse decision

Besides the IPO model, there may be qualitative reasons to consider a data warehouse. Your team may want to go a step further and set up a data warehouse now so that you don’t have to worry about it down the road. There may be intense pressure for reporting because of a recent funding round and you want to get a warehouse to alleviate that stress. Or, your engineering team just hates the manual process of updating files and it’s now affecting performance. All these reasons are hard to measure but shouldn’t be ignored to arrive at the right decision for your team.

In Summary

We covered quantitative and qualitative reasons to help with the data warehouse decision. If you do conclude that you need a data warehouse, there are still more decisions to be made. A big one is choosing among Snowflake, Redshift, or BigQuery. You could also decide to use a turnkey data integration tool like Dropbase to help collect, clean, and centralize all your flat Excel and .csv files. The choice is yours and happy deciding!

--

--