Selecting the right data warehouse can be a boon for long-range, comprehensive analytics. For companies looking to have a positive impact on everything from daily operations to customer interactions, a data warehouse is a wise investment.
Conversely, choosing the wrong data warehouse can be just as detrimental for business. According to recent estimates, bad data costs the U.S. economy $3.1 trillion each year and affects the bottom line of 88% of American companies.
It follows that when you are ready to implement a data warehouse at your organization, you need to be picky and take into consideration several important factors. Otherwise, you could join the ranks of businesses beset by flawed data.
While cloud computing has utterly transformed IT architecture in recent years, most organizations are still not properly ingesting, storing, and analyzing data. The majority still rely on fragmented, inconsistent, and outdated data spread across multiple databases.
A data warehouse helps companies to achieve a single version of the truth by consolidating the most accurate and current data from the most reliable systems. While companies should seriously consider a data warehouse for their business, they should also know how to choose the right one for their needs. Just as a professional bodybuilder and a marathon runner both want peak performance yet follow very different training regimens, organizations should approach the data warehouse landscape with their own goals in mind.
Take an enterprise telco, for example. Such an organization will want to track the detailed steps of every customer journey, collecting browsing history, in-app actions, device activity, GPS location, and more. Because it is dealing with big data (extremely large data sets that can be analyzed to reveal patterns, trends, behaviors, and interactions), a data lake architecture may better serve its needs, since at massive scale the ingest rate and CPU load will strain an on-premises data warehouse. In short, companies embracing big data need to prepare for growth with an environment that can handle future workloads.
For an SMB e-commerce platform, however, a cloud data warehouse might do the trick. Perhaps all their customer data is stored on cloud applications like Salesforce, HubSpot, and NetSuite. What they need is to unify this data into a data warehouse that feeds their reports, dashboards, and analytics.
For these reasons, an SMB will want to steer clear of on-premises systems, which are cumbersome and expensive for a smaller organization to maintain. The upfront costs of building a data warehouse alone can run into tens of thousands of dollars, and that doesn't even account for the extensive administration and management costs of keeping these systems functioning.
Alternatively, a cloud data warehouse offers a good array of vendors to choose from. Bedrock Data Fusion's cloud data warehouse, for example, is a MySQL database hosted on AWS RDS. It allows SMBs to connect their cloud applications, then automates the ETL and warehouse-building work on their behalf.
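To illustrate the kind of consolidation such a warehouse performs, the sketch below merges contact exports from two cloud apps into one deduplicated table. SQLite stands in here as a local substitute for the MySQL warehouse, and all table and column names are hypothetical, not Fusion's actual schema.

```python
import sqlite3

# Hypothetical example: unify contact records exported from two cloud
# apps (say, Salesforce and HubSpot) into a single warehouse table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE salesforce_contacts (email TEXT, name TEXT);
    CREATE TABLE hubspot_contacts (email TEXT, name TEXT);
    INSERT INTO salesforce_contacts VALUES ('ada@example.com', 'Ada');
    INSERT INTO hubspot_contacts VALUES ('bob@example.com', 'Bob'),
                                        ('ada@example.com', 'Ada');
""")

# Merge both sources, deduplicating on the combined rows.
cur.execute("""
    CREATE TABLE contacts AS
    SELECT DISTINCT email, name FROM (
        SELECT email, name FROM salesforce_contacts
        UNION ALL
        SELECT email, name FROM hubspot_contacts
    )
""")
rows = cur.execute("SELECT email, name FROM contacts ORDER BY email").fetchall()
print(rows)  # two unique contacts, even though Ada appears in both sources
```

The same UNION ALL plus DISTINCT pattern carries over directly to MySQL, the engine a hosted warehouse like Fusion's runs on.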
As you go about choosing your data warehouse, it’s also important to consider your stakeholders — be they customers, the board, shareholders, or employees. By brainstorming how your warehouse will serve each stakeholder, you’ll likely find multiple purposes for your data: marketing may want to use data to monitor or modify campaigns; sales to improve customer relationships; business operations to keep company data neat and organized; product managers to analyze usage, needs, and pain points.
Then there's size, speed, and scale to consider. Running with our first example, an enterprise telco, it's likely such a company will process a mind-boggling amount of data — hundreds of thousands of transactions per second, hundreds of millions of rows, and hundreds of data sources. In all likelihood it will harness big data tools such as Hadoop, Spark, NoSQL, and massively parallel databases. To do so it will also need to invest in engineers with experience in AI APIs, deep learning, Node.js, Hadoop, Spark, Hive, Pig, Presto, distributed architecture, and other advanced technologies. The bottom line: enterprises may not need a data warehouse at all. Instead they might need a data lake and an army of engineers to optimize for performance characteristics like cost, ingest speed, concurrency, and latency.
Such an architecture would be overkill for an SMB trying to consolidate a few data sources or structure raw data for analytics. Although the average SMB uses 18+ SaaS applications, most will care about just the handful that store their customer data. For them, implementing a cloud data warehouse makes much more sense, and it's also much more straightforward: with cloud applications, it's far easier to ETL (extract, transform, load) the data than to make sense of many disparate sources such as curated data sets from APIs, IoT devices, server logs, and web crawlers. Still, since not all of this data arrives in a structured form, you should look for a data warehouse solution that can move, transform, distill, and combine data, addressing any quality issues along the way and delivering it to a repository used for reporting and analytics.
In other words, an SMB needs a data warehouse that also automates ETL, because without fast and accurate ETL, companies will find themselves with a lot of data they cannot use. For ETL to be valuable, choose a data warehouse that runs ETL as a continuous process. That way, traditional SQL ETL and reporting tools can help your business track outcomes such as sales, payments and other transactions, account sign-ups, and unsubscribes, aggregating the data into models that reflect your business processes. And if a data source changes over time, automated ETL updates the warehouse immediately, keeping your data accurate and up-to-date without a redesign of the entire data warehouse.
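To make continuous, automated ETL concrete, here is a minimal sketch of the load step: an upsert that keeps a warehouse row current when its source record changes, with no table rebuild. Again SQLite stands in for the warehouse, and the schema is invented for illustration.

```python
import sqlite3

# Sketch of incremental loading: each synced record is inserted, or
# updated in place if the warehouse already holds a row for that key.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, plan TEXT, updated_at TEXT)"
)

def sync(record):
    """Load one extracted-and-transformed source record into the warehouse."""
    cur.execute(
        """INSERT INTO accounts (id, plan, updated_at)
           VALUES (:id, :plan, :updated_at)
           ON CONFLICT(id) DO UPDATE SET plan = excluded.plan,
                                         updated_at = excluded.updated_at""",
        record,
    )

sync({"id": 1, "plan": "starter", "updated_at": "2024-01-01"})
sync({"id": 1, "plan": "pro", "updated_at": "2024-02-01"})  # source changed
plan = cur.execute("SELECT plan FROM accounts WHERE id = 1").fetchone()[0]
print(plan)  # the warehouse reflects the latest source state
```

Running `sync` on a schedule, or on change events from the source application, is what turns ETL into the continuous process described above.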
For SMBs, then, automating the ETL bottleneck is the key to fast data scalability. By getting data out of sources in a cost-effective, simple, and flexible way, you can use as many cloud services as you want. And with a cloud data warehouse, you don't need to stand up servers and hardware or fund ongoing IT and operational costs. Cloud services also connect more easily to other cloud services, which makes it easier to ingest, store, and organize data. Reputable cloud providers stay current with security patches and protocols to keep their customers safe, and with a strong provider, cloud solutions can deliver very high uptime and reliability.
Another point to consider is security. At the most basic level, a secure data warehouse prevents unauthorized users from accessing, modifying, stealing, or selling data. Due to their relative maturity, data warehouses are generally more secure than data lakes. A secure data warehouse restricts access to defined user roles, keeps a record of the activities performed by each user, and has multiple security checkpoints, built on a scalable technology stack, that trained employees continuously monitor and patch to stay ahead of digital threats. Above all, however, a data warehouse is only secure insofar as its security is measured, certified, and tested on a regular basis.
Bedrock Data takes security very seriously. Traffic between customers and Bedrock Data, for example, is protected with in-transit encryption using only the strongest TLS protocols and ciphers, along with 2048-bit encryption keys that rotate regularly. API endpoints are also protected by a valid SSL certificate. Data at rest should be encrypted at multiple levels, including on the physical disk and by the logical storage subsystem; at Bedrock, we use AES-128 and AES-256 and are SOC 2 Type II certified.
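On the client side, insisting on a modern TLS version takes only a few lines with Python's standard library. The snippet below builds a context that refuses anything older than TLS 1.2; it is a generic example, not Bedrock's actual configuration.

```python
import ssl

# Build a client context with safe defaults (certificate verification and
# hostname checking enabled), then refuse protocols older than TLS 1.2.
ctx = ssl.create_default_context()
ctx.minimum_version = ssl.TLSVersion.TLSv1_2

# Any connection wrapped with this context will fail the handshake if the
# server offers only TLS 1.1 or older.
print(ctx.minimum_version >= ssl.TLSVersion.TLSv1_2)  # True
```

Passing such a context to `ssl.SSLContext.wrap_socket` (or to an HTTP client that accepts one) enforces the policy on every connection to the warehouse endpoint.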
Lastly, there is structured versus unstructured data. A data warehouse is much more than a repository for storing data; it is a refinery for consolidating and purifying data sourced from multiple heterogeneous operational systems. This is why it's so important to choose a data warehouse that provides structured columnar stores, allowing you to build end-to-end data pipelines that drive real-time BI dashboards. To create value across your organization, connect your data to BI dashboards so that everyone has in-depth visibility into it, and make sure your data warehouse is compatible with the BI tool(s) your team currently uses.
Want to automate data prep for business intelligence? Accelerate your time to analytics? Try Fusion for free today.