
Jeremy Martin | July 10, 2018

How and Why to Automate Your Data Pipeline

Introduction: The Opportunity of SaaS

Small to medium-sized businesses use, on average, fifteen SaaS applications to boost productivity and open new engagement channels with customers.

When data are spread across multiple systems and departments, organizations struggle to achieve a 360-degree view of their customers.

Preparing data for analysis often requires extract, transform, and load (ETL) projects that cost time and money.

Companies that automate their SaaS data pipelines therefore have a huge leg up. They save months of labor and hundreds of thousands of dollars, accelerate time to analytics, and discover insights that were once hidden from view.

Below, we define the virtual data pipeline, explain why you need one, explore the opportunities it creates for each department, and then look at the challenges of unifying customer data and how to overcome them.


What is a Virtual Data Pipeline?

A virtual data pipeline is a series of repeatable actions, or jobs, that automate the movement and transformation of data across systems, without any physical movement of data out of the locations from which they originate.

Data are extracted, usually from various SaaS applications; transformed, so that their formats are consistent and their fields de-duplicated; and, once merged, loaded into a single place, such as a cloud data warehouse.

And at the end of the pipeline, data are in more robust shape for analytics and insights.
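The extract, transform, and load stages described above can be illustrated with a short Python sketch. It is not how Fusion works internally: the two in-memory contact lists, the `FIELD_MAPS` dictionary, and the SQLite table are hypothetical stand-ins for real SaaS APIs and a cloud data warehouse.

```python
import sqlite3

# Hypothetical exports from two SaaS applications; field names differ by source.
crm_contacts = [
    {"Email": "jane@example.com", "Name": "Jane Roe"},
    {"Email": "sam@example.com", "Name": "Sam Poe"},
]
marketing_contacts = [
    {"email_address": "Jane@Example.com ", "full_name": "Jane Roe"},
]

# Hypothetical per-source field mappings onto one common schema.
FIELD_MAPS = {
    "crm": {"Email": "email", "Name": "name"},
    "marketing": {"email_address": "email", "full_name": "name"},
}


def extract():
    """Yield (source, record) pairs; a real pipeline would call each app's API."""
    for rec in crm_contacts:
        yield "crm", rec
    for rec in marketing_contacts:
        yield "marketing", rec


def transform(pairs):
    """Rename fields to the common schema, normalize emails, drop duplicates."""
    unified = {}
    for source, rec in pairs:
        row = {FIELD_MAPS[source][k]: v for k, v in rec.items()}
        row["email"] = row["email"].strip().lower()
        unified.setdefault(row["email"], row)  # first record seen wins
    return list(unified.values())


def load(rows, conn):
    """Write the unified rows into a single warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS contact (email TEXT PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO contact (email, name) VALUES (:email, :name)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse
load(transform(extract()), conn)
print(conn.execute("SELECT COUNT(*) FROM contact").fetchone()[0])  # 2
```

Keeping each stage a separate function means a source can be added or swapped by changing only `extract` and its field map.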


Why You Need a Virtual Data Pipeline

A virtual data pipeline helps IT teams shift from being a reactive resource to running a repeatable process that helps the business work more effectively with its data.

Beyond the purely technical benefits, these are some of the key business wins teams can achieve from a virtual data pipeline:

1. Get a 360-Degree Customer View

Everyone wants a 360-degree customer view. But only 10 percent of businesses have one, as unifying data is time-consuming and complex, even for IT. A virtual data pipeline will extract and align data from multiple sources of customer data that would otherwise sit in separate silos.


2. Align Teams

Often, teams debate data rather than let it solve their problems. When sales, marketing, support, and finance lack a master data set, each team relies on analyzing its own data in a silo. A virtual data pipeline combines information so the entire business can rally around a single, trusted source of information.


3. Improve Customer Experience

When teams are aligned on how to interact with customers, customers enjoy an amazing experience. But if your sales rep is unaware of a customer’s support ticket, if your marketing team can’t tell which products customers have expressed interest in, or if your chat system asks customers who they are every time they visit your website, the customer you invested in suffers. By implementing a virtual data pipeline, you can fuse many customer experiences into one view that every department can use.

4. Turn Investments Into Outcomes

A business tracks both its investments and its outcomes. If this tracking happens in separate systems, the factors driving growth remain vague. By integrating data, marketing can see, for instance, how its investment in leads converted into sales outcomes. You can explain which activities and milestones led to success, and how.


5. Ensure Compliance

Centralized customer data helps your business comply with the law. If your data are spread across many systems, you run the risk of violating the General Data Protection Regulation (GDPR), for example. Connecting applications to a centralized data warehouse clarifies which systems house customer data and helps automate how you meet compliance requirements.


Automating the data pipeline pays off in two ways. First, stakeholders are free to focus on analyzing data rather than debating its accuracy. Second, the entire organization (marketing, sales, support, customer success, analysts, executives) benefits from having unified customer data. Below are just a few examples of the kinds of questions departments can begin to answer.


Opportunities from Automating the Data Pipeline

1. Marketing

Most marketers want to measure campaign performance and gauge how well their dollars convert leads into valuable customers. They want to know, in real time:

  • How much have I invested in targeting a particular account?

  • Which customers should I engage?

  • How many interactions did it take for an account to become a customer?

  • What is the optimal sequence of interactions to acquire a new customer?

  • Who are my most profitable leads?

  • How should I spend my marketing budget?

  • Where do my best customers come from?

  • Can I test my future marketing plans to see the likely outcomes?

2. Sales

Similarly, sales teams are looking to reach out to the strongest prospects. With an automated data pipeline, they can:

  • Build dashboards showing which sales reps have interacted with which customers.

  • Measure sales performance.

  • See if sales reps have a predilection for certain customers and optimize for their strengths.

  • See how customer and prospect behavior correlates to their purchase habits by linking Activity and Contact tables. (For instance, did attending a webinar lead a prospect to purchase a product or service?)
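As a sketch of linking Activity and Contact tables, the query below compares purchase rates for contacts who did and did not attend a webinar. The table layouts and sample rows are hypothetical, and SQLite (via Python's standard `sqlite3` module) stands in for the warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE contact (id INTEGER PRIMARY KEY, email TEXT, purchased INTEGER);
CREATE TABLE activity (contact_id INTEGER REFERENCES contact(id), type TEXT);
INSERT INTO contact VALUES (1, 'a@example.com', 1), (2, 'b@example.com', 1),
                           (3, 'c@example.com', 0), (4, 'd@example.com', 1);
INSERT INTO activity VALUES (1, 'webinar'), (2, 'webinar'), (3, 'email_click'),
                            (1, 'email_click');
""")

# Purchase rate for contacts who attended a webinar (1) vs. those who did not (0).
query = """
SELECT attended, AVG(purchased) AS purchase_rate, COUNT(*) AS contacts
FROM (
    SELECT c.id, c.purchased,
           EXISTS (SELECT 1 FROM activity a
                   WHERE a.contact_id = c.id AND a.type = 'webinar') AS attended
    FROM contact c
)
GROUP BY attended
ORDER BY attended
"""
for attended, rate, n in conn.execute(query):
    print(attended, rate, n)
```

A difference in rates shows an association, not proof that the webinar caused the purchase.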

3. Support & Customer Success

For support and customer success teams, ensuring that customers are using the product well and are satisfied with their service is paramount to reducing churn. With unified customer data, they can roll up account-based data and answer questions like:

  • What is the average number of tickets per account per month?

  • What are the Company and Contact properties for each ticket profile?

  • What is your combined Satisfaction Rating?

  • What is the average number of tickets or conversations for a new contact?
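The first question above could be answered with a query like this minimal sketch. The `ticket` table and its rows are hypothetical. One design choice to note: months in which an account filed no tickets are simply absent, so they do not pull the average down; including them would require joining against a calendar of account-months.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ticket (id INTEGER PRIMARY KEY, account_id INTEGER, created_at TEXT);
INSERT INTO ticket (account_id, created_at) VALUES
    (1, '2018-05-03'), (1, '2018-05-20'), (1, '2018-06-11'),
    (2, '2018-05-07'),
    (3, '2018-06-01'), (3, '2018-06-15'), (3, '2018-06-28');
""")

# Count tickets per account per calendar month, then average those counts.
# Months in which an account filed no tickets are not counted as zero here.
query = """
SELECT AVG(n) FROM (
    SELECT account_id, strftime('%Y-%m', created_at) AS month, COUNT(*) AS n
    FROM ticket
    GROUP BY account_id, month
)
"""
print(conn.execute(query).fetchone()[0])  # 1.75
```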

4. Analysts

Freed from building data pipelines, setting up servers, and writing code, analysts will find data much easier to maintain. If one department decides to swap out its marketing automation system, the change will not break dashboards or undo months of work. With an automated data pipeline, removing or replacing a SaaS connector simply means rebuilding the cloud data warehouse. There is no need to read API docs, start a new integration, or rigorously test the deployment in QA. Whereas many data pipelines require system downtime, with an automated pipeline the database is always available.

On the reporting side, SQL queries run faster and are transferable between multiple SaaS applications. Analysts write dashboards once, share with teams, and can refresh whenever they wish.

5. Executives

Since time immemorial, CEOs, CTOs, and SVPs have wanted the most recent core KPI data lightning fast. With an automated data pipeline, executives are no longer stymied by IT or waiting for their analyst to get acquainted with a new API. They can just save reports and update dashboards on a daily basis. Also, if teams want to change systems, executives can finally support this decision, without worrying about their dashboards breaking. In an automated data pipeline, SaaS systems are fungible.


The Challenges of Unifying Customer Data

Bringing customer data into a single, holistic source has long thwarted organizations. Extracting, preparing, warehousing, and feeding data into analytics tools is very complex; here are six major obstacles IT must overcome to unify customer data.

1. Duplicate Data

Eliminating duplicates takes time and resources. If matching accounts or contacts aren’t consolidated into unique records, the consolidated dataset will be inaccurate.

2. Conflicting Data

One of the greatest benefits of SaaS solutions is that many users and business processes contribute to the database powering the application. A consequence, though, is that different applications end up with different data on the same customers. If a customer is recorded in two systems as belonging to two different accounts, working out which record is correct stalls your analysis, particularly since resolving such conflicts “by hand” is impractical. A single update can scatter conflicts across multiple rows, tables, even databases.
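One common way to resolve such conflicts automatically is a per-field "last write wins" rule. The sketch below assumes each system exposes a modification time for every field, which many SaaS APIs do not (they often provide only a record-level timestamp). The records and field names are hypothetical, and this is not presented as Fusion's conflict policy.

```python
from datetime import datetime

# Two hypothetical versions of the same customer, with per-field update times.
crm = {"account": "Acme Corp", "phone": "555-0100",
       "_updated": {"account": "2018-06-01T10:00", "phone": "2018-04-02T09:00"}}
support = {"account": "Acme Corporation", "phone": "555-0199",
           "_updated": {"account": "2018-03-15T08:30", "phone": "2018-07-01T12:00"}}


def resolve(*records):
    """For each field, keep the value with the most recent modification time."""
    merged, stamps = {}, {}
    for rec in records:
        for field, ts in rec["_updated"].items():
            when = datetime.fromisoformat(ts)  # requires Python 3.7+
            if field not in stamps or when > stamps[field]:
                merged[field], stamps[field] = rec[field], when
    return merged


print(resolve(crm, support))  # {'account': 'Acme Corp', 'phone': '555-0199'}
```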

3. Inconsistent Formats

Whereas conflicts put the accuracy of your data in jeopardy, inconsistent formats pit values against one another. That is, even if the data are correct, one SaaS system may format your dates as DD-MM-YYYY and another as YYYY-MM-DD. The information is correct, but a pigsty to query. From phone numbers and states to booleans and capitalization, there is an endless number of fields whose formats you could standardize.
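A minimal date-normalization sketch: each source system's format is declared explicitly, and every value is rewritten as ISO 8601 (YYYY-MM-DD). The system names and format table are hypothetical. Parsing by source rather than guessing matters because a value such as 03-04-2018 is ambiguous on its own.

```python
from datetime import datetime

# Hypothetical source formats; real systems must be profiled before mapping.
SOURCE_DATE_FORMATS = {
    "system_a": "%d-%m-%Y",   # DD-MM-YYYY
    "system_b": "%Y-%m-%d",   # YYYY-MM-DD
}


def to_iso_date(value: str, source: str) -> str:
    """Parse a date using its source system's format and return ISO 8601."""
    return datetime.strptime(value, SOURCE_DATE_FORMATS[source]).date().isoformat()


print(to_iso_date("10-07-2018", "system_a"))  # 2018-07-10
print(to_iso_date("2018-07-10", "system_b"))  # 2018-07-10
```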

4. Disparate Schemas

When SaaS solutions are built and deployed in isolation, their related objects may differ substantially. Related objects cover a range of data tied to a given contact, including their account, opportunities, activities across all departments, support tickets, and more. Much of this related data is lost when data is extracted, compromising the completeness of consolidated datasets. In one system, a Contact object may be connected to a Company and a Deal. In another cloud application, a Contact object might be related to a Lead, an Activity, and an Opportunity. When related objects are managed differently in separate systems, their schemas are inconsistent, which makes unifying related objects far more convoluted than it would be if the schemas were identical.

5. Continuously Updating Data in Source Applications

Source data also change continuously, which means the consolidated datasets you build can become obsolete as soon as some of the source data changes. Keeping data continuously updated can be a pain. The moment connected data sources get out of sync, the data feeding your dashboards and other business intelligence tools become less reliable for reporting. Querying siloed systems for the most recent data is tedious to repeat every time data inputs change. Analyzing multiple datasets, drawing conclusions, and making recommendations to the rest of your organization are a better use of your time than worrying about data concurrency.
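A common way to keep a warehouse current without recopying everything is incremental sync with a high-water mark: remember the latest modification time already loaded and fetch only rows changed after it. The sketch below is a hypothetical, in-memory illustration; it assumes all timestamps share one ISO 8601 format and time zone, so that string comparison orders them correctly.

```python
from typing import Dict, List, Tuple

# Hypothetical source rows; each carries an ISO 8601 last-modified timestamp.
source_rows = [
    {"id": 1, "email": "a@example.com", "modified": "2018-07-01T09:00:00"},
    {"id": 2, "email": "b@example.com", "modified": "2018-07-05T14:30:00"},
]


def sync(warehouse: Dict[int, dict], since: str) -> Tuple[int, str]:
    """Upsert rows modified after `since`; return (rows copied, new watermark)."""
    changed: List[dict] = [r for r in source_rows if r["modified"] > since]
    for row in changed:
        warehouse[row["id"]] = dict(row)  # upsert keyed by source id
    watermark = max((r["modified"] for r in changed), default=since)
    return len(changed), watermark


warehouse: Dict[int, dict] = {}
copied, mark = sync(warehouse, since="")        # initial full load
source_rows[0].update(email="a@new.example.com", modified="2018-07-09T08:00:00")
copied2, mark2 = sync(warehouse, since=mark)    # picks up only the edited row
print(copied, copied2, warehouse[1]["email"])   # 2 1 a@new.example.com
```

This simple version does not detect deletions, and a row written with exactly the watermark timestamp after a sync could be missed; production pipelines typically re-read a small overlap window or use the source's change feed.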

6. Belated Analysis

Data scientists spend, on average, 80% of their time ingesting data and only about 20% mapping and modeling trends. Even after analysts have wrangled and formatted data, there is still no guarantee that queries will be fast or easy to run. When executives ask analysts to generate key reports, the analysts must repeat this tedious exercise and be intimately familiar with the nuances of querying distinct databases. Moreover, many SaaS applications limit the number of API calls, so refreshing dashboards is often slow and the volume of data that can be downloaded is limited. The upshot of these challenges is that analysts spend more time formatting data than performing analysis.


How a Fused Database Automates Your Data Pipeline with an Always-on, Unified Warehouse

A Fused Database is a technical breakthrough to accelerate analytics and business intelligence. It acts as an always-on, unified data warehouse. Here’s how.

To create a Fused Database, you must first fuse the raw records in the source applications. For example, if you use Salesforce for leads, Marketo for marketing automation, and Zendesk for support, a single customer could have at least three Contact records across those systems.


1. Matching Identities

As mentioned, these raw records may contain conflicts, duplicates, and disparate formats. The first step in merging and standardizing these raw records is matching identities: recognizing that two or more records exist in different software systems or databases but relate to the same real-world entity (e.g., a person or a company).
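Identity matching can be sketched with a union-find structure: any two records sharing a normalized identifier (here, an email address or phone number) are linked, and links are transitive. The records, systems, and choice of identifiers are hypothetical assumptions; real matching rules are usually richer and may include fuzzy comparisons.

```python
from collections import defaultdict

# Hypothetical raw Contact records: (system, record id, identifiers).
records = [
    ("salesforce", "003A", {"email": "jdoe@acmecorp.com"}),
    ("marketo", "m-17", {"email": "jdoe@acmecorp.com", "phone": "5550100"}),
    ("zendesk", "z-9", {"phone": "5550100"}),
    ("zendesk", "z-10", {"email": "ann@other.com"}),
]

parent = list(range(len(records)))


def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]  # path halving keeps trees shallow
        i = parent[i]
    return i


def union(i, j):
    parent[find(i)] = find(j)


# Link records that share any identifier value (same email or same phone).
first_seen = {}
for idx, (_, _, ids) in enumerate(records):
    for key, value in ids.items():
        if (key, value) in first_seen:
            union(idx, first_seen[(key, value)])
        else:
            first_seen[(key, value)] = idx

entities = defaultdict(list)
for idx, (system, rec_id, _) in enumerate(records):
    entities[find(idx)].append(f"{system}:{rec_id}")
print(sorted(entities.values()))
```

Transitivity is what links the Zendesk record `z-9` through the shared phone number, but it can also over-merge (for example, when colleagues share an office phone), so matching keys should be chosen carefully.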

2. Type Translation

The second step is type translation. If two or more systems store the same kind of value in different formats (e.g., dates as DD-MM-YYYY in one system and YYYY-MM-DD in another), type translation converts these values into a standard format. So if an email is recorded as johndoe@acmecorp.com in Salesforce and JohnDoe@acmecorp.com in HubSpot, standardizing for consistency might translate all upper-case characters into lower case. The major achievement of this step is standardizing all tabular data so that the resulting relational database is free of duplicates and format issues.
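Type translation can be organized as a registry that maps each field to a type and each type to a function producing the standard format. In this sketch the field names, the boolean vocabularies, and the lower-casing rule are assumptions for illustration.

```python
from typing import Callable, Dict

# Hypothetical vocabularies for boolean values seen across source systems.
TRUE_VALUES = {"true", "yes", "y", "1"}
FALSE_VALUES = {"false", "no", "n", "0"}


def translate_email(value: str) -> str:
    return value.strip().lower()


def translate_bool(value) -> bool:
    text = str(value).strip().lower()
    if text in TRUE_VALUES:
        return True
    if text in FALSE_VALUES:
        return False
    raise ValueError(f"unrecognized boolean: {value!r}")


# Hypothetical registries: type -> translator, field -> type.
TRANSLATORS: Dict[str, Callable] = {"email": translate_email, "boolean": translate_bool}
FIELD_TYPES = {"Email": "email", "Opted_In": "boolean"}


def translate(record: dict) -> dict:
    """Apply the standard-format translator for each field with a known type."""
    return {
        field: TRANSLATORS[FIELD_TYPES[field]](value) if field in FIELD_TYPES else value
        for field, value in record.items()
    }


print(translate({"Email": "JohnDoe@acmecorp.com", "Opted_In": "Yes"}))
print(translate({"Email": "johndoe@acmecorp.com", "Opted_In": 1}))
```

Lower-casing the whole address follows common practice; strictly, RFC 5321 allows the part before the @ to be case-sensitive, although most mail providers treat it case-insensitively. Raising an error on an unrecognized boolean, rather than guessing, surfaces bad source data early.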

To de-duplicate records, Fusion also matches objects in the applications you connect to. Such objects include:

  • Contact
  • Company
  • Opportunity
  • List
  • Event
  • Owner/Employee
  • Support Ticket
  • Activity
  • Product
  • Order
Fusion matches and de-dupes records of these object types using their fields. It will soon allow you to customize de-dupe keys.
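A de-duplication step with configurable keys might look like the following sketch. The `dedupe` function, its merge rule (earlier non-empty values win; later duplicates only fill gaps), and the sample Company records are hypothetical. Records with an empty key value are kept separate rather than collapsed together.

```python
from typing import Iterable, List, Sequence


def dedupe(records: Iterable[dict], keys: Sequence[str]) -> List[dict]:
    """Collapse records whose normalized `keys` values all match.

    Earlier non-empty values win; later duplicates only fill in missing fields.
    """
    merged = {}
    unmatched = []
    for rec in records:
        composite = tuple(str(rec.get(k) or "").strip().lower() for k in keys)
        if not all(composite):
            unmatched.append(dict(rec))  # no usable key: keep as its own record
            continue
        target = merged.setdefault(composite, {})
        for field, value in rec.items():
            if value not in (None, "") and field not in target:
                target[field] = value
    return list(merged.values()) + unmatched


companies = [
    {"name": "Acme Corp", "domain": "acme.com", "phone": ""},
    {"name": "ACME Corp", "domain": "Acme.com", "phone": "555-0100"},
    {"name": "Globex", "domain": "globex.com"},
]
print(dedupe(companies, keys=["domain"]))
```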


3. Data Transformation

Lastly, data transformation interprets the unique data models of two or more SaaS applications (e.g., HubSpot and Salesforce) to collapse them into a more efficient, standardized database schema. By mapping objects, you can tell where and how an object is connected to another object. If a lead in System A converts, the mapping determines where and how the lead will sync in System B. As each object has a relationship to other objects, Fusion preserves relationships between applications in a standardized, unified schema.
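The sketch below illustrates a transformation that preserves relationships: each source object type is renamed to a unified type, and each foreign key is rewritten to the unified ID produced by identity matching. The `OBJECT_MAP` and `IDENTITY` tables and the `company_ref` field are hypothetical; they do not reflect the actual HubSpot or Salesforce APIs or Fusion's schema.

```python
# Hypothetical mapping of source object types onto a unified schema.
OBJECT_MAP = {
    ("hubspot", "Company"): "company",
    ("hubspot", "Deal"): "opportunity",
    ("salesforce", "Account"): "company",
    ("salesforce", "Opportunity"): "opportunity",
}

# Hypothetical output of identity matching: (system, source id) -> unified id.
IDENTITY = {
    ("hubspot", "c-1"): "company:1",
    ("salesforce", "001X"): "company:1",   # same real-world company
    ("hubspot", "d-7"): "opportunity:1",
    ("salesforce", "006Y"): "opportunity:2",
}

source_rows = [
    ("hubspot", "Deal", {"id": "d-7", "amount": 5000, "company_ref": "c-1"}),
    ("salesforce", "Opportunity", {"id": "006Y", "amount": 12000, "company_ref": "001X"}),
]


def transform(system, obj_type, row):
    """Rename the object type and rewrite its foreign key to a unified id."""
    return {
        "type": OBJECT_MAP[(system, obj_type)],
        "id": IDENTITY[(system, row["id"])],
        "amount": row["amount"],
        "company_id": IDENTITY[(system, row["company_ref"])],
    }


unified = [transform(*r) for r in source_rows]
print(unified)
```

Because the HubSpot company and the Salesforce account resolve to the same unified ID, both opportunities end up attached to one company record.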

This trifecta (identity matching, type translation, and data transformation) has spawned a technical breakthrough: automating the data pipeline so analysts have a single source of truth for their dashboards, BI tools, and reports. Analysts can query and visualize one dataset while accessing data from all of their SaaS applications. With a Fused Database, you can generate complex reports using any analytics tool, a task vastly easier than querying data that remain siloed in their original SaaS applications.

Use Cases

1. Trusted Resource for Customer Contact/Account Data Across Systems

By combining data from all your systems into a single dataset, it’s now possible to analyze all Contact/Account data in one place.

At Bedrock Data, for example, we use HelpScout for support, Salesforce for CRM, and HubSpot for marketing automation. With fused data, we can analyze contact and account data from all three applications in just one database. The beauty here is that, in our BI tool, we can tie our support cases to accounts. This is something neither HelpScout nor our CRM can do on its own. If we want to know the average number of tickets per month and tickets per account, we can. That allows us to focus on more important issues, like which accounts might be more prone to churn based on their relative incident rates.

A single resource for Contact/Account data lets us roll up Activities, too. So when a user submits a form, clicks an element, visits the pricing page, or clicks a new product announcement email, we can see all these Activities by Contact/Account. Our analysis tells a coherent story, not a haphazard one.

2. Consolidated Dashboard of KPIs Across Systems 

Although they are the most basic type of metric, high-level customer KPIs are still notoriously hard for executives to get their hands on in real time.

By unifying data across systems, Fusion frees up C-levels and SVPs to get the answers they need to make the right decisions when they want. With KPIs related to revenue, growth, churn, customer acquisition cost (CAC), customer lifetime value (LTV), daily active users (DAU) and monthly active users (MAU) at their fingertips, executives can save $15,000 per month in wasted analyst time and over $100,000 per month on projects, software, and hardware.
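For reference, two of the KPIs above are often computed with simple formulas like the ones below. These are common simplified definitions, not the only ones in use (LTV in particular has many variants), and the inputs are illustrative numbers only.

```python
def cac(sales_and_marketing_spend: float, new_customers: int) -> float:
    """Customer acquisition cost: acquisition spend per new customer."""
    return sales_and_marketing_spend / new_customers


def ltv(avg_monthly_revenue_per_customer: float, gross_margin: float,
        monthly_churn_rate: float) -> float:
    """Simple lifetime value: margin-adjusted monthly revenue / monthly churn."""
    return avg_monthly_revenue_per_customer * gross_margin / monthly_churn_rate


# Illustrative inputs only; in practice these come from CRM, billing, and
# marketing data that a unified warehouse brings together.
acquisition_cost = cac(sales_and_marketing_spend=50_000, new_customers=25)
lifetime_value = ltv(500, gross_margin=0.8, monthly_churn_rate=0.02)
print(acquisition_cost, lifetime_value, lifetime_value / acquisition_cost)
```

Each input typically lives in a different system (spend in marketing tools, new customers in the CRM, revenue and churn in billing), which is why these KPIs are hard to produce without unified data.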

A consolidated dashboard of KPIs across systems confers the added advantage of understanding and measuring the customer experience in near real-time.

3. Performance, Attribution, Closed Loop Reporting

Despite its immense value, the 360-degree view of the customer, so vital for onboarding and retaining customers, has long eluded marketers and sales teams alike. With all lead data in one Fused Database, you can now measure how effective campaigns have been at driving leads. You can also attribute campaign performance by channel or call to action, revealing how prospects convert into marketing qualified leads, sales qualified leads, opportunities, and closed/won deals at the end of the pipeline.
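Funnel reporting by channel can be sketched by counting, for each channel, how many leads reached each stage. This assumes single-touch attribution (each lead is credited to one channel) and that leads move through stages in order, so reaching a later stage implies passing the earlier ones. The stage names and sample leads are hypothetical.

```python
from collections import Counter, defaultdict

STAGES = ["lead", "mql", "sql", "opportunity", "closed_won"]

# Hypothetical unified lead records: (acquisition channel, furthest stage reached).
leads = [
    ("webinar", "closed_won"), ("webinar", "sql"), ("webinar", "lead"),
    ("paid_search", "mql"), ("paid_search", "lead"), ("paid_search", "opportunity"),
]


def funnel_by_channel(rows):
    """Count leads that reached each stage (or a later one), per channel."""
    reached = defaultdict(Counter)
    for channel, furthest in rows:
        for stage in STAGES[: STAGES.index(furthest) + 1]:
            reached[channel][stage] += 1
    return reached


for channel, counts in sorted(funnel_by_channel(leads).items()):
    rates = [f"{s}:{counts[s] / counts['lead']:.0%}" for s in STAGES]
    print(channel, " ".join(rates))
```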

4. Faster Ad Hoc Reporting

Speed is everything in a fast-growth SaaS model. It allows you to be proactive, not reactive, about metrics related to marketing spend, sales priorities, and staff investment. With fused data, you can report quickly and on demand, something that wasn’t possible with siloed databases.


Next Steps

Bedrock Data invites you to try Fusion to automate your data pipeline.

