From the get-go, we knew this conference would cover a broad spectrum of topics. On the docket was everything from sessions about data architecture design to making the shift from relational DBA to NoSQL. Gathered were the acolytes of IoT, bigwigs of Big Data, and the humans behind AI.
We had only two days, so we would have to pick and choose: Spark or blockchain, but not both. Even so, it was impossible not to be impressed by how much great content DBTA packed into such a short span, from intro workshops on how Hadoop transforms large-scale computing to seminars on how real-time fraud engines ingest complex datasets and why data quality is the lifeblood of machine learning.
With that disclaimer out of the way, here’s what we learned on Day 1.
Streaming & Change Data Capture
To support a modern data architecture and approach to analytics, data integration strategies now span on-prem, cloud, and hybrid deployments. Meanwhile, streaming architectures featuring change data capture (CDC) can identify changes in a source system and sync an incremental audit trail of those changes using techniques like date_modified columns, diffs, triggers, and log-based CDC. That audit trail can then serve other purposes, such as updating a data warehouse or analyzing the changes themselves to identify patterns.
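The simplest of those techniques, date_modified-based CDC, can be sketched in a few lines. This is a minimal illustration with a hypothetical in-memory "table"; in practice the filter would be a query against the source database, and the table and column names here are assumptions, not any particular product's schema.

```python
from datetime import datetime, timezone

# Hypothetical source rows; in a real pipeline this would be a query
# against the source system, e.g. WHERE date_modified > :last_sync.
SOURCE_ROWS = [
    {"id": 1, "email": "a@example.com",
     "date_modified": datetime(2018, 5, 1, tzinfo=timezone.utc)},
    {"id": 2, "email": "b@example.com",
     "date_modified": datetime(2018, 5, 20, tzinfo=timezone.utc)},
]

def extract_changes(rows, last_sync):
    """date_modified-style CDC: keep only rows changed since the last sync."""
    return [r for r in rows if r["date_modified"] > last_sync]

# Only row 2 was modified after the last sync point.
last_sync = datetime(2018, 5, 10, tzinfo=timezone.utc)
changes = extract_changes(SOURCE_ROWS, last_sync)
```

Each sync appends the extracted rows to the audit trail and advances `last_sync`; log-based CDC achieves the same incrementality by tailing the database's transaction log instead of polling a timestamp column.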
Streaming and CDC are especially important for person-to-person (P2P) payment systems. Capital One and other big U.S. banks, for instance, must conduct fraud analysis for each payment request. To do so, these systems have grown increasingly reliant on distributed stream-processing frameworks such as Apache Flink, in conjunction with machine learning, to detect, and in many cases even predict, fraud and identity theft.
Also prominent at the 2018 Data Summit was talk about the widespread adoption of data lakes. A Hadoop data lake, for example, is a data management platform comprising one or more Hadoop clusters which process and store non-relational data — log files, internet clickstream records, sensor data, JSON objects, images, social media posts, and so on.
Yet data lakes, for all their low-cost, efficient storage of large data volumes, are no panacea. Data remains in its original format. So while a lake may eliminate the upfront costs of ingestion and transformation, the nearly unanimous consensus is that its unstructured contents become unwieldy for analysis when you pull from multiple sources, want continual real-time updates, or move high volumes of data into it.
ETL on Its Last Legs, Kafka on the Rise
Several recent data trends are driving a dramatic change in Extract-Transform-Load (ETL) architecture. There are now more types of data sources, and stream data is increasingly ubiquitous. In the old world, this forced a difficult choice in data integration: real-time but not scalable, or scalable but batch-only.
Apache Kafka is an open source streaming platform for building data pipelines from "source" to "sink" through the Kafka Connect API and the Kafka Streams API. Logs unify batch and stream processing: a log can be consumed in batched "windows" or in real time, by examining each element as it arrives.
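The batch/stream duality of a log can be shown with a toy model: an append-only sequence of events consumed two ways. This is a plain-Python sketch of the idea, not the Kafka API; in Kafka the per-element loop would poll a topic.

```python
# A "log" is just an append-only sequence of events.
log = [3, 1, 4, 1, 5, 9, 2, 6]

def batch_windows(events, size):
    """Batch consumption: process the log in fixed-size windows."""
    return [events[i:i + size] for i in range(0, len(events), size)]

def stream_sum(events):
    """Stream consumption: handle each element as it 'arrives'."""
    total = 0
    for e in events:  # with Kafka, this loop would poll a consumer
        total += e
    return total

windows = batch_windows(log, 3)  # [[3, 1, 4], [1, 5, 9], [2, 6]]
total = stream_sum(log)          # 31
```

Both consumers read the same log and reach the same answer; only the cadence differs, which is why one storage abstraction can serve batch jobs and real-time applications alike.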
Automated Data Pipelines + On-Demand Data Warehouses = Success
Owing to the messy nature of data lakes, companies collecting vast quantities of customer data should invest in a structured data warehouse if they want a true 360° view and the full value of their data.
This takeaway is particularly relevant to what we do at Bedrock. Popular stats say that small to medium-sized businesses use, on average, 12-15 SaaS applications to boost productivity and open new engagement channels with customers. And according to HubSpot’s founder and CTO, Dharmesh Shah, that number has grown to about 18 cloud applications for SMBs, and anywhere from 70 to 100 for enterprises.
The preponderance of cloud applications has created big problems for the storage and analysis of customer data. When data is spread across multiple systems and departments, organizations big and small fail to achieve a 360-degree view of their customers.
And not to sound like a broken record, but ETL is old hat. ETL projects cost time and gobs of money. Companies that want a leg up on the competition should therefore automate their SaaS data pipelines. They'll save months of labor and hundreds of thousands of dollars, accelerate time to analytics, and surface insights hitherto hidden from view.
Customer 360° Analytics
The subject of automated data pipelines made for a great segue to Customer 360° Analytics. Before a room packed with BI developers, database engineers, and thought leaders in the mar-tech community, Bedrock Data VP of Marketing, Zak Pines, addressed this issue head-on.
“The first step to getting a unified view of customers,” Zak said, “is overcoming data silos.”
Using the NBA as an example, he then outlined the issues analysts encounter while collecting data. He also urged the audience to avoid siloed questions like which marketing channels generate the most leads and customers and to "instead ask: Which of my marketing channels generates the most successful customers? Which of my product use cases generates the most profitable customers?"
If the problem with silos is their differing data schemas, the path to success is standardizing the schema in a central warehouse. "It's all about how to take all interactions, make sense, and roll it up at the account level."
With the rise of artificial intelligence, good clean data is more important than ever. Just as you shouldn't train on an empty stomach, you shouldn't train machine learning algorithms on bad data. Deep neural networks learn faster and more effectively when data is in one place, in one format; semantic, mathematical, and visual patterns are much easier to recognize. And the better job cognitive computing does, the more its impact will be felt across industries, from healthcare and financial services to manufacturing and education. In fact, cognitive computing is estimated to grow to $46 billion in just a few years.
Performing specific, human-like tasks in an intelligent way is still far from easy. It requires complex connections to multiple data sources and types, processing power and storage networks that can cost effectively support the high-speed exploration of huge volumes of data, and the incorporation of various analytics and machine learning techniques to deliver insights that can be acted upon.
There are also many ethical concerns around AI. When applications predict outcomes, they can determine them as well. Bots can create social media crises. But who's responsible for the repercussions of a flawed algorithm? Does cognitive computing create more problems (identity theft, manipulated elections, spam, fake product reviews, paid advertisements disguised as content, hacking, data breaches, etc.) than it solves?
The votes aren't in yet. Algorithms learn and codify the current state of the world, thereby making it harder to change. Information asymmetry has also long been exploited to the advantage of some and the disadvantage of others. And as Big Data and sophisticated analyses increase that asymmetry in favor of those with the ability to acquire and access data, the importance of an ethical framework grows.
A New Wave of Connectors
As data architects ambled about with fresh cups of coffee in hand, they would often stop by our booth asking which connectors we had in our library.
We were very excited to share with them a new wave of Fusion connectors:
- QuickBooks, Xero and NetSuite for finance
- Zuora & Recurly for subscriptions
And as CTOs, dressed in sharp button-ups with lanyards hanging, paused with intrigue at our booth, we explained how expanding our connector library gives Fusion customers the ability to unify customer data across even more sources.
Keynote by Bedrock Data CEO, Taylor Barstow
On the second day, Bedrock Data CEO, Taylor Barstow, told a story about, of all things, his leg…
Months back, he'd discovered a growth on it. And when Doctor Google said he had three months to live, he got pretty scared and went to see his primary care doctor, who'd just updated his medical system.
So, of course, Taylor had to answer a litany of questions: age, family medical history, allergies – all the stuff his PCP already had on file but wanted in the new system.
After checking him out, Taylor's doctor referred him to a dermatologist. And what happened? The dermatologist had to enter the same info into his own system: Age? Weight? Height? Family medical history? Allergies? Major surgeries? Alcohol or drug abuse? Eating habits? Favorite Star Wars character?
“It was a big waste of everybody’s time, mine and my doctors’,” Taylor said.
“And we see this problem everywhere. Get on the phone with your bank’s customer service. They transfer you 3 or 4 times. And before you know it you’ve told your name, address, up-to-date phone number, and so on to 3 or 4 different people. In 2018!”
Shortly after having the growth removed, Taylor reflected on the data chaos from which nearly every modern business suffers.
“Fast forward a couple months,” Taylor went on. “It’s 4 pm on a Friday and I’m feeling pretty good about being the CEO of the company I started. Feeling like Bedrock was doing a great job at helping marketers and other business people with their data problems by keeping their cloud applications in sync.”
“Then I ask my Director of Customer Success, Luke, to run a report that lets us see support tickets by account. This report was great to run because usually there’s more than one contact per account and our help desk system lacks these account-based capabilities.”
Luke groaned. “Can it wait? These reports take 3 to 4 hours.”
“3-4 hours,” Taylor said, not without surprise. “You’re kidding.”
“How often do you do this?”
“About once a week.”
“Why does it take so long?”
“Well I have to pull all of the relevant data into Excel first. Then mash it up to get the charts and graphs and metrics we want.”
“It was a wake up call,” Taylor told the audience. “Despite my frustration with doctors, bankers, with customer service folks on the phone, here I was, CEO of a data company, making the same mistakes for the very problem I’d set out to solve. We, too, were living in data chaos.”
“Sure, our apps were integrated. Data was flowing back and forth. But the data was still locked in the original SaaS systems. There had to be a better way.”
With Fusion, Taylor explained, Luke’s life got a whole lot easier. He can now run his tickets-by-account report – and many others like it – without wasting 4 hours a week (times 16 when you think about all of the other reports).
“Now we can run each of those in seconds. All you do is log into Fusion. Connect whatever cloud apps you use. Log into each system you want to connect to Fusion. And when your Fused Data Warehouse is ready, you get an email telling you all your accounts across all of your systems have been matched and brought together so you can report on them in one place, without having to do tons of work in Excel to get the data into the right shape every time you want to run a report.”
And the best part is that you get a really simple interface to that data. It’s called SQL. Just about every BI tool on the planet uses it. And you can write killer customer reports in whatever tool you want, on top of a 360º customer data set.
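That tickets-by-account report becomes a single SQL query once contacts and tickets are matched to accounts. Here's a minimal sketch using Python's built-in sqlite3; the table and column names are illustrative assumptions, not Fusion's actual schema.

```python
import sqlite3

# Toy warehouse: accounts, contacts linked to accounts,
# and tickets linked to contacts (as in a typical help desk).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, account_id INTEGER);
    CREATE TABLE tickets  (id INTEGER PRIMARY KEY, contact_id INTEGER);
    INSERT INTO accounts VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO contacts VALUES (10, 1), (11, 1), (20, 2);
    INSERT INTO tickets  VALUES (100, 10), (101, 11), (102, 20);
""")

# Tickets by account: roll tickets up through contacts to accounts.
rows = conn.execute("""
    SELECT a.name, COUNT(t.id) AS ticket_count
    FROM accounts a
    JOIN contacts c ON c.account_id = a.id
    JOIN tickets  t ON t.contact_id = c.id
    GROUP BY a.name
    ORDER BY ticket_count DESC
""").fetchall()
# rows == [('Acme', 2), ('Globex', 1)]
```

The two JOINs are exactly the "more than one contact per account" roll-up Luke was doing by hand in Excel; any SQL-speaking BI tool can run the same query against the warehouse directly.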
Of course, getting cloud data into SQL and BI tools isn’t that special in and of itself. There are lots of ETL tools out there.
But Fusion goes beyond ETL. And this is where the concept of Fused Data comes in.
First, Bedrock translates and standardizes your data. So dates and booleans and all the other field types are in a standard format. And this is all pre-baked – you don’t have to tell the system how to do this or spend hours editing config files.
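To make the translation step concrete, here's a minimal sketch of field-type standardization: coercing the varied date and boolean representations different SaaS apps emit into one canonical format. The accepted input formats below are assumptions for illustration, not a description of Bedrock's actual rules.

```python
from datetime import datetime

# Different apps say "true" in different ways; pick one canonical form.
TRUTHY = {"true", "yes", "1", "y", "t"}
FALSY = {"false", "no", "0", "n", "f"}

def normalize_bool(value):
    """Map assorted boolean spellings onto Python True/False."""
    s = str(value).strip().lower()
    if s in TRUTHY:
        return True
    if s in FALSY:
        return False
    raise ValueError(f"unrecognized boolean: {value!r}")

def normalize_date(value):
    """Accept a few common date formats; emit ISO-8601 (YYYY-MM-DD)."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y", "%d %b %Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")
```

For example, `normalize_bool("Yes")` and `normalize_date("05/22/2018")` yield `True` and `"2018-05-22"`, so downstream queries can compare fields from different apps without per-source special cases.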
Then, beyond simple translation, we transform your data, reconciling the different schemas. Every cloud application has its own schema, which makes cross-app data a real pain to analyze: without reconciliation, you still have to do tons of data prep and wrangling to get it ready for BI.
Most people we've talked to think data prep is about 80% of an analytics project. Fusion takes all of that disparate data and passes it through a master data layer, bringing everything into a universal schema.
So at the end of the day, you get a unified relational data set that’s ready for high level analysis without any of the data wrangling pain. This is what we call a “Fused Record” – a contact or a company with all of the differences between the schemas and the data values themselves ironed out and standardized.
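The idea of a fused record can be sketched in a few lines: map each app's fields onto a universal schema, then merge records that refer to the same contact. The app names, field mappings, and email-based matching here are hypothetical simplifications, not how Fusion actually matches records.

```python
# Per-app field mappings onto one universal schema (illustrative only).
CRM_MAP = {"Email": "email", "FirstName": "first_name", "Phone": "phone"}
HELPDESK_MAP = {"contact_email": "email", "fname": "first_name",
                "tickets_open": "open_tickets"}

def to_universal(record, mapping):
    """Rename an app's fields into the universal schema."""
    return {mapping[k]: v for k, v in record.items() if k in mapping}

def fuse(records):
    """Merge per-app records keyed by email; later sources fill gaps."""
    fused = {}
    for rec in records:
        fused.setdefault(rec["email"], {}).update(
            {k: v for k, v in rec.items() if v is not None})
    return fused

crm = {"Email": "jo@acme.com", "FirstName": "Jo", "Phone": "555-0100"}
desk = {"contact_email": "jo@acme.com", "fname": "Jo", "tickets_open": 2}
fused = fuse([to_universal(crm, CRM_MAP), to_universal(desk, HELPDESK_MAP)])
# fused["jo@acme.com"] now carries the CRM phone AND the help desk count
```

The payoff is that one record holds fields from both systems, so a report can ask cross-app questions (say, open tickets by phone-reachable contacts) without an analyst stitching spreadsheets together.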
What this means is connecting data across silos no longer requires manual data matching by an analyst — it’s automatically unified by Fusion. Your Fused Database is an on-demand SQL warehouse that enables Customer 360 analytics, business intelligence, and real-time dashboards by eliminating the delays of data preparation.
The newly released connectors (QuickBooks, Xero and NetSuite for finance, Zuora & Recurly for subscriptions) open new possibilities. Fusion customers using CRM software such as Salesforce and Microsoft Dynamics and marketing automation software such as HubSpot and Marketo can now connect to even more SaaS applications.
So if you want to save yourself and your company countless hours of repetitive, error-prone work, sign up for a free trial at fusion.bedrockdata.com.