Announcing Our $50m Series C Round Led by Goldman Sachs Growth

Scaling the Adoption of Our Enterprise Data Platform


I am very excited to officially announce that Striim has just received $50 million in Series C financing. I’m particularly pleased that the round was led by Goldman Sachs, with participation from our existing investors Summit Partners, Atlantic Bridge Ventures, Dell Ventures, and Bosch Ventures. Goldman has the most sophisticated technology radar in the finance industry, and what they saw in Striim was a next-generation technology that could enable the cloud wave breaking all over the world and disrupting essentially every enterprise in every industry vertical. All the credit goes to our team members, our customers, and our partners for this recognition.

While many companies use the cloud for services like IaaS, backup storage, or web operations, only a small share of enterprises have moved their core workloads to the cloud. We expect many more to do so over the next several years, with Striim as the engine driving and accelerating that digital transformation. Striim enables real-time, data-driven decisions by collecting, moving, and processing data from multiple sources and delivering it to various cloud targets with low latency. Crucially, Striim can do this in an automated, scalable, secure, and reliable way, with very little configuration effort and no coding.

This new infusion of financing will allow Striim to strengthen its go-to-market capabilities to meet the exploding global demand for moving data to the cloud. That means significantly growing our teams and building an expanded presence in EMEA, APAC, ANZ, and South America, as well as collaborating even more closely with our strategic partners, including Google and Microsoft.

This new financing will also help with an important expansion of Striim’s product portfolio. Right now, the Striim platform is provided with out-of-the-box features to build data pipelines, deployed on-premises, through cloud marketplaces, or as a containerized cloud solution. Later this year, however, we will be launching multiple products that provide fully managed cloud services to comprehensively handle cloud data flows with zero administration, in a highly available and fully managed fashion, for companies ranging from small businesses to large enterprises.

In addition to new funds, we are delighted to have a new board member, Bob Kelly, an operating partner at Goldman Sachs. Before joining Goldman, Bob was an executive at Microsoft, where he worked on the development of Azure, Microsoft’s cloud platform. He brings a great deal of experience to this area, and he absolutely understands the potential significance of Striim’s technology in transforming the world’s economy into a fully digital one. Goldman’s participation in moving Striim forward is a huge vote of confidence in our people and what we are doing.

In this economy, the only constant is change. We are in a very dynamic market, serving a spectrum of similarly dynamic industries. These businesses need accurate and timely information to make critical business decisions. One of the key aspects of our real-time data integration is a technology called change data capture (CDC). Providing reliable change data capture that sustains high volumes across multiple databases is not an easy task, and historically it was supported only by database replication technologies. We have coupled CDC with real-time ETL and streaming capabilities, a unique combination that has been instrumental in our customers’ success and has driven our partnerships with Google Cloud and Microsoft Azure to provide migrations from on-premises systems to those clouds.

We believe that Striim’s platform, with our real-time data integration technology and upcoming cloud services, will provide the infrastructure enabling digital transformation of businesses in essentially every sector worldwide. And we are tremendously gratified by the vote of confidence that Goldman Sachs, Summit Partners, Atlantic Bridge Ventures, Dell Ventures, and Bosch Ventures have shown in the work we are doing. Let’s go!

 

A Guide to Building a Multi-Cloud Strategy: Up Your Data Game

Using two or more public cloud providers has become almost the norm over the past few years. Gartner predicts that over 75% of midsized and large companies will deploy a multi-cloud and/or hybrid IT strategy by 2021. A multi-cloud strategy enables companies to use the best possible cloud for specific tasks and to more effectively store, compute, and analyze data.

But from security concerns to data integration needs, using multiple clouds is riddled with challenges. To help companies execute their IT initiatives, we’ll examine different aspects of a multi-cloud strategy and data integration in a multi-cloud environment.

What is a multi-cloud strategy?


A multi-cloud strategy involves using multiple cloud providers to host data, run apps, build infrastructure, and deliver IT services. Multi-cloud typically means using more than one of the big three cloud providers (Amazon Web Services, Microsoft Azure, and Google Cloud), as well as other, smaller providers. Users can deploy both public and private clouds.

The end goal is to have providers play to their strengths. A company may find, for instance, that one cloud platform is more suitable for bare-metal compute, another is stronger for cloud data warehousing, and a third is better equipped to handle machine learning. Using several clouds to handle different workloads has become best practice for many companies.

Cloud vendors are well aware of the rise of multi-cloud, and they adjust their products accordingly. Google, for instance, now offers BigQuery Omni, a multi-cloud version of its popular analytics tool. Users of this software can now connect to their data stored on Google Cloud, AWS, and Azure without moving or copying data sets.

What is the difference between hybrid cloud and multi-cloud?

Multi-cloud sometimes gets confused with hybrid cloud. Multi-cloud means using multiple cloud providers, while hybrid cloud is about combining various cloud and on-premises systems.

Think of multi-cloud as a strategy for gaining efficiency by using public and private clouds from different cloud vendors. Companies opt for this approach to meet specific technical or business requirements.

And think of hybrid cloud as infrastructure that consists of on-premises servers, private clouds, and public clouds. For instance, Hess Corporation, an energy business, is using hybrid cloud. The company has migrated its IT infrastructure to the AWS Cloud but runs parts of its core businesses using on-premises systems.

How a multi-cloud strategy elevates your data game

Adopting a multi-cloud strategy elevates your data game by enabling your IT team to accomplish the following:

1. Increase productivity: Using top resources from various cloud vendors allows you to be more productive. One vendor could efficiently handle your large data transfers, while another could excel in deep learning capabilities.

2. Increase flexibility: Apart from productivity, working with different cloud providers offers more flexibility. IT teams could face unique challenges when deploying certain apps and may need to use AWS to store data and Azure for data processing. Having access to multiple clouds makes this possible.

3. Cut costs: Companies can move their workloads between different clouds and take advantage of dynamic pricing. Having nodes in several clouds thus helps organizations cut cloud costs.

4. Avoid vendor lock-in: A multi-cloud strategy ensures companies aren’t tied to a single cloud provider and its protocols, proprietary systems, and pricing. Companies can avoid costly lock-ins and explore other providers when needed.

5. Recover from disasters: If your primary cloud fails, you can move data, workflows, and systems to a backup cloud. A multi-cloud strategy is a failover solution that ensures that your mission-critical apps are always available.

6. Improve response time: Working with multiple cloud vendors allows companies to store data in data centers closest to their customers. Such proximity to end-users reduces latency and improves the response times of cloud services.

7. Comply with laws: Data privacy and governance regulations, such as the General Data Protection Regulation (GDPR), often require that sensitive data be held in specific jurisdictions. A multi-cloud strategy provides companies with different options on where to store their data.

Challenges of a multi-cloud strategy

Using a multi-cloud strategy to increase the value of your cloud environments requires overcoming the following challenges:

1. Architectural complexity: Migrating to a multi-cloud environment typically requires making changes to data architecture, particularly if an organization has vast on-premises architecture.

2. Extra agility: A multi-cloud strategy may offer more flexibility, but IT teams need to be extra agile in managing nodes across multiple clouds and shifting between them when required.

3. Security concerns: Managing and moving data across various clouds requires organizations to defend a wider attack surface and deal with more security threats.

4. Data governance: Regulations such as the GDPR hold both users and providers of cloud services accountable for privacy breaches. A multi-cloud strategy means organizations are responsible for data governance in multiple clouds.

Key considerations when choosing a multi-cloud integration platform

Integrating data, apps, and other assets in a multi-cloud environment requires the use of integration platforms. When deciding on which platform to use, make sure it can

  • work with on-premises and cloud systems;
  • transform data into a consumable format;
  • manage and monitor data streams in a single console; and
  • deliver data to multiple cloud targets without needing separate integration apps.

Without multi-cloud integration, each cloud will become siloed. Data sharing will be limited, and organizations won’t get value from their data.

The importance of a robust data integration platform thus can’t be overstated. It gives companies the flexibility to adopt new cloud solutions without manually setting up point-to-point integrations. On top of that, change data capture can be used to sync data across clouds in near real-time, as illustrated in the sketch below. Monitoring data in a single console helps data teams manage a complex IT environment more easily, while having a single solution for extracting data simplifies the integration architecture.
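As a rough illustration of that idea, the sketch below shows how CDC-style change events could be fanned out to keep table replicas in two clouds in sync. The event format and apply logic are hypothetical stand-ins for what a real CDC pipeline would provide, not any vendor’s actual API:

```python
# Illustrative only: fan out CDC-style change events to two cloud targets.
# The event format and apply_change() logic are hypothetical.

from typing import Dict, List

# Change events as a CDC reader might emit them (op, table, key, values).
events: List[dict] = [
    {"op": "insert", "key": 1, "values": {"status": "new"}},
    {"op": "update", "key": 1, "values": {"status": "shipped"}},
    {"op": "delete", "key": 1, "values": None},
]

def apply_change(target: Dict[int, dict], event: dict) -> None:
    """Apply one change event to a table replica (a dict keyed by primary key)."""
    if event["op"] in ("insert", "update"):
        target[event["key"]] = event["values"]
    elif event["op"] == "delete":
        target.pop(event["key"], None)

# Stand-ins for the same table replicated in two different clouds.
replica_cloud_a: Dict[int, dict] = {}
replica_cloud_b: Dict[int, dict] = {}

for event in events:
    # Each event is applied to every target, keeping the clouds in sync.
    apply_change(replica_cloud_a, event)
    apply_change(replica_cloud_b, event)

assert replica_cloud_a == replica_cloud_b  # both replicas converge
```

In practice, an integration platform would read these events from database transaction logs and handle ordering, retries, and schema changes for you.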

Extracting more value from data

A multi-cloud strategy is now the default option for many midsized and large companies. Working with multiple cloud vendors enables businesses to use the best possible solution for different workloads. Data teams can be more productive and flexible. As a result, companies get more value from their data and are more likely to achieve a competitive advantage.

ETL vs ELT: Key Differences and Latest Trends

ETL vs ELT Infographic
An overview of ETL vs ELT. Both ETL and ELT enable analysis of operational data with business intelligence tools. In ETL, the data transformation step happens before data is loaded into the target (e.g. a data warehouse). In ELT, data transformation is performed after the data is loaded into the target.

Overview

ETL (extract, transform, load) has been a standard approach to data integration for decades. But the rise of cloud computing and the need for self-service data integration has enabled the development of new approaches such as ELT (extract, load, transform).

In a world of ever-increasing data sources and formats, both ETL and ELT are essential data integration tools. But what are the differences? Is it simply semantics? Or are there significant advantages to taking one approach over the other?

To help you decide which data integration method to use, we’ll explore ETL and ELT, their strengths and weaknesses, and how to get the most out of both technologies. You’ll learn why ETL is a great choice if you need business-logic transformations, granular compliance controls on in-flight data, and low latency (in the case of streaming ETL). And we’ll also highlight how ELT is a better option if you require rapid data loading, minimal maintenance, and highly automated workflows.

We’ll also discuss how you can leverage both ETL and ELT for the best of both worlds. Regardless, you will want to select a modern, scalable solution compatible with cloud platforms.

What is ETL? An Overview of the ETL Process

ETL is a data integration process that helps organizations extract data from various sources and bring it into a single target database. The ETL process involves three steps, sketched in code after the list:

  • Extraction: Data is extracted from source systems (SaaS, online, on-premises, and others) using database queries or change data capture processes. Following the extraction, the data is moved into a staging area.
  • Transformation: Data is then cleaned, processed, and turned into a common format so it can be consumed by a target data warehouse, database, or data lake for analysis by a business intelligence platform (Tableau, Looker, etc.).
  • Loading: Formatted data is loaded into the target system. This process can involve writing to a delimited file, creating schemas in a database, or creating a new object type in an application.
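As a concrete (if toy) illustration of these steps, here is a minimal ETL sketch using only the Python standard library; sqlite3 stands in for a real data warehouse, and the file, table, and column names are hypothetical:

```python
# A minimal ETL sketch using only the Python standard library.
# File and table names are hypothetical; a real pipeline would use an
# ETL platform rather than hand-written scripts.

import csv
import sqlite3

# Extract: read raw rows from a source file (stand-in for a source system).
with open("orders.csv", newline="") as f:
    raw_rows = list(csv.DictReader(f))

# Transform: clean and normalize before loading (the "T" happens in-flight).
transformed = [
    (int(r["order_id"]), r["customer_email"].strip().lower(), float(r["amount"]))
    for r in raw_rows
    if r["amount"]  # drop rows with missing amounts
]

# Load: write the already-transformed rows into the target
# (sqlite3 here as a stand-in for a real data warehouse).
conn = sqlite3.connect("warehouse.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, email TEXT, amount REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", transformed)
conn.commit()
conn.close()
```

The key point is the ordering: the transformation happens in the pipeline itself, so only cleaned, conformed rows ever reach the target.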

Advantages of ETL Processes

ETL integration offers several advantages, including:

  • Preserves resources: ETL can reduce the volume of data stored in the warehouse, helping companies preserve storage, bandwidth, and compute resources in scenarios where storage costs matter, though with commoditized cloud compute this is less of a concern.
  • Improves compliance: ETL can mask and remove sensitive data, such as IP or email addresses, before sending it to the data warehouse. Masking, removing, and encrypting specific information helps companies comply with data privacy and protection regulations such as GDPR, HIPAA, and CCPA.
  • Well-developed tools: ETL has existed for decades, and there is a range of robust platforms that businesses can deploy to extract, transform, and load data. This makes it easier to set up and maintain an ETL pipeline.

Drawbacks of ETL Processes

Companies that use ETL also have to deal with several drawbacks:

  • Legacy ETL is slow: Traditional ETL tools require disk-based staging and transformations, which add latency to the pipeline.
  • Frequent maintenance: ETL data pipelines handle both extraction and transformation, so they must be refactored whenever analysts require different data types or the source systems start producing data with deviating formats and schemas.
  • Higher upfront cost: Defining business logic and transformations can increase the scope of a data integration project.

How to Modernize ETL with Streaming

The venture capital firm Andreessen Horowitz (a16z) published a piece portraying ETL processes as “brittle” while hailing ELT pipelines as more flexible and modern. However, there is innovation being delivered in the ETL space as well. Modern streaming ETL platforms can deliver real-time data integration by leveraging a technology called in-memory stream processing. Data is loaded in real time while transformation logic is compiled and processed in memory (faster than disk-based processing) and scaled across multiple nodes to handle high data volumes at sub-second speeds.

streaming ETL platform
In a streaming ETL platform, transformation logic is processed in-memory, scaling horizontally to handle large volumes of data at sub-second speeds.

Companies are leveraging tools like Apache Kafka and Spark Streaming to implement streaming ETL pipelines. Products like Striim also offer streaming ETL as part of a more holistic, real-time data integration platform.
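To make the pattern concrete, here is a sketch of a streaming transformation between two Kafka topics using the kafka-python library; the broker address, topic names, and event fields are assumptions for illustration, and this is not Striim’s implementation:

```python
# A minimal streaming-ETL sketch using kafka-python (one of many possible
# stacks; broker address and topic names are assumptions for illustration).

import json
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "orders_raw",  # hypothetical source topic fed by a CDC reader
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    event = message.value
    # Transform in-flight, in memory: normalize fields before delivery,
    # instead of staging to disk as legacy batch ETL would.
    event["customer_email"] = event["customer_email"].strip().lower()
    event["amount"] = float(event["amount"])
    producer.send("orders_clean", event)  # hypothetical target topic
```

Because each event is transformed in memory as it arrives, the pipeline avoids the disk-based staging that slows legacy batch ETL.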

As an example, Macy’s built a cloud replication solution that supported streaming ETL with transformations on in-flight data to detect and resolve mismatched timestamps before replicating the data into Google Cloud. This helped them deliver applications that could absorb peak Black Friday workloads using horizontally scalable compute. This is a scenario where a modern streaming ETL platform outperforms legacy ETL, whose latency would be too high, leaving stale data in the target system.

Macy's ETL Database replication Architecture
Macy’s uses Striim’s streaming ETL platform to perform scalable, in-flight transformations that deliver data to Google Cloud targets with sub-second latency (<200 ms during peak Black Friday loads).

What is ELT? An Overview of the ELT Process

ELT is a data integration process that transfers data from a source system into a target system without business logic-driven transformations on the data. The ELT process involves three stages, sketched in code after the list:

  • Extraction: Raw data is extracted from various sources, such as applications, SaaS, or databases.
  • Loading: Data is delivered directly to the target system, typically with schema and data type migration factored into the process.
  • Transformation: The target platform can then transform data for reporting purposes. Some companies rely on tools like dbt for transformations on the target.

An ELT pipeline reorders the steps of the integration process: the data transformation step occurs at the end instead of in the middle.
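As a counterpart to the ETL sketch above, here is a minimal ELT sketch in which raw data is landed first and then transformed with SQL inside the target; sqlite3 again stands in for a cloud warehouse, and all names are hypothetical:

```python
# A minimal ELT sketch: load raw rows first, then transform inside the
# target with SQL (sqlite3 stands in for a cloud data warehouse).

import csv
import sqlite3

# Extract: read raw rows from the source.
with open("orders.csv", newline="") as f:
    raw_rows = [(r["order_id"], r["customer_email"], r["amount"])
                for r in csv.DictReader(f)]

conn = sqlite3.connect("warehouse.db")

# Load: land the data as-is, with no business-logic transformations in-flight.
conn.execute(
    "CREATE TABLE IF NOT EXISTS orders_raw (order_id TEXT, email TEXT, amount TEXT)"
)
conn.executemany("INSERT INTO orders_raw VALUES (?, ?, ?)", raw_rows)

# Transform: run SQL in the target after loading (the step a tool like dbt
# would manage in a real warehouse).
conn.execute("""
    CREATE TABLE IF NOT EXISTS orders_clean AS
    SELECT CAST(order_id AS INTEGER) AS order_id,
           LOWER(TRIM(email))        AS email,
           CAST(amount AS REAL)      AS amount
    FROM orders_raw
    WHERE amount IS NOT NULL AND amount != ''
""")
conn.commit()
conn.close()
```

Note that the raw table persists in the target, so analysts can re-transform it later without re-extracting from the source.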

James Densmore, Director of Data Infrastructure at Hubspot, pointed out another nuance of ELT. While there’s no expression of business logic-driven transformations in ELT, there’s still some implicit normalization and conversion of data to match the target data warehouse. He refers to that concept as EtLT in his book on data pipelines.


What Led to the Recent Popularity of ELT

ELT owes its popularity in part to the fact that cloud storage and analytics resources have become more affordable and powerful. This development had two consequences. First, bespoke ETL pipelines have become ill-suited to handle the ever-growing variety and volume of data created by cloud-based services. Second, companies can now afford to store and process all of their unstructured data in the cloud; they no longer need to reduce or filter data during the transformation stage.

Analysts now have more flexibility in deciding how to work with modern data platforms like Snowflake that are well suited to transform and join data at scale.

Advantages of ELT Processes

ELT offers a number of advantages:

  • Fast extraction and loading: Data is delivered into the target system immediately, with minimal processing in-flight.
  • Lower upfront development costs: ELT tools are typically adept at plugging source data directly into the target system with minimal manual work, since user-defined transformations are not required.
  • More flexibility: Analysts no longer have to determine in advance what insights and data types they need; they can perform transformations on the data as needed in the warehouse with tools like dbt.

For instance, in database-to-data-warehouse replication scenarios, companies such as Inspyrus use Striim for pure ELT-style replication to Snowflake, in concert with dbt transformations that trigger jobs in Snowflake to normalize the data. This enabled Inspyrus to take a workload that used to take days or weeks and turn it into a near-real-time experience.

Inspyrus ELT architecture
Inspyrus uses Striim for near real-time ELT-style replication to Snowflake.
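For a sense of what the hand-off between load and transform can look like in such a setup, here is a sketch that kicks off target-side dbt models once new data has landed; it assumes the dbt CLI is installed and pointed at a configured project, and the model name is hypothetical:

```python
# A sketch of triggering target-side transformations after an ELT load,
# assuming the dbt CLI is installed and a dbt project is configured.

import subprocess

def run_post_load_transforms() -> None:
    """Kick off dbt models in the warehouse once new data has landed."""
    subprocess.run(
        ["dbt", "run", "--select", "orders_normalized"],  # hypothetical model
        check=True,  # raise if the transformation job fails
    )

if __name__ == "__main__":
    run_post_load_transforms()
```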

Challenges of ELT Processes

ELT is not without challenges, including:

  • Overgeneralization: Some modern ELT tools make generalized data management decisions for their users, such as rescanning all tables in the event of a new column or blocking all new transactions in the case of a long-running open transaction. This may work for some users but could result in unacceptable downtime for others.
  • Security gaps: Storing all the data and making it accessible to various users and applications comes with security risks. Companies must take steps to ensure their target systems are secure by properly masking and encrypting data.
  • Compliance risk: Companies must ensure that their handling of raw data won’t run afoul of privacy regulations and compliance rules such as HIPAA, PCI, and GDPR.
  • Increased latency: In cases where business-logic transformations are required in ELT, you must rely on batch jobs in the data warehouse. If latency is a concern, ELT may slow down your operations.

ETL vs ELT Comparison

The differences between ETL and ELT are evident across a number of parameters. We’ve summarized the key differences between the two data integration approaches in the table below.


| Parameter | ETL | ELT |
|---|---|---|
| Order of the process | Data is transformed in the staging area before being loaded into the target system | Data is extracted and loaded directly into the target system; the transformation step(s) are handled in the target |
| Key focus | Loading into databases where compute is a precious resource; transforming, masking, normalizing, and joining data in-flight | Loading into data warehouses; mapping schemas directly into the warehouse; separating load from transform and executing transforms in the warehouse |
| Privacy compliance | Sensitive information can be redacted before loading into the target system | Data is uploaded in its raw form without sensitive details removed; masking must be handled in the target system |
| Maintenance requirements | Transformation logic and schema-change management may require more manual overhead | Maintenance is addressed in the data warehouse where transformations are implemented |
| Latency | Generally higher latency due to transformations; can be minimized with streaming ETL | Lower latency in cases with little to no transformations |
| Data flexibility | Edge cases can be handled with custom rules and logic to maximize uptime | Generalized solutions for edge cases such as schema drift and major resyncs can lead to downtime or increased latency if not carefully planned |
| Analysis flexibility | Use cases and report models have to be defined beforehand | Data can be added at any time with schema evolution; analysts can build new views off the target warehouse |
| Scale of data | Can be bottlenecked by ETL if it is not a scalable, distributed processing system | Implicitly more scalable since less processing takes place in the ELT tool |

Operationalize Your Data Warehouse with “Reverse ETL”

Data warehouses have become the central source of truth, where data from disparate sources is unified to gain business insights. However, data stored in a data warehouse is typically the domain of data analysts who perform queries and create reports. While reports are useful, customer data has even more value if it is immediately actionable by the teams who work with leads and customers (sales, marketing, customer service).

Reverse ETL
The modern data pipeline: ETL/ELT brings data into the data warehouse, and reverse ETL pushes enriched customer data back out into SaaS applications for use by sales, marketing and customer support teams.

Reverse ETL platforms aim to solve this problem by including connectors to many common sales and marketing applications (such as Zendesk, Salesforce, and Hubspot). They enable real-time or periodic synchronization between data warehouses and apps. Use cases of reverse ETL include the following, with a sketch of the first shown after the list:

  • Pushing product usage information (e.g., reaching the a-ha moment during a product trial) into Salesforce and creating a task for a sales rep to reach out. Additionally, product usage data can be pushed into Hubspot to add users to a highly relevant, automated drip campaign.
  • Syncing sales activities with Hubspot or Intercom to create personalized email or chatbot flows.
  • Creating audiences for advertising campaigns based on product usage data, sales activities, and more.
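As a sketch of the first use case above, the example below queries a warehouse for trial users who hit the a-ha moment and creates follow-up tasks in Salesforce via the simple-salesforce library; the credentials, warehouse schema, and record IDs are placeholders:

```python
# A reverse-ETL sketch: push a product-usage signal from the warehouse into
# Salesforce as a task for a sales rep. Credentials, table/column names, and
# IDs are placeholders; sqlite3 stands in for a real warehouse.

import sqlite3
from simple_salesforce import Salesforce

# Query the warehouse for trial users who just hit the "a-ha" moment.
conn = sqlite3.connect("warehouse.db")
rows = conn.execute(
    "SELECT account_id, user_email FROM product_usage WHERE aha_moment = 1"
).fetchall()
conn.close()

sf = Salesforce(
    username="user@example.com",   # placeholder credentials
    password="password",
    security_token="token",
)

for account_id, user_email in rows:
    # Create a follow-up task on the standard Salesforce Task object.
    sf.Task.create({
        "Subject": f"Trial user {user_email} reached the a-ha moment",
        "WhatId": account_id,  # assumes account_id is a Salesforce record ID
    })
```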

Reverse ETL is the latest trend to emerge in the modern data ecosystem; however, it’s conceptually not a new category. Data teams were building applications to operationalize data from OLAP systems long before the new ‘Reverse ETL’ stack became popular.

On the other hand, ‘Reverse ETL’ tools are novel in how they integrate with the wave of applications designed to leverage third-party integrations and a single source of truth for the customer.

For example, in a survey of marketers who switched MarTech SaaS tools, data centralization and integration was one of the leading drivers of changing their MarTech stack.

 

Whether you choose ETL or ELT, once the data is in your data warehouse, reverse ETL allows you to plug analytical data into operational applications such as Salesforce, Marketo, and Hubspot.

ETL or ELT?

Every data team needs to make trade-offs that are specific to their own operations. Yet choosing a platform that supports both modern ETL and ELT constructs allows maximum flexibility in your implementation. You may find that ELT is the right choice to get started with a low-friction, automated solution for data integration. Yet that same topology may require ETL in the future, once you discover in-line transformations that need to be implemented for new use cases and non-data-warehousing targets (message buses, applications).

ETL vs ELT Companies
Striim is a flexible platform that enables real-time ETL and ELT from a wide range of sources including on-prem and cloud databases, IoT, messaging systems, network protocols, files and more.

Using Data to Achieve Business Goals

Whether you’re working on data warehousing, machine learning, cloud migration, or other data projects, choosing a data integration approach is of vital importance. ETL is a legacy approach that has been modernized with real-time data integration capabilities. But the power of the cloud has made ELT an exciting option for many companies.

Choosing an appropriate method also depends on your storage technology, data warehouse architecture, and the use of data in day-to-day operations. Knowing the pros and cons of both of these technologies will help you make an informed decision. And armed with powerful data integration solutions, you can more easily harness the power of data and achieve business goals.
