Real-Time Data Integration for Snowflake with Striim on Partner Connect

With Striim now on Snowflake Partner Connect, customers can start loading their data in minutes with one-click access to a proven and intuitive cloud-based data integration service – Harsha Kapre, Director of Product Management at Snowflake

Data Integration for Snowflake: Announcing Striim on Partner Connect

At Striim, we value building real-time data integration solutions for cloud data warehouses. Snowflake has become a leading Cloud Data Platform by making it easy to address key challenges in modern data management, such as:

  • Building a 360 view of the customer
  • Combining historical and real-time data
  • Handling large scale IoT device data
  • Aggregating data for machine learning purposes

It only took you minutes to get up and running with Snowflake. So, it should be just as easy to move your data into Snowflake with an intuitive cloud-based data integration service.

To give you an equally seamless data integration experience, we’re happy to announce that Striim is now available as a cloud service directly on Snowflake Partner Connect.

“Striim simplifies and accelerates the movement of real-time enterprise data to Snowflake with an easy and scalable pay-as-you-go model,” Director of Product Management at Snowflake, Harsha Kapre said. “With Striim now on Snowflake Partner Connect, customers can start loading their data in minutes with one-click access to a proven and intuitive cloud-based data integration service.”

John Kutay, Director of Product Growth at Striim, highlights the simplicity of Striim’s cloud service on Partner Connect: “We focused on delivering an experience tailored to Snowflake customers, making it easy to bridge the gap between operational databases and Snowflake via self-service schema migration, initial data sync, and change data capture.”

A Quick Tutorial

We’ll dive into a tutorial on how you can use Striim on Partner Connect to create schemas and move data into Snowflake in minutes. We’ll cover the following in the tutorial:

  • Launch Striim’s cloud service directly from the Snowflake UI
  • Migrate schemas from your source database. We will be using MySQL for this example, but the steps are almost exactly the same for other databases (Oracle, PostgreSQL, SQL Server, and more).
  • Perform an initial load: move millions of rows in minutes, all during your free trial of Striim
  • Kick off a real-time replication pipeline using change data capture
  • Monitor your data integration pipelines with real-time dashboards and rule-based alerts

But first, a little background!

What is Striim?

At a high level, Striim is a next-generation cloud data integration product that offers change data capture (CDC), enabling real-time data integration from popular databases such as Oracle, SQL Server, PostgreSQL, and many others.

In addition to CDC connectors, Striim has hundreds of automated adapters for file-based data (logs, XML, CSV), IoT data (OPCUA, MQTT), and applications such as Salesforce and SAP. Our SQL-based stream processing engine makes it easy to enrich and normalize data before it’s written to Snowflake.

Our focus on usability and scalability has driven adoption by customers such as Attentia, a Belgium-based HR and well-being company, and Inspyrus, a Silicon Valley-based invoice processing company, both of which chose Striim for data integration to Snowflake.

What is Change Data Capture?

While many products focus on batch data integration, Striim specializes in helping you build continuous, real-time database replication pipelines using change data capture (CDC). This keeps the target system in sync with the source database to address real-time requirements.

Before we dive into an example pipeline, we’ll briefly go over the concept of Change Data Capture (CDC). CDC is the process of tailing the database’s change logs, turning database events such as inserts, updates, deletes, and relevant DDL statements into a stream of immutable events, and applying those changes to a target database or data warehouse.
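As a rough illustration of that definition (plain Python, not Striim’s actual API; all names here are hypothetical), a CDC stream can be modeled as immutable events replayed, in order, against a target:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ChangeEvent:
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row image; None for deletes

def apply_events(target, events):
    """Replay change-log events, in order, against a target table (a dict)."""
    for ev in events:
        if ev.op in ("insert", "update"):
            target[ev.key] = ev.row
        else:                    # delete
            target.pop(ev.key, None)
    return target

events = [
    ChangeEvent("insert", 1, {"name": "Ada"}),
    ChangeEvent("update", 1, {"name": "Ada Lovelace"}),
    ChangeEvent("insert", 2, {"name": "Alan"}),
    ChangeEvent("delete", 2),
]
synced = apply_events({}, events)   # target converges to the source's state
```

Because the events are immutable and ordered, replaying the same stream always reproduces the same target state, which is what keeps the target in sync with the source.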

Change data capture is also a useful software abstraction for other software applications such as version control and event sourcing.

Imagine Each Event as a Change to an Entry in a Database

Striim brings decades of experience delivering change data capture products that work in mission-critical environments. The founding team at Striim was the executive (and technical) team at GoldenGate Software (now Oracle GoldenGate). Now Striim is offering CDC as an easy-to-use, cloud-based product for data integration.

Migrating data to Snowflake in minutes with Striim’s cloud service

Let’s dive into how you can start moving data into Snowflake in minutes using our platform. In a few simple steps, this example shows how you can move transactional data from MySQL to Snowflake. Let’s get started:

1. Launch Striim in Snowflake Partner Connect

In your Snowflake UI, navigate to “Partner Connect” by clicking the link in the top right corner of the navigation bar. There you can find and launch Striim.

2. Sign Up For a Striim Free Trial

Striim’s free trial gives you seven calendar days of the full product offering to get started. But we’ll get you up and running with schema migration and database replication in a matter of minutes.

3. Create your first Striim Service

A Striim Service is an encapsulated SaaS application that provides the software and fully managed compute resources you need to accomplish a specific workload; in this case, we’re creating a service to help you move data to Snowflake! We’re also available to assist you via chat in the bottom right corner of your screen.

 

4. Start moving data with Striim’s step-by-step wizards

In this case, the MySQL to Snowflake wizard is selected. As you can see, Striim supports data integration for a wide range of database sources – all available in the free trial.

 

 

5. Select your schemas and tables from your source database

 

6. Start migrating your schemas and data

After selecting your tables, simply click ‘Next’ and your data migration pipeline will begin!

 


7. Monitor your data pipelines in the Flow Designer

As your data starts moving, you’ll have a full view into the amount of data being ingested and written into Snowflake including the distribution of inserts, updates, deletes, primary key changes and more.


For a deeper drill down, our application monitor gives even more insights into low-level compute metrics that impact your integration latency.

Real-Time Database Replication to Snowflake with Change Data Capture

Striim makes it easy to keep your schema migration and CDC applications in sync.

While building these pipelines in Striim is just as easy, most databases have some CDC prerequisites that must be configured outside of Striim.

To use MySQLReader, the adapter that performs CDC, an administrator must create a user for the adapter and grant it the necessary privileges:

CREATE USER 'striim' IDENTIFIED BY '******';
GRANT REPLICATION SLAVE ON *.* TO 'striim';
GRANT REPLICATION CLIENT ON *.* TO 'striim';
GRANT SELECT ON *.* TO 'striim';

The MySQL 8 caching_sha2_password authentication plugin is not supported in this release; the mysql_native_password plugin is required. The minimum supported MySQL version is 5.5.
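For example, on MySQL 8 you can create the user with the required plugin explicitly (a sketch using standard MySQL syntax; replace the placeholder password as above):

```sql
-- Create the CDC user with the mysql_native_password plugin (MySQL 8)
CREATE USER 'striim' IDENTIFIED WITH mysql_native_password BY '******';
```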

The REPLICATION privileges must be granted on *.*. This is a limitation of MySQL.

You may use any other valid name in place of striim. Note that by default MySQL does not allow remote logins by root.

Replace ****** with a secure password.

You may narrow the SELECT statement to allow access only to those tables needed by your application. In that case, if other tables are specified in the MySQLReader properties, Striim will return an error that they do not exist.
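For instance, a narrowed grant might look like the following (the database and table names here are hypothetical):

```sql
-- Grant SELECT only on the tables your application actually reads
GRANT SELECT ON mydb.orders TO 'striim';
GRANT SELECT ON mydb.customers TO 'striim';
```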

MYSQL BINARY LOG SETUP
MySQLReader reads from the MySQL binary log. If your MySQL server is using replication, the binary log is already enabled; otherwise, it may be disabled.

For MySQL, the property name for enabling the binary log, its default setting, and how and where you change that setting all vary depending on the operating system and your MySQL configuration, so consult the documentation for your MySQL version for instructions.
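As a sketch of what an enabling configuration might look like (option names, values, and file location vary by MySQL version and OS, so treat this as illustrative only):

```ini
# Illustrative my.cnf settings for CDC; verify against your MySQL docs
[mysqld]
binlog-format = ROW          # row-level logging is required
log-bin       = mysql-bin    # enable the binary log
server-id     = 1            # any positive number
```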

If the binary log is not enabled, Striim’s attempts to read it will fail with errors such as the following:

2016-04-25 19:05:40,377 @ -WARN hz._hzInstance_1_striim351_0423.cached.thread-2
com.webaction.runtime.Server.startSources (Server.java:2477) Failure in Starting
Sources.
java.lang.Exception: Problem with the configuration of MySQL
Row logging must be specified.
Binary logging is not enabled.
The server ID must be specified.
Add --binlog-format=ROW to the mysqld command line or add binlog-format=ROW to your
my.cnf file
Add --bin-log to the mysqld command line or add bin-log to your my.cnf file
Add --server-id=n where n is a positive number to the mysqld command line or add
server-id=n to your my.cnf file
at com.webaction.proc.MySQLReader_1_0.checkMySQLConfig(MySQLReader_1_0.java:605) ...

Once those prerequisites are completed, you can run the MySQL CDC wizard and start replicating data from your database right where your schema migration and initial load left off.

Maximum Uptime with Guaranteed Delivery, Monitoring and Alerts

Striim gives your team full visibility into your data pipelines with the following monitoring capabilities:

  • Rule-based, real-time alerts where you can define your custom alert criteria
  • Real-time monitoring tailored to your metrics
  • Exactly-once processing (E1P) guarantees

Striim uses a built-in stream processing engine that allows high volume data ingest and processing for Snowflake ETL purposes.

Conclusion

To summarize, Striim on Snowflake Partner Connect provides an easy-to-use cloud data integration service for Snowflake. The service comes with a 7-day free trial, giving you ample time to begin your journey to bridge your operational data with your Snowflake Data Warehouse.

Visit our Striim for Snowflake webpage for a deeper dive into solutions such as:

  • How to migrate data from Oracle to Snowflake using Change Data Capture
  • How to migrate data from SQL Server to Snowflake using Change Data Capture
  • Teradata to Snowflake migrations
  • AWS Redshift to Snowflake migrations
  • Moving IoT data to Snowflake (OPCUA, MQTT)
  • Moving data from AWS S3 to Snowflake

As always, feel free to reach out to our integration experts to schedule a demo.

Online Enterprise Database Migration to Google Cloud

Migrate to cloud

Migrating existing workloads to the cloud is a formidable step in an enterprise’s digital transformation journey. Moving an enterprise application from on-premises to the cloud, or modernizing it to make the best use of cloud-native technologies, is only part of the challenge. A major part of the task is moving the existing enterprise databases while the business continues to operate at full speed.

Pause never

How the data is extracted and loaded into the new cloud environment plays a big role in keeping the business critical systems performant. Particularly for enterprise databases supporting mission-critical applications, avoiding downtime is a must-have requirement during migrations to minimize both the risk and operational disruption.

For business-critical applications, the acceptable downtime approaches zero. All the while, moving large amounts of data and thoroughly testing the business-critical applications can take days, weeks, or even months.

Keep running your business

The best practice in enterprise database migration, to minimize or even eliminate downtime, is to use an online database migration that keeps the application running.

In an online migration, changes from the enterprise source database are captured non-intrusively as real-time data streams using Change Data Capture (CDC) technology. This capability is available for most major databases, including Oracle, Microsoft SQL Server, HPE NonStop, MySQL, PostgreSQL, MongoDB, and Amazon RDS, but it has to be harnessed in the correct way.

In an online database migration, you first perform an initial load of the source database to the cloud. Then, any changes that have happened in the source database since the initial load began are applied to the target cloud database continuously from the real-time data stream. The source and target databases remain in sync until you are ready to completely cut over. You also have the option to fall back to the source at any point, further minimizing risk.
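The two-phase process described above can be sketched in a few lines (illustrative Python, not Striim’s implementation; all names are hypothetical):

```python
def online_migrate(snapshot, change_feed, target):
    """Two-phase online migration: bulk load, then continuous change apply."""
    # Phase 1: initial load of the source snapshot into the cloud target.
    for key, row in snapshot.items():
        target[key] = row
    # Phase 2: continuously apply changes captured (via CDC) during and
    # after the load, keeping the target in sync until cutover.
    for op, key, row in change_feed:
        if op == "delete":
            target.pop(key, None)
        else:                    # insert or update
            target[key] = row
    return target

migrated = online_migrate(
    {1: "alpha", 2: "beta"},    # snapshot taken at load time
    [("update", 1, "alpha-v2"), ("insert", 3, "gamma"), ("delete", 2, None)],
    {},
)
```

Because phase 2 keeps running, cutover can wait until testing is complete, and the source remains available for fallback the whole time.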

Integrate continuously

Online database migration also provides essential data integration services for new application development in the cloud. The change delivery can be kept running while you develop and test the new cloud applications. You may even choose to keep the target and source databases in sync indefinitely, typically for continuous database replication in hybrid or multi-cloud use cases.

Keep fresh

Once the real-time streaming data pipelines to the cloud are set up, businesses can easily build new applications, and seamlessly adopt new cloud services to get the most operational value from the cloud environment. Real-time streaming is a crucial element in all such data movement use cases, and it can be widely applied to hybrid or multi-cloud architectures, operational machine learning, analytics offloading, large scale cloud analytics, or any other scenario where having up-to-the-second data is essential to the business.

Change Data Capture

Striim, in strategic partnership with Google Cloud, offers online database migrations and real-time hybrid cloud data integration to Google Cloud through non-intrusive Change Data Capture (CDC) technologies. Striim enables real-time continuous data integration from on-premises and other cloud data sources to BigQuery, Cloud Spanner, Cloud SQL for PostgreSQL, MySQL, and SQL Server, Cloud Pub/Sub, and Cloud Storage, as well as other databases running in Google Cloud.

Replicate to Google Cloud

In addition to data migration, data replication is an important use case as well. In contrast to data migration, data replication continuously replicates data from a source system to a target system “forever” without the intent to shut down the source system.

An example target system in the context of data replication is BigQuery. It is the data analytics platform of choice in Google Cloud. Striim supports continuous data streaming (replication) from an on-premises database to BigQuery in Google Cloud in case the data has to remain on-premises and cannot be migrated. Striim bridges the two worlds and makes Google Cloud data analytics accessible by supporting the hybrid environment.

Transform in flight

Data migration and continuous streaming in many cases transport the data unmodified from the source to the target system. However, many use cases require data to be transformed to match the target system, or enriched and combined with data from different sources, in order to complement and complete the target data set for increased value and expressiveness in a simple and robust architecture. This method is frequently referred to as Extract Transform Load, or ETL.

Striim provides a very flexible and powerful in-flight transformation and augmentation functionality in order to support use cases that go beyond simple one-time data migration.

More to migrate? Keep replicating!

Enterprises generally have several data migration and online streaming use cases at the same time. Often data migration takes place for some source databases while data replication is ongoing for others.

A single Striim installation can support several use cases at the same time, reducing the need for management and operational supervision. The Striim platform supports high-volume, high-velocity data with built-in validation, security, high availability, reliability, and scalability, as well as backup-driven disaster recovery, addressing enterprise requirements and operational excellence.

The following architecture shows an example where migration and online streaming are implemented at the same time. On the left, the database in the cloud is migrated to the Cloud SQL database on the right. After a successful migration, the source database is removed. In addition, the two source databases on the left, in an on-premises data center, are continuously streamed (replicated) to BigQuery for analytics and Cloud Spanner for in-cloud processing.

Keep going

In addition, Striim as the data migration technology is implemented in a high-availability configuration. The three servers on Compute Engine form a cluster, and each of the servers is executing in a different zone, making the cluster highly available and protecting the migration and online streaming from zone failures or zone outages.

Accelerate Cloud adoption

As organizations modernize their data infrastructure, integrating mission-critical databases is essential to ensure information is accessible, valuable, and actionable. Striim and Google Cloud’s partnership supports Google customers with smooth data movement and continuous integration solutions, accelerating Google Cloud adoption and driving business growth.

Learn more

To learn more about enterprise cloud data integration, feel free to reach out to Striim and check out these references:

Google Cloud Solution Architecture: Architecting database migration and replication using Striim

Blog: Zero downtime database migration and replication to and from Cloud Spanner

Tutorial: Migrating from MySQL to BigQuery for Real-Time Data Analytics

 

Striim 3.10.1 Further Speeds Cloud Adoption

 

 

We are pleased to announce the general availability of Striim 3.10.1 that includes support for new and enhanced Cloud targets, extends manageability and diagnostics capabilities, and introduces new ease of use features to speed our customers’ cloud adoption. Key Features released in Striim 3.10.1 are directly available through Snowflake Partner Connect to enable rapid movement of enterprise data into Snowflake.

Striim 3.10.1 Focus Areas Including Cloud Adoption

This new release introduces many new features and capabilities, summarized here:

3.10.1 Features Summary

 

Let’s review the key themes and features of this new release, starting with the new and expanded cloud targets.

Striim on Snowflake Partner Connect

From Snowflake Partner Connect, customers can launch a trial Striim Cloud instance directly as part of the Snowflake on-boarding process from the Snowflake UI and load data, optionally with change data capture, directly into Snowflake from any of our supported sources. You can read about this in a separate blog.

Expanded Support for Cloud Targets to Further Enhance Cloud Adoption

The Striim platform has been chosen as a standard for our customers’ cloud adoption use-cases partly because of the wide range of cloud targets it supports. Striim provides integration with databases, data warehouses, storage, messaging systems and other technologies across all three major cloud environments.

A major enhancement is the introduction of support for the Google BigQuery Streaming API. This not only enables real-time analytics on large-scale data in BigQuery by ensuring that data is available within seconds of its creation, but also helps with quota issues that can be faced by high-volume customers. Integration through the BigQuery Streaming API can support data transfer of up to 1 GB per second.

In addition to this, Striim 3.10.1 also has the following enhancements:

  • Optimized delivery to Snowflake and Azure Synapse that compacts multiple operations on the same data into a single operation on the target, resulting in much lower change volume
  • Delivery to MongoDB cloud and MongoDB API for Azure Cosmos DB
  • Delivery to Apache Cassandra, DataStax Cassandra, and Cassandra API for Azure Cosmos DB

  • Support for delivery of data in Parquet format to Cloud Storage and Cloud Data Lakes to further support cloud analytics environments
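To illustrate the optimized-delivery idea from the first bullet above (a simplified sketch in Python, not Striim’s actual batching logic), multiple operations on the same row can be compacted into one net operation before delivery:

```python
def compact(ops):
    """Collapse an ordered list of (op, key, row) into one net op per key."""
    net = {}                                  # key -> (op, row)
    for op, key, row in ops:
        prev_op = net.get(key, (None,))[0]
        if op == "delete" and prev_op == "insert":
            net.pop(key, None)                # created and deleted in-batch
        elif prev_op == "insert":
            net[key] = ("insert", row)        # updates fold into the insert
        else:
            net[key] = (op, row)
    return [(op, key, row) for key, (op, row) in net.items()]

batch = [
    ("insert", 1, "a"),
    ("update", 1, "a2"),     # folds into the insert of key 1
    ("insert", 2, "b"),
    ("delete", 2, None),     # cancels the insert of key 2
]
compacted = compact(batch)   # four source operations become one target write
```

Fewer, larger writes like this are what reduce change volume on warehouse targets such as Snowflake and Azure Synapse.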

Schema Conversion to Simplify Cloud Adoption Workflows

As part of many cloud migration or cloud integration use-cases, especially during the initial phases, developers often need to create target schemas to match those of source data. Striim adds the capability to use source schema information from popular databases such as Oracle, SQL Server, and PostgreSQL to create appropriate target schemas in cloud targets such as Google BigQuery, Snowflake, and others. Importantly, these conversions understand data type and structure differences between heterogeneous sources and targets and act intelligently to spot problems and inconsistencies before progressing to data movement, simplifying cloud adoption.
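A hand-rolled sketch of the kind of type mapping such a conversion performs (the mappings and helper below are illustrative assumptions, not Striim’s actual conversion rules):

```python
# Illustrative MySQL -> Snowflake type mapping; a real converter handles
# many more types, lengths, precisions, and constraint differences.
MYSQL_TO_SNOWFLAKE = {
    "INT": "NUMBER(10,0)",
    "BIGINT": "NUMBER(19,0)",
    "VARCHAR": "VARCHAR",
    "DATETIME": "TIMESTAMP_NTZ",
    "TEXT": "VARCHAR",
}

def convert_column(name, mysql_type, length=None):
    """Map one source column definition to a target column definition."""
    target = MYSQL_TO_SNOWFLAKE.get(mysql_type.upper())
    if target is None:
        raise ValueError(f"unmapped source type: {mysql_type}")
    if target == "VARCHAR" and length:
        target = f"VARCHAR({length})"
    return f"{name} {target}"

cols = [convert_column("id", "int"), convert_column("name", "varchar", 255)]
```

Raising on an unmapped type, rather than guessing, is one way a converter can surface inconsistencies before data movement begins.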

Enhanced Monitoring, Alerting and Diagnostics

On-going data movement between on-premises and cloud environments, whether for migrations or for powering reporting and analytics solutions, is often part of an enterprise’s critical applications. As such, it demands deep insight into the status of all active data flows.

Striim 3.10.1 adds the capability to inherently monitor data from its creation in the source to successful delivery in a target, generate detailed lag reports, and alert on situations where lag is outside of SLAs.

End to End Lag Visualization

In addition, this release provides detailed status on checkpointing information for recovery and high availability scenarios, with insight into checkpointing history and currency.

Real-time Checkpointing Information

Simplifying Work with Complex Data

As customers work with heterogeneous environments and adopt more complex integration scenarios, they often have to work with complex data types or perform necessary data conversions. While this was always possible through user-defined functions, this release adds multiple commonly requested data manipulation functions out of the box. This simplifies working with JSON data and document structures, while also facilitating data cleansing and regular expression operations.
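As a plain-Python illustration of the kinds of operations these built-in functions cover (this is not Striim syntax; the document and field names are made up):

```python
import json
import re

# A nested JSON document such as one arriving from an application source
raw = '{"user": {"id": 7, "email": " ADA@Example.com "}}'
doc = json.loads(raw)

flat = {
    "user_id": doc["user"]["id"],                   # flatten the document
    "email": doc["user"]["email"].strip().lower(),  # cleanse the value
}
# Regular-expression operation: extract the e-mail domain
domain = re.search(r"@(.+)$", flat["email"]).group(1)
```

Having such operations available out of the box means pipelines can normalize documents in flight without custom user-defined functions.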

On-Going Support for Enterprise Sources

As customers upgrade their environments or adopt new technologies, it is essential that their integration platform keeps pace. In Striim 3.10.1 we extend our support for the Oracle database to include Oracle 19c (including change data capture), add support for schema information and metadata for Oracle GoldenGate trails, and certify our support for Hive 3.1.0.

This is a high-level view of the new features in Striim 3.10.1. There is a lot more to discover to aid your cloud adoption journey. If you would like to learn more about the new release, please reach out to schedule a demo with a Striim expert.

Getting Started with Real-Time ETL to Azure SQL Database

Running production databases in the cloud has become the new norm. For us at Striim, real-time ETL to Azure SQL Database and other popular cloud databases has become a common use case. Striim customers run critical operational workloads in cloud databases and rely on our enterprise-grade streaming data pipelines to keep their cloud databases up-to-date with existing on-premises or cloud data sources.

Striim supports your cloud journey starting with the first step. In addition to powering fully-connected hybrid and multi-cloud architectures, the streaming data integration platform enables cloud adoption by minimizing risks and downtime during data migration. When you can migrate your data to the cloud without database downtime or data loss, it is easier to modernize your mission-critical systems. And when you liberate your data trapped in legacy databases and stream to Azure SQL DB in sub-seconds, you can run high-value, operational workloads in the cloud and drive business transformation faster.

Streaming Integration from Oracle to Azure SQL DB

Building continuous, streaming data pipelines from on-premises databases to production cloud databases for critical workloads requires a secure, scalable, and reliable integration solution. Especially if you have enterprise database sources that cannot tolerate performance degradation, traditional batch ETL will not suffice. Striim’s low-impact change data capture (CDC) feature minimizes overhead on the source systems while moving database operations (inserts, updates, and deletes) to Azure SQL DB in real time with security, reliability, and transactional integrity.

Striim is available as a PaaS offering in major cloud marketplaces such as Microsoft Azure Cloud, AWS, and Google Cloud. You can run Striim in the Azure Cloud to simplify real-time ETL to Azure SQL Database and other Azure targets, such as Azure Synapse Analytics, Azure Cosmos DB, Event Hubs, ADLS, and more. The service includes heterogeneous data ingestion, enrichment, and transformation in a single solution before delivering the data to Azure services with sub-second latency. What users love about Striim is that it offers a non-intrusive, quick-to-deploy, and easy-to-iterate solution for streaming data integration into Azure.

To illustrate the ease of use of Striim and to help you get started with your cloud database integration project, we have prepared a Tech Guide: Getting Started with Real-Time Data Integration to Microsoft Azure SQL Database. You will find step-by-step instructions on how to move data from an on-premises Oracle Database to Azure SQL Database using Striim’s PaaS offering available in the Azure Marketplace. In this tutorial you will see how Striim’s log-based CDC enables a solution that doesn’t impact your source Oracle Database’s performance.

If you have, or plan to have, Azure SQL Databases that run operational workloads, I highly recommend that you use a free trial of Striim along with this tutorial to find out how fast you can set up enterprise-grade, real-time ETL to Azure SQL Database. On our website you can find additional tutorials for different cloud databases. So be sure to check out our other resources as well. For any streaming integration questions, please feel free to reach out.

Mitigating Data Migration and Integration Risks for Hybrid Cloud Architecture

 

Cloud computing has transformed how businesses use technology and drive innovation for improved outcomes. However, the journey to the cloud, which includes data migration from legacy systems, and integration of cloud solutions with existing systems, is not a trivial task. There are multiple cloud adoption risks that businesses need to mitigate to achieve the cloud’s full potential.

 

Common Risks in Data Migration and Integration to Cloud Environments

In addition to data security and privacy, there are additional concerns and risks in cloud migration and integration. These include:

Downtime: The bulk data loading technique, which takes a snapshot of the source database, requires you to lock the legacy database to preserve the consistent state. This translates to downtime and business disruption for your end users. While this disruption can be acceptable for some of your business systems, the mission-critical ones that need modernization are typically the ones that cannot tolerate even planned downtime. And sometimes, planned downtime extends beyond the expected duration, turning into unplanned downtime with detrimental effects on your business.

Data loss: Some data migration tools might lose or corrupt data in transit because of a process failure or network outage. Or they may fail to apply the data to the target system in the right transactional order. As a result, your cloud database ends up diverging from the legacy system, also negatively impacting your business operations.

Inadequate Testing: Many migration projects operate under tense time pressures to minimize downtime, which can lead to a rushed testing phase. When the new environment is not tested thoroughly, the end result can be an unstable cloud environment. Certainly, not the desired outcome when your goal is to take your business systems to the next level.

Stale Data: Many migration solutions focus on the “lift and shift” of existing systems to the cloud. While it is a critical part of cloud adoption, your journey does not end there. Having a reliable and secure data integration solution that keeps your cloud systems up-to-date with existing data sources is critical to maintaining your hybrid cloud or multi-cloud architecture. Working with outdated technologies can lead to stale data in the cloud and create delays, errors, and other inefficiencies for your operational workloads.

 

Upcoming Webinar on the Role of Streaming Data Integration for Data Migration and Integration to Cloud

Streaming data integration is a new approach to data integration that addresses the multifaceted challenges of cloud adoption. By combining bulk loading with real-time change data capture technologies, it minimizes the downtime and risks mentioned above and enables reliable, continuous data flow after the migration.


In our next live, interactive webinar, Cloud Adoption: How Streaming Data Integration Minimizes Risks, we dive into this particular topic. Our Co-Founder and CTO, Steve Wilkes, will present practical ways you can mitigate data migration risks and handle integration challenges for cloud environments. Striim’s Solution Architect, Edward Bell, will walk you through a live demo of zero-downtime data migration and continuous streaming integration to major cloud platforms such as AWS, Azure, and Google Cloud.

I hope you can join this live, practical presentation on Thursday, May 7th, at 10:00 AM PT / 1:00 PM ET to learn more about how to:

  • Reduce migration downtime and data loss risks, as well as allow unlimited testing time of the new cloud environment.
  • Set up streaming data pipelines in just minutes to reliably support operational workloads in the cloud.
  • Handle strict security, reliability, and scalability requirements of your mission-critical systems with an enterprise-grade streaming data integration platform.

Until we see you at the webinar, and afterward, please feel free to reach out to get a customized Striim demo for data migration and integration to cloud to support your specific IT environment.

 

Streaming Data: The Nexus of Cloud-Modernized Analytics

 

 

On April 9th I am going to be having a conversation with Andrew Brust of GigaOm about the role of streaming integration in digital transformation initiatives, especially cloud modernization and real-time analytics. The format of this webinar is light on PowerPoint and rich in lively discussion and interaction — so we hope you can join us.

Streaming Data: The Nexus of Cloud-Modernized Analytics

APR 9, 2020- 10:00 AM PDT/ 1:00 PM EDT

Digital transformation is the integration of digital technology into all areas of a business, resulting in fundamental changes to how the business operates and how it delivers value to customers. The cloud has been the number one driving technology in the majority of such transformations. You may have a cloud-first strategy, with all new applications being built in the cloud, or you may need to migrate online databases without taking downtime. You may want to take advantage of cloud scale for infinite data storage, coupled with machine learning to gain new insights and make proactive decisions.

In all cases, the key component is data. The data for your new applications, cloud analytics, or data migration could originate on-premises, in another cloud, or be generated by millions of IoT devices. It is essential that this data can be collected, processed, and delivered rapidly, reliably, and at scale. This is why streaming data is the key component of data modernization, and why streaming integration platforms are vital to the success of digital transformation initiatives.

In a modern data architecture, the goal is to harvest your existing data sources and enable your analysts and data scientists to provide value in the form of applications, visualizations, and alerts to your decision makers, customers, and partners.

In this webinar we will discuss the key aspects of this architecture, including the role of change data capture (CDC) and IoT technologies in data collection, options for data processing, and the differing requirements for data delivery. You will also learn how streaming integration platforms can be utilized for cloud modernization, large-scale streaming analytics, and machine learning operationalization, in a reliable and scalable way.

I hope you can join us on April 9th, and see why streaming integration is the engine of data modernization for digital transformation.

 

Striim 3.9.8 Adds Advanced Security Features for Cloud Adoption

 

 

We are pleased to announce the general availability of Striim 3.9.8 with a rich set of features that span multiple areas, including advanced data security, enhanced development productivity, data accountability, performance and scalability, and extensibility with new data targets.

The new release brings many new features, which are summarized below.

Let’s review the key themes and features of the new release starting with the security topic.

Advanced Platform and Adapter Security:

With a sharp focus on business-critical systems and use cases, the Striim team has been strengthening the platform's security features for the last several years. Version 3.9.8 introduces a broad range of advanced security features to both the platform and its adapters, providing users with robust end-to-end security and greater control over managing data security.

The new platform security features include the following components:

  • Striim KeyStore, a secure, centralized repository based on the Java KeyStore for storing passwords and encryption keys, streamlines security management across the platform.
  • Ultra-secure algorithms for user password encryption across all parts of the platform, reducing the platform's vulnerability to external or internal breaches.
  • Stronger encryption for inter-node cluster communication, with internally generated long passwords and unified security management for all nodes and agents.
  • Multi-layered application security via advanced support for exporting and importing pipeline applications within the platform. In Striim, all password properties of an application are encrypted with their own keys. When exporting applications containing passwords or other encrypted property values, you can now add a second level of encryption with a passphrase that is required at import time, further strengthening application security.
  • Encryption support using customer-provided keys for securing permanent files via the File Writer, and intermediate temporary files via the Google Cloud Storage Writer. Supported encryption algorithms include RSA, AES, and PGP. You can generate encryption keys with various available tools or an in-house Java program, and easily configure the adapters' encryption settings via the Encryption policy property in the UI.

Overall, these new security features enable:

  • Enhanced platform and adapter security for hybrid cloud deployments and mission-critical environments
  • Strengthened end-to-end data protection from ingestion to file delivery
  • Enhanced compliance with strict security policies and regulations
  • Secured application sharing between platform users

Improved Data Accountability:

Striim version 3.9.8 includes an application-specific exception store that captures events discarded by the application. The feature allows viewing discarded records and their details in real time, and can be configured with a simple on/off option when building an application. With this feature, Striim improves accountability for all data passing through the platform and allows users to build applications that replay and process discarded records.

Enhanced Application Development Support and Ease of Use

The new release also includes features that accelerate and simplify the development of integration applications, especially in high-volume data environments.

  • A New Enrichment Transformer: Expanding the existing library of out-of-the-box transformers, the new enrichment transformer function allows you to enrich your streaming data in-flight without any manual coding step. You only need Striim’s drag and drop UI to create a real-time data pipeline that performs in-memory data lookups. With this transformer, you can, for example, add City Name and County Name fields to an event containing Zip Code.

  • External Lookups: Striim provides an in-memory data cache to enrich data in-flight at very high speeds. With the new release, Striim gives you the option to enrich data with lookups from external data stores. The platform can now execute a database query to fetch data from an external database and return the data as a batch. The external lookup option helps users avoid preloading data into the Striim cache, which is especially beneficial for lookups against, or joins with, large data sets. External lookups also eliminate the need for a cache refresh, since the data is fetched from the external database. External lookups are supported for all major databases, including Oracle, SQL Server, MySQL, PostgreSQL, and HPE NonStop.
  • The Option to Use Sample Data for Continuous Queries: With this capability, Striim reduces the data required for computation or for displaying results on dashboards. You can choose to use only a portion of your streaming data for the application if your use case can benefit from this approach. As a result, computing and displaying results is faster, especially when working with very large data volumes.
  • Dynamic Output Names for Writers: The Striim platform now makes it easy to organize and consume files and objects on the target system by offering flexible naming options. Striim file and object output names can include data, metadata, and user-data field values from the source event. Dynamic output naming is available for the following targets: Azure Data Lake Store Gen 1 and Gen 2, Azure Blob Storage, Azure File Storage, Google Cloud Storage, Apache HDFS, and Amazon S3.
  • Event-Augmented Kafka Message Header: Starting with Apache Kafka 0.11, Striim 3.9.8 introduces a new property called MessageHeader that enriches the Kafka message header with a mix of the event's dynamic and static values before delivering with sub-second latency. With this additional contextual information, downstream consumer applications can rapidly determine how to use the messages arriving via Striim.
  • Simplified User Experience: The new UI for configuring complex adapter properties, such as rollover policy, flush policy, encryption policy, speeds new application development.

  • New Sample Application for Real-Time Dashboards: Striim version 3.9.8 adds a new sample dashboard application that uses real-time data from the Meetup website and displays details of meetups happening around the globe, demonstrating the Vector Map visualization.
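The external-lookup pattern described above can be sketched in a few lines of Python. Here an in-memory SQLite database stands in for the external data store, and all table and field names are invented for illustration; this is not Striim's API, just the shape of the technique (query the external store per event instead of preloading a cache):

```python
import sqlite3

def make_lookup_db():
    """Stand-in for an external database holding reference data."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE zip_lookup (zip TEXT PRIMARY KEY, city TEXT, county TEXT)")
    db.executemany("INSERT INTO zip_lookup VALUES (?, ?, ?)",
                   [("94301", "Palo Alto", "Santa Clara"),
                    ("10001", "New York", "New York")])
    return db

def enrich(events, db):
    """Enrich each streaming event via a lookup query, no preloaded cache."""
    for event in events:
        row = db.execute("SELECT city, county FROM zip_lookup WHERE zip = ?",
                         (event["zip"],)).fetchone()
        city, county = row if row else (None, None)
        yield {**event, "city": city, "county": county}

db = make_lookup_db()
stream = [{"order_id": 1, "zip": "94301"}, {"order_id": 2, "zip": "10001"}]
enriched = list(enrich(stream, db))
print(enriched[0]["city"])  # Palo Alto
```

Because each lookup goes to the live store, the enrichment data never goes stale, which is the trade-off the external-lookup option makes versus an in-memory cache.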

Other platform improvements for ease of use and manageability include:

  • The Open Processor component, which allows bringing external code into the Striim platform, can be loaded and unloaded dynamically without having to restart Striim.
  • The Striim REST API allows safely deleting or post-processing the files processed by the Striim File Reader.
  • The Striim REST API for application monitoring reports consolidated statistics of various application components within a specified time range.

Increased Performance and Scalability:

To further improve performance and scalability, this release includes multiple features, such as dynamic partitioning and writer performance fine-tuning:

  • Dynamic Partitioning with a Higher Level of Control: Partitions allow parallel processing of the events in a stream by splitting them across multiple servers in the deployment. Striim's partitioning distributes events dynamically at run time across server nodes in a cluster, enabling high performance and easy scalability. In prior releases, Striim used one or more fields of the events in the stream as the partitioning key. In the new release, users have additional, flexible options for distributing and processing large data volumes in streams or windows: Striim 3.9.8 allows the partitioning key to be one or more expressions composed from the fields of the events in the stream. This flexible partitioning enables load balancing for applications that are deployed on multi-node clusters and process large data volumes. Window-based partitioning enables grouping the events into windows that can, for example, be consumed by specific downstream writers. As a result, you can load-balance across multiple writers to improve writing performance.
  • Writer Fine-Tuning Options: Striim 3.9.8 now offers the ability to configure the number of parallel threads for writing into the target system and simplifies writer configuration for achieving even higher throughput from the platform. The fine-tuning option is available for the following list of writers at this time: Azure Synapse Analytics and Azure SQL Data Warehouse, Google BigQuery, Google Cloud Spanner, Azure Cosmos DB, Apache HBase, Apache Kudu, MapR Database, Amazon Redshift, and Snowflake.
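As an illustration of expression-based partitioning (a minimal sketch, not Striim's implementation), routing can hash the value of an expression computed over event fields to choose a worker; the field names and key expression below are invented for the example:

```python
import zlib

def partition_index(event, key_expr, num_partitions):
    """Route an event by the stable hash of its key expression."""
    key = key_expr(event)
    # crc32 is stable across processes, unlike Python's built-in hash().
    return zlib.crc32(str(key).encode()) % num_partitions

# The key is an expression over two fields, not just a single raw field:
key_expr = lambda e: (e["region"], e["order_id"] % 10)

events = [{"region": "EU", "order_id": 7}, {"region": "US", "order_id": 3}]
buckets = [partition_index(e, key_expr, 4) for e in events]
# Each event lands deterministically in one of the 4 partitions.
```

The point of hashing an expression rather than a raw field is that the same logical key always lands on the same worker, so per-key state stays local while load spreads evenly.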

Increased Extensibility with New Data Targets

  • The Striim platform now supports SAP HANA as a target with direct integration. SAP HANA customers can now stream real-time data from a diverse set of sources into the platform with in-flight, in-memory data processing. With the availability of real-time data pipelines to SAP HANA, deployed on-premises or in the cloud, customers can rapidly develop time-sensitive analytics applications that transform their business operations.
  • The HTTP Reader's capabilities have been expanded to send custom responses back to the requestor. The HTTP Reader can now defer responding until events reach a corresponding HTTP Writer. This feature enables users to build REST services using Striim.

Other extensibility improvements are:

  • Improved support for handling special characters for table names in Oracle and SQL Server databases
  • Hazelcast Writer supports multi-column primary keys to enable more complex Hot Cache use cases
  • Performance improvement options for the SQL Server CDC Reader

These are only some of the new features in Striim 3.9.8; there is more to discover. If you would like to learn more about the new release, please reach out to schedule a demo with a Striim expert.

The Top 4 Use Cases for Streaming Data Integration: Whiteboard Wednesdays

Today we are talking about the top four use cases for streaming data integration. If you’re not familiar with streaming data integration, please check out our channel for a deeper dive into the technology. In this 7-minute video, let’s focus on the use cases.

Use Case #1 Cloud Adoption – Online Database Migration

The first one is cloud adoption, specifically online database migration. When you want to move a legacy database to the cloud and modernize your data infrastructure, and it's a critical database, you don't want to experience downtime. Streaming data integration helps with that. While you're doing an initial load from the legacy system to the cloud, the Change Data Capture (CDC) feature captures all the new transactions happening in the legacy database as they happen. Once the cloud database is loaded and ready, all the changes that happened in the legacy database can be applied to it. During the migration, your legacy system remains open for transactions; you don't have to pause it.

While the migration is happening, CDC keeps the two databases continuously in sync by moving real-time data between the systems. Because the system is open to transactions, there is no business interruption. And because the technology both validates delivery and checkpoints the systems, you will not experience any data loss.
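The initial load plus change-replay pattern described above can be sketched as follows. The row and change-event structures here are simplified stand-ins for illustration, not Striim's actual formats:

```python
def initial_load(legacy_rows):
    """Snapshot copy of the legacy table into the cloud target."""
    return {row["id"]: dict(row) for row in legacy_rows}

def apply_changes(target, change_log):
    """Replay captured CDC events (insert/update/delete) onto the target."""
    for change in change_log:
        if change["op"] in ("insert", "update"):
            target[change["row"]["id"]] = dict(change["row"])
        elif change["op"] == "delete":
            target.pop(change["row"]["id"], None)
    return target

legacy = [{"id": 1, "balance": 100}, {"id": 2, "balance": 250}]
cloud_db = initial_load(legacy)

# Transactions that occurred while the snapshot was being copied:
changes = [
    {"op": "update", "row": {"id": 1, "balance": 80}},
    {"op": "insert", "row": {"id": 3, "balance": 500}},
]
apply_changes(cloud_db, changes)  # cloud_db is now in sync with the source
```

In the real system the change log never ends: CDC keeps feeding `apply_changes` until you are ready to switch over, which is what keeps the source open for transactions throughout.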

Because this cloud database has production data, is open to transactions, and is continuously updated, you can take your time to test it before you move your users. You have essentially unlimited testing time, which helps you minimize risk during such a major transition. Once the system is completely in sync, and you have checked and tested it, you can point your applications to the cloud database and run on it.

This is a single switch-over scenario. But streaming data integration also gives you the ability to move data bi-directionally. You can have both systems open to transactions. Once you have tested this, you can run some of your users on the cloud database and some of your users on the legacy database.

All the changes happening with these users can be moved between databases, synchronized so that they’re constantly in-sync. You can gradually move your users to the cloud database to further minimize your risk. Phased migration is a very popular use case, especially for mission-critical systems that cannot tolerate risk and downtime.

Use Case #2 Hybrid Cloud Architecture

Once you’re in the cloud and you have a hybrid cloud architecture, you need to maintain it. You need to connect it with the rest of your enterprise. It needs to be a natural extension of your data center. Continuous real-time data moment with streaming data integration allows you to have your cloud databases and services as part of your data center.

The important thing is that these workloads in the cloud can be operational workloads, because fresh (i.e., continuously updated) information is available. Your databases, machine data, log files, other cloud sources, messaging systems, and sensors can feed data continuously to enable operational workloads.

What do we see in hybrid cloud architectures? Heavy use of cloud analytics solutions. If you want operational reporting or operational intelligence, you want comprehensive data delivered continuously so that you can trust it's up-to-date, and gain operational intelligence from your analytics solutions.

You can also connect your data sources with the messaging systems in the cloud to support event distribution for your new apps that you’re running in the cloud so that they are completely part of your data center. If you’re adopting multi-cloud solutions, you can again connect your new cloud systems with existing cloud systems, or send data to multiple cloud destinations.

Use Case #3 Real-Time Modern Applications

A third use case is real-time modern applications. Cloud is a big trend right now, but not everything is necessarily in the cloud; you can have modern applications on-premises. If you're building any real-time app or modern system that needs timely information, you need continuous real-time data pipelines. Streaming data integration enables you to run real-time apps with real-time data.

Use Case #4 Hot Cache

Last, but not least, when you have an in-memory data grid to help with data retrieval performance, you need to make sure it is continuously up-to-date so that users can depend on its data. If the source system is updated but your cache is not, it can create business problems. By continuously moving real-time data using CDC technology, streaming data integration keeps your data grid up-to-date. It can serve as your hot cache to support your business with fresh data.
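A minimal sketch of the hot-cache pattern: the same CDC change events that update the source database are applied to the in-memory grid, with a plain dictionary standing in for the data grid here (all names are illustrative, not a specific grid product's API):

```python
class HotCache:
    """Toy in-memory grid kept in sync by a stream of CDC events."""

    def __init__(self):
        self._grid = {}

    def on_cdc_event(self, event):
        """Apply one change event so the cache tracks the source system."""
        if event["op"] == "delete":
            self._grid.pop(event["key"], None)
        else:  # insert or update
            self._grid[event["key"]] = event["value"]

    def get(self, key):
        return self._grid.get(key)

cache = HotCache()
cache.on_cdc_event({"op": "insert", "key": "sku-42", "value": {"price": 19.99}})
cache.on_cdc_event({"op": "update", "key": "sku-42", "value": {"price": 17.99}})
print(cache.get("sku-42"))  # {'price': 17.99}
```

Because updates arrive as they happen rather than on a refresh schedule, reads from the cache never lag the source by more than the pipeline's latency.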

 

To learn more about streaming data integration use cases, please visit our Products section, schedule a demo with a Striim expert, or download the Striim platform to get started.

 

What is Stream Data Integration?

According to Gartner, “SDI (stream data integration) implements a data pipeline to ingest, filter, transform, enrich and then store the data in a target database or file to be analyzed later.” 1 Further, “For SDI systems, the input event streams are a continuous, unbounded sequence of event records rather than a static snapshot of data at rest in a file or database. The streams are data ‘in motion.’” 1

Gartner-Stream Data Integration
Source: Gartner (March 2019)

Stream data integration ingests event data from across the organization and makes it available in real time to support data-driven decisions to improve customer experience, minimize fraud, and optimize operations and resource utilization. As event streams make up a substantial portion of the data used by the real-time applications and analytics programs that drive business decisions, the value of stream data integration is immense.

According to Gartner, “in our annual survey for the data integration tools market, 47% of organizations reported that they need streaming data to build a digital business platform, yet only 12% of those organizations reported that they currently integrate streaming data for their data and analytics requirements.” 1

At Striim, we believe stream data integration is essential for successfully leveraging next-generation infrastructures such as cloud, advanced analytics/ML, real-time applications, and IoT analytics, and for harnessing the value of event streams in your decision making. Failing to move from traditional data integration practices to technologies that support stream data integration can mean valuable opportunities are missed. Batch processing technologies such as ETL simply cannot meet the high-volume, low-latency requirements of real-time data streams.

Gartner-Stream Analytics
Source: Gartner (March 2019)

As stream data integration becomes a higher priority, you may wish to reconsider how your data management architecture can support your requirements. Research published by Gartner in March 2019 stated that, “By 2023, over 70% of organizations will use more than one data delivery style to support their data integration use cases, resulting in preference for tools that can support the combination of multiple data delivery styles (such as ETL and stream data integration).” 1

We designed the Striim platform specifically for stream data integration, to enable businesses to move to Cloud, easily build real-time applications that use real-time events, and get more operational value from their data. By providing up-to-date data in the format it is needed – on-prem or in the Cloud – Striim supports operational intelligence and other high-value operational workloads.

Striim captures real-time data from a wide variety of sources including databases (using low-impact change data capture), cloud applications, log files, IoT devices, and message queues. While the data is in motion, Striim applies filtering, transformations, aggregations, masking, and enrichment using static or streaming reference data. Users can perform SQL-based streaming analytics, visualize the data flow and its content in real time, and receive verification of delivery.

The real-time data is then delivered in the required format to the targets including Cloud environments, Kafka and other messaging systems, Hadoop, relational and NoSQL databases, and flat files.
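The in-flight processing steps named above (filter, mask, enrich) can be illustrated with a toy generator pipeline; the field names and the masking rule are invented for the example and are not Striim's API:

```python
def filter_stage(events):
    """Drop records that don't meet the criteria (here: non-positive amounts)."""
    for e in events:
        if e.get("amount", 0) > 0:
            yield e

def mask_stage(events):
    """Mask sensitive fields in-flight: keep only the last 4 card digits."""
    for e in events:
        e = dict(e)
        e["card"] = "****" + e["card"][-4:]
        yield e

def enrich_stage(events, currency_by_country):
    """Enrich each event from static reference data."""
    for e in events:
        yield {**e, "currency": currency_by_country.get(e["country"], "USD")}

def pipeline(events, ref):
    return enrich_stage(mask_stage(filter_stage(events)), ref)

ref = {"DE": "EUR", "JP": "JPY"}
raw = [{"amount": 50, "card": "4111111111111111", "country": "DE"},
       {"amount": 0,  "card": "5500000000000004", "country": "US"}]
out = list(pipeline(raw, ref))  # one surviving, masked, enriched event
```

Each stage is a generator, so events flow through one at a time rather than being batched, which is the essential difference between streaming and ETL-style processing.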

Cloud Adoption and Hybrid Cloud Architecture

As businesses adopt cloud services to modernize their IT environments and transform their business operations, continuous data flow between on-premises systems and cloud solutions becomes imperative. Without up-to-date data in their cloud solutions, businesses cannot offload high-value operational workloads, and consequently restrict the scope of their business transformation. Striim enables streaming data pipelines to major cloud platforms to help seamlessly extend enterprise data centers to the cloud.

The solution also offers cloud-to-cloud integration, as more and more businesses adopt multiple cloud vendors for different services. And as the initial, crucial step of the cloud journey, the same stream data integration technology enables data migration to the cloud without interrupting business systems. It minimizes risks by allowing thorough testing of the new system without time limitations.

Data Integration for Real-Time Applications

Striim enables users to develop stream data integration pipelines that support their real-time applications quickly and easily with a wizard-based UI and SQL-based language. Should it be required, and before the data is even delivered to the target, Striim can provide a visualization of the data and perform analytics on the data while it is in motion using SQL-based streaming analytics.

Real-Time Integration and Pre-Processing for Advanced Analytics and Machine Learning

Stream data integration from Striim enables users to leverage real-time data from a wide range of sources for operational intelligence solutions. Because the data is pre-processed in-flight to a consumable format, it speeds downstream applications and accelerates insight into operations. Stream data integration enables smart data architecture where only the necessary data is stored in the form that serves the end users.

Striim supports machine learning solutions by pre-processing and extracting suitable features before continuously delivering training files to your analytics environment. After you create ML models, you can bring them to Striim using the open processor component. By applying your ML logic to streaming events, you can gain real-time insights that guide daily operational decision making and truly transform your business. Striim can also monitor model fitness and trigger retraining of models for full automation.

To learn more about our stream data integration capabilities, please visit our Real-time Data Integration solution page, schedule a demo with a Striim expert, or download the Striim platform to get started.

1 Gartner: Adopt Stream Data Integration to Meet Your Real-Time Data Integration and Analytics Requirements, 15 March 2019, Ehtisham Zaidi, W. Roy Schulte, Eric Thoo