Striim is built on a distributed, streaming SQL platform. Run continuous queries on streaming data, join streaming data with historical caches, and scale up to billions of events per minute.
Striim Team
How to Stream Data to Snowflake Using Change Data Capture

Data is the new oil, but it’s only useful if you can move, analyze, and act on it quickly. A Nucleus Research study shows that tactical data loses half its value 30 minutes after it’s generated, while operational data loses half its value after eight hours.
Change data capture (CDC) plays a vital role in the efforts to ensure that data in IT systems is quickly ingested, transformed, and used by analytics and other types of platforms. Striim is a unified data streaming and integration platform that offers non-intrusive, high-performance CDC from production databases to a wide range of targets.
In this live technical demo, we walk you through a use case where data is replicated from PostgreSQL to Snowflake in real time, using CDC. We also show examples of more complex use cases, including a data mesh with multiple data consumers.
The Modern Data Divide with Arpit Choudhury
We host Arpit Choudhury, well known for his work building data communities such as Astorik.com, to talk about the ‘modern data divide’ and how to overcome friction between data people and non-data people. Arpit also talks about the value of ‘all-in-one’ tools versus a multivariate modern data stack. Follow Arpit Choudhury on LinkedIn and check out his community of data practitioners at Astorik.com
Tutorial
Oracle Change Data Capture to Databricks
Benefits
Migrate your database data and schemas to Databricks in minutes.
Stream operational data from Oracle to your data lake in real time.
Automatically keep schemas and models in sync with your operational database.
We will go over two ways to create smart pipelines that stream data from Oracle to Databricks. Striim also offers streaming integration to Databricks Delta Lake from popular databases such as PostgreSQL, SQL Server, MongoDB, and MySQL, and from applications such as Salesforce.
In the first half of the demo, we focus on moving historical data for migration use cases, which are becoming increasingly common as users move from traditional on-premises systems to cloud-hosted services.
Striim is also proud to offer the industry’s fastest and most scalable Oracle change data capture to address the most critical use cases.

Striim makes initial load, schema conversion, and change data capture a seamless experience for data engineers.
In a traditional pipeline approach, we would often have to create the schema manually in code or infer it from a CSV file, and then configure the connectivity parameters for the source and target.
Striim reduces the time and manual effort needed to set up these connections, and it creates the schema at the target with the help of a simple wizard.
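For contrast, here is a minimal sketch of that traditional approach in a Databricks (PySpark) notebook, where the schema is either declared by hand or inferred from a CSV file; the column names and file path are illustrative assumptions, not taken from the demo:

# Runs in a Databricks notebook, where `spark` is the built-in SparkSession.
from pyspark.sql.types import StructType, StructField, IntegerType, StringType, DoubleType

# Option 1: declare the schema by hand before loading.
manual_schema = StructType([
    StructField("order_id", IntegerType(), False),
    StructField("customer_name", StringType(), True),
    StructField("amount", DoubleType(), True),
])
df = spark.read.csv("/mnt/raw/orders.csv", schema=manual_schema, header=True)

# Option 2: infer the schema, which costs an extra pass over the file.
df_inferred = spark.read.csv("/mnt/raw/orders.csv", header=True, inferSchema=True)

Striim's wizard removes both of these manual steps by deriving the target schema from the source metadata.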
Here we have a view of the Databricks homepage, with no schema or table yet created in DBFS.
In the Striim UI, under the ‘Create app’ option, we can choose from templates offered for a wide array of data sources and targets.
With our most recent 4.1 release, we also support the Delta Lake adapter as a target data sink.

Part 1: Initial Load and Schema Creation
In this demo, we will show how to move historical data from Oracle to Databricks Delta Lake.
- With the help of Striim’s intuitive wizard, we name the application, with the added option to create multiple namespaces depending on our pipeline’s needs and requirements.
- First, we configure the source details for the Oracle database.

- We can validate our connection details.

- Next, we have the option to choose the specific schemas and tables we want to move, giving us more flexibility than replicating the entire database or schema.


- Now we can start configuring our target, Delta Lake, which supports ACID transactions and scalable metadata handling, and unifies streaming and batch data processing.
- Striim can also migrate schemas as part of the wizard, which makes the experience seamless and easy.

- The wizard takes care of validating the target connections, using the Oracle metadata to create the schema in the target, and initiating the historical data push to Delta Lake, so the whole end-to-end operation finishes in a fraction of the time it would take with traditional pipelines.

Once the schema is created, we can also verify it before we go ahead with the migration to Delta Lake.
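One quick way to verify is from a Databricks notebook; the database and table names here are hypothetical stand-ins for whatever the wizard created:

# Inspect the schema Striim generated from the Oracle metadata.
spark.sql("DESCRIBE TABLE striim_demo.customers").show()
# Before the historical load runs, the table should still be empty.
spark.sql("SELECT COUNT(*) FROM striim_demo.customers").show()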

- Striim’s unified data integration provides unprecedented speed and simplicity, as we have just seen in how simple it was to connect a source and a target.

If we want to make additional changes, such as adjusting the fetch size or providing a custom query, the second half of the demo highlights how to apply those changes without the wizard.
- We can monitor the progress of the job with detailed metrics, which helps with data governance by ensuring data has been replicated appropriately.

Part 2: Change Data Capture
In our second demo, we will highlight Striim’s change data capture, which helps drive digital transformation and enables true real-time analytics.
- Earlier, we went through how to create a pipeline with the wizard; now we will look at how to tune our pipeline without the wizard, using the intuitive drag-and-drop flow designer.

From the Striim dashboard, we can navigate the same way as before to create an application from scratch, or import a TQL file if we already have a pipeline defined.

- From the search bar, we can search for the Oracle CDC adapter. The UI is very friendly, with an easy drag-and-drop approach.

- We can skip the wizard if we want and enter the connection parameters as before.

- In the additional parameters, we have the flexibility to make any changes to the data we pull from the source.
Lastly, we can create an output stream that will connect to the data sink.
We can test and validate our connections even without deploying the app or pipeline.

- Once the source connection is established, we can connect it to a target component and select the Delta Lake adapter from the drop-down.

- Databricks has a unified design that bridges the gap between different types of users, from analysts to data scientists to machine learning engineers.
From the Databricks dashboard, we can navigate to the Compute section to access the cluster’s connection parameters.

- Under the advanced settings, select the JDBC/ODBC settings to view the cluster’s Hostname and JDBC URL.
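On Azure Databricks, for example, the JDBC URL you copy from that screen generally has the following shape; the hostname, org ID, and HTTP path below are illustrative placeholders, and depending on the driver version the prefix may be jdbc:spark:// or jdbc:databricks://:

jdbc:spark://adb-1234567890123456.7.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/1234567890123456/0123-456789-abcdefgh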

- Next, we generate a personal access token that will be used to authenticate the user’s access to Databricks. From the settings, we can navigate to the user settings and click Generate New Token.

- After adding the required parameters, we can create the directory in DBFS by running a few commands in a notebook, as sketched below.
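A minimal sketch of that notebook step, assuming a hypothetical staging path:

# dbutils is available by default in Databricks notebooks.
dbutils.fs.mkdirs("/striim/stage")    # create the staging directory (path is illustrative)
display(dbutils.fs.ls("/striim/"))    # confirm the directory now exists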


- Next, we deploy the app and start the flow to initiate CDC.

- We can refresh Databricks to view the CDC data. Striim also allows us to view detailed pipeline metrics in real time.
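A quick spot check from a notebook might look like this, using the same hypothetical table name as before:

# Pull a sample of the replicated rows to confirm change events have landed.
display(spark.sql("SELECT * FROM striim_demo.customers LIMIT 10"))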



Tools you need

Striim
Striim’s unified data integration and streaming platform connects clouds, data and applications.

Databricks
Databricks combines the data warehouse and the data lake into a lakehouse architecture.

Oracle
Oracle is a multi-model relational database management system.

Delta Lake
Delta Lake is an open-source storage framework that supports building a lakehouse architecture.
Conclusion
Managing large-scale data is a challenge for every enterprise. Real-time, integrated data is a requirement to stay competitive, but modernizing your data architecture can be an overwhelming task.
Striim can handle the volume, complexity, and velocity of enterprise data by connecting legacy systems to modern cloud applications on a scalable platform. Our customers don’t have to pause operations to migrate data or juggle different tools for every data source—they simply connect legacy systems to newer cloud applications and get data streaming in a few clicks.
Seamless integrations. Near-perfect performance. Data up to the moment. That’s what embracing complexity without sacrificing performance looks like to an enterprise with a modern data stack.
Use cases
Integrating Striim’s CDC capabilities with Databricks makes it easy to rapidly expand the capabilities of a lakehouse with just a few clicks.
Striim’s additional components not only capture real-time data but also apply transformations on the fly before the data even lands in the staging zone, reducing the amount of data cleansing required.
Striim’s wide array of event transformers makes handling any type of sensitive data as seamless as possible, allowing users to maintain compliance at various levels.
The result is high-quality data in Databricks that can then be transformed via Spark code and loaded into newer Databricks services such as Delta Live Tables.
How to Stream Data to Google Cloud with Striim
Tutorial
Migrating from MySQL to BigQuery for Real-Time Data Analytics
How to replicate and synchronize your data from on-premises MySQL to BigQuery using change data capture (CDC)
Benefits
Operational Analytics: Analyze your data in real time without impacting the performance of your operational database.
Act in Real Time: Predict, automate, and react to business events as they happen, not minutes or hours later.
Empower Your Teams: Give teams across your organization a real-time view into operational data.
Overview
In this post, we will walk through an example of how to replicate and synchronize your data from on-premises MySQL to BigQuery using change data capture (CDC).
Data warehouses have traditionally been on-premises services that required data to be transferred using batch load methods. Ingesting, storing, and manipulating data with cloud data services like Google BigQuery makes the whole process easier and more cost effective, provided that you can get your data in efficiently.
Striim’s real-time data integration platform allows you to move data in real time as changes are recorded, using a technology called change data capture. This allows you to build real-time analytics and machine learning capabilities from your on-premises datasets with minimal impact.
Step 1: Source MySQL Database
Before you set up the Striim platform to synchronize your data from MySQL to BigQuery, let’s take a look at the source database and prepare the corresponding database structure in BigQuery. For this example, I am using a local MySQL database with a simple purchases table to simulate a financial datastore that we want to ingest from MySQL to BigQuery for analytics and reporting.
I’ve loaded a number of initial records into this table and have a script, sketched below, to apply additional records once Striim has been configured, showing how it picks up the changes automatically in real time.
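For reference, that loader script can be as simple as the following sketch using the mysql-connector-python package; the database, credentials, and column names are illustrative assumptions, not the exact ones from this demo:

import mysql.connector

# Connect to the local MySQL source (connection details are placeholders).
conn = mysql.connector.connect(host="localhost", user="striim",
                               password="your-password", database="sales")
cur = conn.cursor()
# Insert one new purchase; Striim's CDC will pick this change up automatically.
cur.execute(
    "INSERT INTO purchases (customer_id, amount, purchased_at) "
    "VALUES (%s, %s, NOW())",
    (1042, 99.95),
)
conn.commit()
cur.close()
conn.close()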
Step 2: Targeting Google BigQuery
You also need to make sure your instance of BigQuery has been set up to mirror the source or the on-premises data structure. There are a few ways to do this, but because you are using a small table structure, you are going to set this up using the Google Cloud Console interface. Open the Google Cloud Console, and select a project, or create a new one. You can now select BigQuery from the available cloud services. Create a new dataset to hold the incoming data from the MySQL database.
Once the dataset has been created, you also need to create a table structure. Striim can perform transformations while the data flows through the synchronization process. However, to make things a little easier here, I have replicated the same structure as the on-premises data source.
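If you would rather script the target table than click through the console, a sketch using the google-cloud-bigquery client library could look like this; the project ID, dataset, and column names are assumptions that mirror the hypothetical purchases table above:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # illustrative project ID

# Mirror the on-premises purchases table structure in BigQuery.
schema = [
    bigquery.SchemaField("customer_id", "INTEGER"),
    bigquery.SchemaField("amount", "NUMERIC"),
    bigquery.SchemaField("purchased_at", "TIMESTAMP"),
]
table = bigquery.Table("my-project.mysql_sync.purchases", schema=schema)
client.create_table(table)  # creates the empty target table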
You will also need a service account to allow your Striim application to access BigQuery. Open the service account option through the IAM window in the Google Cloud Console and create a new service account. Grant the necessary permissions to the service account by assigning the BigQuery Owner and Admin roles, and download the service account key as a JSON file.
Step 3: Set Up the Striim Application
Now you have your data in a table in the on-premises MySQL database and have a corresponding empty table with the same fields in BigQuery. Let’s now set up a Striim application on Google Cloud Platform for the migration service.
Open your Google Cloud Console and open or start a new project. Go to the marketplace and search for Striim. A number of options should return, but the option you are after is the first item that allows integration of real-time data to Google Cloud services.
Select this option and start the deployment process. For this tutorial, you are just using the defaults for the Striim server. In production, you would need to size appropriately depending on your load.
Click the deploy button at the bottom of this screen and start the deployment process.
Once this deployment has finished, the details of the server and the Striim application will be generated.
Before you open the admin site, you will need to add a few files to the Striim Virtual Machine. Open the SSH console to the machine and copy the JSON file with the service account key to a location Striim can access. I used /opt/striim/conf/servicekey.json.
Give these files the right permissions by running the following commands (using the path from earlier):
chown striim:striim /opt/striim/conf/servicekey.json
chmod 770 /opt/striim/conf/servicekey.json
You also need to restart the Striim services for this to take effect. The easiest way to do this is to restart the VM.
Once this is done, close the shell and click on the Visit The Site button to open the Striim admin portal.
Before you can use Striim, you will need to configure some basic details. Register your details and enter the cluster name (I used “DemoCluster”) and password, as well as an admin password. Leave the license field blank to get a trial license if you don’t have one, then wait for the installation to finish.
When you get to the home screen for Striim, you will see three options. Let’s start by creating an app to connect your on-premises database with BigQuery to perform the initial load of data. To create this application, you will need to start from scratch from the applications area. Give your application a name and you will be presented with a blank canvas.
The first step is to read data from MySQL, so drag a database reader from the sources tab on the left. Double-click on the database reader to set the connection string with a JDBC-style URL using the template:
jdbc:mysql://<host>:<port>/<database>
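For example, a MySQL instance reachable on the default port with a database named sales might use the following (all values are illustrative):

jdbc:mysql://10.128.0.5:3306/sales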
You must also specify the tables to synchronize — for this example, purchases — as this allows you to restrict what is synchronized.
Finally, create a new output. I called mine PurchasesDataStream.
You also need to connect your BigQuery instance to your source. Drag a BigQuery writer from the targets tab on the left. Double-click on the writer and select the input stream from the previous step and specify the location of the service account key. Finally, map the source and target tables together using the form:
<source_database>.<source_table>,<target_dataset>.<target_table>
For this use case, this is just a single table on each side.
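With the hypothetical names used in this walkthrough’s sketches, that mapping would read:

sales.purchases,mysql_sync.purchases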
Once both the source and target connectors have been configured, deploy and start the application to begin the initial load process. Once the application is deployed and running, you can use the monitor menu option on the top left of the screen to watch the progress.
Because this example contains a small data load, the initial load application finishes pretty quickly. You can now stop this initial load application and move on to the synchronization.
Step 4: Updating BigQuery with Change Data Capture
Striim has pushed your current database up into BigQuery, but ideally you want to update this every time the on-premises database changes. This is where the change data capture application comes into play.
Go back to the applications screen in Striim and create a new application from a template. Find and select the MySQL CDC to BigQuery option.
Like the first application, you need to configure the details for your on-premises MySQL source. Use the same basic settings as before. However, this time the wizard adds the JDBC component to the connection URL.
When you click Next, Striim will ensure that it can connect to the local source. Striim will retrieve all the tables from the source. Select the tables you want to sync. For this example, it’s just the purchases table.
Once the local tables are mapped, you need to connect to the BigQuery target. Again, you can use the same settings as before by specifying the same service key JSON file, table mapping, and GCP Project ID.
Once the setup of the application is complete, you can deploy and turn on the synchronization application. This will monitor the on-premises database for any changes, then synchronize them into BigQuery.
Let’s see this in action by clicking on the monitor button again and loading some data into your on-premises database. As the data loads, you will see the transactions being processed by Striim.
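To confirm the change events arrived on the BigQuery side, you can count rows with the same client library used earlier; the project, dataset, and table names remain the illustrative ones:

from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # same illustrative project as before
query = "SELECT COUNT(*) AS n FROM `my-project.mysql_sync.purchases`"
for row in client.query(query).result():
    print(row.n)  # the count grows as Striim applies captured changes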
Next Step
As you can see, Striim makes it easy for you to synchronize your on-premises data from existing databases, such as MySQL, to BigQuery. By constantly moving your data into BigQuery, you could now start building analytics or machine learning models on top, all with minimal impact to your current systems. You could also start ingesting and normalizing more datasets with Striim to fully take advantage of your data when combined with the power of BigQuery.
To learn more about Striim for Google BigQuery, check out the related product page. Striim is not limited to MySQL to BigQuery integration, and supports many different sources and targets. To see how Striim can help with your move to cloud-based services, schedule a demo with a Striim technologist or download a free trial of the platform.
Tools you need

Striim
Striim’s unified data integration and streaming platform connects clouds, data and applications.

MySQL
MySQL is an open-source relational database management system.

Google BigQuery
BigQuery is Google Cloud’s fully managed, serverless data warehouse for scalable analytics.
Snowflake Summit & Data In a Recession with Matt Turck
Matt Turck from FirstMark Capital joins us on a live episode of ‘What’s New In Data’ from Snowflake Summit. We recap the summit, talk about the state of the data industry, and look ahead to how a potential recession will play a role.
Unlock Insights to Your Data with Azure Synapse Link
Edward Bell from Striim and Mahesh Prakriya from Microsoft demonstrate the value of Striim and Azure Synapse for Oracle and Salesforce users.
How to Choose the Right Change Data Capture Solution
An introduction to CDC (and pros and cons of different CDC methods) | 5 reasons organizations need CDC | Key features to consider in your CDC solution
Introducing Striim Platform 4.1
Striim is pleased to introduce a broad set of enhancements to our on-premises and cloud marketplace offerings that add additional sources and targets, provide increased manageability, and further enhance performance.