Katherine Rincon


What is Stream Data Integration?

According to Gartner, “SDI (stream data integration) implements a data pipeline to ingest, filter, transform, enrich and then store the data in a target database or file to be analyzed later.” 1 Further, “For SDI systems, the input event streams are a continuous, unbounded sequence of event records rather than a static snapshot of data at rest in a file or database. The streams are data ‘in motion.’” 1

[Figure: Stream Data Integration. Source: Gartner (March 2019)]

Stream data integration ingests event data from across the organization and makes it available in real time to support data-driven decisions to improve customer experience, minimize fraud, and optimize operations and resource utilization. As event streams make up a substantial portion of the data used by the real-time applications and analytics programs that drive business decisions, the value of stream data integration is immense.

According to Gartner, “in our annual survey for the data integration tools market, 47% of organizations reported that they need streaming data to build a digital business platform, yet only 12% of those organizations reported that they currently integrate streaming data for their data and analytics requirements.” 1

At Striim, we believe stream data integration is essential for successfully leveraging next-generation infrastructures – Cloud, advanced analytics/ML, real-time applications, and IoT analytics – that make it possible to harness the value of event streams in decision making. Organizations that fail to move from traditional data integration practices to technologies that support stream data integration risk missing valuable opportunities. Batch processing technologies such as ETL simply cannot meet the high-volume, low-latency requirements of real-time data streams.

[Figure: Stream Analytics. Source: Gartner (March 2019)]

As stream data integration becomes a higher priority, you may wish to reconsider how your data management architecture can support your requirements. Research published by Gartner in March 2019 stated that, “By 2023, over 70% of organizations will use more than one data delivery style to support their data integration use cases, resulting in preference for tools that can support the combination of multiple data delivery styles (such as ETL and stream data integration).” 1

We designed the Striim platform specifically for stream data integration, to enable businesses to move to Cloud, easily build applications that use real-time events, and get more operational value from their data. By providing up-to-date data in the format in which it is needed – on-prem or in the Cloud – Striim supports operational intelligence and other high-value operational workloads.

Striim captures real-time data from a wide variety of sources including databases (using low-impact change data capture), cloud applications, log files, IoT devices, and message queues. While the data is in motion, Striim applies filtering, transformations, aggregations, masking, and enrichment using static or streaming reference data. Users can perform SQL-based streaming analytics, visualize the data flow and its content in real time, and receive verification of delivery.

The real-time data is then delivered in the required format to the targets including Cloud environments, Kafka and other messaging systems, Hadoop, relational and NoSQL databases, and flat files.
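
As a concrete illustration, a minimal end-to-end pipeline in Striim's SQL-based language might look like the following sketch. OracleReader, KafkaWriter, and JSONFormatter are Striim component names, but the property lists are abridged, and every stream, table, and connection value here is a hypothetical placeholder:

    -- Ingest change events from an Oracle table using log-based CDC
    CREATE SOURCE OrdersCDC USING OracleReader (
      Username: 'striim',
      Password: '********',
      ConnectionURL: 'onprem-host:1521:ORCL',
      Tables: 'SHOP.ORDERS'
    )
    OUTPUT TO RawOrders;

    -- Filter the stream in-flight with a SQL-based continuous query.
    -- (For brevity this treats RawOrders as a typed stream; raw CDC events
    -- expose column values through a data[] array until they are typed.)
    CREATE CQ FilterOrders
    INSERT INTO ProcessedOrders
    SELECT orderId, amount, region
    FROM RawOrders
    WHERE amount > 0;

    -- Deliver the processed stream to a Kafka topic as JSON
    CREATE TARGET OrdersToKafka USING KafkaWriter (
      brokerAddress: 'kafka-host:9092',
      Topic: 'orders'
    )
    FORMAT USING JSONFormatter ()
    INPUT FROM ProcessedOrders;

The same source stream can feed additional continuous queries and targets, which is how a single pipeline serves multiple consumers.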

Cloud Adoption and Hybrid Cloud Architecture

As businesses adopt cloud services to modernize their IT environments and transform their business operations, continuous data flow between on-premises systems and cloud solutions becomes imperative. Without having up-to-date data in their cloud solutions, businesses cannot offload high-value, operational workloads, and consequently, restrict the scope of their business transformation. Striim enables streaming data pipelines to major cloud platforms to help seamlessly extend enterprise data centers to the cloud.

The solution also offers cloud-to-cloud integration as more and more businesses adopt multiple cloud vendors for different services. Also, as the initial and crucial step in the cloud journey, the same stream data integration technology enables data migration to the cloud without interrupting business systems. It minimizes risks by allowing thorough testing of the new system without time limitations.

Data Integration for Real-Time Applications

Striim enables users to develop stream data integration pipelines that support their real-time applications quickly and easily with a wizard-based UI and a SQL-based language. If required, Striim can visualize the data and perform SQL-based streaming analytics on it while it is in motion – before the data is even delivered to the target.

Real-Time Integration and Pre-Processing for Advanced Analytics and Machine Learning

Stream data integration from Striim enables users to leverage real-time data from a wide range of sources for operational intelligence solutions. Because the data is pre-processed in-flight to a consumable format, it speeds downstream applications and accelerates insight into operations. Stream data integration enables smart data architecture where only the necessary data is stored in the form that serves the end users.

Striim supports machine learning solutions by pre-processing and extracting suitable features before continuously delivering training files to your analytics environment. After you create ML models, you can bring them to Striim using the open processor component. By applying your ML logic to streaming events, you can gain real-time insights that guide daily operational decision making and truly transform your business. Striim can also monitor model fitness and trigger retraining of models for full automation.

To learn more about our stream data integration capabilities, please visit our Real-time Data Integration solution page, schedule a demo with a Striim expert, or download the Striim platform to get started.

1 Gartner: Adopt Stream Data Integration to Meet Your Real-Time Data Integration and Analytics Requirements, 15 March 2019, Ehtisham Zaidi, W. Roy Schulte, Eric Thoo

How to Migrate Oracle Database to Google Cloud SQL for PostgreSQL with Streaming Data Integration

For those who need to migrate an Oracle database to Google Cloud, the ability to move mission-critical data in real time between on-premises and cloud environments without database downtime or data loss is paramount. In this video, Alok Pareek, Founder and EVP of Products at Striim, demonstrates how the Striim platform enables Google Cloud users to build streaming data pipelines from their on-premises databases into their Cloud SQL environment with reliability, security, and scalability. The full 8-minute video is available to watch below:

Easy to Use

Striim offers an easy-to-use platform that maximizes the value gained from cloud initiatives, including cloud adoption, hybrid cloud data integration, and in-memory stream processing. This demonstration illustrates how Striim feeds real-time data from mission-critical applications – across a variety of on-prem and cloud-based sources – to Google Cloud without interrupting critical business operations.


Visualize Your Data

Through different interactive views, Striim users can develop Apps to build data pipelines to Google Cloud, create custom Dashboards to visualize their data, and Preview the Source data as it streams to ensure they’re getting the data they need. For this demonstration, Apps is the starting point from which to build the data pipeline.

There are two critical phases in this zero-downtime data migration scenario. The first involves the initial load of data from the on-premises Oracle database into the Cloud SQL for PostgreSQL database. The second is the synchronization phase, achieved through specialized readers that keep the source and target consistent.
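
The two phases can be sketched in Striim's SQL-based language roughly as follows. DatabaseReader, OracleReader, and DatabaseWriter are Striim adapter names, but the property lists are abridged, and every connection string, credential, and table name is a placeholder:

    -- Phase 1: initial load -- bulk-read the existing rows over JDBC
    CREATE SOURCE InitialLoad USING DatabaseReader (
      Username: 'striim',
      Password: '********',
      ConnectionURL: 'jdbc:oracle:thin:@//onprem-host:1521/ORCL',
      Tables: 'SHOP.ORDERS'
    )
    OUTPUT TO LoadStream;

    CREATE TARGET LoadToCloudSQL USING DatabaseWriter (
      ConnectionURL: 'jdbc:postgresql://cloudsql-host:5432/shop',
      Username: 'striim',
      Password: '********',
      Tables: 'SHOP.ORDERS,public.orders'
    )
    INPUT FROM LoadStream;

    -- Phase 2: synchronization -- continuous CDC from the Oracle redo logs,
    -- delivered to the same Cloud SQL target to keep it consistent
    CREATE SOURCE OrdersSync USING OracleReader (
      Username: 'striim',
      Password: '********',
      ConnectionURL: 'onprem-host:1521:ORCL',
      Tables: 'SHOP.ORDERS'
    )
    OUTPUT TO SyncStream;

    CREATE TARGET SyncToCloudSQL USING DatabaseWriter (
      ConnectionURL: 'jdbc:postgresql://cloudsql-host:5432/shop',
      Username: 'striim',
      Password: '********',
      Tables: 'SHOP.ORDERS,public.orders'
    )
    INPUT FROM SyncStream;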

[Screenshot: Striim Flow Designer]

The pipeline from the source to the target is built using a flow designer that easily creates and modifies streaming data pipelines. The data can also be transformed while in motion, to be realigned or delivered in a different format. Through the interface, the properties of the Oracle database can also be configured – allowing users extensive flexibility in how the data is moved.

Once the application is started, the data can be previewed, and progress monitored. While in-motion, data can be filtered, transformed, aggregated, enriched, and analyzed before delivery. With up-to-the-second visibility of the data pipeline, users can quickly and easily verify the ingestion, processing, and delivery of their streaming data.


During the initial load, the source data in the database is continually changing. Striim keeps the Cloud SQL for PostgreSQL database up to date with the on-premises Oracle database using change data capture (CDC). By reading the database transactions in the Oracle redo logs, Striim collects the insert, update, and delete operations as soon as the transactions commit, and applies only those changes to the target. This is done without impacting the performance of source systems, while avoiding any outage to the production database.

By generating DML activity using a simulator, the demonstration shows how inserts, updates, and deletes are managed. Running DML operations against the orders table, the preview shows not only the data being captured, but also metadata including the transaction ID, the system commit number, the table name, and the operation type. Querying the orders table on the target confirms that the data has arrived.
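
As a hedged illustration of what the preview surfaces: the DML below is the kind of activity the simulator generates, and the continuous query shows how change metadata can be read off a CDC stream (here, the hypothetical SyncStream from the earlier sketch). Striim exposes metadata through a META() accessor, but the exact key names shown ('TxnID', 'SCN', 'TableName', 'OperationName') are illustrative and may differ in the product:

    -- Simulated DML against the source orders table
    INSERT INTO SHOP.ORDERS (order_id, amount) VALUES (1001, 49.99);
    UPDATE SHOP.ORDERS SET amount = 59.99 WHERE order_id = 1001;
    DELETE FROM SHOP.ORDERS WHERE order_id = 1001;

    -- Reading the captured metadata in a continuous query
    -- (metadata key names are illustrative)
    CREATE CQ ShowChangeMetadata
    INSERT INTO AuditStream
    SELECT META(c, 'TxnID')         AS txnId,
           META(c, 'SCN')           AS commitNumber,
           META(c, 'TableName')     AS tableName,
           META(c, 'OperationName') AS opType
    FROM SyncStream c;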

The initial upload of data from the source to the target, followed by change data capture to ensure source and target remain in sync, allows businesses to move data from on-premises databases into Google Cloud with the peace of mind that there will be no data loss and no interruption of mission-critical applications.

Additional Resources

To learn more about Striim’s capabilities to support the data integration requirements for a Google hybrid cloud architecture, check out all of Striim’s solutions for Google Cloud Platform.

To read more about real-time data integration, please visit our Real-Time Data Integration solutions page.

To learn more about how Striim can help you migrate Oracle database to Google Cloud, we invite you to schedule a demo with a Striim technologist.

Microsoft SQL Server CDC to Kafka

By delivering high volumes of data using Microsoft SQL Server CDC to Kafka, organizations gain visibility into their business and the vital context needed for timely operational decision making. Getting maximum value from Kafka solutions requires ingesting data from a wide variety of sources – in real time – and delivering it to the users and applications that need it to take informed action to support the business.

Traditional methods used to move data, such as ETL, are just not sufficient to support high-volume, high-velocity data environments. These approaches delay getting data to where it can be of real value to the organization. Moving all the data, regardless of relevance, to the target creates challenges in storing it and getting actionable data to the applications and users that need it. Microsoft SQL Server CDC to Kafka minimizes latency and prepares data so it is delivered in the correct format for different consumers to utilize.

In most cases, the data that resides in transactional databases like Microsoft SQL Server is the most valuable to the organization. This data is constantly changing, reflecting every event or transaction that occurs. Using non-intrusive, low-impact change data capture (CDC), the Striim platform moves and processes only the changed data. With Microsoft SQL Server CDC to Kafka, users can manage their data integration processes more efficiently and in real time.

Using a drag-and-drop UI and pre-built wizards, Striim simplifies creating data flows for Microsoft SQL Server CDC to Kafka. Depending on the requirements of users, the data can either be delivered “as-is,” or in-flight processing can filter, transform, aggregate, mask, and enrich the data. This delivers the data in the format needed, with all the relevant context, to meet the needs of different Kafka consumers – with sub-second latency.
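
A sketch of such a flow in the same SQL-based language. The SQL Server reader name and its properties are abridged and illustrative; table, topic, column, and stream names are hypothetical:

    -- Capture changes from a SQL Server table via CDC
    CREATE SOURCE InventoryCDC USING MSSqlReader (
      Username: 'striim',
      Password: '********',
      ConnectionURL: 'sqlserver-host:1433',
      Tables: 'dbo.inventory'
    )
    OUTPUT TO InventoryChanges;

    -- Optional in-flight step: keep only the rows a given consumer needs.
    -- (Treats the stream as typed for brevity; masking and enrichment
    -- would be additional continuous queries in the same flow.)
    CREATE CQ LowStockOnly
    INSERT INTO LowStock
    SELECT sku, quantity, warehouse
    FROM InventoryChanges
    WHERE quantity < 10;

    -- Deliver to Kafka as JSON with sub-second latency
    CREATE TARGET LowStockToKafka USING KafkaWriter (
      brokerAddress: 'kafka-host:9092',
      Topic: 'inventory-low-stock'
    )
    FORMAT USING JSONFormatter ()
    INPUT FROM LowStock;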

Striim is an end-to-end platform that delivers the security, recoverability, reliability (including exactly once processing), and scalability required by an enterprise-grade solution. Built-in monitoring also compares sources and targets and validates that all data has been delivered successfully. 

In addition to Microsoft SQL Server CDC to Kafka, Striim offers non-intrusive change data capture (CDC) solutions for a range of enterprise databases including Oracle, PostgreSQL, MongoDB, HPE NonStop SQL/MX, HPE NonStop SQL/MP, HPE NonStop Enscribe, and MariaDB.

For more information about how to use Microsoft SQL Server CDC to Kafka to maintain real-time pipelines for continuous data movement, please visit our Change Data Capture solutions page.

If you would like a demo of how Microsoft SQL Server CDC to Kafka works and to talk to one of our technologists, please contact us to schedule a demo.

Real-Time Data Ingestion – What Is It and Why Does It Matter?

The integration and analysis of data from both on-premises and cloud environments give an organization a deeper understanding of the state of its business. Real-time data ingestion for analytical or transactional processing enables businesses to make timely operational decisions that are critical to the success of the organization – while the data is still current.

Transactional and operational data contain valuable insights that drive informed and appropriate actions. Achieving visibility into business operations in real time allows organizations to identify and act on opportunities and address situations where improvements are needed. Real-time data ingestion to feed powerful analytics solutions demands the movement of high volumes of data from diverse sources without impacting source systems and with sub-second latency.

Using traditional batch methods to move the data introduces unwelcome delays. By the time the data is collected and delivered it is already out of date and cannot support real-time operational decision making. Real-time data ingestion is a critical step in the collection and delivery of volumes of high-velocity data – in a wide range of formats – in the timeframe necessary for organizations to optimize their value.

The Striim platform enables the continuous movement of structured, semi-structured, and unstructured data – extracting it from a wide range of sources and delivering it to cloud and on-premises endpoints – in real time and available immediately to users and applications.

The Striim platform supports real-time data ingestion from sources including databases, log files, sensors, and message queues, and delivery to targets that include Big Data, Cloud, transactional databases, files, and messaging systems. Using non-intrusive Change Data Capture (CDC), Striim reads new database transactions from source databases’ transaction or redo logs and moves only the changed data without impacting the database workload.
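
Database CDC is only one kind of source. As a sketch, a delimited log-file source can be declared in the same way; FileReader and DSVParser are Striim component names, while the paths and property values shown are illustrative placeholders:

    -- Tail delimited log files as new records are appended
    CREATE SOURCE AppLogs USING FileReader (
      directory: '/var/log/app',
      wildcard: '*.csv',
      positionByEOF: false   -- read existing content, then keep tailing
    )
    PARSE USING DSVParser (
      header: true
    )
    OUTPUT TO LogStream;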

Real-time data ingestion is critical to accessing data that delivers significant value to a business. With clear visibility into the organization, based on data that is current and comprehensive, organizations can make more informed operational decisions faster.

To read more about real-time data ingestion, please visit our Real-Time Data Integration solutions page.

To have one of our experts guide you through a brief demo of our real-time data ingestion offering, please schedule a demo.

Kafka to MySQL

The scalable and reliable delivery of high volumes of Kafka data to enterprise targets via real-time Kafka integration gives organizations current and relevant information about their business. Loading data from Kafka to MySQL enables organizations to run rich custom queries on data enhanced with pub/sub messaging data, and to make key operational decisions within the timeframe in which they are most effective.

To get optimal value from the rich messaging data generated by CRM, ERP, and e-commerce applications, large data sets need to be delivered from Kafka to MySQL with sub-second latency. Integrating data from Kafka to MySQL enhances transactional data – providing a greater understanding of the state of operations. With access to this data, users and applications have the context to make decisions and take essential and timely action to support the business.

Using traditional batch-based approaches to the movement of data from Kafka to MySQL creates an unacceptable bottleneck – delaying the delivery of data to where it can be of real value to the organization. This latency limits the potential for the data to inform critical operational decisions that enhance customer experiences, optimize processes, and drive revenue.

ETL methods move the data “as is” – without any pre-processing. However, depending on the requirements, not all the data may be needed, and the data that is necessary may need to be augmented with other data to be useful. Ingesting high volumes of raw data creates additional challenges when it comes to storage and getting high-value, actionable data to users and applications.

By building real-time data pipelines from Kafka to MySQL, Striim allows users to minimize latency and support their high-volume, high-velocity data environments. Striim offers real-time data ingestion with in-flight processing – including filtering, transformations, aggregations, masking, and enrichment – to deliver relevant data from Kafka to MySQL in the right format and with full context.
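
A hedged sketch of such a pipeline: read JSON events from a Kafka topic, filter them in-flight, and write the result to MySQL over JDBC. Adapter property lists are abridged; topic, stream, column, and table names are hypothetical:

    -- Read JSON events from a Kafka topic
    CREATE SOURCE OrdersFromKafka USING KafkaReader (
      brokerAddress: 'kafka-host:9092',
      Topic: 'orders'
    )
    PARSE USING JSONParser ()
    OUTPUT TO KafkaOrders;

    -- In-flight processing: keep only completed orders
    -- (treats the parsed stream as typed for brevity)
    CREATE CQ CompletedOnly
    INSERT INTO CompletedOrders
    SELECT * FROM KafkaOrders
    WHERE status = 'COMPLETED';

    -- Continuous delivery to MySQL over JDBC
    CREATE TARGET OrdersToMySQL USING DatabaseWriter (
      ConnectionURL: 'jdbc:mysql://mysql-host:3306/shop',
      Username: 'striim',
      Password: '********',
      Tables: 'shop.orders'
    )
    INPUT FROM CompletedOrders;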

Striim also includes built-in security, delivery validation, and additional features essential for the scalability and reliability requirements of mission-critical applications. Real-time pipeline monitoring detects patterns or anomalies as the data is moving from Kafka to MySQL. Interactive dashboards provide visibility into the health of the data pipelines and highlight issues with instantaneous alerts – allowing timely corrective action to be taken on the results of comprehensive pattern matching, correlation, outlier detection, and predictive analytics.

For more information about gaining timely intelligence from integrating high volumes of rich messaging data from Kafka to MySQL, please visit our Kafka integration page at: https://www.striim.com/blog/kafka-stream-processing-with-striim/

If you would like a demo of real time data integration from Kafka to MySQL, and to talk to one of our experts, please contact us to schedule a demo.

Data Pipeline to Cloud

Building a streaming data pipeline to cloud services is essential to moving enterprise data in real time between on-premises and cloud environments.

Extending data infrastructure to hybrid and multi-cloud architectures enables businesses to scale easily and leverage a variety of powerful cloud-based services. Data must be a key consideration when migrating applications to the cloud, to ensure that services have access to the data they need, when they need it, and in the format required.

Although adopting a cloud architecture offers significant benefits in terms of savings and flexibility, it also creates challenges in managing data across different locations. Using traditional approaches to data movement introduces latency for applications that demand up-to-the-second information. Batch ETL methods are also constrained by the number of sources and targets that can be supported.

The Striim platform simplifies the building of a streaming data pipeline to cloud, allowing organizations to leverage fully connected hybrid cloud environments across a variety of use cases. Examples include offloading operational workloads, and extending a data center to the cloud, as well as gaining insights from cloud-based analytics.

Taking advantage of Striim’s easy-to-use wizards to build and modify a highly reliable and scalable data pipeline to cloud environments, data can be moved continuously and in real time from heterogeneous on-premises or cloud-based sources – including transactional databases, log files, sensors, Kafka, Hadoop, and NoSQL databases – without slowing down source systems. Using non-intrusive, real-time change data capture (CDC) ensures continuous data synchronization by moving and processing only changed data.

Striim feeds real-time data with full-context via the data pipeline to cloud and other targets, processing and formatting it in-memory. Filtering, transforming, aggregating, enriching, and analyzing data all occurs while the data is in-flight, before delivery of the relevant data sets to multiple endpoints.

Built-in pipeline monitoring via interactive dashboards and real-time alerts allows users to visualize the data flow and the content of data in real time. With up-to-the-second visibility of the data pipeline to cloud infrastructure, users can quickly and easily verify the ingestion, processing, and delivery of their streaming data.

To read more about building a real time data pipeline to cloud using Striim, please go to: https://www.striim.com/use-case/real-time-analytics/

If you would like to see how a data pipeline to cloud is built, please schedule a demo with one of our technologists.

What is Streaming SQL?

Streaming SQL has become essential to real-world, real-time data processing solutions. But before examining what it is and how it works, we need to take a brief look back.

With the continuous and staggering growth of data volumes over the years, and the rising demands for analysis of data, Structured Query Language, or SQL, has become an essential component of data management and business analytics.

Because databases store data before it’s available for querying, however, this data is invariably old by the time it’s queried. Today, many organizations need to analyze data in real time, which requires the data to be streamed. As a result of this shift, there’s a need for a new version of SQL that supports stream processing.

Enter Streaming SQL. Streaming SQL is similar to traditional SQL, but it differs in how it addresses stored versus real-time data. Streaming SQL platforms continuously receive flows of data. It’s this continuous nature of streaming that gives the technology its true value compared with traditional SQL solutions.

Key building blocks of streaming SQL are windows and event tables, which trigger actions whenever the data changes. When a window is updated, aggregate queries recalculate, producing results such as sums over micro-batches.
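
As a brief sketch in Striim’s dialect (window syntax per its SQL-based language; stream, field, and window names are hypothetical), a window collects events over an interval and an aggregate query recomputes as the window turns over:

    -- Collect five minutes of orders at a time, grouped by region
    CREATE JUMPING WINDOW OrdersLast5Min
    OVER OrderStream
    KEEP WITHIN 5 MINUTE ON orderTime
    PARTITION BY region;

    -- Recomputed each time the window turns over: a sum per micro-batch
    CREATE CQ RevenuePerRegion
    INSERT INTO RegionTotals
    SELECT region, SUM(amount) AS revenue, COUNT(*) AS orderCount
    FROM OrdersLast5Min
    GROUP BY region;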

Streaming systems allow organizations to input huge volumes of data—including reference, context, or historical data—into event tables from files, databases, and various other sources. These tools enable users to write SQL-like queries for streaming data without the need to write code.

With Streaming SQL, queries are often highly complex, using case statements and pattern-matching syntax. These solutions make it easy for organizations to ingest, process, and deliver real-time data across a variety of environments—whether they are in the cloud or on-premises.
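
For example, in a sketch with hypothetical reference data and column names, a cache loaded from a file can enrich each event as it flows past, and a CASE expression classifies it in the same query:

    -- Reference data loaded into an in-memory cache keyed by zip code
    CREATE TYPE ZipInfo (zip String KEY, city String, state String);
    CREATE CACHE ZipLookup USING FileReader (
      directory: '/data/reference',
      wildcard: 'zips.csv'
    )
    PARSE USING DSVParser ( header: true )
    QUERY ( keytomap: 'zip' ) OF ZipInfo;

    -- Enrich the stream by joining against the cache; classify with CASE
    CREATE CQ EnrichWithGeo
    INSERT INTO GeoOrders
    SELECT o.orderId, o.amount, z.city, z.state,
           CASE WHEN o.amount > 1000 THEN 'large' ELSE 'standard' END AS tier
    FROM OrderStream o
    JOIN ZipLookup z ON o.zip = z.zip;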

This helps enterprises quickly adopt a modern data architecture, creating streaming data pipelines to public cloud environments such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform, as well as to Kafka, Hadoop, NoSQL, and relational databases.

It’s important to realize that Streaming SQL is not something that should be used to run on all data, such as massive databases with a billion rows. That’s not what it’s designed for. It’s better suited for working on smaller subsets of data, when there is a need to get quick results and immediately identify value in new data that’s being created.

One of the strengths of Streaming SQL comes from its ability to transform, filter, aggregate, and enrich data. It has the ability to combine all these functions together to enable organizations to get maximum value from the data constantly streaming into their systems.

To learn more about the power of streaming SQL, visit Striim Platform Overview product page, schedule a demo with a Striim technologist, or download a free trial of the platform and try it for yourself!

Kafka to HDFS

The real-time integration of messaging data from Kafka to HDFS augments transactional data for richer context. This allows organizations to gain optimal value from their analytics solutions and achieve a deeper understanding of operations – essential to establishing and sustaining competitive advantage.

To truly leverage the high volumes of data residing in Kafka stores, companies need to be able to move it, process it, and deliver it to a variety of on-premises and cloud systems with sub-second latency. It also needs to be integrated with operational data from a wide variety of sources.

Traditional batch-based solutions are not designed for situations where data is time-sensitive – they are simply too slow. To allow organizations to use their data to enhance operations, tailor services, and improve customer experiences, data delivery from Kafka to HDFS systems needs to be scalable and in real time.

Continuously Deliver Data

With Striim, companies can continuously deliver data in real time from Kafka to HDFS, as well as to a wide range of other targets, including cloud environments. Depending on the requirements of the organization, all the Kafka data can be written to a number of different targets simultaneously. In use cases where not all the data is required, data can be matched to specific criteria to deliver a highly relevant subset of data to the target.

Striim can create data flows to deliver the data from Kafka to HDFS in milliseconds, “as-is.” However, depending on how the data is going to be utilized, the user may require the data to be processed, prepared, and delivered in the right format. Striim supports continuous queries to filter, transform, aggregate, enrich, and analyze the data in-flight before delivering it with sub-second latency.
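
A sketch of the delivery leg, assuming a stream (KafkaEvents) already populated by a Kafka source as in the earlier examples; the HDFS writer’s property names are illustrative and abridged:

    -- Filter the Kafka-sourced stream to a relevant subset in-flight
    CREATE CQ HighValueEvents
    INSERT INTO PriorityEvents
    SELECT * FROM KafkaEvents
    WHERE priority = 'HIGH';

    -- Continuously write the subset to HDFS as JSON
    CREATE TARGET EventsToHDFS USING HDFSWriter (
      hadoopurl: 'hdfs://namenode:8020/striim/events/',
      filename: 'priority-events'
    )
    FORMAT USING JSONFormatter ()
    INPUT FROM PriorityEvents;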

Analyze Data In-Flight

By analyzing the data in-flight, Kafka users can capture time-sensitive information as the data is flowing through the data stream. Striim pushes insights and alerts to interactive dashboards highlighting real-time data and the results of pattern matching, correlation, outlier detection, predictive analytics, and further enables drill-down and in-page filtering.

To learn more about integrating and processing Kafka to HDFS in real time, please visit our Kafka integration page.

Our experts can show you how to get maximum value from your analytics solutions using Striim for real-time data integration from Kafka to HDFS. Please contact us to schedule a demo.

Oracle CDC to Postgres

Real-Time Data Movement with Oracle CDC to Postgres

As an open source alternative to commercial databases such as Oracle, Postgres offers a lower total cost of ownership and the ability to store structured and unstructured data. Real-time movement of transactional data using Oracle CDC to Postgres is essential to creating a rich and up-to-date view of operations and improving customer experiences.


IDC projects that by the year 2025, 80% of all data will be unstructured. Emails and social media posts are good examples of unstructured data. The ability to integrate unstructured, semi-structured and structured data from transactional databases into the enterprise is vital for timely and relevant analysis. To get a deep understanding from all the data an organization captures and records and to get the most value from it, it must be in the right place and in the right format – in real time.

Continuous movement of transactional data using Oracle CDC to Postgres ensures the organization is utilizing the real-time information from on-prem transactional databases and other data stores that is needed to make decisions that optimize user experience and drive higher revenue.

Moving data from enterprise databases to Postgres using traditional ETL processes introduces latency. Delays incurred while the data is being migrated or updated result in an out-of-date picture of the business, and limit the extent to which decisions can have any significant impact. Organizations that move all the data as-is also face challenges in storing it and in accessing the data that can produce real value.

How Striim Simplifies Oracle CDC to Postgres

Striim enables organizations to generate real value from the transactional data residing in their existing Oracle databases. Using non-intrusive change data capture (CDC), Striim enables continuous data ingestion from Oracle to Postgres with sub-second latency. Users can easily set up ingestion via Striim’s pre-configured CDC wizards and drag-and-drop UI.

Moving and processing data in-flight, Striim filters out data that is not required and delivers what is important to Postgres – in real time. The data can also be transformed and enriched so it is delivered in the format required. Oracle CDC to Postgres allows organizations to gain access to critical insights sooner and make more informed operational decisions faster.
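
As a sketch of that in-flight filtering step: Striim continuous queries can address raw CDC events through a data[] array and a META() accessor, though the key names, array positions, and stream names shown here are illustrative:

    -- Pass through only inserts, reshaped for the Postgres target
    CREATE CQ FilterForPostgres
    INSERT INTO PostgresReady
    SELECT c.data[0] AS order_id,
           c.data[2] AS amount
    FROM OracleChanges c
    WHERE META(c, 'OperationName') = 'INSERT';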

Once the real-time data pipelines are built and the initial data load using Oracle CDC to Postgres has been performed, continuous updating with every new database transaction ensures that analytics applications have the most up-to-date information. Built-in monitoring continuously compares the source and target, validating database consistency and providing assurance that the replicated environment is completely up-to-date with the on-prem Oracle instance.

For more information on real-time data integration and processing using Striim’s Oracle CDC to Postgres solution, please visit our Change Data Capture page.

To see first-hand how easy it is to move data to Postgres using Striim’s Oracle CDC to Postgres functionality, please schedule a demo with one of our technologists.

Striim Announces Real-Time Data Migration to Google Cloud Spanner

Google Cloud Marketplace

The Striim team has been working closely with Google to deliver an enterprise-grade solution for online data migration to Google Cloud Spanner. We’re happy to announce that it is available in the Google Cloud Marketplace. This PaaS solution facilitates the initial load of data (with exactly-once processing and delivery validation), as well as the ongoing, continuous movement of data to Cloud Spanner.

The real-time data pipelines enabled by Striim from both on-prem and cloud sources are scalable, reliable, and high-performance. Cloud Spanner users can further leverage change data capture to replicate data in transactional databases to Cloud Spanner without impacting the source database or interrupting operations.

Google Cloud Spanner is a cloud-based database system that is ACID compliant, horizontally scalable, and global. Spanner is the database that underlies much of Google’s own data collection, and it has been designed to offer the consistency of a relational database with the scale and performance of a non-relational database.

Migration to Google Cloud Spanner requires a low-latency, low-risk solution to feed mission-critical applications. Striim offers an easy-to-use solution to move data in real time from Oracle, SQL Server, PostgreSQL, MySQL, and HPE NonStop to Cloud Spanner while ensuring zero downtime and zero data loss. Striim is also used for real-time data migration from Kafka, Hadoop, log files, sensors, and NoSQL databases to Cloud Spanner.

While the data is streaming, Striim enables in-flight processing and transformation of the data to maximize usability of the data the instant it lands in Cloud Spanner.
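
A minimal sketch of the delivery leg, assuming Striim’s Cloud Spanner writer adapter; the property names and values are illustrative placeholders, and ProcessedChanges stands for any processed change stream like those sketched earlier:

    -- Deliver a processed stream to a Cloud Spanner table
    -- (property names are illustrative; see Striim's Spanner adapter docs)
    CREATE TARGET ToSpanner USING SpannerWriter (
      ServiceAccountKey: '/etc/striim/gcp-key.json',
      InstanceID: 'striim-demo',
      Tables: 'SHOP.ORDERS,shopdb.orders'
    )
    INPUT FROM ProcessedChanges;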

Learn More

To learn more about Striim’s Real-Time Migration to Google Cloud Spanner, read the related press release or provision Striim’s Real-Time Data Integration to Cloud Spanner in the Google Cloud Marketplace.
