Streaming Data Integration for Hadoop

With Striim’s streaming data integration for Hadoop, you can easily feed your Hadoop and NoSQL solutions continuously with real-time, pre-processed data from enterprise databases, log files, messaging systems, and sensors to support operational intelligence.

Ingest Real-time, Pre-Processed Data for Operational Intelligence

Striim is a software product that continuously moves real-time data from a wide range of sources into Hadoop, Kafka, relational and NoSQL databases — on-prem or in the cloud — with in-line transformation and enrichment capabilities. Brought to you by the core team behind GoldenGate Software, Striim offers a non-intrusive, quick-to-deploy solution for streaming integration so your Hadoop solution can support a broader set of operational use cases.

With the following capabilities, Striim’s streaming data integration for Hadoop enables a smart data architecture that supports use-case-driven analytics in enterprise data lakes:

  • Ingests large volumes of real-time data from databases, log files, message systems, and sensors
  • Collects change data non-intrusively from enterprise databases such as Oracle, SQL Server, MySQL, HPE NonStop, MariaDB, and Amazon RDS
  • Delivers data in milliseconds to Hadoop (HDFS, HBase, Hive, Kudu), Kafka, Cassandra, MongoDB, relational databases, cloud environments, and other targets
  • Supports mission-critical environments with end-to-end security, reliability, HA, and scalability

Benefits

  • Uses low-latency data for operational use cases
  • Accelerates time to insight with a continuous flow of transformed data
  • Ensures scalability, security, and reliability for business-critical solutions
  • Achieves fast time-to-market with a wizard-based UI and SQL-based language

Key Features

  • Enterprise-grade and fast-to-deploy streaming integration for Hadoop
  • Real-time integration of structured and unstructured data
  • In-flight filtering, aggregation, transformation, and enrichment
  • Continuous ingestion and processing, at scale
  • Integration with existing technologies and open source solutions

Striim enables businesses to get the maximum value from high-velocity, high-volume data by delivering it to Hadoop environments in real-time and in the right format for operational use cases.

Real-time, Low-impact Change Data Capture

Striim ingests real-time data from transactional databases, log files, message queues, and sensors. For enterprise databases, including Oracle, Microsoft SQL Server, MySQL, and HPE NonStop, Striim offers a non-intrusive change data capture (CDC) feature to ensure real-time data integration has minimal impact on source systems and optimizes the network utilization by moving only the change data.
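To illustrate why moving only the change data is efficient, here is a minimal Python sketch (not Striim's actual API — the event shape and helper names are hypothetical) of how a stream of CDC events from a database log can be replayed against a target keyed store:

```python
# Hypothetical sketch: replaying CDC events (inserts, updates, deletes)
# against a target keyed by primary key, so only changed rows move.

def apply_change(target, event):
    """Apply a single CDC event to a dict keyed by primary key."""
    op, key, row = event["op"], event["key"], event.get("row")
    if op == "insert":
        target[key] = row
    elif op == "update":
        target[key] = {**target.get(key, {}), **row}  # merge only changed columns
    elif op == "delete":
        target.pop(key, None)
    return target

# A small change stream, as it might be read from a transaction log:
events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada", "city": "London"}},
    {"op": "update", "key": 1, "row": {"city": "Paris"}},
    {"op": "insert", "key": 2, "row": {"name": "Bo", "city": "Oslo"}},
    {"op": "delete", "key": 2},
]
state = {}
for e in events:
    apply_change(state, e)
# state now reflects only the surviving, up-to-date rows
```

Because only the deltas travel over the network, the source database does no extra query work and bandwidth scales with the change rate, not the table size.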

In-Flight Data Processing

As data volumes continue to grow, the ability to filter and aggregate data before analytics becomes a key way to manage limited storage resources. Striim filters and aggregates data in flight, before delivering it to Hadoop, to reduce the data storage footprint. By performing in-line transformation (such as denormalization) and enrichment with static or dynamically changing data in memory, Striim feeds large data volumes in the right format without introducing latency.
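The filter-then-enrich pattern described above can be sketched in a few lines of Python (an illustrative analogue, not Striim's engine — the cache contents and field names are invented):

```python
# Illustrative sketch of in-flight processing: drop irrelevant events,
# then enrich the survivors from an in-memory reference cache before
# delivery, so only relevant, fully-contextualized records reach storage.

PRODUCT_CACHE = {"p1": "Widget", "p2": "Gadget"}  # static reference data in memory

def process(stream):
    for event in stream:
        if event["amount"] < 100:  # filter: discard low-value events early
            continue
        # enrich: join against the cache without a round trip to a database
        event["product_name"] = PRODUCT_CACHE.get(event["product_id"], "unknown")
        yield event

raw = [
    {"product_id": "p1", "amount": 250},
    {"product_id": "p2", "amount": 40},  # filtered out before storage
]
enriched = list(process(raw))
```

The enrichment lookup happens in memory, which is why it adds negligible latency compared with joining against the reference table after landing the data.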

Enterprise-grade Solution

Striim is designed to meet the needs of mission-critical environments with end-to-end security, reliability (including out-of-the-box exactly-once processing), high performance, and scalability. Users can focus on the application logic, knowing that from ingestion to alerting and delivery, the platform is robust enough to support the business as required.

Fast Time to Market

An intuitive development experience with a drag-and-drop UI, along with prebuilt data flows for multiple Hadoop targets from popular sources, allows fast deployment. Striim uses a SQL-based language that requires no special skills to develop or modify streaming applications.

Operationalizing Machine Learning

Striim can pre-process and extract features suitable for machine learning before continually delivering training files to Hadoop. Once data scientists build their models using Hadoop technologies, these can be brought into Striim, using the new open processor component, so real-time insights can guide operational decision making and truly transform the business. Striim can also monitor model fitness and trigger retraining of models for full automation.
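The scoring-and-fitness-monitoring loop can be sketched as follows (a hypothetical illustration, not Striim's open processor API — the model, threshold, and field names are all invented for the example):

```python
# Hypothetical sketch: score each streaming event with a model brought
# into the pipeline, and monitor model fitness so a drop in accuracy
# can trigger retraining automatically.

def score(event):
    # stand-in for a trained model deployed into the stream
    return 1 if event["value"] > 50 else 0

def monitor(events, labels, drift_threshold=0.7):
    """Return (accuracy, needs_retrain) over a batch of labeled events."""
    correct = sum(score(e) == y for e, y in zip(events, labels))
    accuracy = correct / len(events)
    return accuracy, accuracy < drift_threshold

events = [{"value": 80}, {"value": 20}, {"value": 60}]
labels = [1, 0, 0]  # ground truth arriving after the fact
accuracy, needs_retrain = monitor(events, labels)
```

In a real deployment the labels would arrive later than the predictions, so fitness is typically evaluated over a sliding window rather than a fixed batch.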

Differences from ETL

Compared to traditional ETL offerings that use bulk data extracts, Striim enables continuous ingestion of structured, semi-structured, and unstructured data in real time, delivering a granular data flow for richer analytics. By performing in-memory transformations on data-in-motion using SQL-based continuous queries, Striim avoids adding latency and enables real-time delivery. While ETL solutions are optimized for database sources and targets, Striim provides native integration and optimized delivery for Hadoop, Kafka, databases, and files, on-prem or in the cloud. Striim also offers stream analytics and data visualization capabilities within the same platform, without requiring additional licenses.

To learn more about streaming data integration for Hadoop, visit our Hadoop and NoSQL Integration solution page, schedule a demo with a Striim expert, or download the Striim platform to get started!

Move Real-Time Data to Cloudera Using the Striim Platform

In this blog post, we’re going to take a look at how you can use the Striim platform to move real-time data to Cloudera from a variety of sources.

The Striim platform provides an enterprise-grade streaming integration solution for moving real-time change data from a wide variety of sources to Cloudera distributions of Apache Kafka, Apache Kudu, and Apache Hadoop, without impacting source systems. With support for hybrid IT infrastructures, Striim complements Cloudera solutions by enabling organizations to use the full breadth and depth of their data in real time in order to gain a complete and up-to-date view into their operations.

Benefits

  • Ingest real-time data into CDK (Kafka), Kudu, Hadoop with low impact
  • Continuously collect data from databases, logs, messaging, sensors, and more
  • Process data in-flight without extensive coding
  • Get immediate insights and alerts
  • Use low-latency data in Cloudera for operational decision making

Why Striim?

  • Real-time data integration from a wide variety of data sources
  • Designed for high-volume, high-velocity data
  • Non-intrusive CDC from databases with event guarantees
  • Built-in security, scalability, and reliability
  • In-flight enrichment via built-in cache
  • Quick to deploy via SQL-like queries and wizard-based UI

Non-intrusive, Real-time Data Ingestion

The Striim platform continuously ingests real-time data from a variety of sources out-of-the-box – including databases, cloud applications, files, message queues, and devices – on-premises or in the cloud. For enterprise databases such as Oracle, SQL Server, MySQL, HPE NonStop, and MariaDB, the platform offers non-intrusive change data capture (CDC) to minimize the impact on source systems. Striim supports major data formats, including JSON, XML, Avro, delimited, binary, free text, and change records.

With a drag-and-drop UI and wizards, Striim simplifies creating data flows from popular sources to move data to Cloudera solutions including CDK (Kafka), Hadoop, HBase, Hive, and Kudu. The data can be delivered “as-is,” or be put through a series of in-flight transformations and enrichments. By using real-time, pre-processed data – especially in Kudu, Impala, and Kafka – customers can rapidly gain timely, operational intelligence from their Cloudera applications.

Delivery to Cloudera, On-premises or Cloud

The Striim platform can continuously apply pre-processed, streaming data to Cloudera solutions with sub-second latency. With parallelization capabilities, Striim offers optimized loading to Cloudera solutions. Striim can also deliver real-time data to other targets such as databases and files.

Built-in Stream Processing and Monitoring

Through SQL-based continuous queries, the Striim platform filters, aggregates, transforms, joins, and enriches multiple streams of real-time data in-memory to rapidly prepare the data for different downstream users before delivering to Cloudera environments. 
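The idea behind a windowed continuous query can be shown with a plain-Python analogue (this is an illustrative sketch of the concept, not Striim's query language or runtime):

```python
# Illustrative analogue of a SQL-style continuous query: group events in a
# tumbling window and emit one aggregate per key per window, so the data
# is pre-summarized before it is delivered downstream.

from collections import defaultdict

def tumbling_window_sum(events, window_size):
    """Emit {key: total} aggregates for each fixed-size batch of events."""
    for start in range(0, len(events), window_size):
        totals = defaultdict(float)
        for key, value in events[start:start + window_size]:
            totals[key] += value
        yield dict(totals)  # one aggregate result per window

events = [("api", 1.0), ("db", 2.0), ("api", 3.0), ("db", 4.0)]
windows = list(tumbling_window_sum(events, window_size=2))
```

A real continuous query would window by event time rather than by count, and would keep running as new events arrive instead of iterating over a finished list.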

Striim also comes with built-in validation and monitoring capabilities. The platform enables users to continuously monitor the health of the data pipelines via real-time dashboards and alerts.

Enterprise-grade Modern Streaming Integration

Striim is designed from the ground up to support high-volume, high-velocity data, with built-in validation, security, high availability, reliability, and scalability to support mission-critical applications.

Unlike traditional ETL solutions, Striim continuously ingests granular and larger data sets for richer analytics. It does so without impacting source systems, and processes the data in-memory, while it is streaming, to enable sub-second latency. Striim also differs from traditional logical replication tools with its optimized support for a wide range of data types, data sources, and targets, and its out-of-the-box comprehensive stream processing capabilities.

To learn more about how you can utilize the Striim platform to move data to Cloudera, please reach out to schedule a demo with a Striim expert or download the platform and try it for yourself.

Moving Real-Time Data to the Google Cloud Platform

Edited Transcript:

So, the Google Cloud Platform is important to your business. Why, then, are real-time data movement, change data capture, and stream processing necessary parts of this process? You've already decided that you want to adopt the Google Cloud Platform. This could be Google BigQuery and Pub/Sub, Cloud SQL, Dataproc, or any number of other technologies. You may want to migrate existing applications to the cloud, scale elastically as necessary, or centralize analytics and machine learning. But running applications in the cloud as VMs or containers is only part of the problem. You also need to consider how to move data to the cloud, ensure your applications or analytics are always up to date, and make sure the data is in the right format to be valuable. The most important starting point is ensuring you can stream data to the cloud in real time. Batch data movement can cause unpredictable load on the cloud targets and has high latency.

Speaker 2: 00:59 That means the data is often hours old. For modern applications, up-to-the-second information is essential: for example, to provide current customer information, accurate business reporting, or real-time decision making. Streaming data from on-premises to the Google Cloud Platform requires making use of appropriate data collection technologies. Change data capture, or CDC, works by continuously and non-intrusively intercepting database activity and collecting all the inserts, updates, and deletes as events, as they happen. Machine data requires file tailing, which reads at the end of one or many files, potentially across multiple machines, and streams the latest records as they are written. Other sources like IoT data or third-party SaaS applications also require specific treatment. Once your data can be streamed in real time, that is, once it has become streaming data, the next consideration is what processing is necessary to make the data valuable for your specific Google Cloud destination.

Speaker 2: 01:57 And this depends on the use case. For database migration and elastic scalability use cases, where the target schema may be similar to the source, moving raw change data from on-premises databases to Google Cloud SQL may be sufficient. However, for real-time applications sourcing from Google Pub/Sub, or analytics use cases built on Google BigQuery or Dataproc, it may be necessary to perform stream processing before the data is delivered to the cloud. This processing can transform the data structure and enrich it with additional context information while the data is in flight, adding value to the data and optimizing the downstream analytics. Striim is a streaming integration platform that can continuously collect data from on-premises or private cloud databases and deliver it to a variety of Google Cloud endpoints. Striim can take care of initial loads as well as CDC for the continuous application of change data. These data flows can be created rapidly, and monitored and validated continuously, through an intuitive UI, ensuring your cloud migration, scaling, and analytics can be built and iterated at the speed of your business, and that your data is always where it is wanted, when it is wanted.

What Is Streaming Data Integration?

Streaming data integration is a fundamental component of any modern data architecture. Increasingly, companies need to make data-driven decisions – regardless of where data resides, when it matters most – immediately. Streaming data integration is one of the first steps in being able to leverage the next-generation infrastructures such as Cloud, Big Data, real-time applications, and IoT that underlie these decisions.

In this post, we’re going to take a look at how the Striim platform was built from the ground up for streaming data integration, and how organizations are benefitting from it. Striim enables businesses to move to Cloud, easily build real-time applications, and get more value from Hadoop solutions.

Striim is patented, enterprise-grade software for streaming data integration, which offers continuous data collection, stream processing, pipeline monitoring, and real-time delivery with verification across heterogeneous systems. Striim provides up-to-date data in a consumable form in Kafka, Hadoop, and databases — on-prem or in the Cloud — to support operational intelligence and other high-value workloads.

Core Platform Capabilities

  • Continuous, Structured, and Unstructured Data Collection: Striim captures real-time data from a wide variety of sources including databases (using low-impact change data capture), cloud applications, log files, IoT devices, and message queues.
  • SQL-based Stream Processing: Striim applies filtering, transformations, aggregations, masking, and enrichment using static or streaming reference data.
  • Pipeline Monitoring and Alerting: Striim allows users to visualize the data flow and the content of data in real time, and offers delivery validation.
  • Real-Time Delivery: Striim distributes real-time data in a consumable form to all major targets including Cloud environments, Kafka and other messaging systems, Hadoop, relational and NoSQL databases, and flat files.

Key Platform Differentiators

  • Streaming data integration with intelligence via an in-memory platform
  • Real-time data movement across on-prem and cloud environments
  • Low-impact CDC for Oracle, SQL Server, HPE NonStop, and MySQL
  • In-flight filtering, aggregation, transformation, and enrichment using SQL
  • Quick-to-deploy and easy-to-integrate via drag-and-drop UI
  • Continuous data pipeline monitoring and built-in delivery validation
  • Integration with existing technologies and open source solutions

Common Use Cases

Here are just a few of the most common ways Striim customers leverage its patented software to solve critical enterprise challenges:

Hybrid Cloud Integration

Striim eases cloud adoption by continuously moving real-time data from on-premises and cloud sources to Microsoft Azure, Amazon Web Services (AWS), and Google Cloud Platform environments. Many Striim customers use pre-built data pipelines to feed their cloud solutions from their on-premises databases, files, messaging systems, and sensors to enable operational workloads in the cloud. By filtering, aggregating, transforming, and enriching the data-in-motion before delivering to the cloud, Striim delivers real-time data in consumable form and helps to optimize cloud storage. Available on-premises or in the cloud, Striim enables businesses to get up and running in a matter of minutes.

Data Integration for Real-Time Applications

Striim enables real-time applications on event-based messaging systems such as Kafka, fast analytics storage solutions such as Kudu, and NoSQL databases such as Cassandra by continuously feeding pre-processed data in real time. Striim offers a wizard-based UI and SQL-based language for easy and fast development. Also, when needed Striim performs SQL-based streaming analytics and visualizes the streaming data, before delivering the data to the target to provide real-time operational intelligence.

Real-Time Integration and Pre-Processing for Hadoop

Striim enables a modern, smart data architecture for data lakes by non-intrusively and continuously collecting real-time data from databases, logs, messaging systems, and sensors, and pre-processing the data-in-motion for operational reporting and analytics. To accelerate insights and optimize storage, Striim filters, masks, aggregates, transforms, and enriches the data before delivering with sub-second latency to HDFS, HBase, and Hive. Striim can also pre-process and extract features suitable for machine learning before continually delivering training files to Hadoop. Models built using Hadoop technologies can be brought into Striim, so real-time insights can guide operational decision making and truly transform the business. Striim can also monitor model fitness and trigger retraining of models for full automation.
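The feature-extraction step described above can be sketched as follows (a hedged illustration, not Striim's API — the event fields and feature names are invented for the example):

```python
# Illustrative sketch: flatten raw events into simple feature rows suitable
# as machine-learning training records, then serialize them as delimited
# text that a training job on Hadoop could consume.

import csv
import io

def to_training_rows(events):
    """Extract simple features per event: hour of day and an amount bucket."""
    for e in events:
        hour = int(e["ts"].split("T")[1][:2])      # hour from an ISO timestamp
        bucket = "high" if e["amount"] >= 100 else "low"
        yield {"hour": hour, "bucket": bucket, "label": e["label"]}

def write_csv(rows):
    buf = io.StringIO()
    w = csv.DictWriter(buf, fieldnames=["hour", "bucket", "label"])
    w.writeheader()
    w.writerows(rows)
    return buf.getvalue()

events = [
    {"ts": "2021-05-01T09:15:00", "amount": 250, "label": 1},
    {"ts": "2021-05-01T23:05:00", "amount": 40, "label": 0},
]
rows = list(to_training_rows(events))
csv_text = write_csv(rows)
```

Doing this extraction in flight means the training files landing in HDFS are already in model-ready form, rather than raw events that each training run must re-parse.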

To learn more about our streaming data integration capabilities, please visit our Real-time Data Integration solution page, schedule a demo with a Striim expert, or download the Striim platform to get started!

Real-Time Data Warehousing with Azure SQL Data Warehouse and Striim

[This post was originally published by Ellis Butterfield, Program Manager for Azure SQL Data Warehouse, on the Microsoft Azure blog. For more information about Azure SQL Data Warehouse, please visit https://azure.microsoft.com/en-us/services/sql-data-warehouse/.]

Gaining insights rapidly from data is critical to competitiveness in today’s business world. Azure SQL Data Warehouse (SQL DW), Microsoft’s fully managed analytics platform, leverages Massively Parallel Processing (MPP) to run complex interactive SQL queries at every level of scale.

Users today expect data within minutes, a departure from traditional analytics systems which used to operate on data latency of a single day or more. With the requirement for faster data, users need ways of moving data from source systems into their analytical stores in a simple, quick, and transparent fashion. In order to deliver on modern analytics strategies, it is necessary that users are acting on current information. This means that users must enable the continuous movement of enterprise data, from on-premises to cloud and everything in between.

SQL Data Warehouse is happy to announce that Striim now fully supports SQL Data Warehouse as a target for Striim for Azure. Striim enables continuous, non-intrusive, performant ingestion of all your enterprise data from a variety of sources in real time. This means that users can use intelligent pipelines for change data capture from sources such as Oracle Exadata straight into SQL Data Warehouse. Striim can also be used to move fast-moving data landing in your data lake into SQL Data Warehouse with advanced functionality such as on-the-fly transformation and model-based scoring with Azure Databricks.

“Enterprises adopting cloud-based analytics need to ensure reliable, real-time and continuous data delivery from on-prem and cloud-based data sources to reduce decision latencies inherent in batch based analytics. Striim’s solution for SQL Data Warehouse is offered in the Azure marketplace, and can help our customers quickly ingest, transform, and mask real time data from transactional systems or Kafka into SQL Data Warehouse to support both operational and analytics workloads.”

– Alok Pareek, Founder and EVP of Products for Striim

Via in-line transformations, including denormalization, before delivering to Azure SQL Data Warehouse, Striim reduces on-premises ETL workload as well as data latency. Striim enables fast data loading to Azure SQL DW through optimized interfaces such as streaming (JDBC) or batching (PolyBase). Azure customers can store the data in the right format, and provide full context for any downstream operations, such as reporting and analytical applications.
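The denormalization mentioned above amounts to joining fact events with dimension lookups in memory before loading. A minimal sketch (hypothetical table and column names, not Striim's actual interfaces):

```python
# Illustrative sketch of in-line denormalization before warehouse loading:
# each incoming fact row is widened with its dimension attributes in memory,
# so the target receives query-ready rows with full context.

CUSTOMERS = {7: {"name": "Acme", "region": "EMEA"}}  # dimension data cached in memory

def denormalize(order):
    """Widen an order event with its customer's dimension attributes."""
    dim = CUSTOMERS.get(order["customer_id"], {})
    return {**order, **dim}  # order columns + customer columns in one row

row = denormalize({"order_id": 1, "customer_id": 7, "total": 99.5})
```

Because downstream reports receive the wide row directly, the warehouse does not need a separate ETL join step, which is the on-premises workload and latency the paragraph above refers to.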

Next steps

To learn more about how you can build a modern data warehouse using Azure SQL Data Warehouse and Striim, watch this video, schedule a demo with a Striim technologist, or get started now on the Azure Marketplace.

Learn more about SQL DW and stay up-to-date with the latest news by following @AzureSQLDW on Twitter.