Streaming Data Integration to AWS

As businesses adopt Amazon Web Services, streaming data integration to AWS – with change data capture (CDC) and stream processing – becomes a necessary part of the solution.

You’ve already decided that you want to enable integration to AWS. This could be to Amazon RDS or Aurora, Amazon Redshift, Amazon S3, Amazon Kinesis, Amazon EMR, or any number of other technologies.

You may want to migrate existing applications to AWS, scale elastically as necessary, or use the cloud for analytics or machine learning, but running applications in AWS, as VMs or containers, is only part of the problem. You also need to consider how you move data to the cloud, ensure your applications or analytics are always up to date, and make sure the data is in the right format to be valuable.

The most important starting point is ensuring you can stream data to the cloud in real time. Batch data movement can cause unpredictable load on cloud targets, and has a high latency, meaning your data is often hours old. For modern applications, having up-to-the-second information is essential, for example to provide current customer information, accurate business reporting, or for real-time decision making.

Moving Data to Amazon Web Services in Real-Time

Streaming data integration to AWS from on-premise systems requires making use of appropriate data collection technologies. For databases, this is change data capture, or CDC, which directly and continuously intercepts database activity and collects all the inserts, updates, and deletes as events, as they happen. Log data requires file tailing, which reads from the end of one or more files across potentially multiple machines and streams the latest records as they are written. Other sources, like IoT data or third-party SaaS applications, also require specific treatment to ensure data can be streamed in real time.
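As a rough sketch of the file-tailing idea, the following Python generator seeks to the end of a log file and yields only newly appended lines. The function name and polling approach are illustrative; production collectors also handle file rotation, multiple files, and checkpointing:

```python
import time

def tail(path, poll_interval=1.0):
    """Yield lines appended to a file after we start watching it."""
    with open(path, "r") as f:
        f.seek(0, 2)  # jump to the end of the file; only new records are streamed
        while True:
            line = f.readline()
            if line:
                yield line.rstrip("\n")
            else:
                time.sleep(poll_interval)  # no new data yet; wait and retry
```

Each yielded line becomes an event on the stream, so records written by the application are available downstream within one polling interval of being written.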

Once you have streaming data, the next consideration is what processing is necessary to make the data valuable for your specific AWS destination, and this depends on the use-case.

Use Cases

For database migration or elastic scalability use-cases, where the target schema is similar to the source, moving raw data from on-premise databases to Amazon RDS or Aurora may be sufficient. The important consideration here is that the source applications typically cannot be stopped, and it takes time to do an initial load. This is why collecting and delivering database change, during and after the initial load, is essential for zero downtime migrations.

For real-time applications sourcing from Amazon Kinesis, or analytics use-cases built on Amazon Redshift or Amazon EMR, it may be necessary to perform stream processing before the data is delivered to the cloud. This processing can transform the data structure, and enrich it with additional context information, while the data is in-flight, adding value to the data and optimizing downstream analytics.
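As an illustration of this kind of in-flight processing, the sketch below restructures a hypothetical raw change event and enriches it with reference data before delivery. The event fields and lookup table are invented for the example:

```python
# Hypothetical reference data used to enrich raw order events with customer context
CUSTOMERS = {101: {"name": "Acme Corp", "region": "EMEA"}}

def enrich(event, reference=CUSTOMERS):
    """Transform a raw change event and join in reference data while in flight."""
    customer = reference.get(event["customer_id"], {})
    return {
        "order_id": event["order_id"],
        "amount_usd": round(event["amount_cents"] / 100, 2),  # restructure the value
        "customer_name": customer.get("name"),                # enrich with context
        "region": customer.get("region"),
    }
```

Because the enrichment happens before delivery, the target (Redshift, EMR, or a Kinesis consumer) receives analytics-ready records rather than raw change data.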

Striim’s Streaming Integration to AWS

Striim’s streaming integration to AWS can continuously collect data from on-premise or other cloud databases and deliver it to all of your Amazon Web Services endpoints. Striim can take care of initial loads, as well as CDC for the continuous application of change, and these data flows can be created rapidly, and monitored and validated continuously, through our intuitive UI.

With Striim, your cloud migrations, scaling, and analytics can be built and iterated-on at the speed of your business, ensuring your data is always where you want it, when you want it.

To learn more about streaming integration to AWS with Striim, visit our “Striim for Amazon Web Services” product page, schedule a demo with a Striim technologist, or download a free trial of the platform.

Real-Time AWS Cloud Migration Monitoring: 3-Minute Demo

AWS cloud migration requires more than just being able to run in VMs or cloud containers. Applications rely on data, and that data needs to be migrated as well.

In most cases, the original applications are essential to the business, and cannot be stopped during this process. Since it takes time to migrate the data, and time to verify the application after migration, it is essential that data changes are collected, and delivered during and after that initial load.

As the data is so crucial to the business, and change data will be continually applied for a long time, mechanisms that verify that the data is delivered correctly are an important aspect of any AWS cloud migration.

Migration Monitoring Demo

In this Migration Monitoring Demo we will show how, by collecting change data from source and target and matching transactions applied to each in real time, you can ensure your cloud database is completely synchronized with on-premise, and detect any data divergence when migrating from an on-premise database.

AWS Cloud Migration Monitoring with Striim

Key Challenges

The key challenges with monitoring AWS cloud migration include:

  • Enabling data migration without a production outage, with monitoring during and after the migration.
  • Detecting out-of-sync data should any divergence occur, with detection happening immediately at the time of divergence to prevent further data corruption.
  • Running the monitoring solution non-intrusively, with low overhead, while obtaining sufficient information to enable fast resynchronization.

In our scenario, we are monitoring the migration of an on-premise application to AWS. A Striim dashboard shows real-time status, complete with alerts, and is powered by a continuously running data pipeline. The on-premise application uses an Oracle Database and cannot be stopped. The database transactions are continually replicated to an Amazon Aurora MySQL Database. The underlying migration solution could be either Striim’s Migration Solution or other solutions such as AWS DMS.  

The objective is to monitor ongoing migration of transactions and alert when any transactions go out-of-sync, indicating any potential data discrepancy. This is achieved in the Striim platform through its continuous query processing layer. Transactions are continuously collected from the source and target databases in real-time and matched within a time window. If matching transactions do not occur within a period of time, they are considered long-running. If no match occurs in an additional time period, the transaction is considered missing. Alerts are generated in both cases.
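The matching logic described above can be sketched roughly as follows, with hypothetical transaction events and illustrative thresholds. Striim expresses this as continuous queries; this Python version shows only the core classification idea:

```python
def classify_transactions(source, target, now, long_running_after=60, missing_after=120):
    """Classify source transactions by whether a matching target transaction
    arrived within the configured windows. Events are dicts with a transaction
    id and a commit timestamp in seconds; the thresholds are illustrative.
    """
    target_ids = {t["txid"] for t in target}
    result = {"synced": [], "long_running": [], "missing": []}
    for s in source:
        age = now - s["ts"]
        if s["txid"] in target_ids:
            result["synced"].append(s["txid"])
        elif age >= missing_after:
            result["missing"].append(s["txid"])       # alert: likely divergence
        elif age >= long_running_after:
            result["long_running"].append(s["txid"])  # alert: not yet applied
    return result
```

In a continuously running pipeline, this classification would be re-evaluated as each new source or target transaction streams in, so a divergence is flagged within the configured window rather than discovered later.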

Results

The number of alerts for missing transactions and long-running transactions are displayed in the dashboard. Transaction rates and operation activity are also available in the dashboard and can be displayed for all tables, or for critical tables and users.

You can immediately see live updates and alerts when transactions do not get propagated to the target within a user-configured window, with long-running transactions that eventually make it to the target also tracked.

The dashboard is user-customizable, making it easy to add additional visualizations for specific monitoring as necessary.

You have seen how Striim can be used for continuous monitoring of your on-premise to AWS cloud migration. For more information, visit our AWS solution page, schedule a demo with a Striim technologist, or get started immediately using a download from our website, or via the AWS marketplace.


Real-Time Cloud Migration Monitoring with Striim

In this cloud migration monitoring demo, we will show how, by collecting change data from source and target and matching transactions applied to each in real time, you can ensure your cloud database is completely synchronized with on-premise, and detect any data divergence when migrating from an on-premise database.


To learn more about the Striim platform, visit our platform overview page.


Transcript:

Migrating applications to AWS requires more than just being able to run in VMs or cloud containers. Applications rely on data, and that data needs to be migrated as well. In most cases, the original applications are essential to the business and cannot be stopped during this process. Since it takes time to migrate the data, and time to verify the application after migration, it is essential that data changes are collected and delivered during and after that initial load. As the data is so crucial to the business, and change data will be continually applied for a long time, mechanisms that verify the data is delivered correctly are an important aspect of any cloud migration. This migration monitoring demo will show how, by collecting change data from source and target and matching transactions applied to each in real time, you can ensure your cloud database is completely synchronized with on-premise and detect any data divergence when migrating from an on-premise database.

The key challenges with monitoring cloud database migrations include enabling data migration without a production outage, with monitoring during and after migration; detecting out-of-sync data should any divergence occur, with this detection happening immediately at the time of divergence, preventing further data corruption; and running the monitoring solution non-intrusively with low overhead, while obtaining sufficient information to enable fast resynchronization. In our scenario, we’re monitoring the migration of an on-premise application to AWS. A Striim dashboard shows real-time status, complete with alerts, and is powered by a continuously running data pipeline. The on-premise application uses an Oracle database and cannot be stopped. The database transactions are continually replicated to an Amazon Aurora MySQL database. The underlying migration solution could either be Striim’s migration solution or other solutions such as AWS DMS. The objective is to monitor ongoing migration of transactions and alert when any transactions go out of sync, indicating any potential data discrepancy.

This is achieved in the Striim platform through its continuous query processing layer. Transactions are continuously collected from the source and target databases in real time and matched within a time window. If matching transactions do not occur within a period of time, they’re considered long-running. If no match occurs in an additional time period, the transaction is considered missing. Alerts are generated in both cases. The number of alerts for missing transactions and long-running transactions are displayed in the dashboard. Transaction rates and operation activity are also available in the dashboard and can be displayed for all tables, or just for critical tables and users. You can immediately see live updates and alerts when transactions do not get propagated to the target within a user-configured window, with long-running transactions that eventually make it to the target also tracked. The dashboard is user-customizable, making it easy to add additional visualizations for specific monitoring as necessary. You’ve seen how Striim can be used for continuous monitoring of your on-premise to cloud migrations. Talk to us today about this solution and get started immediately using a download from our website, or test out Striim in the AWS marketplace.

Rapid Adoption of AWS Using Streaming Data Integration with CDC

In this video, Striim Founder and CTO, Steve Wilkes, talks about moving data to Amazon Web Services in real-time and explains why streaming data integration to AWS – with change data capture (CDC) and stream processing – is a necessary part of the solution.

To learn how Striim can help you continuously move real-time data into AWS, visit our Striim for AWS page.


Transcript:

Adopting Amazon Web Services is important to your business, and real-time data movement through streaming integration, change data capture, and stream processing are necessary parts of this process. You’ve already decided that you want to adopt Amazon Web Services, whether that’s Amazon RDS or Aurora, Amazon Redshift, Amazon S3, Amazon Kinesis, Amazon EMR, or any number of other technologies. You may want to migrate existing applications to AWS, scale elastically as necessary, or use the cloud for analytics or machine learning, but running applications in AWS as VMs or containers is only part of the problem. You also need to consider how to move data to the cloud, ensure your applications and analytics are always up to date, and make sure the data is in the right format to be valuable. The most important starting point is ensuring you can stream data to the cloud in real time. Batch data movement can cause unpredictable load on cloud targets and has a high latency, meaning the data is often hours old. For modern applications, having up-to-the-second information is essential, for example to provide current customer information, accurate business reporting, or real-time decision making.

Streaming data from on-premise to Amazon Web Services requires making use of appropriate data collection technologies. For databases, this is change data capture, or CDC, which directly and continuously intercepts database activity and collects all the inserts, updates, and deletes as events, as they happen. Log data requires file tailing, which reads from the end of one or more files across potentially multiple machines and streams the latest records as they are written. Other sources like IoT data or third-party SaaS applications also require specific treatment in order to ensure data can be streamed in real time. Once you have streaming data, the next consideration is what processing is necessary to make that data valuable for your specific AWS destination, and this depends on the use case. For database migration or elastic scalability use cases, where the target schema is similar to the source, moving raw data from on-premise databases to Amazon RDS or Aurora may be sufficient. The important consideration here is that the source applications typically cannot be stopped, and it takes time to do an initial load. This is why collecting and delivering database change during and after the initial load is essential for zero-downtime migrations.

For real-time applications sourcing from Amazon Kinesis, or analytics use cases built on Amazon Redshift or Amazon EMR, it may be necessary to perform stream processing before the data is delivered to the cloud. This processing can transform the data structure and enrich it with additional context information while the data is in flight, adding value to the data and optimizing downstream analytics. Striim’s streaming integration platform can continuously collect data from on-premise or other cloud sources and deliver it to all of your Amazon Web Services endpoints. Striim can take care of initial loads as well as CDC for the continuous application of change, and these data flows can be created rapidly, and monitored and validated continuously, through our intuitive UI. With Striim, your cloud migration, scaling, and analytics can be built and iterated on at the speed of your business, ensuring your data is always where you want it, when you want it.


Real-Time Data Visualization and Data Exploration

When business operations run at lightning speed, generating large data volumes and operational complexity, real-time data visualization and data exploration become increasingly critical to managing daily operations. Striim enables businesses to access, analyze, visualize, and explore live operational data to understand their “Now,” and take control of business operations.

Real-Time, Comprehensive Insight Made Easy

By combining real-time data integration, streaming analytics, and rich data visualization in a single, enterprise-grade platform, Striim allows businesses to respond to business trends and emerging issues proactively and with full context. With Striim, users not only have up-to-the-second visibility into all corners of the business with advanced custom metrics, but also the flexibility to explore streaming data without needing to write code.

Create Sophisticated Metrics Easily

Unlike packaged solutions with fixed and generic metrics, Striim’s software platform gives businesses the flexibility to gain fast and deep insight using business-specific metrics. By ingesting, filtering, aggregating, transforming, enriching, and analyzing real-time data from virtually any source, it enables custom metrics using all relevant data and the ability to slice and dice the metrics across a wide range of dimensions for fast insight. A comprehensive set of built-in SQL operations and functions – such as Math, Statistics, Date, Spatial, String – along with customizable, jumping and sliding time windows provide the granular and precise metric definitions that deliver accurate performance assessment.
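As a simple illustration of the sliding time window mentioned above, the following sketch maintains an average over the last N seconds. This is a hand-rolled example for clarity; in Striim such a metric would be expressed in its SQL-based language rather than in Python:

```python
from collections import deque

class SlidingWindowAvg:
    """Average of values observed within the last `size` seconds (a sliding window)."""

    def __init__(self, size):
        self.size = size
        self.events = deque()  # (timestamp, value) pairs, oldest first

    def add(self, ts, value):
        self.events.append((ts, value))
        self._evict(ts)

    def value(self, now):
        self._evict(now)
        if not self.events:
            return None
        return sum(v for _, v in self.events) / len(self.events)

    def _evict(self, now):
        # Drop events that have fallen out of the window
        while self.events and self.events[0][0] <= now - self.size:
            self.events.popleft()
```

A jumping (tumbling) window works the same way except that the whole window is emitted and cleared at fixed intervals instead of sliding continuously.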

Gain Real-Time and Flexible Visibility into Operations

By combining streaming integration and analytics capabilities with in-memory processing, Striim updates all metrics in real time as new data streams in from various sources, and stores historical data within the built-in results store for time-based comparisons.

Via the dashboards, users can compare live data to historical averages or to a specific date and time in the past, without having to write code. Real-time, interactive dashboards allow business users to view live data with detailed field and time-based filtering at the page or chart level. In addition, users can search streaming data directly on the dashboard and drill down to detail pages.


Key Platform Features for Real-Time Data Visualization and Data Exploration

Striim offers an end-to-end, enterprise-grade platform to deliver instant insights from high-volume, high-velocity data. Some of the key features for real-time data visualization and data exploration are as follows:

  • Real-time data ingestion from diverse sources: Ingests, processes, and enriches unstructured, semi-structured, and structured data from databases, log files, message queues, and sensors
  • Multi-source stream processing and analytics: Performs SQL-based continuous processing on multiple streams of live data including enrichment with static and streaming reference data
  • Flexible time windows: Offers time-based, event-based, and session-based windowing
  • Interactive, live dashboards: Delivers push-based visualization with automatic refresh
  • Rewinding: Enables users to view and compare historical data via the UI
  • Search: Offers keyword search on live, streaming data
  • Field and time-based filtering: Allows filtering and comparing each chart by different dimensions
  • Page and chart level filtering: Gives the flexibility to use filter at the chart or page level
  • Embedding into custom websites: Striim charts can be embedded into any HTML5 page via iFrame, along with filtering and search capabilities
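To illustrate the session-based windowing from the list above, the sketch below groups event timestamps into sessions separated by a configurable inactivity gap. The function name and gap value are illustrative:

```python
def sessionize(timestamps, gap=30):
    """Group sorted event timestamps (seconds) into sessions; a new session
    starts whenever the gap to the previous event exceeds `gap` seconds."""
    sessions = []
    for ts in timestamps:
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)  # continue the current session
        else:
            sessions.append([ts])    # inactivity gap exceeded: start a new session
    return sessions
```

Time-based and event-based windows, by contrast, close after a fixed duration or a fixed number of events rather than on inactivity.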

Deploy and Modify Easily as Business Needs Change

Businesses can quickly gain real-time visibility into their operations via Striim’s intuitive UI without any coding. Using Striim’s simple yet powerful streaming SQL engine, Striim applications can ingest millions of data points per second and create visualization-specific aggregates. Striim’s GUI and SQL-based language makes it easy to correlate live, streaming data with historical aggregates.

Data Visualization and Data Exploration
Striim offers an intuitive UI to easily set up data flows and correlate historical data with streaming data

Within seconds of establishing data sources and flows, users can create dashboards to view live data, and modify the dashboards and charts as needed to meet ever-changing business needs. Visualizations can use a variety of chart types, such as line, area, column, map, heat map, and table charts. Dashboards can contain multiple pages, with in-page filtering and drill-down available for deeper understanding of operational metrics.

Striim’s charts can be embedded into any custom dashboard or web page to support broad collaboration and distribution of real-time insights. Striim issues real-time alerts based on custom thresholds, and can trigger workflows to enable timely action.

Benefits of Data Exploration with Striim

Using Striim for live operational dashboards and streaming data exploration, businesses gain several competitive advantages including:

  • Real-time, granular, and comprehensive insights with business-specific metrics
  • Correlation of real-time and historical data to detect deviations immediately
  • Rapid iteration of the dashboards and data flows as business needs change
  • Proactive response to emerging trends based on in-time, in-context insights
  • The ability to easily meet strict SLAs and improve customer experience

Striim enables businesses to accurately track operational performance with the right metrics, in real time, so they can course-correct fast, with full confidence.

To learn more about Striim’s real-time data visualization and data exploration capabilities, visit our Creating and Monitoring Operational Metrics solutions page, schedule a demo with a Striim technologist, or download a free trial of the platform and try it for yourself!

The Inevitable Evolution from Batch ETL to Real-Time ETL (Part 1 of 2)


Traditional extract, transform, load (ETL) solutions have, by necessity, evolved into real-time ETL solutions as digital businesses have increased both the speed in executing transactions, and the need to share larger volumes of data across systems faster. In this two-part blog post series, I will describe the transition from traditional ETL to a streaming, real-time ETL and how that shift benefits today’s data-driven organizations.

The Evolution of Real-Time ETL

Data integration has been the cornerstone of digital innovation for the last several decades, enabling the movement and processing of data across the enterprise to support data-driven decision making. In decades past, when businesses collected and shared data primarily for strategic decision making, batch-based ETL solutions served these organizations well. A traditional ETL solution extracts data from databases (typically at the end of the day), transforms the data extensively on disk in a middle-tier server into a consumable form for analytics, and then loads it in batch to a target data warehouse with a significantly different schema to enable various reporting and analytics solutions.

As consumers demanded faster transaction processing, personalized experience, and self-service with up-to-date data access, the data integration approach had to adapt to collect and distribute data to customer-facing applications and analytical applications more efficiently and with lower latency. In response, two decades ago, logical data replication with change data capture (CDC) capabilities emerged. CDC moves only the change data in real time, as opposed to all available data as a snapshot, and delivers data to various databases.
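Conceptually, each captured change becomes an event that can be replayed against a replica to keep it in sync. A minimal sketch, with an invented event shape:

```python
# Hypothetical shape of a CDC event: each captured insert, update, or delete
# becomes an event that can be applied in order to keep a replica in sync.
def apply_change(replica, event):
    """Apply a single CDC event to an in-memory replica keyed by primary key."""
    op, key = event["op"], event["key"]
    if op in ("insert", "update"):
        replica[key] = event["data"]   # upsert the latest row image
    elif op == "delete":
        replica.pop(key, None)         # remove the row if present
    return replica
```

Because only these small events move, rather than full table snapshots, the replica can be kept current with low latency and modest load on the source.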

These “new” technologies enabled businesses to create real-time replicas of their databases to support customer applications, migrate databases without downtime, and allow real-time operational decision making. Because CDC was not designed for extensive transformations of the data, logical replication and CDC tools led to an “extract, load, and transform” (ELT) approach, where significant transformations and enrichment would be required on the target system to put the data in the desired form for analytical processing. Many of the original logical replication offerings are also architected to run single processes on one node, which creates a single point of failure and requires an orchestration layer to achieve true high availability.

The next wave of change came as analytical solutions shifted from traditional on-premises data warehousing on relational databases to Hadoop and NoSQL environments and Kafka-based streaming data platforms, deployed heavily in the cloud. Traditional ETL now had to evolve further into a real-time ETL solution that works seamlessly with data platforms both on-premises and in the cloud, and combines the robust transformation and enrichment capabilities of traditional ETL with the low-latency data capture and distribution capabilities of logical replication and CDC.

In Part 2 of this blog post, I will discuss these real-time ETL solutions in more detail, particularly focusing on Striim’s streaming data integration software which moves data across cloud and on-premises environments with in-memory stream processing before delivering data in milliseconds to target data platforms. In the meantime, please check out our product page to learn more about Striim’s real-time ETL capabilities.

Feel free to Schedule a technical demo with one of our lead technologists, or download or provision Striim for free to experience first-hand its broad range of capabilities.

