Striim Cloud on AWS: Unify your data with a fully managed change data capture and data streaming service

Businesses of all scales and industries have access to increasingly large amounts of data that need to be harnessed effectively. According to an IDG Market Pulse survey, companies collect data from 400 sources on average. Companies that can’t process and analyze this data to glean useful insights for their operations are falling behind.

Thousands of companies are centralizing their analytics and applications on the AWS ecosystem. However, fragmented data can slow down the delivery of great product experiences and internal operations.

We are excited to launch Striim Cloud on AWS: a real-time data integration and streaming platform that connects clouds, data and applications with unprecedented speed and simplicity.

With a serverless experience to build smart data pipelines in minutes, Striim Cloud on AWS helps you unify your data in real time with out-of-the-box support for the following targets:

  • Amazon S3
  • Amazon RDS and Aurora databases
  • Amazon Kinesis
  • Amazon Redshift
  • Amazon MSK
  • Snowflake
  • Databricks with Delta Lake on S3

along with over 100 additional connectors available at your fingertips as a fully managed service.

Striim Cloud runs natively on AWS services such as EKS, VPC, EBS, CloudWatch, and S3, enabling it to offer large-scale, high-performance, and reliable data streaming.

How does Striim Cloud bring value to the AWS ecosystem?

Striim enables you to ingest and process real-time data from over one hundred streaming sources, including enterprise databases via change data capture, transactional data, and AWS Cloud environments. Running Striim on AWS lets you create real-time data pipelines into Redshift, S3, Kinesis, Databricks, Snowflake, and RDS for enterprise workloads.

Sources and targets

Striim supports more than 120 sources and targets. It comes with pre-built data connectors that can automate your data movement from any source to Amazon Redshift or S3 within a few minutes.

With Striim, all your team needs to do is complete a few configuration clicks, and an automated pipeline will be created between your source and AWS targets. Some of the sources and targets Striim supports include:

  • Databases: Oracle, Microsoft SQL Server, MySQL, PostgreSQL, etc.
  • Data Streams: Kafka, JMS, IBM MQ, Rabbit MQ, IoT data over MQTT
  • Data formats: JSON, XML, Parquet, free-form text, and CSV
  • AWS targets: RDS for Oracle, RDS for MySQL, RDS for SQL Server, Amazon S3, Databricks via Delta Lake on S3, Snowflake, Redshift, and Kinesis
  • Additional targets: Over 100 additional connectors including custom Kafka endpoints with Striim’s full-blown schema registry support

Change data capture

Change data capture (CDC) is a process used in ETL to track changes to data in databases (e.g., inserts, updates, deletes) and stream those changes to target systems like Redshift. However, CDC approaches such as trigger-based or timestamp-based CDC can affect the performance of the source system.

Striim supports the most modern form of CDC, log-based CDC, which reduces overhead on source systems by reading changes from database transaction logs rather than querying the tables themselves. It moves data continuously, in real time, and non-intrusively. Learn about log-based CDC in detail here.
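The core idea behind log-based CDC can be sketched in a few lines of Python. This is a simplified conceptual model, not Striim's actual implementation: change records are read from the database's transaction log and replayed against a target, so the source tables are never queried.

```python
# Simplified model of log-based CDC: change events are read from the
# database's transaction log and replayed against a target replica,
# so the source tables themselves are never queried.

def apply_change(replica, event):
    """Apply one log record (op, key, row) to the target replica."""
    op, key, row = event["op"], event["key"], event.get("row")
    if op in ("insert", "update"):
        replica[key] = row
    elif op == "delete":
        replica.pop(key, None)
    return replica

# A toy "transaction log" for an orders table.
log = [
    {"op": "insert", "key": 1, "row": {"item": "jacket", "qty": 2}},
    {"op": "update", "key": 1, "row": {"item": "jacket", "qty": 3}},
    {"op": "insert", "key": 2, "row": {"item": "scarf", "qty": 1}},
    {"op": "delete", "key": 2},
]

replica = {}
for event in log:
    apply_change(replica, event)

print(replica)  # {1: {'item': 'jacket', 'qty': 3}}
```

Because only the log is read, the source database does its normal work while the target stays continuously in sync, which is what makes this approach non-intrusive.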

Streaming SQL

Standard SQL works only with bounded data that is already stored in a system. When dealing with streaming data, you can’t use standard SQL because the data is unbounded, i.e., it keeps arriving, so a query over a fixed table can never return a final answer. Striim provides a Streaming SQL engine that lets your data engineers and business analysts write SQL-style declarative queries over streaming data. These queries never stop running and continuously produce outputs as streams.
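Conceptually, a continuous query is a loop that never terminates: each arriving event updates a running result that is emitted downstream. Here is a minimal Python analogue of a streaming `SELECT item, COUNT(*) ... GROUP BY item`; Striim expresses this declaratively in Streaming SQL, and the event fields here are illustrative.

```python
from collections import Counter

def continuous_count(stream, group_key):
    """A toy continuous query: emit updated per-group counts
    every time a new event arrives on the (unbounded) stream."""
    counts = Counter()
    for event in stream:
        counts[event[group_key]] += 1
        yield dict(counts)  # each yield is one output of the running query

# An unbounded stream is modeled here as any iterable of events.
events = [{"item": "jacket"}, {"item": "scarf"}, {"item": "jacket"}]
for snapshot in continuous_count(events, "item"):
    print(snapshot)
# last snapshot: {'jacket': 2, 'scarf': 1}
```

The key difference from standard SQL is visible in the shape of the output: instead of one final result set, the query produces a stream of results that stays current as data keeps arriving.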

Data transformation and enrichment 

Data transformation and enrichment are critical steps in creating operational data products in the form of tables and materialized views with minimal cost and duplication of data. To organize this data into a format compatible with the target system, Striim helps you perform data transformation with Streaming SQL. This can include operations such as joining, cleaning, correlating, filtering, and enriching. For example, enrichment lets you add context to your data (e.g., adding geographical information to customer data to understand customer behavior).

What makes Striim unique in this regard is that it supports not only data transformation for batch data but also in-flight transformations for real-time streams, with a full-blown Streaming SQL engine called Tungsten.
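In-flight enrichment amounts to joining each streaming event against reference data before it reaches the target. The following Python sketch illustrates the pattern; the customer table and field names are invented for illustration, and in Striim the same join would be written declaratively in Streaming SQL.

```python
# Reference data, e.g. a customer dimension held in an in-memory cache.
customers = {
    101: {"name": "Ada", "city": "Boston"},
    102: {"name": "Grace", "city": "Austin"},
}

def enrich(events, lookup):
    """Join each event with reference data in flight; events with
    no matching reference row pass through unenriched."""
    for event in events:
        extra = lookup.get(event["customer_id"], {})
        yield {**event, **extra}

sales = [{"customer_id": 101, "amount": 40.0}]
print(list(enrich(sales, customers)))
# [{'customer_id': 101, 'amount': 40.0, 'name': 'Ada', 'city': 'Boston'}]
```

Doing this while the data is in motion means the target system receives records that already carry their geographic or customer context, with no post-load join required.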

Use case: How can an apparel business analyze data with Striim? 

Suppose there’s a hypothetical company, Acme Corporation, which sells apparel across the country. Management wants to make timely business decisions that increase sales and minimize the opportunities lost to delays in decision-making. Some of the questions that can help them make the right decisions include the following:

  • Which product is trending at the moment?
  • Which store and location received the highest traffic last month?
  • What’s the inventory status across warehouses?

Currently, all store data is stored in a transactional database (Oracle). Imagine you’re Acme Corporation’s data architect. You can generate and visualize answers to the above questions by building a data pipeline in two steps:

  1. Use Striim Cloud Enterprise to stream data from Oracle to Amazon Redshift. 
  2. After the data is loaded into Redshift, use the Amazon QuickSight service to surface data insights and create dashboards.

Here’s how the flow will look: 

Striim Cloud AWS

In this blog post, we’ll show you how to configure and manage Striim Cloud Enterprise on AWS and create this pipeline for your apparel business within a few minutes.

Sign up for Striim Cloud

Signing up for Striim Cloud Enterprise is simple: visit striim.com, start a free trial, and sign up for the AWS solution. Then activate your account by following the instructions.

Sign up for Striim on AWS

 

Once you are signed in, create a Striim Cloud service, which essentially runs in the background and creates a dedicated Kubernetes cluster (EKS service on AWS) to host your pipeline, as you can see in the picture below. 

 

Kubernetes cluster

 

Once the cluster is ready, and before launching your service, configure secure access using the SSH connection settings, as seen below.

 

SSH connection

Create a pipeline for Oracle to Amazon Redshift

To create a pipeline, simply set the source to Oracle and type “Amazon” in the target field to see all the supported targets. In our example, we are selecting Amazon S3 as our target, though this could also be Amazon Redshift, Kinesis, etc.

S3 target

The wizard walks you through the simple process of entering source and target credentials. The service automatically validates the credentials, connects to the sources, and fetches the list of schemas and tables available on the sources for your selection, as shown below.

wizard 1

 

wizard 2

On the target side, enter your Amazon Redshift access key and secret key along with the appropriate S3 bucket name and object names to write the Oracle data into, as depicted in the image below.

New target

Follow the wizard to finish the configuration, which creates a data pipeline that collects historical data from the Oracle database and moves it to Amazon Redshift. For example, you can see the total number of sales across all branches during the last week.

In the next step, you can create an Oracle CDC pipeline in Striim to stream the real-time change data arriving in Oracle from the different stores on to Redshift. Now you can see real-time store data.

Pipeline to Amazon Redshift
A data pipeline streaming data from the source (Oracle) to Amazon Redshift


Your data engineers can use streaming SQL to join, filter, cleanse, and enrich data on the real-time data stream before it’s written to the target system (S3). A monitoring feature offers real-time stream views for further low-latency insights. 

Once data becomes available on Redshift, your data engineer can create dashboards and set up metrics for the relevant business use cases such as:

  • Currently trending products
  • The store and location with the highest traffic last month
  • An inventory status dashboard across warehouses: quantity sold by apparel item, historical graph vs. the latest 24 hours

Data such as the currently trending products can easily be shared with management for real-time decision-making and the creation of business strategies.

For example, here’s a real-time view of the apparel trends by city:

Apparel trends by city

And below are insights on the overall business, where you can see the top-selling and bottom-selling locations. The management can use this information to try out new strategies to increase sales in the bottom-selling locations, such as by introducing discounts or running a more aggressive social media campaign in those locations.

Striim is available for other cloud environments, too

Striim Cloud is also available on other leading cloud ecosystems, including Google Cloud and Microsoft Azure. You can use Striim with Azure to move data between on-premises and cloud enterprise sources while using Azure analytics tools like Power BI and Synapse. Similarly, you can use Striim with Google Cloud to move real-time data to analytics systems, such as Google BigQuery, without putting any significant load on your data sources.

Learn more about them here and here.

 

The Future of Streaming Data: Technology, Use Cases, and Opportunities


The streaming data market is constantly changing and evolving due to technological innovation and market demand. In order to stay ahead of the competition, companies need to have the most current and accurate information about the streaming data landscape. Watch our on-demand webinar to learn more about the future of the streaming data market. Sanjeev Mohan (Principal, SanjMo & Former Gartner Research VP) and Alok Pareek (Founder & EVP Products, Striim) discuss topics including:

      • The latest data trends including data products, data mesh, data observability, and more
      • The growth of real-time streaming replication and multiple market patterns to achieve replication (and their use cases)

Real-Time Healthcare Analytics: How Leveraging It Improves Patient Care

On a Tuesday night, a nurse in the emergency department receives a real-time alert on her smartphone: the department will be overcrowded within 1.5 hours. This alert, powered by real-time healthcare analytics, projects bed occupancy and anticipated care needs, allowing the nurse to coordinate with transport, radiology, and lab teams to prepare for the surge.

Historically, data silos limited information access, but real-time analytics now makes healthcare processes more connected. By aggregating and analyzing data, these insights boost operational efficiency and enhance patient care. In this post, we’ll explore how leveraging real-time healthcare analytics ensures seamless patient care and a smoother workflow for your team.

Why Leverage Real-Time Healthcare Analytics? 

There are several compelling reasons why real-time healthcare analytics is essential for healthcare institutions. These include: 

To Analyze EHR Data and Improve Patient Care

An electronic health record (EHR) digitally stores patient information, such as medical history, prescriptions, lab results, and treatments. While EHRs collect and display data, they lack real-time analysis capabilities — a gap filled by real-time healthcare analytics.

With real-time analytics, medical professionals can instantly access insights and recommendations based on current EHR data. This system ingests relevant data points, like progress and nursing notes, identifies diagnostic patterns, detects minor condition changes, and prioritizes patients with deteriorating health, enabling swift and proactive care.

Leveraging real-time healthcare analytics is essential in early sepsis detection. According to the CDC, sepsis claims 350,000 adult lives annually in the U.S. Early detection is vital yet challenging due to symptom overlap with other conditions. However, real-time analytics combined with AI can improve sepsis detection rates by up to 32%, according to one report. 

The Medical University of South Carolina (MUSC) uses this technology to monitor patient health continuously, drawing on EHR data and machine learning to classify signs of sepsis onset. Powered by real-time data, this proactive approach enables timely intervention, potentially saving lives.

To Encourage People to Take a Proactive Approach to Their Health

Another popular use case of real-time analytics in healthcare includes smartwatches and fitness trackers. Devices from the likes of Apple, Samsung, Fitbit, and others have exploded in popularity in recent years, enabling people to monitor their own health and adopt healthier habits. 

They help people walk more by tracking their daily step count through in-app challenges, calculate the calories they burn during workouts and sports activities, and monitor their daily caloric intake. These wearables collect data from their sensors and use real-time analytics to provide useful insights.

While these devices are far from a replacement for a doctor’s visit, they can alert the user to potential health risks. If someone notices their heart rate is often too high or too low, they may be more likely to visit their physician for a checkup.

For instance, a 12-year-old girl was alerted by her Apple Watch that she had an unusually high heart rate, and promptly sought medical attention. She was taken to a healthcare facility where doctors found her suffering from a rare condition in children: a neuroendocrine tumor on her appendix.

To Manage the Spread of Disease 

Real-time analytics in healthcare can also help healthcare institutions and doctors identify trends in the spread of illness. For instance, during the COVID-19 pandemic in 2020, healthcare institutions leveraged real-time analytics to track the growing outbreak. Healthcare organizations used machine learning algorithms fueled by data to analyze trends from the 50 countries with the highest rates of COVID-19 and predict what would happen over the next several days.

Healthcare providers also leveraged real-time analytics to determine how fast the virus was spreading and how it mutated under various conditions. For example, in 2020 the EU launched InferRead, software that collected image data from CT scanners to analyze whether lungs were damaged by a COVID-19 infection. The analysis was generated within a few seconds, allowing a doctor to study it and diagnose the patient quickly.

Real-time analytics can also help to manage resources in the case of an outbreak. In the US, the Kinetica Active Analytics Platform was used to create a real-time analytics program for aggregating and tracking data. The purpose of this program was to aid emergency responders by collecting information on test kit quantities, personal protective equipment (PPE) availability, and hospital capacity. This allowed decision-makers to determine whether they could redirect patients to a hospital with capacity or set up alternative triage centers. Similarly, these insights also helped to distribute PPE to the locations where it was needed most, especially when a shortage made access more difficult. 

To Optimize Hospital Staff Allocation 

Healthcare institutions often face the critical challenge of maintaining optimal staffing levels. Leveraging real-time healthcare analytics can transform how hospitals predict staffing needs by analyzing historical data and identifying patterns in staffing operations. By continuously examining how nurses and other staff operate under varying circumstances, real-time analytics generates recommendations for each hour, considering potential unforeseen scenarios. This ensures that patients receive an appropriate level of care, minimizing resource gaps and elevating the standard of patient care.

Intel’s recent paper highlights how real-time healthcare analytics enables four hospitals to use data from diverse sources to forecast admissions accurately. By applying time series analysis — a statistical technique designed to identify patterns within admission records — these hospitals can predict patient arrivals hour by hour, optimizing preparation and resource allocation. 
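A very simple version of that idea is to average historical admissions by hour of day to forecast the next day's hourly arrivals. The sketch below is a toy seasonal-mean model, far simpler than the time series methods the hospitals in the paper would use, and the numbers are invented for illustration.

```python
from collections import defaultdict

def hourly_forecast(history):
    """Forecast admissions per hour of day as the mean of past
    observations for that hour (a toy seasonal-mean model)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for hour, admissions in history:
        totals[hour] += admissions
        counts[hour] += 1
    return {h: totals[h] / counts[h] for h in totals}

# (hour_of_day, admissions) pairs observed over two past days.
history = [(9, 12), (10, 18), (9, 14), (10, 22)]
print(hourly_forecast(history))  # {9: 13.0, 10: 20.0}
```

Real deployments would account for day-of-week effects, holidays, and trend, but even this skeleton shows how hour-by-hour history turns into an hour-by-hour staffing forecast.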

Additionally, data insights from real-time analytics empower healthcare institutions to enhance job satisfaction and reduce turnover. By identifying the percentage of experienced staff open to emergency shifts or overtime with incentives, healthcare providers can better manage workloads and redistribute tasks to prevent burnout.

Improve Patient Care and Operational Efficiency with Striim 

For healthcare organizations aiming to optimize real-time healthcare analytics, Striim 5.0 offers a robust, secure solution. The platform not only ingests and analyzes high volumes of data in real time but also introduces the AI agents Sentinel and Sherlock to protect sensitive patient information. These agents automate authentication and connection processes, reducing overhead, enhancing data security, and ensuring compliance by masking personally identifiable information.

Discovery Health achieved a remarkable transformation with Striim, slashing data processing times from 24 hours to seconds. By replacing daily ETL processes with Striim’s Change Data Capture (CDC) technology, the organization seamlessly integrated disparate systems, eliminating delays and enabling faster, more responsive decisions. This innovation improved efficiency, reduced costs, and fostered personalized engagement by leveraging predictive analytics to encourage healthier member choices.

Backed by Oracle, Striim delivered unmatched reliability and scalability, utilizing advanced logical database replication expertise. The platform’s real-time insights empowered Discovery Health to promote wellness, enhance health outcomes, and streamline workflows. Through ongoing optimization, Discovery Health revolutionized its data infrastructure, driving informed decision-making and elevating customer experiences on a global scale.

Another healthcare organization that leverages Striim is Boston Children’s Hospital. In addition to enhancing patient outcomes, improving operational efficiency is critical to success in healthcare organizations. By consolidating data from multiple systems, including patient, billing, scheduling, clinical, and financial information, hospitals can streamline their operations and make faster, data-driven decisions.

Striim’s platform enables near real-time and batch-style processing of data from diverse sources like MS SQL Server, Google BigQuery, and Oracle, all feeding into a centralized Snowflake data warehouse. This seamless integration reduces the need for various scripts and disparate source systems, providing a single, cohesive view of the data pipelines. The hospital has not only saved time and money on support resources but has also significantly reduced the time it takes to deliver actionable insights to business users, a crucial factor in the fast-paced healthcare industry.

Ready to see for yourself how Striim can streamline operations and improve patient outcomes? Get started with a demo today.

How Striim Extends Azure Synapse Link

We recently announced that Striim is a participant in Microsoft’s Intelligent Data Platform partner ecosystem. We’re also excited to share that Striim extends Synapse Link to add support for additional source systems. 

There’s no question about the benefits of Azure Synapse. Whether it’s around on-demand usage, the ability to reduce high CapEx projects and increase cost savings, or enabling insight-driven decisions as fast as possible, Synapse can be an integral piece to your digital transformation journey. However, in order to make the most of Synapse and Power BI you need to reliably ingest data from disparate sources in real time. 

To that end, Microsoft introduced Azure Synapse Link, a method of easily ingesting data from Cosmos DB, SQL Server 2022, SQL DB, and Dataverse. Synapse Link uses either the change feed or change tracking to support continuous replication from the source transactional system. Rather than relying on legacy ETL tools to ingest data into Synapse on a nightly basis, Synapse Link enables more real-time analytical workloads with a smaller performance impact on the source database.

Beyond the sources Synapse Link includes today, Microsoft partnered with Striim to add support for real-time ingestion from Oracle and Salesforce into Synapse. Striim enables real-time Smart Data Pipelines into critical cloud services via log-based change data capture (CDC). CDC is the least intrusive method of reading from a source database, reading from the underlying transaction logs rather than the database tables themselves, enabling replication of high-value, business-critical workloads to the cloud with minimal downtime and risk.

Beyond pure data replication use cases, one common pattern we see is the need to pre-process data in flight before it even lands in Synapse. This reduces time to value and gets the data into the right format ahead of time. Within Striim it’s easy to do so with out-of-the-box transformations, SQL code, or even Java for maximum flexibility.

Whether you’re interested in replication or Smart Data Pipelines, to learn more please watch the free joint webinar: https://info.microsoft.com/ww-ondemand-unlock-insights-to-your-data-with-azure-synapse-link.html?lcid=en-us, or download our Oracle to Synapse or Salesforce to Synapse reference architectures.

If you have any questions, please reach out to microsoft@striim.com; we’d be happy to discuss your specific use case in more detail.
