Stream Data into Snowflake with Streaming Data Integration

In this video, learn why enterprises must stream data into Snowflake to take full advantage of this data warehouse built for the cloud.

To learn more about Striim for Snowflake Data Warehouse, visit our Snowflake solution page.

 

Video Transcription: 

You chose Snowflake to provide rapid insights into your data on a massive scale, on AWS or Azure. However, most of your source data resides elsewhere – in a wide variety of on-premise or cloud sources. How do you continually move data to Snowflake in real-time, processing it along the way, so that your fast analytics and insights are reporting on timely data?

Snowflake was built for the cloud, and built for speed. By separating compute from storage you can easily scale up and down as needed. This gives you instant elasticity supporting any amount of data, and high speed queries for any number of users, coupled with the peace of mind provided by secure data sharing. The per-second pricing and support for multiple clouds allows you to choose your infrastructure and only pay when you are using the data warehouse.

However, residing in cloud means you have to determine how to most effectively move data to Snowflake. This could be migrating an existing Teradata or Exadata Data Warehouse, or continually populating Snowflake with newly generated on-premises data from operational databases, logs, or device information. In order for the warehouse to provide up-to-date information, there should be as little latency as possible between the original data creation and its delivery to Snowflake.

The Striim platform can help with all these requirements and more. Our database adapters support change data capture, or CDC, from enterprise or cloud databases. CDC directly intercepts database activity and collects all the inserts, updates, and deletes as they happen, ready to stream into Snowflake. Adapters for machine logs and other files read at the end of multiple files in parallel to stream out data as it is written, removing the inherent latency of batch. While data from devices and messaging systems can be collected easily, independent of their format, through a variety of high-speed adapters and parsers.

After being collected continuously, the streaming data can be delivered directly into Snowflake with very low latency, or pushed through a data pipeline where it can be pre-processed through filtering, transformation, enrichment, and correlation using SQL-based queries, before delivery into Snowflake. This enables such things as data denormalization, change detection, de-duplication, and quality checking before the data is ever stored.

In addition to this, because Striim is an enterprise-grade platform, it can scale with Snowflake and reliably guarantee delivery of source data while also providing built-in dashboards and verification of data pipelines for operational monitoring purposes.

The Striim wizard-based UI enables users to rapidly create a new data flow to move data to Snowflake. In this example, real-time change data from Oracle is being continually delivered to Snowflake. The wizard walks you through all the configuration steps, checking that everything is set up properly, and results in a data flow application. This data flow can be enhanced to filter, transform and enrich the data through SQL-based queries. In the video, we add a name and email address from a cache, based on an ID present in the original data.

When the application is started, data flows in real-time from Oracle to Snowflake. Making changes in Oracle results in the transformed data being written continually to Snowflake, visible through the Snowflake UI.

Striim and Snowflake can change the way you do analytics, with Snowflake providing rapid insight to the real-time data provided by Striim. The data warehouse that is built for the cloud needs data delivered to the cloud, and Striim can continuously move data to Snowflake to support your business operations and decision-making.

To learn more about how Striim makes it easy to continuously move data to Snowflake, visit our Striim for Snowflake product page, schedule a demo with a Striim technologist, or download the platform and try it for yourself.

Real-Time Data is for Much More Than Just Analytics

Striim’s Real-Time Data is for Much More Than Just Analytics article was originally published on Forbes.

The conversation around real-time data, fast data and streaming data is getting louder and more energetic. As the age of big data fades into the sunset — and many industry folks are even reluctant to use the term — there is much more focus on fast data and obtaining timely insights. The focus of many of these discussions is on real-time analytics (otherwise known as streaming analytics), but this only scratches the surface of what real-time data can be used for.

If you look at how real-time data pipelines are actually being utilized, you find that about 75% of the use cases are integration related. That is, continuous data collection creates real-time data streams, which are processed and enriched and then delivered to other systems. Often these other systems are not themselves streaming. The target could be a database, data warehouse or cloud storage, with a goal of ensuring that these systems are always up to date. This leaves only about 25% of companies doing immediate streaming analytics on real-time data. But these are the use cases that are getting much more attention.

There are many reasons why streaming data integration is more common, but the main reason is quite simple: This is a relatively new technology, and you cannot do streaming analytics without first sourcing real-time data. This is known as a “streaming first” data architecture, where the first problem to solve is obtaining real-time data feeds.

Organizations can be quite pragmatic about this and approach stream-enabling their sources on a need-to-have, use-case-specific basis. This could be because batch ETL systems no longer scale or batch windows have gone away in a 24/7 enterprise. Or, they want to move to more modern technologies, which are most suitable for the task at hand, and keep them continually up to date as part of a digital transformation initiative.

Cloud Is Driving Streaming Data Integration

The rise of cloud has made a streaming-first approach to data integration much more attractive. Simple use cases, like migrating an on-premise database that services an in-house business application to the cloud, are often not even viable without streaming data integration.

The naive approach would be to back up the database, load it into the cloud and point the cloud application at it. However, this assumes a few things:

1. You can afford application downtime.

2. Your application can be stopped while you are doing this.

3. You can spin up and use the cloud application without testing it.

For most business-critical applications, none of these things are true.

A better approach to minimizing or eliminating downtime is an online migration that keeps the application running. To perform this task, source changes from the in-house database, using a technology called change data capture (CDC), as real-time data streams, load the database to the cloud, then apply any changes from the real-time stream that happened while you were doing the loading. The change delivery to the cloud can be kept running while you test the cloud application, and when you cut over, it will be already up to date.

Streaming data integration is a crucial element of this type of use case, and it can also be applied to cloud bursting, operational machine learning, large scale cloud analytics or any other scenario where having up-to-the-second data is essential.

Streaming Data Integration Is The Precursor To Streaming Analytics

Once organizations are doing real-time data collection, typically for integration purposes, it then opens the door to doing streaming analytics. But you can’t put the cart before the horse and do streaming analytics unless you already have streaming data.

Streaming analytics also requires preprepared data. It’s a commonly known metric that 80% of the time spent in data science is in data preparation. This is true for machine learning and also true for streaming analytics. Obtaining the real-time data feed is just the beginning. You may also need to transform, join, cleanse and enrich data streams to give the data more context before performing analytics.

As a simple example, imagine you are performing CDC on a source database and have a stream of orders being made by customers. In any well-normalized, relational database, these tables are mostly just numbers relating to detail contained in other tables.

This might be perfect for a relational, transactional system, but it’s not very useful for analytics. However, if you can join the streaming data with reference data for customers and items, you have now added more context and more value. The analytics can now show real-time sales by customer location or item category and truly provide business insights.

Without the processing steps of streaming data integration, the streaming analytics would lose value, again showing how important the real-time integration layer really is.

Busting The Myth That Real-Time Data Is Prohibitively Expensive

A final consideration is cost. Something that has been said repeatedly is that real-time systems are expensive and should only be used when absolutely necessary. The typically cited use cases are algorithmic trading and critical control systems.

While this may have been true in the past, the massive improvements in the price-performance equation for CPU and memory over the last few decades have made real-time systems, and in-memory processing in general, affordable for mass consumption. Coupled with cloud deployments and containerization, the capability to have real-time data streamed to any system is within reach of any enterprise.

While real-time analytics and instant operational insights may get the most publicity and represent the long-term goal of many organizations, the real workhorse behind the scenes is streaming data integration. 

Rapid Adoption of Google Cloud SQL Using Streaming Integration & CDC

In this video, we will demonstrate how Striim can provide continuous data integration via CDC to Google Cloud SQL through a pipeline for the real-time collection, processing, and delivery of enterprise data, sourcing from Oracle on-prem.

Are you looking to migrate to Google Cloud Platform? Start our free 90-day migration service today!

To learn more about the Striim platform, visit our platform overview page.

This was originally published as a blog post here.

 

Unedited Transcript:

Rapid adoption of Google Cloud SQL. We’re using Striim with streaming integration from any enterprise data source you want to move to Google Cloud SQL, but much of your data may currently be elsewhere locked up. Maybe this is in operational databases, data warehouses, legacy systems and other locations. You need a new hybrid cloud integration strategy for the continuous movement of enterprise data to and from Google Cloud with continuous collection, processing, and delivery of enterprise data in real time, not batch, to ensure Google Cloud SQL is always up to date. Data from on premise and cloud sources needs to be delivered to Google Cloud SQL including the one time load and continuous change to delivery with in-flight processing to ensure up to second information for your users, and that’s where Striim comes in. Striim is a next generation streaming integration and intelligence platform that supports your hybrid cloud initiatives and has integration with multiple Google Cloud technologies. We will demonstrate how Striim can provide continuous data integration into Google Cloud SQL through a pipeline for the real-time collection, processing and delivery of enterprise data.

Sourcing from Oracle on premise. In this case, we’ll be doing an initial load followed by continuous change delivery from Oracle to Google Cloud SQL. Striim’s UI makes it easy to continuously and non intrusively ingest all your enterprise data from a variety of sources in real time. We’ll start by doing an initial load of data from Oracle on premise to Google Cloud SQL using a data flow. When the flow is started, the full contents of the on premise customer table is loaded into Google Cloud SQL. After a short time. All the rows in the source table are present in the Google Cloud SQL customer target table. This can be monitored using Striim and the Google Cloud monitor UI. Once the initial load is complete, we can continuously deliver changes using CDC from Oracle into the Google Cloud SQL instance. A separate flow is used so that the initial load and CDC can be coordinated after many changes. You can see that the Google Cloud SQL is completely up to date with the on premise Oracle instance. The continuous updates can also be monitored through the Striim UI. You can see how Striim can enable your hybrid cloud initiatives and accelerate the adoption of Google Cloud SQL. Get started with Striim now with a trial download on our website or contact us if you want to know more.

 

Simplify Your Azure Hybrid Cloud Architecture with Streaming Data Integration

While the typical conversation about Azure hybrid cloud architecture may be centered around scaling applications, VMs, and microservices, the bigger consideration is the data. Spinning up additional services on-demand in Azure is useless if the cloud services cannot access the data they need, when they need it.

“According to a March 2018 hybrid cloud report from 451 Research and NTT Communications, around 63% of firms have a formal strategy for hybrid infrastructure. In this case, hybrid cloud does not simply mean using a public cloud and a private cloud. It means having a seamless flow of data between all clouds, on and off-premises.” – Data Foundry

To help simplify providing a seamless flow of data to your Microsoft Azure hybrid cloud infrastructure, we’re happy to announce that the Striim platform is available in the Microsoft Azure Marketplace.

How Streaming Data Integration Simplifies Your Azure Hybrid Cloud Architecture

Enterprise-grade streaming data integration enables continuous real-time data movement and processing for hybrid cloud, connecting on-prem data sources and cloud environments, as well as bridging a wide variety of cloud services. With in-memory stream processing for hybrid cloud, companies can store only the data they need, in the format that they need. Additionally, streaming data integration enables delivery validation and data pipeline monitoring in real time.

Streaming data integration simplifies real-time streaming data pipelines for cloud environments. Through non-intrusive change data capture (CDC), organizations can collect real-time data without affecting source transactional databases. This enables cloud migration with zero database downtime and minimized risk, and feeds real-time data to targets with full context – ready for rich analytics on the cloud – by performing filtering, transformation, aggregation, and enrichment on data-in-motion.

Azure Hybrid Cloud Architecture

Key Traits of a Streaming Data Integration Solution for Your Azure Hybrid Cloud Architecture

There are three important objectives to consider when implementing a streaming data integration solution in an Azure hybrid cloud architecture:

  • Make it easy to build and maintain –The ability to use a graphical user interface (GUI) and a SQL-based language can significantly reduce the complexity of building streaming data pipelines, allowing more team members within the company to maintain the environment.
  • Make it reliable – Enterprise hybrid cloud environments require a data integration solution that is inherently reliable with failover, recovery and exactly-once processing guaranteed end-to-end, not just in one slice of the architecture.
  • Make it secure –Security needs to be treated holistically, with a single authentication and authorization model protecting everything from individual data streams to complete end-user dashboards. The security model should be role-based with fine-grained access, and provide encryption for sensitive resources.

Striim for Microsoft Azure

The Striim platform for Azure is an enterprise-grade data integration platform that simplifies an Azure-based hybrid cloud infrastructure. Striim provides real-time data collection and movement from a variety of sources such as enterprise databases (ie, Oracle, HPE NonStop, SQL Server, PostgreSQL, Amazon RDS for Oracle, Amazon RDS for MySQL via low-impact, log-based change data capture), as well as log files, sensors, messaging systems, NoSQL and Hadoop solutions.

Once the data is collected in real time, it can be streamed to a wide variety of Azure services including Azure Cosmos DB, Azure SQL Database, Azure SQL Data Warehouse, Azure Event Hubs, Azure Data Lake Storage, and Azure Database for PostgreSQL.

While the data is streaming to Azure, Striim enables in-stream processing such as filtering, transformations, aggregations, masking, and enrichment, making the data more valuable when it lands. This is all done with sub-second latency, reliability and securty via an easy-to-use interface and SQL-based programming language.

To learn more about Striim’s capabilities to support the data integration requirements for an Azure hybrid cloud architecture, read today’s press release announcing the availability of the Striim platform in the Microsoft Azure Marketplace, and check out all of Striim’s solutions for Azure.

Log-Based Change Data Capture: the Best Method for CDC

Change data capture, and in particular log-based change data capture, has become popular in the last two decades as organizations have discovered that sharing real-time transactional data from OLTP databases enables a wide variety of use-cases. The fast adoption of cloud solutions requires building real-time data pipelines from in-house databases, in order to ensure the cloud systems are continually up to date. Turning enterprise databases into a streaming source, without the constraints of batch windows, lays the foundation for today’s modern data architectures. In this blog post, I would like to discuss Striim’s CDC capabilities along with its unique features that enhance the change data capture, as well as its processing and delivery across a wide range of sources and targets.

Log-based Change Data CaptureLog-Based Change Data Capture

In our blog post about Change Data Capture, we explained why log-based change data capture is a better method to identify and capture change data. Striim uses the log-based CDC technique for the same reasons we stated in that post: Log-based CDC minimizes the overhead on the source systems, reducing the chances of performance degradation. In addition, it is non-intrusive. It does not require changes to the application, such as adding triggers to tables would do. It is a light-weight but also a highly-performant way to ingest change data. While Striim reads DML operations (INSERTS, UPDATES, DELETES) from the database logs, these systems continue to run with high-performance for their end users.

Striim’s strengths for real-time CDC are not limited to the ingestion point. Here are a few capabilities of the Striim platform that build on its real-time, log-based change data capture in enabling robust, end-to-end streaming data integration solutions:

Log-based CDC from heterogeneous databases for non-intrusive, low-impact real-time data ingestion

Striim uses log-based change data capture when ingesting from major enterprise databases including Oracle, HPE NonStop, MySQL, PostgreSQL, MongoDB, among others. It minimizes CPU overhead on sources, does not require application changes, and substantial management overhead to maintain the solution.

Ingestion from multiple, concurrent data sources to combine database transactions with semi-structured and unstructured data

Striim’s real-time data ingestion is not limited to databases and the CDC method. With Striim you can merge real-time transactional data from OLTP systems with real-time log data (i.e., machine data), messaging systems’ events, sensor data, NoSQL, and Hadoop data to obtain rich, comprehensive, and reliable information about your business.

End-to-end change data integration

Striim is designed from the ground-up to ingest, process, secure, scale, monitor, and deliver change data across a diverse set of sources and targets in real time. It does so by offering several robust capabilities out of the box:

  • Transaction integrity: When ingesting the change data from database logs, Striim moves committed transactions with the transactional context (i.e., ACID properties) maintained. Throughout the whole data movement, processing, and delivery steps, this transactional context is preserved so that users can create reliable replica databases, such as in the case of cloud bursting.
  • In-flight change data processing: Striim offers out-of-the-box transformers, and in-memory stream processing capabilities to filter, aggregate, mask, transform, and enrich change data while it is in motion. Using SQL-based continuous queries, Striim immediately turns change data into a consumable format for end users, without losing transactional context.
  • Built-in checkpointing for reliability: As the data moves and gets processed through the in-memory components of the Striim platform, every operation is recorded and tracked by the solution. If there is an outage, Striim can replay the transactions from where it was left off — without missing data or having duplicates.
  • Distributed processing in a clustered environment: Striim comes with a clustered environment for scalability and high availability. Without much effort, and using inexpensive hardware, you can scale out for very high data volumes with failover and recoverability assurances. With Striim, you don’t need to build your own clusters with third-party products.
  • Continuous monitoring of change data streams: Striim continuously tracks change data capture, movement, processing, and delivery processes, as well as the end-to-end integration solution via real-time dashboards. With Striim’s transparent pipelines, you have a clear view into the health of your integration solutions.
  • Schema change replication: When source Oracle database schema is modified and a DDL statement is created, Striim applies the schema change to the target system without pausing the processes.
  • Data delivery validation. For database sources and targets, Striim offers out-of-the-box data delivery verification. The platform continuously compares the source and target systems, as the data is moving, validating that the databases are consistent and all changed data has been applied to the target. In use cases, where data loss must be avoided, such as migration to a new cloud data store, this feature immensely minimizes migration risks.
  • Concurrent, real-time delivery to a wide range of targets: With the same software, Striim can deliver change data in real time not only to on-premise databases but also to databases running in the cloud, cloud services, messaging systems, files, IoT solutions, Hadoop and NoSQL environments. Striim’s integration applications can have multiple targets with concurrent real-time data delivery.
  • Pre-packaged applications for initial load and CDC: Striim comes with example integration applications that include initial load and CDC for PostgreSQL environments. These integration applications enable setting up data pipelines in seconds, and serve as a template for other CDC sources as well.

Turning Change Data to Time-Sensitive Insights

In addition to building real-time integration solutions for change data, Striim can perform streaming analytics with flexible time windows allowing you to gain immediate insights from your data in motion. For example, if you are moving financial transactions using Striim, you can build real-time dashboards that alert on potential fraud cases before Striim delivers the data to your analytics solution.

Log-based change data capture is the modern way to turn databases into streaming data sources. However, ingesting the change data is only the first of many concerns that integration solutions should address. You can learn more about Striim’s CDC offering by scheduling a demo with a Striim technologist or experience its enterprise-grade streaming integration solution first-hand by downloading a free trial.

 

Microsoft SQL Server CDC to Kafka

By delivering high volumes of data using Microsoft SQL Server CDC to Kafka, organizations gain visibility of their business and the vital context needed for timely operational decision making. Getting maximum value from Kafka solutions requires ingesting data from a wide variety of sources – in real time – and delivering it to users and applications that need it to take informed action to support the business.Microsoft SQL Server to Kafka

Traditional methods used to move data, such as ETL, are just not sufficient to support high-volume, high-velocity data environments. These approaches delay getting data to where it can be of real value to the organization. Moving all the data, regardless of relevance, to the target creates challenges in storing it and getting actionable data to the applications and users that need it. Microsoft SQL Server CDC to Kafka minimizes latency and prepares data so it is delivered in the correct format for different consumers to utilize.

In most cases, the data that resides in transactional databases like Microsoft SQL Server is the most valuable to the organization. The data is constantly changing reflecting every event or transaction that occurs.  Using non-intrusive, low-impact change data capture (CDC) the Striim platform moves and processes only the changed data. With Microsoft SQL Server CDC to Kafka users manage their data integration processes more efficiently and in real time. 

Using a drag-and-drop UI and pre-built wizards, Striim simplifies creating data flows for Microsoft SQL Server CDC to Kafka. Depending on the requirements of users, the data can either be delivered “as-is,” or in-flight processing can filter, transform, aggregate, mask, and enrich the data. This delivers the data in the format needed with all the relevant context to meet the needs of different Kafka consumers –with sub-second latency.

Striim is an end-to-end platform that delivers the security, recoverability, reliability (including exactly once processing), and scalability required by an enterprise-grade solution. Built-in monitoring also compares sources and targets and validates that all data has been delivered successfully. 

In addition to Microsoft SQL Server CDC to Kafka, Striim offers non-intrusive change data capture (CDC) solutions for a range of enterprise databases including Oracle, Microsoft SQL Server, PostgreSQL, MongoDB, HPE NonStop SQL/MX, HPE NonStop SQL/MP, HPE NonStop Enscribe, and MariaDB.

For more information about how to use Microsoft SQL Server CDC to Kafka to maintain real-time pipelines for continuous data movement, please visit our Change Data Capture solutions page.

If you would like a demo of how Microsoft SQL Server CDC to Kafka works and to talk to one of our technologists, please contact us to schedule a demo.

Real-Time Data Ingestion – What Is It and Why Does It Matter?

 

 

The integration and analysis of data from both on-premises and cloud environments give an organization a deeper understanding of the state of their business. Real-time data ingestion for analytical or transactional processing enables businesses to make timely operational decisions that are critical to the success of the organization – while the data is still current. real-time data ingestion diagram

Transactional and operational data contain valuable insights that drive informed and appropriate actions. Achieving visibility into business operations in real time allows organizations to identify and act on opportunities and address situations where improvements are needed. Real-time data ingestion to feed powerful analytics solutions demands the movement of high volumes of data from diverse sources without impacting source systems and with sub-second latency.

Using traditional batch methods to move the data introduces unwelcome delays. By the time the data is collected and delivered it is already out of date and cannot support real-time operational decision making. Real-time data ingestion is a critical step in the collection and delivery of volumes of high-velocity data – in a wide range of formats – in the timeframe necessary for organizations to optimize their value.

The Striim platform enables the continuous movement of structured, semi-structured, and unstructured data – extracting it from a wide range of sources and delivering it to cloud and on-premises endpoints – in real time and available immediately to users and applications.

The Striim platform supports real-time data ingestion from sources including databases, log files, sensors, and message queues and delivery to targets that include Big Data, Cloud, Transactional Databases, Files, and Messaging Systems. Using non-intrusive Change Data Capture (CDC) Striim reads new database transactions from source databases’ transaction or redo logs and moves only the changed data without impacting the database workload.

Real-time data ingestion is critical to accessing data that delivers significant value to a business. With clear visibility into the organization, based on data that is current and comprehensive, organizations can make more informed operational decisions faster.

To read more about real-time data ingestion, please visit our Real-Time Data Integration solutions page.

To have one of our experts guide you through a brief demo of our real-time data ingestion offering, please schedule a demo.

Kafka to MySQL

The scalable and reliable delivery of high volumes of Kafka data to enterprise targets via real-time Kafka integration gives organizations current and relevant information about their business. Loading data from Kafka to MySQL enables organizations run rich custom queries on data enhanced with pub/sub messaging data to make key operational decisions in the timeframe for them to be most effective.

Kafka to MySQLTo get optimal value from the rich messaging data generated by CRM, ERP, and e-commerce applications, large data sets need to be delivered from Kafka to MySQL with sub-second latency. Integrating data from Kafka to MySQL enhances transactional data – providing greater understanding of the state of operations. With access to this data, users and applications have the context to make decisions and take essential and timely action to support the business.

Using traditional batch-based approaches to the movement of data from Kafka to MySQL creates an unacceptable bottleneck – delaying the delivery of data to where it can be of real value to the organization. This latency limits the potential for this data to make critical operational decisions that enhance customer experiences, optimize processes, and drive revenue.

ETL methods move the data “as is” – without any pre-processing. However, depending on the requirements not all the data may be needed and the data that is necessary may need to be augmented with other data to make it useful. Ingesting high volumes of raw data creates additional challenges when it comes to storage and getting high value actionable data to users and applications.

Building real time data pipelines from Kafka to MySQL, Striim allows users to minimize latency and support their high-volume, high-velocity data environments. Striim offers real time data ingestion with in-flight processing including filtering, transformations, aggregations, masking, and enrichment to deliver relevant data from Kafka to MySQL in the right format and with full context.

Striim also includes built-in security, delivery validation, and additional features to essential for the scalability and reliability requirements of mission-critical applications. Real time pipeline monitoring detects any patterns or anomalies as the data is moving from Kafka to MySQL. Interactive dashboards provide visibility into the health of the data pipelines and highlight issues with instantaneous alerts – allowing for timely corrective action to be taken on the results of comprehensive pattern matching, correlation, outlier detection, and predictive analytics.

For more information about gaining timely intelligence from integrating high volumes of rich messaging data from Kafka to MySQL, please visit our Kafka integration page at: https://www.striim.com/blog/kafka-stream-processing-with-striim/

If you would like a demo of real time data integration from Kafka to MySQL, and to talk to one of our experts, please contact us to schedule a demo.

Data Pipeline to Cloud

 

 

Building a streaming data pipeline to cloud services is essential to moving enterprise datain real time between on-premises and cloud environments.

Extending data infrastructure to hybrid and multi-cloud architectures enables businesses to scale easily and leverage a variety of powerful cloud-based services. Data must be a key consideration when migrating applications to the cloud, to ensure that services have access to the data they need, when they need it, and in the format required. Data Pipeline to Cloud

Although adopting a cloud architecture offers significant benefits in terms of savings and flexibility, it also creates challenges in managing data across different locations. Using traditional approaches to data movement introduces latency for applications that demand up-to-the-second information. Batch ETL methods are also constrained by the number of sources and targets that can be supported.

The Striim platform simplifies the building of a streaming data pipeline to cloud, allowing organizations to leverage fully connected hybrid cloud environments across a variety of use cases. Examples include offloading operational workloads, and extending a data center to the cloud, as well as gaining insights from cloud-based analytics.

Taking advantage of Striim’s easy-to-use wizards to build and modify a highly reliable and scalable data pipeline to cloud environments, data can be moved continuously and in real time from heterogenous on-premises or cloud-based sources – including transactional databases, log files, sensors, Kafka, Hadoop, and NoSQL databases – without slowing down source systems. Using non-intrusive, real-time change data capture (CDC) ensures continuous data synchronization by moving and processing only changed data.

Striim feeds real-time data with full-context via the data pipeline to cloud and other targets, processing and formatting it in-memory. Filtering, transforming, aggregating, enriching, and analyzing data all occurs while the data is in-flight, before delivery of the relevant data sets to multiple endpoints.

Built-in data pipeline to cloud monitoring via interactive dashboards and real-time alerts allows users to visualize the data flow and the content of datain real time. With up-to-the-second visibility of the data pipeline to cloud infrastructure, users can quickly and easily verify the ingestion, processing, and delivery of their streaming data.

To read more about building a real time data pipeline to cloud using Striim, please go to: https://www.striim.com/use-case/real-time-analytics/

If you would like to see how a data pipeline to cloud is built, please schedule a demo with one of our technologists.

What is Streaming SQL?esdfgv

 

 

Streaming SQL has become essential to real-world, real-time data processing solutions. But before examining what it is and how it works, we need to take a brief look back.

With the continuous and staggering growth of data volumes over the years, and the rising demands for analysis of data, Structured Query Language, or SQL, has become an essential component of data management and business analytics.What is Streaming SQL?

Because databases store data before it’s available for querying, however, this data is invariably old by the time it’s queried. Today, many organizations need to analyze data in real-time, which requires the data to be streamed. As a result of this shift, there’s a need for a new version of SQL that supports stream processing.

Enter Streaming SQL. Streaming SQL is similar to the older version of SQL, but it differs in how it addresses stored and real-time data. Streaming SQL platforms are continuously receiving flows of data. It’s this continuous nature of streaming that gives the technology its true value compared with traditional SQL solutions.

A key part of streaming SQL are windows and event tables, which trigger actions when any kind of change occurs with the data. When a window is updated, aggregate queries recalculate, and this provides results such as sums over micro-batches.

Streaming systems allow organizations to input huge volumes of data—including reference, context, or historical data—into event tables from files, databases, and various other sources. These tools enable users to write SQL-like queries for streaming data without the need to write code.

With Streaming SQL, queries are often highly complex, using case statements and pattern-matching syntax. These solutions make it easy for organizations to ingest, process, and deliver real-time data across a variety of environments—whether they are in the cloud or on-premises.

This helps enterprises quickly adopt a modern data architecture, creating streaming data pipelines to public cloud environments such as Microsoft Azure, Amazon Web Services, and Google Cloud Platform, as well as to Kafka, Hadoop, NoSQL, and relational databases.

It’s important to realize that Streaming SQL is not something that should be used to run on all data, such as massive databases with a billion rows. That’s not what it’s designed for. It’s better suited for working on smaller subsets of data, when there is a need to get quick results and immediately identify value in new data that’s being created.

One of the strengths of Streaming SQL comes from its ability to transform, filter, aggregate, and enrich data. It has the ability to combine all these functions together to enable organizations to get maximum value from the data constantly streaming into their systems.

To learn more about the power of streaming SQL, visit Striim Platform Overview product page, schedule a demo with a Striim technologist, or download a free trial of the platform and try it for yourself!

 

Back to top