Everyone wants real-time data…in theory. You watch real-time stock tickers on TV, you glance at your speedometer to gauge your speed while driving, and you check live conditions in your weather app.
Yet the “Modern Data Stack” is largely focused on delivering batch processing and reporting on historical data with cloud-native platforms. While these cloud analytics platforms have transformed business operations, the real-time piece of the puzzle is still missing, and many data engineers are inclined to think real-time is simply out of their organization’s reach. As a result, companies have neither a real-time, single source of truth for their business nor the ability to take in-the-moment action on customer behavior.
Why? Real-time data is currently synonymous with spinning up complex infrastructure, cobbling together multiple projects, and figuring out the integrations to internal systems yourself. The more valuable work of delivering fresh data to enable real-time data-driven applications in the business seems like an afterthought compared to the engineering prerequisites.
Now there is another way…
Striim is a unified data integration and streaming platform that uniquely combines change data capture, application integration, and Streaming SQL in a fully managed service used by the world’s top enterprises to deliver real-time business applications.
With Striim Developer, we’ve opened up the core piece of Striim’s Streaming SQL and Change Data Capture engine as a free service to stream up to 10 million events per month with an unlimited number of Streaming SQL queries. Striim Developer includes:
CDC connectors for PostgreSQL, MongoDB, SQL Server, MySQL, and MariaDB
SaaS connectors for Slack, MS Teams, Salesforce, and others
Streaming SQL with sliding and jumping windows, plus caches to join streaming data with data from databases and data warehouses such as Snowflake
Source and Target connectors for BigQuery, Snowflake, Redshift, S3, GCS, ADLS, Kafka, Kinesis, and more
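To give a flavor of how these pieces fit together, a CDC stream can be enriched from a cache and aggregated over a sliding window entirely in Streaming SQL. The sketch below is illustrative only; the stream, cache, and field names are hypothetical:

```sql
-- Hypothetical names: a five-minute sliding window over a CDC order stream,
-- enriched from a customer cache and aggregated per customer.
CREATE WINDOW OrdersWindow OVER OrdersStream
KEEP WITHIN 5 MINUTE
PARTITION BY customer_id;

CREATE CQ EnrichedOrderTotals
INSERT INTO OrderTotalsStream
SELECT o.customer_id,
       c.customer_name,
       SUM(o.amount) AS total_amount
FROM OrdersWindow o
JOIN CustomerCache c ON o.customer_id = c.customer_id
GROUP BY o.customer_id, c.customer_name;
```

The window keeps the last five minutes of events per customer, and the continuous query re-emits updated totals as new change events arrive.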
Now any data engineer can quickly start prototyping streaming use cases for production with no upfront cost. You can even use Striim’s continuous synthetic data generator and plug it into your targets to see how real-time data behaves in your environment.
What happens when you hit your monthly 10 million event quota? We simply pause your account, and you can resume using it the following month without losing your pipelines. You can also download your pipelines as code and upgrade to Striim Cloud in a matter of clicks. No effort wasted.
Use cases you can address in Striim Developer:
Act on anomalous customer behavior by comparing real-time data with their historical norms, then alert internally in Slack or Teams
Implement data contracts on database schemas and freshness SLAs with Striim’s CDC, Streaming SQL, and schema evolution rules
Compute moving averages, aggregations, and run regressions on streaming data from Kafka or Kinesis using SQL.
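As an example of the last use case, a moving average over a Kafka stream can be expressed with a jumping window and a continuous query. This is a sketch; the stream and field names are hypothetical:

```sql
-- Hypothetical names: a 100-event jumping window over a Kafka price stream,
-- emitting a per-symbol average each time the window jumps.
CREATE JUMPING WINDOW PriceWindow OVER KafkaPriceStream
KEEP 100 ROWS
PARTITION BY symbol;

CREATE CQ MovingAverage
INSERT INTO AvgPriceStream
SELECT symbol, AVG(price) AS avg_price
FROM PriceWindow
GROUP BY symbol;
```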
If you’d like to join our first cohort of Striim Developers, you can sign up here.
If you’d like to get an overview from a data streaming expert first, request a demo here.
According to a recent study by KX, US businesses could see a total revenue uplift of $2.6 trillion through investment in real-time data analytics. From telecommunication to retail, businesses are harnessing the power of data analytics to optimize operations and drive growth.
Striim is a data integration platform that connects data from different applications and services to deliver real-time data analytics. These three companies successfully harnessed data analytics through Striim and serve as excellent examples of the practical applications of this valuable tool across industries and use cases.
1. Ciena: Enabling Fast Real-time Insights to Telecommunication Network Changes
Ciena is an American telecommunications networking equipment and software services supplier. It provides networking solutions to support the world’s largest telecommunications service providers, submarine network operators, data and cloud operators, and large enterprises.
Use cases
Ciena’s data team wanted to build a modern, self-serve data and analytics ecosystem that:
Improves the customer experience by enabling real-time insights and intelligent automation to network changes as they occur.
Facilitates data access across the enterprise by removing silos and empowering every team to make data-driven decisions quickly.
To meet its goals, Ciena chose Snowflake as its data warehousing platform for operational reporting and analytics and Striim as its data integration and streaming solution to replicate changes from its Oracle database to Snowflake. The company used Striim to collect, filter, aggregate, and update (in real time) 40-90 million business events to Snowflake daily across systems that manage manufacturing, sales, and dozens of other crucial business functions to enable advanced real-time analytics.
With its real-time analytics platform, Ciena has offered customers up-to-date insights as changes occurred in its network, thus improving the customer experience. Additionally, operators can begin experimenting with machine learning by using real-time analytics to identify network events that could impact performance.
Finally, with its self-serve analytics platform, everyone in the organization can now access the data they need to make faster data-driven decisions. With real-time analytics, Ciena’s customers no longer have to wait to see their updated data because it is displayed instantly after any changes are made in the source platforms.
“Because of Striim, we have so much customer and operational data at our fingertips. We can build all kinds of solutions without worrying about how we’ll provide them with timely data,” Rajesh Raju, director of data engineering at Ciena, explains.
2. Macy’s: Improving Digital and Mobile Shopping Experiences
Macy’s, Inc. is one of America’s largest retailers, delivering quality fashion to customers in more than 100 international destinations through the leading e-commerce site macys.com. Macy’s, Inc. sells a wide range of products, including men’s, women’s, and children’s clothes and accessories, cosmetics, home furnishings, and more.
Use cases
Macy’s real-time analytics use cases were to:
Achieve real-time visibility into customer orders and inventory to minimize operational costs, especially during peak holiday events like Black Friday and Cyber Monday.
Leverage artificial intelligence and machine learning to personalize customer shopping experiences.
Quickly turn data into actionable insights that help Macy’s deliver quality digital customer experiences and improve operational efficiencies.
Macy’s migrated its on-premises inventory and order data to Google Cloud Storage to reach its objectives. The company decided to move to the cloud based on the benefits of cost efficiency, flexibility, and improved data management. To facilitate the data integration process, it used Striim, which allowed it to:
Import historical and real-time on-premise data from its Oracle and DB2 mainframe databases.
Process the data in flight, including detecting and transforming mismatched timestamp fields.
Continuously deliver data to its BigQuery data warehouse for scalable analysis of petabytes of information.
Real-time data analytics has been a critical factor in Macy’s ability to understand customer behaviors and improve the shopping experience for its customers. Data analytics has enabled the company to increase customer purchases and loyalty and optimize its operations to minimize costs. As a result, Macy’s has been able to offer its customers a seamless and personalized shopping experience.
3. MineralTree: Delivering Real-Time Invoicing Reports to Customers
MineralTree, formerly Inspyrus, is a fintech SaaS company specializing in automating the accounts payable (AP) process of invoice capture, invoice approval, payment authorization, and payment completion. To do this, the company connects with hundreds of different ERP and accounting systems and streamlines the entire AP process into a unified system.
Use cases
MineralTree wanted to build a real-time data analytics system to:
Provide customers with a real-time view of all their invoicing reports as they occur.
Help customers visualize their data using a business intelligence tool.
MineralTree used Striim to seamlessly integrate customer data from various ERP and accounting systems into its Snowflake cloud data warehouse. Striim’s data integration connectors enabled the company to maintain real-time operational data in Snowflake and use it to power the business intelligence reports it provides to customers through Looker.
MineralTree’s updated data stack, consisting of Striim, Snowflake, dbt, and Looker, has enhanced the invoicing operations of its customers through rich, value-added reports.
According to Prashant Soral, CTO, the real-time data integration provided by Striim from operational systems to Snowflake has been particularly beneficial in generating detailed, live reports for its customers.
Transform How Your Company Operates Using Real-time Analytics With Striim
Real-time analytics transforms how your business operates by providing accurate, up-to-date information that can help you make better decisions and optimize your operations.
Striim offers an enterprise-grade platform that allows you to easily build continuous, streaming data pipelines to support real-time cloud integration, log correlation, edge processing, and analytics. Request a demo today.
Low code development is a powerful tool for businesses looking to streamline their processes and improve efficiency. Striim is a low-code platform that provides users with a variety of benefits, including the ability to quickly and efficiently process and analyze data in real time. By joining the Striim community, low-code users can take advantage of the following benefits:
Real-time analytics: One of the key benefits of using Striim is the ability to process and analyze data in real time. This means businesses can gain insights into their operations and make more informed decisions without waiting for data to be processed and analyzed.
Single source of truth with easy integration: Striim is designed to be easy to integrate with a wide range of data sources and systems, including databases, data lakes, and cloud services. This means that businesses can easily connect all of their data sources and gain a complete view of their operations.
High scalability: Striim is highly scalable, which means that it can easily handle large volumes of data. This is particularly useful for businesses experiencing rapid growth or needing to process large amounts of data in real-time.
Community support: By joining the Striim community, users can take advantage of the support and knowledge of other Striim users. This can be particularly valuable for businesses new to low-code development or looking to improve their processes.
Cost-effective: Low code development is a cost-effective solution for businesses. Striim is no exception. It provides businesses with a robust platform that enables them to streamline their operations and gain insights into their data without investing in expensive development resources.
In conclusion, Striim is a low-code platform that provides businesses with a powerful real-time tool for processing and analyzing data. By joining the Striim community, low-code users can take advantage of benefits such as real-time data processing, easy integration, high scalability, community support, and cost-effectiveness. These benefits can help businesses improve their operations and gain insights into their data, which can lead to increased efficiency and better decision-making.
As a data architect, business intelligence professional, or Chief Technical Officer, you know how important it is to have access to real-time data streaming to make the most informed decisions for your organization. That’s where Striim comes in.
One of the biggest benefits of using Striim is the ability to easily integrate with a variety of data sources, including databases, message queues, data warehouses, sensors, and files. This allows you to collect and stream data from a wide range of sources, providing a comprehensive view of your organization’s data.
But, as a busy professional, you may be wondering how you can stay up-to-date on the latest developments and best practices in the world of data streaming. That’s where the Striim Community and Discord come in.
By joining the Striim Community, you’ll have access to a wealth of knowledge and resources from other professionals using Striim to stream data in real-time. You can ask questions, share your experiences, and learn from others facing similar challenges.
The Striim Discord server is another great resource for staying connected with the Striim community. Here, you can join discussions and participate in live chats with other Striim users. You can also access support from Striim experts and get answers to your technical questions.
In addition to the knowledge and support you’ll gain from the Striim Community and Discord, there are many other benefits to using Striim for data streaming. For example, Striim’s built-in machine-learning capabilities allow you to analyze data streams in real-time, providing valuable insights and helping you make more informed decisions.
Striim also offers a low-code development environment, making it easy for non-technical users to build and deploy data streaming applications. This can save your organization time and resources, allowing you to quickly and easily implement data streaming solutions.
Overall, using Striim for data streaming offers a wide range of benefits for data architects, business intelligence professionals, and Chief Technical Officers. By joining the Striim Community and accessing the resources on the Striim Discord server, you can stay up-to-date on the latest developments and best practices, gain valuable support and insights from other professionals, and leverage the powerful features of Striim to stream data in real-time. Click below to get your free invite code.
At Striim, we recognize the essential role that our software plays in the data architecture of our customers. Our unified real-time data integration and stream processing platform, and our fully managed SaaS data products in Striim Cloud, are the vital engines that drive the data for many mission-critical applications. Our customers need to trust us, and our software, to be secure and available.
Nine months ago we announced our SOC 2 Type I certification. To further this trust, we are very excited to announce that Striim has now achieved SOC 2 Type II certification.
A SOC 2 assessment report provides detailed information and assurance about an organization’s security, confidentiality, availability, processing integrity, and/or privacy controls, assessed against the American Institute of Certified Public Accountants (AICPA) Trust Services Principles and Criteria. A SOC 2 report is often the primary document that our customers’ security departments rely on to assess Striim’s ability to maintain adequate security, and reviewing such reports is itself often required by SOC 2 controls.
SOC 2 compliance comes in two forms: the SOC 2 Type I report, which describes the design of the controls we have in place to meet relevant trust criteria at a specific point in time; and the SOC 2 Type II report, which details the operational effectiveness of those controls over a specified period of time. These reports are the results of audits performed by independent third parties, in our case Grant Thornton LLP.
We completed SOC 2 Type I last year and successfully operated the controls for a period of nine months in order to become SOC 2 Type II certified. The controls that the audit covers include Striim as a corporation, our on-premises platform, and the Striim Cloud managed SaaS offering. They cover infrastructure, software, devices, people, data, and our corporate and customer policies, procedures, and processes.
To achieve this certification, we relied on the investments we made for SOC 2 Type I certification in defining processes, policies and procedures, as well as training and utilization of technologies. Continual internal audits ensured we were meeting our goals and not straying from the many controls we have in place. This required the continual efforts of a cross functional team including contributions from executive management, security, human resources, engineering, infrastructure and legal departments.
SOC 2 is not just a certification, and it is not something you do once just to gain a check mark. The annual audits require that the controls and processes around them are ingrained into the DNA of every Striimer, and the insight gained during the process is a stepping stone to other broader and industry specific certifications.
This is just the start of our journey, so stay tuned for further exciting updates. The SOC 2 Type II report is available on request for our customers and those in the process of evaluating Striim.
We are pleased to announce the release of Striim Platform 4.1, the latest version of Striim’s flagship real-time streaming and data integration platform. Our releases incorporate feedback from our customers in terms of new features, enhancements to existing features, and bug fixes. We have centered Striim 4.1 around the themes of scalability, performance, and automation.
3 new data adapters
We have introduced 3 new data adapters and 1 new parser in Striim 4.1 to support customers’ high-performance applications and workflows that process large volumes of data. With these new adapters and parsers, Striim now supports over 125 types of readers and writers.
OJet reader for Oracle: OJet is Striim’s next-generation, high-performance Oracle adapter that can read more than 150 gigabytes of data per hour from Oracle databases (up to version 21c). OJet is the highest-performing Oracle CDC reader today. In our tests, OJet read 3 billion events per day from Oracle and wrote to Google BigQuery with an average end-to-end latency of 1.9 seconds. With an average event size of 1.3 KB, this means that OJet read 3.8 TB of data per day. We have designed OJet for efficiency: in our tests, it consumed a mere 43% CPU utilization across 8 cores.
Azure Cosmos DB reader: Microsoft Azure Cosmos DB is a fully managed NoSQL database service for modern application development. Striim introduces a new adapter to ingest data using change streams from Azure Cosmos DB with the SQL API or the MongoDB API. You can now use Striim to read real-time data from operational applications running on Cosmos DB and write to your preferred data warehouse, such as Azure Synapse, Snowflake, or Google BigQuery, to gain visibility into your operational data.
Databricks Delta Lake writer: Striim now supports real-time integration to Databricks Delta Lake, a feature long requested by our customers. Delta Lake can improve the reliability of data lakes by providing additional capabilities such as ACID transactions, scalable metadata handling, and unified stream and batch data processing. You can now use the Databricks Delta Lake writer to build your real-time SQL analytics, real-time monitoring, and real-time machine-learning workflows.
Parquet parser: Apache Parquet is a column storage file format that is popular in the data engineering and AI/ML ecosystems. You can now read data in Parquet format from supported sources such as Amazon S3 or distributed file systems such as the Hadoop Distributed File System, thus enabling real-time integration and analytics with your big data applications.
Enhancements
In addition, we have also enhanced our existing readers and writers. We have updated our Salesforce reader to support the latest Salesforce API (v51), and to read custom and multi-objects. We now support Kerberos-based authentication when reading from Oracle and PostgreSQL databases, and merge operations with Microsoft Azure Synapse.
Striim 4.1 offers operational and management enhancements for customers that have deployed Striim on one or more nodes. We support smart application rebalancing by monitoring the compute resources consumed by Striim applications and, in the event of a node going down, redistributing Striim applications among the remaining nodes. Striim can detect when a node rejoins the cluster and redistribute applications to balance the load among all online nodes. This maximizes operational uptime, reduces manual intervention, and provides improved scalability and cluster performance for our customers.
Data observability and data traceability are emerging patterns among enterprise customers. When dealing with data integration at scale across multiple teams, and hundreds to thousands of users, enterprise customers often ask where a data entry or data field originated. We are the first data streaming platform to natively support data streaming lineage functions. Striim can send your application metadata to your chosen data warehouse or analytical system. You can then use a data governance tool to know about all Striim components that process your data as the data moves from source to target.
With Striim 4.1, we support emerging workload patterns and collaboration among developers and database administrators by sending real-time alerts to Slack channels, thus enabling them to monitor and react to their data pipelines in real-time. Additionally, customers can build on Slack’s integrations with enterprise tools such as ServiceNow or PagerDuty to automatically create IT tickets based on the incoming alert message.
These are just a few of the major new features that are part of Striim 4.1. To hear more about Striim 4.1, you can watch a LinkedIn Live recording from the recent launch. You can also visit the Striim User Guide for a full list of new features included in the release, as well as the list of customer-reported issues that are fixed with this release.
We are excited to announce our new Striim Database Migration Service, StreamShift, which provides native integration with Microsoft Azure Cosmos DB. We have worked hard to resolve pain points around data integration, migration, and data analytics for Azure Cosmos DB users. Striim provides a rich user experience, cost-effective data movement, enhanced throughput throttling, and flexibility with over 100 native connectors.
Problem
Traditional ETL data movement methods are not suitable for today’s analytics or database migration needs. Batch ETL methods introduce latency by periodically reading from the source data service and writing to target data warehouses or databases on a schedule. Any analytics or conclusions drawn from the target data service are based on stale data, delaying business decisions and potentially creating missed business opportunities. Additionally, we often see hesitancy to migrate to the cloud because users are concerned about taking downtime for their mission-critical applications.
Azure Cosmos DB users need native integration that supports relational, non-relational, and document databases as sources and offers the flexibility to fine-tune Azure Cosmos DB target properties.
Striim’s latest integration with Cosmos DB solves the problem
The Striim software platform offers continuous real-time data movement from a wide range of on-premises and cloud-based data sources to Azure. While moving the data, Striim has in-line transformation and processing capability (e.g., denormalization). You can use Striim to move data into the main Azure services, such as Azure Synapse, Azure SQL Database, Azure Cosmos DB, Azure Storage, Azure Event Hubs, Azure Database for MySQL and Azure Database for PostgreSQL, Azure HDInsight, in a consumable form, quickly and continuously.
Striim offers uninterrupted, continuous real-time data replication with automatic data validation, which assures zero data loss and no data corruption.
Even though Striim can move data to various other Azure targets, in this blog we will focus on Azure Cosmos DB use cases that were recently released.
Supported sources for Azure Cosmos DB as a target:

Source → Target
SQL → Azure Cosmos DB
MongoDB → Azure Cosmos DB
Cassandra → Azure Cosmos DB
Oracle → Azure Cosmos DB
MySQL → Azure Cosmos DB
PostgreSQL → Azure Cosmos DB
Salesforce → Azure Cosmos DB
HDFS → Azure Cosmos DB
MSJet (SQL Server CDC) → Azure Cosmos DB
Architecture
The architecture below shows how Striim can replicate data from a range of sources including heterogeneous databases to various targets on Azure. However, this blog will focus on Azure Cosmos DB.
Low-Impact Change Data Capture
Striim uses CDC (Change Data Capture) to extract change data from the database’s underlying transaction logs in real time, which minimizes the performance load on the RDBMS by eliminating additional queries.
Non-stop, non-intrusive data ingestion for high-volume data
Support for data warehouses such as Oracle Exadata, Teradata, Amazon Redshift; and databases such as Oracle, SQL Server, HPE Nonstop, MySQL, PostgreSQL, MongoDB, Amazon RDS for Oracle, Amazon RDS for MySQL
Real-time data collection from logs, sensors, Hadoop, and message queues to support operational decision making
Continuous Data Processing and Delivery
In-flight transformations – including denormalization, filtering, aggregation, enrichment – to store only the data you need, in the right format
Built-In Monitoring and Validation
Interactive, live dashboards for streaming data pipelines
Continuous verification of source and target database consistency
Real-time alerts via web, text, email
Use case: Replicating On-premises MongoDB data to Azure Cosmos DB
Let’s take a look at how to migrate data from MongoDB to the Azure Cosmos DB API for MongoDB within Striim. Using the new native Azure Cosmos DB connector, users can set properties such as collections, RUs, partition key, excluded collections, batch policy, and retry policy before replication.
To get started, in your Azure Cosmos DB instance, create a database mydb containing the collection employee with the partition key /name.
After installing Striim either locally or through the Azure Marketplace, you can take advantage of the Web UI and wizard-based application development to migrate and replicate data to Azure Cosmos DB in only a few steps.
Choose the MongoDB to Azure Cosmos DB app from the applications available in Striim.
Enter your source MongoDB connection details and select the databases and collections to be moved to Azure Cosmos DB.
Striim users can now choose among the Azure Cosmos DB target APIs: Mongo, Cassandra, or Core (SQL). Throughput (RU/s) and cost can be estimated using the Azure Cosmos DB capacity calculator, and an appropriate partition key must be chosen for the target. These details can be entered directly within Striim’s configuration wizard.
Enter the target Azure Cosmos DB connection details and map the MongoDB to Azure Cosmos DB collections.
That’s it! Striim handles the rest, from validating the connection string and the properties required for the data pipeline to automatically moving the data and validating it on the target. After completing the wizard, you’ll arrive at the Flow Designer page and start seeing data replicated in real time.
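Under the hood, the wizard generates a Striim application for this pipeline. A minimal sketch of what such an application might look like is shown below; the adapter property names and connection values are illustrative placeholders, not exact Striim syntax:

```sql
-- Illustrative sketch of a wizard-generated MongoDB-to-Cosmos DB pipeline.
-- Property names and connection strings are placeholders.
CREATE APPLICATION MongoToCosmos;

CREATE SOURCE MongoSource USING MongoDBReader (
  ConnectionURL: 'mongodb://localhost:27017',
  Collections: 'mydb.employee'
)
OUTPUT TO MongoStream;

CREATE TARGET CosmosTarget USING CosmosDBWriter (
  ServiceEndpoint: 'https://<account>.documents.azure.com:443/',
  AccessKey: '<key>',
  Collections: 'mydb.employee'
)
INPUT FROM MongoStream;

END APPLICATION MongoToCosmos;
```

Because the pipeline is just an application definition, it can be exported, versioned, and redeployed like any other code artifact.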
Let’s take another example: say we have an on-premises Oracle database with the customer table shown below. While migrating this Oracle database to Azure Cosmos DB, we may want to mask or hide the customer telephone number and SSN columns.
In two simple steps, we can achieve this in flight with Striim.
Step 1 – Create the app: Within the Striim UI, create an application with an Oracle Reader source. From the Event Transformers tab in the left-hand menu bar, drag and drop the To DB Event transformer onto the pipeline to convert the WAEvent type to a typed event. Then drag and drop the Field Masker component and select the fields to be masked; in our case, we want the telephone number field partially masked and the SSN fully masked. Lastly, drag and drop an Azure Cosmos DB target to write to Cosmos DB.
Step 2 – Run the app: Deploy and run the app, then check the target in Azure Cosmos DB Data Explorer; you should see that the customer phone number and SSN are masked.
Instead of using these out-of-the-box transformations within the UI, you can also write SQL statements using a Continuous Query (CQ), or Java code using an Open Processor (OP) component. The OP can also be used to merge multiple source documents into a single Azure Cosmos DB document. For our example, you can use the following SQL statement in a CQ instead of the two transformation components.
SELECT CUSTOMER_ID AS CUSTOMER_ID,
       FIRST_NAME AS FIRST_NAME,
       LAST_NAME AS LAST_NAME,
       CITY_NAME AS CITY_NAME,
       ADDRESS AS ADDRESS,
       maskCreditCardNumber(TELEPHONE_NUMBER, "ANONYMIZE_PARTIALLY") AS TELEPHONE_NUMBER,
       maskCreditCardNumber(SSN, "ANONYMIZE_COMPLETELY") AS SSN
FROM converted_events2 i;
Source table
Striim flow design
Target Cosmos DB Data explorer output
Benefits
Purpose-built service with specific configuration parameters to control scale, performance and cost
Driving continuous cloud service consumption through ongoing data flow (vs. scheduled batch load).
In-flight transformations – including denormalization, filtering, aggregation, enrichment – to store only the data you need, in the right format
Allowing low-latency data to be available in Azure for more valuable workloads.
Mitigating risks in Azure adoption by enabling a phased transition, where customers can use their existing and new Azure systems in parallel. Striim can move real-time data from customers’ existing data warehouses such as Teradata and Exadata, and on-prem or cloud-based OLTP systems, such as Oracle, SQLServer, PostgreSQL, MySQL, and HPE Nonstop using low-impact change data capture (CDC).
As a unified data streaming and integration company, the Striim platform sits at the heart of our customers’ data architecture. It is crucial that our customers trust our software, and our company, to always do the right thing from a security perspective.
With that in mind, we are thrilled to announce that Striim is now officially SOC 2 Type 1 certified.
A SOC 2 assessment report provides detailed information and assurance about an organization’s security, confidentiality, availability, processing integrity, and/or privacy controls, assessed against the American Institute of Certified Public Accountants (AICPA) Trust Services Principles and Criteria. A SOC 2 report is often the primary document that our customers’ security departments rely on to assess Striim’s ability to maintain adequate security.
SOC 2 compliance comes in two forms: the SOC 2 Type 1 report, which describes the design of the controls we have in place to meet relevant trust criteria at a specific point in time; and the SOC 2 Type 2 report, which details the operational effectiveness of those controls over a specified period of time. These reports are the results of audits performed by independent third parties, in our case Grant Thornton LLP.
We have completed SOC 2 Type 1 and are in the process of completing the requisite assessments over time to achieve SOC 2 Type 2.
To achieve this certification, we have undergone a year-long effort to ensure that our people, principles, and processes are fully aligned with the level of security our customers would expect from a SaaS company. This has involved investments in training and new technologies to help automate processes and protect infrastructure, and a lot of documentation, reporting, and continual internal reviews.
The scope of the report covers all people, systems, and processes involved in getting the Striim software into the hands of our customers, whether they are using Striim on-premise, in their own cloud environment, utilizing containers, or are one of the initial Striim Cloud private preview customers.
SOC 2 is not just a certification, it is a way of thinking, and a journey that requires a deep dive into everything you do. Completing this certification has given us the opportunity to solidify security as a number one operating principle within the company, and ensure that all actions involve security considerations. Now that we have all of the required controls in place, we are working diligently to show how we can maintain those controls throughout the year, as we work towards SOC 2 Type 2 certification. We’ll keep you posted.
Data Integration for Snowflake: Announcing Striim on Partner Connect
At Striim, we value building real-time data integration solutions for cloud data warehouses. Snowflake has become a leading Cloud Data Platform by making it easy to address some of the key challenges in modern data management.
It only took you minutes to get up and running with Snowflake. So, it should be just as easy to move your data into Snowflake with an intuitive cloud-based data integration service.
To give you an equally seamless data integration experience, we’re happy to announce that Striim is now available as a cloud service directly on Snowflake Partner Connect.
“Striim simplifies and accelerates the movement of real-time enterprise data to Snowflake with an easy and scalable pay-as-you-go model,” Director of Product Management at Snowflake, Harsha Kapre said. “With Striim now on Snowflake Partner Connect, customers can start loading their data in minutes with one-click access to a proven and intuitive cloud-based data integration service.”
John Kutay, Director of Product Growth at Striim, highlights the simplicity of Striim’s cloud service on Partner Connect: “We focused on delivering an experience tailored towards Snowflake customers; making it easy to bridge the gap between operational databases and Snowflake via self-service schema migration, initial data sync, and change data capture.”
A Quick Tutorial
We’ll dive into a tutorial on how you can use Striim on Partner Connect to create schemas and move data into Snowflake in minutes. We’ll cover the following in the tutorial:
Launch Striim’s cloud service directly from the Snowflake UI
Migrate schemas from your source database. We will be using MySQL for this example, but the steps are almost exactly the same for other databases (Oracle, PostgreSQL, SQLServer, and more).
Perform initial load: move millions of rows in minutes, all during your free trial of Striim
Kick off a real-time replication pipeline using change data capture.
Monitor your data integration pipelines with real-time dashboards and rule-based alerts
But first a little background!
What is Striim?
At a high level, Striim is a next generation Cloud Data Integration product that offers change data capture (CDC) enabling real-time data integration from popular databases such as Oracle, SQLServer, PostgreSQL and many others.
In addition to CDC connectors, Striim has hundreds of automated adapters for file-based data (logs, xml, csv), IoT data (OPCUA, MQTT), and applications such as Salesforce and SAP. Our SQL-based stream processing engine makes it easy to enrich and normalize data before it’s written to Snowflake.
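Conceptually, this kind of enrichment joins each in-flight event against a cached lookup table before it is written to the target. Here is a minimal Python sketch of the idea; the function and field names (`enrich_stream`, `customer_cache`) are illustrative, not Striim APIs:

```python
# Hypothetical sketch of stream enrichment: join each change event
# against a cached lookup table before writing it to the target.
# Names and event shapes here are illustrative, not Striim's API.

def enrich_stream(events, cache):
    """Yield each event with extra fields looked up from the cache."""
    for event in events:
        extra = cache.get(event["customer_id"], {})
        yield {**event, **extra}

customer_cache = {42: {"region": "EMEA", "tier": "gold"}}
events = [{"customer_id": 42, "amount": 99.5}]

enriched = list(enrich_stream(events, customer_cache))
# Each enriched event now carries region and tier alongside the raw fields.
```

In a streaming engine the cache would be kept fresh from the reference database rather than hard-coded, but the join itself works the same way.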
Our focus on usability and scalability has driven adoption by customers like Attentia, a Belgium-based HR and well-being company, and Inspyrus, a Silicon Valley-based invoice processing company, both of which chose Striim for data integration to Snowflake.
While many products focus on batch data integration, Striim specializes in helping you build continuous, real-time database replication pipelines using change data capture (CDC). This keeps the target system in sync with the source database to address real-time requirements.
Before we dive into an example pipeline, we’ll briefly go over the concept of Change Data Capture (CDC). CDC is the process of tailing the database’s change logs, turning database events such as inserts, updates, deletes, and relevant DDL statements into a stream of immutable events, and applying those changes to a target database or data warehouse.
Change data capture is also a useful software abstraction for other software applications such as version control and event sourcing.
Imagine Each Event as a Change to an Entry in a Database
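To make that concrete, here is a minimal Python sketch of CDC replay: an ordered stream of immutable insert/update/delete events is applied to a target keyed by primary key (a plain dict stands in for the target table; the event shape is illustrative, not Striim's internal format):

```python
# Minimal sketch of change data capture replay: an ordered stream of
# immutable change events is applied to a target keyed by primary key.
# The event shape is illustrative, not Striim's internal format.

def apply_changes(target, events):
    for ev in events:
        op, key = ev["op"], ev["key"]
        if op in ("insert", "update"):
            target[key] = ev["row"]       # upsert the latest row image
        elif op == "delete":
            target.pop(key, None)         # remove the row if present
    return target

events = [
    {"op": "insert", "key": 1, "row": {"name": "Ada"}},
    {"op": "update", "key": 1, "row": {"name": "Ada Lovelace"}},
    {"op": "insert", "key": 2, "row": {"name": "Alan"}},
    {"op": "delete", "key": 2},
]

state = apply_changes({}, events)
# Replaying the full stream reproduces the source table's current state.
```

Replaying the same ordered stream always reconstructs the same final state, which is what lets a CDC target stay in sync with its source.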
Striim brings decades of experience delivering change data capture products that work in mission-critical environments. The founding team at Striim was the executive (and technical) team at GoldenGate Software (now Oracle GoldenGate). Now Striim is offering CDC as an easy-to-use, cloud-based product for data integration.
Migrating data to Snowflake in minutes with Striim’s cloud service
Let’s dive into how you can start moving data into Snowflake in minutes using our platform. In a few simple steps, this example shows how you can move transactional data from MySQL to Snowflake. Let’s get started:
1. Launch Striim in Snowflake Partner Connect
In your Snowflake UI, navigate to “Partner Connect” by clicking the link in the top right corner of the navigation bar. There you can find and launch Striim.
2. Sign Up For a Striim Free Trial
Striim’s free trial gives you seven calendar days of the full product offering to get started. But we’ll get you up and running with schema migration and database replication in a matter of minutes.
3. Create your first Striim Service.
A Striim Service is an encapsulated SaaS application that dedicates the software and fully managed compute resources you need to accomplish a specific workload; in this case we’re creating a service to help you move data to Snowflake! We’re also available to assist you via chat in the bottom right corner of your screen.
4. Start moving data with Striim’s step-by-step wizards.
In this case, the MySQL to Snowflake wizard is selected. As you can see, Striim supports data integration for a wide range of database sources – all available in the free trial.
5. Select your schemas and tables from your source database
6. Start migrating your schemas and data
After selecting your tables, simply click ‘Next’ and your data migration pipeline will begin!
7. Monitor your data pipelines in the Flow Designer
As your data starts moving, you’ll have a full view into the amount of data being ingested and written into Snowflake including the distribution of inserts, updates, deletes, primary key changes and more.
For a deeper drill down, our application monitor gives even more insights into low-level compute metrics that impact your integration latency.
Real-Time Database Replication to Snowflake with Change Data Capture
Striim makes it easy to sync your schema migration and CDC applications.
While Striim makes it easy to build these pipelines, configuring CDC on most databases involves a few prerequisites that fall outside of Striim itself.
To use MySQLReader, the adapter that performs CDC, an administrator with the necessary privileges must create a user for use by the adapter and assign it the necessary privileges:
CREATE USER 'striim' IDENTIFIED BY '******';
GRANT REPLICATION SLAVE ON *.* TO 'striim';
GRANT REPLICATION CLIENT ON *.* TO 'striim';
GRANT SELECT ON *.* TO 'striim';
The MySQL 8 caching_sha2_password authentication plugin is not supported in this release; the mysql_native_password plugin is required. The minimum supported MySQL version is 5.5.
The REPLICATION privileges must be granted on *.*. This is a limitation of MySQL.
You may use any other valid name in place of striim. Note that by default MySQL does not allow remote logins by root.
Replace ****** with a secure password.
You may narrow the SELECT statement to allow access only to those tables needed by your application. In that case, if other tables are specified in the MySQLReader properties, Striim will return an error that they do not exist.
MYSQL BINARY LOG SETUP
MySQLReader reads from the MySQL binary log. If your MySQL server is using replication, the binary log is already enabled; otherwise, it may be disabled.
For MySQL, the property name for enabling the binary log, its default setting, and how and where you change that setting vary depending on the operating system and your MySQL configuration, so see the documentation for the version of MySQL you are running for instructions.
If the binary log is not enabled, Striim’s attempts to read it will fail with errors such as the following:
2016-04-25 19:05:40,377 @ -WARN hz._hzInstance_1_striim351_0423.cached.thread-2
com.webaction.runtime.Server.startSources (Server.java:2477) Failure in Starting
Sources.
java.lang.Exception: Problem with the configuration of MySQL
Row logging must be specified.
Binary logging is not enabled.
The server ID must be specified.
Add --binlog-format=ROW to the mysqld command line or add binlog-format=ROW to your
my.cnf file
Add --bin-log to the mysqld command line or add bin-log to your my.cnf file
Add --server-id=n where n is a positive number to the mysqld command line or add
server-id=n to your my.cnf file
at com.webaction.proc.MySQLReader_1_0.checkMySQLConfig(MySQLReader_1_0.java:605) ...
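The three requirements named in that error can be satisfied with a few lines of server configuration. A minimal my.cnf fragment might look like the following (the server-id value and log file base name are illustrative; consult the documentation for your MySQL version and operating system):

```ini
[mysqld]
server-id     = 1          # any positive number, unique within the replication topology
log-bin       = mysql-bin  # enables the binary log
binlog-format = ROW        # row-level logging, required for CDC
```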
Once those prerequisites are completed, you can run the MySQL CDC wizard and start replicating data from your database right where your schema migration and initial load left off.
Maximum Uptime with Guaranteed Delivery, Monitoring and Alerts
Striim gives your team full visibility into your data pipelines with the following monitoring capabilities:
Rule-based, real-time alerts where you can define your custom alert criteria
Real-time monitoring tailored to your metrics
Exactly-once processing (E1P) guarantees
Striim uses a built-in stream processing engine that allows high volume data ingest and processing for Snowflake ETL purposes.
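The exactly-once guarantee listed above is commonly achieved by checkpointing the last applied event position and skipping anything at or before it on restart, so replays after a failure are not double-applied. A hedged Python sketch of that pattern (illustrative only, not Striim's implementation):

```python
# Sketch of exactly-once delivery via position checkpointing: events carry
# a monotonically increasing position, and anything at or below the last
# checkpoint is skipped on replay. Illustrative only, not Striim's design.

class CheckpointedSink:
    def __init__(self):
        self.applied = []
        self.checkpoint = -1  # position of the last applied event

    def deliver(self, events):
        for pos, payload in events:
            if pos <= self.checkpoint:
                continue  # already applied before the failure; skip it
            self.applied.append(payload)
            self.checkpoint = pos

sink = CheckpointedSink()
sink.deliver([(0, "a"), (1, "b")])
# After a crash, the source replays from an earlier position:
sink.deliver([(1, "b"), (2, "c")])
# "b" is not applied twice, so downstream state stays correct.
```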
Conclusion
To summarize, Striim on Snowflake Partner Connect provides an easy-to-use cloud data integration service for Snowflake. The service comes with a 7-day free trial, giving you ample time to begin your journey to bridge your operational data with your Snowflake Data Warehouse.
We are pleased to announce the general availability of Striim 3.10.1, which includes support for new and enhanced cloud targets, extends manageability and diagnostics capabilities, and introduces new ease-of-use features to speed our customers’ cloud adoption. Key features released in Striim 3.10.1 are directly available through Snowflake Partner Connect to enable rapid movement of enterprise data into Snowflake.
This new release introduces many new features and capabilities. Let’s review its key themes and features, starting with the new and expanded cloud targets.
Striim on Snowflake Partner Connect
From Snowflake Partner Connect, customers can launch a trial Striim Cloud instance directly as part of the Snowflake on-boarding process from the Snowflake UI and load data, optionally with change data capture, directly into Snowflake from any of our supported sources. You can read about this in a separate blog.
Expanded Support for Cloud Targets to Further Enhance Cloud Adoption
The Striim platform has been chosen as a standard for our customers’ cloud adoption use-cases partly because of the wide range of cloud targets it supports. Striim provides integration with databases, data warehouses, storage, messaging systems and other technologies across all three major cloud environments.
A major enhancement is the introduction of support for the Google BigQuery Streaming API. This not only enables real-time analytics on large scale data in BigQuery by ensuring that data is available within seconds of its creation, but it also helps with quota issues that can be faced by high volume customers. The integration through the BigQuery streaming API can support data transfer up to 1GB per second.
In addition to this, Striim 3.10.1 also has the following enhancements:
Optimized delivery to Snowflake and Azure Synapse that facilitates compacting multiple operations on the same data to a single operation on the target resulting in much lower change volume
Delivery to MongoDB cloud and MongoDB API for Azure Cosmos DB
Delivery to Apache Cassandra, DataStax Cassandra, and Cassandra API for Azure Cosmos DB
Support for delivery of data in Parquet format to Cloud Storage and Cloud Data Lakes to further support cloud analytics environments
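The compaction idea in the first bullet above can be sketched as follows: within a batch, multiple operations on the same key collapse to one net operation before delivery. The event format below is illustrative, not Striim's:

```python
# Sketch of change-batch compaction: several operations on the same key
# within one batch collapse to a single net operation on the target.
# The event format is illustrative, not Striim's.

def compact(batch):
    net = {}
    for ev in batch:
        key = ev["key"]
        if ev["op"] == "delete":
            # a row inserted and then deleted in the same batch cancels out
            if net.get(key, {}).get("op") == "insert":
                net.pop(key)
            else:
                net[key] = {"op": "delete", "key": key}
        else:
            prev_op = net.get(key, {}).get("op")
            # an update to a row inserted in this batch is still a net insert
            op = "insert" if ev["op"] == "insert" or prev_op == "insert" else ev["op"]
            net[key] = {"op": op, "key": key, "row": ev["row"]}
    return list(net.values())

batch = [
    {"op": "insert", "key": 1, "row": {"qty": 1}},
    {"op": "update", "key": 1, "row": {"qty": 5}},
    {"op": "update", "key": 2, "row": {"qty": 7}},
]
compacted = compact(batch)
# Two operations on key 1 collapse into a single insert with the final row.
```

Delivering the compacted batch applies far fewer operations on the warehouse while producing the same end state.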
Schema Conversion to Simplify Cloud Adoption Workflows
As part of many cloud migration or cloud integration use-cases, especially during the initial phases, developers often need to create target schemas to match those of source data. Striim adds the capability to use source schema information from popular databases such as Oracle, SQL Server, and PostgreSQL and create appropriate target schemas in cloud targets such as Google BigQuery, Snowflake, and others. Importantly, these conversions understand data type and structure differences between heterogeneous sources and targets and act intelligently to spot problems and inconsistencies before progressing to data movement, simplifying cloud adoption.
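At its core, such conversion walks the source schema and maps each source column type to the closest target type, flagging anything without a clean equivalent before any data moves. A simplified Python sketch; the type map is illustrative and deliberately incomplete, not Striim's actual mapping:

```python
# Simplified sketch of heterogeneous schema conversion: map source column
# types to a cloud target's types and flag unmapped columns for review
# before data movement begins. The mapping table is illustrative only.

MYSQL_TO_BIGQUERY = {
    "INT": "INT64",
    "BIGINT": "INT64",
    "VARCHAR": "STRING",
    "DATETIME": "TIMESTAMP",
    "DECIMAL": "NUMERIC",
}

def convert_schema(columns):
    """Return (converted columns, names of columns needing manual review)."""
    converted, problems = [], []
    for name, src_type in columns:
        tgt_type = MYSQL_TO_BIGQUERY.get(src_type.upper())
        if tgt_type is None:
            problems.append(name)   # no clean equivalent: surface it early
        else:
            converted.append((name, tgt_type))
    return converted, problems

cols = [("id", "BIGINT"), ("name", "VARCHAR"), ("geo", "GEOMETRY")]
converted, problems = convert_schema(cols)
# "geo" has no mapping and is flagged before any data movement begins.
```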
Enhanced Monitoring, Alerting and Diagnostics
On-going data movement between on-premise and cloud environments for migrations, or powering reporting and analytics solutions, are often part of an enterprise’s critical applications. As such they demand deep insights into the status of all active data flows.
Striim 3.10.1 adds the capability to inherently monitor data from its creation in the source to successful delivery in a target, generate detailed lag reports, and alert on situations where lag is outside of SLAs.
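A lag check of this kind boils down to comparing each event's source timestamp with its delivery timestamp and alerting when the difference exceeds the SLA. A minimal sketch; the field names and the 5-second threshold are illustrative assumptions:

```python
# Sketch of end-to-end lag monitoring: compare when each event was created
# at the source with when it was delivered to the target, and flag events
# past an SLA threshold. Field names and the 5-second SLA are illustrative.

SLA_SECONDS = 5.0

def lag_report(events):
    """Return per-event lag and the subset of event ids breaching the SLA."""
    lags = [(e["id"], e["delivered_at"] - e["created_at"]) for e in events]
    breaches = [eid for eid, lag in lags if lag > SLA_SECONDS]
    return lags, breaches

events = [
    {"id": "e1", "created_at": 100.0, "delivered_at": 101.2},
    {"id": "e2", "created_at": 100.0, "delivered_at": 108.0},
]
lags, breaches = lag_report(events)
# e2 took 8 seconds end to end and would trigger an SLA alert.
```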
In addition, this release provides detailed status on checkpointing information for recovery and high availability scenarios, with insight into checkpointing history and currency.
Simplified Handling of Complex Data
As customers work with heterogeneous environments and adopt more complex integration scenarios, they often have to work with complex data types, or perform necessary data conversions. While always possible through user defined functions, this release adds multiple commonly requested data manipulation functions out of the box. This simplifies working with JSON data and document structures, while also facilitating data cleansing, and regular expression operations.
On-Going Support for Enterprise Sources
As customers upgrade their environments or adopt new technologies, it is essential that their integration platform keeps pace. In Striim 3.10.1 we extend our support for the Oracle database to include Oracle 19c, including change data capture, add support for schema information and metadata for Oracle GoldenGate trails, and certify our support for Hive 3.1.0.
This is a high-level view of the new features in Striim 3.10.1; there is a lot more to discover to aid your cloud adoption journey. If you would like to learn more about the new release, please reach out to schedule a demo with a Striim expert.