Striim 5.0 Release: Supercharge Your Data Integration with Microsoft Dataverse Writer

Organizations need to ensure their systems are integrated seamlessly for efficient decision-making. Striim’s 5.0 release introduces the Microsoft Dataverse Writer, a powerful target adapter that enhances how businesses integrate and write data to Microsoft Dataverse. With its ability to handle both standard and custom objects, Striim allows companies to unlock real-time insights and drive more informed decisions across their operations.

What Does It Do?

The Microsoft Dataverse Writer adapter is designed to write events to Microsoft Dataverse using the Dataverse Web API. By leveraging the OData standard REST interfaces of Dataverse, this adapter supports all basic Data Manipulation Language (DML) operations. It allows you to write events to both standard and custom objects within a Dataverse instance. The adapter can handle writing large groups of events (up to 1000 events in a single OData Batch API call), offering significant scalability for enterprises. Additionally, you can configure an expiration property for the batch, adding flexibility in managing data flow.
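
To make the batching concrete, here is a minimal sketch in Python of a Dataverse Web API OData $batch call with a changeset of inserts (the org URL, token, and payloads are placeholders; Striim's adapter manages this batching for you):

    import requests

    ORG = "https://yourorg.crm.dynamics.com"   # placeholder environment URL
    TOKEN = "<oauth-access-token>"             # obtained via one of the OAuth flows below

    BATCH = "batch_boundary"
    CHANGESET = "changeset_boundary"

    def insert_part(content_id: int, entity_set: str, payload: str) -> str:
        # One insert inside the changeset; Content-ID orders the operations.
        return (
            f"--{CHANGESET}\r\n"
            "Content-Type: application/http\r\n"
            "Content-Transfer-Encoding: binary\r\n"
            f"Content-ID: {content_id}\r\n\r\n"
            f"POST {ORG}/api/data/v9.2/{entity_set} HTTP/1.1\r\n"
            "Content-Type: application/json\r\n\r\n"
            f"{payload}\r\n"
        )

    body = (
        f"--{BATCH}\r\n"
        f"Content-Type: multipart/mixed; boundary={CHANGESET}\r\n\r\n"
        + insert_part(1, "accounts", '{"name": "Contoso"}')
        + insert_part(2, "accounts", '{"name": "Fabrikam"}')
        + f"--{CHANGESET}--\r\n"
        + f"--{BATCH}--\r\n"
    )

    resp = requests.post(
        f"{ORG}/api/data/v9.2/$batch",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": f"multipart/mixed; boundary={BATCH}",
            "OData-Version": "4.0",
        },
        data=body,
    )
    resp.raise_for_status()   # a failed changeset rolls back atomically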

How Do You Use It?

Using the Microsoft Dataverse Writer is simple and adaptable to your business needs. The adapter connects to a Dataverse instance using either a manual password credential grant flow or an automated OAuth flow with a public OAuth app. This flexibility ensures that you can securely and efficiently authenticate and authorize the connection between Striim and your Dataverse instance, giving you control over how data is managed and processed.
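
As an illustration of the manual password credential grant, here is a minimal sketch using Microsoft's MSAL library for Python (client ID, tenant, and user values are placeholders):

    import msal

    app = msal.PublicClientApplication(
        client_id="<public-app-client-id>",
        authority="https://login.microsoftonline.com/<tenant-id>",
    )

    # Resource Owner Password Credentials flow against Entra ID; the resulting
    # token authorizes calls to the Dataverse Web API.
    result = app.acquire_token_by_username_password(
        username="integration.user@yourorg.com",
        password="<password>",
        scopes=["https://yourorg.crm.dynamics.com/.default"],
    )
    access_token = result["access_token"]   # used as the Bearer token on API calls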

Want to dive deeper? Check out the doc and explore more.

How Does Striim Add Value?

Striim provides immense value by enabling real-time data flow with high throughput and low latency. This ensures that businesses can handle large-scale data with minimal delay, providing immediate insights and improving decision-making processes. For example, Striim can power hyper-personalized marketing campaigns by integrating data across platforms, prioritize high-value customer interactions through lead and account scoring, and optimize operations like customer support ticket management, pricing, and inventory.

Plus, Striim’s integration with Microsoft Dataverse enhances the flexibility and accuracy of your data by supporting both standard and custom objects. Although the adapter currently does not support relationships across tables (like foreign keys), the ability to specify key columns ensures data consistency and integrity across your operations.

Transform Your Business Today!

Striim’s Microsoft Dataverse Writer empowers businesses to harness the full potential of their Dataverse data, driving real-time insights and enabling smarter business strategies. 

Ready to power your business with real-time data? Try Striim today with a free trial or book a demo to see it in action.

Start Your Free Trial | Schedule a Demo

 

Striim 5.0 Release: Streamline Data Integration with ServiceNow Reader and Writer

Striim’s new 5.0 release introduces the ServiceNow Reader and Writer adapters, a game-changing feature that enhances how companies integrate and manage their data with ServiceNow. By seamlessly reading from and writing to ServiceNow’s platform, Striim enables businesses to optimize workflows and improve operational efficiency like never before.

What Does It Do?

The ServiceNow Reader and Writer adapters leverage ServiceNow’s REST APIs to read from and write to the tables and objects within a ServiceNow instance. The ServiceNow Reader emits WAEvents, which can then be propagated to various target systems supported by Striim. Whether it’s integrating with CRM systems like Salesforce or pushing data to analytics platforms, this powerful feature streamlines the movement of data and ensures that businesses have access to real-time insights.
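
For a concrete sense of the mechanics, here is a minimal sketch of the kind of ServiceNow Table API read the adapters build on (instance URL and credentials are placeholders; the reader handles pagination and offsets for you):

    import requests

    INSTANCE = "https://yourinstance.service-now.com"

    resp = requests.get(
        f"{INSTANCE}/api/now/table/incident",
        params={
            "sysparm_limit": 100,
            "sysparm_query": "ORDERBYDESCsys_updated_on",  # newest changes first
        },
        auth=("integration.user", "<password>"),
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()
    for record in resp.json()["result"]:
        print(record["number"], record["sys_updated_on"])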

How Do You Use It?

Using Striim’s ServiceNow Reader and Writer adapters is straightforward. The reader allows you to pull data from your ServiceNow platform in real-time, making it ideal for applications that require constant updates, such as customer service and IT operations. With this functionality, you can ensure that your systems are always working with the latest data, enabling more responsive and efficient operations.

Meanwhile, the writer enables reverse ETL functionality, allowing you to push insights from analytics systems back into ServiceNow. This makes it easier to implement use cases like Next Best Action or Real-Time Personalization. By doing so, your teams are provided with the most up-to-date information to act on, ensuring more informed decisions and optimized customer interactions.
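
The write path is the same Table API in reverse; here is a minimal sketch of pushing a scored insight back onto a ticket (the sys_id and the u_next_best_action custom field are hypothetical):

    import requests

    INSTANCE = "https://yourinstance.service-now.com"

    resp = requests.patch(
        f"{INSTANCE}/api/now/table/incident/<sys_id>",   # record to update
        json={"u_next_best_action": "offer_callback"},   # hypothetical custom field
        auth=("integration.user", "<password>"),
        headers={"Accept": "application/json"},
    )
    resp.raise_for_status()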

Want to dive deeper? Check out the doc and explore more.

How Does Striim Add Value?

Striim brings immense value by providing the fastest way to read from ServiceNow in real time, enhancing operational workflows and ensuring your data is always fresh. It also supports a wide range of data warehouses and data lake applications, enabling companies to move their ServiceNow data quickly for building analytical reports, identifying trends, and optimizing process automation. With Striim’s real-time data movement, businesses can extract greater value from their ServiceNow data, improving decision-making and operational efficiency.

Transform Your Business Today!

Striim’s ServiceNow Reader and Writer are the perfect tools to help you transform your business, ensuring that your workflows and data integration processes remain agile and efficient. 

Ready to power your business with real-time data? Try Striim today with a free trial or book a demo to see it in action.

Start Your Free Trial | Schedule a Demo

Striim 5.0 Release: Unlock Real-Time Marketing Insights with the Google Ads Reader

Real-time insights are crucial for making data-driven decisions and staying ahead of the competition. Striim 5.0’s latest feature, the Google Ads Reader, helps businesses unlock the full potential of their Google Ads data by providing seamless, real-time integration with analytical systems like BigQuery and Snowflake. Let’s dive into what this feature can do, how to use it, and the value Striim adds to your business.

What Does It Do?

The Google Ads Reader enables businesses to sync their Google Ads data in real time using the Google Ads API v15. This integration simplifies the process of reading, analyzing, and acting on your ads data by directly feeding it into powerful analytics platforms. Whether you’re analyzing campaign performance, user engagement, or ad effectiveness, Striim helps you create smart, efficient data pipelines that provide valuable insights for better decision-making.

How Do You Use It?

Striim’s Google Ads Reader supports three modes of operation, allowing you to choose the best approach based on your needs:

  1. Initial Load: Start by setting the mode in the Google Ads Reader to load your historical data, providing a baseline for further analysis.
  2. Incremental Load: Once the initial load is complete, use the “Incremental Load” mode for near real-time continuous replication. The adapter reads new source data at regular intervals, ensuring your insights stay up to date (a sketch of the kind of date-bounded query this implies follows this list).
  3. Automated Mode: In this mode, the reader completes the initial load at the application’s start time. After that, it automatically begins polling for incremental changes at the specified interval, making it a seamless and hands-off solution for ongoing data synchronization.
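
To illustrate what incremental polling implies at the API level, here is a minimal sketch of a date-bounded GAQL query using the google-ads Python client (customer ID and credentials file are placeholders; the reader manages this polling and its offsets for you):

    from google.ads.googleads.client import GoogleAdsClient

    client = GoogleAdsClient.load_from_storage("google-ads.yaml")  # your API credentials
    ga_service = client.get_service("GoogleAdsService")

    # Each poll reads a bounded date window; the next poll advances the window.
    query = """
        SELECT campaign.id, campaign.name, segments.date,
               metrics.impressions, metrics.clicks
        FROM campaign
        WHERE segments.date DURING YESTERDAY
    """
    for batch in ga_service.search_stream(customer_id="1234567890", query=query):
        for row in batch.results:
            print(row.campaign.name, row.segments.date, row.metrics.clicks)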

Want to dive deeper? Check out the doc and explore more.

How Does Striim Add Value?

Striim’s real-time data integration offers significant advantages for businesses looking to optimize their marketing strategies:

  • Real-Time Data Integration: Striim allows you to ingest and process data in real time, writing it directly into popular data warehouses like Google BigQuery and Snowflake. This enables faster, more informed decisions, helping you respond quickly to shifts in market dynamics and customer behavior.
  • Historical Data Analysis: With Striim, you can bypass Google Ads’ data retention limits and store historical data for long-term trend analysis. This feature allows you to track the performance of your ads campaigns over time, providing deeper insights into what works and what doesn’t.

Transform Your Business with Real-Time Insights

Striim’s Google Ads Reader makes it easy to integrate your Google Ads data into your analytics systems for smarter, data-driven marketing decisions. Ready to power your business with real-time data? Try Striim today with a free trial or book a demo to see it in action.

Start Your Free Trial | Schedule a Demo

 

Striim 5.0 Release: Unlock Real-Time Insights with the JIRA Reader Integration

Striim 5.0 brings exciting new features that streamline real-time data management and empower businesses to make data-driven decisions faster. Among these, the new Atlassian JIRA Reader stands out as a key innovation, enabling seamless integration with JIRA, a powerful issue tracking system widely used for bug tracking and project management. In this post, we’ll explore what the JIRA Reader does, how to use it, and the value Striim adds to your business.

What Does the JIRA Reader Do?

The JIRA Reader in Striim 5.0 facilitates the movement of data from a JIRA instance to any supported Striim target system. By leveraging JIRA’s REST APIs (version 9.11), it reads tables and objects from JIRA and emits WAEvents, which can be propagated to Striim’s target systems. Whether you’re analyzing project data, bug reports, or team performance, the JIRA Reader can help you manage and analyze this data in real-time. For a comprehensive list of supported targets, you can check out Striim’s documentation.

How Do You Use It?

Using the JIRA Reader is simple. The reader operates in “Automated” mode: it first performs an initial load of data from your JIRA instance at the application’s start time, then automatically begins polling for incremental changes at the configured interval, ensuring continuous data synchronization. This automated process saves time, reduces the need for manual intervention, and makes it easy to keep your data fresh and up to date.
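
As an illustration of what that polling looks like against JIRA’s REST API, here is a minimal sketch (instance URL, credentials, and interval are placeholders; the reader manages this loop and its offsets for you):

    import time
    import requests

    BASE = "https://jira.yourorg.com"            # placeholder JIRA 9.x instance
    AUTH = ("integration.user", "<password>")
    last_poll = "2024-01-01 00:00"               # start of the initial load window

    while True:
        poll_started = time.strftime("%Y-%m-%d %H:%M")
        resp = requests.get(
            f"{BASE}/rest/api/2/search",
            params={
                "jql": f'updated >= "{last_poll}" ORDER BY updated ASC',
                "maxResults": 100,
            },
            auth=AUTH,
        )
        resp.raise_for_status()
        for issue in resp.json()["issues"]:
            print(issue["key"], issue["fields"]["updated"])
        last_poll = poll_started                 # advance the incremental cursor
        time.sleep(300)                          # the configured polling interval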

Want to dive deeper? Check out the doc and explore more.

How Does Striim Add Value?

Striim’s JIRA Reader enhances your ability to manage large-scale data with high throughput and low latency. Real-time data flow ensures that you can process and analyze information as it comes in, offering immediate insights for decision-making.

For example, by using the JIRA Reader, you can create real-time reports on project progress, bug data, team velocity, and key performance indicators (KPIs). Writing JIRA project data into data lakes or warehouses enables you to identify patterns, potential problems, and areas for improvement. With this kind of actionable insight, your business can make smarter decisions, improve processes, and boost team productivity.

Transform Your Business Today!

The JIRA Reader feature, combined with Striim’s real-time capabilities, helps businesses stay agile and responsive in today’s fast-paced environment. By integrating your JIRA data with Striim’s platform, you unlock the power of data-driven decision-making and continuous improvement.

Ready to power your business with real-time data? Try Striim today with a free trial or book a demo to see it in action.

Start Your Free Trial | Schedule a Demo

 

Striim 5.0 Release: Unleash Real-Time HubSpot Integration with Our Latest Connector

As businesses increasingly rely on HubSpot’s customer platform for marketing, sales, customer service, and more, the ability to seamlessly move data in real time is a game changer. Striim 5.0 introduces the HubSpot Reader, a powerful connector that integrates your HubSpot CRM data with any target system. With Striim’s robust real-time data movement capabilities, you can unlock the full potential of your HubSpot data to enhance analytics, streamline operations, and improve customer experiences.

What Does It Do?

Striim’s HubSpot Reader uses HubSpot APIs to extract data from your CRM platform, emitting WAEvents that can be processed with continuous queries or directed to a Striim target. This connector supports three distinct modes:

  • Initial Load: Pulls all existing data from HubSpot, ideal for creating a foundational dataset.
  • Incremental Load: Captures and replicates new source data in near real-time, ensuring your systems are always up to date.
  • Automated Mode: Combines both approaches, completing an initial load before transitioning to incremental updates automatically.

This flexibility allows businesses to tailor data movement workflows to their specific needs, whether for one-time migrations or ongoing synchronization.
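
For a concrete sense of the incremental mode, here is a minimal sketch of a last-modified-date search against the HubSpot CRM v3 API (the token and sync timestamp are placeholders; the reader tracks this cursor for you):

    import requests

    resp = requests.post(
        "https://api.hubapi.com/crm/v3/objects/contacts/search",
        headers={"Authorization": "Bearer <private-app-token>"},
        json={
            "filterGroups": [{
                "filters": [{
                    "propertyName": "lastmodifieddate",
                    "operator": "GT",
                    "value": "1704067200000",   # ms epoch of the last sync point
                }]
            }],
            "properties": ["email", "firstname", "lastmodifieddate"],
            "limit": 100,
        },
    )
    resp.raise_for_status()
    for contact in resp.json()["results"]:
        print(contact["id"], contact["properties"]["email"])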

How Do You Use It?

Using the HubSpot Reader is simple and adaptable. After configuring the mode of operation, you can seamlessly connect to HubSpot to read data and direct it to any target supported by Striim, including leading data warehouses and data lakes.

For example:

  • Begin with an Initial Load to build a comprehensive dataset from your HubSpot environment.
  • Switch to Incremental Load to enable continuous replication, capturing changes like new leads, updated deals, or customer interactions.
  • Use Automated Mode to eliminate manual intervention, ensuring uninterrupted, real-time updates.

Want to dive deeper? Check out the doc and explore more.

How Does Striim Add Value?

Striim enhances the value of your HubSpot data by integrating it with various systems to support advanced analytics, improve customer support, and boost operational efficiency. With real-time data migration, businesses gain faster decision-making and up-to-date insights. Its seamless integration capabilities connect HubSpot to data warehouses, lakes, or operational systems, providing a unified view of your business. Additionally, Striim offers enhanced flexibility for diverse use cases, from creating analytical reports and optimizing marketing campaigns to elevating customer support workflows.

Experience the Power of Striim 5.0

Striim 5.0 takes HubSpot data integration to the next level with a connector that’s as flexible as it is powerful. Whether you’re laying the groundwork for data-driven initiatives or fine-tuning your existing workflows, the HubSpot Reader ensures real-time, reliable data movement.

Ready to power your business with real-time data? Try Striim today with a free trial or book a demo to see it in action.

Start Your Free Trial | Schedule a Demo

Striim 5.0 Release: Supercharge Customer Service with the Zendesk Reader

Real-time access to data is essential for delivering outstanding customer experiences. Striim’s 5.0 release introduces the Zendesk Reader, a powerful tool that enables businesses to seamlessly integrate their Zendesk data into their broader data ecosystem. This integration enhances decision-making and helps teams improve customer service efficiency by providing timely insights from their help desk management system.

What Does It Do?

The Striim Zendesk Reader ingests data from Zendesk’s cloud-based help desk platform and emits WAEvents, which can be processed through continuous queries or directed to any supported Striim target. By leveraging the Zendesk API, the reader reads the user’s objects and tables, delivering data directly to the Striim platform. This provides a streamlined way to access and use critical customer service data for business analytics and decision-making.

How Do You Use It?

The Zendesk Reader can be used in two modes: Initial Load and Incremental Load. For an initial load, set the mode in the Zendesk Reader accordingly, allowing you to extract all relevant Zendesk data for the first time. After the initial load, you can switch to “Incremental Load” mode for near real-time continuous replication. This mode enables the adapter to read new source data at regular intervals, ensuring that you always have the latest updates flowing through your systems.

To use the Zendesk Reader, you need access to a Zendesk instance, or an access token for the OAuth client registered to the instance. This ensures the necessary permissions are in place for data extraction and integration.
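
To make the incremental mode concrete, here is a minimal sketch against Zendesk’s incremental export API (subdomain, token, and start time are placeholders; the reader manages this cursor for you):

    import requests

    SUBDOMAIN = "yourcompany"
    start_time = 1704067200   # unix time of the last sync point

    resp = requests.get(
        f"https://{SUBDOMAIN}.zendesk.com/api/v2/incremental/tickets.json",
        params={"start_time": start_time},
        headers={"Authorization": "Bearer <oauth-access-token>"},
    )
    resp.raise_for_status()
    data = resp.json()
    for ticket in data["tickets"]:
        print(ticket["id"], ticket["updated_at"])
    start_time = data["end_time"]   # cursor for the next poll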

Want to dive deeper? Check out the doc and explore more.

How Does Striim Add Value?

Striim’s Zendesk Reader delivers immense value by enabling real-time data flow with high throughput and low latency. This ensures the seamless handling of large-scale data, giving businesses immediate access to valuable insights. By writing data in real time to a data warehouse, you can build a comprehensive Customer Data Platform (CDP) to enhance your customer insights and decision-making processes.

Plus, Striim empowers businesses to integrate Zendesk data with machine learning (ML) and analytics systems for advanced workflows like Next Best Action, LTV (Lifetime Value) Analysis, and churn analysis. These integrations allow you to anticipate customer needs and make data-driven decisions that improve customer satisfaction and retention.

Transform Your Business Today!

Ready to power your business with real-time data? Try Striim today with a free trial or book a demo to see it in action.

Start Your Free Trial | Schedule a Demo

 

Striim 5.0 Release: Unlocking the Power of Snowflake CDC for Real-Time Data Replication

What is Snowflake CDC?

Snowflake CDC (Change Data Capture) is a method that enables real-time data replication from Snowflake databases by tracking and capturing changes made to tables. Using a specialized Snowflake Reader, it enables continuous replication after an initial load, ensuring that any data manipulation language (DML) changes like inserts, updates, and deletes are identified and captured in near real-time.

What Does It Do?

The Snowflake Reader is designed to monitor and read changes occurring in a Snowflake database. It identifies changed rows through Snowflake’s CHANGES clause, querying the table over successive time intervals to pick up the latest modifications. This process is ideal for scenarios where keeping track of ongoing data modifications is essential for accurate analytics, reporting, or operational use cases.

The Snowflake Reader can capture both DML changes and certain limited DDL (Data Definition Language) changes, keeping your data in sync and allowing you to confidently use Snowflake as a dynamic, continuously updated data source.
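
For reference, here is a minimal sketch of the CHANGES-clause query pattern via the Snowflake Python connector (connection values and the table are placeholders; the table must have CHANGE_TRACKING enabled):

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="myaccount", user="striim_user", password="<password>",
        warehouse="COMPUTE_WH", database="SALES", schema="PUBLIC",
    )
    cur = conn.cursor()

    # Return every row change since the given offset; METADATA$ACTION and
    # METADATA$ISUPDATE distinguish inserts, updates, and deletes.
    cur.execute("""
        SELECT *
        FROM orders
          CHANGES (INFORMATION => DEFAULT)
          AT (TIMESTAMP => TO_TIMESTAMP_LTZ('2024-01-01 00:00:00'))
    """)
    for row in cur.fetchall():
        print(row)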

How Do You Use It?

  1. Initial Load: Start by using the standard Database Reader to perform the initial load of your Snowflake data.
  2. Continuous Replication: Once the initial load is complete, the Snowflake Reader takes over, enabling CDC to maintain ongoing updates in real time. This setup is beneficial for applications that require near real-time data synchronization, reducing latency and ensuring the data stays fresh.

Want to dive deeper? Check out the doc and explore more.

How Does Striim Add Value?

Striim’s Snowflake CDC functionality supports several high-impact use cases:

  • Reverse ETL: Many organizations need to read analytics results from Snowflake and apply those insights directly in operational systems like CRM, SCM, or other transactional databases. With Snowflake CDC, Striim enables this seamless reverse ETL process, allowing data like customer lifetime value (LTV) or churn predictions to be easily updated across systems.
  • Data Warehouse Consolidation: Companies with multiple departmental data warehouses can use Snowflake CDC to continuously sync data across these instances, ensuring a consistent and consolidated view at the corporate level.

Additional Highlights

  • Snowflake CDC Reader supports all Snowflake data types, except for the Vector type, making it flexible enough to handle diverse data requirements.

Ready to power your business with real-time data? Try Striim today with a free trial or book a demo to see it in action.

Start Your Free Trial | Schedule a Demo

 

Securing Data in Striim: Cryptographic Key Management in Striim, Field-Level Encryption, and Vault Integration

How Striim Keeps Enterprise-Grade Streaming Workloads Safe

The Streaming Data Security Challenge

At Striim, our approach to security encompasses protecting data both in motion and at rest, utilizing advanced encryption techniques and integrating with powerful tools.

In this blog, I’ll walk you through two small pieces of this architecture: how Striim secures sensitive data and metadata (a) in motion and (b) at rest.

The Striim Shield: Protecting Data In Motion

One of the standout features gaining traction in this landscape is field-level encryption within data pipelines. While some startups have made this their sole focus, Striim takes it a step further by offering this capability at no extra cost through our innovative practice known as encrypted streaming.

With Shield, you can easily encrypt one or more fields within a data stream with your own self-managed encryption key. This means that as your data flows downstream—eventually reaching its external target—those sensitive fields remain securely encrypted.

As an example, a field like an SSN (social security number) is encrypted by the Shield component as it passes through the pipeline, before it ever reaches the target.

So, how does this work?

The system architecture is centered around a Key Management Service (KMS) that maintains a master Key Encryption Key (KEK). This KEK is utilized to encrypt the Data Encryption Key (DEK), which is generated by the Striim server using the Tink cryptographic library.

The encryption process is optimized through a batching mechanism: a Tink client reuses a specific KEK across a predetermined number of events flowing through the data pipeline. This approach balances computational efficiency with security requirements.

The implementation serves two primary technical objectives:

  1. Encryption of data prior to warehouse ingestion, ensuring protection during network transit.
  2. Maintenance of data encryption within the warehouse, providing security for data at rest.

So, what about decryption? To decrypt the data, customers must utilize Tink in conjunction with the same KMS. This ensures that decryption is only possible for authorized entities possessing the correct cryptographic keys.
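
To ground the KEK/DEK flow, here is a minimal sketch of envelope encryption with Tink and a KMS-held KEK (the Google Cloud key URI and credentials are placeholders; this mirrors the architecture described above, not Striim’s internal code):

    from tink import aead
    from tink.integration import gcpkms

    aead.register()

    KEY_URI = "gcp-kms://projects/<p>/locations/<l>/keyRings/<r>/cryptoKeys/<kek>"
    kms_client = gcpkms.GcpKmsClient(KEY_URI, "<path-to-credentials.json>")
    remote_aead = kms_client.get_aead(KEY_URI)   # the KEK, held by the KMS

    # Envelope AEAD: a fresh AES256-GCM DEK encrypts the data locally; the KEK
    # wraps the DEK, and the wrapped DEK travels with the ciphertext.
    env_aead = aead.KmsEnvelopeAead(aead.aead_key_templates.AES256_GCM, remote_aead)

    ciphertext = env_aead.encrypt(b"123-45-6789", b"ssn-field")
    plaintext = env_aead.decrypt(ciphertext, b"ssn-field")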

Here’s how you can get your hands dirty with it in Striim:

  1. Create a connection profile which connects to the KMS
  2. Create a shield component in the flow designer which uses this connection profile
  3. Deploy & Run your application, and sit back and enjoy encrypted real-time data streaming! 

It’s important to note that because the data encryption key is sent over the network to be encrypted by the KMS, using this feature carries some performance cost in terms of speed.

For simpler encryption than Shield, Striim offers the WITH ENCRYPTION option during application creation. This secures data streams between Striim servers or from Forwarding Agents, especially useful when data sources are external to the cluster or private network. You can apply encryption at the application or flow level to balance security and performance, encrypting only sensitive data streams as needed.

The Striim Vault: Protecting Customer Metadata 

Now that we’ve discussed how data in motion is kept safe in Striim, let’s dive into how data is kept safe at rest. Striim maintains customer metadata in our Metadata Repository (MDR). This includes information such as pipeline source and target metadata, user information, pipeline deployment plans, and more. While this repository is crucial for storing various configuration details, we understand that certain information, such as passwords and table names, requires an extra layer of security.

When users input passwords for database connections or secure JDBC connections, Striim doesn’t simply store this sensitive information in plain text. Instead, Striim integrates seamlessly with external vaults, providing customers with a secure way to manage their sensitive data. This integration allows customers to avoid storing sensitive information directly in the MDR, and anything that is persisted there is encrypted first.

Our vault integration includes several advanced security features:

  • Token Auto-renewal: Customers can configure settings for Striim to automatically renew the vault token periodically. This ensures that even if a token is leaked, it becomes invalid after a certain time.
    The Striim server operates a timer thread within the JVM that regulates the frequency at which the HashiCorp Renew Endpoint is accessed.
  • Reference-based Storage: In Striim, we maintain a reference to the connection in the vault and the key that the customer wants to use as the password for their database or any other sensitive field.
  • Runtime Connection: Striim makes the connection to the vault at runtime. This means the sensitive value is only stored in memory and never persisted to the MDR. 

One of the many vaults we integrate with is the HashiCorp Vault. Here’s how it works:

  1. Customers maintain their keys and values on their HashiCorp server.

  2. In Striim, customers create a reference to their vault, providing only an access token to connect to the HashiCorp server rather than the actual credentials.

  3. Once the vault is created, customers can see their HashiCorp Vault keys, as well as the usage of the vault in Striim Application components such as Sources & Targets.

  4. Striim users can use this vault key as the password in their source or target (e.g., Oracle Reader). This ensures that the raw password is never stored in our MDR and is only maintained in memory for this connection (a sketch of this runtime lookup follows the list).

  5. Deploy and Run the Pipeline – This final step lets you run your data pipeline and drive mission-critical business applications securely, in real time!
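
For reference, here is a minimal sketch of that runtime lookup using HashiCorp’s hvac client (URL, token, and secret path are placeholders):

    import hvac

    client = hvac.Client(url="https://vault.yourorg.com:8200", token="<vault-token>")

    # Reference-based storage: only the vault path and key name are persisted;
    # the secret itself is resolved at runtime and held in memory.
    secret = client.secrets.kv.v2.read_secret_version(path="striim/oracle-source")
    db_password = secret["data"]["data"]["password"]

    # Token auto-renewal: a timer thread calls this periodically.
    client.auth.token.renew_self()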

Benefits of Using Vaults in Striim

  1. Enhanced Security: Sensitive data is never stored at rest in Striim’s MDR.
  2. Flexibility: Customers can use their preferred vault solution. We integrate with Google’s Secret Manager, HashiCorp Vault, Azure Key Vault, and even provide our own homegrown Striim Vault. 
  3. Reduced Risk: Keeping sensitive credentials in a completely segregated system designed for data protection allows for heightened security.

This post focused on protecting data in motion with Shield, and data at rest with vaults. In an era where data breaches can cost companies millions and erode customer trust, Striim’s commitment to secure data processing provides peace of mind. By addressing security at every stage of the data lifecycle, from ingestion to processing to storage, we enable businesses to harness the power of their data without compromising on safety.

This article only scratches the surface of how we keep your data safe. There are many more features in our new release, including automated PII detection, OAuth connectivity, SSO/SAML support, and more to safely activate your siloed data in real time.

Ready to learn more? Sign up for a demo today

Reinventing Data Governance for the AI Era: Embracing Automation and Intelligent Data Protection

As organizations increasingly rely on AI to drive innovation and efficiency, protecting sensitive data has become both a strategic necessity and a regulatory mandate. Traditional security measures, often reactive and manual, no longer suffice. Instead, we now stand at the cusp of a new era where data governance is automatic, intelligent, and built to match the speed of AI. 

Let’s explore how AI-driven sensitive data protection is transforming data security. Then, discover how Striim’s AI agents are leading the way in this revolution. 

The New Age of Data Governance 

Despite the widespread deployment of multiple API security products, recent surveys reveal a staggering statistic: 92% of organizations experienced an API-related security incident in the last year, with 57% encountering multiple incidents. This alarming reality underscores the limitations of traditional security measures and highlights the urgent need for more intelligent, automated solutions.

Historically, safeguarding sensitive data required a labor-intensive process: manual audits, constant monitoring, and a reactive approach to threats. However, today’s fast-moving data environment demands a radical shift. With the advent of AI-driven security, sensitive data can be detected, classified, and protected in real time. This proactive stance eliminates the need for constant manual oversight, helps organizations work towards compliance, and reduces the risk of human error.

Imagine a world where sensitive data moves through systems effortlessly, but never without oversight. Striim’s AI-powered approach ensures this by detecting and classifying data before it even reaches storage. Continuous scanning identifies sensitive data the moment it’s created—not after it’s stored—while proactive security mechanisms like real-time masking, encryption, or redaction safeguard the information from exposure. Striim enables businesses to instantly manage and protect sensitive data, making it possible to adhere to regulations like GDPR, CCPA, and HIPAA. The result? Data flows freely and securely, empowering businesses to focus on what matters most.

Enter Striim’s AI Agents Sentinel and Sherlock: Pioneering AI-Powered Data Governance

Striim’s AI agents, Sentinel and Sherlock, are pioneering tools that bring real-time, AI-powered governance to your data pipelines, increasing security without compromising performance.

Sherlock AI offers: 

  • Source Operation: Identifies sensitive data before it enters data pipelines—even in third-party-managed databases and SaaS environments.
  • Early Detection: Finds sensitive data before it moves, eliminating risk at the earliest stage.
  • Comprehensive Visibility: Works seamlessly across SaaS, cloud, and third-party environments to ensure full visibility.
  • Lightweight Scanning: Operates with zero performance impact, ensuring databases aren’t overloaded.
  • Automated Classification: Classifies financial, health, and identity-related PII automatically, providing real-time security insights.
  • Data Quality Monitoring: Detects data quality issues in real time, alerting teams when sensitive data appears in unintended locations.

Sentinel AI provides: 

  • In-Motion Protection: Provides real-time detection and protection of sensitive data as it moves across systems.
  • Accurate Detection: Spots PII anywhere in a record—even if it’s misplaced or mislabeled—beyond the scope of rules-based controls.
  • Exposure Prevention: Prevents data exposure when transferring information from internal systems to external platforms for analytics or exchange.
  • Compliance Support: Covers 25+ sensitive data types across the USA, Canada, UK, and India to address a range of compliance requirements.
  • Automated Actions: Executes policy-based actions such as encryption and masking (partial, full, regex-based) automatically.
  • Plug-and-Play UX: Easily integrated into your pipeline with a plug-and-play setup that requires only a few clicks.
  • Regulatory Governance: Supports businesses on their journey to meet GDPR, CCPA, HIPAA, and other regulatory requirements.

Together, Sherlock AI and Sentinel AI work to prevent sensitive data exposure before it happens, ensuring your operations remain secure and that your team is in full control of its data.

How AI-Powered Data Governance Works

Our process begins with Sherlock AI, which proactively identifies sensitive data at its source—before it moves. By scanning both structured and unstructured data across SQL, NoSQL, SaaS, and cloud databases, it detects and automatically classifies financial, health, and identity-related information that may present compliance challenges. 

As data moves, Sentinel AI validates it in real time using advanced pattern recognition and NLP, catching any mislabeled or misplaced data that traditional rules-based systems might overlook. Sentinel AI then applies automated protection measures—encrypting, masking, or blocking data based on business policies—to secure its movement between internal and external systems and prevent unintended processing of regulated information. 

Sentinel delivers live reporting via real-time dashboards that continuously monitor sensitive data exposure, security actions, and compliance. It uses predefined identifiers to detect, log, and protect sensitive information, while AI-driven metadata tags each event for effective tracking and auditing. With support for schema evolution, Sentinel easily adapts to new data sources, ensuring ongoing AI-powered data governance.

This continuous monitoring helps organizations stay audit-ready and compliant. Real-time dashboards provide complete visibility into data protection efforts, and Sentinel generates audit logs that align with regulations like GDPR, CCPA, HIPAA, and the EU AI Act. Additionally, it integrates with enterprise security tools such as SIEM, DLP, Datadog, and Snowflake Security to ensure a unified security framework.

The Impact of AI-Powered Automation on Data Governance 

By automating these processes, organizations no longer need to scramble after a potential data breach. Instead, security becomes a built-in feature of data management. Sensitive information is automatically shielded by AI agents as it moves through the enterprise ecosystem, whether in production environments, during testing, or throughout analytics workflows.

Automated authentication and connection processes also reduce strain on IT teams. This allows security professionals to shift their focus from routine monitoring to strategic initiatives, such as threat intelligence and proactive risk management. With Sentinel AI operating silently in the background, businesses can innovate without fear of compromising their sensitive data.

By ensuring that sensitive data is protected, organizations can also enhance customer trust. In addition, streamlined security processes translate into improved operational efficiency. Data flows remain uninterrupted, and the risk of security incidents is drastically minimized.

Moving Forward in the AI Era

The AI era requires businesses to rethink traditional approaches to data security. With the speed at which data moves and the sophistication of modern cyber threats, it’s clear that reactive measures are no longer sufficient. Automated, intelligent solutions are not just an option—they are a necessity. 

Get a demo today and discover how Striim can help you better protect your data. 

 

Real-Time RAG: Streaming Vector Embeddings and Low-Latency AI Search

Imagine searching for products on an online store by simply typing “best eco-friendly toys for toddlers under $50” and getting instant, accurate results—while the inventory is synchronized seamlessly across multiple databases. This blog dives into how we built a real-time AI-powered hybrid search system to make that vision a reality. Leveraging Striim’s advanced data streaming and real-time embedding generation capabilities, we tackled challenges like ensuring low-latency data synchronization, efficiently creating vector embeddings, and automating inventory updates.

We’ll walk you through the design decisions that balanced consistency, efficiency, and scalability and discuss opportunities to expand this solution to broader Retrieval-Augmented Generation (RAG) use cases. Whether you’re building cutting-edge AI search systems or optimizing hybrid cloud architectures, this post offers practical insights to elevate your projects.

What is RAG?

Retrieval-Augmented Generation (RAG) enhances the capabilities of large language models by incorporating external data retrieval into the generation process. It allows the model to fetch relevant documents or data dynamically, ensuring responses are more accurate and context-aware. RAG bridges the gap between static model knowledge and real-time information, which is crucial for applications that require up-to-date insights. This hybrid approach significantly improves response relevance, especially in domains like e-commerce and customer service.

Why Vector Embeddings and Similarity Search?

Vector embeddings translate natural language text into numerical representations that capture semantic meaning. This allows for efficient similarity searches, enabling the discovery of products even if queries differ from stored descriptions. Embedding-based search supports fuzziness, matching results that aren’t exact but are contextually relevant. This is essential for natural language search, as it interprets user intent beyond simple keyword matching. The combination of embeddings and similarity search improves the user experience by providing more accurate and diverse search results.

Why Real-time RAG Instead of Batch-Based Data Sync?

Real-time RAG ensures that inventory changes are reflected instantly in the search engine, eliminating stale or outdated results. Unlike batch-based sync, which can introduce latency, real-time pipelines offer continuous updates, improving accuracy for fast-moving inventory. This minimizes the risk of selling unavailable products and enhances customer satisfaction. Real-time synchronization also supports dynamic environments where product data changes frequently, aligning search capabilities with the latest inventory state.

How We Designed the Embedding Generator for Performance

In designing the Vector Embedding Generator, we addressed the challenges associated with token estimation, handling oversized input data, and managing edge cases such as null or empty input strings. These design considerations ensure that the embedding generation process remains robust, efficient, and compatible with various AI models.

Token Estimation and Handling Large Data

Google Vertex AI

Vertex AI simplifies handling large data inputs by silently truncating input data that exceeds the token limit and returning an embedding based on the truncated input. While this approach ensures that embeddings are always generated regardless of input size, it raises concerns about data loss affecting embedding quality. We have an ongoing effort to analyze how this truncation impacts embeddings and whether improvements can be made to mitigate potential quality issues.

OpenAI

OpenAI enforces strict token limits, returning an error if input data exceeds the threshold (e.g., 2048 or 3092 tokens). To handle this, we integrated a tokenizer library into the Embedding Generator’s backend to estimate token counts before sending data to the API. The process involves:

  1. Token Count Estimation: Input strings are tokenized to determine the estimated token count.
  2. Iterative Truncation: If the token count exceeds the model’s limit, we truncate the input to 75% of its current size and recalculate the token count. This loop continues until the token count falls within the model’s threshold.
  3. Submission to Model: The truncated input is then sent to OpenAI for embedding generation.

For instance, if an OpenAI model has a token limit of 3092 and the estimated token count for incoming data is 4000, the system will truncate the input to approximately 3000 tokens (75%) and re-estimate. This iterative process ensures compliance with the token limit without manual intervention.
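
A minimal sketch of that loop, assuming the tiktoken tokenizer with the cl100k_base encoding used by OpenAI’s embedding models:

    import tiktoken

    ENC = tiktoken.get_encoding("cl100k_base")

    def truncate_to_limit(text: str, limit: int = 3092) -> str:
        # Iteratively cut the input to 75% of its size until it fits the limit.
        while len(ENC.encode(text)) > limit:
            text = text[: int(len(text) * 0.75)]
        return text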

Handling Null or Empty Input

When generating embeddings, edge cases like null or empty input can result in API errors or undefined behavior. To prevent such scenarios, we adopted a solution inspired by discussions in the OpenAI developer forum: the use of a default vector.

Characteristics of the Default Vector:

  • Dimensionality: Matches the size of embeddings generated by the specific model (e.g., 1536 dimensions for OpenAI’s text-embedding-ada-002).
  • Structure: The vector contains a value of 1.0 at the first index, with all other indices set to 0.0.
    • Example: [1.0, 0.0, 0.0, …, 0.0]

By returning this default vector, we ensure the system gracefully handles cases where input data is invalid or missing, enabling downstream processes to continue operating without interruptions.
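
A minimal sketch of the fallback (the call_embedding_api wrapper is hypothetical):

    def default_vector(dims: int = 1536) -> list:
        # 1.0 at the first index, 0.0 everywhere else.
        return [1.0] + [0.0] * (dims - 1)

    def embed(text):
        if not text:                         # null or empty input: skip the API call
            return default_vector()
        return call_embedding_api(text)      # hypothetical wrapper around the model API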

Summary of Implementation:

  1. Preprocessing
    • Estimate token counts and handle truncation for models with strict token limits (OpenAI).
    • Allow silent truncation for models like Google Vertex AI but analyze its impact.
  2. Error Handling
    • For null or empty data, return a default vector matching the model’s embedding dimensions.
  3. Scalability
    • These mechanisms are integrated seamlessly into the Embedding Generator, ensuring compatibility across multiple data streams and models without manual intervention.

This design enables developers to generate embeddings confidently, knowing that token limits and edge cases are managed effectively.

Tutorial: Using the Embeddings Generator for Search

An e-commerce company aims to build an AI-powered hybrid search that enables users to describe their needs in natural language.

Their inventory management system runs on an Oracle database, and the storefront search database is maintained in Azure PostgreSQL.

Current problem statements are:

  1. Data Synchronization: Inventory data from Oracle must be replicated in real-time to the storefront’s search engine to ensure data consistency and avoid stale information.
  2. Vector Embedding Generation: Product descriptions need vector embeddings to facilitate similarity searches. The storefront must support storing and querying these embeddings.
  3. Real-Time Updates: Ongoing changes in the inventory (such as product details or stock updates) need to be reflected immediately in the search engine.
  4. Embedding Updates: Updates to products in the inventory should trigger real-time embedding regeneration to prevent outdated or inaccurate similarity search results.

Solution

Striim has all the necessary features for the use case and the problem statements described.

Striim Features:

  1. Readers: Capture the initial snapshot and ongoing real-time changes from the Oracle database.
  2. Embedding Generator: Creates vector embeddings for product descriptions.
  3. Writers: Deliver updated data and embeddings to the PostgreSQL database.

PostgreSQL with pgvector:

  1. Supports storing vector embeddings as a specific data type.
  2. Enables similarity search functionality directly within the database.

Note: The search engine for this solution is implemented in Python for demonstration purposes. OpenAI is used for embedding generation and for summarisation of the results.

Design Choices and Rationale

  1. Striim for Data Integration: Chosen for its seamless real-time change capture (CDC) from Oracle to PostgreSQL.
    1. An alternative would be an independent application or script that runs periodically, but developing it for both the initial load and change data capture would be inefficient and expensive, and it would be difficult to maintain over time as additional data sources need to be synced to the storefront database (Postgres).
  2. OpenAI Embeddings: Ensures high-quality embeddings compatible across pipeline stages.
  3. Striim Embedding generator: Enables in-flight embedding generation while the data is being synced instead of having to move the data separately and then generate and update the embedding.
    1. The alternative is separate listeners, triggers, or scripts in the storefront search database (Postgres) that generate and update the embeddings every time the data changes. This could be very expensive and inaccurate, as the data changes and the embedding updates can drift out of sync.
  4. pgvector: Facilitates native vector searches, reducing system complexity.
    1. The alternative design choice was a standalone vector database, but those lack the flexibility pgvector provides: with pgvector, the actual data and the embeddings live together in a relational setup, whereas a vector database requires maintaining metadata for each vector and cross-looking-up the actual data from a different database or source while summarising similarity search results. For example, the embeddings would live in one place that is queried with a similarity search, whereas price or rating-based filters would need to be applied elsewhere.
  5. Python Search Engine: Provides flexibility and integration simplicity.
    1. Python is a convenient choice for its library ecosystem; more details are included in the upcoming sections.

Future Work

  1. Expand embedding model options beyond OpenAI and make it generic
    1. This is already done for the Striim Embedding generator as it supports VertexAI as well. We could consider supporting self-hosted models.
  2. Expand the support for non-textual input.
  3. Expand the implementation to generically cover any use case and have the application interface integrated with Striim UI.
    1. The current implementation is a proof of concept, tightly coupled to the e-commerce use case and its data set.

Step-by-step instructions

Set up Striim Developer 5.0

  1. Sign up for Striim developer edition for free at https://signup-developer.striim.com/.
  2. Select Oracle CDC as the source and Database Writer as the target in the sign-up form.

Prepare the dataset

The dataset can be downloaded from this link:  https://raw.githubusercontent.com/GoogleCloudPlatform/python-docs-samples/main/cloud-sql/postgres/pgvector/data/retail_toy_dataset.csv

(selected set of 800 toys from the Kaggle dataset – https://www.kaggle.com/datasets/promptcloud/walmart-product-details-2020)

Have the above data imported into Oracle. A table definition along these lines works (a representative sketch, assuming the dataset’s product id, name, description, and price columns):

    CREATE TABLE AIDEMO.PRODUCTS (
        PRODUCT_ID   VARCHAR2(64) PRIMARY KEY,
        PRODUCT_NAME VARCHAR2(1024),
        DESCRIPTION  VARCHAR2(4000),
        LIST_PRICE   NUMBER(10, 2)
    );

Create Embedding generator

Go to StriimAI -> Vector Embeddings Generator and create a new Embeddings Generator

  1. Provide a unique name for the embedding generator.
  2. Choose OpenAI as the model provider.
  3. Enter the API key copied over from your OpenAI platform account – https://platform.openai.com/settings/organization/api-keys
  4. Enter “text-embedding-ada-002” as the model.
    1. The same model should also be used in the Python code that converts the user query into an embedding before the similarity search.
  5. Later in this blog, this embedding generator will be used in the pipeline to generate vectors via a Striim built-in function, generateEmbeddings().

Set up the automated data pipeline from Oracle to Postgres

Go to Apps -> Create An App -> Choose Oracle as Source and Azure Postgres as Target

Follow the guided wizard flow to configure the pipeline with the source and target connection details and the table selection. In the review screen, make sure to choose the “Save & Exit” option.

Note: Please follow the prerequisites for Oracle CDC from Striim doc – https://striim.com/docs/en/configuring-oracle-to-use-oracle-reader.html

The table used in Postgres (a representative definition; the essential parts are the pgvector extension and a 1536-dimension embedding column matching text-embedding-ada-002):

    CREATE EXTENSION IF NOT EXISTS vector;

    CREATE TABLE aidemo.products (
        product_id   VARCHAR(64) PRIMARY KEY,
        product_name VARCHAR(1024),
        description  TEXT,
        list_price   NUMERIC(10, 2),
        embedding    VECTOR(1536)
    );

Please note that the embedding column is an additional column that exists only in Postgres.

Customize the Striim pipeline

  1. Go back to the pipeline using Apps -> View All Apps
  2. Open the IL app RealTimeAIDemo_IL
  3. Open the reader configuration -> Disable “Create Schema” under “Schema and Data Handling” as we are creating the schema on Postgres already.

Click the output stream of the reader and add a Continuous Query (CQ) component to it to generate embeddings for the description column using the embedding generator we created above.

The CQ calls generateEmbedding() on the description value and places the result under the ‘embedding’ key of the userdata section of the source event, which can then be used to write to the embedding column of the Postgres table.

Click the target and change its input stream to the output of the CQ (OutputWithEmbedding). In the Tables property, add a mapping for the embedding column under Data Selection:

AIDEMO.PRODUCTS,aidemo.products ColumnMap(embedding=@USERDATA(embedding))

This will insert or update the ‘embedding’ column of the Postgres table with the generated vectors from the generateEmbedding() call. This column holds the vector value of the ‘description’ column value.

Perform the same steps for the CDC app in the pipeline as well.

  1. Go back to the pipeline using Apps -> View All Apps
  2. Open the CDC app RealTimeAIDemo_CDC and perform the same steps as done for the RealTimeAIDemo_IL application (i.e., adding the generateEmbedding() CQ and the embedding ColumnMap).

Create a Gen AI application using Python

The next step is to build a Python application that does the following:

  1. Accepts a user query in natural text for searching a product
  2. Converts the query to embeddings using OpenAI
  3. Performs a similarity search using the pgvector extension in Azure Database for PostgreSQL
  4. Returns a response generated using an LLM service with the top matching product details

  1. Please click here to download this Python application.
  2. asyncpg is used to connect to Postgres.
  3. OpenAIEmbeddings (langchain.embeddings) is used to generate embeddings for the user query.
    1. Please note that you need to use the same model as in the Striim embedding generator.
  4. langchain.chains.summarize is used for summarisation (model used: gpt-3.5-turbo-instruct).
  5. The query used for the similarity search is along these lines (a representative pgvector query; column names follow the dataset above, and a Python sketch of this search path follows the list):

    SELECT product_id, product_name, description,
           1 - (embedding <=> $1) AS similarity
    FROM aidemo.products
    WHERE 1 - (embedding <=> $1) > $2
    ORDER BY similarity DESC
    LIMIT $3

    1. A similarity threshold of 0.7 and a maximum of 5 matches were used during experimentation, but only the single closest result is picked for summarisation.
  6. gradio is used to present a simple UI.
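
Here is a minimal sketch of that search path from the Python application (assuming asyncpg and pgvector; query_embedding is the user query embedded with the same text-embedding-ada-002 model, serialized as a pgvector literal):

    import asyncpg

    async def find_products(pool: asyncpg.Pool, query_embedding: str):
        # query_embedding is a pgvector literal such as "[0.01, -0.02, ...]"
        async with pool.acquire() as conn:
            return await conn.fetch(
                """
                SELECT product_id, product_name, description,
                       1 - (embedding <=> $1::vector) AS similarity
                FROM aidemo.products
                WHERE 1 - (embedding <=> $1::vector) > $2
                ORDER BY similarity DESC
                LIMIT $3
                """,
                query_embedding, 0.7, 5,
            )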

Start the pipeline to perform the initial snapshot load

Once the initial load is complete, the pipeline will automatically transition to CDC phase. Verify the data in Postgres table and confirm that embeddings are stored as well.

Run the similarity search python application and verify that it fetches the results using the similarity search of pgvector.

Capture real-time changes from the Oracle database and generate vector conversions on the fly

Perform a simple change in your source Oracle database to give the product with id ‘20e597d8836d9e5fa29f2bd877fc3e0a’ a better description, for example (a hypothetical statement; table and columns as defined above):

    UPDATE AIDEMO.PRODUCTS
    SET DESCRIPTION = 'Eco-friendly wooden stacking toy for toddlers, painted with non-toxic dyes'
    WHERE PRODUCT_ID = '20e597d8836d9e5fa29f2bd877fc3e0a';
    COMMIT;

The Striim pipeline instantly captures the change and delivers it to the Postgres table along with the updated vector embedding.

Run the same query in the search interface and notice that a fresh result shows up.

Now the same query would result in a different product id since the query works against the recently updated data in the Oracle database.

There we go! Real-time integration and consistent data everywhere!

Conclusion

Experience the power of real-time RAG with Striim. Get a demo or start your free trial today to see how we enable context-aware, natural language-based search with real-time data consistency—delivering smarter, faster, and more responsive customer experiences!

References and credits

Inspired by the Google notebook which showcased the use case and also had reference to the curated dataset: Building AI-powered data-driven applications using pgvector, LangChain and LLMs
