A Guide to Seamless Data Fabric Implementation

Organizations are grappling with the increasing complexity and diversity of their data sources. Traditional approaches often fall short in addressing the challenges posed by disparate data silos, creating the need for a more cohesive and integrated solution. Enter Data Fabric — a paradigm that promises a unified, scalable, and agile approach to managing the intricacies of modern data.

What is Data Fabric? 

Data Fabric is a comprehensive data management approach that goes beyond traditional methods, offering a framework for seamless integration across diverse sources. It is not a standalone product but comprises key elements, including data integration, ensuring the smooth merging of data; data quality, maintaining high data standards; metadata management, organizing and understanding data context; and security, safeguarding data integrity. Together, these four elements form a cohesive fabric, unifying disparate data sources and providing organizations with a holistic and coherent perspective on their data landscape.

The 4 Key Pillars of Data Fabric

Data Integration: Breaking Down Silos
At the core of Data Fabric is the imperative need for seamless data integration. This element ensures the smooth merging of data from various sources, fostering a unified and comprehensive view. By dismantling data silos, organizations can promote collaboration and unlock valuable insights that were previously hidden in isolated pockets of information.

Data Quality: Building Trust in Information
Maintaining high standards for data quality, such as accuracy, consistency, and reliability, is paramount. By upholding data quality, organizations can trust the information they rely on for decision-making, fostering a data-driven culture built on dependable insights.

Metadata Management: Navigating the Data Landscape
Effective metadata management is the key to navigating the vast data landscape. This element involves organizing and understanding the context of data, enhancing discoverability and interpretability. With well-managed metadata, users can gain insights into the origin, structure, and relationships of integrated data, facilitating more informed decision-making.

Security: Safeguarding Data Integrity
Security is a non-negotiable aspect of the Data Fabric approach. It involves implementing robust measures to safeguard the integrity of data. By ensuring confidentiality and reliability through stringent security protocols, organizations can protect their data from unauthorized access, instilling trust in their data management practices.

How Striim Supports Data Fabric Implementation

While there are various ways to build a data fabric, the ideal solution simplifies the transition by complementing your existing technology stack. Striim serves as the foundation for a data fabric by connecting with legacy and modern solutions alike. Its flexible and scalable data integration backbone supports real-time data delivery via intelligent pipelines that span hybrid cloud and multi-cloud environments. 

  1. Real-Time Data Integration
    Striim provides a powerful streaming integration platform that employs change data capture (CDC) and streaming data processing to ensure data is captured and processed promptly, minimizing latency and delivering timely insights. Striim continuously ingests transaction data and metadata from on-premises and cloud sources. An in-memory streaming SQL engine transforms, enriches, correlates, and analyzes transaction event streams.
  2. Enhanced Data Quality
    Striim incorporates robust data quality measures such as validation rules and data cleansing processes. By enforcing data quality standards throughout the integration pipeline, Striim ensures the integrity and accuracy of data. Fresh data also means real-time decisions are based on the latest operational insights.
  3. Metadata-Driven Architecture
    Rich metadata management is at the core of Striim’s platform. It captures and utilizes metadata, including information on data lineage, quality, and transformations, providing a solid backbone for guiding activities within the data management system.
  4. Scalability and Flexibility
    Striim’s architecture is inherently modular, allowing it to scale by adding processing and storage resources as needed, without extensive additional planning or cost, saving both time and money. Whether a database schema changes, a node fails, or a transaction is larger than expected, Striim’s Intelligent Integration pipelines take corrective action the instant a problem arises.
  5. Security Measures
    Striim ensures end-to-end security in data streaming and integration. It offers encryption protocols, access controls, and monitoring features to safeguard sensitive information and address security concerns across the pipeline. Striim’s hybrid and multi-cloud vault securely stores passwords, secrets, and keys, and it also integrates seamlessly with third-party vaults such as HashiCorp Vault.
  6. AI Innovation Support
    Striim serves as a crucial component for organizations aiming to harness the power of Artificial Intelligence (AI). Its seamless integration capabilities align with Data Fabric’s role as a bedrock for AI initiatives, providing a unified view essential for training robust machine learning models.

Empowering GenAI Innovation

Data Fabric has emerged as a pivotal framework that goes beyond integration, offering a comprehensive solution for organizations aiming to harness the power of AI. At its core, Data Fabric serves as the bedrock for AI initiatives by seamlessly integrating diverse data sources, providing a unified view essential for training robust machine learning models. Organizations leveraging the synergies of GenAI and data fabric can unlock a multitude of advantages.

By enabling natural language access, these technologies empower organizations to democratize data, offering a ChatGPT-like interface for seamless queries. Addressing the complexities of data integration in hybrid and multi-cloud environments, generative AI and LLMs streamline real-time integration through automated code generation, supporting dynamic entity resolution and automated data mapping. Leveraging vector databases, these technologies enable groundbreaking similarity searches based on connected context within the data fabric, fostering data intelligence and uncovering untapped data assets.

Furthermore, they address the critical challenge of real-time data quality by automating anomaly detection, data cleansing, and validation, ensuring heightened overall data quality. Finally, in the realm of data security and governance, GenAI and data fabric automate processes such as discovery, classification, categorization, and data access in real time, establishing a foundation for secure and governed data management.

Implementation Strategies for Data Fabric in Your Organization

While the promises of Data Fabric are compelling, the road to implementation requires careful consideration and strategic planning. Organizations embarking on the journey of adopting Data Fabric should begin by conducting a comprehensive assessment of their existing data landscape. Understanding the current state of data sources, quality, and integration points is crucial to formulating an effective implementation strategy.

Collaboration between IT and business units is key during the implementation phase. Data Fabric is not just a technological solution but a holistic framework that requires alignment with the organization’s business goals. Engaging stakeholders from various departments ensures that the Data Fabric implementation is tailored to meet the specific needs and objectives of the organization.

Additionally, organizations should adopt an iterative approach to implementation, focusing on quick wins and gradually expanding the scope. This allows for continuous feedback and adjustments, ensuring that the Data Fabric evolves alongside the changing needs of the organization.

Real-World Applications of Data Fabric with Striim

To illustrate the real-world impact of Data Fabric, let’s explore a few use cases across different industries.

Revolutionize Patient Care: Seamless Data Integration in Healthcare
Healthcare institutions grapple with fragmented patient data across various systems. Implementing Data Fabric unifies electronic health records, diagnostic tools, and wearable device data in real time. This results in a comprehensive patient view, enhancing medical decision-making, personalized treatment plans, and accelerating medical research for breakthrough innovations.

Elevate Customer Experience: Real-time Insights in Retail Operations
Retail giants aim to enhance customer experience by integrating data from multiple sources, including sales transactions, customer behaviors, and inventory levels. With Data Fabric, the organization achieves real-time data integration, optimizing pricing strategies, improving inventory management, and ultimately delivering a seamless and personalized retail experience.

Ensure Regulatory Compliance: Robust Data Management in Financial Services
Financial institutions face the challenge of meeting stringent regulatory requirements. Data Fabric is implemented to ensure compliance by integrating and managing data with a focus on security and accuracy. This not only streamlines compliance processes but also enhances risk assessment, fraud detection, and personalized customer services in the fast-paced financial landscape.

Enhance Drug Discovery: Data Integration in Pharmaceutical Research
In the pharmaceutical industry, research teams grapple with the integration of diverse datasets critical for drug discovery. Data Fabric accelerates drug development by seamlessly integrating data from clinical trials, research studies, and external sources. This unified data approach promotes collaboration, data-driven decision-making, and accelerates the pace of innovation in pharmaceutical research.

Optimize Supply Chain: Real-time Data for Manufacturing and Logistics
Manufacturing companies seek to optimize their supply chain by integrating data from production processes, logistics, and inventory management. Data Fabric enables real-time data processing, providing a unified and up-to-date view of the entire supply chain. This results in improved operational efficiency, reduced lead times, and enhanced agility in responding to market demands.

Transforming Data Challenges with Data Fabric and Striim

The advent of Data Fabric emerges as a transformative force, offering a unified, scalable, and agile solution to the burgeoning challenges posed by disparate data sources. Comprising essential elements such as data integration, data quality, metadata management, and security, Data Fabric transcends traditional limitations. This cohesive framework not only breaks down data silos but also fosters a culture of collaboration, enabling organizations to make informed decisions based on a unified and comprehensive data landscape.

Ready to build a global and agile data environment that can track, analyze, and govern data across applications, environments, and users? Start using Striim for free today and scale limitlessly!

Enhancing Emergency Room Efficiency with Real-Time Data Analytics

In emergency rooms, where the stakes are highest and every second counts, having access to real-time patient data is not just a convenience—it’s a life-saving necessity. The ability to instantly process and act on critical information can drastically improve patient outcomes and, in many cases, make the difference between life and death. 

Given the fast-paced and unpredictable nature of emergency care, real-time data is a cornerstone of effective decision-making, resource allocation, and patient management. Here’s what you need to know about increasing ER efficiency with the help of real-time data analytics.

The Critical Role of Real-Time Data in Emergency Rooms

Emergency rooms are high-stakes environments where speed, accuracy, and resource management are paramount. However, managing large volumes of patient data, especially across multiple disconnected systems, can prove a cumbersome challenge. With the increasing complexity of patient care and the demand for faster, more precise decision-making, the integration of real-time data is a game-changer in improving emergency room operations.

Real-time data enables healthcare providers to gain immediate insights into patient conditions, providing a clearer picture of care needs and resource availability. This immediate access to data supports timely decisions, helps prioritize care based on urgency, and ensures that resources such as staff and medical equipment are optimally allocated.

Real-Time Emergency Room Data

Improved Decision-Making with Interactive Dashboards

In emergency rooms, clinicians must make critical decisions quickly. Real-time, interactive dashboards offer healthcare teams a dynamic view of patient conditions, available resources, and key operational metrics. These dashboards present data in a way that not only tracks patient flow but also reflects the real-time status of hospital resources like beds, staff availability, and medical equipment, providing healthcare practitioners with the information necessary to make the best decision — all in real time. 

Instead of having to wait for reports or updates from other departments, healthcare organizations have the information they need the moment they need it. Better yet, the data isn’t outdated as it would be with batch processing. With live data at their fingertips, clinicians can prioritize patient care more effectively and coordinate efforts across departments to reduce delays.

Streamlining Communication for Better Collaboration

Another way real-time data analytics enhance emergency room response is through improved communication. Effective communication is key in emergency rooms, where teams of healthcare professionals must work together seamlessly to deliver rapid care. However, without timely data, communication can break down, leading to mistakes and delays. Real-time data integration enhances communication by ensuring that all team members have immediate access to relevant, up-to-date information.

Whether it’s coordinating care with other departments or updating patients on their status, real-time insights allow for better collaboration, enabling healthcare providers to respond quickly and appropriately to changing patient needs.

Optimizing Resource Allocation and Workflow Efficiency

With healthcare facilities facing staffing shortages and growing patient numbers, optimizing resource allocation has never been more important. Real-time data integration allows hospitals to monitor resources in real time, ensuring that staff, equipment, and treatment areas are allocated where they are needed most.

By leveraging real-time data, hospitals can dynamically adjust their operations to match real-time patient volumes and needs. For example, bed availability and staffing levels can be adjusted as patient conditions evolve, helping to reduce wait times, improve patient care, and prevent overcrowding in emergency departments.

Looking Ahead: The Future of Real-Time Healthcare

The potential of real-time data in healthcare extends far beyond emergency rooms. From pharmacy order monitoring to proactive management of chronic conditions, the benefits of real-time data are transformative for all areas of healthcare. This level of data integration allows for more personalized care, faster treatments, and improved operational efficiency, contributing to both better patient outcomes and a more streamlined healthcare system overall.

Better Data, Better Patient Outcomes 

The ability to integrate and act on real-time data in emergency rooms is not a luxury—it’s a necessity for providing high-quality, patient-centered care. As healthcare systems continue to evolve, embracing real-time data analytics will be crucial in ensuring that hospitals can meet the demands of a modern, fast-paced healthcare environment. This technology not only enables immediate response times but also lays the groundwork for a more efficient, responsive, and patient-focused healthcare system.

Ready to discover how Striim can help your healthcare organization enhance emergency room efficiency and more? Get a demo today.

 

Operationalizing Machine Learning Through Streaming Integration – Part 1

I recently gave a presentation on operationalizing machine learning, entitled “Fast-Track Machine Learning Operationalization Through Streaming Integration,” at Intel AI Devcon 2018 in San Francisco. This event brought together leading data scientists, developers, and AI engineers to share the latest perspectives, research, and demonstrations on breaking barriers between AI theory and real-world functionality. This post provides an overview of my presentation.

Background

The ultimate goal of many machine learning (ML) projects is to continuously serve a proper model in operational systems to make real-time predictions. There are several technical challenges in putting this kind of machine learning operationalization into practice. First, efficient model serving relies on real-time handling of high data volume, velocity, and variety. Second, intensive real-time data pre-processing is required before feeding raw data into models. Third, static models cannot achieve high performance on dynamic data in operational systems, even if they are fine-tuned offline. Last but not least, operational systems demand continuous insights from model serving and minimal human intervention. To tackle these challenges, we need a streaming integration solution, which:

  • Filters, enriches and otherwise prepares streaming data
  • Lands data continuously, in an appropriate format for training a machine learning model
  • Integrates a trained model into the real-time data stream to make continuous predictions
  • Monitors data evolution and model performance, and triggers retraining if the model no longer fits the data
  • Visualizes the real-time data and associated predictions, and alerts on issues or changes

Striim: Streaming Integration with Intelligence

Figure 1. Overview of Striim

Striim offers a distributed, in-memory processing platform for streaming integration with intelligence. The value proposition of the Striim platform includes the following aspects:

  • It provides enterprise-grade streaming data integration with high availability, scalability, recovery, validation, failover, security, and exactly-once processing guarantees
  • It is designed for easy extensibility with a broad range of sources and targets
  • It contains rich and sophisticated built-in stream processors and also supports customization
  • It includes capabilities for multi-source correlation, advanced pattern matching, predictive analytics, statistical analysis, and time-window-based outlier detection via continuous queries on the streaming data
  • It enables flexible integration with incumbent solutions to mine value from streaming data

In addition, it is an end-to-end, easy-to-use, SQL-based platform with a wizard-driven UI. Figure 1 shows the overall Striim architecture for streaming integration with intelligence. The architecture enables Striim users to flexibly investigate and analyze their data and efficiently take actions while the data is moving.

Striim’s Solution for Fast-Track ML Operationalization

The advanced architecture of Striim enables us to leverage it to build a fast-track solution for operationalizing machine learning. Let me walk you through the solution in this blog post using a case of network traffic anomaly detection. In this use case, we deal with three practical tasks. First, we detect abnormal network flows using an offline-trained ML model. Second, we automatically adapt model serving to data evolution to keep a low false positive rate. Third, we continuously monitor the network system and alert on issues in real time. Each of these tasks corresponds to a Striim application. For a better, hands-on understanding, I recommend downloading the sandbox, which has Striim installed and these three applications added. You can also download full instructions to install and work with the sandbox.

Abnormal network flow detection

Figure 2. Striim Flow of Network Anomaly Detection

We utilize a one-class Support Vector Machine (SVM) to detect abnormal network flows. One-class SVM is a widely used anomaly detection algorithm. It is trained on data that has only one class, the normal class; it learns the properties of normal cases and accordingly predicts which instances are unlike the normal ones. It is appropriate for anomaly detection because there are typically very few examples of the anomalous behavior in the training data set. We assume that there is an initial one-class SVM model trained offline on historical network flows with specific features. This model is then served online to identify abnormal flows in real time. This task requires us to perform the following steps.

  1. Ingest raw data from the source (Fig. 2 a);

For ease of demonstration, we use a CSV file as the source. Each row of the CSV file represents a network flow with robust features generated from a tcpdump analyzer. Striim users simply need to designate the file name and the directory where the file is located, and then select DSVParser to parse the CSV file. These configurations can be written in TQL, a SQL-based language. Alternatively, the Striim web UI can guide users through the configuration. Note that you can work with virtually any other source in practice, such as NetFlow, databases, Kafka, security logs, etc.; the configuration is equally straightforward.

  2. Filter the valuable data fields from data streams (Fig. 2 b);

Data may contain many fields, and not all of them are useful for a specific task. Striim enables users to filter the valuable data fields for their tasks using standard SQL within a continuous query (CQ). The SQL code of this CQ is shown below: 44 features plus a timestamp field are selected and converted to specific types, and an additional field, "NIDS", is added to identify the purpose of the data. In addition, we pause for 15 milliseconds at each row to simulate a continuous data stream.

SELECT "NIDS", TO_DATE(TO_LONG(data[0])*1000), TO_STRING(data[1]), TO_STRING(data[2]),
       TO_Double(data[3]), TO_STRING(data[4]), TO_STRING(data[5]), TO_STRING(data[6]),
       TO_Double(data[7]), TO_Double(data[8]), TO_Double(data[9]), TO_Double(data[10]),
       TO_Double(data[11]), TO_Double(data[12]), TO_Double(data[13]), TO_Double(data[14]),
       TO_Double(data[15]), TO_Double(data[16]), TO_Double(data[17]), TO_Double(data[18]),
       TO_Double(data[19]), TO_Double(data[20]), TO_Double(data[21]), TO_Double(data[22]),
       TO_Double(data[23]), TO_Double(data[24]), TO_Double(data[25]), TO_Double(data[26]),
       TO_Double(data[27]), TO_Double(data[28]), TO_Double(data[29]), TO_Double(data[30]),
       TO_Double(data[31]), TO_Double(data[32]), TO_Double(data[33]), TO_Double(data[34]),
       TO_Double(data[35]), TO_Double(data[36]), TO_Double(data[37]), TO_Double(data[38]),
       TO_Double(data[39]), TO_Double(data[40]), TO_Double(data[41]), TO_Double(data[42]),
       TO_Double(data[43]), TO_Double(data[44])
FROM dataStream c
WHERE PAUSE(15000L, c)

  3. Preprocess data streams (Fig. 2 c, d);

To ensure that the SVM performs effectively, the numerical features need to be standardized. The mean and standard deviation values of these features are stored in a cache (c) and used to enrich the data streams output from step (b). Standardization is then performed in step (d).
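As a rough illustration of what this standardization step computes (not Striim’s actual CQ or cache schema — the class and field names below are assumptions), each numerical feature is converted to a z-score using the cached statistics:

// Minimal sketch of the standardization applied in step (d); the per-feature
// statistics lookup mirrors the role of the cache in Fig. 2 (c), but the names
// here are illustrative assumptions.
public final class FeatureStandardizer {

    // Cached per-feature statistics (analogous to the lookup cache in step (c)).
    private final double[] means;
    private final double[] stdDevs;

    public FeatureStandardizer(double[] means, double[] stdDevs) {
        this.means = means;
        this.stdDevs = stdDevs;
    }

    // z = (x - mean) / stdDev for every numerical feature of the event.
    public double[] standardize(double[] features) {
        double[] standardized = new double[features.length];
        for (int i = 0; i < features.length; i++) {
            double sd = stdDevs[i] == 0.0 ? 1.0 : stdDevs[i]; // guard against zero variance
            standardized[i] = (features[i] - means[i]) / sd;
        }
        return standardized;
    }
}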

  4. Aggregate events within a given time interval (Fig. 2 e);

Suppose that the network administrator does not want to be overwhelmed with alerts and instead cares about a summary for a given time interval, e.g., every 10 seconds. We can use a time-bounded (10-second) jumping window to aggregate the pre-processed events. The window size can be flexibly adjusted according to the specific system requirements.
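Conceptually, a jumping (tumbling) window groups events into fixed, non-overlapping time buckets and emits each bucket once it closes. The following is a minimal, self-contained sketch of that idea in Java; it only illustrates the windowing concept and is not Striim’s window implementation:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual sketch of a 10-second jumping (tumbling) window: events are grouped
// into fixed, non-overlapping time buckets keyed by bucket start time.
public final class JumpingWindow {

    private static final long WINDOW_MILLIS = 10_000L;

    // Maps the start of each 10-second bucket to the events that fall into it.
    private final Map<Long, List<double[]>> buckets = new HashMap<>();

    public void add(long eventTimeMillis, double[] features) {
        long bucketStart = (eventTimeMillis / WINDOW_MILLIS) * WINDOW_MILLIS;
        buckets.computeIfAbsent(bucketStart, k -> new ArrayList<>()).add(features);
    }

    // Returns and removes the events of the most recently closed 10-second bucket.
    public List<double[]> drainClosedBucket(long nowMillis) {
        long closedBucketStart = ((nowMillis / WINDOW_MILLIS) - 1) * WINDOW_MILLIS;
        List<double[]> events = buckets.remove(closedBucketStart);
        return events != null ? events : new ArrayList<>();
    }
}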

  5. Extract features and prepare for model input (Fig. 2 f);

Event aggregation not only prevents information overload but also facilitates efficient in-memory computing. Such an operation enables us to extract a list of inputs, where each input contains a specific number of features, and to feed all inputs into the analytics model to get all of the results at once. If analytics is done by calling remote APIs (e.g., a cloud ML API) instead of in-memory computing, aggregation additionally decreases the communication cost.

  6. Detect anomalies using an offline-trained model (Fig. 2 g);

We utilize the one-class SVM algorithm from the Weka library to perform anomaly detection. An SVM model is first trained and fine-tuned offline using historical network flow data, and the model is then stored as a local file. Striim allows users to call the model in the platform by writing a Java function that specifies how the model is used and then wrapping it into a jar file. When new network flows stream into the platform, the model can be applied to the data streams to detect anomalies in real time.
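The Java wrapper itself is not shown in this post, but a minimal sketch might look like the following. It assumes the offline-trained model was serialized with Weka’s SerializationHelper; the file path, attribute names, and class labels are illustrative assumptions rather than the actual application code:

import weka.classifiers.Classifier;
import weka.core.Attribute;
import weka.core.DenseInstance;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.SerializationHelper;

import java.util.ArrayList;

// Hypothetical sketch of the Java wrapper described above: it loads a serialized
// Weka one-class SVM model and scores standardized feature vectors. Attribute
// names and class labels are assumptions for illustration only.
public final class AnomalyScorer {

    private final Classifier model;
    private final Instances header; // dataset structure the model expects

    public AnomalyScorer(String modelPath, int numFeatures) throws Exception {
        model = (Classifier) SerializationHelper.read(modelPath);

        ArrayList<Attribute> attrs = new ArrayList<>();
        for (int i = 0; i < numFeatures; i++) {
            attrs.add(new Attribute("f" + i)); // numeric feature attributes
        }
        ArrayList<String> classValues = new ArrayList<>();
        classValues.add("normal");
        classValues.add("anomaly");
        attrs.add(new Attribute("class", classValues));

        header = new Instances("networkFlows", attrs, 0);
        header.setClassIndex(header.numAttributes() - 1);
    }

    // Returns true if the model flags the standardized feature vector as anomalous.
    public boolean isAnomalous(double[] standardizedFeatures) throws Exception {
        Instance inst = new DenseInstance(header.numAttributes());
        inst.setDataset(header);
        for (int i = 0; i < standardizedFeatures.length; i++) {
            inst.setValue(i, standardizedFeatures[i]);
        }
        double predicted = model.classifyInstance(inst);
        return !"normal".equals(header.classAttribute().value((int) predicted));
    }
}

A function like isAnomalous could then be exported from the jar and invoked from a continuous query on each aggregated batch of feature vectors.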

  7. Persist anomaly detection results into the target (Fig. 2 h).

The anomaly detection results can be persisted into a wide range of targets, such as databases, files, Kafka, Hadoop, and cloud environments. Here we choose to persist the results in local files. After deploying and running this first application, you can see the intermediate results by clicking each stream in the flow, and see the final results continuously being added to the target files, as shown in Fig. 3.

In part 2 of this two-part post, I’ll discuss how you can use the Striim platform to update your ML models. In the meantime, please feel free to visit our product page to learn more about the features of streaming integration that can support operationalizing machine learning.

 
