How Striim’s Data-Streaming Capabilities Help Tackle These 4 Data Governance Challenges

Today, organizations are building data architectures and infrastructures around real-time streaming data, making data governance more crucial than ever. When large volumes of data must be processed in near real time, streaming data — data in motion — is an excellent option.

Governing data at rest (i.e., data stored in databases) was never easy in the first place; now organizations face a tougher challenge with the rise of data in motion (i.e., data moved between different sources and environments). As more and more data is streamed in real time, managing streams spread across multiple sources and apps (e.g., databases and CRMs) takes the challenge to a whole new level.

There is a gap in the data governance space: organizations use governance tools that are better suited to managing data at rest, yet the growing adoption of big data analytics means that managing data in motion is more crucial than ever. Striim has prioritized closing this gap by introducing solutions to some of the most common data governance challenges.

Challenge #1: Lack of visibility into your data

Data security is one of the major data governance challenges. But you can only secure data that you’re aware of. That’s why a data governance team always starts with a core objective: identify and classify the data that exists within the enterprise.

It’s common for businesses to have complex, sprawling data environments. Developing a framework to discover data sources on a continuous basis is a tough nut to crack, and keeping that data categorized is equally challenging.

Solution: Striim enables data discovery

To discover and classify your data, you have to answer a few questions, such as:

  • Where is the data located?
  • Who can access the data?
  • How long will you keep the data?

Striim can help you resolve these issues by bringing your streaming data into a centralized location (e.g., a data warehouse) where a data catalog solution can provide a bird’s-eye view of all the data within your organization. Furthermore, Striim allows you to enrich your data with reference data to make it more meaningful. For example, a B2B company may use a relational database to store order information. With a normalized schema, many of the data fields are in the form of IDs; e.g., the “Orders” table may have a column for “Sales Rep ID”. Striim can add valuable context to the “Orders” data by adding sales rep names and emails (from the “SalesRep” table) to the streaming data en route to a data warehouse.

[Image] Striim enriches the “Orders” stream with cached data from the “SalesRep” table (name, email).
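To make the pattern concrete, here is a minimal Python sketch of the same enrichment step. It illustrates the general technique only, not Striim’s actual pipeline syntax, and the field names simply follow the hypothetical “Orders”/“SalesRep” example above:

    # Reference data cached from the hypothetical "SalesRep" table.
    sales_rep_cache = {
        101: {"name": "Dana Lee", "email": "dana.lee@example.com"},
        102: {"name": "Sam Ortiz", "email": "sam.ortiz@example.com"},
    }

    def enrich_order(order: dict) -> dict:
        """Join an incoming order event with cached sales rep reference data."""
        rep = sales_rep_cache.get(order["sales_rep_id"], {})
        return {**order, "sales_rep_name": rep.get("name"), "sales_rep_email": rep.get("email")}

    # Each event in the "Orders" stream is enriched en route to the warehouse.
    print(enrich_order({"order_id": 9001, "amount": 250.0, "sales_rep_id": 101}))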

Once the data is centralized, a data catalog arranges it into an easy-to-understand format so data users can find and work with it readily. It also addresses multiple data governance issues. For example, your data catalog can connect siloed data, which helps fix data inconsistencies and improve data quality. Or you can use it to control data for compliance.

You can use these catalogs to store metadata and integrate them with data collaboration, management, and search tools (e.g., Tableau, Elasticsearch) so your users can locate and utilize relevant data instantly. A catalog also provides context for your data roles. For instance, your data scientists can use a data catalog to find and understand a dataset, uncovering crucial insights such as market trends, correlations, hidden patterns, and customer preferences that help your business make informed decisions.

And if you’re using Confluent’s streaming platform, Striim integrates seamlessly with Confluent’s schema registry and serialization layer. This enables you to stream data (from databases and other sources) into Confluent and leverage their recently released Stream Governance features.

Challenge #2: Loose permissions for data access

Often, organizations grant extremely broad permissions to their data teams for data access. As a result, the lines between the responsibilities of data governance roles like chief data officer, data custodian, data steward, data trustee, data owner, and data user are blurred. This lack of access control can also make it difficult to minimize data privacy risks and maximize accountability.

Some organizations manage their data governance by tracking data access through access logs, but this presents its own set of challenges. That’s because each data technology comes with its own log system that stores varying information. You also need context to understand these logs, such as who accessed the data and what they did with it. This context is often stored in various tools that are incompatible with access logs. There’s a clear need for a better solution.

Solution: Striim offers role-based access control to enforce better control over data

Roles and responsibilities form the cornerstone of an effective data governance strategy. Data governance holds people accountable for performing the right set of actions at the right time. To do this, Striim can help with the definition and deployment of roles that are suited to the organizational structure and culture. This is done through role-based access control (RBAC), allowing you to control what your business users can do at both granular and broad levels.

For example, you can designate whether the user is a data custodian, data trustee, or data user and assign roles and data access permissions based on employees’ positions in your organization.

The main objective of RBAC is to provide a framework that lets organizations set and enforce access control policies for their data, which helps to streamline data governance. It grants permissions so that employees receive just enough access to do their jobs, and no more.

With Striim, you can set roles and privileges to access all objects. An object in Striim can be many things, including sources, targets, streams, flows, and so on.

Your admins can define roles with different access levels and controls on objects, such as:

  • A group of users who can create and edit any type of object.
  • A group of users who can copy and read data from objects but aren’t allowed to edit them.

For example, you may have a connector that reads data containing PII (personally identifiable information). You can create a specific permission to read the objects that contain PII and assign it only to users with that level of authorization.

[Image] In Striim, user permissions control which actions given users can take on different object types (for example, streams or sources).
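Conceptually, RBAC boils down to checking a user’s roles against a permission table before allowing an action on an object type. The Python sketch below shows that check in generic form; the role names and object types are invented for illustration and are not Striim’s internal model:

    # Hypothetical permission table: role -> set of (action, object_type) pairs.
    ROLE_PERMISSIONS = {
        "editor": {("create", "stream"), ("edit", "stream"), ("create", "source")},
        "reader": {("read", "stream"), ("read", "source")},
        "pii_reader": {("read", "pii_source")},  # only this role can read PII sources
    }

    def is_allowed(user_roles, action, object_type):
        """Return True if any of the user's roles grants the requested action."""
        return any((action, object_type) in ROLE_PERMISSIONS.get(role, set())
                   for role in user_roles)

    print(is_allowed(["reader"], "edit", "stream"))            # False
    print(is_allowed(["reader", "editor"], "edit", "stream"))  # True
    print(is_allowed(["reader"], "read", "pii_source"))        # False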

Challenge #3: Teams share the same data implementation

A common data governance challenge arises when different departments use the same application for their data-related tasks. The data team configures a data infrastructure or system that several other departments rely on to collect accurate data. A lack of communication, however, can lead to one team inadvertently breaking the system for everyone.

For instance, suppose multiple teams share the same website and data analytics infrastructure. The IT team’s goal will be to use the analytics data to improve the website’s functionality and security. The marketing team, on the other hand, will be crunching the numbers to find ways to improve customer experience. Unfortunately, the difference in these goals can lead to ongoing blunders, such as double-tagging or interrupted customer journeys.

Solution: Striim uses apps and app groups to divide workloads

You can use apps and app groups in Striim to divide workloads between teams. Striim supports data orchestration: you can use Striim’s user interface and REST APIs to automate the flow of data between your event tracking, data loader, modeling, and data integration tools.

Organizations can create a dedicated app for each business group to build a domain-specific view or transformation for analysis. That means that both your marketing and IT teams can have their own data workflows, empowering them to work more freely without one department affecting another.

For example, you can dedicate a group of Striim apps to collecting streaming data from your databases and transferring it to a data warehouse for analysis. Similarly, you can have a Striim app for data transformation that uses Python scripts to convert data from your sources to a standardized format (a rough sketch of such a transformation follows below).

[Image] A Striim app that collects streaming data from MySQL (via change data capture) and delivers it to Kafka.
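As a rough illustration of the transformation app described above, the Python sketch below normalizes records from two hypothetical sources into one standardized format; every field name here is invented for the example:

    from datetime import datetime, timezone

    def standardize(record: dict, source: str) -> dict:
        """Convert a source-specific record into one hypothetical standard shape."""
        if source == "crm":
            return {
                "customer_id": record["CustomerID"],
                "amount_usd": float(record["Total"]),
                "ts": record["CreatedAt"],  # already ISO-8601 in this source
            }
        if source == "webshop":
            return {
                "customer_id": record["cust"],
                "amount_usd": record["amount_cents"] / 100.0,
                "ts": datetime.fromtimestamp(record["epoch"], tz=timezone.utc).isoformat(),
            }
        raise ValueError(f"unknown source: {source}")

    print(standardize({"cust": 42, "amount_cents": 1999, "epoch": 1700000000}, "webshop"))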

Challenge #4: Lack of security for data in motion

Data in motion is exposed to a wide range of risks. Unlike data at rest, it travels both inside and outside the organization. Protecting it is all the more important because modern regulatory frameworks like HIPAA, GDPR, and PCI DSS mandate the protection of data in motion.

Solution: Striim offers advanced security features to protect data in motion

Striim protects data in motion with a number of security features. Some of these include:

  • Striim helps you set an encryption policy that protects your data with algorithms such as RSA (Rivest–Shamir–Adleman), PGP (Pretty Good Privacy), and AES (Advanced Encryption Standard). This especially comes in handy in compliance-driven industries (e.g., healthcare) to protect data like PHI (protected health information) and PII (personally identifiable information). A generic sketch of this kind of encryption follows this list.
  • Striim has multi-layered application security for exporting and importing data pipeline applications. For instance, when importing these applications, you can set a passphrase for applications that contain passwords and other encrypted values, adding an extra layer of protection.
  • Striim has a secure, centralized repository, Striim Vault, which can serve as a go-to tool for storing passwords and encryption keys. Striim’s vault also integrates seamlessly with third-party vaults such as HashiCorp Vault.
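As a simple, generic illustration of symmetric encryption for sensitive records (this is not Striim’s implementation), the sketch below uses Python’s widely used cryptography package, whose Fernet recipe is built on AES:

    from cryptography.fernet import Fernet  # pip install cryptography

    key = Fernet.generate_key()  # in practice, load the key from a vault, not code
    cipher = Fernet(key)

    # Encrypt a record containing PHI/PII before it leaves the pipeline ...
    token = cipher.encrypt(b'{"patient_id": 123, "ssn": "000-00-0000"}')

    # ... and decrypt it only on the authorized receiving side.
    print(cipher.decrypt(token))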

A reliable data governance program is key to addressing your data governance challenges

The value of data governance is often understated. A reliable data governance program increases people’s trust in your data analytics, business processes, and the systems that power your data-driven decision-making. It offers secure access, enabling IT to successfully oversee the management of data sources and analytical content and meet policy, risk, and compliance requirements. Line-of-business employees can instantly locate the data they need and perform their jobs better.

Streamline your enterprise data governance framework with Striim. Learn more about how Striim can enhance your data governance initiatives by getting a technical demo.

 

What is a Data Engineer? A Brief Guide to Pursuing This High-Demand Career

[Image] Data engineer job listings have increased.

Data engineer roles have gained significant popularity in recent years. A study by Dice shows that the number of data engineering job listings increased by 15% from Q1 2021 to Q2 2021, up 50% from 2019.

In addition to being an in-demand role, working as a data engineer lets you solve problems, experiment with large datasets, and understand patterns in our world. Students and professionals looking to switch into a technology role should consider a career in data engineering.

To help you understand the requirements of a data engineer, we’ve compiled the roles and responsibilities of data engineers, the tools they use, and what you need to get started as a data engineer.

  1. What is a Data Engineer?
  2. Data Engineers vs Data Scientists vs Data Architects: What are the differences?
  3. What Tools do Data Engineers Use?
  4. What Skills do I Need to Learn to be a Data Engineer?
  5. Should I Pursue a Career in Data Engineering?

What is a Data Engineer: An Overview of the Responsibilities

Data engineers are responsible for designing, maintaining, and optimizing data infrastructure for data collection, management, transformation, and access. They are in charge of creating pipelines that convert raw data into usable formats for data scientists and other data consumers to utilize. The data engineer role evolved to handle the core data aspects of software engineering and data science; they use software engineering principles to develop algorithms that automate the data flow process. They also collaborate with data scientists to build machine learning and analytics infrastructure from testing to deployment.

Data engineers help organizations structure and access their data with the speed and scalability they need and provide the infrastructure to enable teams to deliver great insights and analytics from that data. Kevin Wylie, a data engineer with Netflix, says his work is about making the lives of data consumers easier and enabling these consumers to be more impactful.

The format or structure that is optimal for storing an application’s data is rarely optimal for data science, reporting, or analytics. For example, your application may need to serve one million concurrent requests for individual records, while your data science team might need to access billions of records at a time. The two scenarios require different approaches, and this is where data engineers can help.

The primary responsibility of a data engineer is ensuring that data is readily available, secure, and accessible to stakeholders when they need it. Data engineering responsibilities can be grouped into two main categories:

Data structure and management

Data engineers are responsible for implementing and maintaining the underlying infrastructure and architecture for data generation, storage, and processing. Their responsibilities include:

  • Building and maintaining data infrastructure for optimal extraction, transformation, and loading of data from a wide variety of sources such as Amazon Web Services (AWS) and Google Cloud big data platforms.
  • Ensuring data accessibility at all times and implementing company data policies with respect to data privacy and confidentiality.
  • Improving data systems reliability, speed, and performance.
  • Creating optimal data warehouses, pipelines, and reporting systems to solve business problems.

Data analysis and insight

Data engineers play an important role in building platforms that enable data consumers to analyze and gain insights from data. They are responsible for:

  • Cleaning and wrangling data from primary and secondary sources into formats that can be easily utilized by data scientists and other data consumers.
  • Developing data tools and APIs for data analysis.
  • Deploying and monitoring machine learning algorithms and statistical methods in production environments.
  • Collaborating with engineering teams, data scientists, and other stakeholders to understand how data can be leveraged to meet business needs.

Although every organization has slightly different requirements, data engineering job listings from top tech companies’ career sites (such as Netflix and Google) and articles from job sites such as Indeed can provide more information on what data engineers are commonly responsible for in an organization.

Data Engineers vs. Data Scientists vs. Data Architects: What are the Differences?

[Image] From a thankful data scientist to data engineers.

These roles vary significantly from company to company and often overlap since their work usually revolves around the same key component: data. Larger companies tend to have separate departments for these roles, and in smaller companies, it’s not uncommon to have one person acting as all three.

Here’s a brief overview of the differences between the three roles:

  • Data Architect: Data architects plan and design the framework the data engineers build. They create the organization’s logical and physical data assets, as well as the data management resources, and they set data policies based on company requirements.
  • Data Engineer: Data engineers are responsible for gathering, collecting, and processing data. They also build systems, algorithms, and APIs to expose datasets to data consumers.
  • Data Scientist: Data scientists are responsible for performing statistical analysis using machine learning and artificial intelligence on collated data in order to gain insight and form new hypotheses.

Unless a company has a large data/engineering team, it’s unlikely to have all three of these roles and will likely employ some combination of the above based on engineering, data, and business needs. Read more: For a deeper dive into how data architects and data engineers differ in responsibilities, skill sets, and career paths, see our comparison: Data Architect vs. Data Engineer.

What Tools Do Data Engineers Use?

There is no one-size-fits-all toolset for data engineers; each organization leverages tools based on business needs. However, below are some of the popular tools data engineers use. You don’t necessarily have to master all of them, but we recommend learning the fundamentals of each core tool.

Databases

In our fast-paced world where tools and technologies are constantly evolving, SQL remains central to it all and is a foundational tool for data engineers. SQL is the standard programming language for creating and managing relational databases (collections of tables consisting of rows and columns).

Popular SQL databases include MySQL, PostgreSQL, and Oracle. NoSQL databases, by contrast, are non-tabular and can take the form of a graph or a document, depending on their data model; MongoDB, Cassandra, and Redis are popular examples.
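If you’re new to SQL, a minimal example may help. The snippet below uses Python’s built-in sqlite3 module to create a table, insert rows, and run an aggregate query; the table and data are invented for illustration:

    import sqlite3

    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO orders (customer, amount) VALUES (?, ?)",
        [("Acme", 250.0), ("Globex", 99.5), ("Acme", 40.0)],
    )

    # Aggregate with plain SQL: total order amount per customer.
    for row in conn.execute("SELECT customer, SUM(amount) FROM orders GROUP BY customer"):
        print(row)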

Data processing

Today’s businesses recognize the importance of processing data in real time to enhance business decisions. As a result, data engineers are in charge of building real-time data streaming and data processing pipelines. Apache Spark is an analytics engine used for real-time stream processing; Apache Kafka is a popular tool for building streaming pipelines and is used by more than 80% of Fortune 500 companies.

For example, Netflix uses Kafka to process over 500 billion events per day, ranging from user viewing activities to error logs.
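To give a feel for Kafka’s programming model, here is a minimal producer sketch using the third-party kafka-python package. It assumes a broker running at localhost:9092 and a hypothetical “user-views” topic:

    import json
    from kafka import KafkaProducer  # pip install kafka-python

    producer = KafkaProducer(
        bootstrap_servers="localhost:9092",
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )

    # Publish one viewing event to the hypothetical "user-views" topic.
    producer.send("user-views", {"user_id": 42, "title": "Stranger Things", "ts": 1700000000})
    producer.flush()  # block until the event has actually been delivered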

Programming languages

Data engineers are typically fluent in at least one programming language to create software solutions to data challenges. Python is regarded as the most popular and widely used programming language in the data engineering community. It’s easy to learn and features a simple syntax and an abundance of third-party libraries geared toward data needs.

Data migration and integration

As more companies leverage cloud-based computing to meet business demands, migrating mission-critical applications can introduce several challenges, of which migrating the underlying database is often the most difficult. Data migration and integration refer to the processes involved in moving data from one system or systems to another without compromising its integrity. Data integration specifically is the process of consolidating data from various sources and combining it in a meaningful and valuable way.

Striim is a popular real-time data integration platform used by data engineers for both data integration and migration; it provides modern, reliable data integration and migration across the public and private cloud.

Distributed systems

Because of the massive amount of data in circulation today, a single machine cannot meet data processing and storage requirements. A distributed system is a collection of machines that work together to achieve a common goal yet appear to the end user as a single system.

Hadoop is a popular data engineering framework for storing and computing large amounts of data using a network of computers.

Data science and machine learning

Data engineers need a basic understanding of popular data science tools because it enables them to better understand the needs of data scientists and other data consumers. PyTorch is an open-source machine learning library used for deep learning applications on GPUs and CPUs. TensorFlow is a free, open-source machine learning platform that provides tools for teams to create and deploy machine learning-powered applications.

What Skills Do I Need to Learn to be a Data Engineer?

Data engineering is a developing field at the intersection of software engineering and data science. While there are no rigidly defined steps to becoming a data engineer, that doesn’t mean you can’t do it.

Here are some of the necessary skills and knowledge you need to become a successful data engineer.

  • Understand databases (SQL and NoSQL): An essential skill for data engineers is learning how databases work and how to write queries to manipulate and retrieve data. This free database systems course by freeCodeCamp and Cornell University is an excellent resource to learn how database systems work.
  • Understand data processing techniques and tools: LinkedIn Learning provides fantastic resources to learn Apache Kafka – a popular tool for data processing.
  • Know a programming language: Knowing how to program is a must-have skill for data engineers. Programming languages such as Python and Scala are popular with data engineers. The complete Python Bootcamp on Udemy is a popular resource for getting started with Python.
  • Understand how distributed systems work: Designing Data-Intensive Applications is a great resource to understand the fundamental challenges companies face when designing large data applications.
  • Learn about cloud computing: With more companies relying on cloud providers for data infrastructure needs, learning how to design and engineer data solutions using popular cloud providers such as Amazon Web Services, Google Cloud, and Azure will help you stand out as a data engineer. Online courses, official tutorials, and certifications from cloud providers (like this one from Google Cloud) are excellent ways to learn cloud computing.

Many data engineers teach themselves skills through free and low-cost online learning programs. The Data Engineering Career Learning Path by Coursera and the Learn Data Engineering Academy provide practical resources to get you started. If you prefer a more degree-oriented approach, Udacity offers a specialized track dedicated to data engineering.

Should I Pursue a Career in Data Engineering?

Research from Domo estimates that humans generate about 2.5 quintillion bytes of data per day through social media, video sharing, and other means of communication. Furthermore, the World Economic Forum predicts that by 2025, the world will generate 463 exabytes of data per day, the equivalent of 212,765,957 DVDs per day. With this copious amount of data being generated, the demand for data engineers to manage it will only increase.

[Image] Data engineer salary.

If you love experimenting with data, using it to discover patterns, or building systems that organize and process data to help companies make data-driven decisions, you might consider a career in data engineering. Further, data engineering is a lucrative field, with a median base salary of $102,472. While data engineering can be difficult and complex, and you may need to learn new skills and technology, it is also a rewarding career in a growing field.

Data Fabric: What is it and Why Do You Need it?

Insight-driven businesses have the edge over others; they grow at an average of more than 30% annually. Noting this pattern, modern enterprises are trying to become data-driven organizations and get more business value out of their data. But the rise of cloud, the emergence of the Internet of Things (IoT), and other factors mean that data is not limited to on-premises environments.

In addition, there are voluminous amounts of data, many data types, and multiple storage locations. As a consequence, managing data is getting more difficult than ever.

One of the ways organizations are addressing these data management challenges is by implementing a data fabric. Using a data fabric is a viable strategy to help companies overcome the barriers that previously made it hard to access data and process it in a distributed data environment. It empowers organizations to manage mounting amounts of data with more efficiency. Data fabric is one of the more recent additions to the lexicon of data analytics. Gartner listed data fabric as one of the top 10 data and analytics trends for 2021.

  1. What is a data fabric?
  2. Why do you need a data fabric in today’s digital world?
  3. Data fabric examples to consider for improving your organization’s processes
  4. Security is key to a successful data fabric implementation
  5. Building your data fabric with Striim
  6. Learn more: on-demand webinar with James Serra

What is a data fabric?

A data fabric is an architecture that weaves together technologies and services to help an organization manage its data. This data can be stored in relational databases, tagged files, flat files, graph databases, and document stores.

A data fabric architecture enables data-centric tools and applications to access data while working with various services. These can include Apache Kafka (for real-time streaming), ODBC (open database connectivity), HDFS (Hadoop distributed file system), REST (representational state transfer) APIs, POSIX (portable operating system interface), NFS (network file system), and others. It’s also crucial for a data fabric architecture to support emerging standards.

A data fabric is agnostic to architectural approach, geographic location, data use case, data process, and deployment platform. With a data fabric, organizations can work toward one of their most desired goals: access to the right data in real time, with end-to-end governance, all at a low cost.

Data fabric vs. data lake

Organizations often lack clarity on what makes a data lake different from a data fabric. A data lake is a central location that stores large amounts of data in its raw, native format.

However, data decentralization is a growing trend. Some data engineers believe it’s not practical to build a central data repository that can be governed, cleaned, and updated effectively.

On the other hand, a data fabric supports heterogeneous data locations. It simplifies managing data stored in disparate data repositories, which can be a data lake or a data warehouse. Therefore, a data fabric doesn’t replace a data lake. Instead, it helps it to operate better.

Why do you need a data fabric in today’s digital world?

Data fabrics empower businesses to use their existing data architectures more efficiently without structurally rebuilding every application or data store. But why is a data fabric relevant today?

Organizations are handling challenges of ever-greater scale and complexity. Today, their IT systems span disparate environments while managing both existing applications and modern applications powered by microservices.

Previously, each software development team went with its own implementation for data storage and retrieval. A typical enterprise data center stores data in relational databases (e.g., Microsoft SQL Server), non-relational databases (e.g., MongoDB), data repositories (e.g., a data warehouse), flat files, and other platforms. As a result, data is spread across rigid, isolated silos, which creates issues for modern businesses.

Unifying this data isn’t trivial. Apps store data in a wide range of formats, even when they’re using the same data, and organizations store data in various siloed applications. Consolidating this data involves data deduplication, a process that removes duplicate copies of repeating data (a toy sketch follows below). Getting data to the right application at the right time is desirable, but it’s a tough nut to crack. That’s where a data fabric architecture comes in.
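In its simplest form, deduplication means keeping one record per business key across sources. A toy Python sketch of that consolidation step (with invented records and a deliberately naive last-one-wins merge rule) might look like this:

    # Records for overlapping customers arriving from two hypothetical silos.
    crm = [{"email": "a@example.com", "name": "Ada"}, {"email": "b@example.com", "name": "Bo"}]
    webshop = [{"email": "a@example.com", "name": "Ada L."}]

    def deduplicate(*sources, key="email"):
        """Keep the last record seen per key; real systems use smarter merge rules."""
        merged = {}
        for source in sources:
            for record in source:
                merged[record[key]] = record
        return list(merged.values())

    print(deduplicate(crm, webshop))  # one record per unique email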

A data fabric helps to:

  • Handle multiple environments simultaneously, including on-premises, cloud, and hybrid.
  • Use pre-packaged modules to establish connections to any data source.
  • Bolster data preparation, data quality, and data governance capabilities.
  • Improve data integration between applications and sources.

A data fabric architecture allows you to map data from different apps, making business analysis easier. With connected data, your team can draw insights and decisions from existing and new data points. For instance, suppose an authorized user in the sales department wants to look at data from marketing. A data fabric lets them access marketing data seamlessly, the same way they access sales data.

With a data fabric, you can build a global, agile data environment that can track and govern data across applications, environments, and users. For instance, if objects move from one environment to another, the data fabric notifies each component of the change and oversees the required processes, such as which process to run, how to run it, and what the object’s state is.

Data fabric examples to consider for improving your organization’s processes

The flexibility of a data fabric architecture helps in more ways than one. Data fabric examples include the following:

Enhancing machine learning (ML) models

When the right data is fed to machine learning (ML) models in a timely manner, their learning capabilities improve. ML algorithms can be used to monitor data pipelines and recommend suitable relationships and integrations. Connected to the data fabric, these algorithms can work through all the business data, examine it, and identify appropriate connections and relationships.

One of the most time-consuming elements of training ML models is getting the data ready. A data fabric architecture helps to use ML models more efficiently by reducing data preparation time. It also aids in increasing the usability of the prepared data across applications and models. When an organization distributes data across on-premises, cloud, and IoT, it’s the data fabric that provides controlled access to secure data, enhancing ML processes.

Building a holistic customer view

Businesses can employ a data fabric to harness data from customer activities and understand how interacting with customers can offer more value. This could include consolidating real-time data of different sales activities, the time it takes to onboard a customer, and customer satisfaction KPIs.

For instance, an IT consulting firm can consolidate data from customer support requests and rework their sales activities accordingly. The firm receives concerns from its clients regarding the lack of a tool that can help them to migrate their on-premises databases to multi-cloud environments without downtime. The firm can then recognize the need to resolve this issue, find a reliable tool like Striim to address it, and have its sales representatives recommend the tool to customers.

Security is key to a successful data fabric implementation

Over the past few years, cyberattacks, especially ransomware attacks, have grown at a rapid rate. So, it’s no surprise organizations are concerned about the risk these attacks pose to their data security while data is being moved from one point to another in the data fabric.

Organizations can improve data protection by incorporating security protocols, such as firewalls, IPSec (IP Security), and SFTP (Secure File Transfer Protocol), to guard their data against cyber threats. Another thing to consider is a dynamic, fluid access control policy that can adapt to evolving cyber threats.

With so many cyberattacks causing damages worth millions, securing your data across all points is integral for successfully implementing your data fabric architecture.

This can be addressed in multiple ways:

  • Ensuring data at rest and in flight is encrypted
  • Protecting your networking traffic from the public internet by using PrivateLink on services like Azure and AWS
  • Managing secrets and keys securely across clouds

Building your data fabric with Striim

Now that you know the benefits and some use cases of a data fabric, how can you start the transition towards a data fabric architecture in your organization? 

According to Gartner, a data fabric should have the following components:

  1. A data integration backbone that is compatible with a range of data delivery methods (including ETL, streaming, and replication)
  2. The ability to collect and curate all forms of metadata (the “data about the data”)
  3. The ability to analyze and make predictions from data and metadata using ML/AI models
  4. A knowledge graph representing relationships between data

While there are various ways to build a data fabric, the ideal solution simplifies the transition by complementing your existing technology stack. Striim serves as the foundation for a data fabric by connecting with legacy and modern solutions alike. Its flexible and scalable data integration backbone supports real-time data delivery via intelligent pipelines that span hybrid cloud and multi-cloud environments. 

[Image] Striim enables a multi-cloud/hybrid cloud data fabric architecture with automated, intelligent pipelines that continuously deliver data to consumers, including data warehouses and data lakes.

Striim continuously ingests transaction data and metadata from on-premises and cloud sources and is designed from the ground up for real-time streaming, with:

  • An in-memory streaming SQL engine that transforms, enriches, and correlates transaction event streams
  • Machine learning analysis of event streams to uncover patterns, identify anomalies, and enable predictions (a toy illustration follows this list)
  • Real-time dashboards that bring streaming data to life, from live transaction metrics to business-specific metrics (e.g. suspected fraud incidents for a financial institution or live traffic patterns for an airport)
  • A hybrid and multi-cloud vault to store passwords, secrets, and keys; Striim’s vault also integrates seamlessly with third-party vaults such as HashiCorp Vault
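To illustrate the anomaly-detection idea in the simplest possible terms, here is a toy sliding-window z-score detector in Python. It is a statistical stand-in for illustration, not Striim’s ML engine:

    from collections import deque
    from statistics import mean, pstdev

    class AnomalyDetector:
        """Flag values that deviate sharply from a window of recent events."""

        def __init__(self, window=100, threshold=3.0):
            self.recent = deque(maxlen=window)
            self.threshold = threshold

        def observe(self, value):
            is_anomaly = False
            if len(self.recent) >= 30:  # wait for a minimal baseline
                mu, sigma = mean(self.recent), pstdev(self.recent) or 1e-9
                is_anomaly = abs(value - mu) / sigma > self.threshold
            self.recent.append(value)
            return is_anomaly

    detector = AnomalyDetector()
    for amount in [100, 102, 99, 101] * 10 + [5000]:  # a spike at the end
        if detector.observe(amount):
            print("anomalous transaction amount:", amount)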

Continuous movement of data (without data loss or duplication) is essential to mission-critical business processes. Whether a database schema changes, a node fails, or a transaction is larger than expected, Striim’s self-healing pipelines resolve the issue via automated corrective actions. For example, Striim detects schema changes in source databases (e.g., create table, drop table, and alter/add column events), and users can set up intelligent workflows to perform desired actions in response to DDL changes.

As shown below, in the case of an “Alter Table” DDL event, Striim is configured to automatically propagate the change to downstream databases, data warehouses, and data lakehouses. In contrast, in the case of a “Drop Table” event, Striim is set up to alert the Ops team.

[Image] How intelligent workflows can be set up to automatically respond to different types of DDL/schema changes.
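A workflow like the one in the figure boils down to routing on the DDL event type. The Python sketch below shows that routing in schematic form, with hypothetical event fields and stubbed-out actions standing in for Striim’s configured responses:

    def propagate_to_targets(event):
        print("replicating schema change downstream:", event["ddl"])

    def alert_ops_team(event):
        print("ALERT: table dropped at source:", event["table"])

    def handle_ddl(event):
        """Route a captured DDL event to the configured action (hypothetical fields)."""
        if event["optype"] in ("CreateTable", "AlterTable", "AddColumn"):
            propagate_to_targets(event)  # keep warehouse/lakehouse schemas in sync
        elif event["optype"] == "DropTable":
            alert_ops_team(event)        # notify a human instead of propagating

    handle_ddl({"optype": "AlterTable", "table": "orders",
                "ddl": "ALTER TABLE orders ADD COLUMN region TEXT"})
    handle_ddl({"optype": "DropTable", "table": "orders", "ddl": "DROP TABLE orders"})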

With Striim at its core, a data fabric functions as a comprehensive source of truth — whether you choose to maintain a current snapshot or a historical ledger of your customers and operations. The example below shows how Striim can replicate exact DML statements to the target system, creating an exact replica of the source:

[Image] DML propagation replicates database changes from source to target, performing updates and deletes on the target system so it matches the source exactly.

And the following example shows how Striim can be used to maintain a historical record of all the changes in the source system:

[Image] History mode keeps a record of all changes, showing the logical change event and the optype, including what has changed in the row.
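The difference between the two modes is easy to see in code. In this illustrative Python sketch (the event shape is invented), snapshot mode applies each change to a keyed table so the target mirrors the source, while history mode appends every change as a new row:

    snapshot = {}  # current-state replica, keyed by primary key
    history = []   # append-only ledger of every change

    def apply_change(event):
        """Apply one change event in both modes (hypothetical event fields)."""
        key, optype, row = event["key"], event["optype"], event.get("row")

        # Snapshot mode: mirror the source exactly, including deletes.
        if optype in ("INSERT", "UPDATE"):
            snapshot[key] = row
        elif optype == "DELETE":
            snapshot.pop(key, None)

        # History mode: never overwrite; record the change itself.
        history.append({"key": key, "optype": optype, "row": row})

    apply_change({"key": 1, "optype": "INSERT", "row": {"status": "new"}})
    apply_change({"key": 1, "optype": "UPDATE", "row": {"status": "shipped"}})
    apply_change({"key": 1, "optype": "DELETE"})
    print(snapshot)      # {} -- the row is gone, matching the source
    print(len(history))  # 3 -- all three changes preserved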

Taken together, these capabilities make it possible to build an intelligent, secure, real-time data fabric across multi-cloud and hybrid cloud environments. Once data is unified in a central destination (e.g., a data warehouse), a data catalog solution can be used to organize and manage data assets.

Learn More: On-Demand Data Fabric Webinar

Looking for more examples and use cases of enterprise data patterns including data fabric, data mesh, and more? Watch our on-demand webinar with James Serra (Data Platform Architecture Lead at EY) on “Building a Multi-Cloud Data Fabric for Analytics”. Topics covered include:

  • Pros and cons of multi-cloud vs doubling down on a single cloud
  • Enterprise data patterns such as Data Fabric, Data Mesh, and The Modern Data Stack
  • Data ingestion and data transformation in a multi-cloud/hybrid cloud environment
  • Comparison of data warehouses (Snowflake, Synapse, Redshift, BigQuery) for real-time workloads

 
