Domo arigato, Mr. Roboto, Mata ah-oo hima de Domo arigato, Mr. Roboto, Himitsu wo shiri tai
Every once in a while an analyst is called upon to look at things that are unique, that pique the interest and make the day go by faster. (For other days, there is coffee.) As the web server admin approached my desk, I was just about to put on another pot of coffee. I decided to hold off.
Our admin came to me with a bit of a puzzle. Although there are a myriad of well-known robots spidering our website daily, there was one that he was not happy with. It was called Yandex, a Russian search engine. Given the amount of malware and other less-than-wanted things coming out of Russian networks, the admin was concerned about this indexer accessing our website. With a grin a mile wide, I set aside the coffee and reached for my green tea with honey, and responded with a heartfelt, “I’m on it.”
FIRST STEPS
The first thing I needed to do was get our web logs into a place where I could use Striim to analyze them. At my request, the admins had implemented syslog-ng and were using a central repository for all of our logs, which made it far easier to access them with Striim. Our primary and backup production web servers logged to /var/log/www-prod-1 and /var/log/www-prod-2 on the central logging system. From there, all I had to do was get them into Striim and we could start having fun. From the UI, I whipped up a pair of text readers and configured them to read the access logs from both production servers.
The next step was to parse the log files so that their information was organized into fields that could be processed. A little wave of the regex wand and we had both logs parsed and combined into a single flow for Striim to analyze.
From there, I next created a dashboard to show me just the information related to the web requests from Yandex. This would give me a clean and up-to-date view of the data I needed in real time. A quick TQL query combined with a table and I was on my way!
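As an illustration, a dashboard query along these lines would isolate the Yandex requests. The stream and field names here are hypothetical stand-ins, since the actual application isn't shown:

```sql
-- Hedged sketch of a TQL query feeding a dashboard table.
-- ParsedAccessLogStream and its fields are illustrative names.
SELECT accessTime, srcIp, httpMethod, requestUrl, userAgent
FROM   ParsedAccessLogStream
WHERE  userAgent LIKE '%Yandex%';
```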
Immediately, the information started flowing in. At first, it looked like just the normal traffic one would expect from an indexing spider bot. Soon enough, however, my keen eyes spotted something that was not quite right.
As Dorothy Parker would say, “What fresh hell is this?” The spider was making a GET request to the search function of our website! This is not normal behavior for a spider if all it was doing was indexing our site. A little analyst magic performed on that request revealed it was using our own site search feature to look for “Fun HB Slot Machine” and the domain qpyl18.com. A quick check of this domain showed that it was protected by Cloudflare (hi, Otto!), but that the origin server was having issues. A quick check of the IP addresses involved and my spidey-analyst senses were tingling.
The next burning question was how often was this happening, and at what volume? Back to the dashboard I went! I altered my query to show me the requests that were performing the internal searches, and was quickly rewarded with the information I was looking for:
Not only was this happening, but it was happening frequently.
I configured Striim to keep watch on this, making it part of my overall security application and using customized queries to create indicators: the number of instances per day, the average number of instances over a week, and a special dashboard page with alerts to let me know if it got out of hand.
Like any good analyst, I gathered data for 60 days and then presented it to the web admin, and we both decided this was not something we wanted on our network. A few adjustments to the web server, firewall, and IDS, and we were off for a celebratory lunch.
The ease of use, speed, flexibility, and myriad tools of Striim allowed the web admin and me to quickly and efficiently acquire, process, enrich, and report the data on the unusual traffic, and to create an environment where any of the shift analysts could keep an eye on the activity, both streaming in real time and stored for historical purposes.
So you want to empower your analysts with tools like this? Request a demo today. We will be happy to guide you through all of the features of Striim and help you improve your security footprint.
In this series of blog-based tutorials, we are guiding you through the process of building data flows for streaming integration and analytics applications using the Striim platform. This tutorial focuses on SQL-based stream processing for Apache Kafka with in-memory enrichment of streaming data. For context, please check out Part One of the series where we created a data flow to continuously collect change data from MySQL and deliver as JSON to Apache Kafka.
In this tutorial, we are going to process and enrich data-in-motion using continuous queries written in Striim’s SQL-based stream processing language. Using a SQL-based language is intuitive for data processing tasks, and most common SQL constructs can be utilized in a streaming environment. The main differences between using SQL for stream processing, and its more traditional use as a database query language, are that all processing is in-memory, and data is processed continuously, such that every event on an input data stream to a query can result in an output.
The first thing we are going to do with the data is extract fields we are interested in, and turn the hierarchical input data into something we can work with more easily.
Transforming Streaming Data With SQL
You may recall the data we saw in part one looked like this:
This is the structure of our generic CDC streams. Since a single stream can contain data from multiple tables, the column values are presented as arrays which can vary in size. Information regarding the data is contained in the metadata, including the table name and operation type.
The PRODUCT_INV table in MySQL has the following structure:
LOCATION_ID   int(11)   PK
PRODUCT_ID    int(11)   PK
STOCK         int(11)
LAST_UPDATED  timestamp
The first step in our processing is to extract the data we want. In this case, we only want updates, and we’re going to include both the before and after images of the update for stock values.
To do the processing, we need to add a continuous query (CQ) into our dataflow. This can be achieved in a number of ways in the UI, but we will click on the datastream, then on the plus (+) button, and select “Connect next CQ component” from the menu.
Connect Next CQ Component to Add to Our First Continuous Query
As with all components in Striim, we need to give the CQ a name, so let’s call it “ExtractFields”. The processing query defaults to selecting everything from the stream we were working with.
But we want only certain data, and to restrict things to updates. When selecting the data we want, we can apply transformations to convert data types, access metadata, and many other data manipulation functions. This is the query we will use to process the incoming data stream:
Notice the use of the data array (what the data looks like after the update) in most of the selected values, but the use of the before array to obtain the prevStock.
We are also using the metadata extraction function (META) to obtain the operation name from the metadata section of the stream, and a number of type conversion functions (TO_INT for example) to force data to be of the correct data types. The date is actually being converted from a LONG timestamp representing milliseconds since the EPOCH.
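Putting those pieces together, the CQ would look roughly like the sketch below. The stream name, array indices, and exact conversion functions are assumptions based on the description above, not the literal query from the screenshot:

```sql
CREATE CQ ExtractFields
INSERT INTO ExtractedFields
SELECT TO_INT(data[0])           AS locationId,
       TO_INT(data[1])           AS productId,
       TO_INT(data[2])           AS stock,
       TO_INT(before[2])         AS prevStock,
       TO_DATE(TO_LONG(data[3])) AS updateTime  -- millis since the epoch
FROM   MySQLCDCStream
WHERE  META(MySQLCDCStream, 'OperationName').toString() = 'UPDATE';
```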
The final step before we can save this CQ is to choose an output stream. In this case we want a new stream, so we’ll call it “ExtractedFields”.
Data-flow with Newly Added CQ
When we click on Save, the query is created alongside the new output stream, which has a data type to match the projections (the transformed data we selected in the query).
After Clicking Save, the New CQ and Stream Are Added
The data type of the stream can be viewed by clicking on the stream icon.
Stream Properties Showing the Generated Type Definition
There are many different things you can do with streams themselves, such as partition them over a cluster, or switch them to being persistent (which utilizes our built-in Apache Kafka), but that is a subject for a later blog.
If we deploy and start the application (see the previous blog for a refresher) then we can see what the data now looks like in the stream.
Extracted Fields Viewed by Previewing Data Streams
As you can see it looks very different from the previous view and now only contains the fields we are interested in for the remainder of the application.
At the moment, though, this new stream goes nowhere, while the original data is still being written to Kafka.
Writing Transformed Data to Kafka
To fix this, all we need to do is change the input stream for the WriteToKafka component.
Changing the Kafka Writer Input Stream
This changes the data flow, making it a continuous linear pipeline, and ensures our new simpler data structure is what is written to Kafka.
Linear Data Flow Including Our Process CQ Before Writing to Kafka
Utilizing Caches For Enrichment
Now that we have the data in a format we want, we can start to enrich it. Since the Striim platform is a high-speed, low latency, SQL-based stream processing platform, reference data also needs to be loaded into memory so that it can be joined with the streaming data without slowing things down. This is achieved through the use of the Cache component. Within the Striim platform, caches are backed by a distributed in-memory data grid that can contain millions of reference items distributed around a Striim cluster. Caches can be loaded from database queries, Hadoop, or files, and maintain data in-memory so that joining with them can be very fast.
A Variety of In-Memory Caches Are Available for Enrichment
In this example we are going to use two caches – one for product information loaded from a database, and another for location information loaded from a file.
Setting the Name and Datatype for the ProductInfo Cache
All caches need a name, a data type, and a lookup key, and can optionally be refreshed periodically. We’ll call the product information cache “ProductInfo,” and create a data type to match the MySQL PRODUCT table, which contains details of each product in our CDC stream. This is defined in MySQL as:
PRODUCT_ID    int(11)   PK
DESCRIPTION   varchar(255)
PRICE         decimal(8,2)
BRAND         varchar(45)
CATEGORY      varchar(45)
WORN          varchar(45)
The lookup key for this cache is the primary key of the database table, or productId in this case.
All we need to do now is define how the cache obtains the data. This is done by setting the username, password, and connection URL information for the MySQL database, then selecting a table, or a query to run to access the data.
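In TQL form, the cache definition would look something like this sketch. The connection details are placeholders, and the exact property names may differ in your Striim version:

```sql
CREATE CACHE ProductInfo USING DatabaseReader (
    Username:      'striim',                              -- placeholder
    Password:      '********',
    ConnectionURL: 'jdbc:mysql://dbhost:3306/inventory',  -- placeholder
    Tables:        'inventory.PRODUCT'
)
QUERY (keytomap: 'productId')
OF ProductInfoType;
```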
Configuring Database Properties for the ProductInfo Cache
When the application is deployed, the cache will execute the query and load all the returned data into the in-memory data grid, ready to be joined with our stream.
Loading the location information from a file requires similar steps. The file in question is a comma-delimited list of locations in the following form:
Location ID, City, State, Latitude, Longitude, Population
We will create a File Cache called “LocationInfo” to read and parse this file, and load it into memory assigning correct data types to each column.
Setting the Name and Datatype for the LocationInfo Cache
The lookup key is the location id.
We will be reading data from the “locations.csv” file present in the product install directory “.” using the DSVParser. This parser handles all kinds of delimited files. The default is to read comma-delimited files (with optional header and quoted values), so we can keep the default properties.
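A file-backed cache can be sketched in TQL along the same lines (again, property names are approximate):

```sql
CREATE CACHE LocationInfo USING FileReader (
    directory: '.',
    wildcard:  'locations.csv'
)
PARSE USING DSVParser ()          -- defaults: comma-delimited, quoted values
QUERY (keytomap: 'locationId')
OF LocationInfoType;
```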
Configuring FileReader Properties for the LocationInfo Cache
As with the database cache, when the application is deployed, the cache will read the file and load all the data into the in-memory data grid ready to be joined with our stream.
Dataflow Showing Both Caches Currently Ready to be Joined
Joining Streaming and Cache Data For Enrichment With SQL
The final step is to join the data in the caches with the real-time data coming from the MySQL CDC stream. This can be achieved by modifying the ExtractFields query we wrote earlier.
Full Transformation and Enrichment Query Joining the CDC Stream with Cache Data
All we are doing here is adding the ProductInfo and LocationInfo caches into the FROM clause, using fields from the caches as part of the projection, and including joins on productId and locationId as part of the WHERE clause.
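The modified query would look roughly as follows. For readability, this sketch joins from the already-extracted stream; the field names mirror the earlier extraction step and are assumptions, not the literal query from the screenshot:

```sql
CREATE CQ EnrichFields
INSERT INTO EnrichedFields
SELECT e.locationId, e.productId, e.stock, e.prevStock, e.updateTime,
       p.description, p.price, p.brand, p.category, p.worn,
       l.city, l.state, l.latitude, l.longitude
FROM   ExtractedFields e, ProductInfo p, LocationInfo l
WHERE  e.productId  = p.productId
AND    e.locationId = l.locationId;
```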
The result of this query is to continuously output enriched (denormalized) events for every CDC event that occurs on the PRODUCT_INV table. If the join were more complex – such that the IDs could be null, or not match the cache entries – we could use a variety of join syntaxes, such as OUTER joins, on the data. We will cover this topic in a subsequent blog.
When the query is saved, the dataflow changes in the UI to show that the caches are now being used by the continuous query.
Dataflow After Joining Streaming Data with Caches in the CQ
If we deploy and start the application, then preview the data on the stream prior to writing to Kafka we will see the fully-enriched records.
Results of Previewing Data After Transformation and Enrichment
The data delivered to Kafka as JSON looks like this.
{
  "locationId": 9,
  "productId": 152,
  "stock": 1277,
  "prevStock": 1383,
  "updateTime": "2018-03-27T17:28:45.000-07:00",
  "description": "Dorcy 230L ZX Series Flashlight",
  "price": 33.74,
  "brand": "Dorcy",
  "category": "Industrial",
  "worn": "Hands",
  "city": "Dallas",
  "state": "TX",
  "longitude": -97.03,
  "latitude": 32.9
}
As you can see, it is very straightforward to use the Striim platform to not only integrate streaming data sources using CDC with Apache Kafka, but also to leverage SQL-based stream processing and enrich the data-in-motion without slowing the data flow.
In the next tutorial, I will delve into delivering data in different formats to multiple targets, including cloud blob storage and Hadoop.
In Part 4 of this blog series, we shared how the Striim platform facilitates data processing and preparation, both as it streams in to Kafka, and as it streams out of Kafka to enterprise targets. In this 5th and final post in the “Making the Most of Apache Kafka” series, we will focus on enabling streaming analytics for Kafka data, and wrap it up with a discussion of some of Striim’s enterprise-grade features: scalability, reliability (including exactly once processing), and built-in security.
Streaming Analytics for Kafka
To perform analytics on streaming Kafka data, you probably don’t want to deliver the data to Hadoop or a database before analyzing it, because you will lose the real-time nature of Kafka. You need to do the analytics in-memory, as the data is flowing through, and be able to surface the results of the analytics through visualizations in a dashboard.
Kafka analytics can involve correlation of data across data streams, looking for patterns or anomalies, making predictions, understanding behavior, or simply visualizing data in a way that makes it interactive and interrogable.
The Striim platform enables you to perform Kafka analytics in-memory, in the same way as you do processing – through SQL-based continuous queries. These queries can join data streams together to perform correlation, and look for patterns (or specific sequences of events over time) across one or more data streams utilizing an extensive pattern-matching syntax.
Continuous statistical functions and conditional logic enable anomaly detection, while built-in regression algorithms enable predictions into the future based on current events.
Of course, analytics can also be rooted in understanding large datasets. Striim customers have integrated machine learning into data flows to perform real-time inference and scoring based on existing models. This utilizes Striim in two ways.
Firstly, as mentioned previously, you can prepare and deliver data from Kafka (and other sources) into storage in your desired format. This enables the real-time population of raw data used to generate machine learning models.
Secondly, once a model has been constructed and exported, you can easily call the model from our SQL, passing real-time data into it, to infer outcomes continuously. The end result is a model that can be frequently updated from current data, and a real-time data flow that matches new data to the model, spots anomalies or unusual behavior, and enables proactive responses.
The final piece of analytics is visualizing and interacting with data. The Striim Platform UI includes a complete dashboard builder that enables custom, use-case-specific dashboards to be rapidly created to effectively highlight real-time data and the results of analytics. With a rich set of visualizations, and simple query-based integration with analytics results, dashboards can be configured to continually update and enable drill-down and in-page filtering.
Putting It All Together
Building a platform that makes the most of Kafka by enabling true stream processing and analytics is not easy. There are multiple major pieces of in-memory technology that have to be integrated seamlessly and tuned in order to be enterprise-grade. This means you have to consider the scalability, reliability and security of the complete end-to-end architecture, not just a single piece.
Joining streaming data with data cached in an in-memory data grid, for example, requires careful architectural consideration to ensure all pieces run in the same memory space, and joins can be performed without expensive and time-consuming remote calls. Continually processing and analyzing hundreds of thousands, or millions, of events per second across a cluster in a reliable fashion is not a simple task, and can take many years of development time.
The Striim Platform has been architected from the ground up to scale, and Striim clusters are inherently reliable with failover, recovery and exactly-once processing guaranteed end-to-end, not just in one slice of the architecture.
Security is also treated holistically, with a single role-based security model protecting everything from individual data streams to complete end-user dashboards.
If you want to make the most of Kafka, you shouldn’t have to architect and build a massive infrastructure, nor should you need an army of developers to craft your required processing and analytics. The Striim Platform enables Data Scientists, Business Analysts and other IT and data professionals to get the most value out of Kafka without having to learn, and code to, APIs.
When delivering data to Kafka, or writing Kafka data to a downstream target like HDFS, it is essential to consider the structure and content of the data you are writing. Based on your use case, you may not require all of the data, only that which matches certain criteria. You may also need to transform the data through string manipulation or data conversion, or only send aggregates to prevent data overload.
Most importantly, you may need to add additional context to the Kafka data. A lot of raw data may need to be joined with additional data to make it useful.
Imagine using CDC to stream changes from a normalized database. If you have designed the database correctly, most of the data fields will be in the form of IDs. This is very efficient for the database, but not very useful for downstream queries or analytics. IoT data can present a similar situation, with device data consisting of a device ID and a few values, without any meaning or context. In both cases, you may want to enrich the raw data with reference data, correlated by the IDs, to produce a denormalized record with sufficient information.
The key tenets of stream processing and data preparation – filtering, transformation, aggregation and enrichment – are essential to any data architecture, and should be easy to apply to your Kafka data without any need for developers or complex APIs.
The Striim Platform simplifies this by using a uniform approach utilizing in-memory continuous queries, with all of the stream processing expressed in a SQL-like language. Anyone with any data background understands SQL, so the constructs are incredibly familiar. Transformations are simple and can utilize both built-in and Java functions, CASE statements and other mechanisms. Filtering is just a WHERE clause.
Aggregations can utilize flexible windows that turn unbounded infinite data streams into continuously changing bounded sets of data. The queries can reference these windows and output data continuously as the windows change. This means a one-minute moving average is just an average function over a one-minute sliding window.
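As a concrete (hypothetical) example, that one-minute moving average could be expressed as a sliding window plus an aggregate query. Stream, field, and window names here are made up:

```sql
CREATE WINDOW LastMinute
OVER PriceStream KEEP WITHIN 1 MINUTE
PARTITION BY symbol;

CREATE CQ MovingAverage
INSERT INTO AvgPriceStream
SELECT symbol, AVG(price) AS avgPrice
FROM   LastMinute
GROUP BY symbol;
```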
Enrichment requires external data, which is introduced into the Striim Platform through the use of distributed caches (otherwise known as a Data Grid). Caches can be loaded with large amounts of reference data, which is stored in-memory across the cluster. Queries can reference caches in a FROM clause the same way as they reference streams or windows, so joining against a cache is simply a JOIN in a query.
Multiple stream sources, windows and caches can be used and combined together in a single query, and queries can be chained together in directed graphs, known as data flows. All of this can be built through the UI or our scripting language, and can be easily deployed and scaled across a Striim cluster, without having to write any code.
In a recent contributed article for RTInsights, The Rise of Real-Time Data: Prepare for Exponential Growth, I explained how the predicted huge increase in data sources and data volumes will impact the way we need to think about data.
The key takeaway is that, if we can’t possibly store all the data being generated, “the only logical conclusion is that it must be collected, processed and analyzed in-memory, in real-time, close to where the data is generated.”
The article explains general concepts, but doesn’t go into details of how this can be achieved in a practical sense. The purpose of this post is to dive deeper by showing how Striim can be utilized for data modernization tasks, and help companies handle the oncoming tsunami of data.
The first thing to understand is that Striim is a complete end-to-end, in-memory platform. This means that we do not store data first and analyze it afterwards. Using one of our many collectors to ingest data as it’s being generated, you are fully in the streaming world. All of our processing, enrichment, and analysis is performed in-memory using arbitrarily complex data flows.
This diagram shows how Striim combines multiple, previously separate, in-memory components to provide an easy-to-use platform – a new breed of middleware – that only requires knowledge of SQL to be productive.
It is the use of SQL that makes filtering, transformation, aggregation, and enrichment of data so easy. Almost all developers, business analysts, and data scientists know SQL, and through our time-series extensions, windows, and complex event processing syntax, it’s quite simple to do all of these things.
Let’s start with something easy first – filtering. Anyone that knows SQL will recognize immediately that filtering is done with a WHERE clause. Our platform is no different. Here’s an example piece of a large data flow that analyzes web and application activity for SLA monitoring purposes.
The application contains many parts, but this aspect of the data flow is really simple. The source is a real-time feed from Log4J files. In this data flow, we only care about the errors and warnings, so we need to filter out everything but them. The highlighted query does just that. Only Log4J entries with status ERROR or WARN will make it to the next stage of the processing.
If you have hundreds of servers generating files, you don’t need the excess traffic and storage for the unwanted entries; they can be filtered at the edge.
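A filter of that kind reduces, in TQL, to little more than a WHERE clause. The stream and field names below are illustrative, not the ones from the highlighted query:

```sql
CREATE CQ FilterErrorsAndWarnings
INSERT INTO ErrorWarnStream
SELECT *
FROM   Log4JStream
WHERE  level = 'ERROR' OR level = 'WARN';
```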
Aggregation is similarly obvious to anyone that knows SQL – you use aggregate functions and GROUP BY. However, for streaming real-time data you need to add in an additional concept – windows. You can’t simply aggregate data on a stream because it is inherently unbounded and continuous. Any aggregate would just keep on increasing forever. You need to set bounds, and this is where windows come in.
In this example on the right, we have a 10-second window of sensor data, and we will output new aggregates for each sensor whenever the window changes.
This query could then be used to detect anomalous behavior, based on values jumping two standard deviations up or down, or extended to calculate other statistical functions.
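A hedged sketch of that pattern: a 10-second window partitioned by sensor, with statistics recomputed as the window changes. All names are illustrative, and the standard-deviation function name is an assumption:

```sql
CREATE WINDOW SensorWindow
OVER SensorStream KEEP WITHIN 10 SECOND
PARTITION BY sensorId;

CREATE CQ SensorStats
INSERT INTO SensorAggStream
SELECT sensorId,
       AVG(reading)    AS avgReading,
       STDDEV(reading) AS stddevReading  -- assumed function name
FROM   SensorWindow
GROUP BY sensorId;
```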
The final basic concept to understand is enrichment – this is akin to a JOIN in SQL, but has been optimized to function for streaming real-time data. Key to this is the converged in-memory architecture and Striim’s inclusion of a built-in In-Memory Data Grid. Striim’s clustered architecture has been designed specifically to enable large amounts of data to be loaded in distributed caches, and joined with streaming data without slowing down the data flow. Customers have loaded tens of millions of records into memory, and still maintained very high throughput and low latency in their applications.
The example on the left is taken from one of our sample applications. Data is coming from point of sale machines, and has already been aggregated by merchant by the time it reaches this query.
Here we are joining with address information that includes a latitude and longitude, and merchant data to enrich the original record.
Previously, we only had the merchant id to work with, without any further meaning. Having this additional context makes the data more understandable, and enhances our ability to perform analytics.
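The enrichment join can be sketched like this; the aggregate stream and cache names are hypothetical stand-ins for the sample application's components:

```sql
CREATE CQ EnrichMerchants
INSERT INTO EnrichedMerchantStream
SELECT m.merchantId, m.txnCount, m.totalAmount,
       i.merchantName, i.category,
       a.latitude, a.longitude
FROM   MerchantAggStream m, MerchantInfoCache i, AddressCache a
WHERE  m.merchantId = i.merchantId
AND    m.merchantId = a.merchantId;
```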
While these things are important for streaming integration of enterprise data, they are essential in the world of IoT. But, as I mentioned in my previous blog post, Why Striim Is Repeatedly Recognized as the Best IoT Solution, IoT is not a single technology or market… it is an eco-system and does not belong in a silo. You need to think of IoT data as part of the corporate data assets, and increase its value by correlating with other enterprise data.
As the data volumes increase, more and more processing and analytics will be pushed to the edge, so it is important to consider a flexible architecture like Striim’s that enables applications to be split between the edge, on-premise and the cloud.
So how can Striim help you prepare for exponential growth in data volumes? You can start by transitioning, use-case by use-case, to a streaming-first architecture, collecting data in real-time rather than batches. This will ensure that data flows are continuous and predictable. As the data volumes increase, collection, processing and analytics can all be scaled by adding more edge, on-premise, and cloud servers. Over time, more and more processing and analytics is handled in real-time, and the tsunami of data becomes something you have planned for and can manage.
Today, we are thrilled to announce the availability of Hazelcast Striim Hot Cache. This joint solution with Hazelcast’s in-memory data grid uses Striim’s Change Data Capture to solve the cache consistency problem.
With Hazelcast Striim Hot Cache, you can reduce the latency of propagation of data from your backend database into your Hazelcast cache to milliseconds. Now you have the flexibility to run multiple applications off a single database, keeping Hazelcast cache refreshes up-to-date while adhering to low latency SLAs.
Check out this 5-minute Introduction and Demo of Hazelcast Striim Hot Cache:
https://www.youtube.com/watch?v=B1PYcIQmya4
Imagine that you have an application that works by retrieving and storing information in a database. To get faster response times, you utilize a Hazelcast in-memory cache for rapid access to data.
However, other applications also make database updates which leads to inconsistent data in the cache. When this happens, suddenly the application is showing out-of-date or invalid information.
Hazelcast Striim Hot Cache solves this by using streaming change data capture to synchronize the cache with the database in real time. This ensures that both the cache and associated application always have the most up-to-date data.
Through CDC, Striim is able to recognize which tables and key values have changed. Striim immediately captures these changes with their table and key, and, using the Hazelcast Striim writer, pushes those changes into the cache.
We make it easy to leverage Striim’s change data capture functionality by providing CDC Wizards. These Wizards help you quickly configure the capture of change data from enterprise databases – including Oracle, MS SQL Server, MySQL and HPE NonStop – and propagate that data to a Hazelcast cache.
You can also use Striim to facilitate the initial load of the cache.
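End to end, a Hot Cache data flow reduces to a CDC source feeding the Hazelcast writer. The sketch below is purely illustrative: connection details are placeholders, and the reader and writer property names vary by version, so consult the documentation for the real ones:

```sql
CREATE SOURCE ProductCDC USING MysqlReader (
    ConnectionURL: 'jdbc:mysql://dbhost:3306/sales',  -- placeholder
    Username:      'striim',
    Password:      '********',
    Tables:        'sales.PRODUCT'
)
OUTPUT TO ProductChangeStream;

CREATE TARGET ProductHotCache USING HazelcastWriter (
    -- cluster address, target map, and key mapping go here;
    -- property names are version-specific, so none are shown
)
INPUT FROM ProductChangeStream;
```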
Competition is stiff. With the onset of Internet protocol TV and “over the top” technology, satellite, telco and cable set-top box providers are scrambling to increase the stickiness of their subscription services. The best way to do this is to provide real-time context marketing for their set-top boxes in order to know the customer’s interests and intentions immediately, and tailor services and offers on-the-fly.
In order to make this happen, these companies need three things:
They need to be able to ingest huge volumes of disparate data from a gazillion set-top boxes around the world.
They need to be able to – in real time – enrich that data with customer information/behavior and historical trends to assess the customer’s interest in-the-moment.
They need to be able to map that enriched data to a set of offers or services while the customer is still present and interested.
The Striim platform helps companies deliver real-time context-marketing applications that address all three phases of interaction and analysis. It collects your real-time set-top box clickstream data and enriches it with a broad range of contextual data sources, such as customer history and past behavior, geolocation, mobile device information, sensors, log files, social media, and database transactions.
With Striim’s easy-to-use GUI and SQL-like language, users can rapidly create tailored enterprise-scale, context-driven marketing applications.
The aggregation of real-time and historical information via the set-top box makes it possible for providers to know who is watching right now, where they are, and what their purchasing patterns look like. With this context, providers can instantly deliver the most relevant and effective advertising or offer while the customer is still “present,” giving the provider the best chance of motivating the customer to take immediate action.
With the Striim platform, users can deliver a streaming analytics application that constantly integrates real-time actions and location with historical data and trends. Once the customer’s intentions are identified, providers can easily take action to either promote retention or incentivize additional purchases.
Detecting behavior that would be out-of-the-norm may signal a completely new set of advertising opportunities. For example, if a working Mom is at home watching the Disney Channel, it might indicate she is home with a sick child. With streaming analytics and context marketing, this scenario would be detected immediately, and could trigger a set of ads within the customer’s video stream that provide offers for children’s cold and flu medicine.
At its most basic, the goal of log file monitoring is finding things which otherwise would have been missed, such as trends, anomalies, changes, risks, and opportunities. For some firms, log files exist to meet compliance requirements or because software already in use generates them automatically. But for others, analyzing log files – even in real time, as they are created – is incredibly valuable.
In many industries, the speed with which analysis is performed is immaterial. For a personnel-heavy division, for example, looking at employee logs weekly or monthly might provide enough information.
For others, though, the difference between detecting an upsell opportunity while a customer is still on their website, compared to 30 seconds later, could make a difference in what’s purchased. For a smaller subset of applications, real-time monitoring can make the difference between catastrophic failures which could cost millions, and routine maintenance solving the problem.
In general, fields where the mean time to recover from failure is high, and cost of downtime expensive, real-time log file monitoring can prevent costly mistakes and open up otherwise missed opportunities.
Let’s look at two fields that are rapidly adopting real-time analytics: manufacturing and financial services.
Banking & Financial Services
Real-time analysis of log files presents three major opportunities to financial services firms.
First, it allows them the opportunity to make trades faster. Real-time log file monitoring can find network issues and unwanted latency, ensuring that trades are committed when they’re ordered – not later, when the opportunity for arbitrage is entirely passed.
Second, real-time analysis of customer interactions (with ATMs, electronic banking, or even service representatives) provides the opportunity to increase customer satisfaction and even upsell opportunities by noticing trends in behavior as they happen.
Third, real-time analysis of log files is a tremendous boon to security. In a world reliant on technology to support delicate financial systems, real-time analysis may catch network intruders before they can commit crimes. Legacy analysis would find only traces and lost money.
Manufacturing
For manufacturers, especially heavily automated ones, uptime can be critical. Any time that a factory isn’t running because something has gone wrong, it could be losing money both for the company directly, and for any clients downstream who might rely on it to produce intermediate goods.
In these circumstances, real-time monitoring can alleviate risks. Analyzing logs daily, or even every half-hour, wouldn’t notice a machine malfunctioning until potentially too late. On the other hand, real-time analysis can detect failure before it spreads from one machine into the next part of an assembly line.
Real-time analysis can also provide opportunities for manufacturers to streamline operations. In cases where factory equipment is heavily specialized, for example, repair parts can take days or weeks to arrive, all of which is downtime.
Weekly log analysis likely wouldn’t detect parts beginning to wear down until it’s too late. Real-time analysis, on the other hand, allows factory operators to purchase replacement parts preemptively, thereby minimizing or eliminating downtime.
Additionally, real-time log file monitoring in the manufacturing sector can allow companies to keep smaller quantities of inventory or intermediate products on hand. This can help to lower costs and streamline operations.
Ultimately, not every company or business unit will gain tremendous value from real-time analysis. Most, however, will find far more value in under-utilized log files than they expect.
As costs come down and real-time analysis proliferates, it would be prudent for companies to make sure they’re ahead of the curve, or at least tracking it as it evolves.
The key factor that makes real-time visualization preferable to batch or event-driven visualization is the requirement for immediacy of decision making, which tends to be role-based. A C-suite officer, for example, is unlikely to look at one visual representation of any data and change the strategy their company is taking.
Conversely, real-time visualization can be tremendously helpful to individuals who must make tactical or operational decisions on the fly.
But before looking at specific uses for real-time data visualization, let’s consider what kinds of use cases most benefit from visualizing in real time. They can generally be broken down into two categories:
Those which allow individuals or firms to better deal with risk, both managing it and responding when something goes wrong
Those which allow them to exploit rapidly emerging opportunities before they disappear
These circumstances, where action must be taken quickly, are where real-time visualizations shine in providing additional context for decision makers.
Use Case 1: Crisis Management
Perhaps the greatest value of real-time visualization in handling risk comes from informing decision makers who need to respond to emergent events. If a storm is on track to destroy a data center, retail outlet, or any part of a firm’s infrastructure or supply chain, for example, real-time visualization can be tremendously helpful.
Descriptive analytics delivered periodically do little for a decision maker concerned with getting customer services up immediately – by the time any analysis is available, the situation is likely to have changed.
Conversely, real-time visualization of assets in a variety of geographic locations allows decision makers to allocate resources where they’re needed most, which can be the difference between keeping and losing customers in industries where uptime is critical.
Use Case 2: Security and Fraud Prevention
In addition to giving firms options for responding to risky situations, real-time visualizations provide tremendous opportunity for reducing risk in day-to-day operations. The ability to centralize and visualize the output from all the sensors a firm has (for example, security cameras, burglar alarms, RFID tags on valuable assets, etc.) allows a single person to monitor billions of dollars’ worth of globally distributed property from one place.
This also makes it easier to find individuals who are attempting to defraud or otherwise steal from a firm before they’ve gotten away with it, because real-time visualizations can alert managers and decision makers to suspicious behavior before fraud actually occurs.
Use Case 3: Resource Management
This use case sits between risk and opportunity, and represents a unique chance for firms to maximize the value they get from existing resources.
Real-time visualization can aid managers in discovering inefficiencies and correcting them long before legacy analysis would have signaled an anomaly. If, for example, a service vehicle goes out of commission midday, real-time visualization allows regional managers to react more efficiently and make better decisions with all the available information in front of them.
Use Case 4: Sales
Real-time data visualization opens up great opportunities for firms attempting to make more sales, both in brick-and-mortar institutions and in ecommerce.
Real-time analytics give firms the option to provide customers with contextual suggestions – for example, a supermarket suggesting a recipe using mostly ingredients already in a customer’s cart.
Combine this with more efficient inventory management (restocking hot items more quickly when they sell out), and real-time visualization gives firms a tremendous amount of flexibility to get more products out to consumers.
Use Case 5: Purchasing Decisions
For firms heavily reliant on the purchasing of commodities for their operations, the ability to visualize market trends in real time provides a great deal of added value. It means utilities can buy oil at its cheapest point, and international firms can capitalize on changes in foreign exchange markets rapidly.
Batch or event-driven visualization could have firms buying hours after prices hit their low, whereas real-time processing will alert firms to cheap inputs, resulting in huge cost savings.
Ultimately, firms across a wide variety of markets would do well to consider real-time visualization technology. Perhaps it won’t change their strategic direction, but operational optimizations have the potential to save real money.
We’d like to demonstrate how you can migrate Oracle data to Microsoft Azure SQL Server running in the cloud, in real time, using Striim and change data capture (CDC).
People often have data in lots of Oracle tables, on-premise. They want to migrate Oracle data into Microsoft Azure SQL Server in real time. How do you go about moving data from Oracle to Azure without affecting your production databases?
https://www.youtube.com/watch?v=iglW9aJCUlE
You can’t use SQL queries because typically these would be queries against a timestamp – like table scans that you do over and over again – and that puts a load on the Oracle database. You might also miss important transactions. You need change data capture (CDC), which enables non-intrusive collection of streaming database changes.
Striim provides change data capture as a collector out of the box. This enables real-time collection of change data from Oracle, SQL Server, and MySQL. CDC works because databases write every operation that occurs into transaction logs. Change data capture reads those logs directly, instead of using triggers or timestamp queries, to collect the operations. This means that every DML operation – every insert, update, and delete – is captured from the logs and turned into an event by our platform.
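To make the idea concrete, here is a toy sketch in Python (illustrative only, not Striim’s implementation or API): a database appends every DML operation to a transaction log, and a CDC reader tails that log from a saved position, turning each entry into a change event without ever querying the source tables.

```python
from dataclasses import dataclass

# Toy transaction log: in a real database this is the redo/binary log
# that every committed insert, update, and delete is written to.
txn_log = [
    {"op": "INSERT", "table": "TCUSTOMER", "row": {"id": 1, "name": "Acme"}},
    {"op": "UPDATE", "table": "TCUSTOMER", "row": {"id": 1, "name": "Acme Corp"}},
    {"op": "DELETE", "table": "TCUSTORD", "row": {"id": 7}},
]

@dataclass
class ChangeEvent:
    op: str       # DML operation type
    table: str    # source table the change came from
    row: dict     # column values carried by the log entry

def capture_changes(log, position=0):
    """Read the log from a saved position and emit one event per operation,
    never touching the source tables (no table scans, no timestamp queries)."""
    for entry in log[position:]:
        yield ChangeEvent(entry["op"], entry["table"], entry["row"])

events = list(capture_changes(txn_log))
print([(e.op, e.table) for e in events])
```

Because the reader keeps its own log position, it can resume after a restart without re-scanning tables, which is why CDC avoids both the load and the missed-transaction problems of timestamp polling.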
In this demo, you will see how you can use Striim to perform real-time change data capture from an Oracle database and deliver that data, in real time, into Microsoft Azure SQL Server. We also build a custom monitoring solution for the whole end-to-end data flow. The demo starts at the 1:43 mark.
Connect to Microsoft Azure SQL Server
First, we connect to Microsoft Azure SQL Server. In this instance, we have two tables, TCUSTOMER and TCUSTORD, which we can show are currently empty. We use a data flow built in Striim to capture data from an on-premise Oracle database using change data capture (you can see its configuration properties) and deliver the data, after some processing, into Microsoft Azure SQL Server.
To show this, we run some SQL against Oracle. This SQL does a combination of inserts, updates, and deletes against our two Oracle tables. When we run it, you can see the data immediately in the initial stream. That data stream is then split into multiple processing steps, and the results are delivered into Azure SQL Server. If we rerun the query against our Azure tables, you can see that the previously empty tables now have data in them. That data was delivered live, and will continue to be delivered in a streaming fashion as long as changes are happening in the Oracle database.
In addition to the data movement, we’ve also built a monitoring application complete with dashboard that shows data flowing through the various tables, the types of operations occurring, and the entire end-to-end transaction lag. This shows the difference between when a transaction was committed on the source system, and when it was captured and applied to the target. You can also see some of the most recent transactions.
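The end-to-end lag metric described above is straightforward once each replicated event carries both timestamps. A minimal sketch, assuming illustrative field names rather than Striim’s actual event schema:

```python
from datetime import datetime

# Each replicated event records when it was committed on the source and
# when it was applied on the target; end-to-end lag is the difference.
events = [
    {"table": "TCUSTOMER", "committed": "2023-01-01T12:00:00+00:00",
     "applied": "2023-01-01T12:00:02+00:00"},
    {"table": "TCUSTORD", "committed": "2023-01-01T12:00:01+00:00",
     "applied": "2023-01-01T12:00:04+00:00"},
]

def lag_seconds(event):
    """Seconds between source commit and target apply for one event."""
    committed = datetime.fromisoformat(event["committed"])
    applied = datetime.fromisoformat(event["applied"])
    return (applied - committed).total_seconds()

lags = [lag_seconds(e) for e in events]
print(max(lags))  # worst-case end-to-end lag across recent events
```

A dashboard would typically chart this value over a moving window, so a sustained rise in lag is visible at a glance.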
This monitoring application was built, again, using a data flow within the Striim platform. This data flow takes the original streaming change data from the Oracle database and applies some processing, in the form of SQL queries, to generate statistics. In addition to generating data for the dashboard, you can also use these queries as rules to generate alerts when thresholds are crossed. The dashboard itself is not hard-coded. It is generated using a dashboard builder that connects to the back end via queries; each visualization is powered by a query against the back-end data, and there are many visualizations to choose from.
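In the same spirit, the per-table statistics driving such a dashboard amount to a grouped aggregation over the change stream. A Python sketch of the idea (the demo itself uses Striim’s SQL-like queries, not this code):

```python
from collections import Counter

# Change events as emitted by the capture step: (operation, source table).
events = [
    ("INSERT", "TCUSTOMER"), ("UPDATE", "TCUSTOMER"),
    ("INSERT", "TCUSTORD"), ("DELETE", "TCUSTORD"),
]

# Count operations per (table, operation) pair -- the kind of rolling
# statistic a dashboard visualization or an alerting rule would query.
stats = Counter((table, op) for op, table in events)
print(stats[("TCUSTOMER", "INSERT")])
```

An alerting rule is then just a threshold check on the same aggregate, for example firing when the DELETE count for a table exceeds an expected rate.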
We hope you have enjoyed seeing how to migrate Oracle data into the cloud using Striim via the Oracle to Azure demo. If you would like a more in-depth look at this application, please request a demo with one of our lead technologists.