Making the Most of Apache Kafka – Data Processing and Preparation for Kafka

In Part 3 of this blog series, we discussed how the Striim platform facilitates moving Kafka data to a wide variety of enterprise targets, including Hadoop and Cloud environments. In this post, we focus on in-stream Kafka data processing and preparation, whether streaming data to Kafka, or from Kafka to enterprise targets.

Kafka Data Processing and Preparation

When delivering data to Kafka, or writing Kafka data to a downstream target like HDFS, it is essential to consider the structure and content of the data you are writing. Based on your use case, you may not require all of the data, only that which matches certain criteria. You may also need to transform the data through string manipulation or data conversion, or only send aggregates to prevent data overload.

Most importantly, you may need to add context to the Kafka data. A lot of raw data needs to be joined with additional information before it becomes useful.

Imagine using CDC to stream changes from a normalized database. If you have designed the database correctly, most of the data fields will be in the form of IDs. This is very efficient for the database, but not very useful for downstream queries or analytics. IoT data can present a similar situation, with device data consisting of a device ID and a few values, without any meaning or context. In both cases, you may want to enrich the raw data with reference data, correlated by the IDs, to produce a denormalized record with sufficient information.

The key tenets of stream processing and data preparation – filtering, transformation, aggregation and enrichment – are essential to any data architecture, and should be easy to apply to your Kafka data without any need for developers or complex APIs.

The Striim platform simplifies this with a uniform approach built on in-memory continuous queries, with all of the stream processing expressed in a SQL-like language. Anyone with a data background understands SQL, so the constructs are immediately familiar. Transformations are simple and can utilize built-in and Java functions, CASE statements, and other mechanisms. Filtering is just a WHERE clause.
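
As an illustration, a continuous query combining transformation and filtering might be sketched as follows in Striim’s SQL-like language. The stream, field, and function names here are hypothetical, chosen for the example rather than taken from a real application:

```sql
-- Hypothetical sketch: transform and filter in a single continuous query.
-- RawOrders, CleanOrders, and all field names are illustrative only.
CREATE CQ CleanOrdersCQ
INSERT INTO CleanOrders
SELECT o.customerName,
       CASE
         WHEN o.statusCode = 1 THEN 'NEW'
         WHEN o.statusCode = 2 THEN 'SHIPPED'
         ELSE 'UNKNOWN'
       END AS status,                -- data conversion via CASE
       o.amount
FROM RawOrders o
WHERE o.amount > 0;                  -- filtering is just a WHERE clause
```

The query runs continuously: each event arriving on the source stream is transformed and, if it passes the WHERE clause, emitted to the output stream.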

Aggregations can utilize flexible windows that turn unbounded data streams into continuously changing bounded sets of data. Queries can reference these windows and output data continuously as the windows change. This means a one-minute moving average is just an average function over a one-minute sliding window.
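
A one-minute moving average might be sketched like this, assuming a window declared over the incoming stream. The names (ReadingStream, value, and so on) are illustrative, and the exact window syntax may differ slightly from this sketch:

```sql
-- Hypothetical sketch: a one-minute sliding window and a moving average.
CREATE WINDOW OneMinuteWindow
OVER ReadingStream KEEP WITHIN 1 MINUTE;

CREATE CQ MovingAverageCQ
INSERT INTO AverageStream
SELECT AVG(r.value) AS avgValue     -- recomputed as the window slides
FROM OneMinuteWindow r;
```

Because the query is bound to the window rather than the raw stream, the average is re-emitted continuously as events enter and leave the one-minute span.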

Enrichment requires external data, which is introduced into the Striim Platform through the use of distributed caches (otherwise known as a Data Grid). Caches can be loaded with large amounts of reference data, which is stored in-memory across the cluster. Queries can reference caches in a FROM clause the same way as they reference streams or windows, so joining against a cache is simply a JOIN in a query.
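
A cache-based enrichment might be sketched as follows, assuming a cache (here called ProductCache, an illustrative name) has been preloaded with reference data keyed by productId:

```sql
-- Hypothetical sketch: enriching a stream by joining against a cache.
-- OrderStream, ProductCache, and the field names are illustrative only.
CREATE CQ EnrichOrdersCQ
INSERT INTO EnrichedOrders
SELECT o.orderId,
       o.productId,
       p.productName,               -- added from the in-memory cache
       p.category,
       o.quantity
FROM OrderStream o
JOIN ProductCache p ON o.productId = p.productId;
```

The cache appears in the FROM clause just like a stream or window, so the enrichment is expressed as an ordinary JOIN.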

Multiple stream sources, windows and caches can be used and combined together in a single query, and queries can be chained together in directed graphs, known as data flows. All of this can be built through the UI or our scripting language, and can be easily deployed and scaled across a Striim cluster, without having to write any code.

For more information on Striim’s latest enhancements relating to Kafka, please read this week’s press release, “New Striim Release Further Bolsters SQL-based Streaming and Database Connectivity for Kafka.” Or download the Striim platform for Kafka and try it for yourself.

Continue reading this series with Part 5: “Making the Most of Apache Kafka – Streaming Analytics for Kafka.”

The Rise of Real-Time Data: How Striim Helps You Prepare for Exponential Growth

In a recent contributed article for RTInsights, The Rise of Real-Time Data: Prepare for Exponential Growth, I explained how the predicted huge increase in data sources and data volumes will impact the way we need to think about data.

The key takeaway is that, if we can’t possibly store all the data being generated, “the only logical conclusion is that it must be collected, processed and analyzed in-memory, in real-time, close to where the data is generated.”

The article explains general concepts, but doesn’t go into details of how this can be achieved in a practical sense. The purpose of this post is to dive deeper by showing how Striim can be utilized for data modernization tasks, and help companies handle the oncoming tsunami of data.

The first thing to understand is that Striim is a complete end-to-end, in-memory platform. This means that we do not store data first and analyze it afterwards. Using one of our many collectors to ingest data as it’s being generated, you are fully in the streaming world. All of our processing, enrichment, and analysis is performed in-memory using arbitrarily complex data flows.

This diagram shows how Striim combines multiple, previously separate, in-memory components to provide an easy-to-use platform – a new breed of middleware – that only requires knowledge of SQL to be productive.

It is the use of SQL that makes filtering, transformation, aggregation and enrichment of data so easy. Almost all developers, business analysts and data scientists know SQL, and through our time-series extensions, windows and complex event processing syntax, it’s quite simple to do all of these things.

Let’s start with something easy first – filtering. Anyone who knows SQL will recognize immediately that filtering is done with a WHERE clause. Our platform is no different. Here’s an example piece of a larger data flow that analyzes web and application activity for SLA monitoring purposes.

The application contains many parts, but this aspect of the data flow is really simple. The source is a real-time feed from Log4J files. In this data flow, we only care about the errors and warnings, so we need to filter out everything but them. The highlighted query does just that. Only Log4J entries with status ERROR or WARN will make it to the next stage of the processing.
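
A query of this kind might be sketched as follows in Striim’s SQL-like language. The stream and field names here are illustrative, not the actual application’s:

```sql
-- Hypothetical sketch: keep only errors and warnings from a Log4J feed.
CREATE CQ FilterLogLevelsCQ
INSERT INTO ErrorWarnStream
SELECT *
FROM Log4JStream l
WHERE l.level = 'ERROR' OR l.level = 'WARN';   -- everything else is dropped
```

Only matching entries flow to the next stage; all other log entries are discarded as they arrive.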

If you have hundreds of servers generating files, you don’t need the excess traffic and storage for the unwanted entries; they can be filtered at the edge.

Aggregation is similarly obvious to anyone who knows SQL – you use aggregate functions and GROUP BY. However, for streaming real-time data you need an additional concept – windows. You can’t simply aggregate data on a stream, because a stream is inherently unbounded and continuous; any aggregate would just keep increasing forever. You need to set bounds, and this is where windows come in.

In this example on the right, we have a 10-second window of sensor data, and we will output new aggregates for each sensor whenever the window changes.

This query could then be used to detect anomalous behavior, based on values jumping two standard deviations up or down, or extended to calculate other statistical functions.
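
Such a window and query might be sketched like this. All names are illustrative, the exact window syntax may vary, and the sketch assumes a standard-deviation aggregate is available:

```sql
-- Hypothetical sketch: per-sensor aggregates over a 10-second window,
-- emitted continuously as the window contents change.
CREATE WINDOW SensorWindow
OVER SensorStream KEEP WITHIN 10 SECOND
PARTITION BY sensorId;

CREATE CQ SensorStatsCQ
INSERT INTO SensorStats
SELECT s.sensorId,
       AVG(s.value)    AS avgValue,
       STDDEV(s.value) AS stdDev    -- basis for a two-sigma anomaly check
FROM SensorWindow s
GROUP BY s.sensorId;
```

A downstream query could then compare each new reading against avgValue plus or minus two times stdDev to flag anomalies.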

The final basic concept to understand is enrichment – this is akin to a JOIN in SQL, but has been optimized to function for streaming real-time data. Key to this is the converged in-memory architecture and Striim’s inclusion of a built-in In-Memory Data Grid. Striim’s clustered architecture has been designed specifically to enable large amounts of data to be loaded in distributed caches, and joined with streaming data without slowing down the data flow. Customers have loaded tens of millions of records into memory, and still maintained very high throughput and low latency in their applications.

The example on the left is taken from one of our sample applications. Data is coming from point of sale machines, and has already been aggregated by merchant by the time it reaches this query.

Here we are joining with address information that includes a latitude and longitude, and merchant data to enrich the original record.
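
The enrichment step might be sketched as follows, assuming the address and merchant reference data have been preloaded into caches. The names here are illustrative rather than taken from the actual sample application:

```sql
-- Hypothetical sketch: enriching aggregated point-of-sale data with
-- merchant and address caches, without slowing the data flow.
CREATE CQ EnrichMerchantCQ
INSERT INTO EnrichedMerchantStream
SELECT p.merchantId,
       m.merchantName,              -- from the merchant cache
       a.latitude,                  -- from the address cache
       a.longitude,
       p.totalAmount
FROM PosAggStream p
JOIN MerchantCache m ON p.merchantId = m.merchantId
JOIN AddressCache a  ON m.addressId  = a.addressId;
```

Because both caches are held in memory across the cluster, the joins add context to each event without blocking the stream.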

Previously, we only had the merchant id to work with, without any further meaning. Having this additional context makes the data more understandable, and enhances our ability to perform analytics.

While these things are important for streaming integration of enterprise data, they are essential in the world of IoT. But, as I mentioned in my previous blog post, Why Striim Is Repeatedly Recognized as the Best IoT Solution, IoT is not a single technology or market… it is an ecosystem, and does not belong in a silo. You need to think of IoT data as part of the corporate data assets, and increase its value by correlating it with other enterprise data.

As the data volumes increase, more and more processing and analytics will be pushed to the edge, so it is important to consider a flexible architecture like Striim’s that enables applications to be split between the edge, on-premise and the cloud.

So how can Striim help you prepare for exponential growth in data volumes? You can start by transitioning, use-case by use-case, to a streaming-first architecture, collecting data in real-time rather than batches. This will ensure that data flows are continuous and predictable. As the data volumes increase, collection, processing and analytics can all be scaled by adding more edge, on-premise, and cloud servers. Over time, more and more processing and analytics is handled in real-time, and the tsunami of data becomes something you have planned for and can manage.

Real-Time Collection, Enrichment and Analysis of Set-Top Box Data

Competition is stiff. With the onset of Internet protocol TV and “over the top” technology, satellite, telco and cable set-top box providers are scrambling to increase the stickiness of their subscription services. The best way to do this is to provide real-time context marketing for their set-top boxes in order to know the customer’s interests and intentions immediately, and tailor services and offers on-the-fly.

In order to make this happen, these companies need three things:

  • They need to be able to ingest huge volumes of disparate data from a gazillion set-top boxes around the world.
  • They need to be able to – in real time – enrich that data with customer information/behavior and historical trends to assess the customer’s interest in-the-moment.
  • They need to be able to map that enriched data to a set of offers or services while the customer is still present and interested.

The Striim platform helps companies deliver real-time, context marketing applications that address all three phases of interaction and analysis. It collects your real-time set-top box clickstream data and enriches it with a broad range of contextual data sources, such as customer history and past behavior, geolocation, mobile device information, sensors, log files, social media and database transactions.

With Striim’s easy-to-use GUI and SQL-like language, users can rapidly create tailored enterprise-scale, context-driven marketing applications.

The aggregation of real-time and historical information via the set-top box makes it possible for providers to know who is watching right now, where they are, and what their purchasing patterns look like. With this context, providers can instantly deliver the most relevant and effective advertising or offer while the customer is still “present,” giving the provider the best chance of motivating the customer to take immediate action.

With the Striim platform, users can deliver a streaming analytics application that constantly integrates real-time actions and location with historical data and trends. Once the customer’s intentions are identified, providers can easily take action to either promote retention or incentivize additional purchases.

Detecting behavior that would be out-of-the-norm may signal a completely new set of advertising opportunities. For example, if a working Mom is at home watching the Disney Channel, it might indicate she is home with a sick child. With streaming analytics and context marketing, this scenario would be detected immediately, and could trigger a set of ads within the customer’s video stream that provide offers for children’s cold and flu medicine.

Real-World Examples of Real-Time Log File Monitoring

At its most basic, the goal of log file monitoring is finding things which otherwise would have been missed, such as trends, anomalies, changes, risks, and opportunities. For some firms, log files exist to meet compliance requirements or because software already in use generates them automatically. But for others, analyzing log files – even in real time, as they are created – is incredibly valuable.

In many industries, the speed with which analysis is performed is immaterial. For a personnel-heavy division, for example, looking at employee logs weekly or monthly might provide enough information.

For others, though, the difference between detecting an upsell opportunity while a customer is still on their website, compared to 30 seconds later, could make a difference in what’s purchased. For a smaller subset of applications, real-time monitoring can make the difference between catastrophic failures which could cost millions, and routine maintenance solving the problem.

In general, in fields where the mean time to recover from failure is long and the cost of downtime is high, real-time log file monitoring can prevent costly mistakes and open up otherwise missed opportunities.

Let’s look at two fields that are rapidly adopting real-time analytics: manufacturing and financial services.

Banking & Financial Services

Real-time analysis of log files presents three major opportunities to financial services firms.

First, it allows them the opportunity to make trades faster. Real-time log file monitoring can find network issues and unwanted latency, ensuring that trades are committed when they’re ordered – not later, when the opportunity for arbitrage has passed.

Second, real-time analysis of customer interactions (with ATMs, electronic banking, or even service representatives) provides the opportunity to increase customer satisfaction and even upsell opportunities by noticing trends in behavior as they happen.

Third, real-time analysis of log files is a tremendous boon to security. In a world reliant on technology to support delicate financial systems, real-time analysis may catch network intruders before they can commit crimes. Legacy analysis would find only traces and lost money.

Manufacturing

For manufacturers, especially heavily automated ones, uptime can be critical. Any time that a factory isn’t running because something has gone wrong, it could be losing money both for the company directly, and for any clients downstream who might rely on it to produce intermediate goods.

In these circumstances, real-time monitoring can alleviate risks. Analyzing logs daily, or even every half-hour, might not catch a malfunctioning machine until it’s too late. Real-time analysis, on the other hand, can detect a failure before it spreads from one machine to the next part of an assembly line.

Real-time analysis can also provide opportunities for manufacturers to streamline operations. In cases where factory equipment is heavily specialized, for example, repair parts can take days or weeks to arrive, all of which is downtime.

Weekly log analysis likely wouldn’t detect parts beginning to wear down until it’s too late. Real-time analysis, on the other hand, allows factory operators to purchase replacement parts preemptively, thereby minimizing or eliminating downtime.

Additionally, real-time log file monitoring in the manufacturing sector can allow companies to keep smaller quantities of inventory or intermediate products on hand. This can help to lower costs and streamline operations.

Ultimately, not every company or business unit will gain tremendous value from real-time analysis. Most, however, will find far more value in under-utilized log files than they expect.

As costs come down and real-time analysis proliferates, it would be prudent for companies to make sure they’re ahead of the curve, or at least tracking it as it evolves.

5 Uses for Real-Time Visualization

The key factor that makes real-time visualization preferable to batch or event-driven visualization is the requirement for immediacy of decision making, which tends to be role-based. A C-suite officer, for example, is unlikely to look at one visual representation of any data and change the strategy their company is taking.

Conversely, real-time visualization can be tremendously helpful to individuals who must make tactical or operational decisions on the fly.

But before looking at specific uses for real-time data visualization, let’s consider what kinds of use cases most benefit from visualizing in real time. They can generally be broken down into two categories:

  1. Those which allow individuals or firms to better deal with risk, both managing it and responding when something goes wrong
  2. Those which allow them to exploit rapidly emerging opportunities before they disappear

These circumstances, where action must be taken quickly, are where real-time visualizations shine in providing additional context for decision makers.

Use Case 1: Crisis Management

Perhaps the greatest value of real-time visualization in handling risk comes from informing decision makers who need to respond to emergent events. If a storm is on track to destroy a data center, retail outlet, or any part of a firm’s infrastructure or supply chain, for example, real-time visualization can be tremendously helpful.

Descriptive analytics delivered periodically do little for a decision maker concerned with getting customer services up immediately – by the time any analysis is available, the situation is likely to have changed.

Conversely, real-time visualization of assets in a variety of geographic locations allows decision makers to allocate resources where they’re needed most, which can be the difference between keeping and losing customers in industries where uptime is critical.

Use Case 2: Security and Fraud Prevention

In addition to giving firms options for responding to risky situations, real-time visualizations provide tremendous opportunity for reducing risk in day-to-day operations. The ability to centralize and visualize the output from all the sensors a firm has (for example, security cameras, burglar alarms, RFID tags on valuable assets, etc.) allows a single person to monitor billions of dollars’ worth of globally distributed property from one place.

This also makes it easier to find individuals who are attempting to defraud or otherwise steal from a firm before they’ve gotten away with it, because real-time visualizations can alert managers and decision makers to suspicious behavior before fraud actually occurs.

Use Case 3: Resource Management

This use case sits between risk and opportunity, and represents a unique chance for firms to maximize the value they get from existing resources.

Real-time visualization can aid managers in discovering inefficiencies and correcting them long before legacy analysis would have signaled an anomaly. If, for example, a service vehicle goes out of commission midday, real-time visualization allows regional managers to react more efficiently and make better decisions, with all the available information in front of them.

Use Case 4: Sales

Real-time data visualization opens up great opportunities for firms attempting to make more sales, both in brick-and-mortar institutions and in ecommerce.

Real-time analytics give firms the option to provide customers with contextual suggestions – for example, a supermarket suggesting a recipe using mostly ingredients already in a customer’s cart.

Combine this with more efficient inventory management (restocking hot items more quickly when they sell out), and real-time visualization gives firms a tremendous amount of flexibility to get more products out to consumers.

Use Case 5: Purchasing Decisions

For firms heavily reliant on the purchasing of commodities for their operations, the ability to visualize market trends in real time provides a great deal of added value. It means utilities can buy oil at its cheapest point, and international firms can capitalize on changes in foreign exchange markets rapidly.

Batch or event-driven visualization could have firms buying hours after prices hit their low, whereas real-time processing will alert firms to cheap inputs, resulting in huge cost savings.

Ultimately, firms across a wide variety of markets would do well to consider real-time visualization technology. Perhaps it won’t change their strategic direction, but operational optimizations have the potential to save real money.

Demo: Migrate Oracle Data to Azure in Real Time

Overview

We’d like to demonstrate how you can migrate Oracle data to Microsoft Azure SQL Server running in the cloud, in real time, using Striim and change data capture (CDC).

People often have data in lots of on-premise Oracle tables, and want to migrate that data into Microsoft Azure SQL Server in real time. How do you go about moving data from Oracle to Azure without affecting your production databases?

https://www.youtube.com/watch?v=iglW9aJCUlE

You can’t use SQL queries, because these would typically be queries against a timestamp – table scans that you run over and over again – which puts a load on the Oracle database. You might also miss important transactions. You need change data capture (CDC), which enables non-intrusive collection of streaming database changes.

Striim provides change data capture as a collector out of the box. This enables real-time collection of change data from Oracle, SQL Server, and MySQL. CDC works because databases write all the operations that occur into transaction logs. Instead of using triggers or timestamps, change data capture reads these logs directly to collect operations. This means that every DML operation – every insert, update, and delete – written to the logs is captured by change data capture and turned into events by our platform.
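
A CDC source declaration might look something like the following sketch. The property names follow the conventions of Striim’s Oracle reader, but the exact names and values here are illustrative, not a verified configuration:

```sql
-- Hypothetical sketch of an Oracle CDC source declaration.
-- Connection details and table names are illustrative only.
CREATE SOURCE OracleCDCSource USING OracleReader (
  Username: 'striim',
  Password: '********',
  ConnectionURL: 'dbhost:1521:ORCL',
  Tables: 'SCOTT.TCUSTOMER;SCOTT.TCUSTORD'   -- tables to capture changes from
)
OUTPUT TO OracleChangeStream;
```

Downstream queries and targets then consume OracleChangeStream like any other stream in the data flow.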

In this demo, you will see how you can utilize Striim to do real-time collection of change data capture from Oracle Database and deliver that data, in real-time, into Microsoft Azure SQL Server. We also build a custom monitoring solution of the whole end-to-end data flow. The demo starts at the 1:43 mark.

Connect to Microsoft Azure SQL Server

First, we connect to Microsoft Azure SQL Server. In this instance, we have two tables, TCUSTOMER and TCUSTORD, which we can show are currently empty. We use a data flow built in Striim to capture data from an on-premise Oracle database using change data capture – you can see the configuration properties – and deliver the data, after some processing, into Microsoft Azure SQL Server.

To show this, we run some SQL against Oracle. This SQL does a combination of inserts, updates, and deletes against our two Oracle tables. When we run it, you can see the data immediately in the initial stream. That data stream is then split into multiple processing steps and delivered into Azure SQL Server. If we redo the query against our Azure tables, you can see that the previously empty tables now have data in them. That data was delivered live, and will continue to be delivered in a streaming fashion as long as changes are happening in the Oracle database.

In addition to the data movement, we’ve also built a monitoring application complete with dashboard that shows data flowing through the various tables, the types of operations occurring, and the entire end-to-end transaction lag. This shows the difference between when a transaction was committed on the source system, and when it was captured and applied to the target. You can also see some of the most recent transactions.

This monitoring application was built, again, using a data flow within the Striim platform. This data flow takes the original streaming change data from the Oracle database and applies some processing, in the form of SQL queries, to generate statistics. In addition to generating data for the dashboard, these queries can also act as rules that generate alerts when thresholds are crossed. The dashboard itself is not hard-coded. It’s generated using a dashboard builder, in which each visualization is powered by a query against the back-end data. There are lots of visualizations to choose from.

We hope you have enjoyed seeing how to migrate Oracle data into the cloud using Striim via the Oracle to Azure demo. If you would like a more in-depth look at this application, please request a demo with one of our lead technologists.

Making In-Memory Computing Enterprise Grade – Overview

4 Major Components for Mission-Critical IMC Processing

This is the first blog in a six-part series on making In-Memory Computing Enterprise Grade. Read the entire series:

  1. Part 1: overview
  2. Part 2: data architecture
  3. Part 3: scalability
  4. Part 4: reliability
  5. Part 5: security
  6. Part 6: integration

If you are looking to create an end-to-end, in-memory streaming platform to be used by enterprises for mission-critical applications, it is essential that the platform is Enterprise Grade. In a recent presentation at the In-Memory Computing Summit, I was asked to explain exactly what this means, and to share best practices for achieving an enterprise-grade, in-memory computing architecture, based on what we have learned in building the Striim platform.

There are four major components to an enterprise-grade, in-memory computing platform: namely scalability, reliability, security and integration.

Scalability is not just about being able to add additional boxes, or spin up additional VMs in Amazon. It is about being able to increase the overall throughput of a system to deal with an expanded workload. This needs to take into account not just an increase in the amount of data being ingested, but also additional processing load (more queries on the same data) without slowing down the ingest. You also need to take into account scaling the volume of data you need to hold in-memory, and any persistent storage you may need. All of this should happen as transparently as possible, without impacting running data flows.

For mission-critical enterprise applications, Reliability is an absolute requirement. In-memory processing and data flows should never stop, and should guarantee processing of all data. In many cases, it is also imperative that results are generated once and only once, even in the case of failure and recovery. If you are doing distributed in-memory processing, data will be partitioned over many nodes. If a single node fails, the system not only needs to pick up from where the failed node left off; it also needs to repartition over the remaining nodes, recover state, and know which results have been written where.

Another key requirement is Security. The overall system needs an end-to-end authentication and authorization mechanism to protect data flow components and any external touch points. For example, a user who is able to see the end results of processing in a dashboard may not have the authority to query an initial data stream that contains personally identifiable information. Additionally, any data in flight should be encrypted. In-memory computing in general, and the Striim platform specifically, does not typically write intermediate data to disk, but does transmit data between nodes for scalability purposes. This inter-node data should be encrypted, especially over standard messaging frameworks, such as Kafka, that could easily be tapped into.

The final Enterprise Grade requirement is Integration. You can have the most amazing in-memory computing platform, but if it does not integrate with your existing IT infrastructure, it is a barren, data-less island. There are a number of different things to consider from an integration perspective. Most importantly, you need to get data in and out. You need to be able to harness existing sources – such as databases, log files, messaging systems and devices – in the form of streaming data, and write the results of processing to existing stores such as a data warehouse, data lake, cloud storage or messaging systems. You also need to consider any data you may need to load into memory from external systems for context or enrichment purposes, as well as any existing code or algorithms that may form part of your in-memory processing.

Enterprise Grade Means Scalable, Reliable, Secure, & Integrates Well With Existing Resources

You can build an in-memory streaming platform without taking into account any of these requirements, but it would only be suitable for research or proof-of-concept purposes. If software is going to be used to run mission-critical enterprise data flows, it must address these criteria and follow best practices to play nicely with the rest of the enterprise.

Striim has been designed from the ground-up to be Enterprise Grade, and not only meets these requirements, but does so in an easy-to-use and robust fashion.

In subsequent blogs I will expand upon these ideas, and provide a framework for ensuring your streaming integration and analytics use cases make the grade.

Big Data Streaming Analytics – A Leap Forward!

Here at Striim, we have been living and breathing Big Data Streaming Analytics for four years now. We believe that no Enterprise Data Strategy is complete without Streaming Integration AND Streaming Analytics. In fact, we are successfully helping organizations of all sizes discover the benefits of leveraging streaming integration and intelligence (the two i’s of Striim) to deliver the real-time insights they need.

I therefore find it very encouraging that some of the world’s most respected analysts are also seeing value in this space. Recently, Forrester Research published “The Forrester Wave™: Big Data Streaming Analytics, Q1 2016.” Fifteen vendors were covered in this report, and it is encouraging to see how thought around this space has matured.

An example of this is the importance of Context. In the latest report, there are a dozen mentions of “context,” including in the subtitle of the report: “Streaming Analytics Are Critical To Building Contextual Insights For Internet-of-Things, Mobile, Web, and Enterprise Applications.”

We started Striim with Context as one of our most critical objectives, and the importance of Context cannot be over-emphasized. Most often the raw data feeds derived from enterprise databases via change data capture (CDC), log files, or IoT do not contain sufficient information to make decisions. In order to ready the data for querying, or to deliver relevant insights, it is almost always necessary to join the raw data with reference or historical information to add context. Striim has been architected from day one to perform this task without slowing down your data flow.

As a relative newcomer to the space, we were very pleased to be considered a Strong Performer in this report, and were impressed by the authors’ keen understanding of what we believe to be our top differentiators.

The only reference to Change Data Capture (CDC) in the entire report relates to Striim. In any streaming architecture, the most effective way to extract real-time information from enterprise applications is to capture the change in their underlying databases as it happens. Whether the application is an in-house CRM solution, Billing System, Point of Sale, or ATM Transactions Processor, the end result of the application is to update a database.

Striim was included in “The Forrester Wave™: Big Data Streaming Analytics, Q1 2016” as a Strong Performer.

Most DBAs strictly forbid running SQL against a production database, so if you want to know what’s happening in these applications, without having to intrusively modify them, you need CDC. Striim is the only streaming analytics platform to provide CDC as a fully integrated component of the platform.

We believe that Streaming Integration is a pre-requisite for Streaming Analytics, and a platform isn’t complete without it. As such, we have ensured that we provide a great number of data collectors (including CDC and IoT) and targets (including Kafka and Cloud), and we made the internal processing of the data easy through our SQL-like language.

We found it extremely astute that the Forrester report cited Complex Event Processing (CEP) capabilities. This is the ability to spot patterns of events over time, across one or more streams: patterns that may indicate something important is happening. We believe that CEP won’t survive as a standalone technology, and is instead a key component of any streaming analytics platform.

There is one aspect of our product that wasn’t highlighted, and that is Streaming Visualization. Anyone who has tried it knows that it is extremely difficult to build dashboards and reports to truly analyze your streaming data in real time, unless that capability is integrated into the platform.

Striim’s real-time dashboards can be built easily using a drag-and-drop interface, and rapidly deliver insights into your analysis. You don’t even need full-blown analytics to use our visualizations. We have customers, for example, who are performing streaming integration from enterprise databases via CDC to Kafka, who simply want to monitor this integration and drill down into specifics through our dashboards.

If you are thinking about Big Data Streaming Analytics, it is important to consider the entire ecosystem. The actual analysis part is, in fact, a small piece of the puzzle, and requires that you can first collect, process, enrich and correlate the data in real time. Once you have analyzed it, you most likely also need to visualize and report on it, and send alerts for critical events. It’s hard to piece together multiple technologies to achieve this, or to focus all of your efforts on coding when you would rather empower your analysts. Instead, please consider a single end-to-end streaming analytics platform, like Striim, that enables all of this, and more.

Real-Time Financial Transaction Monitoring

Financial Monitoring Application

Building complex, financial transaction monitoring applications used to be a time-consuming task. Once you had the business case worked out, you needed to work with a team of analysts, DBAs and engineers to design the system, source the data, build, test, and rollout the software. Typically it wouldn’t be correct the first time, so rinse and repeat.

Not so with Striim. In this video you will see a financial transaction monitoring application that was built and deployed in four days. The main use case is to spot increases in the rate at which customer transactions are declined, and to alert on that. But a whole host of additional monitoring capabilities were also built into the application. Increasing decline rates often indicate issues with the underlying ATM and Point of Sale networks, and need to be resolved quickly to prevent potential penalties and a decline in customer satisfaction.
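A minimal sketch of the core detection idea, spotting a rising decline rate over a sliding window of recent transactions, might look like the following. The class name, window size, and alert threshold are all hypothetical; in the actual application this logic is expressed as continuous SQL-like queries rather than hand-written code:

```python
from collections import deque

# Hypothetical sketch of the decline-rate check described above.
# Window size and alert threshold are illustrative, not production values.
class DeclineRateMonitor:
    def __init__(self, window_size=100, alert_threshold=0.10):
        self.window = deque(maxlen=window_size)  # most recent transactions
        self.alert_threshold = alert_threshold   # e.g. alert above 10% declined

    def observe(self, declined: bool) -> bool:
        """Record one transaction; return True if an alert should fire."""
        self.window.append(declined)
        rate = sum(self.window) / len(self.window)
        # Only alert once the window is full, to avoid noisy early readings.
        return len(self.window) == self.window.maxlen and rate > self.alert_threshold

monitor = DeclineRateMonitor(window_size=10, alert_threshold=0.3)
alerts = [monitor.observe(declined)
          for declined in [False] * 7 + [True] * 5]
print(any(alerts))  # True: declines eventually exceed 30% of the window
```

In a streaming platform the same check becomes a windowed aggregate over the CDC event stream, with the alert wired to a notification channel instead of a return value.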

The application consists of a real-time streaming dashboard, with multiple drill-downs, coupled with a continuous back-end dataflow that is performing the analytics, driving the dashboard and generating alerts. Streaming data is sourced in real time from a SQL Server database using Change Data Capture (CDC), and used to drive a number of analytics pipelines.

The processing logic is all implemented using in-memory continuous queries written in our easy-to-use SQL-like language, and the entire application was built using our UI and dashboard builder. The CDC data first goes through some initial data preparation, and is then fed into parallel processing flows. Each flow analyzes the data in a different way, and stores the results of the processing in our built-in results store to facilitate deeper analysis later.

If you want to learn how to build complex monitoring and analytics applications quickly, take 6 minutes to watch this video.

Data Deluge — Striim and NonStop Manage Overwhelming Volume, Variety and Velocity of Data

The time for speculation is long gone. Predictions about the likely future challenges facing IT have become a reality, at least in one aspect. Simply put, we are facing a wall of data rushing towards us, and there are few places to hide. The pundits tell us that, wisely used, it is this data that will separate winners in business from losers. But is it possible to extol the virtue of water to a drowning man?

In Sydney’s famed Opera House, spread across a long curved wall in the vestibule at the rear of the concert hall, facing an amazing view of the harbor, is a mural called Five Bells. To the locals, however, it’s simply called the last thoughts of a drowning man, as it drew inspiration from a poem written many years before about another artist who fell into the harbor and drowned. A nearby warship evidently sounded five bells at the time the artist was drowning – hence the mural’s official name.

This past week I was driving to the ATMIA US Conference in New Orleans when the skies east of the city opened, unleashing a deluge of water unlike anything I have experienced in many years. Visibility evaporated, traffic was reduced to a crawl, and it was a stark reminder of just how great the volume of data heading towards us has become. And the land quickly became water-logged!

Take just a few industries where the increased volume of data is becoming very apparent. This past holiday season, ecommerce really took it to bricks-and-mortar retailers, and from the data deluge there’s rising water where great waves are forming that show no signs of breaking any time soon. To the contrary, the water continues to climb in height and the shoreline is nowhere to be seen. In a January 20, 2016 article in the publication Retail Dive, “Why brick-and-mortar retail faces a shakeout,” came the news that, “‘We are right now in the middle of the biggest, most profound transformation in the history of retail,’ said Robin Lewis, CEO of the Robin Report and a former executive at VF Corp. and Women’s Wear Daily.”

Furthermore, according to Retail Dive, “‘We’ve now gone to a business where your best customer can be standing in your best store and with three touches of their thumb to a piece of glass, they can buy from your biggest competitor,’ Fred Argir, Chief Digital Officer for Barnes & Noble, told Retail Dive in an interview. ‘That’s changed everything.’” To be fair, the publication also quoted Steve Barr, a partner and the US Retail and Consumer Sector leader at PricewaterhouseCoopers. “The great news is the retail store is not dead,” Barr said. “But the retail store that does not have a meaningful relationship with the consumer is dead.”

Meaningful relationships? Yes: a much better understanding of the behavior of consumers, even as we determine the trends these consumers will likely embrace. This means tapping into data as it’s being generated and picking up the information gems we need as they pass by. And one sure data source clearly is the byproduct of the transactions customers initiate, particularly should it involve your competition. I admit, I did hit the “Purchase” button on amazon.com while looking at a book in Barnes & Noble, not only because the price was a tad lower but, more importantly, because the site provided the book’s reviews and more information!

When it comes to thumbing a piece of glass, the change under way – from simply having a device from which to make a phone call to communication that is visual and global – is sapping resources from all over the digital footprint surrounding each and every smartphone. But the usage patterns that can be derived not only help mobile phone operators better tailor messages to individual subscribers, they also allow operators to accumulate metadata on an unprecedented scale, and that metadata is becoming extremely valuable not just to retailers or even bankers, but to industry as a whole.

Whether it’s the purchase of a new car or just an old pair of shoes, ensuring the right product is presented to a consumer motivated to buy it is paramount. But even so, making sense of so much data coming from each and every one of us makes the data deluge all the more difficult to process. Clearly, the former model of capture, store and process is antiquated when the challenge lies in integration with the real-time world of transaction processing.

With so much being written about the Internet of Things (IoT) and even the Internet of Everything (IoE), there is recognition that as more and more sensors come online, the data deluge is going to become even greater. However, much closer to home for all of us is the human body which, as we are beginning to hear, is one giant collection of almost infinite sensors. Imagine the plight of the medical profession, including researchers of every discipline, as steps are taken to mandate real-time processing of everything our body generates from its vast array of sensors.

The new model, involving capture, process and store, offers the only real way to make sense of it all. This is not to say that little of value remains for data scientists to exploit once data is finally stored; rather, by industry and market, data by necessity needs to be parsed, with only pertinent data ever making it to storage. For the HPE NonStop community this is of growing importance, as NonStop systems are home to many of the most important real-time transaction processing business applications on the planet.

Striim is beginning to gain a small foothold within the NonStop community, with its first deployments taking hold. And this is good news for everyone in the NonStop community as even NonStop users are finding it necessary to fight back to keep consumers engaged with their products and services. The data deluge is upon us and the waters are rising fast – bigger and bigger waves are breaking over companies old and new.

We may be in the biggest and most profound transformation in history. It’s not going away, and it’s not a time to simply redirect the waters to lakes hidden from view. To survive we need to know our customers, and we need to be sensitive to their ever-changing behavior; doing so, even as they instigate a transaction, is critical. Drowning is simply not a viable business option!
