
Shifting Data Quality Left, New O’Reilly Book, and Data Contracts with Chad Sanderson & Mark Freeman

Join us as we catch up with Chad Sanderson and Mark Freeman from Gable, live from Big Data London. Discover Chad’s insights from his well-attended talk and why the data scene in London has everyone buzzing. We’re diving deep into the concept of shifting data quality left, ensuring upstream data producers are as invested in data governance, privacy, and quality as their downstream counterparts. Chad and Mark also give us a sneak peek into their upcoming O’Reilly book on Data Contracts, complete with the charming Algerian racer lizard as its symbolic mascot.

In this engaging conversation, Chad and Mark offer practical advice for data operators ready to embark on the journey of data contracts. They emphasize the importance of starting small and nurturing a strong cultural initiative to ensure success. Listen as they share strategies on engaging leadership and fostering a collaborative environment, providing a framework not just for implementation but also for securing leadership buy-in. This episode is packed with expert advice and real-world experiences that are a must-listen for anyone in the data field.

John Kutay chimes in with examples of innovative data operators such as George Tedstone deploying Data Contracts at National Grid. Data Contracts and shifting data quality left will certainly be an area that many data teams prioritize as their workloads become increasingly operational.

Download a preview of “Data Contracts”: https://www.gable.ai/data-contracts-book
Learn more about Gable: https://www.gable.ai/
Follow Chad Sanderson on LinkedIn: https://www.linkedin.com/in/chad-sanderson/
Follow Mark Freeman on LinkedIn: https://www.linkedin.com/in/mafreeman2/ 

Joe Reis at Big Data LDN

Join us as we sit down with Joe Reis, live at Big Data LDN (London) 2024. Joe shares his partnership with DeepLearning.ai and AWS through his new course on Data Engineering. Joe’s new course promises to elevate your data skills with hands-on exercises that marry foundational knowledge with cutting-edge practices. We dive into how this course complements his seminal book, “Fundamentals of Data Engineering,” and why certification is valuable for those looking for foundational, hands-on knowledge to be a data practitioner.

But that’s not all; we also dissect the hurdles of adopting modern data architectures like data mesh in traditionally siloed companies. Using Conway’s Law as a lens, Joe discusses why businesses struggle to transition from outdated infrastructures to decentralized systems, and why cross-disciplinary skills, a concept inspired by mixed martial arts that he cleverly calls ‘Mixed Model Arts’, are crucial to that endeavor.

Check out Joe’s Work:

Fundamentals of Data Engineering book on Amazon: https://a.co/d/8yvabfO
New Coursera courses by Joe Reis:
https://www.coursera.org/instructor/j…

What’s New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What’s New In Data hosts industry practitioners to discuss the latest trends, common real-world data patterns, and analytics success stories.

Unlocking Actionable Insights: Morrisons’ Digital Transformation with Striim and Google Cloud

In the fast-paced world of retail, the ability to harness data effectively is crucial for staying ahead. On September 18, 2024, at Big Data London, Morrisons shared its digital transformation journey through the presentation, “Learn How Morrisons is Accelerating the Availability of Actionable Data at Scale with Google and Striim.”

Peter Laflin, Chief Data Officer at Morrisons, outlined the supermarket chain’s strategic partnership with Striim, a global leader in real-time data integration and streaming, and Google Cloud. This collaboration is pivotal in optimizing Morrisons’ supply chain, improving stock management, and enhancing customer satisfaction through the power of real-time data analytics.

By harnessing Striim’s advanced data platform alongside Google Cloud’s robust infrastructure, Morrisons has effectively integrated and streamlined data from its vast network of over 2,700 farmers and growers supplying raw materials to its manufacturing plants across the UK. This initiative has enabled seamless information flow and real-time visibility across its operations, allowing the supermarket to make quicker, data-driven decisions that directly impact customer experience. Tata Consultancy Services (TCS), Morrisons’ long-standing systems integration partner, has been instrumental in the success of this transformation. TCS worked closely with Morrisons’ teams to ensure the seamless implementation of Striim’s platform, facilitating smooth integration and alignment across operations.

The keynote featured insights from industry experts, including John Kutay, Head of Products at Striim, and Mike Reed, Retail Account Executive at Google, who underscored the transformative impact of innovative data strategies in the retail sector.

As Morrisons continues to embrace this data-driven approach, it sets a new standard for enhancing customer satisfaction and operational efficiency in the competitive retail environment.

Check out the Recap: 

Revolutionizing Data Queries with TextQL: Insights from Co-Founder Ethan Ding

Can AI really make your data analysis as easy as talking to a friend? Join us for an enlightening conversation with Ethan Ding, the co-founder and CEO of TextQL, as he shares his journey from Berkeley graduate to pioneering the text-to-SQL technology that’s transforming how businesses interact with their data. Discover how natural language queries are breaking down barriers, making data analysis accessible to everyone, regardless of technical skill. Ethan delves into the historical hurdles and the game-changing advancements that are pushing the boundaries of AI and large language models in data querying.

Ever wondered how the quest for full autonomy in self-driving cars relates to data querying? We draw fascinating parallels between these two cutting-edge fields, emphasizing the importance of structured systems over chaotic, AI-driven approaches. This chapter reveals the often-overlooked limitations of current data management practices and underscores the critical need for high-quality data and robust modeling. Through a comparison of traditional business intelligence tools and advanced AI-driven solutions, we explore what truly makes data querying effective and insightful.

Hear from Ethan Ding, co-founder and CEO of TextQL, as he explains how their innovative tool integrates seamlessly with existing BI infrastructures, boosting productivity without the need for disruptive overhauls. Tune in to find out how TextQL is making data-driven decisions faster and smarter, paving the way for a future where data is everyone’s best friend.

Follow Ethan Ding and TextQL at:

Small Data, Big Impact: Insights from MotherDuck’s Jacob Matson

What makes MotherDuck and DuckDB a game-changer for data analytics? Join us as we sit down with Jacob Matson, a renowned expert in SQL Server, dbt, and Excel, who recently became a developer advocate at MotherDuck.

During this episode, Jacob shares his compelling journey to MotherDuck, driven by his frequent use of DuckDB for solving data challenges. We explore the unique attributes of DuckDB, comparing it to SQLite for analytics, and uncover its architectural benefits, such as utilizing multi-core machines for parallel query execution. Jacob also sheds light on how MotherDuck is pushing the envelope with their innovative concept of multiplayer analytics.

Our discussion takes a deep dive into MotherDuck’s innovative tenancy model and how it impacts database workloads, highlighting the use of DuckDB format in Wasm for enhanced data visualization. Jacob explains how this approach offers significant compression and faster query performance, making data visualization more interactive. We also touch on the potential and limitations of replacing traditional BI tools with Mosaic, and where MotherDuck stands in the modern data stack landscape, especially for organizations that don’t require the scale of BigQuery or Snowflake. Plus, get a sneak peek into the upcoming Small Data Conference in San Francisco on September 23rd, where we’ll explore how small data solutions can address significant problems without relying on big data. Don’t miss this episode packed with insights on DuckDB and MotherDuck innovations!

Small Data SF Signup
Discount Code: MATSON100

 

Harnessing Continuous Data Streams: Unlocking the Potential of Online Machine Learning

The world is generating an astonishing amount of data every second of every day. Global data creation reached 64.2 zettabytes in 2020 and is projected to mushroom to over 180 zettabytes by 2025, according to Statista.

Modern problems require modern solutions — which is why businesses across industries are moving away from batch processing and towards real-time data streams, or streaming data. Moreover, the concept of ‘online machine learning’ has emerged as a potential solution for organizations working with data that arrives in a continuous stream or when the dataset is too large to fit into memory.

Today, we’ll walk you through the close connection between successful machine learning and streaming data. You’ll learn about potential applications and why online machine learning is worth adopting.

What is Online Machine Learning? 

Online machine learning is an approach that feeds data to a machine learning model incrementally, making it a natural fit for continuous streams. Instead of being trained on a complete data set all at once, the model receives data points one at a time or in small batches. This method is especially helpful in scenarios where data is generated continuously, as it enables the model to learn and adapt in real time.
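
To make this concrete, here is a minimal sketch of incremental training using scikit-learn’s partial_fit interface. The mini-batch generator and labeling rule below are synthetic stand-ins for a real stream, not a reference to any particular product.

```python
# A minimal sketch of online learning with scikit-learn's partial_fit.
# The mini-batch generator and labeling rule are synthetic stand-ins for a real stream.
import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()      # an incremental, gradient-based learner
classes = np.array([0, 1])   # every possible label must be declared for partial_fit

def mini_batches(n_batches=100, batch_size=32, n_features=10):
    """Simulate a continuous stream by yielding small batches of labeled events."""
    rng = np.random.default_rng(42)
    for _ in range(n_batches):
        X = rng.normal(size=(batch_size, n_features))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # stand-in labeling rule
        yield X, y

for X_batch, y_batch in mini_batches():
    # Each call updates the model with only the new batch; nothing is retrained from scratch.
    model.partial_fit(X_batch, y_batch, classes=classes)
```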

Applying machine learning to streaming data can help organizations with a wide range of applications. These include fraud detection from real-time financial transactions, real-time operations management (e.g., stock monitoring in the supply chain), or sentiment analysis over live social media trends on Facebook, Twitter, etc. 

“Online ML is the only way forward as old ways of using schedules to run batches do not fit with the growing data volumes and real time expectations,” shares Dmitriy Rudakov, Director of Solution Architecture at Striim. 

Simson Chow, Sr. Cloud Solutions Architect, adds, “Online machine learning allows models to continuously learn from new data and adapt in real-time. This will allow models to rapidly adjust to changing environments and produce accurate, up-to-date predictions. This dynamic approach is crucial in a constantly changing environment, where static models can quickly become outdated and ineffective.” 

What are Potential Use Cases for Online Machine Learning? 

Some instances where online machine learning is particularly impactful include: 

  • When your data has no end and is effectively continuous
  • When your training data is sensitive due to privacy issues, and you are unable to move it to an offline environment
  • When you can’t transfer training data to an offline environment due to device or network limitations
  • When the size of training datasets is too large, making it impossible to fit into the memory of a single machine at a specific time

Online vs Offline Machine Learning: Why Offline Machine Learning Is Not Ideal for Streaming Data

Traditional batch processing methods fall short when it comes to using streaming data for machine learning.

These methods, usually referred to as offline or batch learning, handle static datasets by processing them all at once. However, they’re not equipped to deal with the continuous flow of data in real time. As a result, this approach is not only resource-intensive but also time-consuming, making it unsuitable for dynamic environments where timely updates are crucial. Let’s dive deeper.

Online vs Offline Machine Learning: Offline Learning Limitations 

Offline learning systems are limited by their inability to learn incrementally. Each time new data becomes available, the entire model must be retrained from scratch, incorporating both the old and new data into a single dataset. 

“Because traditional batch processing relies on frequently updating models with massive batches of data, it can result in redundant predictions and inadequate responses to new patterns, changes in the data, and more costs as a result of the model’s retraining and re-deployment, requiring significant infrastructure and compute resources,” says Chow. “This makes it unsuitable for various machine learning use cases. Because of this latency, it is not appropriate for real-time applications like online personalization, fraud detection, or autonomous systems where quick decisions are necessary.” 

This process consumes significant computational resources and can result in prolonged downtime as the model is retrained, re-evaluated, and redeployed. While automated tools can streamline this process, the delay in retraining limits the model’s responsiveness, particularly in time-sensitive applications such as financial forecasting.

“There are 2 main reasons traditional batch systems don’t work for customers anymore,” says Dmitriy Rudakov. “The first one is the growing need to act in real time. For example, can you imagine using Uber without a fast real-time response today?” He adds that, while data administrators have traditionally tried to schedule this process at night so it doesn’t interfere with daily operations, “Growing volumes of data [means] batch based training just doesn’t fit the time windows provided.”

Online vs Offline Machine Learning: Online Learning Advantages

On the contrary, online machine learning can handle streaming data by feeding the model data incrementally. This approach allows the model to update itself in real time as new data arrives, making it highly adaptable to changes and reducing the latency associated with batch learning. For example, in stock price forecasting, where real-time data is crucial, an online learning model can continuously refine its predictions without the need for complete retraining, ensuring that forecasts are always based on the most current information.
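
As a rough illustration of that forecasting scenario, the sketch below refines a linear model one observation at a time, predicting first and then folding the new data point into the model. The series, window size, and model choice are synthetic assumptions, not a recommended forecasting setup.

```python
# Sketch: refining a forecaster incrementally as each observation arrives,
# instead of retraining on the full history. The series and window are synthetic.
import numpy as np
from sklearn.linear_model import SGDRegressor

model = SGDRegressor()
rng = np.random.default_rng(0)

changes = rng.normal(0, 1, size=5_000)  # stand-in series of daily price changes
window = 5                              # use the last 5 changes to predict the next one

for t in range(window, len(changes)):
    X_new = changes[t - window:t].reshape(1, -1)
    y_new = changes[t:t + 1]
    if t > window:
        forecast = model.predict(X_new)  # forecast before the true value is known
    model.partial_fit(X_new, y_new)      # then fold the new observation into the model
```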

 

How Does Online Machine Learning Work? 

Now that you know why online machine learning is the better option, here’s how it works from a technical perspective — and how stream processing plays a role. 

Think of stream processing as the backbone that enables online machine learning to function effectively. It provides the infrastructure to ingest, process, and manage continuous data flows in real-time. This is where Striim comes into play, offering a robust platform designed to handle the complexities of stream processing and real-time data integration.

Striim also captures and processes real-time data from various sources, such as databases, IoT devices, and cloud environments. By leveraging the platform, organizations can seamlessly feed this real-time data into their online machine learning models, allowing them to learn and adapt continuously. Striim’s low-latency data streaming ensures that the online learning models are always working with the most current data, enabling timely and accurate decision-making.
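
One common pattern, sketched below under the assumption that the stream is delivered through a Kafka topic, is to consume events as they arrive, accumulate small mini-batches, and update the model incrementally. The broker address, topic name, and event schema are illustrative placeholders, not a prescribed Striim configuration.

```python
# Sketch: consuming a real-time feed (here, a Kafka topic that an integration
# platform could populate) and updating an online model in mini-batches.
# The broker address, topic name, and event fields are illustrative assumptions.
import json
import numpy as np
from kafka import KafkaConsumer
from sklearn.linear_model import SGDClassifier

consumer = KafkaConsumer(
    "transactions",                          # hypothetical topic name
    bootstrap_servers="localhost:9092",      # hypothetical broker address
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

model = SGDClassifier()
classes = np.array([0, 1])
batch_features, batch_labels = [], []

for message in consumer:
    event = message.value
    batch_features.append(event["features"])   # assumed event schema
    batch_labels.append(event["label"])
    if len(batch_features) >= 64:              # small mini-batches keep memory usage flat
        model.partial_fit(np.array(batch_features), np.array(batch_labels), classes=classes)
        batch_features, batch_labels = [], []
```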

How Online Machine Learning Can Make a Difference

Online machine learning is an approach in which training occurs incrementally, feeding the model data continuously as it arrives from the source. The data from real-time streams is broken down into mini-batches and then fed to the model. Here’s how it can make a difference.

 

Save Computing Resources 

Online learning is accessible even with limited computing resources. If you have minimal compute and little space to store streaming data, you can still leverage it successfully.

Once an online learning system has learned from a data stream, it can discard the data or move it to a storage medium, saving your business a significant amount of money and space. Online machine learning doesn’t require powerful, high-end hardware to process streaming data, because only one mini-batch is processed in memory at a time, unlike offline machine learning, where everything has to be processed at once. As a result, you can even use an affordable device like a Raspberry Pi to perform online machine learning.

“ML can be applied with data streaming systems in two ways,” shares Dmitriy Rudakov. “Model inference, i.e., calling the model in real time, can be done via different CDC techniques. This process does not require a lot of computing resources as the model is already trained, and the real-time app is just accessing it to generate some useful insights. Incidentally, if there is a change of properties in time (drift), the real-time system can make calls to calculate model accuracy scores and initiate retraining via automation. 

Alternatively, training models can be done via the initial load phase, where, for a short period, the system can read and process all relevant data or subsets of data to train the model of choice. Training can also be done in real-time by sending event batches broken into chunks, according to use case needs, to the training modules, which will save computing resources and ensure freshness of models, thus addressing the drift problem.” 

Prevent Concept Drift

Online machine learning can also address concept drift — a known problem in machine learning. In machine learning, a ‘concept’ refers to a variable or a quantity that a machine learning model is trying to predict.

The term ‘concept drift’ refers to the phenomenon in which the target concept’s statistical properties change over time. This can be a sudden change in variance, mean, or any other characteristics of data. In online machine learning, the model computes one mini-batch of data at a time and can be updated on the fly. This can help to prevent concept drift as new streams of data are continuously used to update the model.
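
One lightweight way to notice drift in practice is to compare the model’s recent prediction errors against a longer-run baseline and flag when they diverge. The sketch below is a rough heuristic with illustrative window sizes and threshold, not a formal drift-detection algorithm.

```python
# Sketch: flag possible concept drift when the recent average prediction error
# grows much larger than the long-run baseline. Windows and threshold are illustrative.
from collections import deque
import numpy as np

recent_errors = deque(maxlen=100)     # short window: how the model is doing right now
baseline_errors = deque(maxlen=1000)  # long window: what "normal" error has looked like

def check_for_drift(error, ratio_threshold=2.0):
    """Record a new absolute prediction error and return True if drift is suspected."""
    recent_errors.append(error)
    baseline_errors.append(error)
    if len(recent_errors) == recent_errors.maxlen:
        if np.mean(recent_errors) > ratio_threshold * np.mean(baseline_errors):
            return True  # the stream's statistical properties appear to have shifted
    return False
```

When the check fires, the pipeline can trigger retraining or weight recent data more heavily.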

Learning from large amounts of data streams can help with applications that deal with forecasting, spam filtering, and recommender systems. For example, if a user buys multiple products (e.g., a winter coat and gloves) within a space of minutes on an e-commerce website, an online machine learning model can use this real-time information to recommend products that can complement their purchase (e.g., a scarf). 

Online learning is closely connected to another concept called operationalizing machine learning, as both involve the continuous updating and adaptation of models with real-time data. Online learning enables models to refine their predictions on-the-fly, which is essential for maintaining accuracy in live environments. With this connection in mind, let’s explore how Striim supports these processes to enhance decision-making and operational efficiency.

Operationalizing Machine Learning with Striim

Operationalizing machine learning involves integrating models into live environments to leverage real-time data for continuous predictions and decision-making. This approach tackles challenges like handling high volumes of data, managing the speed at which data is generated and collected, and addressing the variety of data formats. For businesses, operationalizing machine learning translates into real-time insights, agility, improved accuracy, and enhanced operational efficiency.

Striim is an ideal platform for this task, offering comprehensive data movement capabilities crucial for digital transformation. It ingests and processes streaming data in real time, performing essential transformations, filtering, and enrichment before the data is fed into online learning models. “The only way to keep the model fresh is leveraging data provided in real time,” shares Dmitriy Rudakov. By continuously feeding these models with fresh data, Striim ensures they can adapt in real time, keeping predictions and decisions accurate as conditions change.

The connection between operationalizing machine learning and online machine learning is crucial. Online machine learning, which incrementally updates models with new data, ensures continuous learning and adaptation—exactly what’s needed for operationalizing machine learning in dynamic, real-world environments.

To address the challenges of data variety and ensure models stay current, Striim can help you with:

  • Event-driven data capture and processing to train models incrementally.
  • Capturing schema changes from source systems and managing data drift.
  • Handling large volumes of streaming data from multiple sources.
  • Performing filtering, enriching, and data preparation on streaming data.
  • Providing data-driven insights and predictions by integrating trained models with real-time data streams.
  • Tracking data evolution and assessing model performance, enabling automatic retraining with minimal human intervention.

With these capabilities, Striim provides a robust foundation for operationalizing machine learning, supporting continuous, real-time learning and adaptation. Learn more in our guide to operationalizing machine learning.

Leverage Striim for Online Machine Learning Use Cases

By combining the strengths of Striim’s real-time data integration with online machine learning, your organization can effectively tackle the challenges of modern data environments. Striim’s platform not only supports seamless data streaming but also enhances the accuracy and relevance of your machine learning models by providing continuous, up-to-date insights. Whether you need to adapt to shifting data patterns or optimize resource usage, Striim equips you with the tools to maintain a competitive edge. Get a demo today to learn how Striim can empower your online machine learning initiatives and drive smarter, faster decisions.

The Future of AI is Real-Time Data

To the data scientists pushing the boundaries of what’s possible, the AI experts and enthusiasts who see beyond the horizon, and the techies building tomorrow’s solutions today: this manifesto is for you. The key to unlocking AI’s full potential lies in real-time data. Traditional methods no longer suffice in a world that demands instant insights and immediate action.

Real-Time AI as the New Competitive Battleground

AI and ML are more than just buzzwords; they are driving substantial economic growth, creating new job opportunities, and shaping the future. The AI market is projected to reach a staggering $1,339 billion by 2030. This exponential growth underscores the widespread adoption and integration of AI across various industries. Furthermore, AI is on track to boost the US GDP by 21% by 2030. This highlights the profound economic impact AI will have. By automating routine tasks, optimizing operations, and providing deep insights through data analysis, AI enables businesses to increase productivity while reducing costs. And contrary to common fears that AI will eliminate jobs, it is expected to create 20-50 million positions by 2030. These roles will span various sectors, including data science, AI ethics, machine learning engineering, and AI-related research and development.

Real-Time Data — The Missing Link

What is Real-Time Data?

In the realm of data processing, real-time data refers to information that is delivered and processed almost instantaneously as it is generated. Unlike batch processing, which involves collecting and processing data in bulk at scheduled intervals, real-time data ensures immediate availability and actionability. This immediacy allows for decisions and responses to be made in the moment, offering a dynamic edge over traditional methods.

The Death of Traditional Batch Processing

The shift from batch processing to real-time data marks a crucial technological evolution driven by the need for speed and efficiency. Batch processing resulted in significant delays between data generation and actionable insights. As the demand for faster decision-making grew, the limitations of traditional batch processing became glaringly apparent. Traditional methods introduced latency, making it impossible to act on data immediately, a critical issue in environments requiring timely decisions.

Furthermore, batch processing systems were rigid and inflexible, struggling to scale as data volumes grew and requiring substantial reengineering to adapt to new data types or sources. The advent of real-time data processing revolutionized this paradigm, providing the means to analyze and act on data as it flows, minimizing latency to sub-second levels and offering unparalleled scalability and adaptability to modern data streams. This transformation enables real-time decision-making and fosters innovation across industries, cementing real-time data as the cornerstone of AI algorithms and advancements.

Dispelling Misconceptions and Demonstrating Value

In the world of AI and ML, there are a few common objections to the adoption of real-time data processing. Let’s dive into these misconceptions and demonstrate the true value of real-time capabilities.

Misconception: Batch Processing Suffices

Objection: Many AI/ML tasks can be handled with batch processing. Models trained on historical data can make predictions without needing real-time updates. The necessity of real-time data is highly specific to certain use cases, and not all industries or applications benefit equally.

Reality Check: While batch processing works for some tasks, it falls short in dynamic environments requiring high responsiveness and timely decision-making. Real-time data integration allows models to process the most recent data points, reducing lag between data generation and actionable insights. This is crucial in fields like finance, where market conditions shift rapidly, or e-commerce, where user behavior and inventory status constantly change. For example, fraud detection models relying on batch data might miss real-time anomalies, whereas real-time data can detect and respond to fraud within milliseconds. In healthcare, real-time patient monitoring can provide immediate insights for timely interventions, improving patient outcomes. The notion that real-time data is only useful in specific cases is outdated as countless industries increasingly leverage real-time capabilities to stay competitive and responsive.
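
As a toy illustration of per-event scoring, the sketch below evaluates each transaction the moment it arrives using an anomaly detector assumed to have been trained earlier; the features, training data, and decision rule are illustrative, not a production fraud model.

```python
# Sketch: scoring each transaction as it arrives so anomalies can be flagged
# immediately. The detector, features, and training data are illustrative.
import numpy as np
from sklearn.ensemble import IsolationForest

# Assume the detector was fit earlier on historical transaction features.
history = np.random.default_rng(0).normal(size=(1_000, 3))
detector = IsolationForest(random_state=0).fit(history)

def looks_fraudulent(event: dict) -> bool:
    """Return True if the transaction should be held for review."""
    features = np.array([[event["amount"], event["hour_of_day"], event["distance_from_home"]]])
    return detector.predict(features)[0] == -1  # IsolationForest marks outliers as -1

# Evaluate a single incoming event the moment it arrives.
suspicious = looks_fraudulent({"amount": 9800.0, "hour_of_day": 3, "distance_from_home": 1200.0})
```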

Misconception: Complexity and Cost

Objection: Implementing real-time data systems is complex and costly. The infrastructure required for real-time data ingestion, processing, and analysis can be significantly more expensive than batch processing systems.

Reality Check: While real-time systems require an investment, the ROI is substantial. Modern cloud-based architectures and scalable platforms like Striim and Apache Kafka have reduced the complexity and cost of real-time data processing. Real-time systems drive higher revenues and better customer experiences by enabling immediate responses to emerging trends and anomalies. For instance, real-time inventory management in retail can prevent stockouts and overstock, directly impacting sales and customer satisfaction. The initial investment in real-time capabilities is outweighed by the long-term gains in efficiency, responsiveness, and competitive advantage.

Misconception: Data Quality and Stability

Objection: Real-time data can be noisy and unstable, leading to potential inaccuracies in model predictions. Batch processing allows for more thorough data cleaning and preprocessing.

Reality Check: Real-time data does not mean compromising on quality. Advanced real-time analytics platforms incorporate robust data cleaning and anomaly detection, ensuring models receive high-quality, stable inputs. Tools like Apache Beam and Spark Streaming provide mechanisms for real-time data validation and cleansing. Real-time data pipelines can also integrate seamlessly with existing ETL processes to maintain data integrity. By leveraging these technologies, organizations can ensure that their real-time data is as reliable and accurate as batch-processed data, while gaining the added advantage of immediacy.
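
As a minimal example of the kind of per-event validation and cleansing such pipelines apply, the sketch below checks and normalizes records before they reach a model; the field names and rules are illustrative, and a real pipeline would typically express equivalent checks in its stream-processing layer.

```python
# Sketch: lightweight per-event validation and cleansing applied before records
# reach a model. Field names, ranges, and rules are illustrative.
from typing import Optional

def clean_event(event: dict) -> Optional[dict]:
    """Return a cleansed copy of the event, or None if it should be dropped."""
    if "sensor_id" not in event or "reading" not in event:
        return None                       # drop events missing required fields
    try:
        reading = float(event["reading"])
    except (TypeError, ValueError):
        return None                       # drop non-numeric readings
    # Clamp obviously out-of-range values instead of letting them skew the model.
    reading = min(max(reading, -50.0), 150.0)
    return {"sensor_id": str(event["sensor_id"]), "reading": reading}

raw_events = [{"sensor_id": 7, "reading": "201.4"}, {"reading": None}]
cleaned = [c for c in map(clean_event, raw_events) if c is not None]
```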

Misconception: Model Retraining Frequency

Objection: Many models do not need to be retrained frequently. The insights gained from real-time data might not justify the cost and effort of constant retraining.

Reality Check: The pace of change in today’s world demands models that can adapt quickly. Real-time data enables continuous learning and incremental updates, ensuring models remain relevant and accurate. Techniques like online learning and incremental model updates allow models to evolve without the need for complete retraining. For example, recommendation systems can benefit from real-time user behavior data, continuously refining their suggestions to enhance user engagement. By integrating real-time data, organizations can maintain high model performance and accuracy, adapting swiftly to new patterns and trends.

Industry Disruption through Real-Time AI

Real-time AI is redefining how businesses operate by providing up-to-the-second information that enhances predictive accuracy, supports continuous learning, and automates complex decision-making processes. This integration allows AI to adapt instantly to new data, which is essential for applications where split-second decision-making is critical, including fraud detection, autonomous vehicles, and financial trading. It also powers real-time anomaly detection in cybersecurity and manufacturing, identifying threats and malfunctions as they occur. Additionally, real-time data empowers personalized customer experiences by analyzing interactions on the fly, delivering tailored recommendations and services. The scalability and adaptability of real-time data platforms ensure AI systems are always equipped with the most current information, driving innovation and efficiency across industries.

Real-Time AI & ML in the Real World

Predictive Maintenance in Manufacturing

ML algorithms, often powered by sensors and IoT devices, continuously monitor equipment health. By analyzing historical data and real-time sensor readings, predictive maintenance anticipates failures, minimizes downtime, and optimizes productivity, enabling proactive scheduling and preventing disruptions in production.

Customer Churn Prediction in Telecom

ML models may consider factors such as customer demographics, usage patterns, customer service interactions, and billing history. By identifying customers at risk of churn, telecom companies can implement targeted retention strategies, such as personalized offers or improved customer support.

Fraud Detection in Finance

ML algorithms learn from historical data to identify patterns associated with fraudulent transactions. Real-time monitoring allows financial institutions to detect anomalies and trigger immediate alerts or interventions. This proactive approach helps prevent financial losses due to fraudulent activities.

Personalized Marketing in E-commerce

ML algorithms analyze not only purchase history but also browsing behavior and preferences. This enables e-commerce platforms to deliver personalized product recommendations through targeted advertisements, email campaigns, and website interfaces, enhancing the overall shopping experience.

Healthcare Diagnostics and Predictions

ML models, particularly in medical imaging, can assist healthcare providers by identifying subtle patterns indicative of diseases. Predictive analytics also help healthcare providers anticipate patient health deterioration, enabling early interventions and personalized treatment plans.

Dynamic Pricing in Retail

ML algorithms consider a multitude of factors, including competitor pricing, inventory levels, historical sales data, and customer behavior. By dynamically adjusting prices in real time, retailers can optimize revenue, respond to market changes, and maximize profitability.

Supply Chain Optimization

ML-driven demand forecasting considers historical data, seasonality, and external factors like economic trends and geopolitical events. This enables accurate inventory management, reduces excess stock, and ensures timely deliveries, ultimately improving the overall efficiency of the supply chain.

Human Resources and Talent Management

ML tools assist in resume screening by identifying relevant skills and qualifications. Predictive analytics can assess employee satisfaction, helping organizations identify areas for improvement and implement strategies to enhance employee retention and engagement.

UPS Success Story: Where Real-Time Data Supercharged Real-Time AI


Safeguarding shipments with AI and real-time data

UPS Capital® is leveraging Google’s Data Cloud and AI technologies to safeguard packages from porch piracy. With more than 300 million American consumers turning to online shopping, UPS Capital has witnessed the significant challenges customers face in securing their package delivery ecosystem. Now, the company is leveraging its digital capabilities and access to data to help customers rethink traditional approaches to combat shipping loss and deliver better customer experiences.

https://youtu.be/shreurvc28U?si=2rVZTIO0YWnMR2W-

DeliveryDefense™ Address Confidence utilizes real-time data and machine learning algorithms to safeguard packages. By assigning a confidence score to potential delivery locations, it enhances the assessment of successful delivery probabilities while mitigating loss or theft risks. Every address is allocated a confidence score on a scale from 100 to 1000, with 1000 indicating the highest probability of delivery success. These scores are based on customer reports of package theft. Shippers can integrate this score into their shipping workflow through an API to take proactive, preventative actions on low-confidence addresses. For instance, if a package is destined for an address with a low confidence score, the merchant can proactively reroute the shipment to a secure UPS Access Point location. These locations typically have a confidence score of around 950 due to their high chain of custody security precautions.
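
In a shipper’s own systems, that workflow integration might look something like the sketch below, which requests a score for an address and reroutes low-confidence shipments to a pickup location. The endpoint URL, request shape, and response fields are hypothetical placeholders; the actual DeliveryDefense API will differ, so treat this only as an outline of the decision logic.

```python
# Sketch of acting on an address confidence score inside a shipping workflow.
# The endpoint URL, payload, and response fields are hypothetical placeholders.
import requests

SCORE_THRESHOLD = 500  # illustrative cutoff on the 100-1000 scale described above

def choose_destination(address: dict) -> dict:
    response = requests.post(
        "https://api.example.com/address-confidence/score",  # placeholder URL
        json={"address": address},
        timeout=5,
    )
    score = response.json()["confidence_score"]               # assumed response field
    if score < SCORE_THRESHOLD:
        # Low confidence: reroute to a secure pickup location instead of the doorstep.
        return {"type": "ups_access_point", "original_address": address}
    return {"type": "direct_delivery", "address": address}
```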

Striim’s real-time data integration platform works in tandem with Google Cloud’s modern architecture by dynamically embedding vectors into streaming information, enhancing data representation, processing efficiency, and analytical accuracy. Striim also integrates structured and unstructured data pulled from diverse sources and applies a variety of AI models from OpenAI and Vertex AI to generate embeddings that establish similarity scores between data points to reveal possible relationships.
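
A simplified version of that embedding-and-similarity step, assuming the OpenAI Python SDK (openai >= 1.0) and an API key in the environment, might look like the following; the model name and example strings are illustrative, and this is not a description of Striim’s internal implementation.

```python
# Sketch: generate embeddings for two records and compute a similarity score.
# Assumes the OpenAI Python SDK and OPENAI_API_KEY set in the environment;
# the embedding model name is one example, not a requirement.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    result = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(result.data[0].embedding)

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

similarity = cosine_similarity(
    embed("Package delivered to front porch, signature not required"),
    embed("Left at doorstep without a signature"),
)
```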

UPS Capital’s solution delivers significant operational rewards, evidenced by more than 280,000 claims paid annually. With $236 billion in declared value and 690,000 shippers protected, its solutions offer robust protection for shippers, ensuring peace of mind and financial security in every shipment.

The Future of AI is Now — And It’s Real-Time

Real-time data and AI are significantly improving existing processes and impacting the bottom line across industries. From retail and finance to healthcare and beyond, the integration of real-time data is driving greater efficiency, more personalized customer experiences, and continuous innovation. This shift is creating new opportunities and setting higher standards.

Businesses are encouraged to embrace real-time data and AI to stay competitive in the future. By adopting these technologies, companies can fully leverage AI, stay ahead of the competition, and navigate the evolving technological landscape. The future of AI is real-time, and the time to act is now.

Sovereign AI, Redpanda vs Apache Kafka, The Future of Data Streaming with Alex Gallego (CEO of Redpanda)

Prepare to transform your understanding of data and cloud architecture with visionary CEO Alex Gallego of Redpanda. Discover how Alex’s journey from building racing motorcycles and tattoo machines as a child led him to revolutionize stream processing and cloud infrastructure. This episode promises invaluable insights into the shift from batch to real-time data processing, and the practical applications across multiple industries that make this transition not just beneficial but necessary.

Explore the intricate challenges and groundbreaking innovations in data storage and streaming. From Kafka’s distributed logs to the pioneering Redpanda, Alex shares the operational advantages of streaming over traditional batch processing. Learn about the core concepts of stream processing through real-world examples, such as fraud detection and real-time reward systems, and see how Redpanda is simplifying these complex distributed systems to make real-time data processing more accessible and efficient for engineers everywhere.

Finally, we delve into emerging trends that are reshaping the landscape of data infrastructure. Examine how lightweight, embedded databases are revolutionizing edge computing environments and the growing emphasis on data sovereignty and “Bring Your Own Cloud” solutions. Get a glimpse into the future of data ownership and AI, where local inferencing and traceability of AI models are becoming paramount. Join us for this compelling conversation that not only highlights the evolution from Kafka to Redpanda but paints a visionary picture of the future of real-time systems and data architecture.

What’s New In Data is a data thought leadership series hosted by John Kutay, who leads data and products at Striim. What’s New In Data hosts industry practitioners to discuss the latest trends, common real-world data patterns, and analytics success stories.
