In an era where businesses must adapt to fast-paced environments, real-time automated decision-making has become a cornerstone of operational efficiency and competitive advantage. This capability involves a seamless blend of decision logic, cutting-edge infrastructure, robust data strategies, and precise application handling. Organizations striving for agility and responsiveness must master the art of designing systems that enable instant, data-driven decisions while ensuring consistency, accuracy, and scalability.
This article explores the critical elements required to architect systems for real-time automated decisions. From documenting and codifying decision logic to building resilient infrastructures and optimizing data for real-time use, each section highlights practical approaches and frameworks. Key areas such as real-time infrastructure, data modeling, application handling, and performance measurement are covered to provide a comprehensive roadmap for implementing these systems.
The article also delves into the challenges inherent in real-time decisioning, such as data consistency, query loads, and latency, while offering strategies to overcome them. By addressing these complexities, businesses can unlock the potential of automated decision-making and ensure that their systems remain adaptable, reliable, and high-performing.
To effectively manage decision-making within an organization, it is essential to thoroughly document the set of decisions that need to be made. This documentation serves as a foundation for a structured approach, ensuring that decisions are consistent, well-informed, and aligned with the organization’s goals. One of the frameworks that can be employed for this purpose is the Decisioning Canvas.
The Decisioning Canvas is a versatile tool that helps organizations map out the decision-making process by considering all relevant inputs and outputs. This framework ensures that every aspect of the decision-making process is carefully considered, from the information needed to make a decision to the actions that result from it.
The canvas is typically composed of several key elements. These include decision tables, which outline the conditions and rules that guide each decision; the specific rules that must be applied during the decision-making process; and any additional processing steps that might be required to support or refine the decision.
Additionally, the canvas incorporates a knowledge base, such as a list of available products or offers, which provides the necessary context and information for making informed decisions. Finally, the canvas also defines the actions to be taken as a result of the decision, such as triggering a different workflow, sending an SMS, or making an offer. By integrating these components, the Decisioning Canvas offers a comprehensive approach to structuring and managing the decision-making process within an organization.
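To make the canvas tangible, the sketch below shows how its elements might be captured for a single decision. The offers, rules, and actions are purely illustrative placeholders, not part of any standard canvas template.

```python
# A minimal, illustrative representation of Decisioning Canvas elements.
# The specific offers, rules, and actions are hypothetical placeholders.
decisioning_canvas = {
    "decision": "Select retention offer for a customer at risk of churn",
    "inputs": ["churn_score", "customer_tier", "months_since_last_offer"],
    "decision_tables": [
        {
            "name": "retention_offer",
            "rules": [
                {"if": "churn_score >= 0.8 and customer_tier == 'gold'", "then": "offer_premium_discount"},
                {"if": "churn_score >= 0.8", "then": "offer_standard_discount"},
                {"if": "churn_score < 0.8", "then": "no_offer"},
            ],
        }
    ],
    "knowledge_base": {  # available products/offers providing decision context
        "offer_premium_discount": {"discount_pct": 20, "channel": "sms"},
        "offer_standard_discount": {"discount_pct": 10, "channel": "email"},
    },
    "actions": ["trigger_retention_workflow", "send_sms", "make_offer"],
}
```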
To ensure effective implementation, the rules that govern decision-making processes should be systematically codified. This codification not only brings consistency and transparency but also facilitates automation and scalability. By clearly defining and documenting decision rules, organizations can ensure that their processes are repeatable and can be easily understood and maintained by both business and technical teams.
Standards for codifying decision rules, such as Decision Model and Notation (DMN), provide a structured and visual approach to this task. DMN allows organizations to represent decision logic in a way that is both accessible and precise. This standard helps bridge the gap between business needs and technical execution, ensuring that decision-making processes are both robust and adaptable.
DMN can be applied to capture the current state of decision-making (“as-is”), providing a clear overview of existing business rules and logic. It can also be used to model the desired state after automation, allowing organizations to visualize and plan how decision processes will evolve. By leveraging DMN, businesses can effectively transition from manual or inconsistent decision-making to streamlined, automated processes.
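As an illustration of what codified decision logic can look like, the following sketch evaluates a DMN-style decision table with a "first" hit policy. The rules and thresholds are hypothetical, and in practice a DMN-compliant decision engine would typically execute such tables rather than hand-written code.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Rule:
    condition: Callable[[dict], bool]  # predicate over the decision inputs
    output: str                        # result returned when the rule matches

def evaluate_first_hit(rules: list[Rule], inputs: dict) -> Optional[str]:
    """Evaluate a decision table using a DMN-style FIRST hit policy:
    rules are checked in order and the first match wins."""
    for rule in rules:
        if rule.condition(inputs):
            return rule.output
    return None  # no rule matched; callers decide how to handle this

# Hypothetical credit-limit decision expressed as ordered rules.
credit_limit_rules = [
    Rule(lambda i: i["risk_score"] > 700 and i["income"] >= 50_000, "approve_high_limit"),
    Rule(lambda i: i["risk_score"] > 600, "approve_standard_limit"),
    Rule(lambda i: True, "refer_to_manual_review"),
]

decision = evaluate_first_hit(credit_limit_rules, {"risk_score": 650, "income": 40_000})
print(decision)  # approve_standard_limit
```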
To enable real-time decision-making, a robust and highly available architecture is required. At the core of this architecture is the ability to process data with minimal latency, making it imperative to choose the right data processing framework. Two of the most common architectural patterns that support real-time processing are the Lambda architecture, which pairs a batch layer with a low-latency speed layer, and the Kappa architecture, which treats all data as a stream and processes it through a single pipeline. Both are designed to handle continuous streams of data, ensuring that decisions can be made in real time.
The architecture must also include components that facilitate efficient message transfer. Depending on the specific requirements, this might involve a combination of streaming, reactive events, and transactional message processing. These components work together to ensure that data flows seamlessly through the system, allowing for timely and accurate decision-making.
In many cases, a batch layer is incorporated into the architecture to handle data reconciliation or to process data with greater accuracy. This layer complements the real-time processing capabilities, ensuring that the system can also handle larger volumes of data or perform more complex analyses that are not time-sensitive.
At the heart of the architecture is the processing engine, which could be powered by technologies such as ksqlDB, Apache Spark, Apache Flink, or custom Kubernetes applications. These engines are responsible for executing the real-time data processing tasks that drive decision-making.
Supporting the processing engine are decision engines and a robust data infrastructure, including databases, in-memory grids, and caching layers, which are crucial for storing and retrieving data at high speeds. Additionally, a machine learning (ML) serving layer, often coupled with feature stores, is integrated into the architecture to enable predictive analytics and AI-driven decision-making. This comprehensive setup ensures that the system can support a wide range of real-time decisioning use cases, from simple rule-based decisions to complex machine learning models.
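The sketch below illustrates how these pieces might fit together in a minimal decision loop: an event is consumed from a stream, enriched with features, passed through decision logic, and the resulting action is published downstream. It uses the kafka-python client; the topic names, feature lookup, and decision function are placeholder assumptions.

```python
import json
from kafka import KafkaConsumer, KafkaProducer  # kafka-python client

consumer = KafkaConsumer(
    "customer-events",                       # assumed input topic
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda m: json.loads(m.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def lookup_features(customer_id):
    """Placeholder for a feature-store or cache lookup."""
    return {"churn_score": 0.82, "customer_tier": "gold"}

def decide(event, features):
    """Placeholder decision logic: rules and/or an ML model would sit here."""
    if features["churn_score"] >= 0.8:
        return {"customer_id": event["customer_id"], "action": "make_retention_offer"}
    return {"customer_id": event["customer_id"], "action": "no_action"}

for message in consumer:
    event = message.value
    features = lookup_features(event["customer_id"])
    decision = decide(event, features)
    producer.send("decisions", decision)     # assumed output topic for downstream actions
```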
When designing a system for data-driven decision-making, careful consideration must be given to information retrieval processes. It is important to avoid costly queries that could strain production systems and introduce additional latencies. To mitigate these issues, mechanisms like caching should be implemented where appropriate. Caching can significantly reduce the load on production systems by storing frequently accessed data, thus minimizing the need for repetitive, resource-intensive queries.
Additionally, the impact of latency on data integration and serving must be carefully considered. Ensuring that data can be retrieved and processed quickly is crucial for maintaining the efficiency and responsiveness of the decision-making system. By addressing these factors, organizations can optimize information retrieval, reduce system strain, and ensure that decisions are made with minimal delay.
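As a minimal illustration, the sketch below implements a read-through cache with a time-to-live in front of a hypothetical fetch_customer_profile query, so repeated requests within the TTL never touch the production system.

```python
import time

_CACHE: dict = {}
TTL_SECONDS = 60  # how long a cached entry is considered fresh

def fetch_customer_profile(customer_id):
    """Placeholder for an expensive query against a production system."""
    ...

def get_customer_profile(customer_id):
    """Read-through cache: serve from memory when fresh, otherwise query once and cache."""
    entry = _CACHE.get(customer_id)
    if entry and time.monotonic() - entry["cached_at"] < TTL_SECONDS:
        return entry["value"]
    value = fetch_customer_profile(customer_id)
    _CACHE[customer_id] = {"value": value, "cached_at": time.monotonic()}
    return value
```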
A tangled and complex integration setup, often referred to as a “spaghetti mess,” can significantly increase the complexity and latency of information retrieval within a system. This disorganized approach can make it both costly and inefficient to retrieve specific pieces of information, while also raising the likelihood of failures during information retrieval.
To address these challenges, implementing a decisioning data layer that abstracts the underlying information and integrations is essential. This layer serves as an intermediary, streamlining access to data and simplifying the retrieval process. By providing a clear, organized structure for accessing information, a decisioning data layer not only reduces the risk of failures but also enables more efficient and cost-effective information retrieval, ultimately supporting smoother and faster decision-making processes.
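A decisioning data layer can be as simple as a facade that consolidates the underlying integrations behind a single retrieval interface. The sketch below assumes hypothetical CRM, billing, and feature-store clients.

```python
class DecisioningDataLayer:
    """A facade that hides individual integrations (CRM, billing, feature store, ...)
    behind one retrieval interface used by the decision engine."""

    def __init__(self, crm_client, billing_client, feature_store):
        # The concrete clients are hypothetical; each wraps one backend integration.
        self._crm = crm_client
        self._billing = billing_client
        self._features = feature_store

    def get_decision_context(self, customer_id: str) -> dict:
        """Assemble everything a decision needs in one call, so decision logic
        never talks to source systems directly."""
        return {
            "profile": self._crm.get_profile(customer_id),
            "balance": self._billing.get_balance(customer_id),
            "features": self._features.get_features(customer_id),
        }
```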
Constant flows of information can place significant stress on production systems, particularly during peak loads. Inefficient queries, when combined with heavy traffic, can exacerbate this issue by passing the increased load onto transactional systems that store and manage the data. This can lead to these critical systems becoming overwhelmed, potentially causing them to crash or become ineffective.
To mitigate these risks, several mechanisms can be employed. Throttling, for instance, can help control the flow of information and prevent systems from being overloaded. However, a more robust and effective approach is to separate transactional systems from operational decision-making concerns. This can be achieved by enhancing the decisioning data layer with infrastructure components designed to handle the demands of processing and retrieval workloads.
Introducing in-memory processing and caching within the decisioning data layer can significantly improve system resilience and performance. These enhancements ensure that data retrieval and decision-making processes are handled efficiently, without placing undue stress on transactional systems. By offloading these tasks from the core systems, organizations can maintain stability and effectiveness even under peak load conditions.
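The sketch below shows one common throttling approach, a token bucket, placed in front of the transactional system; requests beyond the allowed rate are served from cache or degraded gracefully instead of being passed through. The rate and capacity values are illustrative.

```python
import time

class TokenBucket:
    """Simple token-bucket throttle: requests beyond the allowed rate are rejected
    instead of being passed through to the transactional system."""

    def __init__(self, rate_per_sec: float, capacity: int):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

throttle = TokenBucket(rate_per_sec=100, capacity=200)
if throttle.allow():
    pass  # forward the query to the transactional system
else:
    pass  # serve from cache, degrade gracefully, or retry later
```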
In many scenarios, it is impractical to have all necessary data available in a synchronous, real-time manner. Data may arrive with delays or in batches, creating challenges for maintaining consistency across systems. These non-synchronous data integrations can lead to discrepancies between the data used in decision-making and the most current information available, potentially impacting the accuracy and effectiveness of decisions.
To address these consistency challenges, adjustments to the decisioning logic may be necessary. For instance, decision models might need to incorporate mechanisms for handling slightly outdated or inconsistent data, such as using fallback rules, applying data validation checks, or incorporating time-based tolerances. These adjustments ensure that decisions remain reliable, even when the data is not fully up-to-date.
By proactively adapting the decisioning logic, organizations can minimize the impact of non-synchronous data on their processes, ensuring that decisions are as accurate and effective as possible under the circumstances.
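As a simple illustration of a time-based tolerance with a fallback rule, the sketch below defers the decision when the inputs are older than an assumed staleness threshold; snapshot_time is assumed to be a timezone-aware UTC timestamp recording when the inputs were captured.

```python
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(minutes=15)  # assumed tolerance for this decision

def decide_offer(customer, snapshot_time: datetime):
    """Apply a fallback rule when the decision inputs are older than the tolerated staleness."""
    age = datetime.now(timezone.utc) - snapshot_time
    if age > MAX_STALENESS:
        # Data is too old to trust fine-grained logic; fall back to a conservative default.
        return "defer_decision"
    if customer["churn_score"] >= 0.8:
        return "make_retention_offer"
    return "no_action"
```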
Data Decisioning Requirements Analysis
In the context of data-driven decision-making, it is crucial to conduct a thorough Data Decisioning Requirements Analysis. This process ensures that the data used in decision-making is appropriately refined and meets the necessary standards. By setting clear expectations for the data inputs, organizations can better support the decision-making process and achieve more accurate outcomes.
A key aspect of this analysis involves defining the specific data points required to drive the decision-making process. This means identifying the attributes or data elements that are essential for making informed decisions. Additionally, it’s important to establish the level of data quality and freshness needed to ensure that the decisions are both timely and reliable.
By having an initial understanding of what data points are necessary, and the standards they must meet, organizations can better structure their data pipelines and ensure that the decision-making process is supported by high-quality, relevant data. This approach lays the foundation for effective, data-driven decisioning and helps to minimize the risks associated with poor data quality or outdated information.
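One lightweight way to capture such requirements is as a machine-readable specification that downstream validation can reuse. The attributes, freshness limits, and ranges below are hypothetical examples.

```python
# Hypothetical decisioning data requirements: which attributes are needed,
# how fresh they must be, and what quality constraints apply.
DATA_REQUIREMENTS = {
    "churn_score":     {"type": float, "required": True,  "max_age_minutes": 60,   "range": (0.0, 1.0)},
    "customer_tier":   {"type": str,   "required": True,  "max_age_minutes": 1440, "allowed": {"bronze", "silver", "gold"}},
    "account_balance": {"type": float, "required": False, "max_age_minutes": 15,   "range": (0.0, None)},
}
```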
Data Quality Analysis
In order to ensure the effectiveness of data-driven decision-making, it is essential to conduct a thorough Data Quality Analysis. This process involves assessing the data against the requirements that were defined during the Data Decisioning Requirements Analysis. By doing so, organizations can verify that the data meets the necessary standards and is suitable for driving accurate and reliable decisions.
During the Data Quality Analysis, the data should be mapped against the desired inputs outlined in the decisioning framework. This mapping allows organizations to identify any discrepancies or gaps between the available data and what is required for decision-making. Once these gaps are identified, decisions need to be made on how to handle values that do not fit within the expected parameters. This might involve cleansing the data, supplementing it with additional sources, or adjusting the decision-making process to accommodate the available datasets.
By systematically evaluating data quality in this way, organizations can ensure that their decision-making processes are based on accurate, relevant, and high-quality data, ultimately leading to better outcomes and more informed decisions.
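Building on the hypothetical DATA_REQUIREMENTS specification sketched above, the following function maps a record against those expectations and reports any gaps, which can then drive cleansing, supplementation, or fallback decisions.

```python
def check_record_quality(record: dict, requirements: dict) -> list[str]:
    """Map a record against the defined requirements and report any gaps."""
    issues = []
    for field, spec in requirements.items():
        if field not in record or record[field] is None:
            if spec["required"]:
                issues.append(f"missing required field: {field}")
            continue
        value = record[field]
        if not isinstance(value, spec["type"]):
            issues.append(f"wrong type for {field}: {type(value).__name__}")
        elif "range" in spec:
            lo, hi = spec["range"]
            if (lo is not None and value < lo) or (hi is not None and value > hi):
                issues.append(f"out-of-range value for {field}: {value}")
        elif "allowed" in spec and value not in spec["allowed"]:
            issues.append(f"unexpected value for {field}: {value}")
    return issues

issues = check_record_quality({"churn_score": 1.4, "customer_tier": "gold"}, DATA_REQUIREMENTS)
# -> ['out-of-range value for churn_score: 1.4']
```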
Before incorporating data into an automated decisioning process, it is essential to thoroughly assess its readiness. This evaluation ensures that the data is reliable and suitable for driving accurate and effective decisions. Key factors to consider during this assessment include the quality and completeness of the data, its freshness relative to the decision being made, and the reliability and availability of the sources and pipelines that supply it.
By thoroughly assessing these factors, organizations can determine whether the data is ready to be used in automated decision-making. This careful evaluation helps ensure that the decisioning process is built on a foundation of reliable, high-quality data, leading to more accurate and effective outcomes.
When designing systems that rely on messaging, it’s crucial to understand the various message formats and patterns for data transmission, as these choices significantly influence how data is processed and managed. Common patterns for defining events include event notification, event-carried state transfer, and event sourcing. Each pattern comes with distinct implications for message processing and state management.
Choosing the right message format and event pattern has significant implications for system design. It impacts how messages are processed, the complexity of state management, and the overall system performance. The decision depends on various factors, such as the capabilities of the systems involved, the volume and velocity of events, and the limitations of source applications. Properly aligning the messaging strategy with these constraints ensures that the system is efficient, reliable, and scalable.
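The differences between these patterns are easiest to see in the shape of the messages themselves. The payloads below are illustrative, with hypothetical field names.

```python
# Illustrative payloads for the three common event patterns (field names are hypothetical).

# 1. Event notification: a thin message; consumers call back for the details they need.
event_notification = {"type": "customer_updated", "customer_id": "42"}

# 2. Event-carried state transfer: the event carries the state consumers need,
#    so they can decide without querying the source system.
event_carried_state = {
    "type": "customer_updated",
    "customer_id": "42",
    "customer": {"tier": "gold", "email": "jane@example.com", "churn_score": 0.82},
}

# 3. Event sourcing: state is reconstructed by replaying the full sequence of events.
event_log = [
    {"type": "customer_created", "customer_id": "42", "tier": "bronze"},
    {"type": "tier_upgraded",    "customer_id": "42", "tier": "gold"},
]

def current_tier(events):
    """Fold over the event log to derive the current state."""
    tier = None
    for e in events:
        tier = e.get("tier", tier)
    return tier  # -> "gold"
```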
When designing database structures and tables for decision-making processes, optimizing for efficient information retrieval is essential. This involves structuring the data to ensure quick access, often requiring calculated fields to be pre-computed or cached to meet the required latency for retrieval requests. Ensuring that data is readily available for decisioning processes is key to maintaining performance, especially when decisions need to be made in real-time.
Different types of data stores play a crucial role in decisioning, including Event Stores, Feature Stores, and Customer Profile Stores. Each of these stores serves a specific purpose and has unique requirements for data storage and retrieval. For instance:

- Event Stores keep an append-only history of events, supporting auditability and the reconstruction of state over time.
- Feature Stores hold pre-computed features so that machine learning models can retrieve them with low latency, using the same definitions applied during training.
- Customer Profile Stores consolidate customer attributes into a single, quickly retrievable view that decision logic can fetch in one lookup.
Depending on the nature of the data and the specific use case, these data stores might be implemented in transactional relational database management systems (RDBMS) or optimized for NoSQL databases. The choice between these database types has significant implications for system performance, availability, and consistency.
Choosing the right database architecture involves balancing these trade-offs based on the specific requirements of the decisioning processes, such as the need for real-time data, the complexity of transactions, and the desired levels of consistency and availability. This careful selection ensures that the decisioning system is both efficient and reliable, capable of supporting the organization’s strategic goals.
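As an example of optimizing for retrieval rather than for writes, the sketch below pre-computes calculated fields into a denormalized customer profile document that could be stored in a document store or in-memory grid and fetched with a single key lookup. The field names and aggregation window are assumptions.

```python
from datetime import datetime, timezone

def build_profile_read_model(customer, transactions):
    """Pre-compute calculated fields into a denormalized document so the decisioning
    process can fetch everything with one key lookup instead of joins at read time."""
    return {
        "customer_id": customer["id"],
        "tier": customer["tier"],
        # Calculated fields pre-computed at write/refresh time:
        "total_spend_90d": sum(t["amount"] for t in transactions),
        "transaction_count_90d": len(transactions),
        "last_transaction_at": max((t["timestamp"] for t in transactions), default=None),
        "refreshed_at": datetime.now(timezone.utc).isoformat(),
    }

# The resulting document would be written to a key-value or document store
# (or an in-memory grid) keyed by customer_id for low-latency retrieval.
```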
When designing producing applications that push information downstream, it is crucial to consider the various trade-offs involved to ensure the system aligns with the desired outcomes of the entire ecosystem. These applications must be architected with a holistic view, taking into account the capabilities and limitations of the downstream systems and the messaging infrastructure.
For example, in scenarios where there are specific capacity constraints in the messaging system or downstream consumers, producing applications may need to implement back pressure mechanisms. These mechanisms help manage the flow of data, preventing downstream systems from being overwhelmed by controlling the rate at which data is sent. In some cases, if the system cannot handle all messages, voluntary message loss might be considered, depending on the criticality of the data and the desired outcome. This approach can help maintain overall system stability, even if some non-essential data is sacrificed.
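A minimal sketch of both ideas, assuming an in-process bounded buffer between the application and the messaging infrastructure: critical messages exert back pressure by blocking briefly, while non-critical messages may be dropped when the buffer is full.

```python
import queue

# Bounded in-process buffer between the producing application and the message system.
buffer: "queue.Queue[dict]" = queue.Queue(maxsize=10_000)

def publish(message: dict, critical: bool) -> bool:
    """Apply back pressure via a bounded buffer; optionally drop non-critical
    messages instead of blocking when the buffer is full."""
    try:
        if critical:
            buffer.put(message, timeout=5)   # block briefly: exerts back pressure upstream
        else:
            buffer.put_nowait(message)       # never block for non-essential data
        return True
    except queue.Full:
        if critical:
            raise  # surface the overload; a real system would retry or alert
        return False  # voluntary loss of a non-essential message
```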
Beyond capacity and performance, other critical dimensions must also be considered. For instance, in situations where data consistency is a priority, implementing patterns like the Outbox Pattern can be beneficial. This pattern ensures that messages are only sent when the data is reliably stored, reducing the risk of inconsistencies between the application’s state and the downstream systems.
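The sketch below outlines the pattern using SQLite from the Python standard library: the business write and the outbox record share one local transaction, and a separate relay (omitted here) publishes unsent outbox rows to the broker. The table layout is illustrative.

```python
import json
import sqlite3

conn = sqlite3.connect("app.db")
conn.execute("CREATE TABLE IF NOT EXISTS orders (id TEXT PRIMARY KEY, status TEXT)")
conn.execute(
    "CREATE TABLE IF NOT EXISTS outbox "
    "(id INTEGER PRIMARY KEY AUTOINCREMENT, topic TEXT, payload TEXT, published INTEGER DEFAULT 0)"
)

def place_order(order_id: str):
    """Outbox pattern: the state change and the outgoing message are written
    in the same local transaction, so neither can exist without the other."""
    with conn:  # one transaction covering both writes
        conn.execute("INSERT INTO orders (id, status) VALUES (?, ?)", (order_id, "placed"))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders", json.dumps({"type": "order_placed", "order_id": order_id})),
        )
    # A separate relay process reads unpublished outbox rows, publishes them to the
    # message broker, and marks them as published (omitted here).
```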
Similarly, when considering delivery guarantees, the application might need to implement mechanisms like message acknowledgment or a reconciliation system. These approaches ensure that messages are delivered and processed correctly, even in the face of failures or interruptions, by confirming receipt and addressing discrepancies between what was sent and what was received.
By carefully evaluating these and other dimensions — such as latency, fault tolerance, and data retention — producing applications can be designed to support the overall ecosystem’s goals effectively. This thoughtful architecture ensures that the system remains robust, responsive, and aligned with the broader objectives, even as it scales or adapts to new challenges.
When implementing an automated decisioning process, it is crucial to continuously monitor and measure both the outputs of the decisions made and the outcomes they produce. This ongoing evaluation is vital for ensuring that the decisioning process remains accurate, effective, and aligned with the intended goals.
Typical issues that such monitoring surfaces include data drift, where the statistical properties of the inputs change over time; bugs in the decision logic or its integrations; and shifts in customer or system behavior that invalidate earlier assumptions. By detecting and addressing these issues early, organizations can ensure that their decision-making systems remain accurate and aligned with their strategic objectives.
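Drift can be detected in many ways; the sketch below uses a simple mean-shift check on a single feature as one possible signal. The feature name, sample values, and threshold are hypothetical, and production systems typically rely on richer statistics such as population stability metrics.

```python
import statistics

def mean_shift(baseline: list[float], recent: list[float], threshold: float = 0.5) -> bool:
    """Flag potential data drift when the recent mean of a feature moves more than
    `threshold` baseline standard deviations away from the baseline mean."""
    base_mean = statistics.fmean(baseline)
    base_std = statistics.pstdev(baseline) or 1e-9  # avoid division by zero
    shift = abs(statistics.fmean(recent) - base_mean) / base_std
    return shift > threshold

# Example: churn_score values observed at model training time vs. last week in production.
if mean_shift(baseline=[0.2, 0.3, 0.25, 0.4], recent=[0.7, 0.8, 0.75, 0.9]):
    print("churn_score distribution has shifted; review the decision logic or retrain the model")
```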
Architecting for real-time automated decision-making requires a holistic approach that integrates decision logic, advanced infrastructure, and data-driven strategies. By leveraging tools like the Decisioning Canvas, adopting robust frameworks like DMN, and employing cutting-edge technologies for data processing and storage, organizations can ensure their systems are prepared to meet the demands of modern business environments.
The successful implementation of such systems not only enhances decision speed and accuracy but also drives scalability and innovation. Addressing challenges such as data consistency, query optimization, and system latency is critical to maintaining resilience and efficiency in real-time operations.
As businesses continue to evolve, real-time automated decisions will remain a key enabler of agility and growth. By following the frameworks and best practices outlined in this article, organizations can build systems that not only meet today’s requirements but also anticipate the needs of tomorrow, paving the way for sustained success in an increasingly dynamic world.