The basics of analyzing business transactions



While most companies manage their services at the physical component level, some are beginning to manage them at the transaction and business process level. This is because they want to understand their customer journey, or need a complete audit trail in order to meet compliance requirements. In this post I will describe the basics of analyzing business transactions.

What is a transaction?

[Image: Tracing the transaction]

First we have to understand what a transaction is and how we can collect the data required to monitor it.

As an example, a web client requests the execution of a business transaction, which in turn uses several services (authorization, billing and resource). To follow what happened, a tracing system is installed that records every call between the components. The principles of transaction tracing are described at opentracing.io.

Popular distributed tracing systems include Zipkin and Jaeger.

[Image: Visualizing the call trace]

These distributed tracing systems collect call traces across all your services, from the frontend (website) through the intermediate services right down to the backend. The details of such a transaction can be visualized as a call trace.

Before diving into the details of each and every transaction, you should first learn how the data is collected and processed, and verify the quality of your monitoring data.

For those interested, a typical trace in Zipkin looks like this (the annotation values "sr", "ss", "cs" and "cr" mark server receive, server send, client send and client receive):

  [ { "traceId": "bd7a977555f6b982",
      "name": "get",
      "id": "bd7a977555f6b982",
      "timestamp": 1458702548467000,
      "duration": 386000,
      "annotations": [ { "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 }, "timestamp": 1458702548467000, "value": "sr" }, { "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 }, "timestamp": 1458702548853000, "value": "ss" } ], "binaryAnnotations": [] }, { "traceId": "bd7a977555f6b982", "name": "get-traces", "id": "ebf33e1a81dc6f71", "parentId": "bd7a977555f6b982", "timestamp": 1458702548478000, "duration": 354374, "annotations": [], "binaryAnnotations": [ { "key": "lc", "value": "JDBCSpanStore", "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 } }, { "key": "request", "value": "QueryRequest{serviceName=zipkin-query, spanName=null, annotations=[], binaryAnnotations={}, minDuration=null, maxDuration=null, endTs=1458702548478, lookback=86400000, limit=1}", "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 } } ] }, { "traceId": "bd7a977555f6b982", "name": "query", "id": "be2d01e33cc78d97", "parentId": "ebf33e1a81dc6f71", "timestamp": 1458702548786000, "duration": 13000, "annotations": [ { "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 }, "timestamp": 1458702548786000, "value": "cs" }, { "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 }, "timestamp": 1458702548799000, "value": "cr" } ], "binaryAnnotations": [ { "key": "jdbc.query", "value": "select distinct `zipkin_spans`.`trace_id` from `zipkin_spans` join `zipkin_annotations` on (`zipkin_spans`.`trace_id` = `zipkin_annotations`.`trace_id` and `zipkin_spans`.`id` = `zipkin_annotations`.`span_id`) where (`zipkin_annotations`.`endpoint_service_name` = ? and `zipkin_spans`.`start_ts` between ? and ?) order by `zipkin_spans`.`start_ts` desc limit ?", "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 } }, { "key": "sa", "value": true, "endpoint": { "serviceName": "spanstore-jdbc", "ipv4": "127.0.0.1", "port": 3306 } } ] }, { "traceId": "bd7a977555f6b982", "name": "query", "id": "13038c5fee5a2f2e", "parentId": "ebf33e1a81dc6f71", "timestamp": 1458702548817000, "duration": 1000, "annotations": [ { "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 }, "timestamp": 1458702548817000, "value": "cs" }, { "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 }, "timestamp": 1458702548818000, "value": "cr" } ], "binaryAnnotations": [ { "key": "jdbc.query", "value": "select `zipkin_spans`.`trace_id`, `zipkin_spans`.`id`, `zipkin_spans`.`name`, `zipkin_spans`.`parent_id`, `zipkin_spans`.`debug`, `zipkin_spans`.`start_ts`, `zipkin_spans`.`duration` from `zipkin_spans` where `zipkin_spans`.`trace_id` in (?)", "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 } }, { "key": "sa", "value": true, "endpoint": { "serviceName": "spanstore-jdbc", "ipv4": "127.0.0.1", "port": 3306 } } ] }, { "traceId": "bd7a977555f6b982", "name": "query", "id": "37ee55f3d3a94336", "parentId": "ebf33e1a81dc6f71", "timestamp": 1458702548827000, "duration": 2000, "annotations": [ { "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 }, "timestamp": 1458702548827000, "value": "cs" }, { "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 }, "timestamp": 1458702548829000, "value": "cr" } ], "binaryAnnotations": [ { "key": "jdbc.query", "value": "select `zipkin_annotations`.`trace_id`, `zipkin_annotations`.`span_id`, 
`zipkin_annotations`.`a_key`, `zipkin_annotations`.`a_value`, `zipkin_annotations`.`a_type`, `zipkin_annotations`.`a_timestamp`, `zipkin_annotations`.`endpoint_ipv4`, `zipkin_annotations`.`endpoint_port`, `zipkin_annotations`.`endpoint_service_name` from `zipkin_annotations` where `zipkin_annotations`.`trace_id` in (?) order by `zipkin_annotations`.`a_timestamp` asc, `zipkin_annotations`.`a_key` asc", "endpoint": { "serviceName": "zipkin-query", "ipv4": "192.168.1.2", "port": 9411 } }, { "key": "sa", "value": true, "endpoint": { "serviceName": "spanstore-jdbc", "ipv4": "127.0.0.1", "port": 3306 } } ] } ]

How to analyze transactions

To analyze millions of these traces you need some tooling. I use a PostgreSQL database to store the raw traces and intermediate results, Pentaho Spoon to process the data, and Neo4j for analysis and visualization.

[Image: Data preparation with open-source tools (Pentaho Spoon)]

To visualize transaction traces, a sample of the data stored in the database was processed and imported into the Neo4j graph database.
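
For illustration, here is a minimal Cypher sketch of such an import. It assumes the spans were first exported from PostgreSQL to a file spans.csv with columns traceId, id, parentId, serviceName and duration; this flat layout is my own simplification, not the Zipkin schema.

  // Pass 1: create one Span node per monitoring event (hypothetical CSV layout).
  LOAD CSV WITH HEADERS FROM 'file:///spans.csv' AS row
  MERGE (s:Span {id: row.id})
  SET s.traceId  = row.traceId,
      s.parentId = row.parentId,
      s.service  = row.serviceName,
      s.duration = toInteger(row.duration);

  // Pass 2: rebuild the call tree by linking each span to its parent.
  LOAD CSV WITH HEADERS FROM 'file:///spans.csv' AS row
  MATCH (child:Span {id: row.id})
  MATCH (parent:Span {id: row.parentId})
  MERGE (parent)-[:CALLS]->(child);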

[Image: A typical transaction visualized in Neo4j]

This setup allows you to do offline analysis. For real-time processing I use IBM Streams.

A typical transaction should look like this:

  • A frontend activity (red dot) that starts the transaction. There should be exactly one frontend activity for each transaction.
  • Optional activity in one or more intermediate services (green dots)
  • Activity in services communicating with the backend (blue dot), such as datastores

Visualizations like these allow us to explore what transactions look like, and what they should look like. We found much more variety than we expected. This kind of analysis can be automated with Neo4j Cypher queries, as shown below.
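
A minimal sketch of such a check, reusing the hypothetical Span nodes and CALLS relationships from the import sketch above, flags traces that have more than one frontend (root) activity:

  // A root span has no inbound CALLS relationship; a healthy
  // transaction should have exactly one such frontend activity.
  MATCH (s:Span)
  WHERE NOT ( ()-[:CALLS]->(s) )
  WITH s.traceId AS traceId, count(*) AS roots
  WHERE roots > 1
  RETURN traceId, roots;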

Data corruption

[Image: A typical case of missing and orphaned events]

We found several situations (a very significant percentage) where monitoring events were lost somewhere in the monitoring stack, resulting in orphaned service-call events. In the example above, the purple event is missing its relation to the service call made by its predecessor, and the right branch is gone entirely.

This is unacceptable, because it means you are in the dark about what happened (or did not happen) to your business transactions. In effect you are no longer “in control” as required by the regulators.
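
Orphans like these can be detected automatically. A minimal Cypher sketch, again assuming the span properties from the import sketch above:

  // An orphaned span references a parent that never arrived.
  MATCH (s:Span)
  WHERE s.parentId IS NOT NULL AND s.parentId <> ''
  OPTIONAL MATCH (p:Span {id: s.parentId})
  WITH s, p
  WHERE p IS NULL
  RETURN s.traceId AS traceId, s.id AS orphanedSpan, s.service AS service;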

In upcoming posts I will dive deeper into the analysis:

  • Complex transactions
  • Error handling
  • Detecting regulatory compliance issues
  • Relations between services, operations, infrastructure components and the teams that manage them
  • Determining the root cause of failures
  • Determining the impact of failures

Lessons learned:

  • The toolset chosen here (Neo4j, PostgreSQL and Pentaho Spoon) proved very useful for exploring the structure of transactions and deepened my knowledge of transaction tracing.
  • Transaction analysis can be used to detect situations that might affect regulatory compliance.
  • We should focus on designing a reliable monitoring framework in which data quality is checked and every event is accounted for, using event counter metrics.
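
To sketch the event-counter idea: if every stage of the monitoring stack published how many events it emitted per trace (modeled here as a hypothetical emitted property on a Trace node; Zipkin does not provide this out of the box), lost events would show up as a simple count mismatch:

  // Compare the spans that actually arrived in the graph with the
  // number the collector claims to have emitted (hypothetical property).
  MATCH (t:Trace)
  OPTIONAL MATCH (s:Span {traceId: t.traceId})
  WITH t, count(s) AS received
  WHERE received <> t.emitted
  RETURN t.traceId AS traceId, t.emitted AS emitted, received;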

To make you even more interested, here is a cool visualization of business transactions:

[Embedded visualization of business transactions]

For the theory behind all this, take a look at process mining.

Process mining techniques allow for extracting information from event logs. For example, the audit trails of a workflow management system or the transaction logs of an enterprise resource planning system can be used to discover models describing processes, organizations, and products. Moreover, it is possible to use process mining to monitor deviations (e.g., comparing the observed events with predefined models or business rules in the context of SOX).
