IBM Power10 Business Inferencing at Scale with Matrix Math Accelerator (MMA)

Introduction

With IBM Power10, AI can be deployed and integrated without disruption to the enterprise attributes of the platform. Power10 provides a dramatic improvement in inferencing capability over IBM Power9, making it possible to add inferencing capabilities to your enterprise application without requiring additional hardware. As a first step we can deploy AI asynchronously, leveraging an Apache Kafka stream, within the same Power10 platform. This non-invasive way of introducing AI functionality generates a data stream from the database transactions and performs inferencing-based analysis in a separate partition that does not disrupt operations. That is what this example is focused on.

IBM i provides an efficient means of generating a Kafka stream from a set of database transactions, and in this example, we analyze the data stream in a separate Linux partition. We use a deep learning model for time-series (N-Beats) to forecast future prices of transactions.

Besides being an example of AI integration, this demonstration of Business Inferencing at Scale is also an example of application modernization. The example uses numerous open source components and leverages the flexibility of the IBM Power platform to seamlessly run different operating systems side-by-side. It allows introducing new and advanced functionality side-by-side with existing enterprise applications, in a non-disruptive manner.

Offline analysis using AI can be a precursor to a tighter, inline, integration of AI function with the enterprise application. Because Power10 provides the AI functionality within the processor core rather than in a separate system or accelerator, the Power10 hardware allows non-disruptive integration of AI functionality after it is developed and validated. Thus, for example, after a fraud detection model has been developed and validated using the approach in this example, a user may in the future deploy the fraud detection AI as an inline step and prevent a fraudulent transaction from being committed in the first place.

The purpose of this tutorial is to explain with an example how the Power10 Matrix Math Accelerator (MMA) feature can be used to accelerate enterprise AI inferencing with the data on IBM i.

Use case description

In the example Daytrader7 application, the sequence consists of trades: buying and selling stock. We modified the Daytrader7 application to replay a sequence of actual trades to make it more realistic, and we use an inferencing model to predict the future stock price in each time interval. Of course, stock prices are among the most difficult sequences to predict; the intent here is to provide an integration example. We make no claims as to the ability of the example model to predict stock prices. Other uses of similar models include inventory management (to predict future purchases), customer management (to predict customer churn), and so on.

On IBM i, we use Apache Camel, integrated with other modernization applications, to stream IBM Db2 transactions to Kafka. For detailed information about what we implemented and how, see the blog: Using Power10’s Superfast AI With Event Streaming

Stock market prediction is an application of time-series forecasting. Machine learning and deep learning have shown impressive results on many forecasting tasks (with applications to product sales, server utilization, meteorology, and so on), and multiple attempts exist to predict stock market values. In our case, we work with a univariate time series: we base our predictions on one variable, the previous prices (as opposed to a multivariate setup, where we could predict the price from covariates such as previous prices, volumes, transaction frequency, and so on). For details about the approach, how the libraries are used, the data collection and cleaning process, and the model we use for running our predictions, see the “Stock market prediction” section.
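To make the univariate setup concrete, the following minimal sketch slices a single price series into (history, future) pairs, which is the general shape of example a forecasting model learns from. The function name make_windows, the window lengths, and the toy prices are hypothetical and chosen only for illustration; they are not taken from the demo code.

```python
from typing import List, Tuple

def make_windows(prices: List[float], context_length: int,
                 prediction_length: int) -> List[Tuple[List[float], List[float]]]:
    """Slice one price series into (history, future) pairs."""
    pairs = []
    last_start = len(prices) - context_length - prediction_length
    for start in range(last_start + 1):
        split = start + context_length
        history = prices[start:split]                      # model input
        future = prices[split:split + prediction_length]   # values to predict
        pairs.append((history, future))
    return pairs

# Hypothetical toy series; real inputs would be the closing prices.
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]
windows = make_windows(prices, context_length=3, prediction_length=2)
```

Each pair uses only previous prices as input, which is what distinguishes this univariate setup from a multivariate one with additional covariates.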

Power10 provides four Matrix Math Accelerator (MMA) engines per core, and the IBM Power E1080 server supports a maximum of 15 cores per socket compared to a maximum of 12 cores per socket for the IBM Power E980 server. For large single-precision floating-point (fp32) based inferencing models similar in complexity to the N-Beats model used in this demonstration, we have demonstrated a 5x throughput advantage on Power10 relative to Power9. Models based on lower-precision data types (bfloat16 and int8) are expected to see even higher speedups.

Refer to Figure 1 for an overview of the architecture.

Figure 1. Infrastructure of time series prediction with modernized IBM i

The demo video also gives a quick overview of this use case.

Stock market prediction

This section describes our approach and details the libraries used, the data collection and cleaning process, and the model we used for running our predictions.

Software

The library we used is PyTorch Forecasting. This library is built on top of PyTorch Lightning and provides a flexible, high-level API for working with time series.

It provides a TimeSeriesDataSet class that wraps a pandas DataFrame and automates the handling, encoding, and normalization of time-series data. It also bundles multiple state-of-the-art models for time-series forecasting.

The library's introduction article is an interesting read for more details on its capabilities.

Data

Data source

We used data from Yahoo Finance, which provides a stock market data API. It can be queried through the DataReader function of the pandas-datareader package:

import pandas_datareader.data as web
data = web.DataReader("AAL", "yahoo", start_date, end_date)

We repeat that operation for around 20 stocks, fetching daily data between 1990 and 2020, and store the results in CSV files. The returned data contains the daily opening, closing, highest, and lowest prices and the trade volume, of which we keep only the closing price as the reference value for the day.
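As a minimal sketch of the "keep only the closing price" step, the following code writes the Close column of a price frame to CSV. The helper name closing_prices_to_csv and the two-day sample frame are hypothetical stand-ins for a downloaded frame; the column names assume the usual Open/High/Low/Close/Volume layout.

```python
import io
import pandas as pd

def closing_prices_to_csv(frame: pd.DataFrame, out) -> None:
    """Write only the Close column, indexed by date, as CSV."""
    frame[["Close"]].to_csv(out, index_label="Date")

# Hypothetical two-day sample standing in for a downloaded frame.
sample = pd.DataFrame(
    {"High": [11.0, 11.4], "Low": [10.1, 10.6], "Open": [10.3, 10.9],
     "Close": [10.9, 11.2], "Volume": [1200, 1500]},
    index=pd.to_datetime(["2020-01-02", "2020-01-03"]),
)
buffer = io.StringIO()  # a file path would work the same way
closing_prices_to_csv(sample, buffer)
csv_text = buffer.getvalue()
```

Writing to an in-memory buffer keeps the sketch self-contained; in practice each stock would presumably go to its own CSV file.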

For the demo, we then replay these daily variations as per-second variations so that the stock price evolves faster.
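One way to read this step is that each daily value is kept as-is but re-stamped one second apart. The sketch below illustrates that interpretation; the function name daily_to_seconds, the start timestamp, and the sample values are hypothetical, and the demo code may implement the replay differently.

```python
import pandas as pd

def daily_to_seconds(daily: pd.Series, start: str) -> pd.Series:
    """Keep the values, but space them one second apart from `start`."""
    index = pd.date_range(start=start, periods=len(daily),
                          freq=pd.Timedelta(seconds=1))
    return pd.Series(daily.to_numpy(), index=index, name=daily.name)

# Hypothetical daily closing prices replayed as per-second ticks.
daily = pd.Series([10.0, 10.5, 10.2], name="Close")
replayed = daily_to_seconds(daily, "2021-01-01 09:30:00")
```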

Note that we picked stocks with relatively small variation over the period, so that we can concatenate them (to obtain a long enough time series) and avoid exploding stock market values or overly unusual variations. The overall plot of the data set is shown in Figure 2.

Figure 2. The overall plot of the data set

Data pre-processing

We then concatenate the stock values (simply making sure to align the last price of a stock with the first price of the following one).

We finally apply a light smoothing with a rolling window of size 10.

Note that a real application with the aim of achieving the best possible accuracy would probably not merge time-series that way. In our use case, ease of use was a criterion, and the resulting data set looks realistic and therefore suits our use.
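The concatenation and smoothing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the alignment is done with an additive shift (the text does not say whether the shift is additive or multiplicative), the rolling window uses a mean with min_periods=1 to avoid missing values at the start, and the function names and sample values are hypothetical.

```python
from typing import List
import pandas as pd

def concatenate_aligned(series_list: List[pd.Series]) -> pd.Series:
    """Join series end to end, shifting each to start at the previous end."""
    pieces = [series_list[0].reset_index(drop=True)]
    for nxt in series_list[1:]:
        nxt = nxt.reset_index(drop=True)
        # Additive shift (assumption): first price of the next stock
        # is moved onto the last price of the previous one.
        offset = pieces[-1].iloc[-1] - nxt.iloc[0]
        pieces.append(nxt + offset)
    return pd.concat(pieces, ignore_index=True)

def smooth(series: pd.Series, window: int = 10) -> pd.Series:
    """Rolling mean; min_periods=1 is an assumption to avoid leading NaN."""
    return series.rolling(window=window, min_periods=1).mean()

# Hypothetical prices for two stocks at very different price levels.
a = pd.Series([10.0, 11.0, 12.0])
b = pd.Series([50.0, 49.0, 51.0])
joined = concatenate_aligned([a, b])
smoothed = smooth(joined, window=10)
```

The shift preserves each stock's day-to-day variations while removing the jump between price levels at the join.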

Model and training

N-Beats

We picked the N-Beats model (Neural basis expansion analysis for interpretable time series forecasting). Published in 2020 by a team that includes Yoshua Bengio, it is one of the state-of-the-art models for univariate time-series prediction, and its authors report results that outperform the winner of the M4 competition.

This model is built into the PyTorch Forecasting library and is therefore easy to test and use.

We chose N-Beats because it was well suited to the univariate time-series forecasting problem we aimed to solve and because it showed a very high rate of MMA instructions during our tests (see the “Parameters tuning” section below). We also evaluated the Temporal Fusion Transformer model, for which MMA instruction usage was somewhat lower.

Training

We trained that model on the aggregated data set of stock prices described above (with 80%/20% split for training and testing sets). We used the following parameters for the model architecture:

num_blocks = [3, 3]
num_block_layers = [4, 4]
widths = [4096, 4096]

Note that we intentionally picked very large layers to maximize the use of MMA (see the “Parameters tuning” section below). Because of that, training sometimes suffers from instability (exploding gradients), and the architecture could probably be improved with regularization techniques to address this. However, it was sufficient for our use case, and we were able to train a model without gradient issues.
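To give a rough sense of how large these layers are, the sketch below counts the weights in the square hidden-to-hidden matrices only. It assumes that each block is a stack of fully connected layers of the given width, and it ignores biases and the input and output projections, whose sizes depend on the context and prediction lengths. The function name hidden_weight_count is hypothetical, and the layer layout is an assumption about the implementation rather than something stated in this tutorial.

```python
def hidden_weight_count(num_blocks, num_block_layers, widths):
    """Count weights in the width x width matrices between hidden layers."""
    total = 0
    for blocks, layers, width in zip(num_blocks, num_block_layers, widths):
        # Assumption: a block with `layers` fully connected layers has
        # (layers - 1) square matrices of shape width x width.
        total += blocks * (layers - 1) * width * width
    return total

weights = hidden_weight_count([3, 3], [4, 4], [4096, 4096])
fp32_gib = weights * 4 / 2**30  # 4 bytes per fp32 weight
```

Under these assumptions the model has roughly 300 million hidden weights, or a little over 1 GiB in fp32, so inference is dominated by large matrix multiplications, the kind of work MMA is designed to accelerate.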

We trained for 20 epochs with a batch size of 128 using an IBM Power AC922 server with an NVIDIA Tesla V100 GPU (with 16 GB memory).

Parameters tuning

We ran tests of the N-Beats model with multiple parameter sets and compared the speedup relative to both a Power9 system and a Power10 system without MMA. The set of parameters described above yielded the best results.

Figure 3. N-Beats model test results with Power9 and Power10

Multiple stocks

We also have the option to predict multiple stocks. To do so, we have two options:

Run a single model instance and increase the batch size. This option gives better throughput but increases latency (inference on a bigger batch takes longer, and the inputs must be buffered until a batch is ready).
Run multiple model instances in parallel, each with a batch size of 1 or another small value. This option gives the best latency.
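The trade-off between the two options can be illustrated with a simple back-of-the-envelope model. All timing numbers below are hypothetical and are not measurements from the demo; the function names batched and parallel are likewise illustrative only.

```python
def batched(batch_size, arrival_interval_ms, batch_infer_ms):
    """Worst-case latency (ms) and throughput (items/s), one batched instance."""
    # The first input of a batch waits until the rest have arrived.
    wait_ms = (batch_size - 1) * arrival_interval_ms
    latency_ms = wait_ms + batch_infer_ms
    throughput = batch_size * 1000 / batch_infer_ms
    return latency_ms, throughput

def parallel(instances, single_infer_ms):
    """Latency (ms) and throughput (items/s), several batch-size-1 instances."""
    latency_ms = single_infer_ms
    throughput = instances * 1000 / single_infer_ms
    return latency_ms, throughput

# Hypothetical timings: inputs arrive every 10 ms, a batch of 8 takes
# 40 ms to infer, and a single input takes 20 ms.
batched_latency, batched_throughput = batched(8, 10, 40)
parallel_latency, parallel_throughput = parallel(2, 20)
```

With these made-up numbers, the batched instance reaches higher throughput (200 versus 100 items per second) at the cost of a much higher worst-case latency (110 ms versus 20 ms), which matches the qualitative trade-off described in the list above.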

Prerequisites

Hardware

The best performance is achieved on Power10 systems, but the demonstration also works on earlier generations of Power hardware that support the required OS distributions. Note that RHEL 8.4 requires systems with little endian support for Linux (which started with IBM Power8).

IBM i: IBM i 7.3 or IBM i 7.4
Linux: Red Hat Enterprise Linux (RHEL) 8.4

Software

On IBM i: To use the techniques described in this tutorial to deploy DayTrader with Db2 for i and stream data to a Kafka broker on IBM i, the following software must be installed:

The following open source packages, installed using RPM:
maven
wget
ca-certificates-mozilla
unzip

A suitable Java runtime, running Java version 8 or later. Most systems already have this capability through 5770-JV1 option 17.
PTF group SF99703 Level 16 or later for IBM i 7.3 or SF99704 Level 4 or later for IBM i 7.4.
If you choose to run the Kafka broker on IBM i, download Kafka directly from the Apache Kafka website.

On Linux: To try the time series prediction AI case, the following software must be installed using RPM:

wget
git
podman or Docker

Estimated time

If you have your IBM i and Linux systems ready (network can access internet, prerequisite software installed), it should take about 1 hour to finish the tasks mentioned in this tutorial.

Steps

Deploy DayTrader, Kafka, and JMeter on IBM i automatically in a simple way.
Refer to the README.md file of this example’s GitHub repository Kafka-based AI example to complete the deployment of DayTrader, Kafka, and JMeter on IBM i.
Deploy OpenCE with the Stock Price Prediction AI case related software and model on Linux.
Refer to the Environment setup section of the README for the AI components to manually set up the environment.
Run the demo on Linux.
Refer to the remaining steps in the [Manual setup] How to run section of the README for the AI components to run the demo.

Summary

Artificial intelligence and machine learning technologies are quickly becoming the new standard. This trend is particularly interesting to IBM i clients, because the data housed and processed on IBM i has enormous potential. It can yield many new business insights when processed with an artificial intelligence algorithm. Power10 brings new on-chip optimizations for AI computations, which means much better performance without introducing unnecessary complexity to your infrastructure. AI can now be done efficiently and in a non-disruptive manner. If you like the concept of real-time predictions, take some time to explore the sample application outlined in this tutorial.

Acknowledgments

We acknowledge Joe McClure from the WebSphere Performance team for helping us customize the Daytrader application for this use case; Karthik Swaminathan from the Efficient and Resilient Systems Research team for helping us evaluate the model performance on Power10; and Stu Cunliffe from the EMEA Lab Services team for providing us with the Linux and IBM i systems during the PoC phase.