Today’s task is to analyze stock prices, specifically price anomalies. ML.NET recently hit version 1, so what better way to do it than with F# and ML.NET?
As always, the preliminaries. For the initial setup, make sure you have .NET Core version 2.2 installed. If you don’t, head over to the .NET Core Downloads page and select the SDK for your platform. The detection methods used here come from the ML.NET TimeSeries package. At the time of this writing, it is at v0.12, so it hasn’t hit version 1 yet, but it works well enough. Once that is complete, create a console F# project, then add the necessary ML.NET and charting packages.
    dotnet new console --language F# --name MLNet_StockAnomaly
In order to not pick on one particular stock, the Dow Jones index over the past year will be the target of interest. Below is a sample of what the data extract looks like. It is the basic stock price data to be expected, including date, prices, and volume. This article will only need Date and Close price. It was exported from Yahoo! Finance.
    # Data Rows
Time to start the code. First, I need to set up the necessary namespaces and types. These cover the ML.NET namespaces as well as XPlot for charting the results. When using ML.NET, the easiest way to interact with the data is by defining two types. PriceData matches the datafile schema. PricePrediction is for the model results; in this case I’ll use it for both anomaly detection and change point detection results. The Prediction field is an array containing a 0 or 1 for a detected event, the value at that datapoint, and its respective confidence level.
    open Microsoft.ML
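As a rough illustration of the two types described above, here is a minimal sketch using F# records. It assumes the Microsoft.ML package is referenced, and the column indices (Date first, Close fifth) are assumptions about the export layout, not something confirmed by the data sample. The record style is also an assumption; the types can equally be written as classes with mutable fields.

```fsharp
open System
open Microsoft.ML.Data

/// One row of the exported CSV, limited to the columns this article needs.
/// Column indices are assumed: Date in column 0, Close in column 4.
[<CLIMutable>]
type PriceData =
    { [<LoadColumn(0)>] Date : DateTime
      [<LoadColumn(4)>] Close : float32 }

/// One row of model output, shared by both detectors.
/// Prediction holds the event flag first, followed by the detector's other outputs.
[<CLIMutable>]
type PricePrediction =
    { Date : DateTime
      Prediction : double[] }
```

CLIMutable gives the records the default constructor and property setters that ML.NET needs when it creates rows through reflection.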
Once that is done, it is time for the processing pipeline. This includes creating the pipeline context and hooking up the data to the file.
To process the data, there will technically be two pipelines. The first will use the IidSpike trainer for anomaly detection. The second will use the IidChangePoint trainer for change point detection. To get the best results, these aren’t really fire-and-forget approaches. There are a couple of dials to adjust. The pvalueHistoryLength defines the sliding window size that is applied when looking for events. Since this is stock data, multiples of 5 roughly correlate to weeks. So I look for anomalies over 6-week windows and change points over 2-week windows. Additionally, confidence is on a scale of 0-100, with higher values requiring a higher level of confidence to trigger an event. Another dial to turn is AnomalySide, which detects either only positive, only negative, or all anomalies. The default is all, but it’s nice to have options. All of these values should be adjusted to best meet the needs of the dataset and the desired data analysis.
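The window arithmetic is simple enough to write down. This is only a sketch: the helper name windowForWeeks is made up for illustration, and it assumes five trading days per week, ignoring market holidays.

```fsharp
/// Assumed trading days per week (holidays ignored).
let tradingDaysPerWeek = 5

/// Hypothetical helper: convert a window in weeks to a number of data points.
let windowForWeeks (weeks : int) : int = weeks * tradingDaysPerWeek

// Window sizes matching the text: 6 weeks for anomalies, 2 weeks for change points.
let anomalyWindow = windowForWeeks 6
let changePointWindow = windowForWeeks 2
```

Keeping the sizes expressed in weeks makes it easier to reason about what a change to the dial means for daily price data.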
Once the pipelines are created, they need to be trained with the Fit method. Now there is a model that can be used. Transform will take the dataset and apply the model to build out predictions for the events.
    let dataPath = "dji.csv"
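For reference, the two pipelines might be wired up roughly as follows. This is a sketch under several assumptions: it targets the integer confidence overloads from the ML.NET 1.0 / TimeSeries 0.12 era (later releases also accept a double), the function name detectEvents and the confidence value of 95 are illustrative, and the input is assumed to be an IDataView with a float32 "Close" column. Note that the change point detector names its window parameter changeHistoryLength rather than pvalueHistoryLength.

```fsharp
open Microsoft.ML
open Microsoft.ML.Transforms.TimeSeries

/// Hypothetical helper: run both detectors over a loaded price data view
/// and return the two transformed data views (anomalies, change points).
let detectEvents (ctx : MLContext) (data : IDataView) =
    // Spike (anomaly) detection over an assumed 30-point window.
    let spikePipeline =
        ctx.Transforms.DetectIidSpike(
            outputColumnName = "Prediction",
            inputColumnName = "Close",
            confidence = 95,
            pvalueHistoryLength = 30,
            side = AnomalySide.TwoSided)

    // Change point detection over an assumed 10-point window.
    let changePointPipeline =
        ctx.Transforms.DetectIidChangePoint(
            outputColumnName = "Prediction",
            inputColumnName = "Close",
            confidence = 95,
            changeHistoryLength = 10)

    // Fit produces the model; Transform applies it to the same dataset.
    let anomalies = spikePipeline.Fit(data).Transform(data)
    let changePoints = changePointPipeline.Fit(data).Transform(data)
    anomalies, changePoints
```

Either result can then be read back into typed rows with ctx.Data.CreateEnumerable, using a prediction type whose field names match the output columns.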
Now that the data has been processed, it is time to build some charts and look at the results. At this point, it is an exercise in formatting the data for charts as a (Date * float32) list. There are 3 datasets: prices, anomalies, and change points. Using XPlot, they are combined into a single chart. The only trick here is that I remove the non-events from the prediction datasets and plot the remaining points directly on the price line chart.
    // Build chart data
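The filtering step can be sketched without any ML.NET or XPlot dependency. The PredictionRow type and toEventPoints function below are hypothetical stand-ins, and the sketch assumes the prediction array layout described earlier: the event flag at index 0 and the value at index 1.

```fsharp
open System

/// Hypothetical stand-in for a model result row.
type PredictionRow =
    { Date : DateTime
      Prediction : double[] }

/// Keep only rows flagged as events and pair each date with its value,
/// giving the (date, value) list shape used for charting.
let toEventPoints (rows : PredictionRow list) : (DateTime * float32) list =
    rows
    |> List.filter (fun r -> r.Prediction.[0] = 1.0)          // drop non-events
    |> List.map (fun r -> r.Date, float32 r.Prediction.[1])   // (date, value) pairs
```

Because the surviving points keep their original dates and values, they land on the price line when all three series share one chart.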
Here is the resulting Dow Jones price chart for the last year, using the defined models. Based on the sliding windows and required confidence levels, there are now potentially useful events.
Charts offer a convenient way to see how some of those earlier parameters can impact the result. I’ve reduced the sliding windows by half, to 15 (3 weeks) for anomalies and 5 (1 week) for change points. The chart below shows the results of the change. The anomalies haven’t changed too much, but the change points are much more reactive to direction changes. One key takeaway here is that there isn’t a single right configuration. It is imperative to understand what types of outliers and attributes are important.
I hope you have found this short look into time series processing using ML.NET useful.