My recent post about Dynamic Time Warping used an external library, and it inspired me to implement the algorithm in F#. This is mostly just to see it in F#; my last implementation was in Racket, and I'm interested in the different language implementations. I use a pretty basic algorithm, nothing fancy. As part of this process I'll be doing comparisons between NDtw and my code. To be upfront, it's not a perfect comparison: NDtw has additional options and tracking that will reduce its max performance capabilities. But for hacking around, the implementations will be close enough for a little fun. For anyone interested, unless otherwise specified, all of my results are from the REPL in VS Code + Ionide using Mono version 4.6.2.
Using Paket, here is a sample paket.dependencies file.
```
source https://nuget.org/api/v2
```
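Only the `source` line of the sample survives in this post's formatting. A fuller sketch, assuming the two packages used later in the post (NDtw for the baseline, FSharp.Data for the CSV type provider), might look like:

```
source https://nuget.org/api/v2

nuget NDtw
nuget FSharp.Data
```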
Here is the basic setup stuff.
```fsharp
#time
```
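The setup block is truncated to its first line. Here is a sketch of the likely contents; the dll paths are an assumption (Paket's default `packages` folder) and will vary by setup:

```fsharp
#time

// Paths assume Paket's default packages folder; adjust to your layout
#r "packages/FSharp.Data/lib/net40/FSharp.Data.dll"
#r "packages/NDtw/lib/net40/NDtw.dll"

open FSharp.Data
open NDtw
```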
First, I need to get some data. I’m in the mood for some stock data. That should give me simple signals with lots of data points. I also decide to pull it into local files.
```shell
curl http://ichart.finance.yahoo.com/table.csv?s=IBM -o stock_ibm.csv
# a second signal for the comparison; F is Ford's ticker
curl http://ichart.finance.yahoo.com/table.csv?s=F -o stock_ford.csv
```
Here is the code to load the csv data into arrays. I only pull 5,000 records, and the signal of interest will be the “Open” value. This will be enough data to run long enough, but not too long.
```fsharp
type Stock = CsvProvider<"../data/stocks/stock_ibm.csv">
```
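The loading code is truncated above. Here is a sketch of one way to pull the first 5,000 “Open” values into plain float arrays with the CSV type provider; the `loadOpens` helper and the Ford file path are my assumptions, not the post's original code:

```fsharp
open FSharp.Data

type Stock = CsvProvider<"../data/stocks/stock_ibm.csv">

// Pull the first 5,000 "Open" values into a plain float array
let loadOpens (path: string) =
    Stock.Load(path).Rows
    |> Seq.truncate 5000
    |> Seq.map (fun row -> float row.Open)
    |> Seq.toArray

let ibmData  = loadOpens "../data/stocks/stock_ibm.csv"
let fordData = loadOpens "../data/stocks/stock_ford.csv"
```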
Before I get started, I want to get a baseline for the NDtw implementation. I ran this a couple times, and the below results are representative.
```fsharp
let dtw = new Dtw(ibmData, fordData)
let cost = dtw.GetCost()
```
Now it is time to implement the algorithm. This is mostly a copy-paste directly off of the Wikipedia page; I just need to do some minor F# translation.
```fsharp
// Distance calculation between 2 points
```
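The listing is cut off after its first comment. As a sketch of what a direct Wikipedia-style translation looks like (the `dist` helper name is my own), the first pass fills a cost matrix and uses `List.min` to pick the cheapest predecessor cell:

```fsharp
// Distance calculation between 2 points
let dist (a: float) (b: float) = abs (a - b)

// First pass: straight translation of the Wikipedia pseudocode,
// building a list of the three neighbor costs and taking List.min
let myDtwArray2D_A (d1: float[]) (d2: float[]) =
    let n, m = d1.Length, d2.Length
    let path = Array2D.create (n + 1) (m + 1) infinity
    path.[0, 0] <- 0.0
    for i in 1 .. n do
        for j in 1 .. m do
            let cost = dist d1.[i - 1] d2.[j - 1]
            path.[i, j] <- cost + List.min [ path.[i - 1, j]
                                             path.[i, j - 1]
                                             path.[i - 1, j - 1] ]
    // total warping cost ends up in the far corner
    path.[n, m]
```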
Here is my first pass. Again, I ran this multiple times, and this is a representative result. And ouch. I know functional languages have a reputation for slower performance, but I need to be able to do better. Not only is it slower, but the GC #s are 10x worse than my baseline.
```fsharp
let cost = myDtwArray2D_A ibmData fordData
```
Take 2: When I initially wrote the function, I decided to leverage built-ins as much as possible. As a result I used `List.min` when determining which step in the path to take next. List construction seems like a potentially expensive process, so I'll take that out and see how that does. The code below is identical to the above except for the `path.[i,j] <- cost + (min3 path.[i-1,j] path.[i,j-1] path.[i-1,j-1])` call.
```fsharp
// Min of three values
```
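Again only the first comment survives. A sketch of the take-2 version, with a hand-rolled three-way min replacing the `List.min` call (the helper name `min3` is an assumption):

```fsharp
// Min of three values, without allocating a list
let min3 (a: float) (b: float) (c: float) = min a (min b c)

// Identical to the first pass except the List.min call is gone
let myDtwArray2D_B (d1: float[]) (d2: float[]) =
    let n, m = d1.Length, d2.Length
    let path = Array2D.create (n + 1) (m + 1) infinity
    path.[0, 0] <- 0.0
    for i in 1 .. n do
        for j in 1 .. m do
            let cost = abs (d1.[i - 1] - d2.[j - 1])
            path.[i, j] <- cost + (min3 path.[i-1, j] path.[i, j-1] path.[i-1, j-1])
    path.[n, m]
```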
Well, this is a lot better. The performance is even close to NDtw. This is decent for a small amount of effort. Taking a closer look, though, the GC numbers are troubling; my implementation's numbers are high in comparison. Maybe there is something I can do about that.
```fsharp
let cost = myDtwArray2D_B ibmData fordData
```
What can I do? Well, F# is a functional language with optimized tail calls. For those not familiar, the TL;DR version is to write a recursive call in such a way that the last operation in the function is the return value. When written that way, the compiler can effectively unwrap the recursion as a loop, with no additional stack frames being allocated. This is a brief, and imperfect, explanation, so it's worth investigating further; it is a really powerful construct. In this particular case my hope is that reduced stack allocation means less object creation, so less to garbage collect. That means it is time to rework the function to be recursive, and more to the point, to leverage tail call optimization.
```fsharp
let myDtwArray2DRecursive (d1:float[]) (d2:float[]) =
```
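Only the signature survives above. Here is a sketch of a tail-recursive fill matching the description: the recursive call is the last operation, so the compiler can turn it into a loop with no extra stack frames (the inner `fill` helper and local `min3` are my own naming):

```fsharp
let myDtwArray2DRecursive (d1: float[]) (d2: float[]) =
    let n, m = d1.Length, d2.Length
    let path = Array2D.create (n + 1) (m + 1) infinity
    path.[0, 0] <- 0.0
    let min3 (a: float) b c = min a (min b c)
    // Tail-recursive walk over the cost matrix: every recursive call
    // is in tail position, so no stack frames accumulate
    let rec fill i j =
        if i > n then path.[n, m]                 // matrix filled; return the corner
        elif j > m then fill (i + 1) 1            // end of row; move to the next
        else
            let cost = abs (d1.[i - 1] - d2.[j - 1])
            path.[i, j] <- cost + min3 path.[i-1, j] path.[i, j-1] path.[i-1, j-1]
            fill i (j + 1)
    fill 1 1
```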
This is exciting. The code is flying now (over 5x faster). Also note the GC numbers: it appears this last modification worked like a charm. The code is even running significantly faster than the NDtw code. To be fair, that isn't an apples-to-apples comparison, but it is a very encouraging result. The important takeaway is that a minor replacement of loops with optimized tail calls can give a pretty satisfying performance boost.
```fsharp
let cost = myDtwArray2DRecursive ibmData fordData
```
I like to cut numbers a couple different ways, so now I build a mini framework to run each algorithm and report its Stopwatch time. I store these in a list and then report on average performance. Just to make sure my datasets and calls don't accidentally get memoized or cached, I modify the datasets on each iteration. It probably isn't necessary, but it's an easy way to protect against black magic trying to help me when I don't want the performance help.
```fsharp
// Test algorithm, time with stopwatch, add to performance results
```
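The harness listing is truncated. A minimal sketch of the described approach, timing each run with a `Stopwatch` and perturbing the inputs every iteration so nothing gets cached (the `timeRun` and `benchmark` names are assumptions):

```fsharp
open System.Diagnostics

// Time a single run of a DTW implementation with a Stopwatch
let timeRun (f: float[] -> float[] -> float) d1 d2 =
    let sw = Stopwatch.StartNew()
    f d1 d2 |> ignore
    sw.Stop()
    sw.Elapsed.TotalMilliseconds

// Run several trials, nudging the data each time so no memoization
// or caching can kick in, and report average/min/max runtimes
let benchmark iterations f (d1: float[]) (d2: float[]) =
    let times =
        [ for i in 1 .. iterations ->
            let perturbed = d1 |> Array.map (fun x -> x + float i * 1e-9)
            timeRun f perturbed d2 ]
    List.average times, List.min times, List.max times
```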
Here are the average/min/max runtimes for each function. I like to keep an eye on differences in run environments, so I run a couple different iterations using the REPL, fsharpi (Mono), and fsi (.NET CLR). The results are below.
```fsharp
// REPL
```

```fsharp
// fsharpi dtwcompare.fsx
```

```fsharp
// fsi dtwcompare.fsx (fsi version 4.40.23020.0)
```
The comparative performance is similar, although NDtw and version B flip-flop positions. Those two seem to run about the same, so that probably has more to do with what's going on on my machine at the time. I did expect faster performance with fsi than fsharpi, so that is a bit of a surprise. It is just a reminder that assumptions should always be tested; investigating that further may be worthy of a blog post itself. This has been an interesting examination of implementing a DTW algorithm. It turned into more of an optimization exercise than I expected, which was a pleasant turn of events. I hope this has been useful, and inspires more F# algorithm implementations, more dynamic time warping, and more tail calls!