Recently Ted Unangst wrote about his tool, watc, which extracts line count and file size statistics to support some of his work. Chris Wellons followed up with his own take on watc. Inspired by both posts, I thought it would be an interesting tool to add to my own toolbox. It pairs nicely with some of my current work on extracting useful information from code repositories. It also seemed like a good excuse to put together a quick tool in F# and to show some F# along the way.
Like Chris, I tend to favor non-interactive apps for this type of tooling. I have added a few touches of my own, but I follow his design fairly closely. At a high level, the app is a relatively simple matter of walking a directory structure and aggregating line counts and file sizes. Since the goal is analyzing source code, it filters out binaries, .git, build artifacts, and so on, which keeps me focused on what I immediately care about. Command-line parameters let me control the summary level, sorting, and report format. You can find the full code here, but I'm just going to focus on a couple of small aspects. Before I get to the point, below is a small example of what the results look like.
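To make the walk-filter-aggregate shape concrete, here is a minimal sketch. It is not the watc implementation: the skip list in `skippedDirs`, the NUL-byte heuristic in `isBinary`, and the names `sourceFiles` and `summarize` are all hypothetical choices for illustration.

```fsharp
open System.IO

// Hypothetical skip list; watc's real filter rules may differ.
let skippedDirs = set [ ".git"; "bin"; "obj"; "target"; "node_modules" ]

// Assumed heuristic: treat a file as binary if its first 8 KB contains a NUL byte.
let isBinary (path: string) =
    use stream = File.OpenRead path
    let buffer = Array.zeroCreate<byte> 8192
    let read = stream.Read(buffer, 0, buffer.Length)
    Array.exists ((=) 0uy) buffer.[0 .. read - 1]

// Recursively yield non-binary files, skipping directories named in skippedDirs.
let rec sourceFiles (dir: string) : seq<string> =
    seq {
        for file in Directory.EnumerateFiles dir do
            if not (isBinary file) then yield file
        for sub in Directory.EnumerateDirectories dir do
            if not (skippedDirs.Contains(Path.GetFileName sub)) then
                yield! sourceFiles sub
    }

// Aggregate (file count, line count, total bytes) for a directory tree.
let summarize (dir: string) =
    sourceFiles dir
    |> Seq.map (fun f -> (File.ReadLines f |> Seq.length), FileInfo(f).Length)
    |> Seq.fold (fun (files, lines, bytes) (l, b) -> (files + 1, lines + l, bytes + b)) (0, 0, 0L)
```

Depth limits, per-extension grouping, and sorting would layer on top of this; they are left out to keep the sketch short.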
```
$ ./watc --depth=2 --sort=lines ~/projects/fsharp/src
```
With the demonstration out of the way, it's time to get to the point. Improving application performance is a complicated and nuanced topic; an obvious statement, I know. Seeing the hoops some languages have to jump through to support parallelism is a good reminder that it doesn't always have to be difficult. This leads me to F#. Today's post is a pretty shallow view, looking for a quick win, but sometimes that's all you need. For relatively simple tasks, parallelism is easy to achieve in F#: converting `Array.map` to `Array.Parallel.map` gives you parallelism out of the box. To illustrate this, I'll pull the related section out of the code.
Before, single-threaded:
```fsharp
let processDir maxDepth showFiles dir =
```
After, multi-threaded:
```fsharp
let processDir maxDepth showFiles dir =
```
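The following hypothetical sketch shows the same kind of swap in isolation, outside of watc's `processDir`. The names `countLines`, `countAllSerial`, and `countAllParallel` are illustrative, not functions from watc; the only real difference between the two versions is `Array.map` versus `Array.Parallel.map`.

```fsharp
open System.IO

// Hypothetical helper: count the lines in one file.
let countLines (path: string) =
    File.ReadLines path |> Seq.length

// Single-threaded: files are processed one at a time on the calling thread.
let countAllSerial (files: string[]) =
    files |> Array.map (fun f -> f, countLines f)

// Multi-threaded: Array.Parallel.map spreads the work across thread-pool
// threads; each result is stored at its input's index, so order is preserved.
let countAllParallel (files: string[]) =
    files |> Array.Parallel.map (fun f -> f, countLines f)
```

Because `Array.Parallel.map` keeps results in input order, code downstream of the map does not need to change. It does operate on arrays, so a lazy `seq` would need to be materialized first, for example with `Array.ofSeq`.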
That is a four-line change, and the result is a faster application. It is worth noting that this is a cool trick with caveats. When it fits, it is a simple way to get a performance improvement, but not every situation is the same. Sometimes the design calls for more control over the implementation. It is also something you need to measure, to make sure you are actually getting a benefit and making the right tradeoffs. There are many cases, particularly in large-scale apps, where this won't be enough and you'll need other techniques. Still, I enjoy how often it turns out to be a quick win.
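When more control is needed, one option is `System.Threading.Tasks.Parallel.For` with a `ParallelOptions` that caps `MaxDegreeOfParallelism`. This is a swapped-in technique, not something this post shows watc doing; the function name `countLinesBounded` and the motivation of not saturating a slow disk are assumptions for illustration.

```fsharp
open System
open System.IO
open System.Threading.Tasks

// Hypothetical: count lines with at most maxWorkers files in flight at once,
// e.g. to avoid saturating a slow disk. Results keep the input order.
let countLinesBounded (maxWorkers: int) (files: string[]) : int[] =
    let results = Array.zeroCreate<int> files.Length
    let options = ParallelOptions(MaxDegreeOfParallelism = maxWorkers)
    // Each iteration writes only its own slot, so no locking is needed.
    let body (i: int) =
        results.[i] <- (File.ReadLines(files.[i]) |> Seq.length)
    Parallel.For(0, files.Length, options, Action<int>(body)) |> ignore
    results
```

The tradeoff is more code for a knob you can tune; whether a cap helps at all depends on the storage and workload, so it is worth measuring either way.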
I mentioned testing earlier. Because this is such a small project, I didn't set up more advanced benchmarks; I just ran some quick sanity checks to see how the changes affected runtime. I tested against two directories: the F# and Rust language GitHub repos. I ran each test multiple times, clearing system caches between runs. In a very unscientific fashion, below are representative running-time results for the serial versus parallel versions of watc. They show the parallel version finishing faster in elapsed (real) time, which is what I'm aiming for.
```
# time ./watc ~/projects/fsharp
```
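For a quick in-process comparison, a `Stopwatch` can time both versions side by side. This is a rough sketch, not a benchmark: `timeIt`, `countLines`, and `compareRuns` are hypothetical helpers. Whichever version runs first also warms the OS file cache, which favors the second run unless caches are cleared between runs, as they were for the `time` results above.

```fsharp
open System
open System.Diagnostics
open System.IO

// Hypothetical helper: run f once and return its result with the elapsed time.
let timeIt (f: unit -> 'T) : 'T * TimeSpan =
    let sw = Stopwatch.StartNew()
    let result = f ()
    sw.Stop()
    result, sw.Elapsed

let countLines (path: string) = File.ReadLines path |> Seq.length

// Time serial vs parallel counting over the same files. Note: whichever runs
// first warms the OS file cache, which favors the second run.
let compareRuns (files: string[]) =
    let serialResult, serialTime = timeIt (fun () -> files |> Array.map countLines)
    let parallelResult, parallelTime = timeIt (fun () -> files |> Array.Parallel.map countLines)
    // Timings only mean something if both versions agree on the answer.
    if serialResult <> parallelResult then failwith "serial and parallel results differ"
    printfn "serial: %O  parallel: %O" serialTime parallelTime
    serialTime, parallelTime
```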
That's all I have for today. `Array.Parallel` has given me a nice performance boost when I'm doing repo recon, and I'll take it. Beyond that, I just wanted to give a quick view into watc, F#-style. Until next time.