Today I look at leveraging F# to perform text analysis using Microsoft’s Cognitive Services.
If you want to follow along, you’ll first need a free Cognitive Services account. Then request the API access you want; for this post it is “Text Analytics”. The API key they provide can be used in the upcoming code if you want to make your own calls. Microsoft offers a generous number of free calls, leaving plenty of room to play in the environment. Time to get started.
Using Paket, here is a sample paket.dependencies file.
source https://nuget.org/api/v2
Below is the library loading code. This is also where I put my Cognitive Services API key.
#r "../packages/FSharp.Data/lib/net40/FSharp.Data.dll"
For this experiment I leverage just two of the available Text Analytics capabilities: sentiment analysis and key phrase extraction. In the spirit of organizing my code, I create modules to compartmentalize structure. In a normal project I would structure the code slightly differently and put these modules into their own files, but for a quick demo I can get away with throwing them in the same file. I’ve also found that making some of these concessions can make the setup process a little easier to follow.
I leverage F#’s types to specify the web API interface. The sentiment API expects a request object that contains a list of documents. It returns a response object that has a list of scores (one per document) and a list of errors (if any). The KeyPhrases API also expects a request with a list of documents. It returns a response object with a list of phrases per document and a list of errors (if any). The Document, Request, and ResponseError types are identical between sentiment and keyphrases. The Response object is nearly identical; the only difference is the list of results, Score versus KeyPhrases.
Note: For those not familiar with F#, modules are a common way to organize code into logical components. They are often split one module per file but, as this code shows, they don’t need to be.
module Sentiment =
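As a rough sketch of the shapes described above, the two modules could be modeled with plain F# records. The field names (language, id, text, score, keyPhrases, message) are my assumption about the JSON layout of the Text Analytics service, and DocumentScore and DocumentPhrases are hypothetical names for the per-document results; none of these come from the listing in this post.

```fsharp
// Hypothetical record shapes. Lowercase field names mirror the JSON keys
// I assume the service uses; adjust them to match the real payloads.
module Sentiment =
    type Document = { language: string; id: string; text: string }
    type Request = { documents: Document list }
    type ResponseError = { id: string; message: string }
    // One score per submitted document (0 = very negative, 1 = very positive).
    type DocumentScore = { id: string; score: float }
    type Response = { documents: DocumentScore list; errors: ResponseError list }

module KeyPhrases =
    type Document = { language: string; id: string; text: string }
    type Request = { documents: Document list }
    type ResponseError = { id: string; message: string }
    // One list of phrases per submitted document.
    type DocumentPhrases = { id: string; keyPhrases: string list }
    type Response = { documents: DocumentPhrases list; errors: ResponseError list }

// Several records share field names, so type annotations tell the
// compiler which record is being built.
let doc : Sentiment.Document = { language = "en"; id = "1"; text = "Hello world" }
let request : Sentiment.Request = { documents = [ doc ] }
```

Duplicating Document, Request, and ResponseError in both modules keeps each module self-describing, at the cost of some repetition; sharing them from a common module would work just as well.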
I put together a few helper functions. First is TextToRequestJson, which converts a string into the complex JSON that the API expects. The API accepts a list of documents, but for this little test I just send one text block. This code is easy enough to modify if I want to process multiple documents at once. Second, I make a web API call wrapper, AnalyticsHttp, to handle standard headers, etc. The .NET libraries have several ways to make HTTP calls; I’m using FSharp.Data’s Http to make the API call. Its interface is really simple to use, which is a good argument when iterating quickly. Third is a wrapper around Newtonsoft’s JsonConvert, just so I can call it in F# style later. Fourth is KeyPhraseReplace, a recursive function that marks keyphrases in a string.
Note: In F#’s match expression, [] matches an empty list (words in this case) and returns the given string. x::xs matches a non-empty list, where x is the head of the list and xs is the tail. The marking is done by wrapping each keyphrase.
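To make the recursion concrete, here is a minimal sketch of a function in the spirit of KeyPhraseReplace. The lowercase name keyPhraseReplace and the square-bracket markers are my own assumptions for illustration; the actual wrapping characters may differ.

```fsharp
// Recursively wrap each key phrase found in the text in square brackets.
// The bracket style is an assumption made for this sketch.
let rec keyPhraseReplace (words: string list) (text: string) : string =
    match words with
    | [] -> text // no phrases left: return the text unchanged
    | x :: xs ->
        // mark every occurrence of the head phrase, then recurse on the tail
        keyPhraseReplace xs (text.Replace(x, "[" + x + "]"))

let marked = keyPhraseReplace [ "fox"; "lazy dog" ] "The fox jumps over the lazy dog."
// marked = "The [fox] jumps over the [lazy dog]."
```

String.Replace is case-sensitive, and a phrase that is contained in another phrase would be wrapped twice, so a more careful version might need to handle overlaps.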
open Sentiment
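For readers who want to see what "standard headers" might involve, here is a hedged sketch that only builds a request without sending it. It swaps in the .NET HttpRequestMessage type instead of FSharp.Data's Http so that it runs without extra packages. The function name buildAnalyticsRequest and the example URL are hypothetical, and I am assuming the service expects the key in an Ocp-Apim-Subscription-Key header.

```fsharp
open System.Net.Http
open System.Text

// Build (but do not send) a POST request that carries the subscription key
// header and a JSON body. This stands in for the FSharp.Data-based wrapper.
let buildAnalyticsRequest (url: string) (apiKey: string) (json: string) =
    let request = new HttpRequestMessage(HttpMethod.Post, url)
    // Assumed header name for the Cognitive Services subscription key.
    request.Headers.Add("Ocp-Apim-Subscription-Key", apiKey)
    request.Content <- new StringContent(json, Encoding.UTF8, "application/json")
    request

// Placeholder URL and key; nothing is sent over the network here.
let req = buildAnalyticsRequest "https://example.invalid/sentiment" "my-api-key" "{}"
```

Keeping the URL and key as the first two parameters is what makes a wrapper like this convenient to partially apply later.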
Once the helper functions are set up, the functions that get sentiment and keyphrases are pretty simple pipelines. Take a string, convert it to a JSON string, send it to the analytics API call, then take the response string and convert it into its appropriate type.
Note: One of the joys of F#, and other functional languages, is partial function application. This allows me to apply some of the parameters to a function, making another function. In the case below, it allows me to apply the first two parameters to the function AnalyticsHttp, resulting in a new anonymous function (AnalyticsHttp Sentiment.url apiKey). I can then pipe directly into this sentiment-specific function, like any other function that takes one parameter.
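Partial application is easier to see with a function that has no side effects. The sketch below uses a hypothetical stand-in, fakeAnalyticsHttp, that makes no network call and only echoes its arguments; the URL and key are placeholders.

```fsharp
// Hypothetical stand-in for the HTTP wrapper: it makes no network call and
// simply echoes its three arguments so the data flow is visible.
let fakeAnalyticsHttp (url: string) (apiKey: string) (json: string) : string =
    sprintf "POST %s [key=%s] %s" url apiKey json

// Supplying the first two arguments yields a new function: string -> string.
let sendToSentiment = fakeAnalyticsHttp "https://example.invalid/sentiment" "my-api-key"

// The partially applied function can be named, or built inline in a pipeline.
let viaNamed = "{}" |> sendToSentiment
let viaInline = "{}" |> (fakeAnalyticsHttp "https://example.invalid/sentiment" "my-api-key")
```

Both pipelines produce the same string, which is the point: the parenthesized expression is just a one-parameter function like any other.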
// Get sentiment score for text
Once all of the components are in place, it’s time to do some analytics. I capture the results of sentiment in s and the key phrases in kp. All that’s left to do is print out the basic stats and mark the keyphrases. The target of interest is the beginning of Mary Shelley’s “Frankenstein”.
// Source: "Frankenstein, or the Modern Prometheus", Mary Wollstonecraft (Godwin) Shelley
Below are the results. If the first paragraph is any indication, this should be a delightfully cheery book. Perhaps it’s not fair to judge a book by its first paragraph, but I think the analysis of the text provided is correct. Not only does the positive score fit the input text, we can also see the key phrases and where they sit within the text itself.
Score (0 = very negative, 1 = very positive):
Spoiler alert: the sentiment score of the last paragraph is 0.398727.
Hopefully you have found this useful. With very little code it’s easy to get started with text analytics. There is much more to Microsoft Cognitive Services, but this is a good template to use as a jumping-off point. Since it’s accessible through a web API you can reach it from just about anything, but there is something about F# that just feels right.