2017-01-20

F# and Text Analytics


Today I look at leveraging F# to perform text analysis using Microsoft’s Cognitive Services.

If you want to follow along, you’ll first need a free Cognitive Services account. Then request access to the API you want; for this post it is “Text Analytics”. You can use the API key they provide in the upcoming code to make your own calls. Microsoft offers a generous number of free calls, which leaves plenty of room to experiment. Time to get started.

I use Paket for package management; here is a sample paket.dependencies file.


source https://nuget.org/api/v2
nuget FSharp.Data
nuget Newtonsoft.Json

Below is the library loading code. This is also where I put my Cognitive Services API key.

#r "../packages/FSharp.Data/lib/net40/FSharp.Data.dll"
#r "../packages/Newtonsoft.Json/lib/net40/Newtonsoft.Json.dll"

open System
open FSharp.Data
open Newtonsoft.Json

let apiKey = "your apikey here"
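Hard-coding the key is fine for a quick script, but it is easy to commit it by accident. A minimal alternative, assuming you are willing to set an environment variable before starting FSI, is to read the key at startup and fall back to the placeholder. The variable name TEXT_ANALYTICS_KEY is hypothetical; Cognitive Services does not define it, so pick any name you like.

```fsharp
open System

// TEXT_ANALYTICS_KEY is a hypothetical variable name chosen for this sketch.
// Fall back to the placeholder string when the variable is missing or empty.
let apiKey =
    match Environment.GetEnvironmentVariable "TEXT_ANALYTICS_KEY" with
    | null | "" -> "your apikey here"
    | key -> key
```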

For this experiment I use just two of the available Text Analytics capabilities: sentiment analysis and key phrase extraction. To keep the code organized, I give each API its own module. In a normal project I would structure the code slightly differently and put these modules in their own files, but for a quick demo I can get away with keeping them in one file. I’ve also found that concessions like this make the setup a little easier to follow.

I use F#’s types to describe the web API interface. The sentiment API expects a request object containing a list of documents. It returns a response object with a list of scores (one per document) and a list of errors (if any). The key phrases API also expects a request with a list of documents, and returns a response object with a list of key phrases per document and a list of errors (if any). The Document, Request, and ResponseError types are identical in both modules. The Response types are nearly identical; the only difference is the type of the per-document results, Score versus KeyPhrases.

Note: For those not familiar with F#, modules are a common way to organize code into logical components. They are often split one module per file, but as this code shows, they don’t need to be.

module Sentiment =
    let url = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment"
    type Document = { language: string; id: string; text: string }
    type Request = { documents: Document list }
    type Score = { score: float; id: string }
    type ResponseError = { id: string; message: string }
    type Response = { documents: Score list; errors: ResponseError list }

module KeyPhrases =
    let url = "https://westus.api.cognitive.microsoft.com/text/analytics/v2.0/keyPhrases"
    type Document = { language: string; id: string; text: string }
    type Request = { documents: Document list }
    type KeyPhrases = { keyPhrases: string list; id: string }
    type ResponseError = { id: string; message: string }
    type Response = { documents: KeyPhrases list; errors: ResponseError list }
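Because Document, Request, and ResponseError are identical in both modules and the Response types differ only in the per-document result, one possible refactoring is a single generic response record. This is a sketch of an alternative, not the structure used in the rest of the post: the names Score, KeyPhraseResult, SentimentResponse, and KeyPhrasesResponse are my own, and it is meant to be run on its own (for example in a fresh FSI session) so its types don’t collide with the modules above.

```fsharp
// Shapes shared by the sentiment and key phrase endpoints
type Document = { language: string; id: string; text: string }
type Request = { documents: Document list }
type ResponseError = { id: string; message: string }

// Per-document results: the only part that differs between the endpoints
type Score = { score: float; id: string }
type KeyPhraseResult = { keyPhrases: string list; id: string }

// One response shape, generic over the per-document result type
type Response<'T> = { documents: 'T list; errors: ResponseError list }

type SentimentResponse = Response<Score>
type KeyPhrasesResponse = Response<KeyPhraseResult>

// Hand-built values showing that both aliases share the same outer fields
let sentiment : SentimentResponse =
    { documents = [ { score = 0.95; id = "1" } ]; errors = [] }

let phrases : KeyPhrasesResponse =
    { documents = [ { keyPhrases = [ "task"; "sister" ]; id = "1" } ]; errors = [] }
```

With this shape the deserialization targets would become Response<Score> and Response<KeyPhraseResult>; I have not tested Newtonsoft against the generic record, so treat that part as an assumption.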

I put together a few helper functions. First is TextToRequestJson, which converts a string into the more complex JSON that the API expects. The API accepts a list of documents, but for this little test I send just one block of text; the code is easy enough to modify if I wanted to process multiple documents at once. Second, AnalyticsHttp wraps the web API call to handle the standard headers and so on. .NET has several ways to make HTTP calls; I’m using FSharp.Data’s Http module because its interface is very simple to use, which suits quick iteration. Third is JsonDeserialize, a wrapper around Newtonsoft’s JsonConvert so I can call it F#-style later. Fourth is KeyPhraseReplace, a recursive function that marks key phrases in a string.
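Since the API accepts a list of documents, sending several texts in one call mostly means building a longer list. Here is a minimal sketch under a few assumptions: TextsToRequest is a hypothetical name, ids are sequential strings rather than GUIDs so results can be matched back to their inputs (assuming the API only needs ids that are unique within one request), and serialization is left out so the snippet runs without Newtonsoft. The resulting record would be passed to JsonConvert.SerializeObject, just like the single-document version.

```fsharp
type Document = { language: string; id: string; text: string }
type Request = { documents: Document list }

// Hypothetical multi-document variant of TextToRequestJson.
// Each text gets a sequential id ("1", "2", ...) so responses can be matched to inputs.
let TextsToRequest (texts: string list) =
    let documents =
        texts
        |> List.mapi (fun i t -> { language = "en"; id = string (i + 1); text = t })
    { documents = documents }

let texts =
    [ "You will rejoice to hear that no disaster has accompanied the commencement."
      "I arrived here yesterday." ]

let request = TextsToRequest texts
```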

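Note: In F#’s match, [] matches an empty list (words, in this case) and returns the text unchanged. x::xs matches a non-empty list, binding x to the head of the list and xs to the tail. Each key phrase is marked by wrapping it in asterisks; only occurrences preceded by a space are replaced, because the search string is " " + x.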

open Sentiment
open KeyPhrases

// Convert text document into request body in json format
let TextToRequestJson text =
    let document = {
        language = "en";
        id = Guid.NewGuid().ToString();
        text = text}
    let request = { Request.documents = [ document ] }
    JsonConvert.SerializeObject(request)

// Perform http call
let AnalyticsHttp url apiKey body =
    Http.RequestString(
        url,
        httpMethod = "POST",
        headers = [
            "Content-Type", "application/json";
            "Ocp-Apim-Subscription-Key", apiKey],
        body = TextRequest body)

// Wrapper so I can call it F#-style; the annotation pins the string overload
let JsonDeserialize<'T> (s: string) = JsonConvert.DeserializeObject<'T>(s)

// Mark key phrases in the originally provided text
let rec KeyPhraseReplace (text:string) (words:list<string>) =
    match words with
    | [] -> text
    | x::xs -> (KeyPhraseReplace text xs).Replace(" " + x, " *" + x + "*")
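To see the recursion at work, and one consequence of searching for a leading space, here is a small self-contained check. It repeats KeyPhraseReplace and adds KeyPhraseReplaceFold, a hypothetical equivalent built on List.foldBack, which applies the replacements in the same order as the recursive version (last phrase first). The sample strings are fragments of the Frankenstein passage.

```fsharp
// Same recursive definition as in the helpers
let rec KeyPhraseReplace (text: string) (words: string list) =
    match words with
    | [] -> text
    | x :: xs -> (KeyPhraseReplace text xs).Replace(" " + x, " *" + x + "*")

// Equivalent fold: List.foldBack walks the list from the end, as the recursion does
let KeyPhraseReplaceFold (text: string) (words: string list) =
    List.foldBack (fun (x: string) (acc: string) -> acc.Replace(" " + x, " *" + x + "*")) words text

let sample = "my first task is to assure my dear sister"
let marked = KeyPhraseReplace sample [ "task"; "sister" ]
// marked = "my first *task* is to assure my dear *sister*"

// A phrase at the very start of the text has no leading space, so it is not marked
let unmarked = KeyPhraseReplace "disaster struck" [ "disaster" ]
// unmarked = "disaster struck"
```

Because String.Replace matches substrings, a phrase such as "task" would also mark the beginning of "tasks", producing "*task*s".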

Once the helper functions are set up, the functions that get sentiment and key phrases are simple pipelines: take a string, convert it to a JSON string, send it to the analytics API, and deserialize the response string into the appropriate type.

Note: One of the joys of F#, and of other functional languages, is partial function application. It lets me supply some of a function’s parameters and get back a new function that takes the rest. In the code below, I apply the first two parameters of AnalyticsHttp, producing a new function, (AnalyticsHttp Sentiment.url apiKey), that takes only the request body. I can then pipe directly into this sentiment-specific function, like any other function that takes one parameter.

// Get sentiment score for text
let GetSentiment text =
    text
    |> TextToRequestJson
    |> (AnalyticsHttp Sentiment.url apiKey)
    |> JsonDeserialize<Sentiment.Response>

// Get keyphrases from text
let GetKeyPhrases text =
    text
    |> TextToRequestJson
    |> (AnalyticsHttp KeyPhrases.url apiKey)
    |> JsonDeserialize<KeyPhrases.Response>
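Partial application does not depend on the HTTP call, so it can be tried in isolation. In this sketch fakeHttp is a hypothetical stand-in with the same parameter order as AnalyticsHttp (url, apiKey, body) that formats a string instead of calling the API; the url and key are placeholders.

```fsharp
// Hypothetical stand-in for AnalyticsHttp: same parameter order, no network call
let fakeHttp (url: string) (apiKey: string) (body: string) =
    sprintf "POST %s [key=%s] %s" url apiKey body

// Supplying the first two arguments produces a new function of one argument
let sentimentCall : string -> string =
    fakeHttp "https://example.invalid/sentiment" "demo-key"

// The partially applied function drops straight into a pipeline
let result =
    "{\"documents\":[]}"
    |> sentimentCall
```

The same idea is why GetSentiment and GetKeyPhrases differ only in the url they bake in and the type they deserialize to.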

Once all of the components are in place, it’s time to do some analytics. I capture the results of sentiment in s and the key phrases in kp. All that’s left to do is print out the basic stats and mark the keyphrases. The target of interest is the beginning of Mary Shelley’s “Frankenstein”.

// Source: "Frankenstein, or the Modern Prometheus", Mary Wollstonecraft (Godwin) Shelley
// Url   : http://www.gutenberg.org/cache/epub/84/pg84.txt
let text = "You will rejoice to hear that no disaster has accompanied the commencement of an enterprise which you have regarded with such evil forebodings.  I arrived here yesterday, and my first task is to assure my dear sister of my welfare and increasing confidence in the success of my undertaking."

let s = 
    text
    |> GetSentiment

let kp = 
    text
    |> GetKeyPhrases

printfn "\nScore (0 = very negative, 1 = very positive):"
s.documents
|> List.iter (fun d -> printfn "Score: %f" d.score)

printfn "\nKeyphrases:"
kp.documents
|> List.iter (fun d -> d.keyPhrases |> List.iter (printfn "%s"))

KeyPhraseReplace text kp.documents.[0].keyPhrases
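The printing code only looks at documents; the errors list that comes back alongside it is ignored. A minimal sketch of surfacing it, using types shaped like Sentiment.Response and a hypothetical helper, tryFirstScore, that returns F#’s Result type (available from FSharp.Core 4.1 on). The error message in the sample data is made up for illustration.

```fsharp
type Score = { score: float; id: string }
type ResponseError = { id: string; message: string }
type Response = { documents: Score list; errors: ResponseError list }

// Hypothetical helper: the first score if present, otherwise the first error message
let tryFirstScore (response: Response) : Result<float, string> =
    match response.documents, response.errors with
    | d :: _, _ -> Ok d.score
    | [], e :: _ -> Error e.message
    | [], [] -> Error "no documents or errors returned"

let ok = tryFirstScore { documents = [ { score = 0.947906; id = "1" } ]; errors = [] }
let failed = tryFirstScore { documents = []; errors = [ { id = "1"; message = "example error" } ] }

match ok with
| Ok score -> printfn "Score: %f" score
| Error message -> printfn "Error: %s" message
```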

Below are the results. If the first paragraph is any indication, this should be a delightfully cheery book. Perhaps it’s not fair to judge a book by its first paragraph, but I think the analysis of this text is correct. Not only does the positive score fit the input text, we can also see the key phrases and where each one appears within the text itself.

Score (0 = very negative, 1 = very positive):
Score: 0.947906

Keyphrases:
commencement
increasing confidence
sister
success
welfare
evil forebodings
disaster
enterprise
undertaking
task

You will rejoice to hear that no *disaster* has accompanied the *commencement* 
of an *enterprise* which you have regarded with such *evil forebodings*.  I 
arrived here yesterday, and my first *task* is to assure my dear *sister* of 
my *welfare* and *increasing confidence* in the *success* of my *undertaking*.

Spoiler alert: the last paragraph of the book has a sentiment score of 0.398727.

Hopefully you have found this useful. With very little code, it’s easy to get started with text analytics. There is much more to Microsoft Cognitive Services, but this is a good template to use as a jumping-off point. Since the service is exposed as a web API, you can call it from just about any language, but there is something about F# that just feels right.