As the future rushes upon us, there is a growing desire for more integrated interaction with our computers. One way is to have our computers recognize us; enter facial recognition. Often this is done with complex tools, so it is encouraging to be able to do it with something as simple as F# and EmguCV. With these tools in hand, facial recognition can be built into personal projects with ease.
For those not familiar, EmguCV is one of the available OpenCV .NET wrapper packages. OpenCV has facial recognition built in, so this post will mostly delve into the details of wiring it up using F#. Once complete, though, this makes a good integration point for additional functionality. Time to get started.
Using Paket, here is a sample paket.dependencies file.
```
source https://nuget.org/api/v2
nuget EmguCV
```
Note: This project requires an additional step. I prefer Paket for package management, but that has its own set of implications. For this project there is a manual step after Paket has downloaded the packages: to get everything to work, the native dlls must be copied into the same directory as the EmguCV dlls. For me, this amounted to a single copy command.
```fsharp
open System
open System.Drawing
open System.Drawing.Imaging
open System.IO
open Emgu
open Emgu.CV
open Emgu.CV.CvEnum
open Emgu.CV.Structure
open Emgu.CV.UI
open Emgu.Util
```
Aiming toward my goal of higher interactivity, I put together a little starter app that uses the webcam to see who is sitting at the computer. If it doesn't recognize the person, it prompts to add them to its database so they can be recognized in the future. Once it knows who they are, it just says hi. There are also some exploratory commands. I know, it isn't much, but it's a nice start to a larger, long-term project.
Building from the ground up, the first part is interacting with the camera. A simple capture.QueryFrame().Bitmap pulls a bitmap. I use a couple of wrapper functions to support returning an image (Image<Gray,Byte>) appropriate for the facial recognition calls, as well as the ability to save a single image, or a series of images, from the camera.
```fsharp
/// Camera/photo code
module Camera =
    let rand = new Random()

    /// Capture image from webcam as a bitmap
    let captureImageBitmap () =
        let (capture:Capture) = new Capture()
        let (imageBitmap:Bitmap) = capture.QueryFrame().Bitmap
        imageBitmap

    /// Capture image from webcam and return a FacialRecognition image
    let captureImage () =
        let (image:Image<Gray,Byte>) = new Emgu.CV.Image<Gray,Byte>(captureImageBitmap())
        image

    /// Capture image from webcam and save as jpg
    let captureAndSaveImage (filename:string) =
        let imageBitmap = captureImageBitmap()
        imageBitmap.Save(filename, ImageFormat.Jpeg)

    /// Take photos of person and return list of result photo files
    let takePhotos count (delayMs:int) dir person =
        [1..count]
        |> List.map (fun i ->
            let filename = Path.Combine(dir, (sprintf "%s_%d.jpg" person (rand.Next(1000000000))))
            captureAndSaveImage filename
            System.Threading.Thread.Sleep(delayMs)
            filename)
```
Note: One extra take-away from the above code is that F# supports triple-slash comments, which show up in editor tooltips.
Next, I set up a simple database of photos and people. There are two components to the database module. The first is general db-ish type things. This includes person id/name lookup (person.txt), as well as a list of all photos taken (photos.txt). I wrap a couple of lookup functions using Map, as well as the ability to append to the files. In the future, these will be refactored into a real database, but simple text files work well for a demo. The second component is the OpenCV trained facial recognizer. It consists of the ability to train, as well as save the training results (trained.txt).
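To make the file formats concrete: both lookup files are plain pipe-delimited text, matching the sprintf "%d|%s" calls in the code below. The names and filenames here are made up for illustration. person.txt maps id to name:

```
0|blank
1|alice
```

And photos.txt maps a person's id to each of their photo files:

```
0|data/blank.jpg
1|data/alice_422910113.jpg
```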
```fsharp
/// Trained data filename
let trainedFilename = Path.Combine(dataDir, "trained.txt")

/// Person-level summary of prediction match v total result
type ValidationResult = { name:string; matchCount:int; totalCount:int }

/// Create a map from a lookup file
let private makeMap filename fn =
    if File.Exists(filename) then
        File.ReadAllLines(filename)
        |> Array.map (fun x -> x.Split [| '|' |])
        |> Array.map fn
        |> Map.ofArray
    else Map.ofList []

/// Lookup person's id (using name)
let private lookupId () = makeMap personFilename (fun x -> (x.[1], int x.[0]))

/// Lookup person's name (using id)
let private lookupName () = makeMap personFilename (fun x -> (int x.[0], x.[1]))

/// Get a new id for a person
let private getNewId () =
    let ids = Map.toArray (lookupName()) |> Array.map fst
    if Array.length ids = 0 then 1 else (Array.max ids) + 1

/// Get person's id (by name)
let private getPersonId (map:Collections.Map<string,int>) (name:string) =
    if map.ContainsKey name then Some (map.Item name) else None

/// Get person's name (by id)
let private getPersonName (map:Collections.Map<int,string>) (id:int) =
    if map.ContainsKey id then Some (map.Item id) else None

/// Append new entries to photo db
let private appendPhotosLines photosFilename lines =
    IO.File.AppendAllLines(photosFilename, lines)

/// Add a person to the person lookup file
let private addPersonId filename person id =
    File.AppendAllLines(filename, [| sprintf "%d|%s" id person |])

/// Add new photos just taken to photos file
let private appendToPhotosFile dbFilename imageNames person =
    let id =
        match getPersonId (lookupId()) person with
        | Some(x) -> x
        | None ->
            let id' = getNewId()
            addPersonId personFilename person id' |> ignore
            id'

    imageNames
    |> List.map (fun x -> sprintf "%d|%s" id x)
    |> appendPhotosLines dbFilename
```
```fsharp
/// ... Face Detection Functions (see below) ...

/// List of people in the db
let PersonList () = lookupId() |> Map.toSeq |> Seq.map fst

/// Take a photo and lookup person's name
/// Return their name
let lookupPerson () =
    // take photo
    let image = Camera.captureImage()

    // Lookup in db
    // if found, return name, else return none
    let trainer = getTrainer()
    match trainer with
    | Some(trainer) -> getPersonName (lookupName()) (trainer.Predict(image).Label)
    | _ -> None

/// Take photos and add person to database
let addPerson name =
    // take photos
    let photoList = Camera.takePhotos photosToTake delayMs dataDir name

    // Add to photos file
    appendToPhotosFile photosFilename photoList name

    // Train with new photos
    let trainer = trainFaceDetector photosFilename trainedFilename
    trainer.Save trainedFilename

/// Create a blank user (with a smiley face photo)
let createBlankUser () =
    let blankUserId = 0
    let blankUserName = "blank"
    let blankImageName = Path.Combine(dataDir, "blank.jpg")

    let createBlankImage (filename:string) =
        let blankImage = new Bitmap(640, 480)
        let g = Graphics.FromImage(blankImage)
        g.FillEllipse(new SolidBrush(Color.Yellow), 220, 140, 200, 200)
        g.FillEllipse(new SolidBrush(Color.White), 260, 180, 40, 40)
        g.FillEllipse(new SolidBrush(Color.White), 340, 180, 40, 40)
        g.DrawArc(new Pen(Color.White, 5.F), new Rectangle(260, 230, 120, 70), 10.F, 160.F)
        blankImage.Save(filename, ImageFormat.Jpeg)

    if not (File.Exists(personFilename)) then
        addPersonId personFilename blankUserName blankUserId
        createBlankImage blankImageName
        appendPhotosLines photosFilename [ sprintf "%d|%s" blankUserId blankImageName ]
    else ()

/// Db initial setup
let init() = createBlankUser()
```
The application is a fun way to demonstrate functionality, but the really interesting parts (the OpenCV calls) can get lost in big blocks of code. I've pulled them out to make scanning for the interesting bits easier. Training turns out to be pretty easy. I've opted to use the FisherFace model, but Emgu also supports EigenFace and LBPHFace models. To train, call the CV.Face.FisherFaceRecognizer.Train function. It takes two arrays: one of images, and a corresponding one of int labels. It also assumes you have multiple classes, which makes sense, since what would you be training otherwise? To accommodate always having at least two classes, I created a Db.init() function that creates a blank user with a single smiley face image. Hopefully I don't get classified as a ☺. Currently the training is one big pass; a future refactor will include iterative training.
Once trained, face prediction is done with the CV.Face.FisherFaceRecognizer.Predict call. It returns the predicted int label of the face. This label maps to an id, so it's a simple lookup at that point for the name. All of the other code is boilerplate to load images and return results. The last piece of this puzzle is loading saved training results, using CV.Face.FisherFaceRecognizer.Load.
```fsharp
/// Train face detector
let private trainFaceDetector photosFilename trainedFilename =
    // Get labels and photos for training
    let (ids, photos) =
        IO.File.ReadAllLines(photosFilename)
        |> Array.map (fun x ->
            let columns = x.Split [| '|' |]
            (int columns.[0], columns.[1]))
        |> Array.map (fun (id, photoFilename) ->
            let image = new Image<Gray, Byte>(photoFilename)
            (id, image))
        |> Array.unzip

    // Train based on photos
    let trainer = new CV.Face.FisherFaceRecognizer()
    trainer.Train<Gray, Byte>(photos, ids)

    // Save trained data
    trainer.Save(trainedFilename)

    trainer

/// Perform prediction validation for a set of photos and trainer
let validatePredictions photosFilename (trainer:Face.FisherFaceRecognizer) =
    File.ReadAllLines(photosFilename)
    |> Array.map (fun x -> x.Split [| '|' |])
    |> Array.map (fun x ->
        let image = new Image<Gray, Byte>(x.[1])
        let predicted = trainer.Predict(image)
        { ValidationResult.name = lookupName().Item predicted.Label;
          matchCount = (if int x.[0] = predicted.Label then 1 else 0);
          totalCount = 1 })
    |> Array.groupBy (fun x -> x.name)
    |> Array.map (fun x ->
        { ValidationResult.name = fst x;
          matchCount = snd x |> Array.map (fun y -> y.matchCount) |> Array.sum;
          totalCount = snd x |> Array.map (fun y -> y.totalCount) |> Array.sum })

/// Load the trained face recognizer
let getTrainer () =
    let trainer = new CV.Face.FisherFaceRecognizer()
    if File.Exists(trainedFilename) then
        trainer.Load(trainedFilename)
        Some trainer
    else None
```
Next, putting it all together. By this point the interesting things are already complete; all that's left is wrapper code. In the App module I build out the commands as well as the main loop. There's also a small Db.init() call to create the blank image I mentioned earlier. Beyond that, the functions speak for themselves (and there are comments), so I won't go into detail here.
```fsharp
/// Add a person to the photo db
let addPerson () =
    printfn "Name to add (ensure person is in front of the camera): "
    let name = Console.ReadLine()
    printfn "Taking photos and training..."
    Db.addPerson name

/// Lookup person currently in front of camera
let whoAmI () =
    let person = Db.lookupPerson()
    match person with
    | Some(person) -> printfn "You are %s" person
    | None -> printfn "I don't recognize you. Sorry."

/// Display a list of known people in the db
let reportPeople () =
    printfn "People"
    printfn "------"
    Db.PersonList() |> Seq.iter (printfn "%s")

/// Display a validation report for recognition
let reportValidation () =
    // run validation against existing photos
    let trainer = Db.getTrainer()
    match trainer with
    | Some(trainer) ->
        Db.validatePredictions Db.photosFilename trainer
        |> Array.iter (fun x ->
            printfn "%10s %5d %5d %5.2f" x.name x.matchCount x.totalCount
                ((float x.matchCount) / (float x.totalCount)))
    | _ -> printfn "No training data"

/// Show available commands
let showHelp() =
    printfn "Commands: [addperson|whoami|people|validate|help|exit]"

/// Get a person's name (by lookup or prompt to add to db)
let getName() =
    let person = Db.lookupPerson()
    match person with
    | Some(person) ->
        printfn "Hi %s" person
        Some person
    | None ->
        printfn "I don't recognize you. What is your name? "
        let name = Console.ReadLine()
        printfn "Taking photos and training..."
        Db.addPerson name
        // Note: Could return name, but I want to explicitly force a lookup
        //Some name
        None

/// Main
let rec main name =
    match name with
    | Some(name) ->
        Console.Write("> ")
        let line = Console.ReadLine()
        let keepGoing = doCommand line
        if keepGoing then main (Some name)
        else ()
    | None ->
        let name = getName()
        main name

Db.init()
App.main None
```
With the code all together, it's time to take the application for a test drive.
Great! It can tell the difference between people. (Sidebar: the detection isn't perfect, but more, and better quality, data often helps with accuracy.) But I can't leave well enough alone. It feels too impersonal; if only it knew how I was feeling. Lucky for me, Amazon's Rekognition api holds the key to some fun bits. The TL;DR: the api provides, among other things, a prediction of how the person in a picture is feeling. Other interesting components include age range, gender, whether they have glasses, and feature locations.
Before I get into the code, the first requirement is an AWS account. Second, an IAM user must be created with Rekognition service permissions. Third, add the IAM credentials to the credentials file, ~/.aws/credentials.
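For reference, the credentials file is a simple ini-style file; the values below are placeholders for the keys generated for the IAM user:

```
[default]
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_SECRET_ACCESS_KEY
```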
Note: I'll make a quick mention here about SSL certs. There was an easy-to-overcome snag when making the AWS call. When running the code in VSCode/Ionide, it ran fine. When running it from the command line using fsharpi, I got an error. Specifically this:

```
Amazon.Runtime.AmazonServiceException: A WebException with status TrustFailure was thrown.
---> System.Net.WebException: Error: TrustFailure (The authentication or decryption has failed.)
---> System.IO.IOException: The authentication or decryption has failed.
---> System.IO.IOException: The authentication or decryption has failed.
---> Mono.Security.Protocol.Tls.TlsException: Invalid certificate received from server.
     Error code: 0xffffffff800b010a
  at Mono.Security.Protocol.Tls.RecordProtocol.EndReceiveRecord (System.IAsyncResult asyncResult)
  ....
```

There are several ways to resolve this error. I opted to solve it by importing the Mozilla certs for Mono using mozroots.exe. Other options can be found at http://www.mono-project.com/docs/faq/security/; your mileage may vary.
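The mozroots step is a one-liner. The standard invocation to import Mozilla's trusted roots into Mono's certificate store is:

```
mozroots --import --sync
```

Details are on the Mono security FAQ linked above; depending on your Mono version, other tools (such as cert-sync) may apply instead.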
Once these components are in place, a couple of small modifications are required. First, add a new package to the paket.dependencies file.
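The original line isn't shown here, but since the code below uses the Amazon.Rekognition namespace from the AWS SDK for .NET, the addition would look something like this (package name assumed):

```
nuget AWSSDK.Rekognition
```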
I’m going to put all the code into a new module. To get what we need out of the api, the basic workflow is:
1. Take a photo as a Bitmap.
2. Create a request object (with the image attached).
3. Call the Rekognition api with the request object.
4. Grab the attributes of the response object I care about.
For details on the api, I recommend looking at the Rekognition DetectFaces documentation. There are a couple of small details of my implementation I want to mention. The api grabs all the faces it can find, so FaceDetail is an array. My use case presumes one person; if there are more, it just takes the first one it finds. The api returns emotion as a list of possible emotions with their probabilities. This isn't very friendly looking, so I only show the highest-probability emotion. The whoAmI function provides some additional interesting reporting from the image.
```fsharp
/// Additional face analysis features
module FaceExtra =

    /// Take an image and return an aws Model.Image
    let private bitmapToModelImage (image:Bitmap) =
        // Load image into memorystream
        let ms = new MemoryStream()
        image.Save(ms, ImageFormat.Jpeg)

        // Convert memorystream to aws' Model.Image
        let modelImage = new Model.Image()
        modelImage.Bytes <- ms

        // Return model image
        modelImage

    /// Take an image and return an aws DetectFacesRequest
    let private buildRequest (image:Bitmap) =
        let request = new Model.DetectFacesRequest()

        // Get all attributes back from api
        let attributeList = new Collections.Generic.List<string>()
        attributeList.Add("ALL")
        request.Attributes <- attributeList

        // Set Image
        request.Image <- bitmapToModelImage image

        // Return request
        request

    /// Given a list of emotions, return the highest confidence one (in tuple form)
    let getMainEmotion (emotions:Collections.Generic.List<Model.Emotion>) =
        if emotions.Count <> 0 then
            emotions
            |> Seq.sortByDescending (fun x -> x.Confidence)
            |> Seq.head
            |> (fun e -> (e.Type.Value, e.Confidence))
        else ("Unknown", float32 0.)

    /// Query the rekognition api using the provided bitmap image
    let getFaceDetails (image:Bitmap) =
        let request = buildRequest image
        let rekognition = new Amazon.Rekognition.AmazonRekognitionClient(Amazon.RegionEndpoint.USEast1)
        let detectedFaces = rekognition.DetectFaces(request)
        detectedFaces

    /// Take a snapshot and determine the person's emotional state.
    let getCurrentEmotion () =
        let details = getFaceDetails (Camera.captureImageBitmap())
        if details.FaceDetails.Count <> 0 then
            Some ((fst (getMainEmotion details.FaceDetails.[0].Emotions)).ToLower())
        else None

    /// Make a friendly description string for showing if face has an attribute
    let attributeDisplay (value:bool) (description:string) =
        if value then sprintf "Has %s" description
        else sprintf "No %s" description

    /// Build a simple report string
    let getFaceReport () =
        let details = getFaceDetails (Camera.captureImageBitmap())
        if details.FaceDetails.Count <> 0 then
            let face = details.FaceDetails.[0]
            sprintf "Gender: %s\r\nAge: %d - %d\r\nEmotions: %s\r\n%s\r\n%s\r\n%s\r\n%s"
                face.Gender.Value.Value
                face.AgeRange.Low
                face.AgeRange.High
                (String.Join(", ", face.Emotions |> Seq.map (fun x -> sprintf "%s (%f)" x.Type.Value x.Confidence)))
                (attributeDisplay face.Beard.Value "beard")
                (attributeDisplay face.Mustache.Value "mustache")
                (attributeDisplay face.Eyeglasses.Value "glasses")
                (attributeDisplay face.Sunglasses.Value "sunglasses")
        else ""
```
The additional functionality gets wired into the whoAmI and getName calls. This is a pretty simple add.
```fsharp
/// Lookup person currently in front of camera
let whoAmI () =
    let person = Db.lookupPerson()
    match person with
    | Some(person) ->
        let report = FaceExtra.getFaceReport()
        printfn "You are %s\r\n%s" person report
    | None -> printfn "I don't recognize you. Sorry."

/// Get a person's name (by lookup or prompt to add to db)
let getName() =
    let person = Db.lookupPerson()
    match person with
    | Some(person) ->
        let emotion = FaceExtra.getCurrentEmotion()
        match emotion with
        | Some(emotion) -> printfn "Hi %s, you seem %s" person emotion
        | None -> printfn "Hi %s" person
        Some person
    | None ->
        printfn "I don't recognize you. What is your name? "
        let name = Console.ReadLine()
        printfn "Taking photos and training..."
        Db.addPerson name
        // Note: Could return name, but I want to explicitly force a lookup
        //Some name
        None
```
Time to check out how the new functionality looks.
Cool. It did a pretty good job. The great thing is, this kind of technology will only get better. This has been fun, but the post has already gone longer than intended, so I’ll end it here. I hope you enjoyed this little glimpse into facial recognition and information extraction. Until next time.