Tuesday, November 8, 2016

Face detection on iOS

I've been playing a bit with the Camera on iOS lately and released a photo gallery/camera library called Chafu. iOS provides a fairly easy API to do all sorts of stuff, such as reading bar codes, QR codes and some other types of machine readable codes. It also supports finding faces!

So as a fun feature, I decided to figure out how to detect a face and show it live in the on-screen preview while taking a photo or recording a video, and then add that to Chafu.

Note: iOS 7 added the functionality to AVFoundation that makes all of the above possible, so what I describe here applies to iOS 7 and up. Keep that in mind if you intend to support earlier versions.
In this blog post I assume you already know how to set up an AVCaptureSession, AVCaptureDeviceInput and AVCaptureVideoPreviewLayer to preview what is coming from the camera.

Setup face detection

First we need to add an AVCaptureMetadataOutput to our session. This class is what detects faces, and it requires an implementation of IAVCaptureMetadataOutputObjectsDelegate, which is where we get a callback when faces are detected.

Setting up an AVCaptureMetadataOutput is simple:
  1. Create an instance of it
  2. Add it to the AVCaptureSession
  3. Find out if face metadata is available
  4. Set up the callback if step 3 is OK
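
The four steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code from Chafu: the constructor, the SetupFaceDetection method name and the NSObject base class are assumptions made for illustration, and the session is assumed to be configured elsewhere.

```csharp
using AVFoundation;
using CoreFoundation;
using Foundation;

// Hypothetical Camera class; only the face detection setup is shown.
public class Camera : NSObject, IAVCaptureMetadataOutputObjectsDelegate
{
    readonly AVCaptureSession session;

    public Camera(AVCaptureSession session)
    {
        this.session = session;
    }

    // Returns true when face detection was enabled.
    public bool SetupFaceDetection()
    {
        // 1. Create the output
        var metadataOutput = new AVCaptureMetadataOutput();

        // 2. Add it to the session
        if (!session.CanAddOutput(metadataOutput))
            return false;
        session.AddOutput(metadataOutput);

        // 3. Check that face metadata is available for this session
        if (!metadataOutput.AvailableMetadataObjectTypes.HasFlag(AVMetadataObjectType.Face))
            return false;

        // 4. Ask for faces only and receive callbacks on the main queue
        metadataOutput.MetadataObjectTypes = AVMetadataObjectType.Face;
        metadataOutput.SetDelegate(this, DispatchQueue.MainQueue);
        return true;
    }
}
```

The availability check happens after the output has been added to the session, since the available types depend on the session's inputs.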

Notice the use of this when setting up the callback? That refers to the implementation of IAVCaptureMetadataOutputObjectsDelegate. I decided to add it directly to my Camera class. The implementation looks something like the following.
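
As a minimal sketch, assuming the Camera class derives from NSObject (an assumption for this example) and simply logs each detected face:

```csharp
using AVFoundation;
using Foundation;

public class Camera : NSObject, IAVCaptureMetadataOutputObjectsDelegate
{
    // The selector in Export must match the Objective-C delegate method.
    [Export("captureOutput:didOutputMetadataObjects:fromConnection:")]
    public void DidOutputMetadataObjects(AVCaptureMetadataOutput captureOutput,
        AVMetadataObject[] metadataObjects, AVCaptureConnection connection)
    {
        foreach (var metadata in metadataObjects)
        {
            // Only face objects are of interest here
            var face = metadata as AVMetadataFaceObject;
            if (face == null)
                continue;

            System.Diagnostics.Debug.WriteLine("Face detected at " + face.Bounds);
        }
    }
}
```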

The Export attribute is pretty important: it is how we tell the iOS world that we have implemented the interface, and how iOS gets hold of our code.

Now, hopefully, when you run this code and set a breakpoint in the DidOutputMetadataObjects method, it will get hit when a face is detected. The metadataObjects argument will contain AVMetadataFaceObjects, which contain all you need to visually show the detected faces.

Display faces

To display faces I like using iOS layers, which let us do rotations and other transformations pretty easily. Later in this post I will show how to use the roll and yaw angles from the AVMetadataFaceObject when drawing the visual indicator for a face.

First we need to set up the CALayer we will add detected faces to. Using a dedicated layer is the easiest approach, since later on we can simply clear its sublayers when we don't want to show faces anymore.
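
A minimal sketch of that setup, assuming the overlay should cover the whole preview layer. The FaceOverlay class and its Create method are hypothetical names used only for this example.

```csharp
using AVFoundation;
using CoreAnimation;

public static class FaceOverlay
{
    // Creates an empty layer on top of the preview that will hold one sublayer per face.
    public static CALayer Create(AVCaptureVideoPreviewLayer previewLayer)
    {
        var overlayLayer = new CALayer
        {
            // Same size as the preview, so transformed face coordinates line up
            Frame = previewLayer.Bounds
        };

        previewLayer.AddSublayer(overlayLayer);
        return overlayLayer;
    }
}
```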

Now we can add faces to that layer in our DidOutputMetadataObjects method. This is done in a couple of simple steps.

  1. Iterate the metadataObjects array, which we get as argument
  2. Make sure these are indeed AVMetadataFaceObject
  3. Transform the object through AVCaptureVideoPreviewLayer's GetTransformedMetadataObject to get the correct coordinates for the face
  4. Create a new CALayer for the face with a border or the desired effect
  5. Set Bounds of the CALayer to what we got from the transformation
  6. Add it as sublayer in the overlayLayer we created earlier

Let's start by defining a method that creates the layer that will show the face.
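
Something along these lines would do. The class name is hypothetical and the corner radius of 5 is an assumed value, as the exact number is not important.

```csharp
using CoreAnimation;
using UIKit;

public static class FaceLayers
{
    // Creates the visual indicator for one face: a white, rounded border.
    public static CALayer CreateFaceLayer()
    {
        return new CALayer
        {
            BorderColor = UIColor.White.CGColor,
            BorderWidth = 2,
            CornerRadius = 5 // assumed value, pick whatever looks good
        };
    }
}
```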

Here I simply create a new CALayer and set the border color, width and corner radius. So what we will see are white squares with border width 2 and rounded corners. Simple!

Now, let's add this layer to the overlayLayer so we can actually see something on the screen.
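
The steps listed earlier could look roughly like this. It is a sketch under a few assumptions: the AddFaces helper is a hypothetical name, the previewLayer and overlayLayer are passed in rather than kept as fields, and the transformed Bounds are assigned to the layer's Frame so that the layer is positioned as well as sized.

```csharp
using AVFoundation;
using CoreAnimation;
using UIKit;

public static class FaceDrawing
{
    // Meant to be called from DidOutputMetadataObjects.
    public static void AddFaces(AVMetadataObject[] metadataObjects,
        AVCaptureVideoPreviewLayer previewLayer, CALayer overlayLayer)
    {
        foreach (var metadata in metadataObjects)
        {
            // Make sure this really is a face
            var face = metadata as AVMetadataFaceObject;
            if (face == null)
                continue;

            // Convert from capture coordinates to preview layer coordinates
            var transformed = previewLayer.GetTransformedMetadataObject(face);
            if (transformed == null)
                continue;

            var faceLayer = new CALayer
            {
                BorderColor = UIColor.White.CGColor,
                BorderWidth = 2,
                CornerRadius = 5, // assumed value
                Frame = transformed.Bounds
            };

            overlayLayer.AddSublayer(faceLayer);
        }
    }
}
```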

That is it! Now you should have some squares showing up for faces. One problem though: every callback adds new layers on top of the ones already there, so they quickly pile up. We need to remove the existing sublayers before adding the new ones.
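
A simple way to do that is sketched below. Here the overlay layer is passed in as a parameter to keep the example self-contained; in a camera class it would more likely be a field, so the method would take no arguments.

```csharp
using CoreAnimation;

public static class FaceCleanup
{
    // Removes every face layer currently shown in the overlay.
    public static void RemoveFaces(CALayer overlayLayer)
    {
        // Sublayers is null when the layer has no sublayers
        var sublayers = overlayLayer.Sublayers;
        if (sublayers == null)
            return;

        foreach (var layer in sublayers)
            layer.RemoveFromSuperLayer();
    }
}
```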

Call the RemoveFaces() method before you iterate metadataObjects in DidOutputMetadataObjects and this should take care of the problem.
The observant reader might notice that this seems inefficient. I won't cover that in this article. However, you can see one approach to solving it in BaseCameraView in Chafu, where I keep track of the FaceId from the AVMetadataFaceObject and simply adjust the bounds for that face if it has moved.

Adjusting for Yaw and Roll angles

The AVMetadataFaceObject gives us a RollAngle for when you rotate your head around the Z axis. It also gives us a YawAngle for rotations around the Y axis.

Picture from StackOverflow: http://stackoverflow.com/q/16401505/368379

Supporting RollAngle is easy, as it does not require that we rotate into the Z plane, but rather around it. However, rotations around the Y axis will move the rectangle into the Z plane. By default, CALayer is flat and has no idea of perspective. We can fix that! Back where we create the overlayLayer, we need to add a simple transformation which will give us this perspective.
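
A minimal sketch of such a transformation follows. It assumes the perspective is applied through SublayerTransform so that it affects every face layer in the overlay, and it uses the lowercase m34 field name from the classic Xamarin.iOS bindings. The Perspective class is a hypothetical helper.

```csharp
using CoreAnimation;

public static class Perspective
{
    // Gives the sublayers of overlayLayer a sense of depth.
    public static void Apply(CALayer overlayLayer, float distance = 1000f)
    {
        var transform = CATransform3D.Identity;

        // m34 controls perspective; note the negative sign
        transform.m34 = -1f / distance;

        overlayLayer.SublayerTransform = transform;
    }
}
```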

What this does is take the default transformation and add a distance to the 3D projection plane in terms of 1/z. In other words, we add depth. Apple does this in reverse, hence -1/z is used, where z is the distance. In this case I use 1000. The bigger the value, the bigger the distance. For a more detailed explanation you could start by reading about 3D projection on Wikipedia.

Now we can do our rotations to the face CALayers.


To make a roll rotation we simply create a new CATransform3D using the static MakeRotation method, which allows us to rotate around any axis. As shown above, roll rotations are around the Z axis. CATransform3D expects the angle in radians, while the face object reports it in degrees, so we need to convert it first.
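
As a sketch, with hypothetical helper names:

```csharp
using System;
using CoreAnimation;

public static class RollTransform
{
    // Converts degrees, as reported by the face object, to radians.
    static nfloat ToRadians(nfloat degrees)
    {
        return degrees * (nfloat)Math.PI / 180;
    }

    // Rotation around the Z axis (0, 0, 1).
    public static CATransform3D Create(nfloat rollAngleInDegrees)
    {
        return CATransform3D.MakeRotation(ToRadians(rollAngleInDegrees), 0, 0, 1);
    }
}
```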

Pretty simple. We will apply this to the face layer later.


The yaw rotation is a bit more involved. We need to know the orientation of the device, as the Y axis changes depending on the orientation, and we need to adjust the angle for that. Faces are always detected in the same orientation, no matter how the device is held. However, our preview layer changes along with the orientation.
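
One way to express the orientation adjustment is a rotation around the Z axis chosen from the current device orientation. The angle for each orientation below is an assumption based on commonly seen AVFoundation samples, so verify it on a real device before relying on it.

```csharp
using System;
using CoreAnimation;
using UIKit;

public static class OrientationTransform
{
    // Rotation that compensates for how the device is currently held.
    public static CATransform3D Create()
    {
        nfloat angle;
        switch (UIDevice.CurrentDevice.Orientation)
        {
            case UIDeviceOrientation.PortraitUpsideDown:
                angle = (nfloat)Math.PI;
                break;
            case UIDeviceOrientation.LandscapeRight:
                angle = (nfloat)(-Math.PI / 2);
                break;
            case UIDeviceOrientation.LandscapeLeft:
                angle = (nfloat)(Math.PI / 2);
                break;
            default:
                // Portrait, face up/down and unknown: no adjustment
                angle = 0;
                break;
        }

        return CATransform3D.MakeRotation(angle, 0, 0, 1);
    }
}
```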

Now we need to make the yaw transformation and combine it with the orientation transformation.
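
A sketch of that combination is shown below. The orientation transform is passed in as a parameter to keep the example self-contained, and the negative Y axis for the yaw rotation is an assumption that matches the mirrored feel of the preview; flip the sign if the indicator turns the wrong way.

```csharp
using System;
using CoreAnimation;

public static class YawTransform
{
    // Converts degrees, as reported by the face object, to radians.
    static nfloat ToRadians(nfloat degrees)
    {
        return degrees * (nfloat)Math.PI / 180;
    }

    // Rotation around the Y axis, followed by the orientation adjustment.
    public static CATransform3D Create(nfloat yawAngleInDegrees,
        CATransform3D orientationTransform)
    {
        var yaw = CATransform3D.MakeRotation(ToRadians(yawAngleInDegrees), 0, -1, 0);
        return yaw.Concat(orientationTransform);
    }
}
```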

Finally, apply the two transformations to the face CALayer, in the DidOutputMetadataObjects method.
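
Put together, this could look like the sketch below. The helper name is hypothetical, the rotations are inlined to keep the example self-contained, and the yaw axis and orientation transform are subject to the same assumptions as before.

```csharp
using System;
using AVFoundation;
using CoreAnimation;

public static class FaceRotation
{
    static nfloat ToRadians(nfloat degrees)
    {
        return degrees * (nfloat)Math.PI / 180;
    }

    // Meant to be called for each face layer inside DidOutputMetadataObjects.
    public static void Apply(CALayer faceLayer, AVMetadataFaceObject face,
        CATransform3D orientationTransform)
    {
        // Start with no rotation at all
        var transform = CATransform3D.Identity;

        // Roll: rotation around the Z axis, only if the face reports one
        if (face.HasRollAngle)
        {
            var roll = CATransform3D.MakeRotation(ToRadians(face.RollAngle), 0, 0, 1);
            transform = transform.Concat(roll);
        }

        // Yaw: rotation around the Y axis, adjusted for device orientation
        if (face.HasYawAngle)
        {
            var yaw = CATransform3D.MakeRotation(ToRadians(face.YawAngle), 0, -1, 0);
            transform = transform.Concat(yaw.Concat(orientationTransform));
        }

        faceLayer.Transform = transform;
    }
}
```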

Notice that I start from a default transformation for the faceLayer and concatenate the roll and/or yaw transform according to availability. That is it. Now you should have something like this. The screenshots are taken from Chafu and demonstrate detection with no rotation, then with roll and then with yaw.

No rotation