What is a Face? Creating a Skin Texture Model for Facial Recognition
Facial recognition technology (“FRT”) is the general term for a complex series of programs, the aim of which is to have computers recognize and compare images of faces to determine if they match. The process of identifying and recognizing faces, while common and intuitive for humans, is a daunting task for computers. For an example of just how difficult this process can be even for humans who lack the proper “software,” it’s useful to consider the strange condition known as “prosopagnosia,” or face blindness.1 People who suffer from face blindness are thought to have some kind of defect in the area of the brain called the right fusiform gyrus.2 Essentially, this area of the brain is specialized human software whose job is to recognize patterns of facial features to an extent that goes far above and beyond pattern recognition in other objects.3
Those who suffer from face blindness cite incidents up to and including failing to recognize children, spouses, parents, and close friends, and describe the process of attempting to remember and recognize facial features as an exercise in brute memorization. One compared it to seeing a pile of Legos for one second, then having to cover your eyes and perfectly describe the size, color, and location of every block, every time, for endless slightly different piles of Legos.4 The extreme difficulty of such a task is remarkable mostly in that it points out the extent to which most of us don’t have to work to remember faces, as our brains are specifically tailored for just that task. Computers, however, are not. Not only do computers lack instinctive recognition of facial patterns, they lack even the fundamental conception of what a face is, where it is, and how to differentiate between face and not-face in the mass of visual data from a photograph.
Facial recognition, therefore, is an immensely complicated task. While its applications are almost endless, from law enforcement suspect identification to authentication for credit and ATM transactions,5 its execution poses almost as many hurdles. Programmers have broken down the task into four basic steps, each of which entails its own complexities: (1) face detection, which separates the face within an image from its background; (2) normalization, in which the image is adjusted to a standard size, pose, and illumination; (3) feature extraction, in which a mathematical representation of the face is created to use as a reference point for comparison between images; and (4) matching images for identification and verification.6 While each of these processes is, as noted, extraordinarily complex, this article focuses on a surprisingly important detail within the “feature extraction” phase: creating a model of a face through skin texture analysis.
II. Learning What A Face Is
Computers do not have the inherent ability to discriminate between different faces, recognize a face, or understand what a face is. Therefore, computers must first learn what a face is before they can differentiate between different faces with complex features. By analogy, a person could not distinguish between different African Hornbills without first knowing what an African Hornbill is and what it looks like.
For a computer to understand a face, it must be expressed in a language that the computer can understand: mathematical models. To generate a mathematical model of a face that will indicate to the computer what a face “looks like” mathematically, an algorithm (or set of rules) will filter through all facial features in a database of images, known as the training set, and average these traits together to create a symbolic texture primitive, an archetypal model of what a face looks like.7 The archetypal face could be thought of as a face of averages, like the popular Photoshopped images of faces that combine features from different photographs into a single face.8
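The "face of averages" idea can be illustrated with a deliberately simplified sketch. It assumes the training images are already aligned, equal-sized grayscale grids stored as nested lists, and it averages raw brightness values rather than the skin texture traits the article goes on to describe. The function name `average_face` is hypothetical and is not drawn from any FRT system.

```python
def average_face(images):
    """Return the pixel-wise mean of equally sized grayscale images.

    Each image is a list of rows and each row is a list of brightness values.
    This is an illustration of averaging a training set, not a texture model.
    """
    if not images:
        raise ValueError("the training set must contain at least one image")
    rows, cols = len(images[0]), len(images[0][0])
    for img in images:
        if len(img) != rows or any(len(row) != cols for row in img):
            raise ValueError("all images must share the same dimensions")
    count = len(images)
    return [
        [sum(img[r][c] for img in images) / count for c in range(cols)]
        for r in range(rows)
    ]


# Two tiny 2x2 "faces" stand in for a training set.
training_set = [
    [[0, 100], [50, 200]],
    [[100, 100], [150, 0]],
]
archetype = average_face(training_set)
```

In this toy case every pixel of `archetype` is simply the midpoint of the two inputs. A real system would average far richer measurements, as the following sections explain.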
A. Creating the Archetypal Face
To create the archetypal face, the computer must first analyze each image within the training set in minute detail, down to measuring various characteristics, or vectors, of each individual pixel within every training set face.9 The computer starts this process by analyzing the smallest properties of each pixel within each face in the training set, and averaging the properties of the pixel together to generate a data point.10
Within a specific pixel, the computer will quantify properties of at least three different features, such as the size, shape, and distance between skin pores or wrinkles.11 For each different characteristic within the pixel, the computer will assign the characteristic a specific number by filtering the characteristic through an algorithm.12 For example, if the computer is analyzing the size of the pores within a pixel, it would filter the pixel through an algorithm measuring the size of the pores, allowing the computer to label each pore with a number representing its size.
After the computer has measured the size of each pore within the pixel, the computer will take all of the numbers representing the sizes of pores in the pixel and create a Gaussian filter, which is a type of bell curve.13 The bell curve will guide the computer to understand which pore sizes are the most common in the pixel, and which sizes are the least common.14 The computer will then filter out the two extremes of the bell curve.15 In other words, the computer will take the especially small and especially large pore sizes within the pixel and filter them out of its calculation. Removing the extreme ends of the bell curve is crucial for the subsequent “matching step” of facial recognition because very different pore sizes within the same pixel could result in the computer failing to recognize that these pores are part of the same region of the face.16 Next, the bell curves created for each different characteristic within the pixel will be combined to create a data point, known as a texton.17 This data point will represent the combined characteristics, like the shape, size and density, of skin texture within a specific pixel.18 The computer will complete this process for each pixel on every face in the training set.
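The trimming and combining steps described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method used by any particular system: the bell curve is summarized by a mean and standard deviation, "extreme" is taken to mean more than 1.5 standard deviations from the mean, and the characteristic names (`pore_size`, `pore_spacing`), the cutoff, and the function names are all hypothetical.

```python
from statistics import mean, stdev


def trim_extremes(values, z_cutoff=1.5):
    """Drop measurements that sit in the tails of the fitted bell curve."""
    if len(values) < 2:
        return list(values)
    mu, sigma = mean(values), stdev(values)
    # A value is kept when it lies within z_cutoff standard deviations of the mean.
    return [v for v in values if abs(v - mu) <= z_cutoff * sigma]


def build_texton(measurements, z_cutoff=1.5):
    """Combine each characteristic's trimmed average into one data point.

    `measurements` maps a characteristic name to its raw values for one pixel.
    The result is a tuple ordered by characteristic name.
    """
    return tuple(
        mean(trim_extremes(values, z_cutoff))
        for _, values in sorted(measurements.items())
    )


pixel_measurements = {
    "pore_size": [1.0, 4.0, 5.0, 5.0, 5.0, 6.0, 9.0],
    "pore_spacing": [10.0, 12.0, 11.0, 11.0, 30.0],
}
texton = build_texton(pixel_measurements)  # (5.0, 11.0)
```

In this example the unusually small and large pore sizes (1.0 and 9.0) and the outlying spacing (30.0) are discarded before averaging, so the resulting data point reflects only the most common values within the pixel.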
B. A Broader Understanding of the Face
Once the computer has calculated a data point for every pixel in every face in the training set, the computer pieces together these data points to generate the skin texture of each particular region of the archetypal face.19 For example, if the computer is attempting to generate the skin texture of the forehead region of the archetypal face, it must combine all of the data points from each pixel within the forehead region of every face in the training set. This process is difficult because it is not obvious which data points correspond to which region of the training set faces.20 Notably, the skin texture of pixels located on the human forehead has characteristics similar to those of pixels located on the cheek.21
The computer will complete the process of mapping data points to particular regions of the faces by connecting data points based on their characteristics.22 This mapping is similar to putting together a jigsaw puzzle. A person completing a puzzle looks for pieces with similar colors, patterns, and shapes that fit together. The computer completes a similar process when piecing together different data points to construct the skin texture of the face.23 To illustrate, if the computer has generated data points for a particular training set face and identifies that several of the data points indicate a forehead wrinkle, the computer will look for all of the data points that contain parts of this same wrinkle to determine that these data points map onto the forehead. The computer will complete the mapping process for each region of each training set face.24
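One simple way to picture this mapping is a nearest-match rule: each data point is assigned to the region whose typical characteristics it most resembles. The sketch below is a stand-in technique chosen for illustration, not the procedure the cited sources describe. It assumes each data point is a tuple of numbers and that a representative data point for each region (here called `region_references`) is already known; both the names and the values are hypothetical.

```python
import math


def nearest_region(texton, region_references):
    """Return the region whose reference data point is closest to `texton`."""
    return min(
        region_references,
        key=lambda region: math.dist(texton, region_references[region]),
    )


def map_textons_to_regions(textons, region_references):
    """Group data points by the facial region they most resemble."""
    mapping = {region: [] for region in region_references}
    for texton in textons:
        mapping[nearest_region(texton, region_references)].append(texton)
    return mapping


# Hypothetical reference points: (wrinkle depth, pore size).
region_references = {"forehead": (5.0, 1.0), "cheek": (2.0, 4.0)}
textons = [(4.8, 1.2), (2.1, 3.9), (5.5, 0.7)]
mapping = map_textons_to_regions(textons, region_references)
```

Here the first and third data points land on the forehead and the second on the cheek. When two regions have very similar reference points, as the article notes can happen with the forehead and cheek, a rule this simple becomes unreliable, which is part of why the mapping step is hard.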
Finally, after the computer has mapped data points onto the correct regions of the training set faces, the computer will average together the data points contained within each region of the face using a bell curve, while removing the extreme ends of the curve.25 This step is identical to the earlier bell curve step, except instead of creating a bell curve for the properties within an individual pixel, the computer is creating a bell curve for the data points contained within an entire region of the face. After generating a bell curve for each region of each training image, the computer averages together these data points for each region of the face to create the archetypal face.26
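The two-stage averaging in this step can be sketched as follows. The example makes several simplifying assumptions that are not taken from the source: each data point is reduced to a single number, the tails of the bell curve are cut at 1.5 standard deviations, and the names `trimmed_mean` and `archetype_regions` are hypothetical.

```python
from statistics import mean, stdev


def trimmed_mean(values, z_cutoff=1.5):
    """Average the values after discarding the tails of the bell curve."""
    if len(values) < 3:
        return mean(values)
    mu, sigma = mean(values), stdev(values)
    kept = [v for v in values if abs(v - mu) <= z_cutoff * sigma]
    return mean(kept) if kept else mu


def archetype_regions(training_faces, z_cutoff=1.5):
    """Build one summary value per region across all training faces.

    `training_faces` is a list of dicts mapping a region name to the data
    points already mapped onto that region for a single face.
    """
    per_region = {}
    for face in training_faces:
        for region, values in face.items():
            # Stage 1: summarize the region within this one face.
            per_region.setdefault(region, []).append(trimmed_mean(values, z_cutoff))
    # Stage 2: average the per-face summaries to form the archetype.
    return {region: mean(summaries) for region, summaries in per_region.items()}


training_faces = [
    {"forehead": [1.0, 4.0, 5.0, 5.0, 5.0, 6.0, 9.0], "cheek": [2.0, 2.0, 2.0]},
    {"forehead": [7.0, 7.0, 7.0], "cheek": [4.0, 4.0, 4.0]},
]
archetype = archetype_regions(training_faces)  # {"forehead": 6.0, "cheek": 3.0}
```

The first face's forehead loses its two outlying data points before being summarized as 5.0, and the archetype's forehead value is the average of that summary and the second face's 7.0.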
Having performed this extensive analysis, the computer has an understanding of what a face looks like based on the properties of skin texture. It understands not only the minute properties that make up various skin textures, such as pores or wrinkles within a specific pixel, but can also identify where specific characteristics occur on different regions of the face. This archetypal face model gives the computer the base it needs to perform the final “matching step” in the facial recognition analysis paradigm.
Skin texture analysis has proven to be a successful tool for facial recognition by significantly increasing identification accuracy.27 While one method of building a model based on skin texture analysis is explored here, several other methods are used within the complex process of facial recognition. Though other texture analysis software contains different processes and intricacies, all require the computer to first understand what a face is and what a face looks like through the use of mathematical models.28 These complex systems are created with the purpose of doing what the human brain can do automatically: recognizing and identifying facial characteristics associated with particular individuals. Mathematical precision has allowed computers to identify swaths of the population orders of magnitude greater than a person could identify from human perception and memory alone.29 This increased perception and identification of human faces raises policy concerns as these technologies become increasingly accurate and widespread. The advance in accuracy and deployment of these systems works to make previous expectations of privacy, based on notions of what the human eye can perceive in public, obsolete.30
Paradoxically, despite computers’ ability to recognize faces on a more granular and widespread level, these technologies lack the sophistication of human perception, which raises concerns such as FRT algorithms being less accurate when identifying people of color.31 This paradox of FRT possessing both stronger and weaker perception than the human brain implicates complex technological and sociological questions, which warrant attention from the programmers who build these systems, the lawmakers who regulate these systems, and any person whose image is contained within them.
* GLTR Staff Member; Georgetown Law, J.D. expected 2018; Ithaca College, B.S. 2009. © 2017, Jeremy Greenberg.