FairCal
Fairness Calibration for Face Verification
Despite being widely used, face recognition models suffer from bias: the probability of a false positive (incorrect face match) strongly depends on sensitive attributes such as the ethnicity of the face. As a result, these models can disproportionately and negatively impact minority groups, particularly when used by law enforcement.
Most bias-mitigation methods have several drawbacks: they rely on end-to-end retraining, may not be feasible due to privacy issues, and often reduce accuracy. An alternative is post-processing methods, which build fairer decision classifiers on top of the features of pre-trained models and thus avoid the cost of retraining. However, these still have drawbacks: they reduce accuracy (AGENDA, PASS, FTC) or require retuning for different false positive rates (FSN).
In this work, we introduce the Fairness Calibration method, FairCal, a post-training approach that simultaneously:
- increases model accuracy (improving the state-of-the-art);
- produces fairly-calibrated probabilities;
- significantly reduces the gap in the false positive rates;
- does not require knowledge of the sensitive attribute (group identity such as race, ethnicity, etc.);
- does not require retraining, training an additional model, or retuning.
We apply it to the task of Face Verification, and obtain state-of-the-art results with all the above advantages. We do so by applying a post-hoc calibration method to pseudo-groups formed by unsupervised clustering.
Fairness and Bias in Face Verification
The Face Verification problem consists of deciding, given two face images, whether they form a genuine pair (same identity) or an imposter pair (different identities).
Chouldechova (2017) showed that at most two of the following three conditions can be satisfied simultaneously:
- Fairness Calibration, i.e., calibrated fairly for different subgroups: $$ \mathbb{P}_{(\boldsymbol{x}_1,\boldsymbol{x}_2) \sim \mathcal{G}_1}(Y=1\mid \widehat{C}=c) = \mathbb{P}_{(\boldsymbol{x}_1,\boldsymbol{x}_2) \sim \mathcal{G}_2}(Y=1\mid \widehat{C}=c) = c $$
- Predictive Equality, i.e., equal False Positive Rates (FPRs) across different subgroups: $$ \mathbb{P}_{(\boldsymbol{x}_1,\boldsymbol{x}_2) \sim \mathcal{G}_1}(\widehat{Y}=1\mid Y=0) = \mathbb{P}_{(\boldsymbol{x}_1,\boldsymbol{x}_2) \sim\mathcal{G}_2}(\widehat{Y}=1\mid Y=0) $$
- Equal Opportunity, i.e., equal False Negative Rates across different subgroups: $$ \mathbb{P}_{(\boldsymbol{x}_1,\boldsymbol{x}_2) \sim \mathcal{G}_1}(\widehat{Y}=0\mid Y=1) = \mathbb{P}_{(\boldsymbol{x}_1,\boldsymbol{x}_2) \sim \mathcal{G}_2}(\widehat{Y}=0\mid Y=1) $$
In the particular context of policing, predictive equality is considered more important than equal opportunity, since false positive errors (false arrests) risk causing significant harm, especially to members of subgroups already at disproportionate risk of police scrutiny or violence. Hence we omit equal opportunity from our goals, and we note that no prior method has targeted Fairness Calibration. Predictive equality is measured by comparing the FPR of each subgroup at a threshold set to achieve a single global FPR, as in the sketch below.
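To make that measurement concrete, here is a minimal sketch (not from the paper; `scores`, `labels`, and `groups` are assumed placeholder NumPy arrays holding the pair similarity scores, genuine/imposter labels, and subgroup identities): it picks the threshold that achieves a chosen global FPR and reports each subgroup's FPR at that same threshold.

```python
import numpy as np

def fpr_at_threshold(scores, labels, thr):
    """FPR: fraction of imposter pairs (label 0) whose score exceeds the threshold."""
    imposter = labels == 0
    return float(np.mean(scores[imposter] > thr))

def subgroup_fprs(scores, labels, groups, global_fpr=1e-3):
    """Set the threshold that yields `global_fpr` over all imposter pairs,
    then report each subgroup's FPR at that same threshold."""
    thr = np.quantile(scores[labels == 0], 1.0 - global_fpr)
    return {g: fpr_at_threshold(scores[groups == g], labels[groups == g], thr)
            for g in np.unique(groups)}
```

Predictive equality holds when the per-subgroup FPRs returned by `subgroup_fprs` coincide.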
Goals and Related Work
Work on bias mitigation for deep Face Verification models can be divided into two main camps:
- (i) methods that let a model learn less-biased representations during training, and
- (ii) post-processing approaches that attempt to remove bias after a model is trained.
Our work focuses on (ii) post-hoc methods.
Baseline Approach
Given a trained neural network \(f\) that encodes an image \(\boldsymbol{x}\) into an embedding \(\boldsymbol{z} = f(\boldsymbol{x})\), the baseline classifier for the face verification problem proceeds as follows (a minimal code sketch is given after the steps).
- 1) Given an image pair \((\boldsymbol{x}_1,\boldsymbol{x}_2)\): compute the feature embedding pair \((\boldsymbol{z}_1, \boldsymbol{z}_2)\).
- 2) Compute the cosine similarity score \(s(\boldsymbol{x}_1,\boldsymbol{x}_2)=\frac{\boldsymbol{z}_1^T \boldsymbol{z}_2}{\|\boldsymbol{z}_1\| \|\boldsymbol{z}_2\|}\).
- 3) Given a predefined threshold \(s_{\rm{thr}}: s(\boldsymbol{x}_1,\boldsymbol{x}_2) > s_{\rm{thr}} \implies\) genuine pair!
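A minimal sketch of this baseline, assuming a hypothetical `embed` function standing in for the pre-trained network \(f\) and returning NumPy embeddings:

```python
import numpy as np

def cosine_similarity(z1, z2):
    """Cosine similarity between two feature embeddings."""
    return float(np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2)))

def baseline_verify(x1, x2, embed, s_thr):
    """Baseline face verification: embed both images and compare the
    cosine similarity of the embeddings against a fixed threshold."""
    z1, z2 = embed(x1), embed(x2)
    return cosine_similarity(z1, z2) > s_thr  # True => predicted genuine pair
```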
FairCal
We build our proposed method FairCal based on two main ideas:
- 1) Use the feature vector to define population subgroups;
- 2) Use post-hoc calibration methods that convert cosine similarity scores into probabilities of genuine (or imposter) pair.
Calibration stage
Let \(\mathcal{Z}^{\rm{cal}}\) denote the feature embeddings of the face images in a held-out calibration set.
- 1) Apply the \(K\)-means algorithm to \(\mathcal{Z}^{\rm{cal}}\), partitioning the embedding space into \(K\) clusters \(\mathcal{Z}_1,\ldots,\mathcal{Z}_K\).
- 2) Form the \(K\) calibration sets of cosine similarity scores, one per cluster, from the calibration pairs with at least one image in that cluster: $$ S^{\rm{cal}}_k = \left\{ s(\boldsymbol{x}_1,\boldsymbol{x}_2) : (\boldsymbol{x}_1,\boldsymbol{x}_2) \text{ is a calibration pair with } \boldsymbol{z}_1 \in \mathcal{Z}_k \text{ or } \boldsymbol{z}_2 \in \mathcal{Z}_k \right\} $$
- 3) For \(k=1,\ldots,K\), estimate the calibration map \(\mu_k\) that maps the scores in \(S^{\rm{cal}}_k\) to calibrated probabilities of being a genuine pair.
For FairCal we chose Beta Calibration (Kull et al., 2017) as the post-hoc calibration method, but our experiments show similar performance with other calibration methods. A sketch of this stage is given below.
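Here is a minimal sketch of the calibration stage under stated assumptions: the calibration data are passed in as NumPy arrays (`cal_embeddings`, `cal_pairs`, `cal_scores`, `cal_labels` are placeholder names, not from the paper), and Platt scaling via scikit-learn's `LogisticRegression` stands in for the beta calibration used in the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

def fit_faircal(cal_embeddings, cal_pairs, cal_scores, cal_labels, K=100):
    """Calibration stage of the FairCal sketch.
    cal_embeddings: (N, d) embeddings of the calibration images
    cal_pairs:      (M, 2) image indices forming the calibration pairs
    cal_scores:     (M,)   cosine similarity score of each pair
    cal_labels:     (M,)   1 for genuine, 0 for imposter
    """
    # 1) partition the embedding space into K clusters
    kmeans = KMeans(n_clusters=K, n_init=10).fit(cal_embeddings)
    pair_clusters = kmeans.labels_[cal_pairs]  # (M, 2) cluster of each image in each pair

    calibrators, set_sizes = {}, {}
    for k in range(K):
        # 2) scores of the calibration pairs with at least one image in cluster k
        in_k = np.any(pair_clusters == k, axis=1)
        # 3) per-cluster calibration map (Platt scaling as a stand-in for beta
        #    calibration; assumes both genuine and imposter pairs land in each cluster)
        calibrators[k] = LogisticRegression().fit(cal_scores[in_k, None], cal_labels[in_k])
        set_sizes[k] = int(in_k.sum())
    return kmeans, calibrators, set_sizes
```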
Test stage
- 1) Given an image pair (\(\boldsymbol{x}_1\), \(\boldsymbol{x}_2\)), compute the embeddings (\(\boldsymbol{z}_1\), \(\boldsymbol{z}_2\)) and the cluster assignment of each: \(k_1\) and \(k_2\) (a code sketch follows this list).
- 2) The model's confidence \(c\) in the pair being genuine is the population-weighted blend of the two clusters' calibrated scores: $$ c(\boldsymbol{x}_1,\boldsymbol{x}_2) = \theta\, \mu_{k_1}\!\big(s(\boldsymbol{x}_1,\boldsymbol{x}_2)\big) + (1-\theta)\, \mu_{k_2}\!\big(s(\boldsymbol{x}_1,\boldsymbol{x}_2)\big), $$ where \(\theta = \frac{|S^{\rm{cal}}_{k_1}|}{|S^{\rm{cal}}_{k_1}|+|S^{\rm{cal}}_{k_2}|}\) is the relative population fraction of the two clusters.
- 3) Given a predefined threshold \(c_{\rm{thr}}: c(\boldsymbol{x}_1,\boldsymbol{x}_2) > c_{\rm{thr}} \implies\) genuine pair!
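Continuing the sketches above (reusing the hypothetical `embed` and the objects returned by `fit_faircal`), the test stage could look like:

```python
import numpy as np

def faircal_verify(x1, x2, embed, kmeans, calibrators, set_sizes, c_thr=0.5):
    """Test stage of the FairCal sketch: calibrated confidence that
    (x1, x2) is a genuine pair, blended across the two images' clusters."""
    z1, z2 = embed(x1), embed(x2)
    s = float(np.dot(z1, z2) / (np.linalg.norm(z1) * np.linalg.norm(z2)))
    k1, k2 = kmeans.predict(np.stack([z1, z2]))  # cluster of each embedding
    # relative population fraction of the two clusters' calibration sets
    theta = set_sizes[k1] / (set_sizes[k1] + set_sizes[k2])
    p1 = calibrators[k1].predict_proba([[s]])[0, 1]
    p2 = calibrators[k2].predict_proba([[s]])[0, 1]
    c = theta * p1 + (1 - theta) * p2
    return c > c_thr  # True => predicted genuine pair
```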
Results
Our results show that, among post-hoc calibration methods:
- 1) FairCal has the best Fairness Calibration;
- 2) FairCal has the best Predictive Equality, i.e., the most equal FPRs across subgroups;
- 3) FairCal has the best global accuracy;
- 4) FairCal does not require knowledge of the sensitive attribute, yet outperforms methods that use this knowledge;
- 5) FairCal does not require retraining the classifier or training any additional model.
Unsupervised Clusters
Unlike the Oracle method, FairCal does not rely on the sensitive attribute; instead it uses unsupervised clusters computed with the K-means algorithm on the feature embeddings of the images. We found these clusters to be semantically meaningful.
Citation
You can see the full paper at https://openreview.net/forum?id=nRj0NcmSuxb. Please cite it as
@inproceedings{salvador2022faircal,
title={FairCal: Fairness Calibration for Face Verification},
author={Tiago Salvador and Stephanie Cairns and Vikram Voleti and Noah Marshall and Adam M Oberman},
booktitle={International Conference on Learning Representations},
year={2022},
url={https://openreview.net/forum?id=nRj0NcmSuxb}
}