# Mitigating Unwanted Biases with Adversarial Learning

@article{Zhang2018MitigatingUB,
  title   = {Mitigating Unwanted Biases with Adversarial Learning},
  author  = {B. Zhang and Blake Lemoine and Margaret Mitchell},
  journal = {Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society},
  year    = {2018}
}

Machine learning is a tool for building models that accurately represent input training data. When undesired biases concerning demographic groups are in the training data, well-trained models will reflect those biases. We present a framework for mitigating such biases by including a variable for the group of interest and simultaneously learning a predictor and an adversary. The input to the network X, here text or census data, produces a prediction Y, such as an analogy completion or income…
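The predictor–adversary setup in the abstract can be sketched in plain NumPy on synthetic data. This is a minimal, illustrative reading of the method: a logistic predictor estimates Y from X, an adversary tries to recover the protected attribute Z from the predictor's logit, and the predictor's gradient is updated with a projection term that removes the component helping the adversary. All variable names, the toy data, and hyperparameters here are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: feature x leaks the protected attribute z; label y depends on x.
n = 2000
z = rng.integers(0, 2, n).astype(float)      # protected attribute
x = rng.normal(z, 1.0, size=n)               # feature correlated with z
y = (x + rng.normal(0, 0.5, n) > 0.5).astype(float)

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

# Predictor: y_hat = sigmoid(w*x + b); adversary: z_hat = sigmoid(u*logit + c)
w, b = 0.0, 0.0
u, c = 0.0, 0.0
alpha, lr = 1.0, 0.1

for step in range(500):
    logit = w * x + b
    y_hat = sigmoid(logit)
    z_hat = sigmoid(u * logit + c)

    # Adversary step: it learns to recover z from the predictor's output.
    dz = z_hat - z
    u -= lr * np.mean(dz * logit)
    c -= lr * np.mean(dz)

    # Predictor gradient of its own (cross-entropy) loss.
    dy = y_hat - y
    gP = np.array([np.mean(dy * x), np.mean(dy)])

    # Gradient of the adversary's loss w.r.t. the predictor's weights.
    gA = np.array([np.mean(dz * u * x), np.mean(dz * u)])

    # Projection-style update: strip the component of gP along gA,
    # then also push against the adversary's objective (scaled by alpha).
    unit = gA / (np.linalg.norm(gA) + 1e-8)
    update = gP - np.dot(gP, unit) * unit - alpha * gA
    w -= lr * update[0]
    b -= lr * update[1]
```

The key design choice is that the predictor never sees the adversary's loss directly; it only sees the gradient correction, which prevents its own gradient from pointing in a direction that would make Z easier to recover.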

#### Supplemental Code

GitHub repository (via Papers with Code): a comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.

#### 477 Citations

Generative Adversarial Networks for Mitigating Biases in Machine Learning Systems

- Computer Science, Mathematics
- ECAI
- 2020

Experimental results show that the proposed solution can efficiently mitigate different types of biases, while at the same time enhancing the prediction accuracy of the underlying machine learning model.

Efficiently Mitigating Classification Bias via Transfer Learning

- Computer Science, Mathematics
- ArXiv
- 2020

An Upstream Bias Mitigation for Downstream Fine-Tuning (UBM) framework is proposed, which mitigates one or multiple bias factors in downstream classifiers via transfer learning from an upstream model.

Towards Learning an Unbiased Classifier from Biased Data via Conditional Adversarial Debiasing

- Computer Science
- ArXiv
- 2021

A novel adversarial debiasing method is presented, which addresses a feature that is spuriously connected to the labels of training images but statistically independent of the labels for test images, so that the automatic identification of relevant features during training is perturbed by irrelevant features.

Fair Representation for Safe Artificial Intelligence via Adversarial Learning of Unbiased Information Bottleneck

- Computer Science
- SafeAI@AAAI
- 2020

Non-discriminated representation is formulated as a dual objective optimization problem of encoding data while obfuscating the information about the protected features in the data representation by exploiting the unbiased information bottleneck.

Data Augmentation for Discrimination Prevention and Bias Disambiguation

- Computer Science
- AIES
- 2020

A novel data augmentation technique to create a fairer dataset for model training, which could also lend itself to understanding the type of bias in the dataset, i.e., whether bias arises from a lack of representation of a particular group (sampling bias) or from human bias reflected in the labels (prejudice-based bias).

Latent Adversarial Debiasing: Mitigating Collider Bias in Deep Neural Networks

- Computer Science
- ArXiv
- 2020

It is argued herein that the cause of failure is a combination of the deep structure of neural networks and the greedy gradient-driven learning process used, one that prefers easy-to-compute signals when available.

Bias-Resilient Neural Network

- Computer Science
- ArXiv
- 2019

A method based on the adversarial training strategy to learn discriminative features that are unbiased and invariant to the confounder(s), by incorporating a new adversarial loss function that encourages a vanishing correlation between the bias and the learned features.

Learning Fair Representations via an Adversarial Framework

- Computer Science, Mathematics
- ArXiv
- 2019

A minimax adversarial framework, with a generator that captures the data distribution and generates latent representations and a critic that ensures the distributions across different protected groups are similar, provides a theoretical guarantee with respect to statistical parity and individual fairness.

Adversarial Removal of Demographic Attributes from Text Data

- Computer Science, Mathematics
- EMNLP
- 2018

It is shown that demographic information of authors is encoded in—and can be recovered from—the intermediate representations learned by text-based neural classifiers, and the implication is that decisions of classifiers trained on textual data are not agnostic to—and likely condition on—demographic attributes.

Inherent Tradeoffs in Learning Fair Representations

- Computer Science
- NeurIPS
- 2019

This paper provides the first result that quantitatively characterizes the tradeoff between demographic parity and the joint utility across different population groups and proves that if the optimal decision functions across different groups are close, then learning fair representations leads to an alternative notion of fairness, known as the accuracy parity.
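Demographic parity, the fairness notion whose tradeoff with utility the paper above characterizes, is straightforward to measure: a classifier satisfies it when both groups receive positive predictions at the same rate. A minimal sketch, with illustrative data:

```python
import numpy as np

def demographic_parity_gap(y_pred, z):
    """Absolute difference in positive-prediction rates between groups z=0 and z=1."""
    y_pred, z = np.asarray(y_pred), np.asarray(z)
    return abs(y_pred[z == 0].mean() - y_pred[z == 1].mean())

preds = np.array([1, 0, 1, 1, 0, 0, 1, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gap = demographic_parity_gap(preds, groups)
# group 0 positive rate = 3/4, group 1 positive rate = 1/4 → gap = 0.5
```

A gap of 0 means exact demographic parity; the cited tradeoff results bound how small this gap can be made without sacrificing accuracy in at least one group.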

#### References

Showing 1-10 of 16 references

Data Decisions and Theoretical Implications when Adversarially Learning Fair Representations

- Computer Science
- ArXiv
- 2017

An adversarial training procedure is used to remove information about the sensitive attribute from the latent representation learned by a neural network, and the data distribution empirically drives the adversary's notion of fairness.

Generative Adversarial Nets

- Computer Science
- NIPS
- 2014

We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a…

A statistical framework for fair predictive algorithms

- Mathematics, Computer Science
- ArXiv
- 2016

A method to remove bias from predictive models by removing all information regarding protected variables from the permitted training data is proposed, and it is general enough to accommodate arbitrary data types, e.g. binary, continuous, etc.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

- Computer Science, Mathematics
- NIPS
- 2016

This work empirically demonstrates that its algorithms significantly reduce gender bias in embeddings while preserving their useful properties, such as the ability to cluster related concepts and to solve analogy tasks.

Equality of Opportunity in Supervised Learning

- Computer Science, Mathematics
- NIPS
- 2016

This work proposes a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features, and shows how to optimally adjust any learned predictor so as to remove discrimination according to this definition.
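The criterion proposed in the reference above (equality of opportunity) requires equal true-positive rates across groups, rather than equal overall positive rates. A minimal sketch of measuring that gap, with illustrative data:

```python
import numpy as np

def equal_opportunity_gap(y_true, y_pred, z):
    """Absolute difference in true-positive rates between groups z=0 and z=1."""
    y_true, y_pred, z = map(np.asarray, (y_true, y_pred, z))
    tprs = []
    for g in (0, 1):
        positives = (z == g) & (y_true == 1)   # qualified members of group g
        tprs.append(y_pred[positives].mean())  # fraction correctly predicted positive
    return abs(tprs[0] - tprs[1])

y_true = np.array([1, 1, 0, 1, 1, 0, 1, 1])
y_pred = np.array([1, 1, 0, 0, 1, 1, 0, 0])
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
gap = equal_opportunity_gap(y_true, y_pred, groups)
# group 0 TPR = 2/3, group 1 TPR = 1/3 → gap ≈ 0.333
```

Unlike demographic parity, this criterion conditions on the true label, so a perfect classifier always satisfies it.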

Adam: A Method for Stochastic Optimization

- Computer Science, Mathematics
- ICLR
- 2015

This work introduces Adam, an algorithm for first-order gradient-based optimization of stochastic objective functions, based on adaptive estimates of lower-order moments, and provides a regret bound on the convergence rate that is comparable to the best known results under the online convex optimization framework.

Inherent Trade-Offs in the Fair Determination of Risk Scores

- Computer Science, Mathematics
- ITCS
- 2017

Some of the ways in which key notions of fairness are incompatible with each other are suggested, and hence a framework for thinking about the trade-offs between them is provided.

Distributed Representations of Words and Phrases and their Compositionality

- Computer Science, Mathematics
- NIPS
- 2013

This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
