Getting started with research on AI fairness in medical imaging
FAIMI has put together this resource page to help researchers get started with research on the fairness of AI for medical imaging. We would like to see this as an organically evolving resource, so please get in touch with us if you have any suggestions for additions or modifications.
Literature
The literature on AI fairness in medical imaging is growing rapidly, and we do not attempt to provide an exhaustive list of related publications here. Rather, we list a few key references for specific areas of fairness research that can act as starting points for your own literature searches.
Review papers on AI fairness
- Du et al (2020), Fairness in Deep Learning: A Computational Perspective, IEEE Intelligent Systems. (arXiv)
- Mehrabi et al (2021), A Survey on Bias and Fairness in Machine Learning, ACM Computing Surveys. (arXiv)
- Chen et al (2023), Algorithmic Fairness in Artificial Intelligence for Medicine and Healthcare, Nature Biomedical Engineering.
- Xu et al (2024), Addressing Fairness Issues in Deep Learning-based Medical Image Analysis: A Systematic Review, npj Digital Medicine.
Seminal works on AI fairness
- Angwin et al (2016), Machine Bias, ProPublica. The COMPAS investigation, which exposed racial bias in a machine learning algorithm used to predict recidivism.
- Hardt et al (2016), Equality of Opportunity in Supervised Learning, NeurIPS. (arXiv) Introduced the concepts of equality of opportunity and equalized odds in algorithmic fairness. Characterizes trade-offs and provides optimal post-processing methods.
- Chouldechova (2017), Fair Prediction with Disparate Impact: A Study of Bias in Recidivism Prediction Instruments, Big Data. A re-analysis and re-discussion of the COMPAS case. The key result is that equal PPV and equal error rates are incompatible when base rates differ between groups (a short derivation follows this list).
- Buolamwini and Gebru (2018), Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification, FAccT. One of the earliest papers to uncover AI bias in image classification.
- Obermeyer et al (2019), Dissecting Racial Bias in an Algorithm Used to Manage the Health of Populations, Science. Demonstration that an algorithm actively used to distribute healthcare system resources was severely biased against Black patients.
- Barocas et al (2019), Fairness and Machine Learning: Limitations and Opportunities. A full-length book covering many aspects of algorithmic fairness.
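To make the Chouldechova (2017) result above concrete: for a binary classifier evaluated on a group with disease prevalence (base rate) $p$, the error rates and positive predictive value are tied together by the identity

$$\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot(1-\mathrm{FNR}),$$

which follows from $\mathrm{PPV} = p\,\mathrm{TPR}\,/\,\bigl(p\,\mathrm{TPR} + (1-p)\,\mathrm{FPR}\bigr)$ with $\mathrm{TPR} = 1-\mathrm{FNR}$. Hence, if two groups differ in prevalence $p$, no classifier (short of a perfect one) can equalise PPV, FPR, and FNR across them simultaneously.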
Perspectives on what constitutes AI fairness
- Corbett-Davies and Goel (2018), The Measure and Mismeasure of Fairness: A Critical Review of Fair Machine Learning, arXiv. (As of 2023, a version of this is now also in JMLR.)
- Rajkomar et al (2018), Ensuring Fairness in Machine Learning to Advance Health Equity, Annals of Internal Medicine.
- McCradden et al (2020), Ethical Limitations of Algorithmic Fairness Solutions in Health Care Machine Learning, The Lancet Digital Health.
- Sambasivan et al (2021), Re-imagining Algorithmic Fairness in India and Beyond, FAccT.
- Ricci Lara et al (2022), Addressing Fairness in Artificial Intelligence for Medical Imaging, Nature Communications.
- Petersen et al (2023), The Path Toward Equal Performance in Medical Machine Learning, Patterns.
Shortcut learning, models recognizing sensitive patient attributes, and fairness in medical AI
- Geirhos et al (2020), Shortcut Learning in Deep Neural Networks, Nature Machine Intelligence. General overview of shortcut learning in deep neural networks, not medicine-specific.
- Yi et al (2021), Radiology “Forensics”: Determination of Age and Sex from Chest Radiographs Using Deep Learning, Emergency Radiology, and Gichoya et al (2022), AI Recognition of Patient Race in Medical Imaging: A Modelling Study, Lancet Digital Health. AI recognition of patient age, sex, and race from chest x-rays, which raises the possibility of bias in AI models trained using such images.
- Glocker et al (2023), Algorithmic Encoding of Protected Characteristics in Chest X-ray Disease Detection Models, eBioMedicine. Does encoding of protected characteristics in an AI model necessarily lead to bias?
- Brown et al (2023), Detecting Shortcut Learning for Fair Medical AI Using Shortcut Testing, Nature Communications. A method for detecting when a model relies on shortcut learning (see the probe sketch after this list).
- Zou et al (2023), Implications of Predicting Race Variables from Medical Images, Science.
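A common first test of whether a trained disease model encodes a protected attribute (the question studied by Glocker et al above) is to fit a simple probe on its learned features. Below is a minimal, hypothetical sketch, not the exact protocol of any of the papers listed: it assumes you have exported penultimate-layer features and binary attribute labels to the (made-up) files `features.npy` and `attribute.npy`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical inputs: (n, d) penultimate-layer features of a trained
# disease model, and a binary protected attribute for the same patients.
features = np.load("features.npy")
attribute = np.load("attribute.npy")

# Linear probe: if a simple classifier can predict the attribute from the
# features, the attribute is (at least linearly) encoded in the representation.
probe = LogisticRegression(max_iter=1000)
aucs = cross_val_score(probe, features, attribute, cv=5, scoring="roc_auc")
print(f"probe AUC: {aucs.mean():.3f} +/- {aucs.std():.3f}")  # ~0.5 = no linear encoding
```

Note that a high probe AUC shows the attribute is recoverable from the representation, not that downstream predictions are biased; the shortcut testing of Brown et al goes further by relating encoding strength to performance gaps.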
Quantitative comparisons
- Zhang et al (2022), Improving the Fairness of Chest X-ray Classifiers, CHIL. Comparison of multiple approaches for addressing bias in chest X-ray classification and evaluation using different definitions of fairness.
- Lee et al (2023), An Investigation Into the Impact of Deep Learning Model Choice on Sex and Race Bias in Cardiac MR Segmentation, MICCAI FAIMI workshop. Comparison of bias characteristics of different deep learning models including CNNs and a vision transformer.
- Zong et al (2023), MEDFAIR: Benchmarking Fairness for Medical Imaging, ICLR. Comparison of many standard fairness methods on ten medical image datasets, spanning chest x-rays, brain MRIs, retinal fundus images, dermatoscopic images, heart CT, lung CT, and SD-OCT.
Applied AI fairness research in medical imaging
AI fairness for chest x-rays
- Larrazabal et al (2020), Gender Imbalance in Medical Imaging Datasets Produces Biased Classifiers for Computer-aided Diagnosis, Proceedings of the National Academy of Sciences (USA).
- Seyyed-Kalantari et al (2021), Underdiagnosis Bias of Artificial Intelligence Algorithms Applied to Chest Radiographs in Under-served Patient Populations, Nature Medicine.
- Zhang et al (2022), Improving the Fairness of Chest X-ray Classifiers, CHIL.
- Lin et al (2023), Improving Model Fairness in Image-based Computer-aided Diagnosis, Nature Communications.
- Glocker et al (2023), Risk of Bias in Chest Radiography Deep Learning Foundation Models, Radiology: Artificial Intelligence.
AI fairness in image reconstruction
- Du et al (2023), Unveiling Fairness Biases in Deep Learning-Based Brain MRI Reconstruction, MICCAI FAIMI workshop.
AI fairness for dermatology images
- Abbasi-Sureshjani et al (2020), Risk of Training Diagnostic Algorithms on Data with Demographic Bias, MICCAI.
- Kinyanjui et al (2020), Fairness of Classifiers Across Skin Tones in Dermatology, MICCAI.
- Daneshjou et al (2022), Disparities in Dermatology AI Performance on a Diverse, Curated Clinical Image Set, Science Advances.
- Pakzad et al (2022), CIRCLe: Color Invariant Representation Learning for Unbiased Classification of Skin Lesions, ECCV Workshop on Skin Image Analysis.
- Xu et al (2023), FairAdaBN: Mitigating Unfairness with Adaptive Batch Normalization and Its Application to Dermatological Disease Classification, MICCAI.
- Bencevic et al (2024), Understanding Skin Color Bias in Deep Learning-Based Skin Lesion Segmentation, Computer Methods and Programs in Biomedicine.
AI fairness for brain MRI
- Petersen et al (2022), Feature Robustness and Sex Differences in Medical Imaging: A Case Study in MRI-Based Alzheimer’s Disease Detection, MICCAI.
- Ioannou et al (2022), A Study of Demographic Bias in CNN-Based Brain MR Segmentation, MICCAI workshop on Machine Learning in Clinical Neuroimaging.
- Wang et al (2023), Bias in Machine Learning Models can be Significantly Mitigated by Careful Training: Evidence from Neuroimaging Studies, Proceedings of the National Academy of Sciences (USA).
- Klingenberg et al (2023), Higher Performance for Women than Men in MRI-based Alzheimer’s Disease Detection, Alzheimer’s Research & Therapy.
AI fairness for cardiac MRI
- Puyol-Antón et al (2021), Fairness in Cardiac MR Image Analysis: An Investigation of Bias Due to Data Imbalance in Deep Learning Based Segmentation, MICCAI.
- Puyol-Antón et al (2022), Fairness in Cardiac Magnetic Resonance Imaging: Assessing Sex and Racial Bias in Deep Learning-Based Segmentation, Frontiers in Cardiovascular Medicine.
- Lee et al (2022), A Systematic Study of Race and Sex Bias in CNN-Based Cardiac MR Segmentation, MICCAI Workshop on Statistical Atlases and Computational Models of the Heart.
- Lee et al (2023), An Investigation Into the Impact of Deep Learning Model Choice on Sex and Race Bias in Cardiac MR Segmentation, MICCAI FAIMI workshop.
AI fairness for ophthalmology
- Burlina et al (2021), Addressing Artificial Intelligence Bias in Retinal Diagnostics, Translational Vision Science & Technology.
- Lin et al (2023), Improving Model Fairness in Image-based Computer-aided Diagnosis, Nature Communications.
AI fairness for histology
- Vaidya et al (2024), Demographic Bias in Misdiagnosis by Computational Pathology Models, Nature Medicine.
AI fairness for breast DCE-MRI
- Huti et al (2023), An Investigation Into Race Bias in Random Forest Models Based on Breast DCE-MRI Derived Radiomics Features, MICCAI FAIMI workshop.
AI fairness for medical image segmentation
- Puyol-Antón et al (2021), Fairness in Cardiac MR Image Analysis: An Investigation of Bias Due to Data Imbalance in Deep Learning Based Segmentation, MICCAI.
- Puyol-Antón et al (2022), Fairness in Cardiac Magnetic Resonance Imaging: Assessing Sex and Racial Bias in Deep Learning-Based Segmentation, Frontiers in Cardiovascular Medicine.
- Lee et al (2022), A Systematic Study of Race and Sex Bias in CNN-Based Cardiac MR Segmentation, MICCAI Workshop on Statistical Atlases and Computational Models of the Heart.
- Lee et al (2023), An Investigation Into the Impact of Deep Learning Model Choice on Sex and Race Bias in Cardiac MR Segmentation, MICCAI FAIMI workshop.
- Ioannou et al (2022), A Study of Demographic Bias in CNN-Based Brain MR Segmentation, MICCAI workshop on Machine Learning in Clinical Neuroimaging.
- Gaggion et al (2023), Unsupervised Bias Discovery in Medical Image Segmentation, MICCAI FAIMI workshop.
- Tian et al (2023), Harvard FairSeg: A Large-Scale Medical Image Segmentation Dataset for Fairness Learning Using Segment Anything Model with Fair Error-Bound Scaling, arXiv.
- Bencevic et al (2024), Understanding Skin Color Bias in Deep Learning-Based Skin Lesion Segmentation, Computer Methods and Programs in Biomedicine.
- Siddiqui et al (2024), Fair AI-Powered Orthopedic Image Segmentation: Addressing Bias and Promoting Equitable Healthcare, Scientific Reports.
Miscellaneous
- Simoiu et al (2017), The Problem of Infra-Marginality in Outcome Tests for Discrimination, The Annals of Applied Statistics, and Kearns et al (2018), Preventing Fairness Gerrymandering: Auditing and Learning for Subgroup Fairness, ICML: Fairness with respect to one protected attribute can hide aggravated unfairness with respect to another (a.k.a. inframarginality / fairness gerrymandering / subgroup fairness).
- Wick et al (2019), Unlocking Fairness: a Trade-off Revisited, NeurIPS, and Dutta et al (2020), Is There a Trade-Off Between Fairness and Accuracy? A Perspective Using Mismatched Hypothesis Testing, ICML: Observed fairness-accuracy trade-offs may be illusory and purely a result of label bias. Optimizing for fairness may also yield performance-optimal models, even if evaluations on (equally biased) test data suggest otherwise. Also see Sharma et al (2023), On Testing and Comparing Fair Classifiers under Data Bias, arXiv, on this subject.
- Lazar Reich and Vijaykumar (2020), A Possibility in Algorithmic Fairness: Can Calibration and Equal Error Rates be Reconciled?, FORC: Contrary to popular belief, equalized odds (i.e., equal TPR and FPR) and calibration by groups are compatible.
- Wachter et al (2021), Bias Preservation in Machine Learning: The Legality of Fairness Metrics Under EU Non-Discrimination Law, West Virginia Law Review, and Wachter et al (2023), The Unfairness of Fair Machine Learning: Levelling Down and Strict Egalitarianism by Default, Michigan Technology Law Review: A legal perspective on AI fairness and the “levelling down” phenomenon in “fair” machine learning.
- Mukherjee et al (2022), Confounding Factors Need to be Accounted for in Assessing Bias by Machine Learning Algorithms, Nature Medicine.
- Schrouff et al (2022), Diagnosing Failures of Fairness Transfer Across Distribution Shift in Real-world Medical Settings, NeurIPS. Are bias mitigation strategies robust to real-world domain shifts?
- Zhao and Gordon (2022), Inherent Tradeoffs in Learning Fair Representations, JMLR. Theoretical analysis of how group-invariant representations and statistical/demographic parity hurt accuracy in the presence of base rate differences between groups.
- Ricci Lara et al (2023), Towards Unraveling Calibration Biases in Medical Image Analysis, MICCAI FAIMI workshop, and Petersen et al (2023), On (Assessing) the Fairness of Risk Score Models, FAccT: Standard calibration error metrics (such as ECE) are biased with respect to the evaluation sample size, which must be taken into account when comparing calibration between (protected) groups of different sizes (a numerical illustration follows this list).
- Jones et al (2023), No Fair Lunch: A Causal Perspective on Dataset Bias in Machine Learning for Medical Imaging, arXiv.
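The sample-size sensitivity of ECE noted by Ricci Lara et al and Petersen et al above is easy to reproduce. The sketch below is a self-contained toy example (not code from either paper): it scores a perfectly calibrated predictor on a small and a large group, and the smaller group receives a worse ECE purely from finite-sample noise.

```python
import numpy as np

def ece(probs, labels, n_bins=10):
    """Expected calibration error with equal-width probability bins."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(probs, edges) - 1, 0, n_bins - 1)
    n, total = len(probs), 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            # |mean outcome - mean confidence| weighted by bin occupancy
            total += mask.sum() / n * abs(labels[mask].mean() - probs[mask].mean())
    return total

rng = np.random.default_rng(0)
for group_size in (200, 20_000):            # small vs. large (protected) group
    p = rng.uniform(size=group_size)        # predicted risks
    y = rng.uniform(size=group_size) < p    # outcomes drawn at the predicted risk,
                                            # i.e. perfectly calibrated by construction
    print(group_size, round(ece(p, y.astype(float)), 4))
```

Comparing raw ECE values between demographic groups of very different sizes can therefore manufacture apparent calibration "bias"; subsampling groups to equal size, or using debiased calibration estimators, avoids this artefact.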
Software toolkits
Although one can investigate fairness issues with standard software environments and packages, a number of researchers have developed specialised toolkits that facilitate fairness and bias assessments, and you may find it more efficient to use one of these. A minimal usage sketch follows the list.
- AI Fairness 360, Bellamy et al (2018). Initially created by IBM, now independent.
- FairLearn, Bird et al (2020). Initially created by Microsoft, now independent.
- Aequitas, Saleiro et al (2018). Open source bias audit toolkit for machine learning developers (not imaging).
- MEDFAIR, Zong et al (2023), ICLR. Fairness benchmarking suite for medical imaging.
- FairMedFM, Jin et al (2024), NeurIPS. Fairness benchmarking suite for medical imaging foundation models.
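As a flavour of what these toolkits provide, here is a minimal sketch of a subgroup audit using FairLearn's `MetricFrame`. The data below is synthetic; in practice you would substitute your model's predictions and a real sensitive attribute column.

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score
from fairlearn.metrics import MetricFrame

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)            # ground-truth labels
y_pred = np.where(rng.uniform(size=1000) < 0.8,   # synthetic, ~80%-accurate predictions
                  y_true, 1 - y_true)
sex = rng.choice(["F", "M"], size=1000)           # sensitive attribute

mf = MetricFrame(
    metrics={"accuracy": accuracy_score, "TPR": recall_score},
    y_true=y_true,
    y_pred=y_pred,
    sensitive_features=sex,
)
print(mf.by_group)       # per-group metric table
print(mf.difference())   # largest between-group gap for each metric
```

The same pattern (per-group metrics plus a between-group disparity summary) underlies most of the toolkits listed above; MEDFAIR and FairMedFM additionally bundle medical imaging datasets, models, and mitigation baselines.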
Initiatives, guidelines and legislation
Below are resources related to data collection and research initiatives, guidelines on fairness in AI, and government efforts to legislate the use of AI, many of which make reference to fairness and bias.
Initiatives:
- STANDING Together, Ganapathi et al (2022), Nature Medicine. Promotes the formation of inclusive, diverse and transparent medical datasets.
- “All of Us” research programme, All of Us Research Program Investigators (2019), NEJM. US initiative to acquire diverse medical data.
- Fairness of AI in Medical Imaging (FAIMI): that’s us, an independent academic initiative aimed at exploring and promoting fair AI in medical imaging.
Guidelines:
- FUTURE-AI aims to establish guidelines for AI in healthcare, including fairness as a key principle.
- Algorithmic Bias Playbook (Obermeyer et al, 2021): high-level discussion addressing aspects such as label choice bias.
Legislation/white papers on regulation of AI:
- Global AI Legislation Tracker
- European Commission, “Proposal for a Regulation of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act) and Amending Certain Union Legislative Acts,” 2021
- UK Government Department for Science, Innovation and Technology and Office for Artificial Intelligence, “A Pro-innovation Approach to AI Regulation,” 2023
- US Government, “Blueprint for an AI Bill of Rights,” 2022
- A state-by-state summary of US AI legislation can be found here
- ISO has published a technical standard on AI bias (ISO/IEC TR 24027:2021), IEEE has one under development (IEEE P7003, Algorithmic Bias Considerations), and NIST has published a proposal document (NIST Special Publication 1270)
Datasets
Unfortunately, most currently available medical imaging datasets do not include associated demographic information such as sex and race, which is essential for much fairness research. Below is a summary of the most commonly used datasets that do include such information; a short sketch for auditing a dataset's demographic composition follows the list.
- UK Biobank. Database of half a million participants from the UK population, including approximately 100,000 with imaging (heart, brain and abdominal MRI) together with a wide range of demographic information. Available worldwide to approved projects upon payment of an administration fee.
- ADNI, Mueller et al (2005), Neuroimaging Clinics of North America: Multisite study of brain imaging (MRI), biochemical, and genetic data with Alzheimer’s diagnosis status and demographic information including race and sex.
- ISIC challenge datasets. The International Skin Imaging Collaboration (ISIC) runs yearly challenges for AI processing of dermatology images. Some of these have associated skin tone information.
- PAD-UFES, Pacheco et al (2020), Data in Brief: skin lesion dataset, dermatology images with 22 clinical parameters including age and Fitzpatrick skin type.
- Diverse Dermatology Images, Daneshjou et al (2022), Science Advances. Database of 656 dermatology images from 570 unique patients, approximately balanced by skin tone. Expert annotations of lesion diagnosis and Fitzpatrick skin tone. Freely available.
- Fitzpatrick 17k dataset, Groh et al (2021), CVPR workshops. Database of 17k dermatology images annotated with Fitzpatrick skin type labels.
- NIH chest X-ray dataset, Wang et al (2017), CVPR. 112,120 chest X-ray images from 30,805 patients, labelled with 14 common thorax diseases and demographic information including age and sex.
- CheXpert chest X-ray dataset, Irvin et al (2019), AAAI. 224,316 chest X-rays of 65,240 patients with age and sex information.
- PadChest, Bustos et al (2020), Medical Image Analysis. Chest X-ray dataset of 160,000 images from 67,000 patients; includes age.
- Duke-Breast-Cancer-MRI, Saha et al (2018), British Journal of Cancer. Dataset of dynamic contrast-enhanced MRI images of women with breast cancer, includes images, derived radiomics, tumour segmentations and patient demographics including race. Freely available upon registration.
- OASIS. Series of brain MR datasets with age and gender information.
- FairSeg, Tian et al (2023). Public dataset of scanning laser ophthalmoscopy fundus images for assessing bias in segmentation of the optic disc and cup.
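Whichever dataset you choose, it is worth auditing the demographic composition of your splits before running any fairness analysis, since small subgroups make fairness estimates noisy. A minimal sketch, assuming a metadata table with `Sex` and `Age` columns (as in, e.g., CheXpert's `train.csv`; column names vary by dataset):

```python
import pandas as pd

# Hypothetical path; substitute the metadata file of your chosen dataset.
meta = pd.read_csv("train.csv")

# Subgroup counts and proportions: small subgroups drive noisy fairness estimates.
counts = meta["Sex"].value_counts()
print(counts, (counts / len(meta)).round(3), sep="\n")

# Age distribution per sex, to spot confounding between attributes.
print(meta.groupby("Sex")["Age"].describe()[["count", "mean", "std"]])
```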
Talks
2021
- Marzyeh Ghassemi, “The Fairest of Them All: Privacy, Data and Machine Learning for Health”
- Eran Tal, “Accuracy and Fairness in Machine Learning: Lessons From Measurement”
- Natalia Martinez, “Blind Pareto Fairness and Subgroup Robustness”
- Nithya Sambasivan, “Reimagining ML Fairness in India and Beyond”
- Abeba Birhane, “The Limits of Fairness”
- Sara Gerke, “Legal Issues of Artificial Intelligence in Healthcare in the US”
2022
- FAIMI 2022 Online Symposium talks
- Judy Wawira Gichoya, “Hidden in Plain Sight: An Update to the Reading Race Project”
- Sanmi Koyejo, “Algorithmic Fairness: Why it’s Hard, and Why it’s Interesting”, CVPR Tutorial: Part 1, Part 2
- Jessica Schrouff, “Maintaining Fairness Under Distribution Shift”
2023
- FAIMI 2023 Online Symposium talks
- MICCAI FAIMI 2023 Workshop talks
- Can AI Be Harmful? A Conversation with MIT’s Dr. Marzyeh Ghassemi on the NEJM AI Grand Rounds Podcast
- The Double-Edged Sword of AI, with Dr. Ziad Obermeyer on the NEJM AI Grand Rounds Podcast
- Karim Lekadir, “In AI we trust? Towards ethical AI in medical imaging.”
- Bias in AI: Origins and Solutions feat. Dr. Joy Buolamwini on ASK MORE OF AI with Clara Shih