AI & Engineering Academics › 🌲 AI Forest › Lessons › Responsible AI Governance
🛡️ AI Forest • Advanced • ⏱️ 19 min read

Responsible AI Governance

Responsible AI is not a checkbox exercise. It is an engineering discipline that requires the same rigour as security or reliability. Companies that treat ethics as an afterthought end up in headlines — Amazon's biased hiring tool, healthcare algorithms that deprioritised Black patients, and facial recognition systems that failed on darker skin tones are not edge cases. They are the predictable result of shipping without governance.

Governance Frameworks

Two frameworks dominate the industry:

NIST AI Risk Management Framework (AI RMF)

The US National Institute of Standards and Technology provides a voluntary, structured approach built around four functions:

  1. Govern — Establish policies, roles, and accountability structures
  2. Map — Identify and document AI risks in context
  3. Measure — Assess risks using quantitative and qualitative methods
  4. Manage — Prioritise and mitigate identified risks

The AI RMF is not prescriptive — it does not tell you what to measure. Instead, it provides a thinking framework that organisations adapt to their specific context.

ISO/IEC 42001

The first international standard for an AI Management System (AIMS). Unlike the NIST framework, ISO 42001 is certifiable — third-party auditors can verify compliance. It covers the entire AI lifecycle: from organisational context and leadership commitment through risk assessment, development controls, and continual improvement.

Certification signals to customers, regulators, and partners that your AI governance is externally validated — increasingly important for enterprise sales.

Figure: the four pillars of AI governance — documentation, fairness, explainability, and accountability.
Effective AI governance rests on four pillars — each requires dedicated tooling and processes.

Model Cards and Datasheets

Documentation is the foundation of governance. Two formats have become industry standard:

Model cards (introduced by Google) document a model's intended use, evaluated performance across subgroups, ethical considerations, and known limitations. Every model you deploy should have one.

Datasheets for datasets document how training data was collected, what it contains, known biases, recommended uses, and maintenance plans. If you cannot describe your training data, you cannot govern the model trained on it.

These are not bureaucratic overhead. They are the difference between "we did not know the model was biased" and "we documented the known limitations and implemented mitigations."
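A model card can live in code next to the model it describes, so it is versioned and reviewed alongside it. The sketch below loosely follows the sections of the Mitchell et al. paper as a Python dataclass; the field names and example values are illustrative, not a standard schema.

```python
from dataclasses import dataclass, asdict

@dataclass
class ModelCard:
    """Minimal model-card sketch; fields roughly mirror the paper's sections."""
    model_name: str
    intended_use: str
    out_of_scope_uses: list
    subgroup_metrics: dict          # performance evaluated per subgroup
    ethical_considerations: str
    known_limitations: list

# Hypothetical example — values invented for illustration.
card = ModelCard(
    model_name="loan-approval-v3",
    intended_use="Rank applications for human review; not for automated denial.",
    out_of_scope_uses=["fully automated decisions", "non-lending contexts"],
    subgroup_metrics={"accuracy": {"group_a": 0.91, "group_b": 0.88}},
    ethical_considerations="Trained on historical approvals; may reflect past bias.",
    known_limitations=["unvalidated outside the original market"],
)
```

Serialising the dataclass (e.g. via `asdict`) gives a structure that can be rendered to a human-readable card or checked in CI for missing sections.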

🤯
Google's original Model Cards paper (2019) found that simply requiring documentation caused teams to discover and fix bias issues they had not previously noticed — the act of writing it down forced critical thinking.

Bias Auditing Pipelines

Bias auditing should be automated and run on every model version, not performed once at launch and forgotten.

A production bias auditing pipeline:

  1. Define protected attributes — Gender, ethnicity, age, disability, and other legally protected characteristics relevant to your domain
  2. Slice evaluation data — Break your test set into subgroups by protected attributes
  3. Compute fairness metrics — Measure performance disparities across groups (see below)
  4. Set thresholds — Define acceptable disparity limits (e.g., no subgroup accuracy below 85%)
  5. Gate deployments — Block model promotion to production if thresholds are violated
  6. Log and track trends — Monitor fairness metrics over time, not just at deployment
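The slicing, threshold, and gating steps above fit in a few lines of Python. This is a minimal sketch — the record format, function names, and the 0.85 threshold are illustrative choices, not a standard API.

```python
from collections import defaultdict

def subgroup_accuracy(records):
    """Accuracy per subgroup. Each record is (group, y_true, y_pred)."""
    correct, total = defaultdict(int), defaultdict(int)
    for group, y_true, y_pred in records:
        total[group] += 1
        correct[group] += int(y_true == y_pred)
    return {g: correct[g] / total[g] for g in total}

def gate_deployment(records, threshold=0.85):
    """Return (ok, failing_groups); promotion is blocked when ok is False."""
    scores = subgroup_accuracy(records)
    failing = {g: s for g, s in scores.items() if s < threshold}
    return len(failing) == 0, failing
```

In a real pipeline this check would run in CI against every candidate model version, with the failing groups and their scores logged so disparities can be tracked over time.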
🧠 Quick Check

When should bias auditing be performed in the ML lifecycle?

Fairness Metrics

There is no single definition of "fair." Different metrics encode different philosophical positions, and they are often mathematically incompatible:

  • Demographic parity โ€” Each group receives positive outcomes at roughly equal rates. Simple but can conflict with accuracy.
  • Equalised odds โ€” True positive and false positive rates are equal across groups. Stronger than demographic parity.
  • Predictive parity โ€” Precision (positive predictive value) is equal across groups.
  • Individual fairness โ€” Similar individuals receive similar predictions, regardless of group membership.

The critical insight: when base rates differ between groups, you cannot simultaneously satisfy all of these fairness criteria (Chouldechova's impossibility theorem). You must choose which definition of fairness matters most for your specific application and document why.
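The first two metrics are easy to compute from labelled predictions, and a small example makes the tension concrete. A minimal sketch, assuming binary labels and a (group, y_true, y_pred) record format of my own choosing:

```python
from collections import defaultdict

def demographic_parity(records):
    """Positive-prediction rate per group; parity means roughly equal rates."""
    pos, tot = defaultdict(int), defaultdict(int)
    for group, _, y_pred in records:
        tot[group] += 1
        pos[group] += int(y_pred == 1)
    return {g: pos[g] / tot[g] for g in tot}

def equalised_odds(records):
    """(TPR, FPR) per group; equalised odds means both match across groups."""
    counts = defaultdict(lambda: [0, 0, 0, 0])  # tp, fn, fp, tn
    for group, y_true, y_pred in records:
        c = counts[group]
        if y_true == 1:
            c[0 if y_pred == 1 else 1] += 1
        else:
            c[2 if y_pred == 1 else 3] += 1
    return {g: (tp / (tp + fn), fp / (fp + tn))
            for g, (tp, fn, fp, tn) in counts.items()}
```

With suitably chosen data, two groups can receive positive predictions at identical rates (demographic parity holds) while their error rates differ sharply (equalised odds is violated) — the incompatibility is visible even in toy examples.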

🤔
Think about it: A healthcare algorithm must allocate limited specialist appointments. Demographic parity would give equal appointment rates across racial groups. Equalised odds would ensure the algorithm is equally accurate at identifying who needs care. These criteria conflict. Which would you choose, and how would you justify that decision to affected communities?

Explainability Tools

Stakeholders — users, regulators, and internal teams — need to understand why a model made a specific prediction.

SHAP (SHapley Additive exPlanations)

Based on game theory, SHAP assigns each feature an importance value for a specific prediction. It answers: "How much did each input feature contribute to this particular output?" SHAP values are additive — they sum to the difference between the model's prediction and the baseline.
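In practice you would use the `shap` library, which approximates these values efficiently, but the underlying idea fits in a few lines. This sketch computes exact Shapley values by enumerating feature coalitions (exponential in the number of features, so toy models only); absent features are filled in from a baseline input.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values by coalition enumeration.

    `model` takes a list of feature values; features outside the coalition
    are replaced by the corresponding baseline value.
    """
    n = len(x)

    def eval_coalition(S):
        return model([x[i] if i in S else baseline[i] for i in range(n)])

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            for S in combinations(others, k):
                S = set(S)
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                phi[i] += w * (eval_coalition(S | {i}) - eval_coalition(S))
    return phi
```

For a toy model with an interaction term, the returned values sum to f(x) − f(baseline), which is exactly the additivity property described above.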

LIME (Local Interpretable Model-agnostic Explanations)

LIME explains individual predictions by fitting a simple, interpretable model (like linear regression) to the local neighbourhood of the input. It works with any model but can be unstable — small input changes sometimes produce very different explanations.
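The real `lime` package also handles text and images, but the core trick for tabular inputs is small enough to sketch with NumPy: perturb around the input, weight samples by proximity, and read local importance off a weighted linear fit. The function name and Gaussian kernel choice here are my own, not the package's API.

```python
import numpy as np

def local_linear_explanation(predict, x, n_samples=500, scale=0.1, seed=0):
    """LIME-style sketch: fit a weighted linear surrogate around x.

    `predict` maps an (n, d) array to (n,) outputs; the returned
    coefficients approximate local per-feature importance.
    """
    rng = np.random.default_rng(seed)
    X = x + rng.normal(0.0, scale, size=(n_samples, len(x)))
    y = predict(X)
    # Proximity kernel: perturbations closer to x count more.
    w = np.exp(-np.sum((X - x) ** 2, axis=1) / (2 * scale ** 2))
    A = np.hstack([X, np.ones((n_samples, 1))])     # intercept column
    sw = np.sqrt(w)[:, None]
    coef, *_ = np.linalg.lstsq(A * sw, y * sw[:, 0], rcond=None)
    return coef[:-1]                                # per-feature local slopes
```

The instability noted above comes from this very construction: the coefficients depend on the sampling seed, the perturbation scale, and the kernel width.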

Attention Visualisation

For transformer models, attention weights show which input tokens the model "attended to" when producing each output. Useful for debugging but controversial as a faithful explanation — attention does not always correlate with actual feature importance.
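For intuition, the weights being visualised are just softmax-normalised query–key scores. A toy single-head example in NumPy (shapes and values invented for illustration):

```python
import numpy as np

def attention_weights(Q, K):
    """Softmax-normalised scaled dot-product scores: who attends to whom.

    Q: (n_queries, d) query vectors; K: (n_keys, d) key vectors.
    Each output row sums to 1 and is what attention heatmaps display.
    """
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    return e / e.sum(axis=-1, keepdims=True)
```

A query aligned with one key gets most of the weight on that key — but, as noted above, a large weight is evidence of where the model looked, not proof of why it answered as it did.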

🧠 Quick Check

What fundamental limitation applies to fairness metrics according to Chouldechova's impossibility theorem?

AI Ethics Boards

Many organisations establish ethics boards to review AI systems. Done well, they provide genuine oversight. Done poorly, they become performative:

What works:

  • Diverse membership (technical, legal, ethicists, impacted community representatives)
  • Real authority to delay or block deployments
  • Transparent decision-making with published criteria
  • Regular review cadence, not just launch-time review

What fails:

  • Advisory-only boards with no power to enforce decisions
  • Homogeneous membership (all technologists, no domain experts or community voices)
  • Meeting only when controversy erupts, not proactively
  • Using the board's existence as PR whilst ignoring its recommendations

Google dissolved its AI ethics board after one week in 2019 due to controversy over member selection. Microsoft and other firms have since moved towards distributed responsibility models where ethics review is embedded in product teams rather than centralised.

Incident Response for AI Failures

AI systems will fail. The question is whether you have a plan:

  1. Detection — Monitoring alerts on fairness metrics, output quality, or user reports
  2. Triage — Severity classification: is this a minor output issue or systematic harm?
  3. Containment — Roll back the model, add guardrails, or disable the feature
  4. Investigation — Root cause analysis of the data, model, or system that caused the failure
  5. Remediation — Fix the underlying issue, retrain if necessary, update documentation
  6. Communication — Inform affected users and stakeholders transparently
  7. Prevention — Update auditing pipelines and testing to catch similar issues
🤯
Amazon scrapped its AI recruiting tool in 2018 after discovering it systematically penalised CVs containing the word "women's" — the model had learned from a decade of male-dominated hiring data. The tool was never deployed externally, but it operated internally for a year before the bias was caught.

Case Studies of Governance Failures

  • Amazon hiring tool (2018) โ€” Trained on historical hiring data that reflected gender bias. The model penalised female applicants. Lesson: biased training data produces biased models, regardless of model sophistication.
  • Optum healthcare algorithm (2019) โ€” Used healthcare spending as a proxy for health needs. Because Black patients historically had less access to healthcare, the algorithm systematically deprioritised them. Lesson: proxy variables can encode structural inequality.
  • COMPAS recidivism tool โ€” ProPublica's analysis showed the system was twice as likely to falsely flag Black defendants as future criminals. Lesson: aggregate accuracy can mask severe disparities across subgroups.
🧠 Quick Check

Why did the Optum healthcare algorithm disproportionately deprioritise Black patients?

🤔
Think about it: You are setting up a responsible AI programme from scratch at a 500-person company shipping ML products. You have budget for three hires. What roles would you fill first, and what processes would you implement in the first 90 days?

📚 Further Reading

  • NIST AI Risk Management Framework โ€” The complete NIST AI RMF with interactive profiles and use cases
  • Model Cards for Model Reporting (Mitchell et al.) โ€” The original paper introducing model cards as a documentation standard
  • Fairness and Machine Learning (Barocas, Hardt, Narayanan) โ€” Free online textbook covering the mathematics and philosophy of ML fairness