
UK Biobank Health Data Leaked Online Multiple Times

Mar 14, 2026 · 8 min read

TL;DR

An investigation has revealed that health data from UK Biobank (one of the world's most important biomedical research databases, with records on 500,000 volunteers) was accidentally posted online by researchers on multiple occasions. The exposures prompted roughly 80 legal takedown notices to GitHub in the second half of 2025 alone. Two other problems compound them: a separate controversy over data sharing with insurance companies, and claims of unauthorized access by an extremist 'race science' network. Together these have triggered a broader crisis of trust in the governance of a biobank that underpins thousands of medical studies worldwide.

The UK Biobank is one of the crown jewels of modern medical research: a vast repository of genetic, medical, and lifestyle data volunteered by half a million Britons to help scientists fight cancer, dementia, diabetes, and heart disease. Over 20,000 researchers in more than 70 countries rely on it, and more than 14,000 peer-reviewed papers have drawn on its datasets.

But an investigation has revealed a pattern of data exposures that strikes at the foundation of the project: the trust of the people who donated their most intimate health information.

The Leaks: 80 Takedown Notices in Six Months

Between July and December 2025, UK Biobank sent approximately 80 legal takedown notices to GitHub. It had discovered that researchers with approved data access had inadvertently uploaded sensitive participant data to public repositories on the code-sharing platform. The exposed datasets included hospital diagnosis records, medical procedure histories, and demographic details such as gender and month and year of birth. According to one report, these covered more than 600,000 study participants.

The mechanism was banal but devastating in its implications. Researchers routinely use GitHub to share computer code with collaborators. In the process, some uploaded the raw data files alongside their scripts, effectively publishing confidential medical records on the open internet.
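
This failure mode is the kind a pre-commit check can catch before anything reaches GitHub. The sketch below is a hypothetical guard, not UK Biobank's tooling. Its list of "data-like" file extensions is an illustrative assumption. So is its second check, which assumes participant extracts carry an `eid` participant-ID column in their header row.

```python
"""Hypothetical pre-commit guard: block commits of files that look like
participant-level data. The extension list and the "eid" header check are
illustrative assumptions, not UK Biobank's actual tooling."""
import subprocess
import sys
from pathlib import Path

# Assumed: extensions that usually hold tabular or serialized data, not code.
DATA_EXTENSIONS = {".csv", ".tsv", ".tab", ".parquet", ".feather",
                   ".rds", ".rdata", ".pkl", ".h5", ".sav", ".dta"}
# Assumed: extracts carry a participant ID column named "eid" in the header row.
ID_COLUMNS = {"eid"}


def looks_like_participant_data(path: str, first_line: str = "") -> bool:
    """Flag a file by its extension, or by an ID-like column in its header."""
    if Path(path).suffix.lower() in DATA_EXTENSIONS:
        return True
    # Treat tabs and commas as delimiters, then compare column names.
    columns = {c.strip().strip('"').lower()
               for c in first_line.replace("\t", ",").split(",")}
    return bool(columns & ID_COLUMNS)


def staged_files() -> list[str]:
    """Paths added, copied, modified, or renamed in the staging area."""
    out = subprocess.run(
        ["git", "diff", "--cached", "--name-only", "--diff-filter=ACMR"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [line for line in out.splitlines() if line]


def first_line_of(path: str) -> str:
    """First line of the working-tree copy (may differ from the staged copy)."""
    try:
        with open(path, encoding="utf-8", errors="ignore") as fh:
            return fh.readline()
    except OSError:
        return ""


def main() -> int:
    """Return 1 (abort the commit) if any staged file looks like participant data."""
    risky = [p for p in staged_files()
             if looks_like_participant_data(p, first_line_of(p))]
    for p in risky:
        print(f"blocked: {p} looks like participant data", file=sys.stderr)
    return 1 if risky else 0
```

To use it, save it as a script ending in `sys.exit(main())` and call it from `.git/hooks/pre-commit`. Git aborts a commit whenever that hook exits non-zero. A `.gitignore` entry covering data directories is a simpler complementary safeguard, though `git add -f` can still override it.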

UK Biobank Chief Executive Professor Sir Rory Collins emphasized that the data shared with researchers "does not contain any identifying information about individuals," noting that names, addresses, and NHS numbers are never included in research datasets. The organization has since implemented additional researcher training and published detailed guidance on repository management. It has also developed a dedicated "UKB Git Audit Tool" designed to scan repositories for accidental data exposure, including in deleted files.
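
Scanning deleted files matters because removing a file in a new commit does not remove it from earlier commits. The real tool's implementation is not described in the reporting. The following is only a hypothetical sketch of the general idea, and its data-extension list is an assumption. It lists every path touched by any commit on any branch and flags data-like files, including ones no longer in the current checkout.

```python
"""Hypothetical history audit in the spirit of a repository scanner such as
the UKB Git Audit Tool, whose real implementation is not described here.
It lists every path touched by any commit on any ref, so files that were
later deleted still show up."""
import subprocess
from pathlib import Path

# Assumed: extensions treated as likely data files.
DATA_EXTENSIONS = {".csv", ".tsv", ".tab", ".parquet", ".feather",
                   ".rds", ".rdata", ".pkl", ".h5", ".sav", ".dta"}


def paths_in_history(repo: str = ".") -> str:
    """Raw `git log` output naming the files changed in every commit on every ref."""
    return subprocess.run(
        ["git", "-C", repo, "log", "--all", "--name-only", "--pretty=format:"],
        capture_output=True, text=True, check=True,
    ).stdout


def currently_tracked(repo: str = ".") -> set[str]:
    """Paths present in the current index (the checked-out state)."""
    out = subprocess.run(["git", "-C", repo, "ls-files"],
                         capture_output=True, text=True, check=True).stdout
    return set(out.splitlines())


def flag_data_paths(log_output: str) -> list[str]:
    """Deduplicate logged paths and keep those with data-like extensions."""
    paths = sorted({line.strip() for line in log_output.splitlines() if line.strip()})
    return [p for p in paths if Path(p).suffix.lower() in DATA_EXTENSIONS]


def classify(flagged: list[str], tracked: set[str]) -> dict[str, list[str]]:
    """Split flagged paths into files in the current checkout and files found
    only elsewhere in history (deleted, or present on other branches)."""
    return {
        "present": [p for p in flagged if p in tracked],
        "history_only": [p for p in flagged if p not in tracked],
    }
```

Inside a clone, `classify(flag_data_paths(paths_in_history()), currently_tracked())` separates exposed files that are still visible from those that look deleted but remain recoverable. Purging the latter means rewriting history, for example with `git filter-repo`. GitHub's own guidance on removing sensitive data notes that cached copies may also need attention from its support team.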

The Re-Identification Problem

But privacy experts warn that the absence of names does not mean anonymity. Luc Rocher, an associate professor at the Oxford Internet Institute, cautioned that "even limited data can reveal sensitive health details if records are linked correctly." A dataset may combine birth month and year with hospital diagnoses and medical procedures. Cross-referencing such a dataset with other publicly available information can, in principle, allow individuals to be identified.
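
To make the linkage risk concrete, the toy example below joins two tables on shared quasi-identifiers, the attributes that are not names but can still single someone out. One table is a "leaked" research table with no names; the other is an external table that does carry names. Every record and field name is invented for illustration. Accepting only a unique match is one simple matching rule among many.

```python
"""Toy linkage attack on synthetic data. Every record and field name here is
invented; the point is the mechanics of joining on quasi-identifiers."""
from collections import defaultdict

# "Leaked" research rows: no names, but quasi-identifiers plus a diagnosis code.
leaked = [
    {"birth_year": 1952, "birth_month": 3, "sex": "F", "procedure_year": 2014, "diagnosis": "C50"},
    {"birth_year": 1948, "birth_month": 11, "sex": "M", "procedure_year": 2016, "diagnosis": "I21"},
    {"birth_year": 1948, "birth_month": 11, "sex": "M", "procedure_year": 2016, "diagnosis": "E11"},
]

# A hypothetical external source that does carry names (e.g. public posts).
external = [
    {"name": "Person A", "birth_year": 1952, "birth_month": 3, "sex": "F", "procedure_year": 2014},
    {"name": "Person B", "birth_year": 1948, "birth_month": 11, "sex": "M", "procedure_year": 2016},
]

KEY = ("birth_year", "birth_month", "sex", "procedure_year")


def link(leaked_rows, external_rows, key=KEY):
    """Attach a name to a leaked row only when exactly one leaked row matches."""
    index = defaultdict(list)
    for row in leaked_rows:
        index[tuple(row[k] for k in key)].append(row)
    matches = {}
    for person in external_rows:
        candidates = index.get(tuple(person[k] for k in key), [])
        if len(candidates) == 1:  # unique match: the diagnosis is now attributable
            matches[person["name"]] = candidates[0]["diagnosis"]
    return matches


print(link(leaked, external))
```

Person B escapes a definite match only because two leaked rows share the same quasi-identifiers. One more attribute in both sources, such as an admission month, could remove that protection.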

This is not a hypothetical risk. Research from Harvard's Data Privacy Lab has demonstrated that a surprisingly small number of demographic attributes can uniquely identify a large proportion of individuals in a population. UK Biobank's participants were drawn from a defined geographic and age cohort (aged 40 to 69 at enrollment). In that context, combining birth month and year with specific diagnosis codes could narrow the field considerably.
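
One standard way to quantify this risk is to count how many records are unique on a chosen set of quasi-identifiers. A closely related measure is k-anonymity: the size of the smallest group of records that share the same values. The sketch below uses four synthetic records, and its attribute names are assumptions, not UK Biobank field names.

```python
"""Measure how identifying a set of quasi-identifiers is: the share of records
that are unique on them, and the dataset's k (smallest equivalence class).
Records are synthetic; the attribute choices are assumptions for illustration."""
from collections import Counter


def uniqueness_report(rows, quasi_ids):
    """Return (fraction of rows unique on quasi_ids, k-anonymity level)."""
    classes = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    unique = sum(1 for r in rows if classes[tuple(r[q] for q in quasi_ids)] == 1)
    return unique / len(rows), min(classes.values())


rows = [
    {"birth_year": 1950, "birth_month": 1, "sex": "F", "dx": "I10"},
    {"birth_year": 1950, "birth_month": 3, "sex": "F", "dx": "E11"},
    {"birth_year": 1951, "birth_month": 1, "sex": "M", "dx": "I10"},
    {"birth_year": 1951, "birth_month": 6, "sex": "M", "dx": "C50"},
]

print(uniqueness_report(rows, ["birth_year", "sex"]))                 # (0.0, 2)
print(uniqueness_report(rows, ["birth_year", "birth_month", "sex"]))  # (1.0, 1)
```

In this toy cohort, birth year and sex alone leave every record sharing its values with one other record (k = 2). Adding birth month makes every record unique (k = 1). Real cohorts are far larger, but each extra attribute, such as a diagnosis code, tends to split these groups further.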

Data from the UK Information Commissioner's Office (ICO) shows that the health sector already leads all industries in self-reported personal data breaches, with 3,820 cases recorded between 2023 and the first quarter of 2025. The UK Biobank exposures add a novel dimension to this landscape: breaches caused not by hackers or system failures, but by the very researchers entrusted with the data.

Chart: UK Health Sector Data Breaches Reported to ICO (2023-2025)

Source: UK Information Commissioner's Office (data as of Jun 30, 2025)

A Pattern of Trust Erosion

The GitHub leaks are only the latest in a series of controversies that have dogged UK Biobank's data governance in recent years.

Insurance Companies and Broken Promises

In November 2023, The Observer reported that between 2020 and 2023 UK Biobank had approved data access for insurance sector firms. These companies were building digital tools to help insurers predict a person's risk of developing chronic disease. This was striking because the FAQ section of UK Biobank's own website had previously stated: "Insurance companies will not be allowed access to any individual results nor will they be allowed access to anonymised data."

UK Biobank countered that this commitment applied only to identifiable data, not the de-identified datasets shared with researchers. Professor Yves Moreau, a genetics and AI expert at KU Leuven, called the distinction a "serious and disturbing breach of trust." Under mounting pressure, UK Biobank announced in January 2025 that it would no longer approve applications from insurance companies for direct access to de-identified data.

Race Science Claims

In October 2024, The Guardian published an investigation based on undercover footage obtained by the anti-racism group Hope Not Hate. In the footage, members of a far-right network, including individuals associated with the Human Diversity Foundation, claimed to have obtained "a large amount" of UK Biobank data. One individual, Matthew Frost, was recorded saying his team had "managed to get access to the UK Biobank," while another, Simon Wright, claimed: "it's me who's got the UK Biobank downloaded."

UK Biobank conducted an investigation, including a third-party search of the internet and dark web. It concluded that the claims were "unfounded," stating that none of the named individuals had ever been granted data access. Professor Collins suggested they were likely discussing publicly available summary statistics, not participant-level data. But the episode deepened unease among participants and the scientific community about the adequacy of access controls.

Katie Bramall-Stainer of the British Medical Association called for stricter regulations, questioning "how, when, where, why, with whom, and for what purpose confidential data was shared."

The Consent Question

A new academic paper poses the most fundamental challenge yet. The paper appears in the journal Medicine, Health Care and Philosophy. In it, Dr. Gulzaar Barn of the University of Amsterdam argues that UK Biobank's broad consent model systematically obscures from participants the true range of uses for their data.

Barn contends that the consent documents frame participation in terms of disease research and treatment, when the actual applications extend far beyond that. They include insurance risk modeling and direct-to-consumer genetic testing by companies like 23andMe. They even include the development of polygenic scores designed to predict "intelligence" for commercial embryo screening services.

"Such tools are rife with risk of harm and are being deployed without sufficient public deliberation or oversight," Barn wrote. She argues that UK Biobank has failed to adequately safeguard against "dual use" problems, in which approved research outputs are later repurposed for applications that fall outside the public interest.

The paper also highlights a structural asymmetry in the consent framework: while participants can technically withdraw from the study, their data cannot be removed retroactively from research that has already used it. For the 500,000 volunteers who enrolled between 2006 and 2010, this means their data's ultimate uses were effectively unknowable at the time they consented.

The Shadow of 23andMe

The UK Biobank controversies unfold against the backdrop of a broader crisis in genomic data governance. In October 2023, genetic testing company 23andMe suffered a massive data breach affecting 6.9 million user profiles, after hackers used credential stuffing to access accounts. The stolen data was first offered online sorted by ethnic group, singling out Ashkenazi Jewish and Chinese users. The episode demonstrated the unique dangers of genomic data exposure: unlike a compromised password, DNA cannot be changed.

The fallout was catastrophic. 23andMe agreed to a $30 million settlement with affected users and was fined £2.31 million by the UK's ICO. It ultimately filed for bankruptcy in March 2025. As part of its bankruptcy proceedings, the company moved to auction the DNA profiles of 15 million people. The move raised alarm among regulators worldwide about the permanence of genetic data once it leaves a controlled environment.

UK Biobank operates under a fundamentally different model — it is a nonprofit research resource, not a commercial enterprise, and its data access controls are far more rigorous than 23andMe's consumer platform. But the 23andMe collapse has heightened public sensitivity to how genetic and health data is managed, making the Biobank's own lapses more politically charged.

Scale Versus Security

At the heart of the UK Biobank's dilemma is a tension that has no easy resolution. The database's enormous value to medical science comes precisely from its scale and accessibility. Over 20,000 researchers across more than 70 countries are currently using its data, contributing to studies on everything from Alzheimer's disease to the genetic underpinnings of common cancers. In 2024 alone, the UK government and Amazon Web Services committed £16 million in new funding, including £8 million worth of cloud computing resources.

Chart: UK Biobank Research Growth (2018-2024)

Source: UK Biobank Annual Reports (data as of Dec 31, 2024)

But that very openness creates surface area for error. The more researchers handle participant-level data, even within the supposedly secure UK Biobank Research Analysis Platform, the greater the probability that someone, somewhere, will accidentally commit a data file to a public repository. With 80 takedown notices filed in just six months, training and guidelines alone have evidently not been sufficient.
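
The intuition that more users means more leaks can be made explicit with a simple model. Suppose, purely for illustration, that each of n researchers independently has a small probability p of ever committing a data file to a public repository. Both the independence assumption and the example value of p below are hypothetical, not figures from UK Biobank.

```latex
P(\text{at least one leak}) = 1 - (1 - p)^{n}, \qquad \mathbb{E}[\text{leaks}] = np

n = 20{,}000,\; p = 10^{-4}: \quad 1 - \left(1 - 10^{-4}\right)^{20\,000} \approx 1 - e^{-2} \approx 0.86, \qquad np = 2
```

At this scale, even a one-in-ten-thousand error rate makes at least one exposure more likely than not. This is why safeguards that do not rely on each individual's vigilance carry so much weight.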

UK Biobank has taken concrete steps: the UKB Git Audit Tool, new repository management guidance, legal takedown processes, and the ban on insurance company access. But critics argue these are reactive measures that do not address the structural issue. Dr. Barn's paper points to a more fundamental question. Can a consent framework designed in 2006 adequately protect participants in 2026? It predates the era of AI-driven polygenic scoring, commercial embryo screening, and ubiquitous data linkage.

What Comes Next

The UK Biobank remains an irreplaceable scientific resource. Its data has contributed to breakthroughs in understanding conditions that affect millions of people, and the overwhelming majority of its 20,000-plus researchers handle data responsibly and ethically. Professor Collins and his team have been transparent about the GitHub exposures and moved quickly to address them.

But the accumulation of controversies — accidental leaks, insurance company access, extremist claims, and fundamental questions about consent — paints a picture of an institution struggling to reconcile its founding promise to participants with the realities of a rapidly evolving data landscape. The 500,000 volunteers who gave their blood, their scans, and their medical histories did so on the understanding that their data would be used to fight disease, not to train insurance algorithms or screen embryos.

Whether UK Biobank can rebuild that trust may depend not just on better tools and tighter policies, but on a willingness to revisit the basic bargain it struck with its participants two decades ago — and to ask whether the consent they gave then still means what they thought it meant.

Sources (15)

[1] Use Our Data - UK Biobank (ukbiobank.ac.uk)
Over 22,000 researchers from more than 70 countries use UK Biobank data, with more than 18,000 peer-reviewed publications.

[2] Health Data from UK Biobank Exposed Online in Multiple Leaks (thenews.com.pk)
UK Biobank issued approximately 80 takedown notices between July and December 2025 after researchers accidentally uploaded hospital diagnosis data for 600,000+ participants to GitHub.

[3] Repository Management Best Practices for Sensitive Data - UK Biobank (community.ukbiobank.ac.uk)
UK Biobank released detailed guidance on repository management and developed the UKB Git Audit Tool to scan for accidental data exposure in code repositories.

[4] Erosion of Anonymity: Mitigating the Risk of Re-identification of De-identified Health Data (healthlawadvisor.com)
Research demonstrates that even de-identified datasets may be re-identified when combined with other datasets, as shown by Harvard Data Privacy Lab studies.

[5] Health Sector Tops UK Self-Reported Data Breaches in 2023-2025 (securitybrief.co.uk)
The health sector recorded 3,820 self-reported personal data breaches to the ICO between 2023 and Q1 2025, the highest of any industry.

[6] UK Biobank Shared Health Data with Insurance Companies (freevacy.com)
The Observer reported that UK Biobank approved data access for insurance sector firms between 2020 and 2023, despite previous public commitments not to share data with insurers.

[7] Response to Highly Misleading Article in The Observer - UK Biobank (ukbiobank.ac.uk)
UK Biobank responded that commitments about insurance company access referred to identifiable data, not the de-identified research datasets that were shared.

[8] Protecting the Data - UK Biobank (ukbiobank.ac.uk)
UK Biobank complies with ISO/IEC 27001 standards, employs robust firewalls, and has independent security consultants regularly test its systems.

[9] UK Biobank Denies Claims of Data Breach by 'Race Science' Group (computing.co.uk)
UK Biobank conducted an investigation after Hope Not Hate undercover footage captured far-right network members claiming to have accessed biobank data.

[10] A Message to Our Participants: Unfounded Claims in The Guardian - UK Biobank (ukbiobank.ac.uk)
UK Biobank stated that claims of unauthorized data access were unfounded, with no evidence found on the internet or dark web of misused data.

[11] Consent and Its Discontents: The Case of UK Biobank (link.springer.com)
Dr. Gulzaar Barn argues UK Biobank's broad consent model obscures the true range of data uses, including insurance risk modeling and polygenic scoring for embryo screening.

[12] UK Biobank Projects May Breach Trust of Thousands of Participants (medicalxpress.com)
A philosopher at the University of Amsterdam finds that UK Biobank datasets have been processed toward ends inimical to stated aims, breaking consent terms.

[13] The 23andMe Breach: Anatomy, Impact, and Lessons for Genomic Security (sekurno.com)
The 23andMe breach exposed 6.9 million profiles via credential stuffing, led to a $30M settlement, and ultimately contributed to the company's bankruptcy.

[14] 23andMe Data Leak (en.wikipedia.org)
23andMe declared bankruptcy in March 2025 after the data breach, with its DNA profiles of 15 million users being auctioned as part of proceedings.

[15] UK Biobank's Wrap Up 2024 (ukbiobank.ac.uk)
In 2024, UK Biobank surpassed 14,000 peer-reviewed publications, 20,000 active researchers, 90,000 imaging participants, and secured £16M in new funding.
