Researchers Attempt to Reduce the Universal Genetic Code from 20 to 19 Amino Acids
TL;DR
Researchers at Columbia, MIT, and Harvard have used AI-guided protein design to engineer an E. coli strain whose ribosomal machinery operates without isoleucine, one of the 20 amino acids universal to all known life. The work, published in Science in 2026, raises fundamental questions about the malleability of the genetic code and carries implications for biocontainment, synthetic biology commercialization, and biosafety regulation.
Every known living cell on Earth builds its proteins from the same set of 20 amino acids, molecular building blocks that have remained unchanged for roughly four billion years. Now, a team spanning Columbia University, MIT, and Harvard has taken a first step toward breaking that universality, engineering an Escherichia coli bacterium whose core protein-synthesis machinery functions without one of those 20 building blocks.
The research, published in Science in 2026, is not merely an academic exercise. It sits at the intersection of origin-of-life research, AI-driven protein engineering, industrial biotechnology, and biosafety — and the answers it generates will shape how regulators, investors, and scientists think about rewriting the most fundamental rules of biology.
The Target: Isoleucine and Its 3 Codons
The amino acid selected for removal is isoleucine, encoded by three codons (AUU, AUC, AUA) in the standard genetic code. The team chose isoleucine because it is structurally similar to two other amino acids, valine and leucine, making it a plausible candidate for substitution. All three are branched-chain amino acids with overlapping chemical properties, and comparative genomic analyses had already shown that isoleucine is the amino acid most frequently swapped for alternatives across evolutionary lineages.
Harris H. Wang, a systems and synthetic biologist at Columbia University Irving Medical Center who led the study, and his collaborators initially tried a brute-force approach: replacing every isoleucine residue with valine or leucine across 39 essential genes. The result was an organism with roughly 40% of wild-type fitness: alive, but barely.
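The brute-force substitution described above can be pictured with a few lines of Python. This is only a minimal sketch of the idea, not the study's pipeline: the function names and the toy sequence are hypothetical, and proteins are written in the usual one-letter code (I for isoleucine, V for valine, L for leucine).

```python
# Illustrative sketch only: names and the toy sequence are assumptions,
# not anything taken from the Science paper.

ISOLEUCINE = "I"


def naive_swap(protein: str, replacement: str = "V") -> str:
    """Replace every isoleucine with valine (V) or leucine (L)."""
    if replacement not in ("V", "L"):
        raise ValueError("replacement must be 'V' (valine) or 'L' (leucine)")
    return protein.replace(ISOLEUCINE, replacement)


def count_isoleucine(protein: str) -> int:
    """Count isoleucine residues in a one-letter protein sequence."""
    return protein.count(ISOLEUCINE)


toy_protein = "MKIAIVLGI"            # made-up sequence with three isoleucines
swapped = naive_swap(toy_protein)    # every I becomes V
```

A swap like this ignores how each residue packs against its neighbors in the folded protein, which is one plausible reading of why the brute-force strain grew so poorly.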
AI to the Rescue
The breakthrough came from applying multiple AI protein-design tools in combination. The team used sequence-based models, ESM2 and MSA Transformer, to identify evolutionarily plausible mutations, then verified structural integrity with AlphaFold2 and ProteinMPNN. These tools did more than propose simple one-for-one swaps. They proposed compensatory mutations at neighboring positions and even at distant sites that interact with the edited residue in the folded protein, recovering structural stability that naive substitution destroyed.
The final strain, designated Ec19, carries 21 ribosomal proteins completely free of isoleucine out of the ribosome's 52 protein subunits. The remaining 31 subunits were individually redesigned and validated as isoleucine-free, but the team could not yet combine all of them into a single organism. In total, the researchers purged 382 isoleucine residues from ribosomal proteins.
The strain is robust: fitness stays above 90% of wild-type E. coli, and natural selection did not revert the changes over 450 generations of growth. "Maybe these machine-learning systems know aspects of biology we can experimentally verify but don't yet understand," Wang said.
The Scale of the Problem — and the Precedent
Ec19's ribosomal achievement is substantial, but it represents only a fraction of the full challenge. The rest of the E. coli genome still contains more than 81,000 isoleucine residues across thousands of other proteins. Eliminating isoleucine genome-wide would require editing on a scale that dwarfs even the most ambitious prior recoding projects.
The closest precedent is Syn61, the synthetic E. coli genome created in 2019 by Jason Chin's group at the MRC Laboratory of Molecular Biology in Cambridge, UK. That project replaced every instance of two serine codons (TCG and TCA) and one stop codon (TAG) with synonymous alternatives, a total of 18,214 codon substitutions across a 4-megabase genome. The result was an organism that used only 61 of the 64 possible codons, freeing three codons for reassignment.
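A quick back-of-the-envelope calculation shows how small that fraction is. The sketch below uses only figures quoted in this article; treating 81,000 as the exact remaining count is an assumption, since the article gives it only as a lower bound, so the computed share is an upper bound.

```python
# Figures from the article; variable names are illustrative.
removed_residues = 382          # isoleucines purged from ribosomal proteins
remaining_residues = 81_000     # "more than 81,000" elsewhere: a lower bound

# Share of all isoleucine residues that has been removed so far.
# Because the remaining count is a lower bound, this is an upper bound.
share_removed = removed_residues / (removed_residues + remaining_residues)

# Share of ribosomal proteins that are isoleucine-free inside Ec19 itself.
ribosomal_share = 21 / 52

print(f"isoleucine residues removed: at most {share_removed:.2%}")
print(f"ribosomal proteins isoleucine-free in Ec19: {ribosomal_share:.0%}")
```

On these numbers, the ribosomal work accounts for less than half a percent of the isoleucine residues in the genome.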
Syn61 was a synonymous recoding: serine was still present, just spelled differently. Removing an amino acid entirely is a qualitatively different challenge. Every isoleucine-to-valine swap changes the protein's chemistry, not just its spelling. The number of required edits — potentially exceeding 81,000 — and the need for AI-guided compensatory mutations at each site make this a project of unprecedented complexity.
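The difference between the two kinds of edit can be shown with the standard codon table. The following is a minimal sketch in DNA notation (T in place of U). The specific synonymous replacements used here (TCG to AGC, TCA to AGT, TAG to TAA) and the isoleucine-to-valine codon map are assumptions chosen for illustration, since the article says only that "synonymous alternatives" were used; the toy gene is made up.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumed mappings for illustration only.
SYNONYMOUS_MAP = {"TCG": "AGC", "TCA": "AGT", "TAG": "TAA"}   # same meaning
ILE_TO_VAL_MAP = {"ATT": "GTT", "ATC": "GTC", "ATA": "GTA"}   # changes protein


def split_codons(dna: str) -> list[str]:
    """Split an in-frame coding sequence into three-base codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna: str) -> str:
    """Translate a coding sequence into one-letter amino acids."""
    return "".join(CODON_TABLE[c] for c in split_codons(dna))


def recode(dna: str, mapping: dict[str, str]) -> str:
    """Swap codons according to the mapping, leaving all others untouched."""
    return "".join(mapping.get(c, c) for c in split_codons(dna))


toy_gene = "ATGTCGATTTCATAG"                      # Met-Ser-Ile-Ser-stop
synonymous = recode(toy_gene, SYNONYMOUS_MAP)     # new spelling, same protein
ile_removed = recode(toy_gene, ILE_TO_VAL_MAP)    # different protein
```

Here `translate(synonymous)` matches `translate(toy_gene)`, while `translate(ile_removed)` does not: the first edit changes only the spelling, the second changes the chemistry.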
Tom Ellis, a synthetic biologist, called the work "a tour de force of synthetic biology" and noted that "AI-enabled protein modeling has advanced dramatically in seven years" since earlier attempts.
Applications: Biocontainment and Non-Canonical Amino Acids
The practical appeal of a reduced-alphabet organism centers on two applications: biocontainment and non-canonical amino acid (ncAA) incorporation.
An organism that cannot produce or use isoleucine would be unable to survive outside a controlled environment where isoleucine is supplied or, more powerfully, where an artificial substitute is provided. This metabolic dependency creates a built-in kill switch. Previous work has shown that organisms engineered to require non-canonical amino acids for essential protein function exhibit "unprecedented resistance to evolutionary escape via mutagenesis". Recoded genomes also obstruct horizontal gene transfer: incoming DNA using the standard genetic code cannot be properly translated in a recoded host, and the host's genes are similarly unreadable by wild organisms.
Beyond containment, freeing codons from their natural assignments allows them to be reassigned to non-canonical amino acids, synthetic building blocks not found in nature. This is the commercial prize. The global market for unnatural amino acids was valued at approximately $2.16 billion in 2024 and is projected to reach $5.03 billion by 2032, growing at a compound annual rate of 11.15%. Pharmaceutical and biotechnology companies account for over 48% of this market, driven by demand for protein-based therapeutics with enhanced properties.
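The market projection can be sanity-checked with the standard compound-growth formula. This sketch assumes annual compounding over the eight years from 2024 to 2032; the function names are illustrative and the inputs are the figures quoted above.

```python
def project(value: float, annual_rate: float, years: int) -> float:
    """Grow a starting value at a fixed compound annual rate."""
    return value * (1 + annual_rate) ** years


def implied_cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by a start and end value."""
    return (end / start) ** (1 / years) - 1


years = 2032 - 2024                                  # eight compounding periods
projected = project(2.16, 0.1115, years)             # in billions of dollars
rate = implied_cagr(2.16, 5.03, years)

print(f"projected 2032 market: ${projected:.2f} billion")
print(f"implied growth rate: {rate:.2%}")
```

Under that assumption the three quoted numbers are mutually consistent to within rounding.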
A 19-amino-acid chassis organism would, in principle, offer three freed codons (those formerly encoding isoleucine) that could be repurposed for ncAA incorporation without competition from the native translation system. Current methods for ncAA incorporation face efficiency losses because engineered and native translation machinery compete for the same codons. Eliminating that competition could reduce production costs and improve yields, though no peer-reviewed study has yet quantified the cost savings for an isoleucine-free platform specifically.
Fitness: What 90% Means — and What It Doesn't
The 90%-of-wild-type fitness figure for Ec19 sounds reassuring, but several caveats apply. First, the measurement covers ribosomal proteins only: 382 isoleucine residues, against more than 81,000 elsewhere in the genome. Scaling the approach across the entire genome could produce compounding fitness deficits that are not predictable from the ribosomal subset alone.
Second, laboratory fitness assays typically measure growth rate under optimal conditions: rich media, constant temperature, no competitors, no phages. Industrial bioreactors impose fluctuating temperatures, nutrient limitation, osmotic stress, and contamination pressure. Whether a 10% growth deficit in the lab translates to a manageable or prohibitive disadvantage at industrial scale remains untested.
Third, the 450-generation stability is encouraging but short by evolutionary standards. Industrial fermentation campaigns can run for thousands of generations, and the selective pressure to revert engineered mutations intensifies under stress. The absence of reversion over 450 generations in benign conditions does not guarantee stability under industrial duress.
The Evolutionary Question: Frozen Accident or Optimized Design?
The research reopens one of biology's oldest debates: why 20 amino acids? In 1968, Francis Crick proposed that the universal genetic code was a "frozen accident": the 20 amino acids were locked in not because they were optimal, but because any change would disrupt every protein simultaneously, making further evolution impossible.
Subsequent work has challenged the "accident" half of that description. Andrew Doig, in a 2017 analysis published in The FEBS Journal, applied selection criteria to the 20 standard amino acids and concluded that "there are excellent reasons for the selection of every amino acid". The set appears near-optimal for forming soluble protein structures with close-packed hydrophobic cores and ordered binding pockets. Other researchers have noted that the standard amino acid set covers chemical property space (charge, size, hydrophobicity, hydrogen bonding) more evenly than random sets of similar size.
If the 20-amino-acid set is in fact highly optimized rather than arbitrary, removing one member could have consequences that laboratory fitness assays are not designed to detect. Protein folding is a cooperative process: the stability of a folded protein depends on the collective interactions of all its residues. Subtle destabilization — proteins that fold correctly but unfold slightly faster under heat, or that aggregate at marginally higher rates — might not reduce growth rate on a plate but could cause cascading failures in a bioreactor or under environmental stress.
The Wang team's results offer some evidence against the strongest version of this concern: the fact that 21 ribosomal proteins function without isoleucine at all, and the remainder tolerate its removal with AI-guided compensatory mutations, suggests that the code has more slack than the "perfectly optimized" view predicts. But evolutionary biologists point out that the ribosome is one of the most conserved molecular machines in biology: if any system were tolerant of amino acid removal, it would be one that has been under intense selection for billions of years.
Biosafety and Regulation: An Unsettled Framework
No existing regulatory framework explicitly addresses organisms with a fundamentally altered genetic code. In the United States, genetically modified organisms fall under the Coordinated Framework for the Regulation of Biotechnology, established in 1986 and last updated in 2017, which divides oversight among the EPA, FDA, and USDA. But these agencies' mandates were designed around organisms that receive transgenes (foreign DNA inserted into an otherwise standard genome), not organisms whose core translational machinery has been rewritten.
Internationally, the Cartagena Protocol on Biosafety, effective since 2003, governs transboundary movement of living modified organisms. Discussion of synthetic biology has been underway for nearly a decade under the Convention on Biological Diversity, but no specific provisions address organisms with compressed or reduced genetic codes. Regulators in developing countries, still implementing the Cartagena Protocol at the national level, face particular challenges in assessing risks from organisms that fall outside established categories.
The biocontainment properties of recoded organisms, namely their inability to exchange genetic material with wild organisms and their dependence on synthetic nutrients, are often cited as a safety feature. Research has shown that genomically recoded organisms are "virtually recalcitrant to horizontal gene transfer," with conjugative plasmid transfer impaired up to 100,000-fold and viral resistance demonstrated at titers up to 10^11 PFU/mL. But critics argue that this resistance has been tested under laboratory conditions, not in the complex microbial ecosystems of soil, water, or the human gut. No independent biosafety review has stress-tested a recoded organism's containment under conditions designed to maximize escape probability.
The assumption that a 19-amino-acid organism "cannot survive outside controlled conditions" is plausible but unproven at the level of rigor that environmental release would demand.
Funding and Intellectual Property
The Ec19 research was conducted at Columbia University, MIT, and Harvard, with funding from U.S. federal science agencies. The broader field of codon compression is commercially anchored by Constructive Bio, a Cambridge, UK-based company founded to commercialize Jason Chin's Syn61 technology from the MRC Laboratory of Molecular Biology. Constructive Bio holds exclusive rights to foundational intellectual property from MRC-LMB, including patents on the Syn61 strain and genome-wide codon reassignment methods.
The company reports $75 million in total funding, including a $15 million seed round and a $58 million Series A (the two named rounds sum to $73 million), from investors including Ahren Innovation Capital, OMX Ventures, Paladin Capital Group, and Amadeus Capital Partners. Its stated goal is to build organisms capable of producing "new classes of enzymes, pharmaceuticals and biomaterials" by reprogramming the genetic code to incorporate non-natural monomers at scale.
The concentration of key patents in a single company raises questions about access. If codon-compression technology becomes essential infrastructure for next-generation biomanufacturing, the terms on which Constructive Bio licenses its IP will determine whether the platform functions as an open resource or a proprietary bottleneck. The company has not publicly disclosed its licensing terms. The Wang lab's work on isoleucine removal represents an independent approach, but any organism that combines amino acid removal with codon reassignment may need to navigate Constructive Bio's patent estate.
What Comes Next
Wang has said his team plans to extend the approach across the rest of the E. coli genome, and has raised the possibility of attempting an 18-amino-acid organism. Achieving a truly isoleucine-free cell would require faster DNA synthesis technology and AI models capable of redesigning thousands of proteins simultaneously, not just individually. The current state of the art validates proteins one at a time; scaling to genome-wide redesign is an engineering challenge that does not yet have a clear solution.
The work also has implications beyond Earth. An organism that can function without a standard amino acid could, in principle, be deployed in environments where that amino acid is unavailable, a consideration for long-duration space missions or extraterrestrial biomanufacturing.
For now, Ec19 remains a proof of concept: a bacterium whose most essential molecular machine can run on 19 amino acids, even as the rest of its genome still speaks the full 20-letter language. The distance between that partial achievement and a fully recoded organism is measured not just in base pairs but in unanswered questions about fitness, safety, regulation, and who gets to control the technology.
Sources (18)
- [1] AI helps create bacterium that's partially missing a universal amino acid (science.org)
Researchers at Columbia, MIT, and Harvard used AI-guided protein design to engineer an E. coli strain whose ribosomal proteins function without isoleucine.
- [2] All life runs on 20 amino acids. These cells run key machinery on just 19 (nature.com)
Nature News coverage of the Science paper on engineering E. coli ribosomal proteins to function without isoleucine.
- [3] Scientists use AI to test whether life can run on only 19 amino acids (scientificamerican.com)
The final strain Ec19 purged 382 isoleucine residues from ribosomal proteins, maintaining above 90% wild-type fitness over 450 generations.
- [4] Total synthesis of Escherichia coli with a recoded genome (nature.com)
Jason Chin's group synthesized a 4-megabase E. coli genome with 18,214 codon substitutions, creating the Syn61 strain with a compressed 61-codon genome.
- [5] Creating an entire bacterial genome with a compressed genetic code (mrc-lmb.cam.ac.uk)
MRC-LMB description of the Syn61 project: replacing serine codons TCG/TCA and stop codon TAG with synonymous alternatives across the entire E. coli genome.
- [6] Unnatural Amino Acids Market Size, Share & Growth Report 2032 (snsinsider.com)
The unnatural amino acids market was valued at $2.16 billion in 2024 and is projected to reach $5.03 billion by 2032, growing at 11.15% CAGR.
- [7] Toward life with a 19–amino acid alphabet through generative artificial intelligence design (science.org)
Primary research paper in Science describing the engineering of E. coli ribosomal proteins to function without isoleucine using AI-guided design.
- [8] Biocontainment of genetically modified organisms by synthetic protein design (pmc.ncbi.nlm.nih.gov)
Computationally redesigned essential enzymes confer metabolic dependence on nonstandard amino acids, exhibiting unprecedented resistance to evolutionary escape.
- [9] Origin and evolution of the genetic code: the universal enigma (pmc.ncbi.nlm.nih.gov)
Analysis of genetic code evolution suggests it evolved from simpler initial states with fewer amino acids, combining frozen accident with selection for error minimization.
- [10] Genomic Recoding Broadly Obstructs the Propagation of Horizontally Transferred Genetic Elements (pmc.ncbi.nlm.nih.gov)
Alternate genetic code conferred viral resistance at titers up to 10^11 PFU/mL and impaired conjugative plasmids up to 100,000-fold.
- [11] Frozen Accident Pushing 50: Stereochemistry, Expansion, and Chance in the Evolution of the Genetic Code (pmc.ncbi.nlm.nih.gov)
Review of the frozen accident hypothesis at 50 years, examining evidence for stereochemistry, expansion, and chance in genetic code evolution.
- [12] Frozen, but no accident: why the 20 standard amino acids were selected (febs.onlinelibrary.wiley.com)
Analysis showing excellent reasons for selection of every amino acid; the set appears near-optimal for forming soluble structures with close-packed cores.
- [13] How GMOs Are Regulated in the United States (fda.gov)
FDA, EPA, and USDA ensure GMO safety under the Coordinated Framework for Regulation of Biotechnology, established 1986 and updated 2017.
- [14] Cartagena Protocol on Biosafety (en.wikipedia.org)
International agreement effective since 2003 governing transboundary movement of living modified organisms resulting from modern biotechnology.
- [15] Regulation of Synthetic Biology: Developments Under the Convention on Biological Diversity and Its Protocols (pmc.ncbi.nlm.nih.gov)
Synthetic biology has been discussed under the CBD for nearly a decade; new types of LMOs present challenges for regulators and existing definitions.
- [16] From Genome Writing to Novel Polymers: Constructive Bio's and Jason Chin's Bold Vision for Engineered Biology (synbiobeta.com)
Constructive Bio holds exclusive IP from MRC-LMB for genome-scale codon reassignment, building on the Syn61 platform.
- [17] Constructive Bio launches with $15m seed investment (amadeuscapital.com)
Constructive Bio launched with $15M seed to commercialize codon compression technology from MRC-LMB, aiming to create new enzymes, drugs, and biomaterials.
- [18] About Constructive Bio (constructive.bio)
Cambridge UK biotech company with $75M total funding building recoded organisms for novel polymer synthesis and protein therapeutics.