The longevity field has a data problem that rarely gets acknowledged: we have accumulated decades of mouse studies on aging interventions, but the mouse lifespan of roughly two years makes it a genuinely poor model for the cellular processes that govern multi-decade human aging. When you want to find conserved genetic variants that actually matter for extended healthy lifespan, you should be looking at animals that live a long time — and comparing across enough divergent species that the signal rises above phylogenetic noise.
At caVos, our target identification pipeline begins not with a pathway hypothesis but with a comparative genomics question: which protein-coding variants are over-represented in lineages showing exceptional longevity relative to their body mass? That framing shifts the analysis considerably. You stop asking "what does this pathway do in aging?" and start asking "what did evolution select for, independently, in species that happened to live extraordinarily long?"
Three Species Worth Studying in Detail
Three organisms have become reference points for longevity genomics, each for different reasons. They are not chosen arbitrarily — each offers a different biological context that helps triangulate which signals are genuinely conserved versus which are species-specific adaptations to unusual ecological pressures.
Naked Mole Rats: The Rodent Anomaly
The naked mole rat (Heterocephalus glaber) lives up to 37 years, compared to a predicted lifespan of roughly 6 years based on its body mass. That is not a small deviation: it is a roughly sixfold departure from the allometric expectation. Naked mole rat cells display unusually high resistance to protein aggregation, maintain ribosomal fidelity better than mouse cells into advanced age, and secrete a distinctive high-molecular-weight hyaluronic acid that appears to reinforce contact inhibition and so restrain uncontrolled proliferation.
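The sixfold figure is a simple ratio of observed to mass-predicted lifespan, often called a longevity quotient. A minimal sketch of that arithmetic, using only the two numbers quoted above (the function name is illustrative, and the predicted value is taken as given rather than derived from an allometric fit):

```python
def longevity_quotient(observed_years: float, predicted_years: float) -> float:
    """Ratio of observed maximum lifespan to the body-mass-predicted lifespan."""
    if predicted_years <= 0:
        raise ValueError("predicted lifespan must be positive")
    return observed_years / predicted_years


# Figures quoted in the text for Heterocephalus glaber.
nmr_lq = longevity_quotient(observed_years=37, predicted_years=6)
print(f"naked mole rat longevity quotient: {nmr_lq:.1f}")
```

A quotient near 1 means a species lives about as long as its body mass predicts; values well above 1 flag the outliers this kind of analysis cares about.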
From a comparative genomics standpoint, naked mole rats are interesting because they are rodents. You can do genuine sequence-level comparison against mice and rats using well-annotated reference genomes, which reduces the alignment quality problem that plagues more phylogenetically distant comparisons. The protein-coding variants in hyaluronan synthase 2 (HAS2), the enzyme that produces high-molecular-weight hyaluronic acid, and in the proteasome regulatory subunit genes show positive selection signatures that are absent in shorter-lived rodents. Whether those variants are directly causal for extended lifespan or are correlated with other adaptations (eusocial colony living, subterranean lifestyle) is genuinely not settled.
Bowhead Whales: Mammalian Longevity at Scale
Bowhead whales (Balaena mysticetus) are estimated to live over 200 years based on aspartic acid racemization dating and the recovery of 19th-century harpoon tips in contemporary animals. They are the longest-lived mammals with a sequenced genome, which was published in 2015 (Keane et al., Cell Reports). The genome revealed positive selection signatures in DNA repair genes — particularly ERCC1 and several genes in the nucleotide excision repair pathway — as well as in PCNA (proliferating cell nuclear antigen) and cell cycle checkpoint kinases.
What makes the bowhead genome analytically tractable is that you can compare it against other cetaceans with shorter lifespans — dolphins (~40 years), porpoises (~20 years) — and identify which variants are unique to the bowhead or shared only with other long-lived cetaceans. The signal-to-noise ratio is better than comparing across orders. The identified variants in tumor suppressor pathways are notable because bowhead whales, despite their enormous cell count (over 1000× more cells than a human), show extraordinarily low cancer rates, suggesting that their DNA repair and anti-proliferative adaptations work at scale in a way that matters for tissue maintenance over centuries.
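The within-cetacean comparison can be pictured as a column-by-column scan of a protein alignment. The sketch below is an illustration under stated assumptions, not our production code: the sequences are made-up toy strings, the function name is hypothetical, and it assumes the alignment has already been built so every sequence has the same length.

```python
def lineage_specific_sites(alignment: dict[str, str], long_lived: set[str]) -> list[tuple[int, str]]:
    """Return (column, residue) pairs where every long-lived species shares one
    residue that no shorter-lived species carries at that column."""
    if not long_lived or not long_lived <= alignment.keys():
        raise ValueError("long_lived must be a non-empty subset of the aligned species")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")

    short_lived = [name for name in alignment if name not in long_lived]
    sites = []
    for col in range(lengths.pop()):
        focal = {alignment[name][col] for name in long_lived}
        background = {alignment[name][col] for name in short_lived}
        # Require one shared, non-gap residue that is absent from the background.
        if len(focal) == 1 and "-" not in focal and focal.isdisjoint(background):
            sites.append((col, next(iter(focal))))
    return sites


# Toy alignment (invented sequences, 0-based columns).
toy = {
    "bowhead": "MKTAYLQ",
    "dolphin": "MKSAYIQ",
    "porpoise": "MKSAYIQ",
}
print(lineage_specific_sites(toy, long_lived={"bowhead"}))
```

A scan like this only nominates sites; it says nothing about whether a difference was driven by selection, which is what the codon models in the methodology section are for.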
Greenland Sharks: Vertebrate Longevity at its Extreme
Greenland sharks (Somniosus microcephalus) were dated using radiocarbon analysis of the metabolically inert eye lens crystallins in a 2016 Science paper by Nielsen et al., with the largest specimen estimated at 392 ± 120 years. That makes them the longest-lived vertebrates with any quantitative age estimate. The genome is not as well annotated as the bowhead's, and the elasmobranch lineage diverged from tetrapods roughly 450 million years ago, which makes sequence-level alignment to human orthologs genuinely challenging.
We're not claiming that Greenland shark genomics gives us directly actionable mRNA targets for human therapeutics. The phylogenetic distance is large enough that positive selection signals in shark genes can reflect aquatic-specific pressure, cold adaptation, or torpor physiology rather than generalizable longevity mechanisms. What sharks do offer is a phylogenetically independent data point for asking whether certain gene families — DNA damage tolerance, mitochondrial reactive oxygen species management, IGF-1 signaling axis — show convergent structural modifications across deeply divergent vertebrate lineages that happen to be long-lived.
The Methodology: Cross-Species Alignment to Positive Selection Analysis
The computational pipeline we use to surface mRNA candidates from comparative genomics follows a roughly four-stage process, though in practice these stages are not cleanly sequential.
The first stage is ortholog identification. For each long-lived species, we identify one-to-one orthologs with human genes using a combination of reciprocal best-BLAST hits and synteny-based methods. BUSCO completeness scores on the genome assemblies set a quality floor — we don't include genome-species pairs where the assembly is too fragmented to trust the orthologs. For the bowhead and naked mole rat this is straightforward; for Greenland shark it requires more conservative filtering.
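The reciprocal-best-hit half of that step can be sketched in a few lines. This is a simplified illustration, assuming the BLAST results have already been parsed into (query, subject, bitscore) tuples; the `bw_` gene identifiers and scores are invented, and the synteny check and BUSCO filtering are not modeled.

```python
def best_hits(hits):
    """Map each query to its highest-bitscore subject (first one wins on ties)."""
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical parsed hits: human -> bowhead, then bowhead -> human.
human_vs_whale = [
    ("ERCC1", "bw_0001", 410.0),
    ("ERCC1", "bw_0002", 120.0),
    ("PCNA", "bw_0003", 505.0),
    ("TP53", "bw_0004", 300.0),
]
whale_vs_human = [
    ("bw_0001", "ERCC1", 405.0),
    ("bw_0003", "PCNA", 500.0),
    ("bw_0004", "TP63", 310.0),
    ("bw_0004", "TP53", 290.0),
]
print(reciprocal_best_hits(human_vs_whale, whale_vs_human))
```

In the toy data the TP53 pairing is dropped because the reverse search prefers a paralog, which is the kind of ambiguity the one-to-one requirement is meant to screen out.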
The second stage is multiple sequence alignment across a panel of species with varying lifespans. We construct a phylogeny-aware codon alignment and then run branch-site models (using codeml from the PAML package, or more recently HyPhy for some analyses) to identify codons with an elevated dN/dS ratio specifically on long-lived lineage branches. dN/dS is the ratio of the nonsynonymous to the synonymous substitution rate; a value above 1, the neutral expectation, points to positive selection. A site showing elevated dN/dS on both the bowhead branch and the naked mole rat branch, independently, is a much stronger candidate than a site with lineage-specific acceleration in only one species.
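Branch-site models are typically judged with a likelihood ratio test between a null model (no positive selection on the foreground branch) and an alternative that allows it. The sketch below assumes the two log-likelihoods have already been extracted from the model output and compares the statistic against a chi-square distribution with one degree of freedom, a commonly used and conservative choice. The log-likelihood values are invented, and the function name is illustrative.

```python
import math


def branch_site_lrt(lnl_null: float, lnl_alt: float) -> tuple[float, float]:
    """Likelihood ratio test: returns (2 * delta lnL, p-value under chi-square, df=1)."""
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Invented log-likelihoods for one gene with the bowhead branch as foreground.
stat, p = branch_site_lrt(lnl_null=-10234.6, lnl_alt=-10229.1)
print(f"2*delta_lnL = {stat:.2f}, p = {p:.2g}")
```

Running the same test with each long-lived lineage as the foreground branch, and keeping genes that pass in more than one, is one way to express the "independently in both branches" criterion. The p-values still need multiple-testing correction across genes.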
The third stage is pathway enrichment. Raw lists of positively selected genes are not therapeutically useful without context. We run these candidate lists through pathway enrichment analysis, currently using a combination of Reactome and Gene Ontology annotations, to ask which functional modules are over-represented relative to the background of genes tested. This consistently surfaces a small set of pathway categories: DNA damage response, proteostasis (proteasome and autophagy), mitochondrial import and quality control, and insulin/IGF-1 signaling. The convergence on these pathways across three very different long-lived species increases our confidence that the signal is real rather than an artifact.
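Over-representation of a pathway in a gene list is commonly scored with a one-sided hypergeometric test. The sketch below shows that calculation with the standard library only; the gene counts are invented for illustration, and it does not reproduce any particular Reactome or Gene Ontology tool.

```python
from math import comb


def enrichment_p_value(background: int, pathway: int, selected: int, overlap: int) -> float:
    """P(overlap or more pathway genes in the selected list) under the hypergeometric null.

    background: genes tested; pathway: background genes annotated to the pathway;
    selected: positively selected genes; overlap: selected genes in the pathway.
    """
    if not (0 <= overlap <= min(pathway, selected) and max(pathway, selected) <= background):
        raise ValueError("inconsistent counts")
    total = comb(background, selected)
    tail = sum(
        comb(pathway, k) * comb(background - pathway, selected - k)
        for k in range(overlap, min(pathway, selected) + 1)
    )
    return tail / total


# Invented counts: 8 of 200 selected genes fall in a 150-gene pathway (about 1.5 expected by chance).
p_enrich = enrichment_p_value(background=20000, pathway=150, selected=200, overlap=8)
print(f"enrichment p-value: {p_enrich:.2e}")
```

Because many pathways are tested at once, the raw p-values would need a correction such as Benjamini-Hochberg before being interpreted.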
The fourth stage is translatability filtering. Not every positively selected variant is a useful mRNA therapy target. We apply several filters: the gene must have a human ortholog with sufficient expression in the disease-relevant tissue (brain, kidney, liver depending on target indication), the protein-coding variant must be in a functional domain rather than an unstructured linker, and — critically — the variant must increase function or stability rather than representing a loss of function. mRNA therapeutics are protein upregulation tools. We are looking for gain-of-function variants associated with longevity, not loss-of-function variants associated with disease prevention.
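The three filters above amount to a conjunction of per-candidate checks. A minimal sketch, assuming each candidate has already been annotated upstream; the field names, the expression threshold of 5 TPM, and the placeholder gene labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    gene: str
    tissue_tpm: float           # expression of the human ortholog in the target tissue
    in_functional_domain: bool  # variant lies in an annotated domain, not a linker
    predicted_effect: str       # "gain", "loss", or "unknown"


def passes_translatability(c: Candidate, min_tpm: float = 5.0) -> bool:
    """Keep only expressed, domain-located, gain-of-function candidates."""
    return (
        c.tissue_tpm >= min_tpm
        and c.in_functional_domain
        and c.predicted_effect == "gain"
    )


# Placeholder candidates with invented annotations.
candidates = [
    Candidate("GENE_A", tissue_tpm=42.0, in_functional_domain=True, predicted_effect="gain"),
    Candidate("GENE_B", tissue_tpm=0.8, in_functional_domain=True, predicted_effect="gain"),
    Candidate("GENE_C", tissue_tpm=30.0, in_functional_domain=False, predicted_effect="gain"),
    Candidate("GENE_D", tissue_tpm=55.0, in_functional_domain=True, predicted_effect="loss"),
]
kept = [c.gene for c in candidates if passes_translatability(c)]
print(kept)
```

Treating "unknown" effects as failures is a deliberately strict choice in this sketch; a real pipeline might route those to manual review instead.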
Where This Pipeline Surfaces Credible Candidates
We have run this pipeline across a panel including naked mole rats, bowhead whales, Greenland sharks, little brown bats (Myotis lucifugus, lifespan up to 34 years), and ocean quahog clams (Arctica islandica, lifespan up to roughly 500 years). Adding bivalves introduces obvious problems, because the phylogenetic distance is extreme, but clam longevity genomics has been studied independently, and the FOXO orthologs and heat shock protein systems in bivalves show conserved patterns that are worth cross-referencing even if they cannot be directly aligned.
The candidates that survive our filtering pipeline cluster into a few areas. Klotho-pathway components appear repeatedly — not simply alpha-Klotho itself but the FGF23/Klotho receptor interaction interface, where structural variants in long-lived species appear to modulate the signaling sensitivity. FOXO-family transcription factors show conserved regulatory region variants in several lineages. DNA repair scaffold proteins in the non-homologous end joining pathway show strong signals in the bowhead. And a subset of mitochondrial inner membrane proteases show positive selection patterns suggesting improved protein quality control under the oxidative conditions that accompany aging.
We want to be precise about what this pipeline does and does not tell us. Positive selection analysis identifies variants that evolution favored in long-lived lineages. It does not prove that those variants are the cause of extended lifespan: correlation between a variant and lineage lifespan is not causal evidence. Many of the variants we identify may be passengers in a selective sweep driven by a nearby gene, or may reflect selection for something other than longevity, such as cold tolerance in the bowhead. Our candidate list is a hypothesis-generation tool, not a validated target list.
From Genomic Signal to mRNA Candidate
The comparative genomics stage of our pipeline feeds into a sequence design question: if we wanted to express a human-compatible version of a protein that incorporates the positively selected variant residues observed in long-lived species, what mRNA would encode that protein, and how would we design it for optimal expression and minimal immunogenicity?
This is where the computational biology work becomes pharmaceutical biology work. A variant residue identified in a whale ortholog cannot be directly transplanted into the human protein without checking that (a) the structural context is conserved enough that the residue will fold correctly, (b) the variant does not create novel T-cell epitopes that would trigger an immune response, and (c) the modified protein's interaction partners are not disrupted. We use AlphaFold2 structure prediction for candidates where crystal structures are not available and run epitope prediction tools against the modified sequences before advancing them to wet-lab validation.
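Before any of those checks can run, the variant residues have to be placed into the human sequence at the right coordinates. The sketch below shows only that bookkeeping step, assuming a pairwise alignment of the human protein and the donor ortholog is already available and the columns of interest are known. The sequences and function name are invented, and the structural, epitope, and interaction-partner checks are not modeled here.

```python
def graft_variants(human_aln: str, donor_aln: str, columns: set[int]) -> tuple[str, list[str]]:
    """Copy donor residues into the human protein at the chosen alignment columns.

    Returns the modified ungapped human sequence and mutation labels such as
    "S3T" (human residue, 1-based human position, donor residue).
    """
    if len(human_aln) != len(donor_aln):
        raise ValueError("aligned sequences must have the same length")
    residues, mutations = [], []
    position = 0  # 1-based position in the ungapped human sequence
    for col, (h, d) in enumerate(zip(human_aln, donor_aln)):
        if h == "-":
            continue  # donor-only insertion: there is no human residue to replace
        position += 1
        if col in columns and d != "-" and d != h:
            residues.append(d)
            mutations.append(f"{h}{position}{d}")
        else:
            residues.append(h)
    return "".join(residues), mutations


# Toy pairwise alignment (invented); columns 3 and 6 are the flagged sites.
modified, labels = graft_variants("MK-SAYIQ", "MKGTAYLQ", columns={3, 6})
print(modified, labels)
```

Keeping the mutation labels in human coordinates makes it straightforward to hand the modified sequence and its change list to whatever structure and epitope prediction tools are used downstream.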
The mRNA sequence itself — codon optimization, UTR selection, 5' cap analog choice, modified nucleotide incorporation — is a separate layer of design that matters enormously for expression level, half-life, and immunostimulation. We address that design problem separately, but it is worth noting that the genomic pipeline upstream only has value if the mRNA sequence design downstream can actually deliver protein at the target tissue in sufficient quantities to be pharmacologically meaningful. The two problems are coupled.
Comparative genomics of long-lived species gives us a list of protein variants that nature has tested over millions of years of selection. It does not give us a drug. Translating those variants into therapeutically viable mRNA candidates requires moving through structural biology, immunology, and delivery chemistry before any animal experiment makes sense. What the genomics gives us is confidence that the target class is worth that translational effort: if multiple phylogenetically independent lineages evolved in the same molecular direction, the probability that we are chasing a dead end is materially lower than if we were pursuing a target based on a single mouse study.