Using AlphaFold-Era Tools for mRNA Target Validation: What Works and What Doesn't

The availability of high-confidence structural predictions for nearly every human protein has changed how we approach the early stages of target validation. Three years ago, a question like "does this candidate protein adopt a fold consistent with membrane association?" required either pulling experimental structures from the PDB, which existed for only a small fraction of the proteome, or running homology modeling that was reliable only for proteins with close structural relatives. Now, precomputed AlphaFold2 models put a structural hypothesis for almost any human protein a database lookup away, and AlphaFold-Multimer proposes interaction interfaces that would previously have required months of co-crystallization attempts.

That is genuinely useful, but not uniformly so. For a program like ours, where the validation question is whether a longevity-associated protein is tractable as an mRNA upregulation target, some parts of the structural prediction toolkit work well, while others introduce risks that are easy to overlook unless you pay close attention to confidence scores and to the specific failure modes of these models. This post is a detailed account of how we use these tools, what we have learned to trust, and where we have had to cross-check computationally derived conclusions against the experimental literature before committing to a direction.

What AlphaFold-Era Tools Actually Tell Us in Our Workflow

Our use of AlphaFold predictions divides into three categories: fold classification, interaction interface analysis, and disordered region mapping. Each has different reliability characteristics that we have developed a feel for over the course of applying them to our target list.

Fold classification — determining whether a protein adopts a known structural family and what that implies about its function and cellular behavior — is where AlphaFold2 is most reliably useful. For proteins that fall into well-characterized structural families (kinases, phosphatases, secreted cysteine-rich proteins, transmembrane receptors), the predicted fold is almost always consistent with experimental structures where those exist, and the pLDDT confidence scores are typically high across the structured domains. For our Klotho program, the AlphaFold prediction for the full-length alpha-Klotho protein, including both KL1 and KL2 domains, was consistent with the published cryo-EM structures and gave us useful information about the domain linker region that was poorly resolved in some experimental structures.

Interaction interface analysis using AlphaFold-Multimer is valuable but requires more caution. The tool can predict protein-protein interaction interfaces with useful accuracy when both proteins are well represented in the training data and when the interaction involves well-structured domains. It becomes less reliable for interactions that are mediated primarily by intrinsically disordered regions, for weak transient interactions, and for interactions that are conditional on post-translational modifications that the model has no direct way to represent. FOXO3 is instructive here: the interaction between FOXO3 and its nuclear transport machinery involves interfaces that are regulated by phosphorylation state, and the AlphaFold-Multimer predictions for these complexes give you the interface geometry of the unmodified protein, with no reliable representation of how phosphorylation changes the interaction energetics. We use these predictions as starting hypotheses, not as conclusions.
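One way to keep Multimer output in the "starting hypothesis" category is to triage on the model's own interface confidence before looking at any geometry. The sketch below assumes the ipTM and pTM values have already been read from the prediction output. The 0.8/0.2 weighting is the ranking confidence that AlphaFold-Multimer reports; the function names, the `ptm_dependent` switch, and the 0.75 and 0.5 cutoffs are illustrative assumptions, not calibrated values from our pipeline.

```python
def multimer_confidence(iptm: float, ptm: float) -> float:
    """AlphaFold-Multimer ranking confidence: 0.8 * ipTM + 0.2 * pTM."""
    return 0.8 * iptm + 0.2 * ptm


def triage_interface(iptm: float, ptm: float, ptm_dependent: bool = False) -> str:
    """Sort a predicted complex into a coarse follow-up bucket.

    ptm_dependent marks interactions known to depend on post-translational
    modification, which the model cannot represent. The cutoffs are
    hypothetical and should be tuned against complexes with known structures.
    """
    if ptm_dependent:
        return "cross-check literature"
    score = multimer_confidence(iptm, ptm)
    if score >= 0.75:
        return "usable hypothesis"
    if score >= 0.5:
        return "low confidence"
    return "do not interpret"
```

A phosphorylation-regulated complex such as the FOXO3 transport interfaces would be called with `ptm_dependent=True`, so it is routed to a literature check regardless of how confident the model looks.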

The Disordered Regions Problem

Intrinsically disordered regions — IDRs — are systematically problematic for structure-based target validation, and they are unusually prevalent in the longevity-associated protein candidates we work with. This is not a coincidence: many transcription factors, regulatory proteins, and stress-response proteins that appear in longevity biology are rich in IDRs because disorder is functionally important for their ability to interact with multiple partners and respond to cellular signals. FOXO3, p53, NRF2, SIRT1 — all have substantial disordered segments that are critical for their biology and that cannot be reliably modeled by AlphaFold2 in the conventional sense.

The tool handles this honestly: low pLDDT scores flag likely disordered regions explicitly, and the coordinates predicted for IDRs should be read as placeholders rather than as ground-truth geometries. The problem is that IDRs are often where the biological action is. The transactivation domain of FOXO3 lies in an IDR, and it is where many of the regulatory interactions that determine FOXO3 activity occur. Structural predictions for this region are not reliable guides to designing mRNA constructs that preserve specific functional states of that domain.
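The pLDDT flagging described above is easy to automate, because AlphaFold writes per-residue pLDDT into the B-factor column of its PDB output. The following is a minimal sketch that reads those values from CA atoms and reports long low-confidence stretches. The cutoff of 50 matches the "very low" confidence band used by the AlphaFold database, but the minimum segment length of 10 and both function names are our assumptions, and low pLDDT is only a proxy for disorder.

```python
from typing import Dict, List, Tuple


def plddt_per_residue(pdb_text: str, chain: str = "A") -> Dict[int, float]:
    """Read per-residue pLDDT from the B-factor column of CA atoms."""
    scores: Dict[int, float] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # Fixed-width PDB columns: atom name 13-16, chain 22,
        # residue number 23-26, B-factor 61-66.
        if line[12:16].strip() != "CA" or line[21] != chain:
            continue
        scores[int(line[22:26])] = float(line[60:66])
    return scores


def low_confidence_segments(
    scores: Dict[int, float], cutoff: float = 50.0, min_len: int = 10
) -> List[Tuple[int, int]]:
    """Return (first, last) residue numbers of contiguous runs below cutoff."""
    segments: List[Tuple[int, int]] = []
    start = prev = None
    for resnum in sorted(scores):
        low = scores[resnum] < cutoff
        if low and start is not None and resnum == prev + 1:
            prev = resnum  # extend the open run
            continue
        # Close the open run, if any, and keep it only if it is long enough.
        if start is not None and prev - start + 1 >= min_len:
            segments.append((start, prev))
        start = prev = resnum if low else None
    if start is not None and prev - start + 1 >= min_len:
        segments.append((start, prev))
    return segments
```

The segments this returns are candidates for the sequence-based analysis, not for any structure-based design step.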

What we do instead for IDR-heavy targets is rely on sequence-based resources to map the functional elements within disordered regions: the ELM linear motif database, the ANCHOR predictor of disordered binding regions, and phosphorylation site databases such as PhosphoSitePlus. This is a different kind of computational analysis from structure prediction: it works on sequence patterns and curated annotations rather than on predicted three-dimensional coordinates, and it is well validated for the specific purpose of identifying short functional motifs in a disordered context. For an mRNA construct design question (specifically, which isoform of a protein to encode, and whether truncating disordered N- or C-terminal segments is tolerable), this analysis is more directly useful than the structural prediction alone.
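The truncation question can be framed as a motif scan followed by a range check. The sketch below illustrates the idea with a single regular expression for the well-known AKT phosphorylation consensus (R-x-R-x-x-S/T). A real scan should use the curated expressions from ELM, together with annotated sites from PhosphoSitePlus. The `MOTIFS` table, the function names, and the toy sequence in the usage note are all hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Illustrative pattern only; curated motif definitions belong to ELM.
MOTIFS: Dict[str, str] = {
    "AKT_consensus": r"R.R..[ST]",
}

Hit = Tuple[str, int, int]  # (motif name, 1-based start, 1-based end)


def scan_motifs(seq: str, motifs: Dict[str, str] = MOTIFS) -> List[Hit]:
    """Find every motif match in seq, including overlapping ones."""
    hits: List[Hit] = []
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead keeps overlapping matches.
        for m in re.finditer(f"(?=({pattern}))", seq):
            start = m.start(1) + 1
            hits.append((name, start, start + len(m.group(1)) - 1))
    return hits


def motifs_lost_by_truncation(
    seq: str, keep_start: int, keep_end: int, motifs: Dict[str, str] = MOTIFS
) -> List[Hit]:
    """Motif hits not fully contained in the kept 1-based residue range."""
    return [
        hit for hit in scan_motifs(seq, motifs)
        if hit[1] < keep_start or hit[2] > keep_end
    ]
```

For a made-up sequence such as `"MGSRPRSCTWAAAAAARRRAVSGG"`, keeping residues 10 to 24 would report the motif at positions 4 to 9 as lost, which is the kind of flag that should stop an N-terminal truncation from going forward without review.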

Structural Tractability for mRNA Upregulation: A Different Question Than Small Molecule Druggability

Most of the literature on computational druggability assessment is written with small molecules in mind. The SiteMap algorithm, FTMap, and related approaches identify binding pockets and score them for drug-like molecule accommodation. These tools are designed to answer the question: is there a pocket in this protein that a small molecule inhibitor or activator could bind?

That is not our question. For an mRNA upregulation program, the tractability question is fundamentally different: if we increase cellular expression of this protein, does the protein behave appropriately, fold correctly, find its correct cellular compartment, engage its correct partners, and produce the biological effect we intend — without triggering off-target effects from elevated expression of a protein that is normally tightly regulated? This question does not have a direct structural answer. It requires a different analytical framework.

We use structural data to inform three components of this assessment. First, fold stability at elevated expression: proteins that fold rapidly and stably are less likely to accumulate as misfolded aggregates when expression increases. We look at folding energy estimates from tools in the Rosetta family alongside AlphaFold confidence scores to identify candidates where the structured domains are likely to be robust to increased cellular concentration. Proteins with large IDRs or with folds that require co-translational chaperone assistance are flagged for more careful analysis.

Second, secretion and trafficking signals: for our CNS program, we are primarily interested in proteins that can be secreted or that localize to the cell membrane, because these remain useful targets even when an mRNA therapeutic delivers to only a subset of cells in the CNS. Alpha-Klotho is secreted (it has a signal peptide and a shed ectodomain form that circulates in CSF), which is one reason it is attractive for our program. Structural prediction combined with signal peptide and topology tools (SignalP, and DeepTMHMM for membrane topology) gives us a rapid assessment of whether a candidate is a secreted or membrane protein, which is a key filter in our target ranking.

Third, interactome complexity as a risk signal: proteins that have predicted interaction interfaces with large numbers of partners, inferred from structural similarity to proteins in well-characterized interactome datasets, carry higher risk of off-target effects when overexpressed. A protein that interacts with 5 known partners is likely to behave more predictably when overexpressed than one that interacts with 50. We use this as a soft filter, not a hard cutoff, because interactome complexity can be managed with careful dose and temporal control. But it is a risk factor worth tracking.
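A soft filter of this kind can be written as a smooth multiplier rather than a threshold, so that a heavily connected protein is down-weighted but never excluded outright. The sketch below combines the three components into one number. Every name, weight, and functional form here is an assumption chosen for illustration (including the `half_point` of 20 partners and the 0.5 factor for intracellular proteins), not the scoring we actually run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    mean_plddt_structured: float  # 0-100, averaged over structured domains
    idr_fraction: float           # 0-1, fraction of residues in predicted IDRs
    secreted_or_membrane: bool    # e.g. from SignalP / DeepTMHMM calls
    n_partners: int               # known or inferred interaction partners


def interactome_penalty(n_partners: int, half_point: float = 20.0) -> float:
    """Smooth multiplier in (0, 1]: 1.0 with no partners, 0.5 at half_point."""
    return 1.0 / (1.0 + n_partners / half_point)


def tractability_score(c: Candidate) -> float:
    """Hypothetical composite of fold robustness, accessibility, and risk."""
    fold = (c.mean_plddt_structured / 100.0) * (1.0 - c.idr_fraction)
    access = 1.0 if c.secreted_or_membrane else 0.5
    return fold * access * interactome_penalty(c.n_partners)
```

With everything else held equal, a candidate with 5 partners keeps 80% of its score and one with 50 partners keeps about 29%, so the second is deprioritized rather than dropped.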

Specific AlphaFold Limitations We Have Hit in Practice

We want to be concrete about where these tools have failed us or required careful cross-checking, because the limitations are as important as the capabilities for anyone building a real workflow around them.

The most significant practical limitation we have encountered is the treatment of multimeric proteins. Several of our candidate targets — including some sirtuin family members and longevity-associated kinases — are obligate homodimers or higher-order oligomers in their active form. AlphaFold2's monomer predictions for these proteins show the folded protomer structure reasonably well, but the assembly interface geometry, which governs whether an expressed protein will correctly self-assemble into the functional oligomeric state, requires AlphaFold-Multimer or experimental data. In cases where we could check AlphaFold-Multimer predictions against crystal structures, the interface geometries were often close but not always reliable for quantitative analysis — the buried surface area and interaction energetics can be substantially off even when the qualitative topology is correct.
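Checking the qualitative interface topology against a crystal structure can be done with a simple contact comparison. The sketch below is a simplified, CA-only version of the fraction-of-native-contacts idea used in docking assessment. It assumes coordinates have already been parsed into dictionaries keyed by residue number, and that the predicted and reference models share a numbering scheme. The 8 angstrom cutoff and the function names are our assumptions.

```python
import math
from typing import Dict, Set, Tuple

Coord = Tuple[float, float, float]
Contact = Tuple[int, int]  # (residue number in chain A, residue number in chain B)


def interface_contacts(
    chain_a: Dict[int, Coord], chain_b: Dict[int, Coord], cutoff: float = 8.0
) -> Set[Contact]:
    """Residue pairs whose CA atoms lie within cutoff angstroms of each other."""
    return {
        (res_a, res_b)
        for res_a, xyz_a in chain_a.items()
        for res_b, xyz_b in chain_b.items()
        if math.dist(xyz_a, xyz_b) <= cutoff
    }


def contact_recovery(predicted: Set[Contact], reference: Set[Contact]) -> float:
    """Fraction of reference interface contacts reproduced by the prediction."""
    if not reference:
        raise ValueError("reference interface has no contacts")
    return len(predicted & reference) / len(reference)
```

A high recovery value supports the qualitative topology only. It says nothing about buried surface area or interaction energetics, which is exactly where we found the predictions could be substantially off.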

The second limitation concerns proteins with large flexible linkers between structured domains. Alpha-Klotho has two glycoside hydrolase-like domains connected by a flexible segment, and the relative domain orientation in the AlphaFold prediction is a single conformation drawn from a conformational ensemble. Published cryo-EM data show that the domain arrangement is indeed flexible, so the AlphaFold prediction is a plausible snapshot and should not be read as a fixed structure. For our mRNA construct design, this means we cannot confidently engineer domain-domain contacts based on the predicted structure alone.

We are not saying that AlphaFold-era tools are unreliable — the opposite is true for the use cases where they are strong. We are saying that their reliability is strongly use-case-dependent, and that the failure modes are predictable enough to manage if you approach the output as a high-confidence hypothesis rather than as experimental ground truth.

How We Integrate Computational and Experimental Validation

Our practical workflow uses computational structural analysis to generate a ranked list of mRNA construct design options for each validated target, prioritizing constructs where the structural predictions are high-confidence and the functional assessments are consistent. The top-ranked construct designs then go into our in vitro expression testing — transfection into iPSC-derived neurons or relevant primary cell types — where we measure expression level, cellular localization, and a set of functional readouts appropriate to each target.

The computational analysis does not replace the experimental step. What it does is reduce the number of experimental variants we need to test by eliminating constructs with obvious structural risks — misfolding-prone truncations, signal peptide mutations that would disrupt trafficking, isoform selections that remove critical functional elements — before we spend wet lab time on them. In a small team with limited CRO access, that filtering function has real value. A computational pre-screen that narrows 20 potential construct variants to 5 prioritized candidates for experimental testing is a genuine efficiency gain, even if the pre-screen is probabilistic rather than deterministic.
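The filtering step described above amounts to two operations: drop any variant with an obvious structural risk, then rank what remains. The sketch below shows that shape, assuming each variant already carries a score and a list of risk flags from upstream analysis. The `ConstructVariant` class, the flag strings, and the default of five survivors are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ConstructVariant:
    name: str
    score: float  # higher is better; produced by any upstream scoring step
    risk_flags: List[str] = field(default_factory=list)


def prescreen(variants: List[ConstructVariant], keep: int = 5) -> List[ConstructVariant]:
    """Drop variants carrying any risk flag, then keep the top `keep` by score."""
    clean = [v for v in variants if not v.risk_flags]
    return sorted(clean, key=lambda v: v.score, reverse=True)[:keep]
```

Flags might be strings such as "misfolding_prone_truncation" or "signal_peptide_disrupted". Treating flags as hard exclusions and the score as a ranking keeps the probabilistic part of the pre-screen separate from the disqualifying part.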

The tools we are most optimistic about for the next iteration of this workflow are protein language model embeddings (ESM-2 and related models) for predicting the functional effect of sequence variants, and the emerging class of structure-dynamics models that begin to address the conformational ensemble question that static AlphaFold predictions cannot answer. Neither is ready for routine deployment in our current pipeline, but both are developing rapidly enough that we expect to integrate them within the next year of work.