Protein evolution

De novo gene birth

Protein-coding genes are sometimes born de novo from non-coding sequences. A newborn protein must avoid causing harm, e.g. through aggregation, while providing some benefit to the organism. High levels of intrinsic structural disorder help newborn genes strike this balance between avoiding harm and providing benefit (Wilson et al. 2017, Willis & Masel 2018).

Our theories of evolvability, specifically preadapting selection (Masel 2006, Rajon & Masel 2011), help explain why the overwhelming likelihood that a random peptide is harmful does not prevent de novo gene birth. If non-coding sequences are translated even at low levels, selection has an opportunity to eliminate the most deleterious amino acid sequences (Wilson & Masel 2011). The resulting pre-screened set of sequences provides the raw material from which de novo protein-coding genes could be co-opted. Consistent with this possibility, many “non-coding” sequences in S. cerevisiae are found in association with ribosomes, of which only one appears to be a previously unannotated de novo evolved protein (Wilson & Masel 2011).

A tractable case study for the de novo birth of coding sequences is the birth of only part of a gene. One way this can occur is when a stop codon is lost, and the 3'UTR, up to a backup stop codon, becomes part of the protein C-terminus (Giacomelli et al. 2007, Andreatta et al. 2015). Low levels of stop codon readthrough prescreen the genetic variation beyond stop codons, raising its intrinsic structural disorder (Kosinski & Masel 2020).
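The stop-codon-loss route to a longer C-terminus is simple enough to sketch in code. The Python below is a minimal illustration, not the method used in the cited studies: given a hypothetical coding sequence ending in a stop codon and the 3′UTR that follows it, `readthrough_extension` (a name introduced here) translates the 3′UTR in frame until it reaches the next in-frame stop codon, i.e. the backup stop. It assumes the standard genetic code and an uppercase-convertible DNA alphabet, and it leaves out the residue encoded at the original stop position, because that residue depends on which mutation or near-cognate tRNA is involved.

```python
from itertools import product
from typing import Optional

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def readthrough_extension(cds: str, utr3: str) -> Optional[str]:
    """Peptide appended to the C-terminus if the terminal stop codon of `cds`
    is read through or mutated to a sense codon. Translation continues in
    frame into `utr3` up to the next in-frame (backup) stop codon. The residue
    encoded at the original stop position is not included. Returns None if
    `utr3` contains no in-frame stop codon."""
    cds, utr3 = cds.upper(), utr3.upper()
    if len(cds) % 3 != 0 or CODON_TABLE.get(cds[-3:]) != "*":
        raise ValueError("cds must be a whole number of codons ending in a stop codon")
    extension = []
    for i in range(0, len(utr3) - 2, 3):  # step through complete codons only
        aa = CODON_TABLE[utr3[i:i + 3]]
        if aa == "*":
            return "".join(extension)  # reached the backup stop codon
        extension.append(aa)
    return None  # no backup stop within the supplied 3' UTR


# Hypothetical example: GCT (Ala), TGG (Trp), then the backup stop TAG.
print(readthrough_extension("ATGAAATAA", "GCTTGGTAGCC"))  # AW
```

Returning `None` rather than an open-ended peptide reflects the fact that, without a backup stop in the supplied sequence, the length of the extension is unknown.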

In yeast, elevated levels of stop codon readthrough can be caused by the [PSI+] prion, an epigenetically inherited aggregate of the Sup35 protein, a release factor required for translation to terminate at stop codons. When [PSI+] appears, readthrough is elevated at stop codons throughout the genome, and a range of pre-existing cryptic genetic variation is phenotypically revealed. Because it is an epigenetically inherited protein aggregate, [PSI+] can easily be lost after some generations, returning the lineage to its normal [psi-] state and restoring translation fidelity. If a subset of the revealed phenotypic variation is adaptive, it may have lost its dependence on [PSI+] by this time. This process of genetic assimilation may, for example, involve one or more point mutations in stop codons. It leaves the yeast with a new adaptive trait and with no permanent load of the other, deleterious variation. The yeast prion [PSI+] is a wonderful model system for studying evolutionary capacitance, because the relevant molecular biology is well understood.

In Saccharomyces, a high proportion of 3′UTR incorporation events involve the inclusion of in-frame 3′UTR sequence through precise mutation of the stop codon, rather than through frameshifts. This is compatible with the genetic assimilation of in-frame readthrough products produced by [PSI+] (Giacomelli et al. 2007).
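To illustrate the kind of point mutation that could genetically assimilate an in-frame readthrough product, the sketch below enumerates every single-nucleotide substitution of a stop codon and keeps those that produce a sense codon. Because a substitution leaves the downstream reading frame unchanged, such a mutant translates the same in-frame 3′UTR sequence that readthrough had already exposed, whereas a frameshift would translate a different frame. The helper name `stop_lost_mutants` is hypothetical, and the standard genetic code is assumed.

```python
from itertools import product
from typing import Dict

# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def stop_lost_mutants(stop_codon: str) -> Dict[str, str]:
    """Map each single-nucleotide substitution of `stop_codon` that yields a
    sense codon to the amino acid it encodes (hypothetical helper)."""
    stop_codon = stop_codon.upper()
    if CODON_TABLE.get(stop_codon) != "*":
        raise ValueError(f"{stop_codon!r} is not a stop codon")
    mutants = {}
    for pos in range(3):
        for base in BASES:
            if base == stop_codon[pos]:
                continue  # same base: not a substitution
            mutant = stop_codon[:pos] + base + stop_codon[pos + 1:]
            # A stop-to-stop change does not extend the protein, so skip it.
            if CODON_TABLE[mutant] != "*":
                mutants[mutant] = CODON_TABLE[mutant]
    return mutants


for stop in ("TAA", "TAG", "TGA"):
    print(stop, len(stop_lost_mutants(stop)))  # TAA 7, TAG 8, TGA 8
```

Across the three stop codons, 23 of the 27 possible single-nucleotide substitutions give a sense codon; the other four simply convert one stop codon into another.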

Long-term evolutionary trends

De novo genes are born with much higher levels of intrinsic structural disorder than expected from the translation of random non-coding sequences, and then become gradually more hydrophobic with age (Wilson et al. 2017). Their hydrophobic amino acids also become more interspersed, rather than clustered, along the sequence (Foy et al. 2019). Current work in our lab is investigating these trends in considerably more detail. For example, do these trends depend on taxon? How do trends at the amino acid level compare with other, independent information? Are the trends driven by descent with modification (i.e. a selective advantage to alleles with greater hydrophobicity) or by differential retention (i.e. a higher risk of evolutionary loss for more hydrophilic sequences)?
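The trends above rest on two kinds of sequence summary: overall hydrophobicity, and how hydrophobic residues are distributed along the chain. The sketch below computes both under stated assumptions: the mean Kyte-Doolittle hydropathy (GRAVY) for overall hydrophobicity, and, for clustering, a hypothetical normalized variance of hydrophobic-residue counts in non-overlapping windows, which is roughly 1 for random placement, above 1 when hydrophobic residues cluster, and below 1 when they are interspersed. The hydrophobic residue set, window size, and function names are illustrative choices, not necessarily those used by Wilson et al. (2017) or Foy et al. (2019).

```python
# Kyte-Doolittle hydropathy scale (Kyte & Doolittle 1982).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Assumption: which residues count as "hydrophobic" for the clustering measure.
HYDROPHOBIC = frozenset("CFILMVW")


def gravy(protein: str) -> float:
    """Mean Kyte-Doolittle hydropathy; higher values mean more hydrophobic."""
    return sum(KYTE_DOOLITTLE[aa] for aa in protein) / len(protein)


def hydrophobic_dispersion(protein: str, window: int = 6) -> float:
    """Hypothetical clustering measure: variance of hydrophobic counts per
    non-overlapping window, divided by the binomial variance expected if
    hydrophobic residues were placed at random. Trailing residues that do
    not fill a whole window are ignored."""
    n_windows = len(protein) // window
    if n_windows < 2:
        raise ValueError("sequence too short for two windows")
    counts = [
        sum(aa in HYDROPHOBIC for aa in protein[i * window:(i + 1) * window])
        for i in range(n_windows)
    ]
    mean = sum(counts) / n_windows
    p = mean / window  # fraction of hydrophobic residues in the counted windows
    if p in (0.0, 1.0):
        raise ValueError("undefined with no (or only) hydrophobic residues")
    variance = sum((c - mean) ** 2 for c in counts) / n_windows
    return variance / (window * p * (1 - p))


clustered = "LLLLLLSSSSSS" * 2   # hydrophobic residues in blocks
interspersed = "LS" * 12         # hydrophobic residues alternating
print(hydrophobic_dispersion(clustered))     # 6.0
print(hydrophobic_dispersion(interspersed))  # 0.0
print(round(gravy(clustered), 2), round(gravy(interspersed), 2))  # 1.5 1.5
```

The two toy sequences share the same composition, and hence the same GRAVY score, but differ sharply in dispersion, which is why the two trends can be measured separately.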

Publications: