PhD Public Seminar - Alexander Sasse

Alexander Sasse
Thursday, October 21, 2021 - 9:00am
Zoom Meeting: Email graduate.coordinator@utoronto.ca for the link
PhD Oral Seminar
Abstract: 
Most RNA-binding proteins (RBPs) find their targets through distinct binding preferences, called specificities, for particular RNA sequence or RNA sequence-structure patterns. Recently, we used RNAcompete, an in vitro binding assay, to measure the RNA sequence specificities of 174 RBPs. Combined with previous RNAcompete measurements for 207 RBPs, this gives the largest collection of RBP sequence specificities to date: 381 RBPs from 34 eukaryotic species, ranging from protists to humans. RBPs that share more than 70% sequence identity across their RNA-binding domains (RBDs) nearly always have conserved RNA sequence specificities (the 70%-rule). However, only about half of the RBPs that share between 30% and 70% identity recognize similar RNA sequences, which limits the confidence of predictions based on sequence identity alone. To increase the number of RBPs whose specificity can be confidently inferred from our measured data, I developed a computational method called joint protein-ligand embedding (JPLE), which embeds amino acid 5-mers and RNA sequence specificities into a shared latent space. The latent representation of an RBP can be approximated from protein sequence features alone, and it enables reconstruction of RNA sequence specificity, prediction of RNA-binding similarity, and identification of important binding peptides in the protein sequence. Compared with the 70%-rule, JPLE doubles the number of RBPs with confidently inferred RNA sequence specificities. I embed RBPs from 690 eukaryotic species in JPLE's latent space, reconstruct RNA sequence specificities for ~29,000 RBPs, and identify clusters of RBPs with similar RNA sequence specificities. I use these clusters to estimate the binding capacity and evolutionary rate of all eukaryotic RBPs with RRM and KH domains, confirming that RRM-containing RBPs expanded massively in several clades to recognize highly diverse sequences.
Lastly, I combine 101 inferred RNA sequence specificities from Arabidopsis thaliana with RNA-seq data from 69 plant tissues to identify RBPs that regulate mRNA stability through interactions with the 3′ UTR.
Supervisor: 
Dr. Quaid Morris
Department of Molecular Genetics