The human genome contains about 20,000 protein-coding genes, which together hold almost 180,000 known internal exons. These exons make up only about one percent of the entire genome. Much of the remainder, whose function is still largely a mystery, is referred to as the "dark genome".


What Are Exons?

Exons are the segments of a gene that are kept in the mature RNA transcript after splicing. In protein-coding genes, they encode the proteins that manage tissue development and other biological processes in the body. An exon is regarded as autonomous if it can be spliced into a mature RNA transcript without external assistance.

Because these regions can mutate at random, it is important to document such events, since the resulting changes can be harmful. For instance, long non-coding RNA exons, which are considered autonomous but usually have no known function, have been associated with cancer development.

Additionally, some exon-like sequences reside within non-coding introns. Known as pseudoexons, these fragments can acquire mutations that strengthen a weak splice site. As a result, the pseudoexon can be included in the mature RNA transcript, which can potentially lead to disease.



New Insights About the Human Genome

The exon definition model has long guided researchers in molecular genetics. One of its assumptions is that the accurate removal of non-protein-coding introns from RNA transcripts depends on clear signals marking where exons begin and end.
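To make the idea of "clear signals at exon boundaries" concrete, here is a minimal Python sketch based on the textbook GT-AG rule: an internal exon typically begins right after the "AG" that ends the upstream intron and ends right before the "GT" that starts the downstream intron. This is a deliberately simplified illustration, not the study's method or a real splice-site predictor; real splicing signals are longer and more varied, and the function name `candidate_exons` and the 50 to 300 nucleotide length window are illustrative assumptions.

```python
from typing import List, Tuple


def candidate_exons(seq: str, min_len: int = 50, max_len: int = 300) -> List[Tuple[int, int]]:
    """Return half-open (start, end) intervals of candidate internal exons.

    Toy version of the GT-AG rule: a candidate exon starts just after an
    'AG' dinucleotide (end of the upstream intron) and stops just before a
    'GT' dinucleotide (start of the downstream intron). The length window
    is an illustrative assumption, not a biological constant.
    """
    seq = seq.upper()
    # Possible exon starts: the position right after each 'AG'.
    starts = [i + 2 for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    # Possible exon ends (exclusive): the position of each 'GT'.
    ends = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    hits = []
    for s in starts:
        for e in ends:
            # Keep only pairs whose exon length falls inside the window.
            if min_len <= e - s <= max_len:
                hits.append((s, e))
    return hits
```

For example, `candidate_exons("CCAG" + "A" * 60 + "GTCC")` reports a single 60-nucleotide candidate between the AG and the GT. Because AG and GT pairs occur by chance all over the genome, a scan like this flags far more candidates than are truly spliced, which is one reason boundary signals alone are not the whole story.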

However, this assumption does not hold in every case, because exon splicing does not always go smoothly. It sometimes produces mature RNA transcripts containing non-functional components. By questioning this assumption, the researchers behind the study "The human genome contains over a million autonomous exons" set out to test the model.

The study, led by Professor Timothy Hughes of the Department of Molecular Genetics at the University of Toronto's Temerty Faculty of Medicine, relied on a method called exon trapping. This technique uses a plasmid-based assay to search for exons in DNA fragments of unknown composition: each fragment is placed inside a reporter gene carried on a plasmid, and any exon the fragment contains is spliced into the reporter's RNA. Although exon trapping is no longer widely used, it proved effective for surveying the entire human genome when combined with high-throughput sequencing.
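The sequencing step of an exon-trapping experiment can be pictured with the minimal Python sketch below. It assumes that each sequenced reporter transcript contains a known upstream vector exon and a known downstream vector exon, so whatever was spliced in between them is the trapped exon. The junction sequences `UPSTREAM_EXON_END` and `DOWNSTREAM_EXON_START` and the helper names are hypothetical placeholders; the study's actual vector and analysis pipeline are not described here.

```python
from collections import Counter
from typing import Iterable, Optional

# Hypothetical junction sequences: the 3' end of the reporter's upstream
# exon and the 5' start of its downstream exon. Real vectors differ.
UPSTREAM_EXON_END = "GACCTGCAG"
DOWNSTREAM_EXON_START = "TCTAGAGCC"


def trapped_exon(read: str) -> Optional[str]:
    """Return the sequence spliced between the two vector exons, or None.

    None means the read does not span both vector exons. An empty string
    means the vector exons were joined directly, i.e. the inserted
    fragment contributed no exon.
    """
    up = read.find(UPSTREAM_EXON_END)
    if up == -1:
        return None
    start = up + len(UPSTREAM_EXON_END)
    # Search for the downstream exon only after the upstream one.
    down = read.find(DOWNSTREAM_EXON_START, start)
    if down == -1:
        return None
    return read[start:down]


def count_trapped_exons(reads: Iterable[str]) -> Counter:
    """Tally distinct trapped exon sequences across sequencing reads."""
    counts = Counter()
    for read in reads:
        exon = trapped_exon(read)
        # Skip unmatched reads (None) and reads with no exon inserted ("").
        if exon:
            counts[exon] += 1
    return counts
```

Counting reads per trapped sequence is one simple way to see which fragments are spliced in reproducibly; a genome-wide survey would additionally map each trapped sequence back to its location in the genome.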

In this study, almost none of the newly discovered exons were consistently found across the genomes of different species. According to Hughes, they likely arose in the human genome through random mutation and have the potential to play a significant role in our biology. This suggests that human evolution involves a great deal of trial and error, possibly enabled by the vast size of our genome.

Through exon trapping, the researchers found roughly 1.25 million exons, both known and previously unknown. Of these, almost 4% were long non-coding RNA exons. The significance of the newly found exons remains unclear, although some of them can be activated in certain contexts. The scientists believe their findings can serve as a valuable resource for efforts to decipher the splicing code.

With a better understanding of the factors that affect exon inclusion in mature RNA, experts could improve programs such as SpliceAI, a widely used deep-learning tool for predicting splice sites and aberrant splicing. Because SpliceAI can be trained on new data, the data produced in this research could be used to refine its predictions.

In most cases, SpliceAI does not offer details on the characteristics of exons. It is also poor at predicting the splicing of exons that have not yet been catalogued. The exon trapping data used in this study contain biologically useful information that could be fed into SpliceAI and other splicing prediction tools. As a result, these tools could open up new paths for exploring the nature of the dark genome.



Check out more news and information on Human Genome in Science Times.