Humans may have only 19,000 coding genes (those genes that produce proteins), three thousand fewer than the combined total of the three reference annotations of the human genome and far fewer than the 100,000 that was predicted just twenty years ago.
A new study led by the Spanish National Cancer Research Centre (CNIO) reveals that up to 20% of genes classified as coding (those that produce the proteins that are the building blocks of all living things) may not be coding after all, because they have characteristics that are typical of non-coding genes or pseudogenes (obsolete coding genes).
This study suggests that there is still a large amount of uncertainty, since the final number of coding genes could be 2,000 more or 2,000 fewer than it is now. The human proteome still requires much work, especially given its importance to the medical community.
Since the completion of the sequencing of the human genome in 2003, experts from around the world have been working to compile the final human proteome (the total number of proteins generated from genes) and the genes that produce them. This task is immense given the complexity of the human genome and the fact that we have about 20,000 separate coding genes.
The researchers analyzed the genes cataloged as protein coding in the main reference human proteomes. A detailed comparison of the reference proteomes from GENCODE/Ensembl, RefSeq and UniProtKB found 22,210 coding genes, but only 19,446 of these genes were present in all three annotations.
Researchers analyzed the 2,764 genes that were present in only one or two of these reference annotations and discovered that experimental evidence and manual annotations suggested that almost all of these genes were more likely to be non-coding genes or pseudogenes. In fact, these genes, together with another 1,470 coding genes that are present in the three reference catalogs, were not evolving like typical protein coding genes. The conclusion of the study is that most of these 4,234 genes probably do not code for proteins.
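The figures quoted above fit together arithmetically. The short Python sketch below only re-derives them from the numbers in this article. The variable names are illustrative and are not taken from the study itself.

```python
# Gene counts as quoted in the article (variable names are illustrative).
genes_in_any_catalog = 22_210      # coding genes across GENCODE/Ensembl, RefSeq and UniProtKB
genes_in_all_three = 19_446        # coding genes present in all three annotations
unusual_in_all_three = 1_470       # genes in all three that do not evolve like typical coding genes

# Genes annotated as coding in only one or two of the catalogs.
genes_in_one_or_two = genes_in_any_catalog - genes_in_all_three

# Genes the study flags as probably not coding for proteins.
flagged_genes = genes_in_one_or_two + unusual_in_all_three

# Share of all annotated coding genes that are flagged.
flagged_share = flagged_genes / genes_in_any_catalog

print(genes_in_one_or_two)
print(flagged_genes)
print(f"{flagged_share:.1%}")
```

The flagged share comes out at just under one fifth of the annotated coding genes, which is consistent with the "up to 20%" figure given earlier in the article.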
“We have been able to analyze many of these genes in detail and more than 300 genes have already been reclassified as non-coding,” said Michael Tress of the CNIO Bioinformatics Unit, who worked with researchers from the Wellcome Trust Sanger Institute, the Massachusetts Institute of Technology, Pompeu Fabra University, the National Center for Supercomputing (BSC-CNS) and the National Center for Cardiovascular Research (CNIC).
The work once again highlights doubts about the number of real genes present in human cells 15 years after the sequencing of the human genome.
Although the most recent data indicate that the number of genes encoding human proteins could exceed 20,000, the evidence from this study suggests otherwise.
“Our evidence suggests that humans may only have 19,000 coding genes, but we still do not know which 19,000 genes they are,” says Federico Abascal, of the Wellcome Trust Sanger Institute in the United Kingdom and first author of the work.
“Surprisingly, some of these unusual genes have been well studied and have more than 100 scientific publications based on the assumption that the gene produces a protein,” according to David Juan, of Pompeu Fabra University and a participant in the study.
The consequent reduction in the number of human coding genes could have important effects in biomedicine, since knowing how many genes produce proteins, and which ones they are, is of vital importance for research into many diseases, including cancer and cardiovascular diseases.