The genomic “Telomere-to-Telomere” (T2T) consortium has published a gap-free human genome sequence in Science

A paper in Science (Nurk et al., Vol. 376, 44–53, April 1, 2022) has presented a complete sequence of a human genome. The work was led by a consortium of genomic scientists at the Computational and Statistical Genomics Branch of the National Human Genome Research Institute, NIH, Bethesda, MD. The paper marks a significant milestone: the first complete, “gap-free” sequence of a human genome, comprising 3.055 billion base pairs. However, as noted in an accompanying Science news story by Elizabeth Pennisi, “when it comes to sequencing the human genome, ‘complete’ has always been a relative term”.

The original human genome sequences were published in 2001¹,². Since then, new and powerful technologies have allowed the Telomere-to-Telomere (T2T) Consortium to add further detail and provide a complete, comprehensive human reference genome. This study reports 63,494 “genes” in the human genome, 19,969 of which are protein-coding. By comparison, the 2001 figure was 39,114 “genes”, ~31,000 of which were protein-coding¹. (The “Ensembl” site [www.ensembl.org] provides definitive statistics on the human reference genome and gene annotation.) In their comprehensive article, the T2T consortium scientists stated that the new data “adds five full chromosome arms and more additional sequence than any genome reference release in the past 20 years. This 8% of the genome has not been overlooked because of a lack of importance but rather because of technological limitations. High-accuracy long-read sequencing has finally removed this technological barrier, enabling comprehensive studies of genomic variation across the entire human genome, which we expect to drive future discovery in human genomic health and disease”.

The gaps in the sequence could be closed thanks to improved technologies and decreased costs. The telomeres and the centromeres were among the most challenging sections to sequence, especially the centromeres of chromosomes 13, 14, 15, 21 and 22; the short arms of these 5 chromosomes were originally identified in the 1950s. As the gaps in the sequence were reduced over time, a 2017 release of the human genome (the version called “GRCh38” in Ensembl) showed fewer than 1,000 gaps remaining. Subsequent work from the T2T consortium has now led to the most recent publication. However, Dr. Richard Gibbs, a geneticist at Baylor College of Medicine and a Human Genome Project co-leader, commented that “[t]here’s still some work to do”. Elizabeth Pennisi’s excellent news story reported that “other researchers stress that the field now needs to get similarly complete genome sequences from a greater diversity of people to look for variation in the short arms and the other tough-to-read regions, which could play a role in diseases or traits”. The work of unravelling the human genome will continue.

(1). International Human Genome Sequencing Consortium (2001). Initial sequencing and analysis of the human genome. Nature 409: 860–921.

(2). Venter JC, Adams MD, Myers EW, et al. (2001). The sequence of the human genome. Science 291: 1304–1351.