Genome sequencing not always accurate


Pepijn Kooij


Research on genomes is complex, even for scientists. The human genome was declared completely sequenced in 2003, after 13 years of work. A genome is all the DNA found in a single cell of an organism, and it contains the information on how the organism is built and how it functions. It is made up of four different kinds of tiny building blocks called nucleotides, which are normally represented by letters (A, T, G and C). The human genome contains 3.1 billion of these nucleotides (3.1 giga base pairs, or 3.1 Gbp), enough letters to fill over 175 books or 262,000 pages.
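As a quick sanity check of these figures, the short Python snippet below works out how many letters per page and pages per book they imply. It only reuses the numbers from the paragraph above; the variable names are illustrative.

```python
# Figures quoted in the text (human genome, pages, books)
genome_size_bp = 3.1e9   # about 3.1 Gbp
total_pages = 262_000
total_books = 175

# Spread the letters evenly over the pages, and the pages over the books
letters_per_page = genome_size_bp / total_pages
pages_per_book = total_pages / total_books

print(f"~{letters_per_page:,.0f} letters per page")  # → ~11,832 letters per page
print(f"~{pages_per_book:,.0f} pages per book")      # → ~1,497 pages per book
```

In other words, each page would hold roughly 11,800 letters, and each book roughly 1,500 pages.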


How scientists sequence genomes, in brief


Researchers normally sequence genomes by chopping the DNA into tiny pieces of roughly 500 nucleotides (or base pairs) each, and then using powerful computers and software to put these millions of pieces back together like a giant jigsaw puzzle. Once everything is put together, scientists can look for which genes, i.e. functional parts of the genome, are present and how many copies of each gene there are. Unfortunately, the results are only as good as the software, which is itself written by researchers. This means that mistakes are sometimes made, and researchers using this software may draw the wrong conclusions.
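To make the jigsaw-puzzle analogy concrete, here is a minimal sketch of greedy overlap assembly: repeatedly merge the two pieces ("reads") whose end and start overlap the most. This is a teaching toy, not the method used by the software discussed in this article; real assemblers such as ABySS use more sophisticated approaches (ABySS, for example, builds a de Bruijn graph) and must also cope with sequencing errors and repeated sequences, which is exactly where mistakes can creep in. The function names, the `min_len` threshold and the example reads are illustrative assumptions.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` that matches a prefix of `b`
    (at least `min_len` letters long), or 0 if there is none."""
    start = 0
    while True:
        # Look for the first `min_len` letters of b somewhere in a
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        # Check that everything from there to the end of a matches b's start
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Toy greedy assembler: merge the best-overlapping pair until none overlap.
    Returns the resulting contigs (ideally just one)."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        # Compare every ordered pair of reads
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: remaining reads stay as separate contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping pieces cut from one made-up sequence
contigs = greedy_assemble(["GATTACA", "TACACGT", "ACGTTGCA"])
print(contigs)  # → ['GATTACACGTTGCA']
```

With only three short, unique pieces the puzzle is easy; with millions of reads, repeats and errors, different programs can piece the same data together differently.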


Combining different tests to improve results


In our most recent study (Kooij & Pellicer, in press), my colleague Jaume Pellicer and I used a different technique, flow cytometry, to estimate the number of base pairs more accurately, and then used this estimate to see which software was the most accurate. In flow cytometry, a fluorescent marker is attached to the DNA in the sample. When the sample is run through the machine, it measures the strength of the fluorescent signal: the stronger the signal, the more DNA.
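Because the signal strength is relative rather than absolute, the sample is commonly measured alongside a reference standard whose genome size is already known, and the sample's genome size is read off from the ratio of the two signal peaks. Below is a minimal sketch of that calculation, assuming fluorescence is strictly proportional to the amount of DNA. The function name, the reference size and the peak values are made-up placeholders, not measurements from our study.

```python
def genome_size_from_fcm(sample_peak: float, reference_peak: float,
                         reference_size_bp: float) -> float:
    """Estimate a genome size from flow cytometry peaks.

    Assumes fluorescence is proportional to DNA amount, so
    size_sample = size_reference * (peak_sample / peak_reference).
    """
    if sample_peak <= 0 or reference_peak <= 0:
        raise ValueError("peak positions must be positive")
    return reference_size_bp * (sample_peak / reference_peak)


# Hypothetical numbers for illustration only
estimate = genome_size_from_fcm(sample_peak=40.0, reference_peak=100.0,
                                reference_size_bp=100e6)
print(f"{estimate / 1e6:.1f} Mbp")  # → 40.0 Mbp
```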


We found that the genomes previously published for the fungus cultivated by ants (Leucoagaricus gongylophorus) are too large (>100 Mbp) compared with our flow cytometry estimate (about 40 Mbp), but that some software packages, such as ABySS, produce assemblies close to this figure. Our estimate of about 40 Mbp is also very close to the average genome size of fungi in general (about 44.2 Mbp) and of fungi in the same taxonomic order, the Agaricales (about 50 Mbp).
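The comparisons in this paragraph boil down to simple size ratios. The sketch below computes them using only the figures given above; 100 Mbp stands in for the ">100 Mbp" published assemblies, so the real ratio is at least this large. The function name is illustrative.

```python
def size_ratio(observed_mbp: float, reference_mbp: float) -> float:
    """How many times larger `observed_mbp` is than `reference_mbp`."""
    return observed_mbp / reference_mbp


fcm_estimate = 40.0         # flow cytometry estimate (about 40 Mbp)
published_assembly = 100.0  # lower bound: published assemblies were >100 Mbp
fungal_average = 44.2       # average fungal genome size (Mbp)
agaricales_average = 50.0   # average for the order Agaricales (Mbp)

print(f"Published assemblies vs flow cytometry: at least "
      f"{size_ratio(published_assembly, fcm_estimate):.1f}x")          # → at least 2.5x
print(f"Flow cytometry vs fungal average: "
      f"{size_ratio(fcm_estimate, fungal_average):.2f}x")              # → 0.90x
print(f"Flow cytometry vs Agaricales average: "
      f"{size_ratio(fcm_estimate, agaricales_average):.2f}x")          # → 0.80x
```

The published assemblies are therefore at least two and a half times the measured genome size, while the flow cytometry estimate sits within about 20% of both averages.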


Accurate data is key to understanding nature and natural processes. We hope that our method will help scientists analyse their data more accurately, and in turn help us all understand the world around us.




Kooij, P.W., Pellicer, J. (in press) Genome size versus genome assemblies: are the genomes truly expanded in polyploid fungal symbionts? Genome Biology and Evolution, DOI: 10.1093/gbe/evaa217

©Copyright 2022 Pepijn Kooij

p.kooij (at) unesp.br

pepijn.kooij (at) gmail.com