We recently published a paper using genome skimming to generate markers for a phylogenetic analysis of glycerid annelids (Richter et al. 2015). We are interested in this group because we also work on the evolution of its venom system (e.g. von Reumont et al. 2014). Genome skimming basically means low-coverage genome sequencing using NGS techniques. With a coverage of 5x, for example, every part of the genome is expected to be sequenced 5 times on average. This is far from enough to assemble the nuclear genome. However, sequences that are present in higher copy numbers within the cell will also have a higher coverage than the average. For example, we can easily expect 100 or more mitochondria per cell, so mitochondrial DNA will be represented at a much higher coverage after sequencing. Using this method, complete mitochondrial genomes can be reconstructed quite easily from shallowly sequenced nuclear genomes. We did this for 19 glycerid specimens and one outgroup (Goniadidae). After assembling the data, we were able to recover complete mitochondrial genomes for 14 species. For the remaining ones, we found several large contigs containing the mitochondrial genes. Moreover, we also retrieved the ribosomal cluster (18S, ITS1, 5.8S, ITS2, 28S), which is likewise present in many (tandemly repeated) copies per genome. Using the sequence data, we were able to reconstruct a fairly well-supported phylogeny of glycerids (Fig. 1), which will help us to understand venom evolution in this group in the future.
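The coverage argument above can be written down as a back-of-the-envelope calculation. This is only a minimal sketch: the function name is made up for illustration, and it assumes that coverage scales linearly with copy number and that "copies" are counted per copy of the nuclear genome, which ignores ploidy and any sequencing or extraction bias.

```python
def expected_coverage(nuclear_coverage: float, copies_per_nuclear_genome: float) -> float:
    """Expected average coverage of a sequence that occurs in several copies
    for every copy of the nuclear genome (simplifying assumption: coverage
    scales linearly with copy number)."""
    return nuclear_coverage * copies_per_nuclear_genome


# Values taken from the text: 5x nuclear coverage, 100 mitochondrial copies.
nuclear_coverage = 5.0
mito_copies = 100  # assumption: one mitogenome per mitochondrion, counted per nuclear genome copy

print(expected_coverage(nuclear_coverage, mito_copies))  # 500.0
```

Under these assumptions, a genome sequenced at only 5x would still yield a mitogenome at roughly 500x, which is why the assembly of the mitochondrial genome works while the nuclear genome stays fragmentary.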
Fig. 1: Phylogeny of Glyceridae as reconstructed using Maximum Likelihood from the nucleotides of 15 mitochondrial and 4 ribosomal genes/spacers. Squares/circles represent different gene arrangements (see also Fig. 3).
Mitochondrial genomes (mitogenomes) became a promising candidate for resolving deep metazoan phylogenies in the early 2000s. “Big trees from little genomes” is the title of a review by Boore and Brown (1998), who illustrated this potential. Animal mitogenomes are rather small, often ranging in size between 12 and 16 kbp. They usually code for 37 genes (22 tRNAs, 13 proteins, 2 rRNAs), which in most cases are arranged on a single, circular molecule. Of course there are many exceptions to this rule, and especially the mtDNA of non-bilaterian animals doesn’t fit this description (Lavrov and Pett, 2016). By sequencing mitochondrial genomes, which is feasible (but laborious) even with Sanger methods, you basically get two data sets for the price of one: gene order and sequence data. First analyses found that gene order is rather conserved across animals, making gene arrangements a potentially interesting phylogenetic marker. For example, based on the rearrangement of a single tRNA, a clade uniting crustaceans with insects was proposed (Boore et al. 1998), which is now generally accepted (to be more precise, insects are nested within crustaceans). In my own research, mitogenomes helped me to clearly support an annelid origin for myzostomids (Bleidorn et al. 2007). However, mitogenomic research on deep phylogenies lost steam, as in most cases the sequence data couldn’t help to resolve deep nodes, whereas the gene order was either too conserved or too variable to provide additional phylogenetic signal (e.g., Bernt et al. 2013). We also found that resolving the phylogeny of annelids using mitogenomic data alone is difficult, to say the least (Weigert et al. 2016). Basically, working on mitogenomes for deep phylogenies went out of fashion, and the use of transcriptomes or complete nuclear genomes is now state of the art.
This is a pity, given that it has never been easier to generate complete mitochondrial genomes. And as you can see in this example, mitogenomes are informative for resolving phylogenies with younger divergences (e.g., within “families” of annelids). In our case, we could run all sequencing libraries on a single Illumina HiSeq lane (paired-end, 400 million sequences, around 2000€ depending on the provider). Moreover, genome skimming works with genomic DNA, which is often easier to handle (and collect) than material preserved for RNA. Using DNA also allows the inclusion of museum material. It is even possible to pool more than 20 individuals on a sequencing lane. Using some simulation studies with our data, we found that around 10 million sequences per library should be enough to recover complete mitogenomes for glycerids. The number of sequences needed also depends on the number of mitochondrial copies per cell and on the size of the nuclear genome. The smaller the nuclear genome, the larger the share of sequences that come from mitochondria. Unfortunately, at 1-3 Gbp, glycerid genomes are rather big for invertebrates (although not as gigantic as those of many crustaceans and onychophorans, which are three times as big as the human genome (3 Gbp) or bigger). We plotted the number of invertebrate species against their genome size and found that most invertebrates have smaller genomes than glycerids. In these cases, fewer than 10 million sequences are expected to be sufficient for reconstructing complete mitogenomes (Fig. 2). This has been demonstrated by Alfried Vogler’s group at the NHM in London, who used genome skimming for biodiversity analyses of beetle communities. By mixing DNA samples of around 500 Coleoptera, they were able to obtain 107 complete mitochondrial genomes from two runs on an Illumina MiSeq, which generated around 34 million reads.
Instead of constructing individual sequencing libraries, they barcoded every species (using the cox1 gene) before the Illumina sequencing and used these barcodes to identify individual mitogenomes in the “metagenome” assembly (Crampton-Platt et al. 2015). This definitely seems to be a promising way to work on the phylogeny of annelid families or to analyse annelid communities!
Fig. 2: Histogram of the genome sizes (given as C-value; a C-value of 1 corresponds roughly to 1 Gbp) of 1,985 invertebrate species. The genome size of glycerids is indicated by dotted lines.
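How the number of reads needed depends on nuclear genome size can be illustrated with a rough estimate. This is a sketch under stated assumptions, not the simulation we ran on our data: the function names, the read length of 100 bp, the target coverage of 50x and the copy number of 100 are all hypothetical values chosen for illustration, and reads are assumed to be sampled in proportion to the amount of DNA present.

```python
import math


def mito_read_fraction(nuclear_bp: float, mito_bp: float, mito_copies: float) -> float:
    """Share of all reads expected to come from the mitogenome, assuming reads
    are sampled in proportion to the amount of DNA of each kind."""
    mito_total_bp = mito_bp * mito_copies
    return mito_total_bp / (mito_total_bp + nuclear_bp)


def reads_for_target_coverage(target_coverage: float, read_length: int,
                              nuclear_bp: float, mito_bp: float,
                              mito_copies: float) -> int:
    """Total number of reads needed so that the mitogenome reaches the
    target average coverage."""
    mito_reads_needed = target_coverage * mito_bp / read_length
    fraction = mito_read_fraction(nuclear_bp, mito_bp, mito_copies)
    return math.ceil(mito_reads_needed / fraction)


# Illustrative assumptions (not measured values):
MITO_BP = 15_000        # mitogenome size of around 15 kbp, as in the text
MITO_COPIES = 100       # mitochondrial copies per nuclear genome copy
READ_LENGTH = 100       # hypothetical read length in bp
TARGET_COVERAGE = 50    # hypothetical coverage aimed at for assembly

# Compare a small invertebrate genome with the 1-3 Gbp range of glycerids.
for nuclear_bp in (0.2e9, 1e9, 3e9):
    n_reads = reads_for_target_coverage(
        TARGET_COVERAGE, READ_LENGTH, nuclear_bp, MITO_BP, MITO_COPIES)
    print(f"{nuclear_bp / 1e9:.1f} Gbp nuclear genome: about {n_reads:,} reads")
```

Because the mitochondrial DNA is only a tiny share of the total, the required number of reads grows almost linearly with nuclear genome size in this model, which is the reason why species with small genomes need fewer sequences per library.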
We were also interested in the gene order of our glycerid mitogenomes and found some surprises here. The gene order itself reflected the conserved gene order that is well supported as the ground pattern of the last common ancestor of Errantia (the group to which glycerids belong) and Sedentaria (see Fig. 3A, order of protein-coding genes). Only some tRNA rearrangements (tRNAs are known to be relocated more easily) were found, resulting in four similar but distinct gene orders. It was nice to see that mapping these gene orders onto the phylogeny (Fig. 1) gave a consistent result, and we didn’t have to assume convergent rearrangements within glycerids. The surprise came with two mitogenomes that were unusually big. Most glycerids have mitogenomes with a size of around 15 kbp, which is within the expected range for animals. However, two species showed mitogenomes with a size of more than 20 kbp. Analysing the gene content, we found integrations of so-called group II introns, which are self-splicing genetic elements, within the cox1 gene of these species. This is remarkable, as such elements are rarely found in metazoan and especially in bilaterian mitogenomes. Within Bilateria, they have so far only been found in a few annelids, always located within the cox1 gene. Our analyses of these genes clearly show that the integrations happened independently in the two glycerid species. One of them even carries two independent group II intron integrations within the cox1 gene. Massive sequencing of annelid genomes may help to uncover even more of these rare integrations. The source of these group II introns is likely a horizontal transfer from bacteria or viruses. It remains unclear why, within Bilateria, they have so far only been found in annelids, and why they always seem to integrate into cox1.
Fig. 3: A. Gene order of glycerid and goniadid mitogenomes. B. Integration of group II introns within the cox1 gene of two glycerid species.
In summary, analysing mitogenomes is still a useful and cost-efficient tool for phylogenetics. Annelids were thought to have a rather conserved gene order, but our recent studies on syllids (Aguado et al. 2016) and basally branching annelids (Weigert et al. 2016) also uncovered a high diversity of gene orders, which might be an interesting phylogenetic marker for some taxa.