Introduction to Nanopore Sequencing

In this tutorial we will assemble the E. coli genome using a mix of long, error-prone reads from the MinION (Oxford Nanopore) and short reads from a HiSeq instrument (Illumina).

The MinION data used in this tutorial come from a test run by the Loman lab.
The Illumina data were simulated using InSilicoSeq.

Get the Data

First download the nanopore data


You will not need the HiSeq data right away, but you can start the download in another window

curl -O -J -L
curl -O -J -L

Look at the basic statistics of the nanopore reads

assembly-stats ecoli_allreads.fasta


How many nanopore reads do we have?


How long is the longest read?


What is the average read length?
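
As a cross-check on the assembly-stats output, the same three numbers can be computed with plain awk. This is a minimal sketch: `fasta_stats` is a helper name introduced here for illustration, and it assumes a FASTA file in which every record starts with a `>` header line.

```shell
# fasta_stats FILE
# Hypothetical helper: print the number of reads, the longest read and the
# mean read length of a FASTA file. Sequences wrapped over several lines
# are handled by summing the lengths of all lines between two headers.
fasta_stats() {
    awk '
        function add(l) { total += l; if (l > max) max = l }
        /^>/ { if (n > 0) add(len); n++; len = 0; next }
             { len += length($0) }
        END  { if (n > 0) add(len)
               printf "reads\t%d\nlongest\t%d\nmean\t%.1f\n", n, max, (n ? total / n : 0) }
    ' "$1"
}
```

Once the function is defined, `fasta_stats ecoli_allreads.fasta` should agree with what assembly-stats reports for the read count, the longest read and the mean length.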

Adapter trimming

The guppy basecaller, i.e. the program that transforms the raw electrical signal into fastq files, already demultiplexes and trims the reads for us.


We assemble the reads using wtdbg2 (version > 2.3)

head -n 20000 ecoli_allreads.fasta > subset.fasta
wtdbg2 -x ont -i subset.fasta -fo assembly
wtpoa-cns -i assembly.ctg.lay.gz -fo assembly.ctg.fa
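
Note that `head -n 20000` keeps exactly 10000 reads only if every record occupies two lines (one header, one sequence line). If the sequences were wrapped over several lines, a record-based subset would be safer. The sketch below assumes nothing about line wrapping; `first_records` is a name made up for this example.

```shell
# first_records FILE N
# Hypothetical helper: print the first N records of a FASTA file,
# whether or not the sequences are wrapped across several lines.
first_records() {
    # Count header lines; stop as soon as record N+1 begins.
    awk -v n="$2" '/^>/ { if (++seen > n) exit } { print }' "$1"
}
```

With it, `first_records ecoli_allreads.fasta 10000 > subset.fasta` would produce the subset used above.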


Since the assembly likely contains a lot of errors, we correct it with Illumina reads.

First we map the short reads against the assembly

bowtie2-build assembly.ctg.fa assembly
bowtie2 -x assembly -1 ecoli_hiseq_R1.fastq.gz -2 ecoli_hiseq_R2.fastq.gz | \
    samtools view -bS -o assembly_short_reads.bam
samtools sort assembly_short_reads.bam -o assembly_short_sorted.bam
samtools index assembly_short_sorted.bam

then we run the consensus step

samtools view assembly_short_sorted.bam | wtpoa-cns -t 16 -x sam-sr \
    -d assembly.ctg.fa -i - -fo assembly_polished.fasta

which will correct any mismatches in our assembly and write the improved assembly to assembly_polished.fasta

For better results we should perform more than one round of polishing.
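
Several rounds can be chained by feeding each polished assembly back into the mapping step. The function below is only a sketch: `polish_rounds` and the `round*` file names are invented for this example, it reuses the commands shown above unchanged, and the default of two rounds is an arbitrary assumption rather than a recommendation.

```shell
# polish_rounds DRAFT R1 R2 [ROUNDS]
# Hypothetical helper: repeat the map-and-polish step, using the output
# of each round as the draft for the next one.
polish_rounds() {
    local draft=$1 r1=$2 r2=$3 rounds=${4:-2}
    local i=1 out
    while [ "$i" -le "$rounds" ]; do
        out="polished_round${i}.fasta"
        # Index the current draft and map the short reads against it.
        bowtie2-build "$draft" "round${i}" > /dev/null
        bowtie2 -x "round${i}" -1 "$r1" -2 "$r2" |
            samtools view -bS -o "round${i}.bam"
        samtools sort "round${i}.bam" -o "round${i}_sorted.bam"
        samtools index "round${i}_sorted.bam"
        # Polish the current draft with the mapped reads.
        samtools view "round${i}_sorted.bam" |
            wtpoa-cns -t 16 -x sam-sr -d "$draft" -i - -fo "$out"
        draft=$out
        i=$((i + 1))
    done
    # Print the name of the final polished assembly.
    echo "$draft"
}
```

For example, `polish_rounds assembly.ctg.fa ecoli_hiseq_R1.fastq.gz ecoli_hiseq_R2.fastq.gz 2` would leave the result of the second round in polished_round2.fasta.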

Compare with an existing assembly and an Illumina-only assembly

An existing assembly

Go to and search for NC_000913. Download the associated genome in fasta format and rename it to ecoli_ref.fasta

nucmer --maxmatch -c 100 -p ecoli assembly_polished.fasta ecoli_ref.fasta
mummerplot --fat --filter --png --large -p ecoli ecoli.delta

then take a look at ecoli.png

Compare metrics


First you need to assemble the Illumina data

Then run busco and quast on the 3 assemblies
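
One of the contiguity metrics that quast reports is the N50: the contig length at which contigs of that length or longer add up to at least half of the assembly. To make the definition concrete, here is a minimal sketch that computes it directly from a FASTA file; `n50` is a helper written for this example and is not part of quast.

```shell
# n50 FILE
# Hypothetical helper: print the N50 contig length of a FASTA assembly.
n50() {
    # 1. Print one length per contig (wrapped sequences are summed).
    awk '/^>/ { if (len) print len; len = 0; next }
              { len += length($0) }
         END  { if (len) print len }' "$1" |
    # 2. Sort the lengths from longest to shortest.
    sort -rn |
    # 3. Walk down the list until half of the total length is covered.
    awk '{ lens[NR] = $1; total += $1 }
         END { for (i = 1; i <= NR; i++) {
                   run += lens[i]
                   if (run * 2 >= total) { print lens[i]; exit }
               } }'
}
```

Running `n50 assembly_polished.fasta` and `n50 ecoli_ref.fasta` gives a quick first comparison before looking at the full reports.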


Which assembly would you say is the best?


If you have time, train your annotation skills by running prokka on your genome!

prokka --outdir annotation --kingdom Bacteria assembly_polished.fasta

You can open the output to see how it went

cat annotation/*.txt


Does it fit your expectations? How many genes were you expecting?
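
To put a number next to our expectation, we can also count the coding sequences in the GFF file that prokka writes to the output directory. The sketch below assumes the standard GFF3 layout, in which the third tab-separated column holds the feature type; `count_cds` is a name chosen for this example.

```shell
# count_cds FILE
# Hypothetical helper: count the CDS features in a GFF3 file.
# Comment lines and the trailing FASTA section contain no tab-separated
# third column equal to "CDS", so they are skipped automatically.
count_cds() {
    awk -F '\t' '$3 == "CDS" { n++ } END { print n + 0 }' "$1"
}
```

For example, `count_cds annotation/*.gff` should work as long as the directory contains a single GFF file.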