1928 long-read analysis in-depth

2020-07-02

[Figure: 1928 now supports analysis of both short reads (left) and long reads (right).]

Analysing long-read data with 1928

The 1928 service platform provides analysis of long-read data. It currently supports data from either PacBio or Oxford Nanopore sequencing.

How to analyse

Once in the platform, drag and drop the files you want to analyze and press Upload. Upload and analysis start immediately.

What analyses can I perform?

We currently support two analyses for long-read data: MLST (multilocus sequence typing) and resistance gene identification.

For MLST, we report a best estimate of which alleles are present and which sequence type the sample has. We stress that this is only a best estimate, because the error rate of long reads is much higher than that of Illumina and Ion Torrent (roughly 10-15% compared to 0.1-1%). For Nanopore in particular, the MLST result will depend on the species and on how recent the Nanopore data is (the newer, the better).
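
To make the error-rate comparison concrete, here is a small back-of-the-envelope calculation in Python. It assumes that sequencing errors are independent and uniformly distributed across bases, which is a simplification (Nanopore errors, for example, tend to cluster in homopolymers), and it uses a hypothetical allele length of 500 bp; actual MLST locus lengths vary by scheme. It illustrates why allele calls from long reads are estimates and is not a description of how 1928 calls alleles.

```python
import math


def expected_errors(allele_length: int, error_rate: float) -> float:
    """Expected number of erroneous bases when an allele is read once."""
    return allele_length * error_rate


def p_error_free(allele_length: int, error_rate: float) -> float:
    """Probability that a single read covers the allele with no errors,
    assuming independent, uniformly distributed per-base errors."""
    return (1.0 - error_rate) ** allele_length


def p_majority_wrong(depth: int, error_rate: float) -> float:
    """Probability that more than half of the reads covering one position are
    wrong. Pessimistic simplification: all errors at the position are assumed
    to produce the same wrong base, so a wrong majority always wins the vote."""
    return sum(
        math.comb(depth, k) * error_rate**k * (1.0 - error_rate) ** (depth - k)
        for k in range(depth // 2 + 1, depth + 1)
    )


ALLELE_LENGTH = 500  # hypothetical MLST allele length in bp

if __name__ == "__main__":
    for label, rate in [("short read, 0.1%", 0.001), ("short read, 1%", 0.01),
                        ("long read, 10%", 0.10), ("long read, 15%", 0.15)]:
        print(f"{label}: {expected_errors(ALLELE_LENGTH, rate):.1f} expected errors, "
              f"P(error-free read) = {p_error_free(ALLELE_LENGTH, rate):.2e}")
    # Majority vote over 40 reads at a 15% error rate (pessimistic bound)
    print(f"P(consensus base wrong at 40x, 15%) = {p_majority_wrong(40, 0.15):.2e}")
```

At a 10% per-base error rate, essentially no single read copies a 500 bp allele without error, whereas a large share of short reads at 0.1% do. Combining many overlapping reads into a consensus is what makes allele calls feasible at all, which is one reason sequencing depth matters. Because real long-read errors are not fully independent, a consensus can still be wrong at some positions, so the MLST result remains a best estimate.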

For resistance gene identification, long reads and short reads are searched against the same database. However, a different approach is used to find resistance genes in long reads, so results may differ between the two sequencing methods. With long reads, the gene itself is identified with high accuracy, while determining the correct gene variant can be more challenging. This is partly because variants of a gene can differ by as little as one nucleotide and partly because of the higher error rate of long reads.
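
The gene-versus-variant distinction can be illustrated with a small Python sketch. It is not the 1928 algorithm, which the text does not describe. It assumes that reads or a consensus have already been aligned to a resistance database, producing one identity score per variant; the `Hit` type, the gene names and both thresholds are hypothetical. The idea is to report the gene family whenever the best match is similar enough, but to report a specific variant only when it clearly outscores the other variants of the same gene.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Hit:
    variant: str     # database entry, e.g. a hypothetical "geneA-1"
    gene: str        # gene family that the variant belongs to
    identity: float  # fraction of aligned positions that match, 0..1


def call_gene_and_variant(
    hits: List[Hit],
    min_gene_identity: float = 0.80,    # hypothetical threshold
    min_variant_margin: float = 0.005,  # hypothetical threshold
) -> Tuple[Optional[str], Optional[str]]:
    """Return (gene, variant). The variant is None when the best hit does not
    beat every other variant of the same gene by at least min_variant_margin."""
    if not hits:
        return None, None
    best = max(hits, key=lambda h: h.identity)
    if best.identity < min_gene_identity:
        return None, None
    rivals = [h.identity for h in hits
              if h.gene == best.gene and h.variant != best.variant]
    if rivals and best.identity - max(rivals) < min_variant_margin:
        return best.gene, None  # gene is clear, variant is ambiguous
    return best.gene, best.variant


if __name__ == "__main__":
    # Two hypothetical 800 bp variants differing by one nucleotide: the
    # identity gap is only 1/800 = 0.00125, well inside the noise of reads
    # with a ~10% error rate.
    hits = [Hit("geneA-1", "geneA", 0.90000), Hit("geneA-2", "geneA", 0.89875)]
    print(call_gene_and_variant(hits))
```

With the example identities, `call_gene_and_variant` returns `('geneA', None)`: the gene is reported, but the variant is left open because a single-nucleotide difference cannot be separated from sequencing noise.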

What species do we support?

The 1928 platform has support for the following species:

The species in italics (Mycobacterium tuberculosis and Legionella pneumophila) currently cannot be analyzed in the 1928 long-read pipeline.

What generates high quality results?

Long-read sequencing has a significantly higher error rate than short-read sequencing, and this must be accounted for in the bioinformatic analysis. At 1928, we believe that analysis results should be both fast and reliable. To support this, we have implemented several quality control steps in the service.

The following pre-processing steps are implemented in the 1928 platform:

  • Downsampling: If a sample is very large (several gigabytes of data), it is downsampled to a target size. This significantly reduces run time.
  • Adapter removal: Sequencing adapters in Nanopore samples are detected and removed.
  • Read filtering: Reads that are too short or have very low quality are removed.
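
The three pre-processing steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the 1928 implementation: the adapter sequence is a placeholder (real adapters depend on the sequencing kit), adapter trimming uses exact matching at the read start only, downsampling keeps randomly chosen reads, and the cutoffs `min_length=1000` and `min_mean_q=7.0` are hypothetical values. The steps run in the order listed above.

```python
import math
import random
from dataclasses import dataclass
from typing import List

# Placeholder adapter; real adapter sequences depend on the sequencing kit.
ADAPTER = "ACGTTTGCAAGT"


@dataclass
class Read:
    seq: str
    quals: List[int]  # Phred quality score per base


def mean_phred(quals: List[int]) -> float:
    """Average in error-probability space, then convert back to Phred."""
    mean_err = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_err)


def downsample(reads: List[Read], target_bases: int, seed: int = 0) -> List[Read]:
    """Randomly keep reads until roughly target_bases are retained."""
    if sum(len(r.seq) for r in reads) <= target_bases:
        return list(reads)
    shuffled = list(reads)
    random.Random(seed).shuffle(shuffled)
    kept, total = [], 0
    for r in shuffled:
        if total >= target_bases:
            break
        kept.append(r)
        total += len(r.seq)
    return kept


def trim_adapter(read: Read, adapter: str = ADAPTER) -> Read:
    """Remove an exact adapter match at the read start (real tools use
    approximate matching and also check the read end)."""
    if read.seq.startswith(adapter):
        n = len(adapter)
        return Read(read.seq[n:], read.quals[n:])
    return read


def passes_filter(read: Read, min_length: int = 1000, min_mean_q: float = 7.0) -> bool:
    """Length is checked first, so empty reads never reach mean_phred."""
    return len(read.seq) >= min_length and mean_phred(read.quals) >= min_mean_q


def preprocess(reads: List[Read], target_bases: int) -> List[Read]:
    reads = downsample(reads, target_bases)
    reads = [trim_adapter(r) for r in reads]
    return [r for r in reads if passes_filter(r)]
```

Mean read quality is averaged in error-probability space rather than by averaging Phred scores directly, because the Phred scale is logarithmic: a read with half its bases at Q10 and half at Q30 has a mean error rate of about 5% (roughly Q13), not Q20.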

As a quality metric, a minimum sequencing depth is required for the analysis to proceed. The threshold is currently set to 40x for Nanopore and 60x for PacBio, meaning that after the pre-processing steps the genome must, on average, be covered at least 40 times (Nanopore) or 60 times (PacBio). If the depth is below the threshold, the analysis stops.
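
The depth check amounts to a simple calculation: mean depth is the total number of bases remaining after pre-processing divided by the genome size. The Python sketch below uses the thresholds stated above. The genome size has to be supplied and is an assumption here (the example uses a hypothetical 5 Mbp genome), and the function names are illustrative.

```python
# Minimum mean sequencing depth per platform, as stated in the text.
MIN_DEPTH = {"nanopore": 40, "pacbio": 60}


def mean_depth(total_bases: int, genome_size: int) -> float:
    """Average number of times each genome position is covered."""
    return total_bases / genome_size


def depth_check(total_bases: int, genome_size: int, platform: str):
    """Return (depth, passes) for the given platform."""
    depth = mean_depth(total_bases, genome_size)
    return depth, depth >= MIN_DEPTH[platform.lower()]


def required_bases(genome_size: int, platform: str) -> int:
    """Minimum total bases after pre-processing needed to pass the check."""
    return MIN_DEPTH[platform.lower()] * genome_size


if __name__ == "__main__":
    genome = 5_000_000   # hypothetical 5 Mbp bacterial genome
    reads = 250_000_000  # 250 Mbp of reads after pre-processing
    for platform in ("Nanopore", "PacBio"):
        depth, ok = depth_check(reads, genome, platform)
        print(f"{platform}: {depth:.0f}x, passes={ok}, "
              f"needs {required_bases(genome, platform):,} bases")
```

In this example, 250 Mbp of reads give a mean depth of 50x, which passes the Nanopore threshold (at least 200 Mbp needed) but not the PacBio threshold (at least 300 Mbp needed).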

For support and further insights on how to best use long-read sequencing together with the 1928 platform, connect with us directly at support@1928d.com.