Phage Bioinformatics

Short nuggets, videos, and slides on phage bioinformatics. To contribute, write to jan@phage.directory and ask for direct Notion edit access!

Welcome to Phage Directory's Phage Bioinformatics page! We'll add more talks, PDFs, and other resources throughout the year!

If you'd like to request a topic or a speaker, let Jessica know by email: jessica@phage.directory

💬

Join our Slack and come say "hi" in our #phage-bioinformatics channel!

Lecture Notes

🎓

IBRC & PHAGE DIRECTORY, APRIL 20, 2021

Phage Bioinformatics #4: Dr. Jason Gill on phage genome annotation in CPT Galaxy & WebApollo

Dr. Jason Gill, Assoc Prof at Center for Phage Technology (CPT) at Texas A&M University

Dr. Jason Gill, Assoc Prof at Center for Phage Technology (CPT) at Texas A&M University, gave a talk for the IBRC/Phage Directory phage bioinformatics series on the pipelines for phage genome annotation within the CPT's phage-specific instance of Galaxy, as well as using WebApollo.

Access CPT Galaxy here: https://cpt.tamu.edu/galaxy-pub

🎓

IBRC & PHAGE DIRECTORY, APRIL 2, 2021

Phage Bioinformatics #2: Katelyn McNair on PHANOTATE: Gene finding in phages

Katelyn McNair, Research Scholar at the University of California, Irvine

The second talk in the IBRC/Phage Directory Phage Bioinformatics series was held March 2, 2021, and was given by Katelyn McNair, a Research Scholar at the University of California, Irvine and San Diego State University, USA. Katelyn has developed the software PHANOTATE (https://academic.oup.com/bioinformatics/article/35/22/4537/5480131), the first gene finder specifically designed for phage genome annotation.

🎓

IBRC & PHAGE DIRECTORY, FEBRUARY 16, 2021

Genomic Annotation and Comparative Analysis of Actinobacteriophages

Deborah Jacobs-Sera, Research Instructor, SEA-PHAGES, University of Pittsburgh

Highlights of Deborah's talk, by Madhav Madurantakam Royam

Dr. Urmi Bajpai, an Associate Professor from Acharya Narendra Dev College, University of Delhi, India, introduced the speaker, who oversees the PHIRE (Phage Hunters Integrating Research and Education) and SEA-PHAGES (Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science) programs.

Dr. Hatfull’s lab has a collection of about 18,000 phages, of which around 3,700 have been sequenced. Almost all of the phages were isolated by students through the PHIRE and SEA-PHAGES programs.

The first steps for the students are initial sample collection, purification, DNA extraction, archiving, and sequencing. They use Illumina for sequencing, as they get good enough coverage, and have enough experience, to analyse the genomes without additional sequencing technology.

Dan Russell from their laboratory has written tutorials on how to finish an orientation; they are available on PhagesDB (http://www.phagesdb.org).

Most annotation is done manually, because automated tools can miss certain genes: phages don’t always follow the rules built into the programs' parameters.

An important skill for the bioinformatic analysis is pattern recognition.

Coding potential is predicted with Glimmer and GeneMark, which use Markov-model-based methods (interpolated Markov models in Glimmer, hidden Markov models in GeneMark) to detect patterns in the four nucleotides.

Some of her guiding principles for phage genome annotation are: (i) a protein-coding gene uses only one frame on one strand; (ii) genes generally do not overlap by more than a few bp (e.g. 30 bp); (iii) phages have high gene density, with tightly packed genes; (iv) most protein-coding genes will have coding potential; (v) many phage genes are unique and will not have any homologues in the databases (and many more!).

She talked through a genome representation of Zizzle phage in the DNA Master software and how the data are represented. She also explained how to investigate a gene or ORF that overlaps another and how the data should be read.

She briefed us on the Phamerator software, using the two phages Zizzles & LilMoolah, and concluded with some tips on DNA Master files.
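
The overlap guideline among the principles above can be checked automatically once gene coordinates are exported from an annotation. The following is a minimal sketch, not part of the talk: the `Gene` record and the `flag_large_overlaps` function are hypothetical names, coordinates are assumed to be 1-based and inclusive, and the 30 bp threshold is simply the example figure quoted above.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    start: int
    end: int


def flag_large_overlaps(genes: List[Gene], max_overlap: int = 30) -> List[Tuple[str, str, int]]:
    """Return (gene_a, gene_b, overlap_bp) for neighbouring genes
    that overlap by more than max_overlap base pairs."""
    ordered = sorted(genes, key=lambda g: g.start)
    flagged = []
    # Only genes that are adjacent after sorting by start are compared.
    for a, b in zip(ordered, ordered[1:]):
        overlap = min(a.end, b.end) - b.start + 1
        if overlap > max_overlap:
            flagged.append((a.name, b.name, overlap))
    return flagged


# Toy coordinates, invented for illustration only.
genes = [
    Gene("gp1", 1, 300),
    Gene("gp2", 297, 900),   # overlaps gp1 by 4 bp
    Gene("gp3", 850, 1500),  # overlaps gp2 by 51 bp
]
print(flag_large_overlaps(genes))
```

A flagged pair is a prompt for manual inspection of the start sites, not proof that a call is wrong.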

Recommended tools

Aragorn (http://www.ansikte.se/ARAGORN/)

tRNAscan-SE (http://lowelab.ucsc.edu/tRNAscan-SE/)

NCBI: BLASTn & BLASTp

Phages DB: BLASTn & BLASTp

Phamerator (https://phamerator.org/)

DNA Master

All the details about these tools can be found in the Bioinformatics guide (https://seaphagesbioinformatics.helpdocsonline.com/home).

Download Lecture PDF

SEA-PHAGES Bioinformatics Guide

🎓

APF & PHAGE DIRECTORY, FEBRUARY 15, 2021

Basics of phage genome annotation & classification: getting started, with Dr. Evelien Adriaenssens

Dr. Evelien Adriaenssens, Group Leader at the Quadram Institute Bioscience

Slides available here

Highlights of Evelien's talk, by Madhav Madurantakam Royam

Noutin Fernand, a PhD student from Pan African University, Kenya, introduced the speaker

Evelien first gave an overview of the overall process, from reads through assembly to database submission: once the phage genome has been sequenced and assembled, we annotate it and submit it to the databases. Along the way we identify the genome ends and organisation, predicted genes, special features, regulatory elements, and predicted gene functions.

A few concepts in assembly, read mapping and annotation were introduced. She then explained phage genome structure, contrasting circularly permuted genomes with genomes that have defined ends, along with the implications of each.

Most phages have coding sequences on both strands, although some have coding sequences on one strand only. Where there is a gap between coding sequences, each reading frame must be analysed, and the start codon must be examined along with the ribosome binding site (such as a Shine-Dalgarno sequence).

After delineating the coding sequences, functional annotation is performed using protein function tools such as BLASTp and HHpred, and structure prediction using Phyre2.

It is essential to know the database in use: for example, RefSeq is a curated database, but not all phages are present in it.

If you are unsure of the function, the general rule for functional annotation is to label the product as "hypothetical protein".

She also gave a brief introduction on ICTV (how to name and classify a phage), explained how to assign a new subfamily or family, and how the basic phage classification workflow works.
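
The advice above on analysing each reading frame and examining start codons together with the ribosome binding site can be illustrated with a small six-frame scan. This is a rough sketch under stated assumptions, not the method used by any of the tools listed below: the function names are hypothetical, the start codons (ATG, GTG, TTG), the `GGAGG` motif and the 20 nt upstream window are illustrative choices, and minus-strand coordinates are reported on the reverse-complemented sequence.

```python
from typing import Dict, List, Tuple

STARTS = {"ATG", "GTG", "TTG"}  # assumed set of start codons
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq: str, min_codons: int) -> List[Tuple[int, int, int]]:
    """Return (frame, start, end) for ORFs on one strand, as 0-based
    half-open coordinates. Each ORF runs from the first start codon
    seen in a frame to the next in-frame stop (stop codon included)."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in STARTS:
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((frame, start, i + 3))
                start = None
    return found


def has_sd_motif(seq: str, start: int, window: int = 20, motif: str = "GGAGG") -> bool:
    """Crude check: is the motif present in the window upstream of the start?"""
    return motif in seq[max(0, start - window):start]


def six_frame_orfs(seq: str, min_codons: int = 30) -> List[Dict]:
    seq = seq.upper()
    results = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame, start, end in orfs_in_strand(s, min_codons):
            results.append({"strand": strand, "frame": frame, "start": start,
                            "end": end, "sd_like": has_sd_motif(s, start)})
    return results


# Toy sequence invented for illustration: SD-like motif, then a short ORF.
toy = "TTAGGAGGTTTACC" + "ATG" + "GCA" * 5 + "TAA" + "CC"
results = six_frame_orfs(toy, min_codons=5)
for orf in results:
    print(orf)
```

A scan like this only lists candidates. Choosing between alternative starts still depends on coding potential and the other evidence discussed in the talk.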

Recommended tools

All-in-one tools:

PATRIC (https://patricbrc.org)

Galaxy (https://cpt.tamu.edu/galaxy-pub)

Command-line tools for assembly:

SPAdes (https://cab.spbu.ru/software/spades)

Shovill (https://github.com/tseemann/shovill)

Megahit (https://github.com/voutcn/megahit)

Gene prediction tools:

Prodigal (https://github.com/hyattpd/Prodigal)

Glimmer (https://ccb.jhu.edu/software/glimmer/index.shtml)

GeneMarkS (https://exon.gatech.edu/GeneMark/)

PHANOTATE (https://github.com/deprekate/PHANOTATE)

Gene prediction & annotation tools:

Prokka (https://github.com/tseemann/prokka)

Balrog (https://github.com/salzberg-lab/Balrog)

Other useful tools:

Kropinski suite of collected tools (https://molbiol-tools.ca)

STEP3 (https://step3.erc.monash.edu) for enhanced prediction of phage virion proteins

Easyfig (https://mjsull.github.io/Easyfig)

BRIG (http://brig.sourceforge.net)

CGview server (http://stothard.afns.ualberta.ca/cgview_server)

Highlights of Evelien's Q&A, by Rohit Kongari

PhageTerm predicts terminal repeats based on read mapping and is accurate in most cases. However, Sanger sequencing can be used to verify the exact location and start/stop endpoints of the repeats.

Genome circularity is a reliable measure of completeness for phages with headful packaging or long terminal repeats, but not for phages with cos sites/short overlaps or for genomes that have been Nextera-prepped.

Prokka can be a very helpful manual annotation tool while handling metagenomic datasets.

The viral clusters produced by the tool vConTACT are related at the subfamily level according to the latest ICTV reorganization of species and genus levels.

Prophage predictions from bacterial genomes are not sufficient to submit them as complete phage genomes. Showing evidence that the phage can be induced, and verifying completeness through whole-genome sequencing, should help avoid polluting viral databases with bacterial sequences.
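
The Q&A point on genome circularity can be made concrete with a small check. One common heuristic, which is an assumption here and was not described in the Q&A, is to look for identical sequence at both ends of an assembled contig. The sketch below reports the longest such terminal overlap. The function name `terminal_overlap` and the 10 bp minimum are hypothetical choices, and, as noted above, a positive result is not a reliable completeness signal for cos-site phages or Nextera-prepped libraries.

```python
def terminal_overlap(contig: str, min_len: int = 10) -> int:
    """Return the length of the longest prefix of the contig that is
    repeated exactly as its suffix (at least min_len bp), or 0 if none.
    Simple quadratic scan, intended for illustration only."""
    seq = contig.upper()
    max_len = len(seq) // 2
    for k in range(max_len, min_len - 1, -1):
        if seq[:k] == seq[-k:]:
            return k
    return 0


# Toy example, invented for illustration: a "genome" whose first 12 bp
# are repeated at the end of the contig.
genome = "ATGCCGTAAGCTTAGGCATCG"
contig = genome + genome[:12]
print(terminal_overlap(contig))  # 12
print(terminal_overlap(genome))  # 0
```

Exact matching keeps the sketch short; real assemblies may need mismatch tolerance, and read-mapping tools such as PhageTerm remain the better evidence for the true genome ends.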