Phage Bioinformatics

Welcome to Phage Directory's Phage Bioinformatics page! We'll add more talks, PDFs, and other resources throughout the year!

If you'd like to request a topic or a speaker, let Jessica know by email:

Join our Slack and come say "hi" in our #phage-bioinformatics channel!

Lecture Notes

IBRC & Phage Directory, February 16, 2021

Genomic Annotation and Comparative Analysis of Actinobacteriophages

Deborah Jacobs-Sera, Research Instructor, SEA-PHAGES, University of Pittsburgh

YouTube video of Phage Bioinfo #1

Highlights of Deborah's talk, by Madhav Madurantakam Royam

  • Dr. Urmi Bajpai, Associate Professor at Acharya Narendra Dev College, University of Delhi, India, introduced the speaker, who oversees the PHIRE (Phage Hunters Integrating Research and Education) and SEA-PHAGES (Science Education Alliance-Phage Hunters Advancing Genomics and Evolutionary Science) programs.
  • Dr. Hatfull’s lab holds a collection of about 18,000 phages, of which around 3,700 have been sequenced. Almost all of them were isolated by students through the PHIRE and SEA-PHAGES programs.
  • The first steps for the students are sample collection, purification, DNA extraction, archiving and sequencing.
  • They use Illumina for sequencing, as it gives good enough coverage, and they have enough experience, to analyse the genomes without additional technology.
  • Dan Russell from their laboratory has written tutorials on how to finish an orientation, which are available on PhagesDB (
  • Most annotation is done manually, because automated tools can miss certain genes: phages don’t always follow the rules built into the programs’ parameters.
  • An important skill for the bioinformatic analysis is pattern recognition.
  • Coding potential is predicted with the Glimmer and GeneMark tools, which use Markov-model-based methods to detect patterns in the four nucleotides.
  • Some of her guiding principles for phage genome annotation are: (i) a protein-coding gene occupies only one frame on one strand; (ii) genes do not overlap by more than a few bp (e.g. 30 bp); (iii) phages have a high gene density, with tightly packed genes; (iv) most protein-coding genes will have coding potential; (v) many phage genes are unique and will not have any homologues in the databases (and many more!).
  • She walked through a genome representation of the Zizzle phage in the DNA Master software and how the data are represented.
  • She explained how to investigate a gene or ORF that overlaps another, and how the data should be read.
  • She introduced the Phamerator software using two phages, Zizzles & LilMoolah, and concluded with some tips on DNA Master files.
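
Two of the guiding principles above (limited overlap between genes, high gene density) are easy to check on a draft annotation. The sketch below is only an illustration and not part of DNA Master or any SEA-PHAGES tool: the `Gene` record, the function names, the gene coordinates and the 30 bp threshold are all assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class Gene(NamedTuple):
    """Hypothetical minimal gene record (1-based, inclusive coordinates)."""
    name: str
    start: int
    stop: int
    strand: str  # "+" or "-"


# Assumed threshold, taken from the "e.g. 30 bp" guideline in the notes.
MAX_OVERLAP_BP = 30


def flag_large_overlaps(genes: List[Gene],
                        max_overlap: int = MAX_OVERLAP_BP) -> List[Tuple[str, str, int]]:
    """Return (left gene, right gene, overlap in bp) for neighbouring genes
    whose overlap exceeds max_overlap. Only adjacent genes are compared."""
    ordered = sorted(genes, key=lambda g: g.start)
    flagged = []
    for left, right in zip(ordered, ordered[1:]):
        overlap = left.stop - right.start + 1
        if overlap > max_overlap:
            flagged.append((left.name, right.name, overlap))
    return flagged


def coding_density(genes: List[Gene], genome_length: int) -> float:
    """Fraction of the genome covered by at least one gene."""
    covered = 0
    current_end = 0  # last position already counted
    for gene in sorted(genes, key=lambda g: g.start):
        start = max(gene.start, current_end + 1)
        if gene.stop >= start:
            covered += gene.stop - start + 1
            current_end = gene.stop
    return covered / genome_length


# Made-up coordinates, purely for illustration.
draft = [
    Gene("gp1", 1, 300, "+"),
    Gene("gp2", 297, 600, "+"),   # 4 bp overlap with gp1: acceptable
    Gene("gp3", 550, 900, "+"),   # 51 bp overlap with gp2: worth a second look
]
print(flag_large_overlaps(draft))
print(round(coding_density(draft, genome_length=1000), 2))
```

A flagged pair is not necessarily wrong; it is simply a place where the start codon choice deserves a manual look.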

Recommended tools

APF & Phage Directory, February 15, 2021

Basics of phage genome annotation & classification: getting started, with Dr. Evelien Adriaenssens

Dr. Evelien Adriaenssens, Group Leader at the Quadram Institute Bioscience

YouTube video of APF 2

Highlights of Evelien's talk, by Madhav Madurantakam Royam

  • Noutin Fernand, a PhD student from Pan African University, Kenya, introduced the speaker.
  • Evelien first gave an overview of the overall process, from reads to assembly: once the phage genome has been sequenced, we annotate it and submit it to the databases. Along the way we identify the genome ends and organisation, gene predictions, special features, regulatory elements and gene functional predictions.
  • A few concepts on assembly, read mapping and annotation were introduced. She further explained phage genome structure, covering circularly permuted genomes and genomes with defined ends, along with their implications.
  • Most phages have coding sequences on both strands, but some have coding sequences on one strand only. Where there is a gap between coding sequences, each frame must be analysed, and the start codon must be examined together with a ribosome binding site such as the Shine-Dalgarno sequence.
  • After delineating the coding sequences, functional annotation is performed using protein function tools such as BLASTp and HHpred, and structure prediction using Phyre2.
  • An essential criterion is knowing the database in use: for example, RefSeq is a curated database, but not all phages are present in it.
  • If you are unsure of the function, the general rule for functional annotation is to label the gene product as "hypothetical protein".
  • She also gave a brief introduction on ICTV (how to name and classify a phage), explained how to assign a new subfamily or family, and how the basic phage classification workflow works.
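
The advice above about gaps between coding sequences (analyse every frame, then inspect the start codon and the ribosome binding site) can be sketched in a few lines. This is a minimal illustration and not how any of the tools mentioned in the talk work: the function name, the start codon set (ATG/GTG/TTG), the simplified Shine-Dalgarno pattern and the 20 nt upstream window are all assumptions chosen for the example.

```python
import re

START_CODONS = {"ATG", "GTG", "TTG"}   # assumed bacterial-style start codons
STOP_CODONS = {"TAA", "TAG", "TGA"}
# Simplified, illustrative Shine-Dalgarno pattern: any 4-mer from AGGAGG.
SD_MOTIF = re.compile(r"AGGA|GGAG|GAGG")
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 30, sd_window: int = 20) -> list:
    """Scan all six frames and return ORFs that run from the first start
    codon after a stop to the next in-frame stop.

    Coordinates are 0-based, half-open, and relative to the scanned strand
    (i.e. to the reverse complement for strand "-").
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            orf_start = None
            for pos in range(frame, n - 2, 3):
                codon = strand_seq[pos:pos + 3]
                if orf_start is None and codon in START_CODONS:
                    orf_start = pos
                elif codon in STOP_CODONS:
                    if orf_start is not None:
                        codons = (pos - orf_start) // 3  # stop codon excluded
                        if codons >= min_codons:
                            upstream = strand_seq[max(0, orf_start - sd_window):orf_start]
                            orfs.append({
                                "strand": strand,
                                "frame": frame,
                                "start": orf_start,
                                "end": pos + 3,  # includes the stop codon
                                "codons": codons,
                                "has_sd": bool(SD_MOTIF.search(upstream)),
                            })
                    orf_start = None
    return orfs


# Toy gap sequence: an SD-like motif, a spacer, then a short ORF.
gap = "AGGAGG" + "TTTTTT" + "ATG" + "GCT" * 5 + "TAA"
for orf in find_orfs(gap, min_codons=3):
    print(orf)
```

Only the first start codon in each open stretch is reported here; in manual annotation the alternative starts further downstream are usually weighed as well.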

All-in-one tools:

Command-line tools for assembly:

Gene prediction tools:

Gene prediction & Annotation tools

Other useful tools

Highlights of Evelien's Q&A, by Rohit Kongari

  • PhageTerm predicts terminal repeats based on read mapping and is accurate in most cases. However, Sanger sequencing can be used to verify the exact location and the start and end points of the repeats.
  • Genome circularity is a reliable measure of completeness for phages with headful packaging or long terminal repeats, but not for phages with cos sites/short overlaps or for genomes that have been Nextera-prepped.
  • Prokka can be a very helpful manual annotation tool while handling metagenomic datasets.
  • The viral clusters produced by the tool vConTACT are related at the subfamily level according to the latest ICTV reorganization of species and genus levels.
  • Prophage predictions from bacterial genomes are not sufficient to submit them as complete phage genomes.
  • Showing evidence that the phage can be induced, and verifying completeness through whole genome sequencing, should help avoid polluting viral databases with bacterial sequences.
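
One simple way to look for the "circularity" mentioned in the Q&A is to test whether the end of an assembled contig repeats its beginning. The sketch below is an assumption-laden illustration, not the method used by PhageTerm (which works from read mapping): the function name and the minimum overlap length are hypothetical, and only exact matches are detected.

```python
def terminal_overlap(contig: str, min_len: int = 20) -> int:
    """Return the length of the longest exact overlap between the start and
    the end of the contig (at least min_len, at most half the contig),
    or 0 if there is none."""
    contig = contig.upper()
    max_len = len(contig) // 2
    for k in range(max_len, min_len - 1, -1):
        if contig[:k] == contig[-k:]:
            return k
    return 0


# Toy contig whose first 10 bases are repeated at the end.
repeat = "ACGTACGTAA"
toy = repeat + "GGGCCCTTTAAACCCGGGTT" + repeat
print(terminal_overlap(toy, min_len=5))
```

As the Q&A points out, an overlap of this kind is only informative for some packaging strategies, so a positive result is a hint about completeness and not proof of it.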