Teachers: Kamil S. Jaron, José Cerca, Rishi De-Kayne, Lucía Campos, Siavash Mirarab
K-mers are an extremely powerful tool for addressing many genomics questions, especially in species without high-quality reference genomes and annotations, which is the case for the vast majority of life and for every sequencing project of an as-yet undescribed species. Although k-mer-based approaches are becoming increasingly popular, many genomicists struggle with a lack of good resources for a deeper understanding of what is going on under the hood.
Objectives: The aim of this online, hands-on workshop is to train and inspire genomics researchers to understand and use k-mer-based approaches in their sequencing projects. The workshop consists of several blocks, detailed below, covering different types of k-mer analyses, interleaved with several talks on applications of these approaches.
Prerequisites: Basic knowledge of the command line (bash, UNIX). Previous experience with genomics data is recommended; however, the workshop is open to everyone. A “pre-workshop” on the basics of genomics and the command line will be held.
Detailed program
| | Mo, 13.9. | Tu, 14.9. | We, 15.9. | Th, 16.9. |
| 12:00 - 15:00 CET | Welcome & Participant Intro; Introduction to K-mer spectra analysis | Characterization of genomes using k-mer spectra analysis | Separate sub-genomes of an allopolyploid | |
| 16:00 - 19:00 CET | Bash Refresher | Separating chromosomes by comparison of sequencing libraries | Introduction to k-mers for analyzing skimming data | Advanced use of k-mers for analyzing skimming data |
Bash Refresher
If you want to join the workshop but are just getting started with bioinformatics, or are worried that your bioinformatics skills might be a bit rusty, there's no need to exclaim 'Oh-know'! At the start of the workshop we will run a short bash refresher module covering the basics of the command line and the use of a computer cluster. We will discuss the different sequencing technologies available, what their output looks like, and how to get from your raw data to the k-mer analysis steps covered later on. This includes unpacking, exploring, and manipulating sequence data, doing basic quality control, and preparing summary statistics, which will set you up for the more advanced bioinformatics tools and analyses discussed later in the workshop.
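As a small illustration of the kind of summary statistics mentioned above, the Python sketch below parses FASTQ text and reports the read count, total bases, mean read length, GC fraction, and mean base quality. It assumes the common four-line FASTQ record layout and Phred+33 quality encoding; the function names are hypothetical, chosen only for this example, and real workflows would normally rely on an established quality-control tool.

```python
def parse_fastq(text):
    """Yield (read_id, sequence, quality) tuples from FASTQ text.

    Assumes the common 4-line layout (@id, sequence, '+', quality);
    multi-line FASTQ records are not supported by this sketch.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected 4 lines per FASTQ record")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record number {i // 4 + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        yield header[1:], seq, qual


def summarize(records, phred_offset=33):
    """Basic summary statistics; assumes Phred+33 quality encoding by default."""
    n_reads = total_bases = gc_bases = quality_sum = 0
    for _, seq, qual in records:
        n_reads += 1
        total_bases += len(seq)
        gc_bases += sum(1 for base in seq.upper() if base in "GC")
        quality_sum += sum(ord(ch) - phred_offset for ch in qual)
    if total_bases == 0:
        raise ValueError("no bases to summarize")
    return {
        "reads": n_reads,
        "bases": total_bases,
        "mean_length": total_bases / n_reads,
        "gc_fraction": gc_bases / total_bases,
        "mean_quality": quality_sum / total_bases,
    }


if __name__ == "__main__":
    toy = "@r1\nACGT\n+\nIIII\n@r2\nGGCCAA\n+\n!!!!!!\n"
    print(summarize(parse_fastq(toy)))
    # {'reads': 2, 'bases': 10, 'mean_length': 5.0, 'gc_fraction': 0.6, 'mean_quality': 16.0}
```

Parsing and summarizing are kept separate so the same parser can feed other per-read checks, such as length filtering.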
K-mer spectra analysis
Most genomes being sequenced are Pandora's boxes: completely undescribed genomes. While cytological studies and flow cytometry are the best way to gain general insights into genome structure, they are unfortunately hard to scale and require very different expertise. K-mer spectra analysis is an alternative way to infer basic genomic properties directly from sequencing data. It provides an elegant way to estimate heterozygosity, genome size, and the repetitive fraction prior to genome assembly. In the Introduction to K-mer spectra analysis module we will first explain the logic behind decomposing reads into k-mers and explore the basic properties of k-mer spectra across a variety of genomes. In the follow-up module, Characterization of genomes using k-mer spectra analysis, we will learn how to apply k-mer spectra analysis to more complicated genomes, including polyploids.
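To make the idea of decomposing reads into k-mers concrete, here is a minimal Python sketch, assuming reads are available as plain strings. It counts canonical k-mers (a k-mer and its reverse complement are treated as the same k-mer), builds the k-mer spectrum (how many distinct k-mers occur at each coverage), and derives a naive genome size estimate: trusted k-mer occurrences divided by the coverage of the main peak. The function names and the `min_coverage` error cut-off are illustrative assumptions, and the sketch ignores heterozygosity and repeats, which dedicated tools model explicitly.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k):
    """Count canonical k-mers across reads, skipping windows with non-ACGT bases."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if VALID.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def kmer_spectrum(counts):
    """Map each coverage value to the number of distinct k-mers seen that often."""
    return Counter(counts.values())


def estimate_genome_size(spectrum, min_coverage=2):
    """Naive estimate: trusted k-mer occurrences divided by the main peak coverage.

    K-mers below min_coverage (an assumed cut-off) are treated as sequencing errors.
    """
    trusted = {cov: n for cov, n in spectrum.items() if cov >= min_coverage}
    if not trusted:
        raise ValueError("no k-mers at or above min_coverage")
    peak = max(trusted, key=trusted.get)
    total = sum(cov * n for cov, n in trusted.items())
    return total / peak


if __name__ == "__main__":
    import random

    # Toy data: three copies of a 1 kb random "genome" give each 21-mer coverage 3.
    random.seed(0)
    genome = "".join(random.choice("ACGT") for _ in range(1000))
    spectrum = kmer_spectrum(count_kmers([genome] * 3, 21))
    print(estimate_genome_size(spectrum))  # 980.0, one per 21-mer start position
```

In real data the spectrum has an error peak at low coverage and, for diploids, separate heterozygous and homozygous peaks, which is why picking the "main peak" naively is only a first approximation.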
Separating sub-genomes
Many species show exciting karyotype variation: sex chromosomes differ between the sexes, germline-restricted chromosomes differ between the soma and the germline, and accessory (B) chromosomes differ between lineages or populations. As a consequence, we can separate chromosomes by comparing sequencing libraries, as covered in the Separating chromosomes by comparison of sequencing libraries module. We will show how sequencing libraries can be compared and how k-mers belonging to individual chromosomes or sub-genomes can be identified. Sometimes, however, it is not possible to sequence two such libraries, for example when the chromosomes to be separated are sister chromosomes of a polyploid species that evolved through hybridization of two different species (an allopolyploid) and the two parental species are unknown or even extinct. In the Separate sub-genomes of an allopolyploid block, we will show how to use k-mers related to transposable element fossils to tease apart the two parental sub-genomes.
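The core of a library comparison can be sketched in a few lines of Python: count canonical k-mers in each library and keep those that are well supported in one library but absent from the other. For example, in an XY system, k-mers found in a male library but not in a female one are candidates for Y-linked sequence. This is a simplified illustration, not the exact procedure taught in the module; the function names and the `min_count` threshold (a guard against sequencing errors) are assumptions, and real analyses must also account for coverage differences and k-mers missed by chance at low coverage.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical(kmer):
    """Smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def count_kmers(reads, k):
    """Count canonical k-mers in one sequencing library (a list of read strings)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if VALID.issuperset(kmer):
                counts[canonical(kmer)] += 1
    return counts


def library_specific_kmers(counts_a, counts_b, min_count=2):
    """K-mers seen at least min_count times in library A and never in library B."""
    # Counter returns 0 for missing keys without inserting them.
    return {kmer for kmer, n in counts_a.items()
            if n >= min_count and counts_b[kmer] == 0}


if __name__ == "__main__":
    # Toy libraries: the "male" library carries extra reads absent from the "female" one.
    female = ["ACGTACGGTCAGTTAGC"] * 2
    male = ["ACGTACGGTCAGTTAGC", "GGGGGCCCCC", "GGGGGCCCCC"]
    print(sorted(library_specific_kmers(count_kmers(male, 5), count_kmers(female, 5))))
    # ['CCCCC', 'GCCCC', 'GGCCC']
```

Requiring a minimum count in library A but strict absence in library B is an asymmetric choice: it tolerates errors in A while treating any occurrence in B as evidence that the k-mer is shared.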
Analyzing skimming data
Genome skimming, the practice of sequencing genomes at low coverage (e.g., 1X), is increasingly popular as a way of characterizing biodiversity. The resulting data, which could simply be called a bag of reads, cannot be assembled, and in many applications no reference genome exists to map the reads to. Given these limitations, how are we to analyze genome skims? The traditional approach is to assemble only the organelle genomes, which uses just a small fraction of the data. K-mer-based approaches, however, enable a much wider range of analyses. The quintessential computational question for many downstream analyses is the following: given two bags of reads covering their genomes at low coverage, can we compute the distance between the genomes that generated them? The answer is yes; using k-mers, such distances can be computed with high accuracy. However, there are several related opportunities and challenges to consider. Challenges include dealing with contamination and estimating genomic parameters (e.g., genome length and repeat spectra). Opportunities include phylogenetic placement of samples using skimming data and identification of mixtures. In this module, we present a suite of tools for analyzing genome skims: Skmer for distance calculation, APPLES and MISA for phylogenetic placement using such distances, CONSULT for removing contamination, and RESPECT for estimating genomic parameters.
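The basic idea behind a k-mer-based distance between two bags of reads can be sketched as follows: represent each sample by its set of canonical k-mers, compute the Jaccard index J between the two sets, and convert it into a distance estimate with the Mash formula D = -(1/k) ln(2J / (1 + J)). This Python sketch is a deliberately simplified stand-in and not Skmer's method: Skmer additionally corrects for low coverage and sequencing errors, which matter greatly at around 1X, and practical tools typically work on MinHash sketches rather than full k-mer sets. All function names here are hypothetical.

```python
import math

COMPLEMENT = str.maketrans("ACGT", "TGCA")
VALID = frozenset("ACGT")


def canonical(kmer):
    """Smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_set(reads, k):
    """All canonical k-mers in a bag of reads (windows with non-ACGT bases skipped)."""
    kmers = set()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if VALID.issuperset(kmer):
                kmers.add(canonical(kmer))
    return kmers


def jaccard(a, b):
    """Size of the intersection over size of the union; 1.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def mash_distance(j, k):
    """Mash-style distance from a Jaccard index; 1.0 when no k-mers are shared."""
    if j <= 0:
        return 1.0
    return -math.log(2 * j / (1 + j)) / k


if __name__ == "__main__":
    # Two toy "bags" differing by a single substitution.
    bag_a = ["ACGTTGCATGCAAGT"]
    bag_b = ["ACGTTGCATGCTAGT"]
    j = jaccard(kmer_set(bag_a, 5), kmer_set(bag_b, 5))
    print(j, mash_distance(j, 5))
```

At genome-skim coverage many k-mers of each genome are simply never sequenced, so the raw Jaccard index is biased downward; correcting for this is exactly the problem the skimming tools above are designed to address.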
Teaching format: Due to COVID-19 restrictions, the course will be fully online, with talks on Zoom and with Slack and GitHub used for the practical sessions.
Target group: PhD students, master's students, postdocs, researchers, and museum staff with a relevant background in biology.
Working language: English
Assignment and credits: The course is equivalent to 1 ECTS. ForBio will provide certificates to those who successfully complete the course assignment.
Application deadline: August 15th.
Registration: Registration is closed. There is no course fee. The course is open to ForBio members and associates. Becoming a ForBio associate is free and non-committal, but allows us to report our activities to our funders. You can register as a ForBio member/associate here.
Contact Hugo de Boer or Quentin Mauvisseau for more information.