INTRODUCTION TO NEXT-GENERATION SEQUENCING DATA AND ANALYSIS COURSE

This course provides an introduction to the basics of next-generation sequencing (NGS) data and analysis, with hands-on exercises throughout.

Topics covered include:

  1. basics of the Linux command line;
  2. DNA/RNA preparation and sequencing technologies, including reduced-representation sequencing;
  3. what to do with newly delivered sequencing data;
  4. assembling sequences of varying sizes and complexities;
  5. variant discovery;
  6. annotation of small to large datasets;
  7. ancient and variable-coverage DNA;
  8. transcriptomes and gene expression analysis; and
  9. walkthroughs of example analyses.

Throughout, the emphasis is on understanding the fundamentals and on developing skills for designing practical sequencing projects and analysing sequencing data in light of research questions and of biological and practical limitations.

Class time is limited, so the course materials will contain more information than can be covered in lectures and exercises. This additional material is nonetheless essential for understanding NGS data analysis; it is not covered directly in class only so that we have sufficient time for exercises. Online sources of information will be emphasised.

PROGRAMME

Tuesday

9.30-12.30 | Introduction to the Linux command line

  • Entering a command, redirecting output, and pipes
  • Using your command history
  • Downloading a program and building it
  • Why write scripts
  • BioPerl, Biopython, R/Bioconductor
  • How can I learn more?
  • Introduction to SEQanswers, Biostars, Stack Overflow and blogs

14.00-17.00 | Introduction to Next-Generation Sequencing

  • What is a genome, and what is a read, in light of NGS technologies?
  • Library construction
  • Sequencing technologies
  • Reduced-representation sequencing and OTUs
  • Trimming and cleaning new sequencing data
  • ** Hands-on examination and cleaning of NGS data
  • What questions can we answer just with reads?

Wednesday

9.30-12.30 | Introduction to Assembly

  • Small, medium and large assembly tasks
  • Overlap-consensus, de Bruijn and hybrid approaches
  • ** Hands-on assembly
  • Reference-guided assembly
  • Challenges facing all assembly tasks
  • Assembly validation
  • ** Hands-on assembly validation

14.00-17.00 | Introduction to Variant Discovery

  • Types of variants
  • Read mapping and direct examination of mappings
  • ** Hands-on read mapping and examination of BAM files
  • NGS challenges to variant discovery: inference required
  • VCF files
  • Strength of variant calls and variant filtering
  • Online variant catalogs
  • Typical variant pipeline with GATK
  • ** Hands-on variant calling and filtering
  • Typical reduced-representation variant pipeline

Thursday

9.30-12.30 | Introduction to Annotation

  • What questions do you want to answer?
  • Quick similarity searches with MUMmer and BLAST
  • GFF and GTF files
  • Annotation pipelines
  • ** Hands-on examination of assembly with annotation
  • Homology searches for gene function
  • ** Domain-based searches
  • What do these variants affect?
  • ** Examine assembly, annotation and VCF files
  • ** Summaries with bedtools and SnpEff
  • Annotation of transcriptomes and reduced-representation assemblies
  • Annotation of repetitive content and assembly masking
  • Uncertainty in annotation

14.00-17.00 | Introduction to Transcriptomics and Gene Expression

  • What questions do you want to answer?
  • Biological considerations and limitations
  • Transcriptome assembly
  • ** Identifying isoforms
  • Basics of RNA read mapping
  • ** Examine RNA-seq read mapping
  • Basics of statistical analysis and sources of noise
  • Sampling design, replication, and sequencing design
  • ** Example of analysis with and without correct error structure

Friday

9.30-12.30 | Working with Low- and Variable-Coverage DNA

  • In what condition and how accessible is the DNA? RNA?
  • What do we now mean by "coverage"?
  • Amplification and capture
  • Ancient DNA and directly analysing reads
  • The DNA is fine: What is gained and lost with low-coverage sequencing
  • Working with single cells or single samples
  • Pooling: What is gained and what is lost

14.00-17.00 | Putting It All Together: Example Projects and Workflows

  • Working through three or four examples:

              - Population structure, divergence and diversity for a non-model species;
              - Genomic signatures of selection;
              - Gene expression differences between natural treatments;
              - Ancient DNA.

LECTURERS
Douglas Scofield
John Archer
Antonio Muñoz

DATES & LOGISTICS
The course will take place at CIBIO-InBIO, Vairão Campus, Room 2, on June 23-26, 2015, from 9.30 to 12.30 and from 14.00 to 17.00 each day (24 hours in total).
The lectures will be in English.

REQUIREMENTS
The course is aimed at postgraduate researchers with a particular interest in the analysis of NGS, genomic and transcriptomic data (preferably those already working on such projects).
Depending on the number of applications received, PhD students directly involved in NGS and genomic studies may also be accepted.
All participants must bring their own personal laptop.

REGISTRATION
Deadline for registration is May 25, 2015.
Participation is free of charge.
The course accepts a minimum of 15 and a maximum of 20 attendees, selected according to their track record and the relevance of the course to their research and/or work.
Preference will be given to CIBIO-InBIO researchers, but applications from others may also be accepted.

To register, please send an email to newgen.course@cibio.up.pt with a one-paragraph explanation of how the course will benefit your research and/or work (preferably including a link to your web profile or a short CV).

COURSE ORGANIZERS
Catarina Ginja 
Fredrik Oxelfelt
John Archer
Antonio Muñoz