Workshop – 1
Hands-on Workshop on Next-Generation Sequencing (NGS) Data Analysis and Bioinformatics Skills Development

Introduction
Next-Generation Sequencing (NGS) technologies have rapidly evolved and are now central to a wide array of biological and medical research. These technologies enable high-throughput sequencing of DNA and RNA, generating enormous volumes of data that hold valuable insights into genetic structure, variation, and function. However, the true power of NGS lies not just in data generation but in its correct interpretation, an area that requires specialized skills in bioinformatics.
Experimental researchers increasingly work with complex datasets but often lack the informatics training needed to process, analyze, and interpret sequencing results. This workshop aims to bridge this skills gap by providing comprehensive training in the principles and practical approaches to NGS data analysis. Participants will learn to manage, process, and analyze sequencing data in a structured and reproducible manner using command-line environments and widely accepted methodologies.
Objectives
- Introduce the foundational principles of next-generation sequencing and its application in biological research.
- Develop hands-on skills in navigating and utilizing command-line interfaces for data handling and analysis.
- Build competency in quality assessment, read alignment, variant detection, RNA sequencing analysis, and genome assembly.
- Guide participants through complete workflows in genomic data analysis, from raw sequencing reads to interpreted results.
Target Audience
The workshop is designed for graduate students, early-career researchers, laboratory professionals, and educators from life sciences, particularly those working in genomics, biotechnology, microbiology, molecular biology, and related fields. Prior knowledge of programming is not required, although basic familiarity with biology and data handling is recommended.
Workshop Structure & Thematic Modules
1. Introduction to NGS and Bioinformatics Concepts
Participants will begin with an overview of current sequencing technologies and understand how data is generated and stored. Discussions will include the types of data files encountered in NGS and the significance of each in downstream analysis. Key algorithmic concepts such as sequence alignment, data mapping, and variation detection will be explained to establish a strong theoretical foundation.
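As one illustration of the file types mentioned above, FASTQ is a widely used plain-text format for raw reads. In its simplest layout each read takes four lines: an identifier starting with `@`, the base sequence, a separator starting with `+`, and a quality string of the same length as the sequence. The Python sketch below parses that layout. The names `FastqRecord` and `parse_fastq` and the example read are hypothetical, and the sketch is not taken from the workshop material.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    quality: str


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text, assuming the simple four-line layout."""
    # Drop blank lines so a trailing newline does not break the grouping.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ text must contain four lines per read")
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed record at non-blank line {i + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch in {header}")
        yield FastqRecord(header[1:], seq, qual)


# Made-up example read, for illustration only.
example = "@read1\nACGTACGT\n+\nIIIIHHHH\n"
for record in parse_fastq(example):
    print(record.name, len(record.sequence))
```

Checking that the sequence and quality strings have equal length is a cheap way to catch truncated files before any downstream analysis.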
2. Working in a UNIX/Linux Environment
This module introduces participants to a command-line interface, the primary environment for most bioinformatics analysis. They will learn to navigate directories, manage files, and execute essential commands for data manipulation. Emphasis will be placed on creating, organizing, and editing files; searching and retrieving data; and modifying file permissions for secure and efficient data handling.
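The operations described above are normally carried out with shell commands such as `mkdir`, `ls`, `grep`, and `chmod`. For readers who want to see the same ideas in a script, here is a rough Python equivalent using only the standard library. It is a substitute illustration, not the command-line workflow taught in the module. The directory and file names are hypothetical, and the permission step assumes a UNIX-like system.

```python
import stat
import tempfile
from pathlib import Path

# Work inside a throwaway directory so nothing on the real system is touched.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "ngs_project"      # hypothetical project name
    raw_dir = project / "raw_data"
    raw_dir.mkdir(parents=True)              # like: mkdir -p ngs_project/raw_data

    notes = raw_dir / "samples.txt"          # hypothetical sample sheet
    notes.write_text("sample_A control\nsample_B treated\nsample_C treated\n")

    # like: ls raw_data
    listing = sorted(p.name for p in raw_dir.iterdir())

    # like: grep treated samples.txt
    treated = [ln for ln in notes.read_text().splitlines() if "treated" in ln]

    # like: chmod 440 samples.txt (read-only for owner and group)
    notes.chmod(stat.S_IRUSR | stat.S_IRGRP)
    mode = stat.S_IMODE(notes.stat().st_mode)

    print(listing)
    print(treated)
    print(oct(mode))
```

Making raw data files read-only, as in the last step, is a common precaution against overwriting original sequencing output by accident.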
3. Quality Control and Data Pre-processing
High-throughput data often contains sequencing errors or low-quality regions. This session focuses on inspecting raw sequencing data for quality metrics and applying filtering or trimming techniques to improve dataset reliability. Participants will interpret quality scores, identify sequencing artifacts, and prepare clean data for further analysis.
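To make the idea of quality scores concrete: a Phred score Q relates to the estimated base-call error probability p by Q = -10 log10(p), so Q20 corresponds to a 1-in-100 chance of error. In FASTQ files these scores are commonly stored as ASCII characters with an offset of 33. The sketch below decodes such a string and trims low-quality bases from the 3' end. The function names and the threshold of 20 are assumptions chosen for illustration, and dedicated trimming tools generally use more elaborate strategies such as sliding windows.

```python
from typing import List, Tuple


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q: int) -> float:
    """Convert a Phred score to the estimated base-call error probability."""
    return 10 ** (-q / 10)


def trim_3prime(sequence: str, quality: str, min_q: int = 20) -> Tuple[str, str]:
    """Remove trailing bases whose quality is below min_q (simple end trimming)."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# Made-up read: 'I' decodes to Q40, '#' decodes to Q2.
seq, qual = trim_3prime("ACGTACGT", "IIIIII##")
print(seq, qual)
print(error_probability(20))
```

This end-trimming rule only looks at the tail of the read; a low-quality base in the middle of an otherwise good read is left in place.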
4. Sequence Alignment and Variant Analysis
A crucial part of NGS analysis involves aligning sequencing reads to a reference genome or transcriptome. In this module, participants will learn the logic behind sequence alignment, including how mismatches, gaps, and repeats are handled computationally. They will perform alignment of reads and then identify genetic variations such as single nucleotide polymorphisms (SNPs), insertions/deletions, and larger structural changes.
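The way mismatches and gaps enter an alignment can be illustrated with the classic textbook dynamic-programming approach to global alignment (Needleman-Wunsch). The Python sketch below computes only the optimal score. The scoring values (match +1, mismatch -1, gap -2) and the function name are illustrative assumptions. Practical read aligners rely on indexed, heuristic methods to handle millions of reads, so this is a conceptual aid rather than the method used in the hands-on session.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal global alignment score of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against an empty string costs one gap per base.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            # Diagonal move: align a[i-1] with b[j-1] (match or mismatch).
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap      # gap in b
            left = score[i][j - 1] + gap    # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


print(global_alignment_score("ACGT", "ACGT"))  # identical sequences
print(global_alignment_score("ACGT", "AGGT"))  # one mismatch
print(global_alignment_score("ACGT", "ACT"))   # one deletion
```

Changing the relative cost of gaps and mismatches changes which alignment is considered best, which is why scoring parameters matter when interpreting reported variants.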
5. RNA Sequencing (RNA-Seq) Analysis
This module introduces gene expression analysis through RNA-Seq workflows. Participants will learn to quantify expression levels, perform normalization, and identify differentially expressed genes across experimental conditions. They will explore data visualization techniques to interpret biological significance from gene expression profiles.
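As a small worked example of normalization, counts per million (CPM) rescales each gene's read count by the total number of reads in the sample, so that samples sequenced to different depths become comparable. The sketch below computes CPM and a log2 fold change between two conditions. The gene names and counts are invented, the function names are hypothetical, and the pseudocount of 1 is an assumption to avoid division by zero. Dedicated statistical packages apply more sophisticated normalization and significance testing than shown here.

```python
import math
from typing import Dict


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("Sample contains no reads")
    return {gene: count / total * 1_000_000 for gene, count in counts.items()}


def log2_fold_change(control: float, treated: float,
                     pseudocount: float = 1.0) -> float:
    """Log2 ratio of treated to control, with a pseudocount for stability."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Invented counts; the treated sample has twice the sequencing depth.
control = {"geneA": 100, "geneB": 300, "geneC": 600}
treated = {"geneA": 400, "geneB": 300, "geneC": 1300}

control_cpm = counts_per_million(control)
treated_cpm = counts_per_million(treated)

for gene in control:
    lfc = log2_fold_change(control_cpm[gene], treated_cpm[gene])
    print(f"{gene}: log2 fold change = {lfc:.2f}")
```

In this toy data, geneB has the same raw count in both samples, yet its normalized expression is lower in the treated sample because that sample was sequenced more deeply.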
6. Genome Assembly and Annotation
Participants will explore how sequencing reads can be used to assemble a genome, especially in cases where a reference genome is not available. They will learn the differences between reference-guided and de novo assembly approaches. Challenges such as repeat resolution, contig merging, and assembly validation will be addressed.
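One commonly reported summary statistic in assembly validation is N50: the contig length L such that contigs of length L or longer together cover at least half of the total assembled bases. The following Python sketch computes it from a list of contig lengths. The function name and the example lengths are hypothetical, and N50 alone says nothing about assembly correctness, so it is usually read alongside other checks.

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of an assembly given its contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("No contigs supplied")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in lengths:
        running += length
        if running * 2 >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


# Made-up contig lengths in base pairs.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

A higher N50 generally indicates a more contiguous assembly, but merging contigs incorrectly also raises it, which is one reason validation needs more than a single number.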
Resource Persons:

Dr. Muhammad Ilyas (Centre for Omic Sciences, Islamia College Peshawar)
Dr. Muhammad Ilyas is a prominent Pakistani genome biologist known for his pioneering work in human genetics, particularly within underserved communities. He earned his PhD in Molecular Biology from the University of the Punjab and completed postdoctoral training at University College London. Dr. Ilyas played a key role in decoding the first Pashtun genome and established the Pakistani Genetic Variants Database (PGVDB), a public resource for DNA variant data. Beyond research, he is an active science communicator, authoring genetics books in Urdu and Pashto, and organizing large-scale public science events like the Science Festivals. He has held leadership roles such as Director of the Centre for Human Genetics at Hazara University and continues to mentor young scientists in genomic research and bioinformatics. His work bridges cutting-edge genomic science with public engagement, aiming to make genetic knowledge accessible and impactful for local communities.

Dr. Waseem Haider (Department of Biosciences, COMSATS University Islamabad)
Dr. Waseem Haider is an Associate Professor of Bioinformatics in the Department of Biosciences at COMSATS University Islamabad (CUI). He earned his PhD in Bioinformatics from the University of Illinois at Urbana-Champaign (USA) in 2014 and has over two decades of teaching and research experience at CUI, which he joined in 2005. His expertise spans next-generation DNA/RNA sequencing, transcriptomics, machine learning applications in genomics, and public health data analysis, applied across areas including cancer genomics, infectious disease epidemiology, and agricultural genetics. Beyond academia, he is the founder and CEO of Next Gen. Solutions, a training and consulting firm that has provided bioinformatics and computational biology instruction to over 1,500 researchers, teaching tools such as Python, R, SAS, and Perl. A prolific researcher, Dr. Haider has contributed to studies on soybean heat tolerance, hepatitis A outbreaks, plant stress response, and disease surveillance, garnering hundreds of citations in the field.

Dr. Valeed Khan (Rahman Medical Institute Peshawar)
Dr. Valeed Khan is a Clinical Scientist with extensive experience in molecular diagnostics, currently serving as a Principal Scientific Officer and team leader. Dr. Khan is dedicated to diagnosing infectious, genetic, and malignant diseases across fields such as virology, bacteriology, and oncology. This expertise extends to computational biology, applied to identifying potential variants in inherited disorders, including neurodevelopmental conditions, thalassemia, and breast cancer. Dr. Khan has a demonstrated history of working in the healthcare industry in adherence to ISO 15189 accreditation standards.