Bioinformatics Curriculum — Full Learning Path

Phase 1 — Foundations

📅 Weeks 1–7 📦 5 modules 🗂 4 GitHub repos

✦ Prerequisites: none — this is where everyone starts

Weeks 1–2

Bash & Linux

The command line underpins most bioinformatics work: tools such as STAR, GATK, and Snakemake all run here. Learn to navigate, write scripts, handle files, and manage processes before touching any bioinformatics software.

Navigation File operations Pipes & redirection Variables & loops Bash scripting Permissions Processes

bash-linux-practice ~14 hrs

Week 2

Git & GitHub

Every repo you build in this curriculum gets pushed to GitHub as proof of skill. Learn Git properly once — branching, commit messages, .gitignore — and use it every single day from here on.

init · clone · add · commit Branching & merging README.md .gitignore Push & pull

git-practice ~5 hrs

Week 3

Conda & environment management

Bioinformatics tools frequently conflict with each other over dependency versions. Conda environments isolate each tool with its own dependencies. This is a daily-use skill: every pipeline you build will depend on it.

conda create · activate environment.yml bioconda channel mamba Exporting envs

conda-envs ~5 hrs

Weeks 4–5

R — beyond basics

You know R basics. Now master the tools used in genomics: tidyverse for data wrangling, ggplot2 for publication figures, R Markdown for reproducible reports, and Bioconductor for biology-specific packages.

tidyverse · dplyr · tidyr ggplot2 R Markdown purrr Bioconductor setup

r-genomics-notebooks ~10 hrs

Weeks 6–7

Python — bioinformatics focus

Extend your Python basics into bioinformatics-specific tools. pandas and NumPy handle large genomics tables. Biopython reads FASTA and FASTQ files. argparse lets you build command-line tools like the pros.

pandas · NumPy matplotlib · seaborn Biopython argparse CLI scripts

python-bioinfo-scripts ~12 hrs
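
As a rough idea of what a first script in this repo might look like, here is a minimal sketch using only the standard library. The option names (`fasta`, `--min-length`) and the function names are hypothetical, and Biopython's parsers would normally replace the hand-written `parse_fasta` once you have covered them.

```python
import argparse


def parse_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Return the fraction of G and C bases in a sequence (0.0 if empty)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def build_parser():
    """Command-line interface; the argument names are illustrative only."""
    parser = argparse.ArgumentParser(description="Report GC content per FASTA record.")
    parser.add_argument("fasta", help="path to a FASTA file")
    parser.add_argument("--min-length", type=int, default=0,
                        help="skip records shorter than this many bases")
    return parser


def main(argv=None):
    """Print header, length, and GC fraction for each record that is long enough."""
    args = build_parser().parse_args(argv)
    with open(args.fasta) as handle:
        for header, seq in parse_fasta(handle):
            if len(seq) >= args.min_length:
                print(f"{header}\t{len(seq)}\t{gc_content(seq):.3f}")
```

Calling `main(["reads.fa", "--min-length", "100"])` from a test, or wiring `main()` to the script entry point, keeps the parsing logic separate from the command-line handling.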

▼ Complete Phase 1 before continuing

Phase 2 — Core bioinformatics

📅 Weeks 8–18 📦 8 modules 🗂 7 GitHub repos

✦ Prerequisites: Phase 1 complete — especially Bash, Git, and Conda

Week 8

QC & preprocessing

Every sequencing pipeline starts here — before you align a single read you need to check quality and remove adapter contamination. A bad QC step ruins every downstream result.

FastQC MultiQC Trimmomatic fastp FASTQ format Quality scores

rnaseq-qc-pipeline ~7 hrs
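
To make the FASTQ format and quality scores concrete, the sketch below decodes Phred+33 quality strings by hand. It assumes well-formed four-line records, and the function names and the mean-quality cutoff of 20 are illustrative choices, not what FastQC or fastp do internally.

```python
def read_fastq(lines):
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    stripped = [line.strip() for line in lines if line.strip()]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header[1:], seq, qual  # drop the leading '@' from the header


def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality, offset=33):
    """Arithmetic mean of the Phred scores in one read."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def error_probability(q):
    """Phred definition: Q = -10 * log10(P), so P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def passes_qc(quality, min_mean=20):
    """Hypothetical filter: keep reads whose mean Phred score reaches min_mean."""
    return mean_quality(quality) >= min_mean
```

A Phred score of 20 corresponds to a 1 in 100 chance that the base call is wrong, and 30 to 1 in 1000, which is why those two values appear so often as trimming thresholds.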

Weeks 9–10

Sequence alignment

Map sequencing reads to a reference genome. Understand SAM/BAM formats, alignment flags, and how to extract useful statistics. The read counts and variant calls in later modules are built on these alignments.

STAR HISAT2 SAMtools SAM/BAM format featureCounts IGV

rnaseq-alignment-workflow ~14 hrs
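
Alignment flags are easier to remember once you have decoded a few by hand. The sketch below unpacks the bitwise FLAG field of a SAM record. The bit values follow the SAM specification, while the short label names and helper functions are my own.

```python
# Bit values from the SAM specification; the label strings are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of every bit set in a SAM FLAG integer."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_primary_mapped(flag):
    """True for mapped reads that are neither secondary nor supplementary."""
    return not flag & (0x4 | 0x100 | 0x800)
```

For example, `decode_flag(99)` describes the forward read of a properly paired fragment whose mate maps to the reverse strand. On real BAM files, `samtools flags` and `samtools view -f/-F` do the same decoding and filtering.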

Weeks 11–12

DESeq2 — differential expression

The most common analysis in RNA-seq biology. Go deep — understand the negative binomial model, produce publication-quality volcano plots and heatmaps, and build a fully reproducible R Markdown report.

DESeq2 lfcShrink Volcano & MA plots ComplexHeatmap PCA clusterProfiler GO / KEGG

deseq2-differential-expression ~14 hrs
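
One way to build intuition for the negative binomial model is to compare its variance with the Poisson variance at the same mean. The sketch below uses the mean-dispersion form, variance = mean + dispersion × mean², which is the parameterisation DESeq2 uses. The function names and the example numbers are illustrative only.

```python
def poisson_variance(mean):
    """Under a Poisson model the variance equals the mean."""
    return mean


def nb_variance(mean, dispersion):
    """Negative binomial variance in the mean-dispersion parameterisation."""
    return mean + dispersion * mean ** 2


def overdispersion_ratio(mean, dispersion):
    """How many times larger the NB variance is than the Poisson variance."""
    return nb_variance(mean, dispersion) / poisson_variance(mean)
```

With a dispersion of 0.1, a gene with a mean count of 10 has twice the Poisson variance, while a gene with a mean count of 1000 has about 101 times the Poisson variance. This is why count noise dominates for lowly expressed genes and biological variability dominates for highly expressed ones.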

Weeks 13–14

Variant calling

Call SNPs and indels from sequencing data — the foundation of GWAS, population genomics, and precision medicine. Follow GATK best practices, understand VCF format, and filter variants correctly.

GATK HaplotypeCaller VCF format BCFtools BQSR SnpEff

variant-calling-gatk ~14 hrs
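
To get comfortable with VCF format before relying on BCFtools, it can help to parse a few records by hand. The sketch below reads the eight fixed VCF columns and applies a simple filter. The `min_qual` cutoff of 30 and the SNP/indel rule are simplifications for illustration, not GATK's recommended filters, and symbolic or multi-nucleotide alleles are not handled.

```python
def parse_info(info):
    """Turn 'DP=30;AF=0.5;DB' into {'DP': '30', 'AF': '0.5', 'DB': True}."""
    if info == ".":
        return {}
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True  # flags have no '=value' part
    return parsed


def parse_vcf(lines):
    """Yield one dict per data line, skipping '#' header lines."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, vid, ref, alt, qual, flt, info = line.rstrip("\n").split("\t")[:8]
        yield {
            "chrom": chrom,
            "pos": int(pos),
            "id": vid,
            "ref": ref,
            "alt": alt.split(","),
            "qual": None if qual == "." else float(qual),
            "filter": flt,
            "info": parse_info(info),
        }


def variant_type(record):
    """Simplified rule: single-base REF and ALT alleles count as a SNP."""
    if len(record["ref"]) == 1 and all(len(a) == 1 for a in record["alt"]):
        return "SNP"
    return "indel"


def passes(record, min_qual=30.0):
    """Keep records that are unfiltered or PASS and meet a QUAL cutoff."""
    return (record["filter"] in ("PASS", ".")
            and record["qual"] is not None
            and record["qual"] >= min_qual)
```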

Week 15

BLAST & Minimap2

Sequence similarity search and long-read alignment. BLAST finds homologous sequences across species. Minimap2 aligns long Oxford Nanopore and PacBio reads. Both are routine tools in plant genomics.

blastn · blastp · blastx E-value interpretation Minimap2 PAF format Output parsing in Python

sequence-search-tools ~7 hrs
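
For the output-parsing part of this module, a minimal sketch is shown below. It assumes BLAST was run with `-outfmt 6` and the default twelve columns. The E-value cutoff of 1e-5 and the "highest bitscore wins" rule are common conventions used here as assumptions, not fixed rules.

```python
import csv

# Default column order of BLAST tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_COLUMNS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_COLUMNS = ("pident", "evalue", "bitscore")


def parse_blast6(lines):
    """Yield one dict per hit with numeric columns converted."""
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST6_COLUMNS, row))
        for key in INT_COLUMNS:
            hit[key] = int(hit[key])
        for key in FLOAT_COLUMNS:
            hit[key] = float(hit[key])
        yield hit


def best_hits(hits, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query among hits under the E-value cutoff."""
    best = {}
    for hit in hits:
        if hit["evalue"] > max_evalue:
            continue
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best
```

The E-value is the number of hits of that score expected by chance in a database of that size, so a smaller value means a more significant match. The cutoff should be tightened or relaxed depending on database size and how divergent the species are.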

Week 16

Biological databases

Know where the data lives and how to fetch it programmatically. Download genomes, annotations, and raw reads from NCBI, Ensembl, and TAIR without clicking through web interfaces — write scripts instead.

NCBI · SRA Ensembl BioMart TAIR Entrez Direct Biopython Entrez

database-queries ~7 hrs

Week 17

SQL & database querying

SQL is listed in 55% of bioinformatics job postings. Write SELECT queries, JOINs, and aggregations to query genomics databases. Use SQLite locally and connect SQL to pandas. This is a short module with high career impact.

SELECT · WHERE · JOIN GROUP BY · ORDER BY SQLite pandas SQL bridge Genomics table queries

sql-bioinfo ~7 hrs
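
The sketch below shows the kind of query this module builds towards: a JOIN, a WHERE filter, and a GROUP BY aggregation on an in-memory SQLite database, using Python's built-in `sqlite3` module. The table names, column names, and values are invented toy data.

```python
import sqlite3

# In-memory database with two hypothetical tables: gene annotation and expression.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genes (gene_id TEXT PRIMARY KEY, symbol TEXT, chrom TEXT);
CREATE TABLE expression (gene_id TEXT, sample TEXT, tpm REAL);
INSERT INTO genes VALUES
    ('G1', 'geneA', 'chr1'),
    ('G2', 'geneB', 'chr1'),
    ('G3', 'geneC', 'chr2');
INSERT INTO expression VALUES
    ('G1', 's1', 10.0), ('G1', 's2', 14.0),
    ('G2', 's1', 0.5),
    ('G3', 's1', 100.0), ('G3', 's2', 80.0);
""")

# Mean TPM per gene, ignoring measurements at or below a threshold.
QUERY = """
SELECT g.chrom, g.symbol, AVG(e.tpm) AS mean_tpm
FROM genes AS g
JOIN expression AS e ON e.gene_id = g.gene_id
WHERE e.tpm > ?
GROUP BY g.chrom, g.symbol
ORDER BY mean_tpm DESC
"""

rows = conn.execute(QUERY, (1.0,)).fetchall()
for chrom, symbol, mean_tpm in rows:
    print(chrom, symbol, mean_tpm)
```

The `?` placeholder keeps the threshold out of the SQL string. With pandas installed, the same query and connection can be passed to `pandas.read_sql_query` to get a DataFrame back instead of a list of tuples.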

Week 18

Snakemake — workflow automation

Turn your individual scripts into one reproducible, automated pipeline. Snakemake is increasingly required in job postings — not just nice-to-have. Build the full RNA-seq pipeline end-to-end in a Snakefile.

Rules & wildcards config.yaml Conda envs per rule DAG diagram HPC cluster profiles

rnaseq-snakemake-pipeline ~7 hrs

▼ Complete Phase 2 before continuing

Phase 3 — Advanced & specialised

📅 Weeks 19–30 📦 10 modules 🗂 8 GitHub repos

✦ Prerequisites: Phases 1 & 2 — especially DESeq2, Snakemake, and variant calling

Weeks 19–20

GWAS & genomic prediction

Your MSc thesis skill — and your strongest differentiator. Implement genomic prediction models with rrBLUP and BGLR, calculate BPV, run cross-validation, and produce Manhattan plots. Push your actual thesis code here.

rrBLUP · BGLR GRM BPV & weighting Cross-validation Manhattan & QQ plots

genomic-selection-bpv ~14 hrs
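
Before plotting, a Manhattan or QQ plot comes down to a few transformations of the p-values. The sketch below computes them under stated assumptions: a Bonferroni cutoff (one common, conservative choice of genome-wide threshold), all p-values greater than zero, and a uniform null for the QQ plot. Function names and marker values are illustrative.

```python
import math


def bonferroni_threshold(n_tests, alpha=0.05):
    """P-value cutoff after Bonferroni correction for n_tests markers."""
    return alpha / n_tests


def manhattan_points(results, alpha=0.05):
    """Map (chrom, pos, p) to (chrom, pos, -log10(p), is_significant)."""
    cutoff = bonferroni_threshold(len(results), alpha)
    return [(chrom, pos, -math.log10(p), p < cutoff) for chrom, pos, p in results]


def qq_points(pvalues):
    """Pair expected and observed -log10(p), smallest p-value first."""
    n = len(pvalues)
    observed = sorted(pvalues)
    return [
        (-math.log10(rank / (n + 1)), -math.log10(p))
        for rank, p in enumerate(observed, start=1)
    ]
```

On a QQ plot, points that rise above the diagonal across most of the range usually suggest inflation (for example from population structure) rather than true associations, which should appear only in the upper tail.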

Week 21

Machine learning basics

60% of life science companies are increasing ML investment. Learn classification, clustering, dimensionality reduction, and cross-validation with scikit-learn — applied directly to genomics datasets.

scikit-learn Classification Clustering PCA Cross-validation Feature importance

ml-bioinfo ~7 hrs
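
Cross-validation is easier to trust once you have seen how the splits are made. The sketch below builds k-fold train/test index splits in plain Python. In the module itself you would more likely use scikit-learn's `KFold`, and the function name and seed handling here are my own.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed makes the split reproducible
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Every sample lands in exactly one test fold, so each observation is predicted once by a model that never saw it during training. For genomics data with related individuals or repeated samples, splitting by family or by sample rather than by row avoids leaking information between folds.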

Week 22

Docker

Package your entire analysis so it runs identically on any machine — your laptop, a colleague's server, or a cloud cluster. Docker is now a standard requirement in pharma and biotech roles.

Images & containers Dockerfile BioContainers Volumes Docker Compose

docker-bioinfo ~7 hrs

Week 23

HPC & SLURM

Most bioinformatics compute happens on HPC clusters. Write SLURM job scripts, run array jobs across dozens of samples in parallel, and integrate Snakemake with your cluster's scheduler.

sbatch · squeue Array jobs Resource requests Snakemake + SLURM module load

hpc-slurm-scripts ~7 hrs

Week 24

Nextflow & nf-core

Many pharma and biotech companies use Nextflow over Snakemake. nf-core provides 100+ ready-made pipelines including nf-core/rnaseq. Knowing both workflow managers makes you versatile in any team.

Processes & channels DSL2 nf-core pipelines Config profiles nf-core/rnaseq

nextflow-pipelines ~7 hrs

Week 25

Jupyter notebooks

A standard format for shareable, reproducible analysis in Python, and increasingly in R via IRkernel. Recruiters often look for rendered notebooks on your GitHub, so apply notebook best practices from the first one you publish.

JupyterLab Markdown cells nbconvert to HTML Best practices IRkernel for R

jupyter-bioinfo-notebooks ~5 hrs

Week 26

edgeR & limma

Complementary to DESeq2. Many published papers use edgeR or limma-voom. Knowing all three methods and when to use each makes you more versatile and credible in peer review discussions.

DGEList TMM normalisation limma-voom Contrasts DESeq2 vs edgeR comparison

deseq2-differential-expression ~7 hrs

Week 27

ChIP-seq & ATAC-seq

Epigenomics roles are growing fast. Learn peak calling with MACS2, differential binding analysis with DiffBind, and visualisation in IGV. Connects directly to transcriptomics — chromatin state drives gene expression.

Peak calling · MACS2 Differential binding DiffBind IGV visualisation Motif analysis basics

chipseq-atacseq ~7 hrs

Week 28

Advanced visualisation

Strong visualisation skills are immediately visible on GitHub and in papers. Build multi-panel publication figures, annotated heatmaps, Manhattan plots, and interactive HTML reports with Quarto.

ggplot2 · patchwork ComplexHeatmap Manhattan plots · CMplot seaborn · plotly Quarto reports

bioinformatics-visualisation ~7 hrs

Weeks 29–30

Capstone — plant genomics pipeline

Tie everything from Phases 1–3 together. One polished end-to-end pipeline on public plant data — Snakemake, DESeq2, Docker, Jupyter notebook, full README with workflow diagram. Tag a v1.0 release on GitHub.

Snakemake pipeline DESeq2 report Docker container Jupyter notebook v1.0 GitHub release

plant-genomics-pipeline ~21 hrs

▼ Complete Phase 3 before continuing

Phase 4 — Single-cell RNA-seq

📅 Weeks 31–40 📦 7 modules 🗂 4 GitHub repos

✦ Prerequisites: DESeq2 (Module 8), Python (Module 5), R beyond basics (Module 4)

Week 31

scRNA-seq concepts & data formats

Before writing any code, understand what makes single-cell data fundamentally different from bulk — the dropout problem, sparse matrices, AnnData and Seurat object structures, and which public datasets to use for practice.

10x Genomics · droplets Cell Ranger output Sparse matrices AnnData structure Seurat object

scrna-seq-learning ~7 hrs
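
To see why single-cell count matrices are stored in sparse form, the sketch below converts a small dense matrix into coordinate (COO) triplets and measures how many entries are zero. The toy counts and function names are made up. Real tools store these matrices with dedicated sparse classes (for example `scipy.sparse` in AnnData) rather than Python lists.

```python
def to_coo(dense):
    """Return (rows, cols, values) listing only the non-zero entries."""
    rows, cols, values = [], [], []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:
                rows.append(i)
                cols.append(j)
                values.append(value)
    return rows, cols, values


def sparsity(dense):
    """Fraction of entries in the matrix that are zero."""
    total = sum(len(row) for row in dense)
    zeros = sum(1 for row in dense for value in row if value == 0)
    return zeros / total if total else 0.0


# Toy count matrix: 3 cells (rows) by 4 genes (columns).
counts = [
    [0, 0, 3, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
rows, cols, values = to_coo(counts)
print(rows, cols, values, round(sparsity(counts), 3))
```

Note the orientation: AnnData stores cells as rows and genes as columns, while Seurat count matrices are genes by cells, so check which way round a matrix is before moving data between the two.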

Weeks 32–33

Seurat — core workflow

Seurat is the most widely used R package for scRNA-seq. Master the full standard workflow on the PBMC3k dataset: QC filtering, SCTransform normalisation, PCA, UMAP, graph-based clustering with FindClusters (Louvain by default, Leiden as an option), marker detection, and cell type annotation.

QC metrics & filtering SCTransform PCA · UMAP FindClusters FindAllMarkers Cell type annotation

scrna-seurat-pbmc ~14 hrs

Week 34

Seurat — integration & batch correction

Real experiments have multiple samples from different batches. Learn to integrate them with Harmony and Seurat CCA — and understand when to use each. Remove batch effects without removing real biological signal.

Harmony integration Seurat CCA DoubletFinder Multi-sample UMAP Before/after comparison

scrna-seurat-pbmc ~7 hrs

Weeks 35–36

Scanpy — core workflow

Scanpy is the Python equivalent of Seurat and is increasingly preferred for large datasets. Reproduce the exact same PBMC analysis in Python — having both Seurat and Scanpy versions shows bilingual competence that few candidates demonstrate.

AnnData sc.pp.normalize_total PCA · UMAP Leiden clustering rank_genes_groups .h5ad export

scrna-scanpy-pbmc ~14 hrs

Week 37

Trajectory & pseudotime analysis

Model cell differentiation and developmental processes — especially relevant in plant biology. Order cells along a developmental path with Monocle3 in R, then model RNA velocity with scVelo in Python.

Monocle3 Pseudotime ordering scVelo RNA velocity Spliced / unspliced

scrna-trajectory-analysis ~7 hrs

Week 38

Differential expression in scRNA-seq

Your DESeq2 knowledge transfers here. The pseudo-bulk approach aggregates cells per sample per cluster and then runs DESeq2, so that samples rather than individual cells act as replicates. Prefer it for comparisons between conditions, and treat per-cluster Wilcoxon tests as exploratory.

Pseudo-bulk DE muscat package DESeq2 on sc data Wilcoxon vs pseudo-bulk Multi-condition comparison

scrna-seurat-pbmc ~7 hrs
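
The aggregation step of the pseudo-bulk approach can be sketched in a few lines. The example below sums raw counts over all cells that share a sample and a cluster. The input layout (one tuple per cell), the function name, and the toy counts are assumptions for illustration. In practice the muscat package or a grouped sum over an AnnData or Seurat object does this for you.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum gene counts over cells sharing the same (sample, cluster).

    cells: iterable of (sample, cluster, {gene: raw_count}) tuples.
    Returns {(sample, cluster): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for sample, cluster, counts in cells:
        for gene, count in counts.items():
            totals[(sample, cluster)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}


# Toy input: four cells from two samples and two clusters.
cells = [
    ("s1", "T", {"CD3E": 2, "MS4A1": 0}),
    ("s1", "T", {"CD3E": 1}),
    ("s1", "B", {"MS4A1": 4}),
    ("s2", "T", {"CD3E": 5}),
]
bulk = pseudobulk(cells)
print(bulk[("s1", "T")])
```

Each (sample, cluster) column of summed counts then behaves like one bulk RNA-seq library, so the DESeq2 design can compare conditions within a cluster using samples as replicates.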

Weeks 39–40

Plant scRNA-seq capstone

Your flagship GitHub project. Apply everything to a real plant scRNA-seq dataset, such as Arabidopsis root or rice: full Seurat and Scanpy workflows, trajectory analysis, pseudo-bulk DE, and gene annotations (TAIR for Arabidopsis).

Seurat + Scanpy on plant data Trajectory analysis TAIR annotations Biological interpretation v1.0 GitHub release

plant-scrna-seq-analysis ~14 hrs

▼ Final phase — career readiness

Phase 5 — Career readiness

📅 Weeks 41–42 📦 1 module 🗂 1 GitHub repo

✦ Prerequisites: all phases complete — this ties everything together

Weeks 41–42

Scientific writing & communication

68% of hiring managers say the biggest gap in bioinformatics candidates is communication — not technical skill. Learn to write methods sections, craft compelling READMEs, present results to non-bioinformaticians, and write cover letters that get interviews.

Methods section writing README documentation Results narrative Presenting to biologists Cover letters

scientific-writing ~10 hrs

✓

After Week 42

You are job-ready

At this point you will have 19 GitHub repos with real code, a published plant scRNA-seq analysis, a full RNA-seq pipeline, a genomic selection model from your thesis, and the communication skills to explain all of it in an interview.

19 GitHub repos 31 skills proven ~100% job coverage

github.com/shajedurhossain 42 weeks total

Your complete path to becoming a bioinformatician

Follow the phases in order. Don't skip ahead.

Start at Phase 1

One module at a time

Push to GitHub

Use public data

Finish each phase
