Tech & Coding · Bioinformatics Track

Your complete path to becoming a bioinformatician

A structured, job-market-aligned curriculum built from the ground up. Follow the phases in order — each one builds directly on the last. Every module ends with a real GitHub repository you push as proof of skill.

31 Modules
5 Phases
42 Weeks at 1 hr/day
19 GitHub repos
100% Free forever
How this works

Follow the phases in order. Don't skip ahead.

01

Start at Phase 1

Even if you know some R or Python, Phase 1 fills critical gaps — Bash, Git, and Conda are used in every single later module.

02

One module at a time

Each module is 1–2 weeks at 1 hour per day. Complete all lessons, do all exercises, then move on. Depth beats speed.

03

Push to GitHub

Every module ends with a real repo push. This is not optional — your GitHub is your portfolio. No repo = no proof.

04

Use public data

All exercises use freely available datasets from GEO, SRA, or 10x Genomics. You never need to pay for data.

05

Finish each phase

Each phase ends with a capstone project that ties all modules together. Complete the capstone before starting the next phase.

1

Phase 1 — Foundations

📅 Weeks 1–7 📦 5 modules 🗂 4 GitHub repos
✦ Prerequisites: none — this is where everyone starts
01
Weeks 1–2
Bash & Linux

The command line is the backbone of all bioinformatics. Every tool — STAR, GATK, Snakemake — runs here. Learn to navigate, write scripts, handle files, and manage processes before touching any bioinformatics software.

Navigation, File operations, Pipes & redirection, Variables & loops, Bash scripting, Permissions, Processes
bash-linux-practice ~14 hrs
02
Week 2
Git & GitHub

Every repo you build in this curriculum gets pushed to GitHub as proof of skill. Learn Git properly once — branching, commit messages, .gitignore — and use it every single day from here on.

init · clone · add · commit, Branching & merging, README.md, .gitignore, Push & pull
git-practice ~5 hrs
03
Week 3
Conda & environment management

Bioinformatics tools constantly conflict with each other. Conda environments isolate each tool's dependencies, so conflicting versions can coexist on one machine. This is a daily-use skill: every pipeline you build will depend on it.

conda create · activate, environment.yml, bioconda channel, mamba, Exporting envs
conda-envs ~5 hrs
04
Weeks 4–5
R — beyond basics

You know R basics. Now master the tools used in genomics: tidyverse for data wrangling, ggplot2 for publication figures, R Markdown for reproducible reports, and Bioconductor for biology-specific packages.

tidyverse · dplyr · tidyr, ggplot2, R Markdown, purrr, Bioconductor setup
r-genomics-notebooks ~10 hrs
05
Weeks 6–7
Python — bioinformatics focus

Extend your Python basics into bioinformatics-specific tools. pandas and NumPy handle large genomics tables. Biopython reads FASTA and FASTQ files. argparse lets you build command-line tools like the pros.

pandas · NumPy, matplotlib · seaborn, Biopython, argparse, CLI scripts
python-bioinfo-scripts ~12 hrs
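
As a taste of what this module builds toward, here is a minimal sketch of a command-line FASTA tool. It is an illustration rather than course material: the function names (`read_fasta`, `gc_fraction`, `build_parser`) and the `--min-length` option are hypothetical, and the hand-written parser uses only the standard library. In practice you would likely swap it for Biopython's `SeqIO.parse`.

```python
import argparse
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_fraction(seq):
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def build_parser():
    """Define the command-line interface (option names are illustrative)."""
    parser = argparse.ArgumentParser(description="Report GC content per FASTA record.")
    parser.add_argument("fasta", help="path to a FASTA file")
    parser.add_argument("--min-length", type=int, default=0,
                        help="skip records shorter than this many bases")
    return parser


def main(argv=None):
    """Print header, length, and GC fraction for each record that passes the filter."""
    args = build_parser().parse_args(argv)
    with open(args.fasta) as handle:
        for header, seq in read_fasta(handle):
            if len(seq) >= args.min_length:
                print(f"{header}\t{len(seq)}\t{gc_fraction(seq):.3f}")


# Small in-memory demo so the sketch runs without any files on disk.
demo = io.StringIO(">seq1 demo\nATGC\nGGCC\n>seq2\nATAT\n")
records = list(read_fasta(demo))
args = build_parser().parse_args(["example.fa", "--min-length", "5"])
```

In a real script you would call `main()` under the usual `if __name__ == "__main__":` guard and run it as, for example, `python gc_content.py reads.fa --min-length 100` (the script name is made up).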
▼ Complete Phase 1 before continuing
2

Phase 2 — Core bioinformatics

📅 Weeks 8–18 📦 8 modules 🗂 7 GitHub repos
✦ Prerequisites: Phase 1 complete — especially Bash, Git, and Conda
06
Week 8
QC & preprocessing

Every sequencing pipeline starts here — before you align a single read you need to check quality and remove adapter contamination. A bad QC step ruins every downstream result.

FastQC, MultiQC, Trimmomatic, fastp, FASTQ format, Quality scores
rnaseq-qc-pipeline ~7 hrs
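
To make "quality scores" concrete, the sketch below decodes the quality line of a single FASTQ record. It assumes the Phred+33 encoding used by modern Illumina data; the record itself and the helper names are invented for illustration.

```python
def phred_scores(quality_line, offset=33):
    """Convert a FASTQ quality string to a list of integer Phred scores."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def mean_quality(quality_line):
    """Arithmetic mean of the Phred scores in one read."""
    scores = phred_scores(quality_line)
    return sum(scores) / len(scores)


# One made-up FASTQ record: header, bases, separator, qualities.
record = ["@read1", "ACGT", "+", "II5#"]
scores = phred_scores(record[3])
```

Here `I` decodes to Q40, `5` to Q20, and `#` to Q2, so the last base of this toy read is almost certainly an error. Spotting that kind of low-quality tail is what trimming tools are for.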
07
Weeks 9–10
Sequence alignment

Map sequencing reads to a reference genome. Understand SAM/BAM formats, alignment flags, and how to extract useful statistics. You cannot do differential expression or variant calling without this step.

STAR, HISAT2, SAMtools, SAM/BAM format, featureCounts, IGV
rnaseq-alignment-workflow ~14 hrs
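
Alignment flags are easier to reason about once you see that the SAM FLAG field is simply a bit mask. The following sketch decodes it using the bit values from the SAM specification; the short label strings and the `decode_flag` helper are this example's own naming.

```python
# Bit values follow the SAM specification; the labels are informal.
SAM_FLAGS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of every bit that is set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAGS.items() if flag & bit]


def is_mapped(flag):
    """A read is mapped when the 0x4 (unmapped) bit is not set."""
    return not flag & 0x4


labels_99 = decode_flag(99)
labels_147 = decode_flag(147)
```

Flags 99 and 147 are the classic pair you will see for a properly paired read and its mate, which is why they make a good first test case.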
08
Weeks 11–12
DESeq2 — differential expression

The most common analysis in RNA-seq biology. Go deep — understand the negative binomial model, produce publication-quality volcano plots and heatmaps, and build a fully reproducible R Markdown report.

DESeq2, lfcShrink, Volcano & MA plots, ComplexHeatmap, PCA, clusterProfiler, GO / KEGG
deseq2-differential-expression ~14 hrs
09
Weeks 13–14
Variant calling

Call SNPs and indels from sequencing data — the foundation of GWAS, population genomics, and precision medicine. Follow GATK best practices, understand VCF format, and filter variants correctly.

GATK HaplotypeCaller, VCF format, BCFtools, BQSR, SnpEff
variant-calling-gatk ~14 hrs
10
Week 15
BLAST & Minimap2

Sequence similarity search and long-read alignment. BLAST finds homologous sequences across species. Minimap2 aligns long Oxford Nanopore and PacBio reads. Both are routine tools in plant genomics.

blastn · blastp · blastx, E-value interpretation, Minimap2, PAF format, Output parsing in Python
sequence-search-tools ~7 hrs
11
Week 16
Biological databases

Know where the data lives and how to fetch it programmatically. Download genomes, annotations, and raw reads from NCBI, Ensembl, and TAIR without clicking through web interfaces — write scripts instead.

NCBI · SRA, Ensembl BioMart, TAIR, Entrez Direct, Biopython Entrez
database-queries ~7 hrs
12
Week 17
SQL & database querying

Listed in 55% of bioinformatics job postings. Write SELECT queries, JOINs, and aggregations to query genomics databases. Use SQLite locally and connect SQL to pandas — a short module with high career impact.

SELECT · WHERE · JOIN, GROUP BY · ORDER BY, SQLite, pandas SQL bridge, Genomics table queries
sql-bioinfo ~7 hrs
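
The kind of query this module covers can be sketched with Python's built-in `sqlite3` module. The `genes` and `variants` tables, their columns, and the rows below are invented for illustration; a real genomics schema will look different.

```python
import sqlite3

# An in-memory SQLite database with two toy tables (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE genes (gene_id TEXT PRIMARY KEY, chromosome TEXT);
CREATE TABLE variants (
    variant_id INTEGER PRIMARY KEY,
    gene_id TEXT REFERENCES genes(gene_id),
    effect TEXT
);
INSERT INTO genes VALUES ('geneA', 'chr1'), ('geneB', 'chr1'), ('geneC', 'chr2');
INSERT INTO variants (gene_id, effect) VALUES
    ('geneA', 'missense'), ('geneA', 'synonymous'),
    ('geneB', 'missense'), ('geneC', 'missense');
""")

# JOIN the tables, filter with WHERE, then aggregate per chromosome.
query = """
SELECT g.chromosome, COUNT(*) AS n_missense
FROM variants AS v
JOIN genes AS g ON g.gene_id = v.gene_id
WHERE v.effect = ?
GROUP BY g.chromosome
ORDER BY g.chromosome
"""
rows = conn.execute(query, ("missense",)).fetchall()
```

With pandas installed, the same query and connection can be passed to `pandas.read_sql_query(query, conn, params=("missense",))` to get a DataFrame instead of a list of tuples.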
13
Week 18
Snakemake — workflow automation

Turn your individual scripts into one reproducible, automated pipeline. Snakemake is increasingly required in job postings — not just nice-to-have. Build the full RNA-seq pipeline end-to-end in a Snakefile.

Rules & wildcards, config.yaml, Conda envs per rule, DAG diagram, HPC cluster profiles
rnaseq-snakemake-pipeline ~7 hrs
▼ Complete Phase 2 before continuing
3

Phase 3 — Advanced & specialised

📅 Weeks 19–30 📦 10 modules 🗂 8 GitHub repos
✦ Prerequisites: Phases 1 & 2 — especially DESeq2, Snakemake, and variant calling
14
Weeks 19–20
GWAS & genomic prediction

Your MSc thesis skill — and your strongest differentiator. Implement genomic prediction models with rrBLUP and BGLR, calculate BPV, run cross-validation, and produce Manhattan plots. Push your actual thesis code here.

rrBLUP · BGLR, GRM, BPV & weighting, Cross-validation, Manhattan & QQ plots
genomic-selection-bpv ~14 hrs
15
Week 21
Machine learning basics

60% of life science companies are increasing ML investment. Learn classification, clustering, dimensionality reduction, and cross-validation with scikit-learn — applied directly to genomics datasets.

scikit-learn, Classification, Clustering, PCA, Cross-validation, Feature importance
ml-bioinfo ~7 hrs
16
Week 22
Docker

Package your entire analysis so it runs identically on any machine — your laptop, a colleague's server, or a cloud cluster. Docker is now a standard requirement in pharma and biotech roles.

Images & containers, Dockerfile, BioContainers, Volumes, Docker Compose
docker-bioinfo ~7 hrs
17
Week 23
HPC & SLURM

Most bioinformatics compute happens on HPC clusters. Write SLURM job scripts, run array jobs across dozens of samples in parallel, and integrate Snakemake with your cluster's scheduler.

sbatch · squeue, Array jobs, Resource requests, Snakemake + SLURM, module load
hpc-slurm-scripts ~7 hrs
18
Week 24
Nextflow & nf-core

Many pharma and biotech companies use Nextflow over Snakemake. nf-core provides 100+ ready-made pipelines including nf-core/rnaseq. Knowing both workflow managers makes you versatile in any team.

Processes & channels, DSL2, nf-core pipelines, Config profiles, nf-core/rnaseq
nextflow-pipelines ~7 hrs
19
Week 25
Jupyter notebooks

The standard format for shareable, reproducible analysis in Python — and increasingly in R via IRkernel. Every recruiter expects to see rendered notebooks on your GitHub. Learn best practices from day one.

JupyterLab, Markdown cells, nbconvert to HTML, Best practices, IRkernel for R
jupyter-bioinfo-notebooks ~5 hrs
20
Week 26
edgeR & limma

Complementary to DESeq2. Many published papers use edgeR or limma-voom. Knowing all three methods and when to use each makes you more versatile and credible in peer review discussions.

DGEList, TMM normalisation, limma-voom, Contrasts, DESeq2 vs edgeR comparison
deseq2-differential-expression ~7 hrs
21
Week 27
ChIP-seq & ATAC-seq

Epigenomics roles are growing fast. Learn peak calling with MACS2, differential binding analysis with DiffBind, and visualisation in IGV. Connects directly to transcriptomics — chromatin state drives gene expression.

Peak calling · MACS2, Differential binding, DiffBind, IGV visualisation, Motif analysis basics
chipseq-atacseq ~7 hrs
22
Week 28
Advanced visualisation

Strong visualisation skills are immediately visible on GitHub and in papers. Build multi-panel publication figures, annotated heatmaps, Manhattan plots, and interactive HTML reports with Quarto.

ggplot2 · patchwork, ComplexHeatmap, Manhattan plots · CMplot, seaborn · plotly, Quarto reports
bioinformatics-visualisation ~7 hrs
23
Weeks 29–30
Capstone — plant genomics pipeline

Tie everything from Phases 1–3 together. One polished end-to-end pipeline on public plant data — Snakemake, DESeq2, Docker, Jupyter notebook, full README with workflow diagram. Tag a v1.0 release on GitHub.

Snakemake pipeline, DESeq2 report, Docker container, Jupyter notebook, v1.0 GitHub release
plant-genomics-pipeline ~21 hrs
▼ Complete Phase 3 before continuing
4

Phase 4 — Single-cell RNA-seq

📅 Weeks 31–40 📦 7 modules 🗂 4 GitHub repos
✦ Prerequisites: DESeq2 (Module 8), Python (Module 5), R beyond basics (Module 4)
24
Week 31
scRNA-seq concepts & data formats

Before writing any code, understand what makes single-cell data fundamentally different from bulk — the dropout problem, sparse matrices, AnnData and Seurat object structures, and which public datasets to use for practice.

10x Genomics · droplets, Cell Ranger output, Sparse matrices, AnnData structure, Seurat object
scrna-seq-learning ~7 hrs
25
Weeks 32–33
Seurat — core workflow

Seurat is the dominant scRNA-seq R package. Master the full standard workflow on the PBMC3k dataset: QC filtering, SCTransform normalisation, PCA, UMAP, graph-based clustering (Louvain by default in FindClusters, with Leiden as an option), marker detection, and cell type annotation.

QC metrics & filtering, SCTransform, PCA · UMAP, FindClusters, FindAllMarkers, Cell type annotation
scrna-seurat-pbmc ~14 hrs
26
Week 34
Seurat — integration & batch correction

Real experiments have multiple samples from different batches. Learn to integrate them with Harmony and Seurat CCA — and understand when to use each. Remove batch effects without removing real biological signal.

Harmony integration, Seurat CCA, DoubletFinder, Multi-sample UMAP, Before/after comparison
scrna-seurat-pbmc ~7 hrs
27
Weeks 35–36
Scanpy — core workflow

Scanpy is the Python equivalent of Seurat and is increasingly preferred for large datasets. Reproduce the exact same PBMC analysis in Python — having both Seurat and Scanpy versions shows bilingual competence that few candidates demonstrate.

AnnData, sc.pp.normalize_total, PCA · UMAP, Leiden clustering, rank_genes_groups, .h5ad export
scrna-scanpy-pbmc ~14 hrs
28
Week 37
Trajectory & pseudotime analysis

Model cell differentiation and developmental processes — especially relevant in plant biology. Order cells along a developmental path with Monocle3 in R, then model RNA velocity with scVelo in Python.

Monocle3, Pseudotime ordering, scVelo, RNA velocity, Spliced / unspliced
scrna-trajectory-analysis ~7 hrs
29
Week 38
Differential expression in scRNA-seq

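Your DESeq2 knowledge transfers here. The pseudo-bulk approach (aggregate cells per sample per cluster, then run DESeq2) is the generally recommended method for comparing conditions, because it treats samples rather than individual cells as the replicates. Per-cluster Wilcoxon testing is best treated as exploratory only.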

Pseudo-bulk DE, muscat package, DESeq2 on sc data, Wilcoxon vs pseudo-bulk, Multi-condition comparison
scrna-seurat-pbmc ~7 hrs
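
The aggregation step behind pseudo-bulk can be sketched in a few lines of plain Python. This is a toy illustration only: the cell records, sample and cluster labels, and the `pseudobulk` helper are invented, and real analyses would do this on a sparse matrix inside Seurat, Scanpy, or muscat.

```python
from collections import defaultdict

# Toy per-cell raw counts; every value here is made up for illustration.
cells = [
    {"sample": "s1", "cluster": "T", "counts": {"geneA": 3, "geneB": 0}},
    {"sample": "s1", "cluster": "T", "counts": {"geneA": 1, "geneB": 2}},
    {"sample": "s1", "cluster": "B", "counts": {"geneA": 0, "geneB": 5}},
    {"sample": "s2", "cluster": "T", "counts": {"geneA": 4, "geneB": 1}},
]


def pseudobulk(cells):
    """Sum raw counts over all cells sharing the same (sample, cluster) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cluster"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}


profiles = pseudobulk(cells)
```

Each `(sample, cluster)` profile then acts like one bulk RNA-seq library, so within a cluster the per-sample profiles become the columns of the count matrix handed to DESeq2.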
30
Weeks 39–40
Plant scRNA-seq capstone

Your flagship GitHub project. Apply everything to a real plant scRNA-seq dataset — Arabidopsis root or rice. Full Seurat and Scanpy workflows, trajectory analysis, pseudo-bulk DE, and TAIR gene annotations. This is the crown jewel of your portfolio.

Seurat + Scanpy on plant data, Trajectory analysis, TAIR annotations, Biological interpretation, v1.0 GitHub release
plant-scrna-seq-analysis ~14 hrs
▼ Final phase — career readiness
5

Phase 5 — Career readiness

📅 Weeks 41–42 📦 1 module 🗂 1 GitHub repo
✦ Prerequisites: all phases complete — this ties everything together
31
Weeks 41–42
Scientific writing & communication

68% of hiring managers say the biggest gap in bioinformatics candidates is communication — not technical skill. Learn to write methods sections, craft compelling READMEs, present results to non-bioinformaticians, and write cover letters that get interviews.

Methods section writing, README documentation, Results narrative, Presenting to biologists, Cover letters
scientific-writing ~10 hrs
After Week 42
You are job-ready

At this point you will have 19 GitHub repos with real code, a published plant scRNA-seq analysis, a full RNA-seq pipeline, a genomic selection model from your thesis, and the communication skills to explain all of it in an interview.

19 GitHub repos, 31 skills proven, ~100% job coverage
github.com/shajedurhossain 42 weeks total

Ready to start?

Phase 1 is live. Boot into Ubuntu, open a terminal, and begin Lesson 1 right now. No sign-up required.
