Lesson 3: Pipes & Redirection — Bash & Linux

01 The three streams — the foundation of everything

Before you can understand pipes and redirection, you need to understand one key concept: every Linux command has three numbered channels for receiving and sending data. These are called streams.

🔌 stdin, stdout, stderr — the three streams

Think of a command like a machine with three connections: one input pipe and two output pipes. Data flows in through the input, and results come out through two outputs — one for normal results and one for error messages.

Number	Name	What it carries	Default destination
0	stdin (standard input)	Data going into the command	Your keyboard
1	stdout (standard output)	Normal results coming out of the command	Your screen
2	stderr (standard error)	Error messages coming out of the command	Your screen (separate from stdout)

By default, all output (results and errors alike) appears on your screen together. But the two are separate numbered streams and can be controlled independently. The operators >, >>, and 2> redirect these streams to files, and | connects the stdout of one command to the stdin of the next.
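To see that the two output streams really are separate, here is a minimal sketch. The function name demo_streams and the file names are made up for illustration, and >&2 simply means "send this line to stream 2". The redirection operators themselves are explained in the sections that follow.

```shell
# Hypothetical helper: writes one line to stdout and one to stderr
demo_streams() {
    echo "this is a result"          # goes to stream 1 (stdout)
    echo "this is a warning" >&2     # goes to stream 2 (stderr)
}

workdir=$(mktemp -d)                 # scratch folder so nothing real is overwritten

# Capture each stream in its own file
demo_streams > "$workdir/out.txt" 2> "$workdir/err.txt"

cat "$workdir/out.txt"               # prints: this is a result
cat "$workdir/err.txt"               # prints: this is a warning
```

Nothing reaches the screen from demo_streams itself, because each stream was sent to its own file.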

02 Redirecting stdout — > and >>

📤 How > and >> redirect stream 1 (stdout)

When you type ls -lh, the results go to your screen because stdout (stream 1) defaults to the screen. Adding > after a command intercepts stream 1 and sends it to a file instead.

> (single arrow) — creates the file if it does not exist. If it already exists, its entire contents are replaced with no warning. Think: "write from scratch".

>> (double arrow) — appends to the end of the file without touching existing contents. Think: "add to the bottom".

The output file appears in whichever folder you are currently in unless you specify a full path. Always run pwd first if unsure where you are.

Run from: ~/bash-linux-bioinformatics/module-1-foundations

bash

cd ~/bash-linux-bioinformatics/module-1-foundations

# No redirection: results go to screen
ls -lh

# With >: results go to file_list.txt in current folder
ls -lh > file_list.txt

# With >>: results are APPENDED to file_list.txt
ls -lh >> file_list.txt

# Save to a specific location (not current folder)
ls -lh > ~/bash-linux-bioinformatics/results/file_list.txt

# Verify it was created and see its contents
cat file_list.txt

⚠

A single > silently destroys the previous file contents. There is no undo. When in doubt, always use >>.
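The difference between > and >> is easiest to see on a throwaway file. The sketch below uses a made-up file name in a scratch folder. It also shows bash's noclobber option, which you can switch on as an optional safety net so that > refuses to overwrite an existing file.

```shell
workdir=$(mktemp -d)           # scratch folder
f="$workdir/notes.txt"         # made-up file name for illustration

echo "first"  >  "$f"          # creates the file
echo "second" >> "$f"          # appends: the file now holds two lines
cat "$f"

echo "third"  >  "$f"          # overwrites: "first" and "second" are gone
cat "$f"                       # prints: third

# Optional safety net: with noclobber on, > will not overwrite an existing file
set -o noclobber
echo "fourth" > "$f" || echo "refused to overwrite notes.txt"
set +o noclobber               # switch the protection off again

cat "$f"                       # still prints: third
```

With noclobber on, the shell prints its own error message and leaves the file untouched. Appending with >> still works as usual.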

03 Redirecting stderr — what 2> means

⚠ Why stderr is separate — and why 2> matters in bioinformatics

Remember the stream numbers: stdout is stream 1, and stderr is stream 2. The 2> operator works exactly like >, but it redirects stream 2 (error messages) instead of stream 1 (normal output).

In bioinformatics, tools like STAR, GATK, Trimmomatic, and featureCounts print their progress messages and warnings to stderr. The actual results (aligned reads, variant calls, count tables) go to stdout or to output files you specify. The logs go to stderr.

This matters because if you only redirect stdout, progress messages still flood your screen. To work cleanly:

Use > results.txt to save your actual output
Use 2> run.log to save the progress log to a separate file

Now your screen is clean, your results are in one file, and your log is in another. This is standard practice for every bioinformatics pipeline run.

Run from: ~/bash-linux-bioinformatics/module-1-foundations

bash

cd ~/bash-linux-bioinformatics/module-1-foundations

# Without 2>: error message prints to screen
ls nonexistent_folder
ls: cannot access 'nonexistent_folder': No such file or directory

# With 2>: error is saved to file, nothing prints to screen
ls nonexistent_folder 2> error.log
cat error.log
ls: cannot access 'nonexistent_folder': No such file or directory

# 2>> appends errors to an existing log file
ls another_missing 2>> error.log

# Redirect stdout to results AND stderr to log (separately)
ls -lh > output.txt 2> errors.log

# Send both stdout and stderr to the same file
ls -lh > everything.log 2>&1
# 2>&1 means "send stderr (2) to wherever stdout (1) is going"

Reading 2>&1: The 2 is stderr, > means redirect, &1 means "to wherever stream 1 is currently going". So > file.log 2>&1 sends stdout to the file, then sends stderr to the same file.
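Because &1 means "wherever stream 1 is going right now", the order of the two redirections matters. The sketch below, using a made-up helper function called both, compares the two orders.

```shell
# Hypothetical helper that writes one line to each output stream
both() {
    echo "result"
    echo "warning" >&2
}

workdir=$(mktemp -d)

# Correct order: stdout is sent to the file first, then stderr follows it
both > "$workdir/right.log" 2>&1

# Reversed order: stderr is pointed at the place stdout was going at that
# moment (the screen), and only afterwards is stdout sent to the file
both 2>&1 > "$workdir/wrong.log"     # "warning" still appears on screen

cat "$workdir/right.log"             # result and warning
cat "$workdir/wrong.log"             # result only
```

If a log file is missing its error messages, a reversed 2>&1 is one of the first things to check.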

04 /dev/null — the black hole

🕳 What /dev/null is and when to use it

/dev/null is a special file built into Linux. It is sometimes called the "black hole" — anything you send to it disappears immediately and permanently. No disk space is used. Nothing is stored.

You use it when a command produces output you simply do not want to see. For example, some tools print dozens of informational lines every run. If you are running hundreds of samples and you know everything is working, that output just clutters your terminal. Redirect it to /dev/null to silence it completely.

Run from: anywhere

bash

# Silence error messages only
ls nonexistent 2> /dev/null

# Save results to file but silence progress messages
# (example pattern for a bioinformatics tool)
# some_tool > results.txt 2> /dev/null

# Silence everything: both stdout and stderr
# some_tool > /dev/null 2>&1
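A common practical use of /dev/null is asking a yes/no question without printing anything. The sketch below uses the shell built-in command -v, which prints a program's location if it is installed. The second tool name is deliberately made up so that the check fails.

```shell
# Is ls installed? We only want the yes/no answer, not the printed path.
if command -v ls > /dev/null 2>&1; then
    ls_status="installed"
else
    ls_status="missing"
fi
echo "ls is $ls_status"

# Same check for a made-up tool name (an assumption for illustration)
if command -v definitely_not_a_real_tool_xyz > /dev/null 2>&1; then
    tool_status="installed"
else
    tool_status="missing"
fi
echo "definitely_not_a_real_tool_xyz is $tool_status"
```

The same pattern is handy at the top of a pipeline script, to check that the tools it needs are available before any work starts.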

05 The pipe operator — connecting commands

🔗 What a pipe does and why it matters

The pipe character | takes the stdout of one command and feeds it directly as the stdin of the next command — without writing anything to disk. Data flows left to right.

Compare the same task without and with a pipe:

Without a pipe: save the output of ls -lh to a temp file, run wc -l on that file, then delete the temp file. Three steps, and a leftover file if you forget the last one.
With a pipe: ls -lh | wc -l. One step. No temp file. Immediate result.

You can chain as many commands as you need. The output of each becomes the input of the next.

Run from: ~/bash-linux-bioinformatics/module-1-foundations

bash

cd ~/bash-linux-bioinformatics/module-1-foundations

# Without pipe: two steps, plus a temp file to delete afterwards
ls -lh > temp.txt
wc -l temp.txt

# With pipe: one step, no temp file
ls -lh | wc -l

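# Count .sh scripts in current folder
# (\. matches a literal dot; $ anchors the match to the end of the name)
ls | grep '\.sh$' | wc -l

# Show the 5 largest files (ls -S sorts by size, largest first)
# head -n 6 because the first line of ls -l output is the "total" line
ls -lhS | head -n 6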

💡

Build pipes one step at a time. Run the first command alone to see its output. Then add | next_command and run again. Keep adding one step at a time. This is how experts debug pipes.
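Here is that step-by-step habit applied to counting .sh scripts, as a small sketch with made-up file names in a scratch folder. Looking at the intermediate output is what reveals the problem: in a grep pattern an unescaped dot matches any character, so a loose pattern picks up files you did not intend.

```shell
workdir=$(mktemp -d)
# Made-up file names for illustration
touch "$workdir/align.sh" "$workdir/trim.sh" "$workdir/fresh_data.txt" "$workdir/notes.sh.bak"

# Step 1: look at the raw listing
ls "$workdir"

# Step 2: add a filter and inspect the result.
# ".sh" also matches "esh" in fresh_data.txt and the middle of notes.sh.bak
ls "$workdir" | grep ".sh"

# Step 3: tighten the pattern. \. is a literal dot, $ means end of name
ls "$workdir" | grep '\.sh$'

# Step 4: only now add the counting step
ls "$workdir" | grep ".sh"   | wc -l     # prints 4 (too many)
ls "$workdir" | grep '\.sh$' | wc -l     # prints 2 (the real scripts)
```

Had you gone straight to the final count, the wrong answer of 4 would have looked perfectly plausible.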

06 grep — search for patterns in text

🔍 Why grep is one of the most-used commands in bioinformatics

The name grep comes from the ed editor command g/re/p: globally search for a regular expression and print. It searches through a file or piped input and prints every line that matches the pattern.

In bioinformatics you use grep constantly: finding all read headers in a FASTQ file, skipping comment lines in a VCF, searching for a gene name in an annotation file, finding error messages in a log. It is one of the ten commands you use every day.

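Two essential special characters: ^ means "start of line", so grep "^@" matches lines that start with @, which is how FASTQ read headers begin. Be aware that a quality line can occasionally start with @ too, so this match is not exclusive to headers. $ means "end of line".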

The -v flag (invert) prints every line that does not match. In bioinformatics this is how you skip VCF header lines: grep -v "^#" variants.vcf gives you only the variant data lines.

Run from: ~/bash-linux-bioinformatics/module-1-foundations

bash

cd ~/bash-linux-bioinformatics/module-1-foundations

grep "Sorghum" species.txt         # find lines containing "Sorghum"
grep -i "sorghum" species.txt       # -i: case-insensitive (matches Sorghum, SORGHUM, sorghum)
grep -v "Sorghum" species.txt       # -v: invert -- show lines that do NOT match
grep -c "Sorghum" species.txt       # -c: count matching lines (prints a number)
grep -n "Sorghum" species.txt       # -n: show line numbers alongside matches
grep -r "Sorghum" ~/bash-linux-bioinformatics/  # -r: search all files in folder recursively

# Bioinformatics-specific uses
grep "^@" sample.fastq              # find read headers (caution: quality lines can also start with @)
grep -v "^#" variants.vcf           # skip VCF header lines (lines starting with #)
grep "ERROR" pipeline.log           # find all error lines in a log file
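To try the anchors and the -v flag without needing real data, the sketch below builds a three-line toy file. The file name and contents are made up, and real VCF files are tab-separated with more columns.

```shell
workdir=$(mktemp -d)
vcf="$workdir/tiny.vcf"          # made-up toy file, not a valid full VCF

# Two header lines and one data line
echo "##fileformat=VCFv4.2" >  "$vcf"
echo "#CHROM POS REF ALT"   >> "$vcf"
echo "chr1 1042 A G"        >> "$vcf"

grep    "^#" "$vcf"      # header lines only (they start with #)
grep -v "^#" "$vcf"      # data lines only: chr1 1042 A G
grep -c "^#" "$vcf"      # prints 2
grep    'G$' "$vcf"      # lines ending in G: chr1 1042 A G
```

Single quotes around 'G$' are a safe habit whenever a pattern contains $, since they stop the shell from trying to interpret it.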

07 sort & uniq — organise and deduplicate

`sort` — Sort lines

🔤 Why -n matters when sorting numbers

sort sorts lines of text. By default it sorts alphabetically, which causes a problem with numbers: alphabetically, 10 comes before 9 because "1" sorts before "9". This is wrong for numerical data. The -n flag switches to proper numerical sorting: 9 before 10.

Always use -n when sorting numbers. In bioinformatics this matters when sorting variant positions, coverage values, or counts.

Run from: ~/bash-linux-bioinformatics/module-1-foundations

bash

sort species.txt                   # sort alphabetically (default)
sort -r species.txt                 # -r: reverse order
sort -n numbers.txt                 # -n: sort numerically (not alphabetically)
sort -k2,2 data.txt                # -k2,2: sort by the second column only (-k2 alone means column 2 to end of line)
sort -u species.txt                 # -u: sort and remove duplicates in one step
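The difference between text sorting and numeric sorting shows up as soon as a column holds numbers of different lengths. A minimal sketch with made-up sample names and coverage values:

```shell
workdir=$(mktemp -d)
data="$workdir/depth.txt"        # made-up sample names and coverage values

echo "sampleB 9"   >  "$data"
echo "sampleA 100" >> "$data"
echo "sampleC 10"  >> "$data"

# Text sort on column 2: compares character by character, giving 10, 100, 9
sort -k2,2 "$data"

# Numeric sort on column 2: compares values, giving 9, 10, 100
sort -k2,2n "$data"

# Numeric, largest first: 100, 10, 9
sort -k2,2nr "$data"
```

Flags such as n and r can be attached directly to the -k key, which limits them to that one column.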

`uniq` — Remove duplicate lines

🔄 The critical rule: always sort before uniq

uniq removes duplicate lines — but only if they are next to each other. If you run uniq without sorting first, it will only catch duplicates that happen to be adjacent. Duplicates scattered through the file are missed.

The standard pattern is always sort | uniq. Sorting groups identical lines together first, then uniq can remove them reliably.

The -c flag prepends a count to each line showing how many times it appeared. Combine with sort -rn to see which items appear most frequently — very useful for analysing gene lists, sample IDs, or species counts.

Run from: ~/bash-linux-bioinformatics/module-1-foundations

bash

sort species.txt | uniq            # sort first, then remove duplicates
sort species.txt | uniq -c         # -c: count how many times each line appears
sort species.txt | uniq -d         # -d: show only lines that ARE duplicated
sort species.txt | uniq -u         # -u: show only lines that appear exactly once
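To see what goes wrong when the sort step is skipped, here is a small sketch on a made-up four-line file:

```shell
workdir=$(mktemp -d)
f="$workdir/crops.txt"           # made-up example data

echo "rice"  >  "$f"
echo "maize" >> "$f"
echo "rice"  >> "$f"
echo "rice"  >> "$f"

# Without sorting: only the two adjacent "rice" lines are merged
uniq "$f"                # rice, maize, rice (3 lines)

# With sorting: identical lines become neighbours first
sort "$f" | uniq         # maize, rice (2 lines)
```

The first "rice" survives in the unsorted run because a "maize" line sits between it and the others.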

08 Combining pipes and redirection

Pipes connect commands. Redirection saves results. They work together perfectly. Here are real bioinformatics patterns you will use regularly.

Run from: ~/bash-linux-bioinformatics/module-1-foundations

bash — real bioinformatics patterns

# Count read headers in a FASTQ file
# (approximate: quality lines can also start with @, which inflates the count)
grep -c "^@" sample.fastq

# Most common species, ranked highest to lowest
sort species.txt | uniq -c | sort -rn | head -n 10

# Filter results and save to a specific location
grep "Sorghum" species.txt | sort > ~/bash-linux-bioinformatics/results/sorghum_only.txt

# Count how many files are in a folder
ls ~/bash-linux-bioinformatics/ | wc -l

💡

You can mix pipes and redirection in the same command. The | connects commands in a chain. The > at the end saves the final output to a file. They do not interfere with each other.
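One caution about counting FASTQ reads with grep "^@": the quality line of a record is allowed to start with @ as well. The sketch below builds a made-up two-read file where that happens and compares the grep count with a line count divided by four. The division assumes every record is exactly four lines, which holds for standard unwrapped FASTQ but is an assumption about your files.

```shell
workdir=$(mktemp -d)
fq="$workdir/toy.fastq"          # made-up two-read file for illustration

# Each FASTQ record is 4 lines: header, sequence, "+", quality string
echo "@read1" >  "$fq"
echo "ACGT"   >> "$fq"
echo "+"      >> "$fq"
echo "@IIH"   >> "$fq"           # a quality line that happens to start with @
echo "@read2" >> "$fq"
echo "GGCA"   >> "$fq"
echo "+"      >> "$fq"
echo "IIII"   >> "$fq"

grep -c "^@" "$fq"               # prints 3: one too many

# Line count divided by 4 (assumes 4 lines per record)
total_lines=$(cat "$fq" | wc -l)
echo $(( total_lines / 4 ))      # prints 2
```

On small test files the two counts usually agree, so the mismatch tends to appear only on real data.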

09 Quick reference

Operator / Command	What it does	Key flags
>	Redirect stdout (stream 1) to a file — overwrites	—
>>	Redirect stdout to a file — appends	—
2>	Redirect stderr (stream 2) to a file — overwrites	`2>>` to append
2>&1	Send stderr to the same place as stdout	—
/dev/null	Discard output completely	—
|	Pipe stdout of left command into stdin of right command	—
grep [pattern]	Print lines matching a pattern	`-i` case-insensitive · `-v` invert · `-c` count · `-n` line numbers · `-r` recursive
sort	Sort lines alphabetically by default	`-r` reverse · `-n` numeric · `-u` unique · `-k` column
uniq	Remove consecutive duplicates — always sort first	`-c` count · `-d` duplicates only · `-u` unique only

10 Exercises

Work through all five exercises in your Ubuntu terminal. Type every command yourself — do not copy-paste.

Exercise 1: Redirect and append

Navigate to ~/bash-linux-bioinformatics/module-1-foundations/. Create a file called genomes.txt with four genome names one per line: GRCh38, GRCm39, TAIR10, Sbi3.1.1. Then append a fifth: B73v5. Print the file to confirm all five lines are there.

💬 Hint: use > for the first line only. Use >> for lines 2–5.

Show answer

cd ~/bash-linux-bioinformatics/module-1-foundations
echo "GRCh38" > genomes.txt
echo "GRCm39" >> genomes.txt
echo "TAIR10" >> genomes.txt
echo "Sbi3.1.1" >> genomes.txt
echo "B73v5" >> genomes.txt
cat genomes.txt
GRCh38
GRCm39
TAIR10
Sbi3.1.1
B73v5

Exercise 2: grep, search and filter

Using genomes.txt in ~/bash-linux-bioinformatics/module-1-foundations/: find all entries containing the letter G. Then find entries that do NOT contain G. Finally count how many contain G.

💬 Hint: three grep commands — plain, then -v, then -c.

Show answer

cd ~/bash-linux-bioinformatics/module-1-foundations
grep "G" genomes.txt
GRCh38
GRCm39
grep -v "G" genomes.txt
TAIR10
Sbi3.1.1
B73v5
grep -c "G" genomes.txt
2

Exercise 3: Pipe, three commands in one line

List all items in /etc, pipe through grep to find items containing apt, then pipe into wc -l to count them. Do all three in a single piped command.

Show answer

ls /etc | grep "apt" | wc -l
1    # number may vary on your system

Exercise 4: sort and uniq, count duplicates

Navigate to ~/bash-linux-bioinformatics/module-1-foundations/. Create duplicates.txt with these seven lines: maize, rice, sorghum, rice, maize, wheat, maize. Then use a piped command to count how many times each species appears, sorted highest to lowest.

💬 Hint: sort | uniq -c | sort -rn. Sort before uniq or duplicates will be missed.

Show answer

cd ~/bash-linux-bioinformatics/module-1-foundations
echo "maize" > duplicates.txt
echo "rice" >> duplicates.txt
echo "sorghum" >> duplicates.txt
echo "rice" >> duplicates.txt
echo "maize" >> duplicates.txt
echo "wheat" >> duplicates.txt
echo "maize" >> duplicates.txt
sort duplicates.txt | uniq -c | sort -rn
      3 maize
      2 rice
      1 wheat
      1 sorghum

Exercise 5 · Challenge: FASTQ headers and stderr redirection

From ~/bash-linux-bioinformatics/module-1-foundations/, extract all read header lines from sample.fastq (lines starting with @), count them with a pipe, and save the count to ~/bash-linux-bioinformatics/results/read_count.txt. Then try to list a nonexistent folder and redirect the error to error.log in your current folder (module-1-foundations). Check that both files were created.

💬 Hint: grep "^@" sample.fastq | wc -l > path/read_count.txt. For the error: ls nonexistent 2> error.log.

Show answer

cd ~/bash-linux-bioinformatics/module-1-foundations
grep "^@" sample.fastq | wc -l > ~/bash-linux-bioinformatics/results/read_count.txt
cat ~/bash-linux-bioinformatics/results/read_count.txt
3   # 3 read headers = 3 reads

ls nonexistent_folder 2> error.log
cat error.log
ls: cannot access 'nonexistent_folder': No such file or directory
# Error was saved to file instead of printing to screen
