Lesson 3 of 13 · ⏱ 45 min · ✓ Free

Pipes & Redirection

Pipes are what make Linux truly powerful. Instead of writing one command at a time, you chain commands together so the output of one flows directly into the next — building complex analyses from simple building blocks. This is how real bioinformatics pipelines work.

01 The mental model

Every Linux command has three streams — think of them as channels:

stdin (0) — standard input. Where the command reads data from. By default: your keyboard.

stdout (1) — standard output. Where the command sends its results. By default: your screen.

stderr (2) — standard error. Where error messages go. By default: also your screen.

Redirection and pipes let you control these streams. Instead of results going to your screen, you can send them to a file. Instead of a command reading from your keyboard, you can feed it the output of another command. Shell-based bioinformatics pipelines are built on exactly this idea.
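
To make the stdin idea concrete, here is a minimal sketch showing the same command reading its input from two different sources. The file chroms.txt and its contents are made up for illustration, and the < operator (connect stdin to a file) is an addition that this lesson does not otherwise cover.

```bash
# Work in a throwaway directory so nothing in your project is touched.
workdir=$(mktemp -d)
cd "$workdir"
printf 'chr1\nchr2\nchr3\n' > chroms.txt   # hypothetical 3-line file

wc -l < chroms.txt        # stdin comes from the file
cat chroms.txt | wc -l    # stdin comes from cat's stdout
```

Both commands count the same three lines. wc never learns where its input came from; it simply reads stdin.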

02 Redirection

You already used > and >> in Lesson 2. Now let's understand all the redirection operators properly.

Output redirection — > and >>

bash
ls -lh > file_list.txt       # save ls output to a file (overwrites)
ls -lh >> file_list.txt      # append ls output to the file
echo "run complete" > log.txt  # write a log message

Error redirection — 2> and 2>>

In bioinformatics, tools like STAR and GATK print progress messages and warnings to stderr. You often want to save these to a log file separately from the actual results.

bash
ls nonexistent_folder 2> error.log      # redirect error message to a file
ls nonexistent_folder 2>> error.log     # append error to existing log
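
The sketch below shows results and messages being split into separate files. fake_tool is a hypothetical stand-in for a real program (it is not STAR or GATK); it just writes one line to stdout and one line to stderr so you can see where each ends up.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical tool: one result line on stdout, one progress line on stderr.
fake_tool() {
    echo "counts for $1"        # result -> stdout
    echo "finished $1" >&2      # progress message -> stderr
}

fake_tool sampleA > results.txt 2> run.log      # start both files fresh
fake_tool sampleB >> results.txt 2>> run.log    # append the second run

cat results.txt    # the two "counts for ..." lines
cat run.log        # the two "finished ..." lines
```

Keeping the two streams apart means results.txt contains only data, so it can be fed to the next step without stray log messages in it.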

Redirect both stdout and stderr

When running long pipelines you often want to capture everything — both normal output and errors — into one log file.

bash
ls -lh > output.log 2>&1          # stdout to file, stderr to same file
ls -lh &> output.log              # bash shorthand for the line above
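
One detail worth seeing for yourself: the order of the redirections matters, because the shell processes them left to right. This is a small sketch using a hypothetical function, both, that writes one line to each stream.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical helper: one line on stdout, one line on stderr.
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

both > right.log 2>&1    # stdout -> file, then stderr -> wherever stdout points (the file)
both 2>&1 > wrong.log    # stderr -> wherever stdout points now (the screen), then stdout -> file

wc -l right.log wrong.log    # right.log has 2 lines, wrong.log has only 1
```

In the second form the stderr line still appears on your screen, because 2>&1 was evaluated before stdout was pointed at the file.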

Discard output — /dev/null

/dev/null is a special file that discards everything written to it. Use it to silence output you don't need. Be careful when discarding stderr: real error messages disappear too.

bash
ls nonexistent 2> /dev/null        # silence error messages
some_tool > results.txt 2> /dev/null # save results, silence all errors
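
A common use of /dev/null is asking a yes/no question where only the command's success or failure matters. The sketch below assumes a made-up pipeline.log and uses an if statement, which goes slightly beyond this lesson.

```bash
workdir=$(mktemp -d)
cd "$workdir"
printf 'step 1 ok\nERROR: index missing\nstep 3 ok\n' > pipeline.log   # hypothetical log

# grep succeeds when at least one line matches; the matching lines themselves are discarded.
if grep "ERROR" pipeline.log > /dev/null; then
    status="errors found"
else
    status="clean"
fi
echo "$status"    # errors found
```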

03 The pipe operator |

The pipe | takes the output of one command and feeds it directly as input to the next — without writing anything to disk. You can chain as many commands as you need.

The pattern: command1 | command2 | command3

The output of command1 becomes the input of command2, whose output becomes the input of command3. Data flows left to right through the pipe.

bash
# Without a pipe — two separate steps
ls -lh > temp.txt
wc -l temp.txt

# With a pipe — one elegant step
ls -lh | wc -l

# Count how many .sh files are in a folder
ls | grep '\.sh$' | wc -l      # \. is a literal dot; $ anchors the match to the end of the name

# Show the 5 largest files in current folder
ls -lhS | head -n 6            # 6 lines: the "total" summary line plus 5 files

💡 Tip: Build pipes one step at a time. Run the first command alone to see its output, then add | next_command and run again. This makes it easy to debug when something does not work as expected.
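
Here is what that step-by-step approach could look like in practice. The file names are invented for the example; the pattern '\.sh$' means "a literal dot, then sh, at the end of the name".

```bash
workdir=$(mktemp -d)
cd "$workdir"
touch align.sh trim.sh notes.txt reads.fastq   # hypothetical files

ls                           # step 1: see everything
ls | grep '\.sh$'            # step 2: keep only names ending in .sh
ls | grep '\.sh$' | wc -l    # step 3: count them (2 here)
```

If step 2 printed something unexpected, you would know the problem is in the grep pattern before wc ever hides it behind a number.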

04 grep — search text

grep searches for a pattern inside a file or stream and prints matching lines. It is one of the most used commands in bioinformatics — for searching FASTQ headers, filtering VCF files, finding genes in annotation files, and checking log files for errors.

bash
grep "Sorghum" species.txt        # find lines containing "Sorghum"
grep -i "sorghum" species.txt     # -i = case-insensitive search
grep -v "Sorghum" species.txt     # -v = invert: show lines that do NOT match
grep -c "Sorghum" species.txt     # -c = count matching lines
grep -n "Sorghum" species.txt     # -n = show line numbers
grep -r "Sorghum" ~/projects/    # -r = search recursively in a folder

# Bioinformatics uses of grep
grep "^@" sample.fastq              # find all read headers in FASTQ (start with @)
grep -v "^#" variants.vcf           # skip VCF header lines (start with #)
grep "ERROR" pipeline.log           # find errors in a log file

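^ means "start of line" in grep patterns. $ means "end of line". So grep "^@" matches lines that start with @, which is how every FASTQ read header begins. Be aware that a quality line can also start with @, because @ is a valid quality character, so this pattern can match more than just headers.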
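
The sketch below illustrates why "^@" should be treated as a quick check rather than an exact read count. The two-read file is made up, and the line-count approach assumes the usual layout of exactly four lines per read.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical two-read FASTQ; the second read's quality line starts with @.
printf '%s\n' \
    '@read1' 'ACGT' '+' 'IIII' \
    '@read2' 'ACGT' '+' '@III' > tiny.fastq

grep -c '^@' tiny.fastq                  # 3: two headers plus one quality line
echo $(( $(wc -l < tiny.fastq) / 4 ))    # 2: total lines divided by 4
```

On files where no quality line happens to begin with @, both approaches give the same answer.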

05 sort & uniq

sort — Sort lines

Sorts lines of a file alphabetically by default. Very commonly used before uniq.

bash
sort species.txt                 # sort alphabetically
sort -r species.txt              # sort in reverse order
sort -n numbers.txt             # sort numerically (not alphabetically)
sort -k2,2 data.txt             # sort by the second column only (-k2 alone sorts from column 2 to the end of the line)
sort -u species.txt              # sort and remove duplicates in one step
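
The difference between the default sort and -n is easiest to see on a small example. The numbers below are arbitrary.

```bash
workdir=$(mktemp -d)
cd "$workdir"
printf '10\n9\n100\n2\n' > numbers.txt   # hypothetical values

sort numbers.txt       # 10, 100, 2, 9  (compared character by character)
sort -n numbers.txt    # 2, 9, 10, 100  (compared as numbers)
```

Without -n, "100" sorts before "2" because the character 1 comes before the character 2. Remember this whenever you sort counts or coordinates.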

uniq — Remove duplicate lines

Removes consecutive duplicate lines. Always sort first: uniq only removes duplicates that are next to each other.

bash
sort species.txt | uniq           # sort then remove duplicates
sort species.txt | uniq -c        # count how many times each line appears
sort species.txt | uniq -d        # show only lines that ARE duplicated
sort species.txt | uniq -u        # show only lines that are NOT duplicated
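
To see why sorting first matters, compare uniq with and without sort on a small made-up list.

```bash
workdir=$(mktemp -d)
cd "$workdir"
printf 'maize\nrice\nmaize\nmaize\n' > crops.txt   # hypothetical list

uniq crops.txt           # maize, rice, maize: only the adjacent pair was collapsed
sort crops.txt | uniq    # maize, rice: sorting placed all duplicates side by side
```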

06 Combining it all

This is where pipes become genuinely powerful. Here are real bioinformatics patterns you will use regularly — each one is a chain of simple commands that together do something useful.

bash — real bioinformatics patterns
# How many reads are in a FASTQ file?
grep -c "^@" sample.fastq      # quick check; overcounts if a quality line also starts with @

# Show unique chromosomes in a VCF file (skip header lines)
grep -v "^#" variants.vcf | cut -f1 | sort -u

# Find the 10 most common words in a file
# (tr -s squeezes runs of spaces so that empty lines are not counted as a word)
cat notes.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn | head -n 10

# Count how many entries (files and folders) are in a directory
ls ~/bash-linux-bioinformatics/ | wc -l

# Save filtered results to a file
grep "Sorghum" species.txt | sort > sorghum_only.txt

💡 Tip: You can mix pipes and redirection in one command. The pipe | connects commands. The redirect > saves the final output to a file.
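
As a sketch of pipes and redirection working together, the example below counts records per chromosome and saves the table. The VCF is hypothetical and heavily simplified (one header line, four tab-separated columns); a real VCF has more header lines and more columns, but the pipeline has the same shape.

```bash
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical, simplified VCF: one header line, then tab-separated records.
printf '#CHROM\tPOS\tREF\tALT\n' > variants.vcf
printf 'chr1\t100\tA\tG\nchr2\t200\tC\tT\nchr1\t300\tG\tA\n' >> variants.vcf

# Drop the header, keep column 1, count records per chromosome, save the table.
grep -v '^#' variants.vcf | cut -f1 | sort | uniq -c | sort -rn > per_chrom.txt
cat per_chrom.txt
```

Only the last command in the chain is redirected, so per_chrom.txt receives the final, fully processed output: a count of 2 for chr1 and 1 for chr2.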

07 Quick reference

Operator / Command      What it does                                      Key flags
>                       Redirect stdout to a file (overwrites)
>>                      Redirect stdout to a file (appends)
2>                      Redirect stderr to a file                         2>&1 redirect stderr to stdout
|                       Pipe stdout of left command into stdin of right
/dev/null               Discard output completely
grep [pattern] [file]   Search for pattern, print matching lines          -i case-insensitive · -v invert · -c count · -n line numbers · -r recursive
sort [file]             Sort lines alphabetically                         -r reverse · -n numeric · -u unique · -k column
uniq                    Remove consecutive duplicate lines                -c count · -d duplicates only · -u unique only

08 Exercises

Work through all five exercises in your Ubuntu terminal. Type every command yourself — do not copy-paste.

Exercise 1 Redirect and append

Create a file called genomes.txt containing four genome names — one per line: GRCh38, GRCm39, TAIR10, Sbi3.1.1. Then append a fifth line: B73v5. Print the final file to confirm all five lines are there.

💬 Hint: use > for the first line, >> for the rest.

Answer
echo "GRCh38" > genomes.txt
echo "GRCm39" >> genomes.txt
echo "TAIR10" >> genomes.txt
echo "Sbi3.1.1" >> genomes.txt
echo "B73v5" >> genomes.txt
cat genomes.txt
GRCh38
GRCm39
TAIR10
Sbi3.1.1
B73v5

Exercise 2 grep — search and filter

Using genomes.txt, find all entries that contain the letter G. Then find all entries that do not contain G. Count how many entries contain G.

Answer
grep "G" genomes.txt
GRCh38
GRCm39

grep -v "G" genomes.txt
TAIR10
Sbi3.1.1
B73v5

grep -c "G" genomes.txt
2

Exercise 3 Pipe — count and sort

List everything in /etc, pipe the output through grep to keep only the entries whose names contain apt, then pipe into wc -l to count how many there are. Do all three steps in a single command.

Answer
ls /etc | grep "apt" | wc -l
1   # number may vary on your system

Exercise 4 sort and uniq

Create a file called duplicates.txt with these lines — some repeated: maize, rice, sorghum, rice, maize, wheat, maize. Then use sort and uniq -c to count how many times each species appears. Sort the counts from highest to lowest.

💬 Hint: chain three commands with pipes — sort | uniq -c | sort -rn.

Answer
echo "maize" > duplicates.txt
echo "rice" >> duplicates.txt
echo "sorghum" >> duplicates.txt
echo "rice" >> duplicates.txt
echo "maize" >> duplicates.txt
echo "wheat" >> duplicates.txt
echo "maize" >> duplicates.txt

sort duplicates.txt | uniq -c | sort -rn
      3 maize
      2 rice
      1 wheat
      1 sorghum
# wheat is listed before sorghum because -r also reverses the order of tied counts

Exercise 5 · Challenge FASTQ read headers

Using the sample.fastq file you created in Lesson 2, extract only the read header lines (lines starting with @), count them with a pipe, and save the count to a file called read_count.txt. Then print the file to confirm.

💬 Hint: pipe grep "^@" into wc -l, then redirect with >.

Answer
grep "^@" sample.fastq | wc -l > read_count.txt
cat read_count.txt
3
# 3 reads confirmed ✓