Pipes are what make Linux truly powerful. Instead of writing one command at a time, you chain commands together so the output of one flows directly into the next — building complex analyses from simple building blocks. This is how real bioinformatics pipelines work.
Every Linux command has three streams — think of them as channels:
stdin (0) — standard input. Where the command reads data from. By default: your keyboard.
stdout (1) — standard output. Where the command sends its results. By default: your screen.
stderr (2) — standard error. Where error messages go. By default: also your screen.
Redirection and pipes let you control these streams. Instead of results going to your screen, you can send them to a file. Instead of a command reading from your keyboard, you can feed it the output of another command. Most command-line bioinformatics pipelines are built on these two mechanisms.
You already used > and >> in Lesson 2.
Now let's understand all the redirection operators properly.
> and >>
    ls -lh > file_list.txt          # save ls output to a file (overwrites)
    ls -lh >> file_list.txt         # append ls output to the file
    echo "run complete" > log.txt   # write a log message
2> and 2>>
In bioinformatics, tools like STAR and GATK print progress messages and warnings to stderr. You often want to save these to a log file separately from the actual results.
    ls nonexistent_folder 2> error.log    # redirect error message to a file
    ls nonexistent_folder 2>> error.log   # append error to existing log
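To see the two streams being separated, here is a minimal sketch. It assumes a made-up function, `fake_tool`, standing in for a real program that prints results to stdout and progress messages to stderr; the file names are also only illustrative.

```shell
# fake_tool is a hypothetical stand-in for a real program:
# one result line goes to stdout, one progress message goes to stderr.
fake_tool() {
    echo "result: chr1 12345 A G"      # stdout (stream 1)
    echo "processed 1 variant" >&2     # stderr (stream 2)
}

workdir=$(mktemp -d)                   # scratch folder so nothing of yours is overwritten

# Send each stream to its own file
fake_tool > "$workdir/results.txt" 2> "$workdir/run.log"

cat "$workdir/results.txt"             # only the result line
cat "$workdir/run.log"                 # only the progress message
```

Keeping the log apart from the results means the results file can go straight into the next step without stray messages mixed in.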
When running long pipelines you often want to capture everything — both normal output and errors — into one log file.
    ls -lh > output.log 2>&1   # stdout to file, then stderr to the same place
    ls -lh &> output.log       # bash shorthand for the same result
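The order of the two redirections matters, which is easy to get wrong. The sketch below uses a hypothetical helper function, `both`, that writes one line to each stream, and compares the two possible orders.

```shell
# both is a hypothetical helper that writes one line to each stream
both() {
    echo "to stdout"
    echo "to stderr" >&2
}

workdir=$(mktemp -d)                 # scratch folder for the demo files

# Intended order: stdout goes to the file first, then stderr is sent
# to wherever stdout now points (the file).
both > "$workdir/right.log" 2>&1

# Reversed order: stderr is sent to where stdout points at that moment
# (the screen), and only afterwards is stdout sent to the file.
both 2>&1 > "$workdir/wrong.log"

wc -l < "$workdir/right.log"         # 2 lines: both messages captured
wc -l < "$workdir/wrong.log"         # 1 line: the stderr message went to the screen
```

Redirections are processed left to right, so `2>&1` should come after the `>` it is meant to follow.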
/dev/null
/dev/null is a special file that swallows everything sent to it.
Use it to silence commands you don't care about.
    ls nonexistent 2> /dev/null              # silence error messages
    some_tool > results.txt 2> /dev/null     # save results, silence all errors
|
The pipe | takes the output of one command and feeds it directly
as input to the next — without writing anything to disk.
You can chain as many commands as you need.
The pattern: command1 | command2 | command3
The output of command1 becomes the input of command2,
whose output becomes the input of command3.
Data flows left to right through the pipe.
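    # Without a pipe: two separate steps
    ls -lh > temp.txt
    wc -l temp.txt
    # With a pipe: one step, no temporary file
    ls -lh | wc -l
    # Count how many .sh files are in a folder
    # (\. matches a literal dot, $ anchors the match to the end of the name)
    ls | grep "\.sh$" | wc -l
    # Show the 5 largest files in the current folder
    # (head -n 6 because the first line of ls -l output is the "total" line)
    ls -lhS | head -n 6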
Build pipes one step at a time. Run the first command alone to see its output,
then add | next_command and run again. This makes it easy to debug
when something does not work as expected.
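As an illustration of building a pipe in stages, the sketch below creates three empty files with made-up names in a scratch folder and adds one command at a time. It also shows why checking each stage helps: in a grep pattern a bare dot matches any character, so `".sh"` also matches `trash.txt`.

```shell
workdir=$(mktemp -d)                          # scratch folder for the demo
touch "$workdir/align.sh" "$workdir/trim.sh" "$workdir/trash.txt"

# Step 1: run the first command alone and look at its output
ls "$workdir"

# Step 2: add one stage and check again
ls "$workdir" | grep '\.sh$'                  # align.sh and trim.sh

# Step 3: add the final stage
ls "$workdir" | grep '\.sh$' | wc -l          # 2

# For comparison: the unescaped pattern also matches "ash" in trash.txt
ls "$workdir" | grep ".sh" | wc -l            # 3
```

Looking at the output of step 2 before adding `wc -l` is what would reveal the stray `trash.txt` match.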
grep searches for a pattern inside a file or stream and prints matching lines.
It is one of the most used commands in bioinformatics — for searching FASTQ headers,
filtering VCF files, finding genes in annotation files, and checking log files for errors.
    grep "Sorghum" species.txt      # find lines containing "Sorghum"
    grep -i "sorghum" species.txt   # -i = case-insensitive search
    grep -v "Sorghum" species.txt   # -v = invert: show lines that do NOT match
    grep -c "Sorghum" species.txt   # -c = count matching lines
    grep -n "Sorghum" species.txt   # -n = show line numbers
    grep -r "Sorghum" ~/projects/   # -r = search recursively in a folder
    # Bioinformatics uses of grep
    grep "^@" sample.fastq          # lines starting with @: FASTQ read headers (and any quality line that begins with @)
    grep -v "^#" variants.vcf       # skip VCF header lines (start with #)
    grep "ERROR" pipeline.log       # find errors in a log file
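^ means "start of line" in grep patterns.
$ means "end of line".
So grep "^@" matches lines that start with @,
which is how every FASTQ read header begins.
Be aware that @ is also a valid quality character, so a quality line can start with @ too; on real data this pattern may match more than just headers.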
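The anchors can be tried on a tiny example. The sketch below writes a made-up two-read FASTQ file in which one quality line happens to start with @, so it shows both what `^` and `$` match and why `grep "^@"` can overcount. The `awk` line is one possible alternative and is not covered in this lesson; it assumes the usual layout of exactly four lines per read.

```shell
workdir=$(mktemp -d)                          # scratch folder for the demo

# Hypothetical FASTQ with two reads; the second quality line begins with @
printf '%s\n' \
    '@read1' 'ACGT' '+' 'IIII' \
    '@read2' 'TTGA' '+' '@III' > "$workdir/tiny.fastq"

grep -c '^@' "$workdir/tiny.fastq"            # 3: two headers plus one quality line
grep -c 'A$' "$workdir/tiny.fastq"            # 1: only TTGA ends in A

# Alternative, assuming four lines per read: print line 1 of each record
awk 'NR % 4 == 1' "$workdir/tiny.fastq"       # @read1 and @read2
```

On small teaching files the two approaches usually agree; on real sequencing data the position-based count is the safer one.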
sort — Sort lines
Sorts lines of a file alphabetically by default. Very commonly used before uniq.
    sort species.txt      # sort alphabetically
    sort -r species.txt   # sort in reverse order
    sort -n numbers.txt   # sort numerically (not alphabetically)
    sort -k2,2 data.txt   # sort by the second column only (-k2 alone uses column 2 through the end of the line)
    sort -u species.txt   # sort and remove duplicates in one step
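The difference between alphabetical and numeric sorting is easiest to see on real numbers. This sketch uses a made-up table of gene names and read counts and sorts it on the second column three ways; the `n` and `r` letters after the key are the same numeric and reverse options applied to that column only.

```shell
workdir=$(mktemp -d)                          # scratch folder for the demo

# Hypothetical table: gene name and read count, separated by a space
printf 'geneA 9\ngeneB 100\ngeneC 25\n' > "$workdir/counts.txt"

sort -k2,2 "$workdir/counts.txt"              # text order:    100, 25, 9
sort -k2,2n "$workdir/counts.txt"             # numeric order: 9, 25, 100
sort -k2,2nr "$workdir/counts.txt"            # largest first: 100, 25, 9
```

In text order "100" comes before "25" because the comparison goes character by character, and "1" sorts before "2".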
uniq — Remove duplicate lines
Removes consecutive duplicate lines. Always sort first — uniq
only removes duplicates that are next to each other.
    sort species.txt | uniq      # sort then remove duplicates
    sort species.txt | uniq -c   # count how many times each line appears
    sort species.txt | uniq -d   # show only lines that ARE duplicated
    sort species.txt | uniq -u   # show only lines that are NOT duplicated
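To see why sorting first matters, this small sketch runs uniq on a made-up file with and without sort. The file name and contents are only for illustration.

```shell
workdir=$(mktemp -d)                          # scratch folder for the demo
printf 'maize\nrice\nmaize\nmaize\n' > "$workdir/crops.txt"

# Without sort: only the adjacent pair of "maize" lines is collapsed
uniq "$workdir/crops.txt"                     # maize, rice, maize

# With sort: all duplicates end up next to each other first
sort "$workdir/crops.txt" | uniq              # maize, rice
```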
This is where pipes become genuinely powerful. Here are real bioinformatics patterns you will use regularly — each one is a chain of simple commands that together do something useful.
    # How many reads are in a FASTQ file?
    # (approximate: a quality line that starts with @ is counted too)
    grep -c "^@" sample.fastq
    # Show unique chromosomes in a VCF file (skip header lines)
    grep -v "^#" variants.vcf | cut -f1 | sort -u
    # Find the 10 most common words in a file
    # (tr -s squeezes repeated spaces so empty lines are not counted as words)
    cat notes.txt | tr -s ' ' '\n' | sort | uniq -c | sort -rn | head -n 10
    # Count how many entries are in a folder
    ls ~/bash-linux-bioinformatics/ | wc -l
    # Save filtered results to a file
    grep "Sorghum" species.txt | sort > sorghum_only.txt
You can mix pipes and redirection in one command.
The pipe | connects commands. The redirect >
saves the final output to a file. They work together perfectly.
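One detail worth seeing in practice: a redirection belongs to the single command it is written next to, not to the whole pipe. In the sketch below (file names are made up, and `missing.txt` deliberately does not exist), `2>` captures grep's error while `>` captures the output of sort, the last command. The `-h` option, available in GNU and BSD grep, just stops grep from prefixing each match with the file name.

```shell
workdir=$(mktemp -d)                          # scratch folder for the demo
printf 'Sorghum bicolor\nZea mays\nSorghum halepense\n' > "$workdir/species.txt"

# grep searches one real file and one missing file.
# 2> is attached to grep only; > is attached to sort only.
grep -h "Sorghum" "$workdir/species.txt" "$workdir/missing.txt" 2> "$workdir/grep.log" \
    | sort -r > "$workdir/sorghum_only.txt"

cat "$workdir/sorghum_only.txt"               # the two Sorghum lines, reverse-sorted
cat "$workdir/grep.log"                       # grep's complaint about the missing file
```

Giving each stage its own error log in this way can make it easier to tell which command in a long pipe produced a warning.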
| Operator / Command | What it does | Key flags |
|---|---|---|
| > | Redirect stdout to a file (overwrites) | — |
| >> | Redirect stdout to a file (appends) | — |
| 2> | Redirect stderr to a file | 2>&1 redirect stderr to stdout |
| \| | Pipe stdout of the left command into stdin of the right | — |
| /dev/null | Discard output completely | — |
| grep [pattern] [file] | Search for pattern, print matching lines | -i case-insensitive · -v invert · -c count · -n line numbers · -r recursive |
| sort [file] | Sort lines alphabetically | -r reverse · -n numeric · -u unique · -k column |
| uniq | Remove consecutive duplicate lines | -c count · -d duplicates only · -u unique only |
Work through all five exercises in your Ubuntu terminal. Type every command yourself — do not copy-paste.
Create a file called genomes.txt containing four genome names —
one per line: GRCh38, GRCm39, TAIR10, Sbi3.1.1.
Then append a fifth line: B73v5.
Print the final file to confirm all five lines are there.
💬 Hint: use > for the first line, >> for the rest.
    $ echo "GRCh38" > genomes.txt
    $ echo "GRCm39" >> genomes.txt
    $ echo "TAIR10" >> genomes.txt
    $ echo "Sbi3.1.1" >> genomes.txt
    $ echo "B73v5" >> genomes.txt
    $ cat genomes.txt
    GRCh38
    GRCm39
    TAIR10
    Sbi3.1.1
    B73v5
Using genomes.txt, find all entries that contain the letter G.
Then find all entries that do not contain G.
Count how many entries contain G.
    $ grep "G" genomes.txt
    GRCh38
    GRCm39
    $ grep -v "G" genomes.txt
    TAIR10
    Sbi3.1.1
    B73v5
    $ grep -c "G" genomes.txt
    2
List everything in /etc, pipe the output through grep
to keep only the entries whose names contain apt,
then pipe into wc -l to count how many there are.
Do all three steps in a single command.
    $ ls /etc | grep "apt" | wc -l
    1    # number may vary on your system
Create a file called duplicates.txt with these lines —
some repeated: maize, rice, sorghum,
rice, maize, wheat, maize.
Then use sort and uniq -c to count how many times
each species appears. Sort the counts from highest to lowest.
💬 Hint: chain three commands with pipes — sort | uniq -c | sort -rn.
    $ echo "maize" > duplicates.txt
    $ echo "rice" >> duplicates.txt
    $ echo "sorghum" >> duplicates.txt
    $ echo "rice" >> duplicates.txt
    $ echo "maize" >> duplicates.txt
    $ echo "wheat" >> duplicates.txt
    $ echo "maize" >> duplicates.txt
    $ sort duplicates.txt | uniq -c | sort -rn
          3 maize
          2 rice
          1 wheat
          1 sorghum
    # with GNU sort -r, lines with equal counts come out in reverse alphabetical order
Using the sample.fastq file you created in Lesson 2,
extract only the read header lines (lines starting with @),
count them with a pipe, and save the count to a file called read_count.txt.
Then print the file to confirm.
💬 Hint: pipe grep "^@" into wc -l, then redirect with >.
    $ grep "^@" sample.fastq | wc -l > read_count.txt
    $ cat read_count.txt
    3    # 3 reads confirmed ✓