Navigating the filesystem gets you to the right place — but files are what you actually work with. In bioinformatics you will copy FASTQ files, move results to new folders, delete intermediate files to save space, and inspect large files without opening them. This lesson teaches you all of that.
Before you can copy or move files you need some to work with.
The two most common ways to create files from the terminal are touch and echo.
touch — Create an empty file
touch creates a blank file instantly. It is also used to update the timestamp of an existing file without changing its contents.
touch sample.txt               # create one empty file
touch file1.txt file2.txt      # create multiple files at once
ls -lh                         # confirm they were created (size will be 0)
echo — Write text into a file
echo prints text to the screen. Combined with > it writes that text into a file. Combined with >> it appends to an existing file without overwriting it.
echo "Hello bioinformatics"                # prints to screen
echo "Hello bioinformatics" > notes.txt    # writes to file (creates or overwrites)
echo "Second line" >> notes.txt            # appends; does NOT overwrite
A single > overwrites the entire file. A double >> appends.
Getting these confused is a common way to accidentally destroy data. We will practise this in the exercises.
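The difference between > and >> is easiest to see by counting lines after each step. The following is a minimal sketch, assuming a bash shell; the names workdir and demo.txt are made up for illustration, and everything happens in a throwaway directory.

```shell
# Work in a throwaway directory so no real file is touched
workdir=$(mktemp -d)
cd "$workdir"

echo "first line" > demo.txt       # creates demo.txt
echo "second line" >> demo.txt     # appends: the file now has 2 lines
lines_after_append=$(wc -l < demo.txt)

echo "replacement" > demo.txt      # overwrites: both earlier lines are gone
lines_after_overwrite=$(wc -l < demo.txt)

echo "after >>: $lines_after_append lines; after >: $lines_after_overwrite line"

# Optional guard: with noclobber set, > refuses to overwrite an existing file
set -o noclobber
echo "oops" > demo.txt || echo "noclobber blocked the overwrite"
set +o noclobber                   # restore the default behaviour
```

The noclobber option is a shell setting, not part of echo. It only lasts for the current shell session unless you add it to your shell startup file.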
cp — Copy
Copies a file from one location to another. The original file stays in place.
In bioinformatics you use cp constantly — for example, copying a raw FASTQ file
into a working directory before processing it so the original is always safe.
cp notes.txt notes_backup.txt          # copy in same folder with new name
cp notes.txt ~/projects/               # copy to a different folder
cp notes.txt ~/projects/notes_v2.txt   # copy to different folder with new name
cp -r projects/ projects_backup/       # copy an entire folder (-r = recursive)
Always use cp -r when copying folders. Without -r,
cp will refuse to copy a directory and give an error.
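To see the -r requirement in action, the sketch below tries both forms on a small test folder. The folder and file names are hypothetical, and the error message is hidden with 2>/dev/null because its exact wording differs between systems.

```shell
# Work in a throwaway directory so no real folder is touched
workdir=$(mktemp -d)
cd "$workdir"
mkdir projects
echo "draft" > projects/notes.txt

# Without -r, cp skips the directory and exits with an error status
if cp projects projects_backup 2>/dev/null; then
    plain_cp="succeeded"
else
    plain_cp="failed"
fi
echo "cp without -r: $plain_cp"

# With -r, the folder and everything inside it is copied
cp -r projects projects_backup
ls projects_backup
```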
mv — Move or rename
Moves a file to a new location — or renames it if you stay in the same folder.
Unlike cp, the original is removed. mv works on both files and folders
without needing -r.
mv notes.txt notes_renamed.txt         # rename a file
mv notes_renamed.txt ~/projects/       # move to a different folder
mv results/ ~/projects/results_v1/     # move a whole folder and rename it in one step
cp vs mv: Think of cp as a photocopier — you keep the original.
Think of mv as physically picking up the file and putting it somewhere else — the original is gone from its starting location.
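The photocopier comparison can be checked directly by listing the folder after each command. This is a small sketch with made-up file names, run in a temporary directory.

```shell
# Work in a throwaway directory so no real file is touched
workdir=$(mktemp -d)
cd "$workdir"
echo "raw data" > original.txt

cp original.txt copy.txt      # photocopier: original.txt and copy.txt both exist
ls

mv original.txt moved.txt     # the file now lives under a new name only
ls
```

After the second ls, original.txt is no longer listed, while copy.txt and moved.txt are.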
rm — Remove
Permanently deletes files. rm does not use a recycle bin: once you run it,
the file is gone. This is the most dangerous command in everyday use.
Always double-check what you are deleting.
rm notes_backup.txt            # delete one file
rm file1.txt file2.txt         # delete multiple files
rm -i notes.txt                # -i asks for confirmation before deleting
rm -r projects_backup/         # delete a folder and everything inside it
rm -ri projects_backup/        # delete folder with confirmation at each step
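Never run rm -rf /, and never run rm -rf * without first checking which directory you are in.
The -r flag deletes folders recursively and -f suppresses every confirmation prompt, so these commands delete everything they match with no warning and no recovery.
Use rm -i when in doubt: the extra confirmation takes one second and can save hours of lost work.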
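One habit that helps when deleting with a wildcard is to preview the match with ls before running rm on the same pattern. The sketch below assumes a few made-up file names in a temporary directory; it is one possible routine, not the only safe one.

```shell
# Work in a throwaway directory so no real file is touched
workdir=$(mktemp -d)
cd "$workdir"
touch keep.fastq step1.tmp step2.tmp

# 1. Preview: ls shows exactly which files the pattern matches
ls *.tmp

# 2. Delete: reuse the same pattern only once the list looks right
rm *.tmp

# 3. Confirm what is left
ls
```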
In bioinformatics your files are often enormous — a single FASTQ file can be 50 GB. You never open these in a text editor. Instead you use command-line tools to inspect just the parts you need.
cat — Print entire file to screen
Prints the full contents of a file. Only use this on small files: running cat on a 50 GB FASTQ will flood your terminal for hours.
cat notes.txt                  # print entire file
cat file1.txt file2.txt        # print two files one after another
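Because cat prints files one after another, combining it with > from earlier in this lesson joins them into a single new file. The following sketch uses hypothetical file names in a temporary directory.

```shell
# Work in a throwaway directory so no real file is touched
workdir=$(mktemp -d)
cd "$workdir"
echo "Sorghum bicolor" > part1.txt
echo "Oryza sativa" > part2.txt

# cat prints both files in order; > captures that output in a new file
cat part1.txt part2.txt > combined.txt
cat combined.txt
```

The input files are left unchanged. Do not use one of the input files as the output name, because > empties the target before cat reads anything.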
head — Show the beginning of a file
Shows the first 10 lines by default. Invaluable for quickly checking what a file looks like
without loading the whole thing. In bioinformatics, head -n 4 on a FASTQ file
shows you the first read immediately.
head notes.txt             # first 10 lines (default)
head -n 4 notes.txt        # first 4 lines only
head -n 8 sample.fastq     # first 2 FASTQ reads (each read = 4 lines)
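Since each read is 4 lines, head can pull out a specific read by line count. The sketch below builds a tiny made-up FASTQ file and extracts the first and second reads. It relies on two things not covered at this point in the lesson: the tail command (last N lines) and the pipe symbol |, which passes the output of one command to the next.

```shell
# Work in a throwaway directory so no real file is touched
workdir=$(mktemp -d)
cd "$workdir"

# Tiny made-up FASTQ with two reads (4 lines each)
printf '%s\n' "@read1" "ATCGATCG" "+" "IIIIIIII" \
              "@read2" "GCTAGCTA" "+" "IIIIIIII" > tiny.fastq

first_read=$(head -n 4 tiny.fastq)                # lines 1-4
second_read=$(head -n 8 tiny.fastq | tail -n 4)   # lines 5-8

echo "$first_read"
echo "---"
echo "$second_read"
```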
tail — Show the end of a file
Shows the last 10 lines by default. Extremely useful for checking log files, where the most recent output from a pipeline run is at the bottom.
tail notes.txt             # last 10 lines (default)
tail -n 20 pipeline.log    # last 20 lines of a log file
tail -f pipeline.log       # live view: updates as the file grows
tail -f is one of the most useful commands when running a long pipeline.
Open a second terminal and run tail -f your_pipeline.log to watch progress in real time.
Press Ctrl + C to stop.
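If you have no pipeline running yet, you can try tail -f against a simulated one. In this sketch a background loop stands in for the pipeline and appends one line per second. It assumes the timeout command from GNU coreutils (present on Ubuntu) to stop tail automatically; the log file name and messages are made up.

```shell
# Work in a throwaway directory so no real log is touched
workdir=$(mktemp -d)
cd "$workdir"
touch pipeline.log                 # start with an empty log file

# Stand-in for a pipeline: append one line per second, in the background
( for step in 1 2 3; do
      echo "step $step done" >> pipeline.log
      sleep 1
  done ) &

# Follow the log for 4 seconds, then stop automatically.
# In real use you would omit timeout and press Ctrl + C yourself.
timeout 4 tail -f pipeline.log || true

wait                               # make sure the background loop has finished
```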
wc — Word count
Despite the name, wc counts lines, words, and bytes, not just words. For plain ASCII text, one byte is one character.
In bioinformatics the most common use is counting reads in a FASTQ file:
since each read is 4 lines, you divide the line count by 4.
wc notes.txt          # lines, words, bytes: all three
wc -l notes.txt       # lines only (most common flag)
wc -w notes.txt       # words only
wc -c notes.txt       # bytes only
wc -l sample.fastq    # count lines in a FASTQ file
# Example output of wc notes.txt
5 12 68 notes.txt     # 5 lines, 12 words, 68 bytes
Counting FASTQ reads: Every read in a FASTQ file takes exactly 4 lines.
So if wc -l sample.fastq returns 400000, you have
100,000 reads in that file.
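The division by 4 can be done in the shell itself rather than by hand. This is a minimal sketch, assuming a standard FASTQ file with exactly 4 lines per read; the three reads are made up and the variable names are only illustrative.

```shell
# Work in a throwaway directory so no real file is touched
workdir=$(mktemp -d)
cd "$workdir"

# Tiny made-up FASTQ with three reads (4 lines each)
printf '%s\n' "@read1" "ATCGATCG" "+" "IIIIIIII" \
              "@read2" "GCTAGCTA" "+" "IIIIIIII" \
              "@read3" "TTAACCGG" "+" "IIIIIIII" > sample.fastq

# Reading the file through < makes wc print only the number, without the filename
line_count=$(wc -l < sample.fastq)

# $(( ... )) is shell integer arithmetic
read_count=$(( line_count / 4 ))

echo "$line_count lines / 4 = $read_count reads"
```

If the line count is not a multiple of 4, the file is probably truncated or not plain 4-line FASTQ, and the result should not be trusted.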
| Command | What it does | Key flags |
|---|---|---|
| touch [file] | Create an empty file | — |
| echo "text" > [file] | Write text into a file (overwrites) | >> to append instead |
| cp [src] [dest] | Copy a file | -r for folders |
| mv [src] [dest] | Move or rename a file / folder | — |
| rm [file] | Permanently delete a file | -i confirm · -r folder |
| cat [file] | Print entire file to screen | — |
| head [file] | Show first 10 lines | -n N for N lines |
| tail [file] | Show last 10 lines | -n N lines · -f live |
| wc [file] | Count lines, words, bytes | -l lines · -w words · -c bytes |
Work through all five exercises in your Ubuntu terminal. Type every command yourself — do not copy-paste.
Navigate to your ~/bash-linux-bioinformatics/module-1-foundations/ folder.
Create a file called species.txt and write three lines into it:
Sorghum bicolor, Arabidopsis thaliana, and Oryza sativa.
Then print the file to the screen.
💬 Hint: use echo "..." > for the first line, then echo "..." >> for the next two.
cd ~/bash-linux-bioinformatics/module-1-foundations
echo "Sorghum bicolor" > species.txt
echo "Arabidopsis thaliana" >> species.txt
echo "Oryza sativa" >> species.txt
cat species.txt
Sorghum bicolor
Arabidopsis thaliana
Oryza sativa
Copy species.txt into ~/bash-linux-bioinformatics/data/raw/.
Then rename the copy to plant_species.txt using a single mv command.
Confirm both the original and the renamed copy exist.
cp species.txt ~/bash-linux-bioinformatics/data/raw/
mv ~/bash-linux-bioinformatics/data/raw/species.txt ~/bash-linux-bioinformatics/data/raw/plant_species.txt
ls ~/bash-linux-bioinformatics/data/raw/
plant_species.txt
ls ~/bash-linux-bioinformatics/module-1-foundations/
lesson-01-navigation.sh species.txt
The /etc/passwd file lists all users on your system — it has many lines.
Without opening it, find out: how many lines does it have? What are the first 3 lines?
What is the last line?
💬 Hint: three separate commands — wc -l, head -n 3, tail -n 1.
wc -l /etc/passwd
45 /etc/passwd        # number varies by system
head -n 3 /etc/passwd
root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
tail -n 1 /etc/passwd
shajedur:x:1000:1000:,,,:/home/shajedur:/bin/bash
Create a temporary file called temp_delete_me.txt anywhere in your home folder.
Now delete it safely using the -i flag so Linux asks you to confirm first.
Type y to confirm.
touch ~/temp_delete_me.txt
rm -i ~/temp_delete_me.txt
rm: remove regular empty file '/home/shajedur/temp_delete_me.txt'? y
Create a fake FASTQ file with exactly 3 reads. Each read in FASTQ format is 4 lines:
a header line starting with @, a sequence line, a + line,
and a quality score line. Use echo and >> to build it,
then use wc -l to count lines and calculate the number of reads.
💬 Hint: 3 reads × 4 lines = 12 lines total.
echo "@read1" > sample.fastq
echo "ATCGATCG" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq
echo "@read2" >> sample.fastq
echo "GCTAGCTA" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq
echo "@read3" >> sample.fastq
echo "TTAACCGG" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq
wc -l sample.fastq
12 sample.fastq       # 12 lines ÷ 4 = 3 reads ✓