Lesson 2 of 13 · ⏱ 45 min · ✓ Free

File Operations

Navigating the filesystem gets you to the right place — but files are what you actually work with. In bioinformatics you will copy FASTQ files, move results to new folders, delete intermediate files to save space, and inspect large files without opening them. This lesson teaches you all of that.

01 Creating files

Before you can copy or move files you need some to work with. The two most common ways to create files from the terminal are touch and echo.

touch — Create an empty file

touch creates a blank file instantly. It is also used to update the timestamp of an existing file without changing its contents.

bash
touch sample.txt               # create one empty file
touch file1.txt file2.txt     # create multiple files at once
ls -lh                        # confirm they were created (size will be 0)
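
The timestamp behaviour described above can be checked with a small sketch. It assumes a throwaway directory made with mktemp, and the file name reads.txt is hypothetical. The -t flag is used only to backdate the file so the change is visible.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

echo "ATCG" > reads.txt            # a small file with some content
touch -t 202001010000 reads.txt    # backdate it to 1 Jan 2020 for the demo
ls -l reads.txt                    # shows the 2020 date

touch reads.txt                    # no flags: timestamp is set to "now"
ls -l reads.txt                    # shows today's date
cat reads.txt                      # contents are unchanged
```

The size reported by ls -l stays the same before and after the second touch, because only the timestamp is modified.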

echo — Write text into a file

echo prints text to the screen. Combined with > it writes that text into a file. Combined with >> it appends to an existing file without overwriting it.

bash
echo "Hello bioinformatics"               # prints to screen
echo "Hello bioinformatics" > notes.txt    # writes to file (creates or overwrites)
echo "Second line" >> notes.txt           # appends — does NOT overwrite

A single > overwrites the entire file. A double >> appends. Getting these confused is a common way to accidentally destroy data. We will practise this in the exercises.
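
To see the difference for yourself, here is a minimal sketch in a throwaway directory (samples.txt is a made-up file name). The last part uses bash's noclobber option, which is not covered in this lesson; treat it as an optional safety net rather than part of the course material.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

echo "sample_A" > samples.txt     # creates the file
echo "sample_B" >> samples.txt    # appends: the file now has 2 lines
wc -l < samples.txt               # reports 2 lines

echo "sample_C" > samples.txt     # single >: the first two lines are gone
wc -l < samples.txt               # reports 1 line

# Optional: with noclobber switched on, > refuses to overwrite an
# existing file, while >> still appends as usual.
set -o noclobber
echo "oops" > samples.txt || echo "overwrite blocked by noclobber"
echo "sample_D" >> samples.txt
set +o noclobber                  # switch the protection off again

cat samples.txt
```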

02 Copying & moving files

cp — Copy

Copies a file from one location to another. The original file stays in place. In bioinformatics you use cp constantly — for example, copying a raw FASTQ file into a working directory before processing it so the original is always safe.

bash
cp notes.txt notes_backup.txt          # copy in same folder with new name
cp notes.txt ~/projects/               # copy to a different folder
cp notes.txt ~/projects/notes_v2.txt   # copy to different folder with new name
cp -r projects/ projects_backup/       # copy an entire folder (-r = recursive)
💡 Always use cp -r when copying folders. Without -r, cp will refuse to copy a directory and give an error.
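
A short sketch of that behaviour, assuming a throwaway directory and hypothetical folder names. The first cp is expected to fail, so it is followed by || to print a message instead of stopping.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

mkdir -p projects/run1
echo "draft" > projects/run1/notes.txt

# Without -r, cp reports an error and copies nothing.
cp projects/ projects_backup/ || echo "cp without -r did not copy the folder"

# With -r, the folder and everything inside it is copied.
cp -r projects/ projects_backup/
ls projects_backup/run1/          # notes.txt is there
```

One thing to watch: if projects_backup/ already exists, cp -r places the copy inside it (as projects_backup/projects/) instead of creating a sibling folder.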

mv — Move or rename

Moves a file to a new location — or renames it if you stay in the same folder. Unlike cp, the original is removed. mv works on both files and folders without needing -r.

bash
mv notes.txt notes_renamed.txt          # rename a file
mv notes_renamed.txt ~/projects/        # move to a different folder
mv results/ ~/projects/results_v1/      # move a whole folder and rename it in one step

cp vs mv: Think of cp as a photocopier — you keep the original. Think of mv as physically picking up the file and putting it somewhere else — the original is gone from its starting location.
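
The photocopier analogy can be verified directly. This is a minimal sketch in a throwaway directory; sample.txt and archive/ are made-up names.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

mkdir archive
echo "raw reads" > sample.txt

cp sample.txt archive/copy.txt
ls sample.txt archive/copy.txt    # both exist: cp kept the original

mv sample.txt archive/moved.txt
ls archive/                       # copy.txt and moved.txt
[ -e sample.txt ] || echo "sample.txt is gone from its starting location"
```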

03 Deleting files safely

rm — Remove

Permanently deletes files. rm does not move anything to a recycle bin or trash folder: once you run it, the file is gone. This is the most dangerous command in everyday use. Always double-check what you are deleting.

bash
rm notes_backup.txt           # delete one file
rm file1.txt file2.txt         # delete multiple files
rm -i notes.txt               # -i asks for confirmation before deleting
rm -r projects_backup/        # delete a folder and everything inside it
rm -ri projects_backup/       # delete folder with confirmation at each step

Never run rm -rf /, and never run rm -rf * without first checking which directory you are in. The -f flag suppresses every confirmation prompt, so everything the command matches is deleted with no warning and no recovery. Use rm -i when in doubt: the extra confirmation takes one second and can save hours of lost work.
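
One habit that helps when deleting with a wildcard is to preview the match with ls before handing the same pattern to rm. The sketch below assumes a throwaway directory and hypothetical file names. The -- marks the end of options, so a file whose name starts with a dash is not mistaken for a flag.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

touch sample1.tmp sample2.tmp results.txt

ls -- *.tmp     # preview: exactly which files does the pattern match?
rm -- *.tmp     # same pattern, now deleting
ls              # only results.txt is left
```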

04 Reading files

In bioinformatics your files are often enormous — a single FASTQ file can be 50 GB. You never open these in a text editor. Instead you use command-line tools to inspect just the parts you need.

cat — Print entire file to screen

Prints the full contents of a file. Only use this on small files — running cat on a 50 GB FASTQ will flood your terminal for hours.

bash
cat notes.txt                  # print entire file
cat file1.txt file2.txt         # print two files one after another

head — Show the beginning of a file

Shows the first 10 lines by default. Invaluable for quickly checking what a file looks like without loading the whole thing. In bioinformatics, head -n 4 on a FASTQ file shows you exactly the first read, since each read takes 4 lines.

bash
head notes.txt                  # first 10 lines (default)
head -n 4 notes.txt             # first 4 lines only
head -n 8 sample.fastq          # first 2 FASTQ reads (each read = 4 lines)
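
As a sketch of how head and tail combine, the example below builds a tiny two-read FASTQ (invented reads, throwaway directory) and pulls out only the second read. It uses the pipe symbol |, which sends the output of one command into the next; this lesson has not introduced pipes, so treat that part as a preview.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# Two made-up reads, 4 lines each.
echo "@read1" > sample.fastq
echo "ATCGATCG" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq
echo "@read2" >> sample.fastq
echo "GCTAGCTA" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq

head -n 4 sample.fastq                # the first read
head -n 8 sample.fastq | tail -n 4    # lines 5 to 8: the second read only
```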

tail — Show the end of a file

Shows the last 10 lines by default. Extremely useful for checking log files: programs append to their logs, so the most recent output from a pipeline run is at the bottom.

bash
tail notes.txt                  # last 10 lines (default)
tail -n 20 pipeline.log         # last 20 lines of a log file
tail -f pipeline.log            # live view — updates as the file grows
💡 tail -f is one of the most useful commands when running a long pipeline. Open a second terminal and run tail -f your_pipeline.log to watch progress in real time. Press Ctrl + C to stop.
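
If you do not have a pipeline running yet, the sketch below fakes one so you can watch tail -f at work. The background loop is only a stand-in for a real pipeline, and the file names are hypothetical. It also relies on --pid, a GNU tail option (available on Ubuntu) that makes tail -f exit once the given process has ended; in real use you would press Ctrl + C instead.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
log="$workdir/pipeline.log"
touch "$log"

# Stand-in for a pipeline: appends one line per second, in the background.
( for step in 1 2 3; do
    echo "step $step finished" >> "$log"
    sleep 1
  done ) &
writer_pid=$!

# Follow the log as it grows; stops by itself when the writer has ended.
tail -f --pid="$writer_pid" "$log"
```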

05 Counting with wc

wc — Word count

Despite the name, wc counts lines, words, and bytes, not just words. For plain ASCII text such as FASTQ, one byte is one character. In bioinformatics the most common use is counting reads in a FASTQ file: since each read is 4 lines, you divide the line count by 4.

bash
wc notes.txt                  # lines, words, bytes: all three
wc -l notes.txt               # lines only (most common flag)
wc -w notes.txt               # words only
wc -c notes.txt               # bytes only (use -m to count characters)
wc -l sample.fastq            # count lines in a FASTQ file

# Example output of wc notes.txt (your numbers will differ)
  5  12  68 notes.txt
# 5 lines, 12 words, 68 bytes

Counting FASTQ reads: In standard FASTQ, where sequences are not wrapped across several lines, every read takes exactly 4 lines. So if wc -l sample.fastq returns 400000, you have 100,000 reads in that file.
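
The division can also be done in the shell. This is a minimal sketch with an invented two-read file in a throwaway directory; the variable names lines and reads are arbitrary. Feeding the file to wc with < makes it print only the number, without the file name, and $(( )) is bash's integer arithmetic.

```bash
# Work in a throwaway directory so no real files are affected.
workdir=$(mktemp -d)
cd "$workdir"

# Two made-up reads, 4 lines each.
echo "@read1" > sample.fastq
echo "ATCGATCG" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq
echo "@read2" >> sample.fastq
echo "GCTAGCTA" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq

lines=$(wc -l < sample.fastq)    # line count only, no file name
reads=$((lines / 4))             # integer division: 4 lines per read
echo "$reads reads"
```

This only works on uncompressed files. A .fastq.gz file has to be decompressed first before its lines can be counted.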

06 Quick reference

Command                | What it does                          | Key flags
touch [file]           | Create an empty file                  |
echo "text" > [file]   | Write text into a file (overwrites)   | >> to append instead
cp [src] [dest]        | Copy a file                           | -r for folders
mv [src] [dest]        | Move or rename a file / folder        |
rm [file]              | Permanently delete a file             | -i confirm · -r folder
cat [file]             | Print entire file to screen           |
head [file]            | Show first 10 lines                   | -n N for N lines
tail [file]            | Show last 10 lines                    | -n N lines · -f live
wc [file]              | Count lines, words, bytes             | -l lines · -w words · -c bytes

07 Exercises

Work through all five exercises in your Ubuntu terminal. Type every command yourself — do not copy-paste.

Exercise 1 Create and inspect a file

Navigate to your ~/bash-linux-bioinformatics/module-1-foundations/ folder. Create a file called species.txt and write three lines into it: Sorghum bicolor, Arabidopsis thaliana, and Oryza sativa. Then print the file to the screen.

💬 Hint: use echo "..." > for the first line, then echo "..." >> for the next two.

Answer

bash
cd ~/bash-linux-bioinformatics/module-1-foundations
echo "Sorghum bicolor" > species.txt
echo "Arabidopsis thaliana" >> species.txt
echo "Oryza sativa" >> species.txt
cat species.txt
# Output:
# Sorghum bicolor
# Arabidopsis thaliana
# Oryza sativa

Exercise 2 Copy and move

Copy species.txt into ~/bash-linux-bioinformatics/data/raw/. Then rename the copy to plant_species.txt using a single mv command. Confirm both the original and the renamed copy exist.

Answer

bash
cp species.txt ~/bash-linux-bioinformatics/data/raw/
mv ~/bash-linux-bioinformatics/data/raw/species.txt ~/bash-linux-bioinformatics/data/raw/plant_species.txt
ls ~/bash-linux-bioinformatics/data/raw/
# Output:
# plant_species.txt
ls ~/bash-linux-bioinformatics/module-1-foundations/
# Output:
# lesson-01-navigation.sh  species.txt

Exercise 3 head, tail, and wc

The /etc/passwd file lists all users on your system — it has many lines. Without opening it, find out: how many lines does it have? What are the first 3 lines? What is the last line?

💬 Hint: three separate commands — wc -l, head -n 3, tail -n 1.

Answer

bash
wc -l /etc/passwd
# Output (number varies by system):
# 45 /etc/passwd

head -n 3 /etc/passwd
# Output:
# root:x:0:0:root:/root:/bin/bash
# daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
# bin:x:2:2:bin:/bin:/usr/sbin/nologin

tail -n 1 /etc/passwd
# Output (your last line will differ):
# shajedur:x:1000:1000:,,,:/home/shajedur:/bin/bash

Exercise 4 Safe delete

Create a temporary file called temp_delete_me.txt anywhere in your home folder. Now delete it safely using the -i flag so Linux asks you to confirm first. Type y to confirm.

Answer

bash
touch ~/temp_delete_me.txt
rm -i ~/temp_delete_me.txt
# Prompt (your username will differ); type y and press Enter:
# rm: remove regular empty file '/home/shajedur/temp_delete_me.txt'? y

Exercise 5 · Challenge Count reads in a simulated FASTQ

Create a fake FASTQ file with exactly 3 reads. Each read in FASTQ format is 4 lines: a header line starting with @, a sequence line, a + line, and a quality score line. Use echo and >> to build it, then use wc -l to count lines and calculate the number of reads.

💬 Hint: 3 reads × 4 lines = 12 lines total.

Answer

bash
echo "@read1" > sample.fastq
echo "ATCGATCG" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq
echo "@read2" >> sample.fastq
echo "GCTAGCTA" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq
echo "@read3" >> sample.fastq
echo "TTAACCGG" >> sample.fastq
echo "+" >> sample.fastq
echo "IIIIIIII" >> sample.fastq

wc -l sample.fastq
# Output:
# 12 sample.fastq
# 12 lines ÷ 4 = 3 reads ✓