Session 01

Date: Wednesday, Jan 14
Entry ticket: None

Welcome to the research team.

As outlined in the syllabus, this course differs from a traditional elective. You are not here to memorize protocols; you are here to build a professional-grade technical skill set that allows you to function as a Research Software Engineer and Computational Biologist.

The first four weeks of this semester constitute your onboarding phase. In a biotechnology startup or a graduate laboratory, this is the period where production is secondary to competency. Your primary objective right now is not to make a scientific discovery, but to master the tools of the trade so that you do not become a liability to the codebase later.

Because this is our first session, we will not assume you have worked through the technical brief. We do, however, expect you to have completed the async material; we will not use our limited face-to-face time to repeat what you have already read. Instead, these sessions are designed as a technical bootcamp.

We will operate under a “Code-Along” model. This is a high-bandwidth, follow-the-leader exercise in which we configure your development environment in real time. The pace will be rapid, and the computational environment will be unforgiving. This is intentional. The goal is to expose you to the friction of the CLI while you have immediate access to support, rather than leaving you to struggle alone at 11:00 PM.

Learning Objectives

By the end of this session, you will have begun moving away from the GUI and toward operating your system directly through the shell. You will establish the three pillars of your computational environment:

Navigation: Moving through the file system without visual aids.
Configuration: Customizing the shell to serve your workflow.
Access: Authenticating with remote systems using cryptographic keys rather than passwords.

We are effectively building the cockpit you will fly for the rest of the semester. Let’s get to work.

Concept Check

Goal: Verify your understanding of the reading before we begin the Code-Along.

Drill 1: The Stack

We distinguish between the Terminal, the Shell, and the Kernel. If you cannot identify which component is responsible for an error, you cannot fix it.

In the following three scenarios, which component is the “point of failure”?

Scenario A: You type a command, but the letters do not appear on the screen.
Scenario B: You type run_analysis, and the text returns command not found.
Scenario C: You run a script, and the computer completely freezes; the mouse cursor stops moving.

(Instructor Note: We will debrief this in 2 minutes. Be ready to justify your answer.)

Drill 2: The Navigation Maze

Execute the following commands in order.

mkdir -p ~/sandbox/data/raw
cd ~/sandbox/data/raw

From this location, attempt to list your shell configuration file using the two commands below. You must be able to explain where Command A looked and where Command B looked.

Command A: ls .bashrc (or .zshrc) -> Why does this fail?
Command B: ls /.bashrc (or /.zshrc) -> Why does this fail?

Now, construct a relative path using the parent operator (..) to list that file without using the tilde (~) or a starting slash (/).

Code Along

Goal: Configure the shell for research.

The Identity Check

Software often relies on hidden configuration files called dotfiles (they are hidden because their names begin with a period). We need to verify which shell you are running to know which file to edit.

echo $SHELL

If the output ends in /zsh, your config file is ~/.zshrc. If the output ends in /bash, your config file is ~/.bashrc.
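If you would rather let the shell make this choice, here is a minimal sketch. It assumes only bash and zsh matter, and `rc_file_for` is a hypothetical helper written for this example, not a standard command. One caveat: $SHELL records your login shell, so if you started a different shell by hand it may not match the one you are typing into.

```shell
# Hypothetical helper: map a login-shell path (as stored in $SHELL)
# to the startup file that shell reads. Assumes bash or zsh only.
rc_file_for() {
  case "$1" in
    */zsh)  echo "$HOME/.zshrc" ;;
    */bash) echo "$HOME/.bashrc" ;;
    *)      echo "unknown shell: $1" >&2; return 1 ;;
  esac
}

rc_file_for "${SHELL:-unset}" || echo "Not bash or zsh; ask the teaching team which file to edit."
```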

The PATH Inspection

Your shell relies on an ordered list of directories, the PATH, to find programs: when you type a command name, it searches these directories from left to right and runs the first match it finds. Let’s inspect that list.

echo $PATH

Is your home directory’s local bin folder (usually ~/.local/bin or similar) included in this list? If not, any tool you install locally effectively “does not exist” to the shell. We will fix this during the Pixi installation next week.
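Scanning one long colon-separated line by eye is error-prone. The sketch below prints one directory per line and then checks for ~/.local/bin as a whole entry. `path_contains` is a hypothetical helper for illustration; it only inspects the PATH and changes nothing.

```shell
# Print each PATH entry on its own line (entries are separated by colons).
printf '%s\n' "$PATH" | tr ':' '\n'

# Hypothetical helper: succeed if directory $1 is a whole entry in a
# PATH-style string $2 (defaults to the current $PATH).
path_contains() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

if path_contains "$HOME/.local/bin"; then
  echo "$HOME/.local/bin is on PATH"
else
  echo "$HOME/.local/bin is NOT on PATH"
fi
```

Wrapping both the list and the target in colons means /a does not falsely match an entry such as /ab.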

Aliases

Programmers are productively lazy. We do not type the same long commands repeatedly. We create aliases.

The command ls -lah (long listing, all files including hidden ones, human-readable sizes) is a standard research reflex. Typing seven characters every time is inefficient. Let’s map it to two.

alias ll="ls -lah"

ll

Close your terminal window and open a new one. Type ll. On most systems, it fails. Why? An alias defined at the prompt exists only in that shell session, and every new terminal starts a fresh shell. To make it permanent, we must write it into your “Morning Routine” file (.bashrc or .zshrc), which the shell reads each time it starts, using a text editor. We will do this shortly.
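If you prefer to script the change instead of opening an editor, a minimal sketch follows. `add_line_once` is a hypothetical helper; it appends the alias only if that exact line is not already in the file, so running it twice does not pile up duplicates. The demo writes to a scratch file so your real configuration stays untouched.

```shell
# Hypothetical helper: append a line to a file only if that exact line
# is not already present (-x whole line, -F fixed string, -q quiet).
add_line_once() {
  local line="$1" file="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demo on a scratch file so nothing real is modified.
demo_rc="$(mktemp)"
add_line_once 'alias ll="ls -lah"' "$demo_rc"
add_line_once 'alias ll="ls -lah"' "$demo_rc"   # no-op: line already present
cat "$demo_rc"

# Real use (assumption: bash; zsh users substitute ~/.zshrc):
#   add_line_once 'alias ll="ls -lah"' ~/.bashrc
#   source ~/.bashrc    # reload the file in the current terminal
```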

Pipes & Redirection

The power of the shell comes from “piping” the output of one tool into the input of another. This is the foundation of biological data pipelines.

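Instead of printing text to the screen, we can catch it and save it to a file with the single arrow (>). Be careful: > creates the file if it does not exist and silently overwrites it if it does.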

cd ~/sandbox
echo "Sample_01" > samples.txt

cat samples.txt

Use the double arrow (>>) to append a line to the end of the file instead.

echo "Sample_02" >> samples.txt
cat samples.txt
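The difference between the two arrows is easy to forget, so here is a short sketch that reproduces it in a throwaway directory (created with mktemp, so your ~/sandbox copy is not affected).

```shell
work="$(mktemp -d)"
echo "Sample_01" >  "$work/samples.txt"   # > creates the file (or truncates it)
echo "Sample_02" >> "$work/samples.txt"   # >> appends a new last line
cat "$work/samples.txt"                   # Sample_01, then Sample_02

echo "Sample_03" >  "$work/samples.txt"   # a single > wipes the earlier lines
cat "$work/samples.txt"                   # only Sample_03 remains
```

If accidental overwrites worry you, both bash and zsh accept set -o noclobber, after which > refuses to overwrite an existing file.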

How many files are in your /bin directory? Do not count them manually.

ls /bin | wc -l

Translation: “Take the list from ls and hand it directly to wc (word count) to count the lines.”
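Pipelines can have more than two stages. As a sketch, assume a hypothetical file of sample IDs in which one sample was recorded twice; chaining sort, uniq, and wc counts the distinct IDs.

```shell
work="$(mktemp -d)"
printf 'Sample_02\nSample_01\nSample_02\n' > "$work/ids.txt"   # hypothetical IDs

# sort groups identical lines together, uniq collapses adjacent duplicates,
# and wc -l counts what is left: the number of distinct sample IDs.
sort "$work/ids.txt" | uniq | wc -l
```

The sort step matters: uniq only removes duplicates that sit next to each other, so without it this file would still report 3 lines.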

Tab Completion

Typos are the enemy of reproducibility. The shell can type for you.

Type cd ~/sand (Do not press Enter).
Press the Tab key.

The shell should autocomplete to cd ~/sandbox/ (the directory you created in Drill 2). Press Enter.

Type cd d and press Tab.

It autocompletes to data/. Never type a full path manually. Always tab-complete to verify the path exists before you hit Enter.

Text Editors

To make our alias permanent, or to write a Python script later, we need a terminal-based editor.

cd ~/sandbox
nano draft.txt

Write: Type “This is my first artifact.”
Save: Press Ctrl+O (Write Out), then Enter to confirm the filename.
Exit: Press Ctrl+X.

(Note: If you are brave and installed Helix or Neovim, use hx draft.txt or nvim draft.txt instead.)

`man`

The man (manual) command opens the definitive documentation for any tool installed on your system.

Navigation: Use Arrow Keys to scroll.
Search: Type /keyword to search for a word. Press n to find the next occurrence.
Exit: Press q to quit.

`grep`

grep (Global Regular Expression Print) is the standard search tool. But how do we tweak it? We want to search for the word “error” in a log file, but we don’t know whether it’s spelled “Error”, “ERROR”, or “error”. We need to ignore the case.

Open the manual: man grep
Do not read the whole thing. It is too long.
Type /case and press Enter.
Look for the option that enables “ignore-case”.
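To see why this option matters, here is a small sketch using an invented log file. It shows grep’s default, case-sensitive behavior; the ignore-case flag is still yours to find in the manual.

```shell
# Hypothetical log with the same word in three different cases.
work="$(mktemp -d)"
printf 'step 1 ok\nERROR: disk full\nerror: retrying\nError in sample 3\n' > "$work/run.log"

# grep is case-sensitive by default, so only the all-lowercase line matches.
grep "error" "$work/run.log"      # prints: error: retrying
grep -c "error" "$work/run.log"   # -c counts matching lines instead: 1
```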

`find`

Imagine you are looking for a specific file. In a GUI, you might click through folders or use a search bar that scans for filenames. But in computational biology, we often need to search for attributes, not names.

You generated a massive dataset yesterday, but you cannot remember where you saved it. You don’t know the filename, but you do know it is very large (over 10 megabytes).

To locate this needle in the haystack, we use find. This tool crawls the directory tree, inspecting every single file to see if it matches your criteria. Because find has hundreds of options, we will not memorize them. We will ask the Oracle.

Open the manual: man find
We need to filter by size. Type /size and press Enter.
Read the syntax. You will see that k stands for kilobytes and M stands for Megabytes.
Construct the command based on what you just read.
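Before you build the size filter, it helps to see the general shape of a find command: a starting directory followed by one or more tests. The sketch below creates a small made-up directory tree and searches it by name rather than size, so the -size expression remains yours to construct.

```shell
# Build a small hypothetical project tree to search.
work="$(mktemp -d)"
mkdir -p "$work/run1" "$work/run2/qc"
touch "$work/run1/reads.fastq" "$work/run2/qc/report.txt" "$work/run2/notes.txt"

# Shape: find <where to start> <tests...>
# -type f keeps regular files; -name matches the file name against a glob.
# Quote the glob so the shell hands it to find unexpanded.
find "$work" -type f -name '*.txt'
```

The two .txt paths are printed in whatever order find walks the tree, which is not guaranteed to be alphabetical.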

`head` and `tail`

Before you run an analysis, verify that your input file is formatted correctly. Is it comma-separated? Does it have a header row?

Instead of opening the whole file, we use head to slice off just the top.

head -n 5 data.txt

This prints the first five lines instantly. Conversely, tail shows you the end of the file. This is particularly useful for checking log files; if a long simulation crashes, the error message is almost always in the last few lines.
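As a sketch of that pre-flight check, assume a small hypothetical CSV file: head -n 1 reveals the header row and the delimiter, and tail -n shows the final records.

```shell
work="$(mktemp -d)"
printf 'sample,reads\nS1,100\nS2,250\nS3,75\n' > "$work/data.csv"   # hypothetical file

head -n 1 "$work/data.csv"   # header row: sample,reads (comma-separated)
tail -n 2 "$work/data.csv"   # last two lines: S2,250 then S3,75
```

For a log that a running job is still writing, tail -f keeps printing new lines as they arrive; press Ctrl+C to stop following.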

`wc`

Data loss is a silent killer in research. If you start with 1,000 DNA samples and your pipeline outputs 998 results, you have a problem. You need a quick way to audit your data.

wc (Word Count) is your auditor. Passing the -l flag causes it to count newlines.

wc -l samples.txt

If that number doesn’t match your expectations, stop immediately. You have found a bug.
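Here is a minimal sketch of that audit, assuming hypothetical input and result files where one sample went missing. The comparison is the whole idea; in a real pipeline the two file names would be your own.

```shell
work="$(mktemp -d)"
printf 'S1\nS2\nS3\n' > "$work/input_ids.txt"   # hypothetical inputs
printf 'S1\nS3\n'     > "$work/results.txt"     # hypothetical output, S2 lost

# Reading from stdin (<) makes wc print only the number, not the filename;
# tr strips the padding spaces some wc versions add.
n_in=$(wc -l < "$work/input_ids.txt" | tr -d ' ')
n_out=$(wc -l < "$work/results.txt" | tr -d ' ')

if [ "$n_in" -ne "$n_out" ]; then
  echo "Mismatch: $n_in inputs but $n_out results" >&2
fi
```

Because wc -l counts newline characters, a file whose last line lacks a trailing newline reports one line fewer than you might expect.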

`less`

Sometimes you do need to read the file, scrolling through it to spot-check values. Opening a 50 GB file in a normal text editor would try to load all of it into RAM, so instead we use a “pager” called less.

less big_data.txt

less is efficient. It only loads the specific chunk of the file you are currently looking at. This allows you to open terabyte-sized files instantly. You can navigate using the arrow keys or Spacebar, and exit by pressing q.

`history`

One of the greatest virtues of a programmer is laziness. We hate doing the same thing twice.

You will often construct complex commands with long pipelines, specific flags, and file paths. Five minutes later, you will need to run that command again. Do not retype it.

You can view your entire timeline by typing:

history

This prints a numbered list of your past actions. If you see that the complex command you want was number 105, you can instantly re-run it by typing !105.

The Search (Magic)

Scanning a list of 1,000 commands is slow. Instead, use the Reverse Search feature.

Press Ctrl + r
Start typing a snippet of the command you remember (e.g., ssh).
The shell will time-travel backwards and show you the most recent command that matches that text. Press Ctrl + r again to step to older matches.
Press Enter to run it, or the Right Arrow key to edit it.

This single keyboard shortcut is often the difference between a novice user and a power user.

Open Lab

You are now operating autonomously. The teaching team is circulating to provide “Consulting Support.” If you are stuck for more than 5 minutes on a syntax error, use the Pink Card.

Your assignment (Ticket 01) is to complete Levels 0 through 12 of the Bandit Wargame. This is a standard industry simulation used to train security professionals.

Your proof of work will be the passwords you retrieve. You must submit the passwords for Levels 1, 4, 8, and 13 to Gradescope by Tuesday at 11:59 PM.

Exit Criteria

You are free to leave once you have:

Successfully logged into Bandit Level 3.
Retrieved the password for Level 4.

Last updated on January 13, 2026