Session 01
Date: Wednesday, Jan 14
Entry ticket: None
Welcome to the research team.
As outlined in the syllabus, this course differs from a traditional elective. You are not here to memorize protocols; you are here to build a professional-grade technical skill set that allows you to function as a Research Software Engineer and Computational Biologist.
The first four weeks of this semester constitute your onboarding phase. In a biotechnology startup or a graduate laboratory, this is the period where production is secondary to competency. Your primary objective right now is not to make a scientific discovery, but to master the tools of the trade so that you do not become a liability to the codebase later.
Since this is our first session, we will not assume you have already engaged with the technical brief. However, you are expected to have already engaged with the async material, as we will not use our limited face-to-face time to reiterate what you have already read. Instead, these sessions are designed as a technical bootcamp.
We will operate under a “Code-Along” model. This is a high-bandwidth, follow-the-leader exercise in which we configure your development environment in real time. The pace will be rapid, and the computational environment will be unforgiving. This is intentional. The goal is to expose you to the friction of the CLI while you have immediate access to support, rather than leaving you to struggle alone at 11:00 PM.
Learning Objectives
By the end of this session, you will have started transitioning away from the GUI and become an operator of the Linux kernel. You will establish the three pillars of your computational environment:
- Navigation: Moving through the file system without visual aids.
- Configuration: Customizing the shell to serve your workflow.
- Access: Authenticating with remote systems using cryptographic keys rather than passwords.
We are effectively building the cockpit you will fly for the rest of the semester. Let’s get to work.
Concept Check
Goal: Verify your understanding of the reading before we begin the Code-Along.
Drill 1: The Stack
We distinguish between the Terminal, the Shell, and the Kernel. If you cannot identify which component is responsible for an error, you cannot fix it.
In the following three scenarios, which component is the “point of failure”?
- Scenario A: You type a command, but the letters do not appear on the screen.
- Scenario B: You type run_analysis, and the text returns command not found.
- Scenario C: You run a script, and the computer completely freezes; the mouse cursor stops moving.
(Instructor Note: We will debrief this in 2 minutes. Be ready to justify your answer.)
Drill 2: The Navigation Maze
Execute the following commands in order.
mkdir -p ~/sandbox/data/raw
cd ~/sandbox/data/raw
From this location, attempt to list your shell configuration file using the following two commands. You must be able to explain where Command A looked and where Command B looked.
- Command A: ls .bashrc (or .zshrc) -> Why does this fail?
- Command B: ls /.bashrc (or /.zshrc) -> Why does this fail?
Now, construct a relative path using the parent operator (..) to list that file without using the tilde (~) or a starting slash (/).
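To see how a bare name and the parent operator resolve, here is a minimal sketch in a throwaway directory tree. The names project, results, plots, and README.txt are hypothetical and not part of the drill. The tree is deliberately shallower than your sandbox, so you still have to work out the drill's path yourself.

```shell
# Build a scratch tree: <tmp>/project/results/plots, with one file at the top.
scratch=$(mktemp -d)
mkdir -p "$scratch/project/results/plots"
echo "hello" > "$scratch/project/README.txt"

cd "$scratch/project/results/plots"

# A bare name is resolved against the current directory, so this fails.
ls README.txt 2>/dev/null || echo "README.txt is not in $(pwd)"

# Each .. climbs one level: plots -> results -> project.
found=$(ls ../../README.txt)
echo "$found"
```

Count the levels between where you are and where the file lives; you need one .. per level.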
Code Along
Goal: Configure the shell for research.
The Identity Check
Software often relies on hidden configuration files (Dotfiles). We need to verify which shell you are running to know which file to edit.
echo $SHELL
If the output ends in /zsh, your config file is ~/.zshrc.
If the output ends in /bash, your config file is ~/.bashrc.
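The same decision can be written as a small script. This is a sketch, not part of the official setup; the function name config_for_shell is hypothetical, and it only knows the two shells mentioned above.

```shell
# Hypothetical helper: map a shell path to the config file named in this session.
config_for_shell() {
  case "$1" in
    */zsh)  echo "$HOME/.zshrc" ;;
    */bash) echo "$HOME/.bashrc" ;;
    *)      return 1 ;;   # some other shell; check its documentation
  esac
}

# ${SHELL:-} expands to an empty string if the variable is unset.
config_for_shell "${SHELL:-}" || echo "Unrecognized shell: ${SHELL:-unset}"
```

Note that $SHELL records your login shell, which is usually, but not always, the shell you are typing into.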
The PATH Inspection
Your shell relies on an ordered list of directories (The PATH) to find programs. We need to verify that this list exists.
echo $PATH
Is your home directory’s local bin folder (usually ~/.local/bin or similar) included in this list?
If not, any tool you install locally effectively “does not exist” to the shell.
We will fix this during the Pixi installation next week.
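Reading a long colon-separated PATH by eye is error-prone. The sketch below prints one entry per line and then tests for a specific directory. The helper name path_contains is hypothetical, and ~/.local/bin is only the usual location mentioned above; yours may differ.

```shell
# Hypothetical helper: succeed if directory $1 is an entry in the list $2.
# Wrapping both sides in colons prevents partial matches, such as /usr
# matching /usr/bin.
path_contains() {
  case ":$2:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Print one PATH entry per line, in the order the shell searches them.
printf '%s\n' "$PATH" | tr ':' '\n'

if path_contains "$HOME/.local/bin" "$PATH"; then
  echo "$HOME/.local/bin is on the PATH"
else
  echo "$HOME/.local/bin is NOT on the PATH"
fi
```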
Aliases
Programmers are productively lazy. We do not type the same long commands repeatedly. We create aliases.
The command ls -lah (long format, all files including hidden ones, human-readable sizes) is a standard research reflex.
Typing seven characters is inefficient. Let’s map it to two.
alias ll="ls -lah"
ll
Close your terminal window and open a new one.
Type ll.
It fails.
Why?
Aliases defined at the prompt last only for the shell session in which you typed them.
To make them permanent, we must write them into your “Morning Routine” file (.bashrc or .zshrc) using a text editor.
We will do this shortly.
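You can reproduce the disappearing alias without closing any windows. This sketch assumes a command named sh is available to stand in for the "new terminal": it starts a separate shell process that reads none of your configuration files.

```shell
# Define the alias in the current shell session.
alias ll="ls -lah"

# Asking for the alias by name prints its definition in this session...
alias ll

# ...but a freshly started shell does not inherit it.
new_shell_result=$(sh -c 'alias ll' 2>/dev/null || echo "ll is not defined")
echo "$new_shell_result"
```

A real terminal window differs in one respect: it reads your config file on startup, which is exactly why writing the alias there makes it stick.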
Pipes & Redirection
The power of the shell comes from “piping” the output of one tool into the input of another. This is the foundation of biological data pipelines.
Instead of printing text to the screen, we can catch it and save it to a file.
cd ~/sandbox
echo "Sample_01" > samples.txt
cat samples.txt
Use the double arrow (>>) to add to the bottom.
echo "Sample_02" >> samples.txt
cat samples.txt
How many files are in your /bin directory?
Do not count them manually.
ls /bin | wc -l
Translation: “Take the list from ls and hand it directly to wc (word count) to count the lines.”
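Pipes can be chained more than once. As a small sketch of a data pipeline, the commands below count how many distinct sample IDs a file contains. The file name ids.txt and its contents are made up for illustration, and the work happens in a temporary directory so your sandbox is untouched.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"

# > creates (or overwrites) the file; >> appends to it.
echo "Sample_01" > ids.txt
echo "Sample_02" >> ids.txt
echo "Sample_01" >> ids.txt    # a deliberate duplicate

# sort puts identical lines next to each other, uniq collapses adjacent
# duplicates, and wc -l counts the lines that remain.
# tr -d ' ' strips the padding that some systems add to wc output.
unique_count=$(sort ids.txt | uniq | wc -l | tr -d ' ')
echo "Unique sample IDs: $unique_count"
```

Each tool does one small job; the pipe is what turns them into an analysis.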
Tab Completion
Typos are the enemy of reproducibility. The shell can type for you.
- Type cd ~/sand (Do not press Enter).
- Press the Tab key.
The shell should autocomplete to cd ~/sandbox/ (from our previous drill).
Press Enter.
- Type cd d and press Tab.
It autocompletes to data/.
Never type a full path manually.
Always tab-complete to verify the path exists before you hit Enter.
Text Editors
To make our alias permanent, or to write a Python script later, we need a terminal-based editor.
cd ~/sandbox
nano draft.txt
- Write: Type “This is my first artifact.”
- Save: Press Ctrl+O (Write Out), then Enter to confirm the filename.
- Exit: Press Ctrl+X.
(Note: If you are brave and installed Helix or Neovim, use hx draft.txt or nvim draft.txt instead.)
man
The man (manual) command opens the definitive documentation for any tool installed on your system.
- Navigation: Use Arrow Keys to scroll.
- Search: Type /keyword to search for a word. Press n to find the next occurrence.
- Exit: Press q to quit.
grep
grep (Global Regular Expression Print) is the standard search tool. But how do we tweak it?
We want to search for the word “error” in a log file, but we don’t know whether it’s spelled “Error”, “ERROR”, or “error”. We need to ignore the case.
- Open the manual: man grep
- Do not read the whole thing. It is too long.
- Type /case and press Enter.
- Look for the option that enables “ignore-case”.
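Once you have found the option, you can confirm that it behaves as advertised. The sketch below uses the long spelling, --ignore-case, on a made-up log file (run.log and its contents are hypothetical); the one-letter form is in the manual entry you just located.

```shell
# Create a small fake log with three spellings of "error".
logdir=$(mktemp -d)
printf '%s\n' \
  "INFO start" \
  "Error: disk full" \
  "ERROR: retry failed" \
  "error: giving up" \
  "INFO done" > "$logdir/run.log"

# Case-sensitive search: only the all-lowercase line matches.
grep "error" "$logdir/run.log"

# Case-insensitive search: all three spellings match.
matches=$(grep --ignore-case "error" "$logdir/run.log" | wc -l | tr -d ' ')
echo "Lines mentioning an error: $matches"
```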
find
Imagine you are looking for a specific file. In a GUI, you might click through folders or use a search bar that scans for filenames. But in computational biology, we often need to search for attributes, not names.
You generated a massive dataset yesterday, but you cannot remember where you saved it. You don’t know the filename, but you know it is very large (over 10 Megabytes).
To locate this needle in the haystack, we use find.
This tool crawls the directory tree, inspecting every single file to see if it matches your criteria. Because find has hundreds of options, we will not memorize them. We will ask the Oracle.
- Open the manual: man find
- We need to filter by size. Type /size and press Enter.
- Read the syntax. You will see that k stands for kilobytes and M stands for Megabytes.
- Construct the command based on what you just read.
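Before pointing find at your whole home directory, it helps to test the syntax somewhere small. This sketch builds a scratch tree with one tiny file and one 200-kilobyte file (all names are hypothetical) and filters with the k unit at a 100-kilobyte threshold. Adapting the unit and threshold to the scenario above is left to you.

```shell
# Build a scratch tree with one small file and one larger file.
datadir=$(mktemp -d)
mkdir -p "$datadir/results/run1"
echo "tiny" > "$datadir/notes.txt"
# head -c copies an exact number of bytes; 204800 bytes is 200 kilobytes.
head -c 204800 /dev/zero > "$datadir/results/run1/matrix.bin"

# -type f restricts the search to regular files.
# In -size +100k, the + means "larger than" and k is the kilobyte suffix.
large_files=$(find "$datadir" -type f -size +100k)
echo "$large_files"
```

The first argument is where the crawl starts; everything after it is a filter.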
head and tail
Before you run an analysis, verify that your input file is formatted correctly. Is it comma-separated? Does it have a header row?
Instead of opening the whole file, we use head to slice off just the top.
head -n 5 data.txt
This prints the first five lines instantly.
Conversely, tail shows you the end of the file.
This is particularly useful for checking log files; if a long simulation crashes, the error message is almost always in the last few lines.
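tail takes the same -n flag as head. A quick sketch, using a generated 100-line file as a stand-in for a real log (sim.log is a hypothetical name):

```shell
demo=$(mktemp -d)
# seq prints the integers 1 to 100, one per line.
seq 1 100 > "$demo/sim.log"

# The first three lines and the last two lines of the same file.
first_lines=$(head -n 3 "$demo/sim.log")
last_lines=$(tail -n 2 "$demo/sim.log")

echo "$first_lines"
echo "$last_lines"
```

Neither command needs to read the middle of the file, which is why both stay fast on large inputs.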
wc
Data loss is a silent killer in research. If you start with 1,000 DNA samples and your pipeline outputs 998 results, you have a problem. You need a quick way to audit your data.
wc (Word Count) is your auditor. Passing the -l flag causes it to count newlines.
wc -l samples.txt
If that number doesn’t match your expectations, stop immediately. You have found a bug.
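The audit can be automated so that a mismatch is impossible to overlook. This is a minimal sketch with made-up files (input.txt and output.txt are hypothetical) in which one record has been lost on purpose.

```shell
audit=$(mktemp -d)
printf '%s\n' "Sample_01" "Sample_02" "Sample_03" > "$audit/input.txt"
printf '%s\n' "Sample_01" "Sample_03" > "$audit/output.txt"   # one record lost

# Reading from stdin (<) makes wc print only the number, not the filename.
# tr -d ' ' strips the padding that some systems add to wc output.
n_in=$(wc -l < "$audit/input.txt" | tr -d ' ')
n_out=$(wc -l < "$audit/output.txt" | tr -d ' ')

if [ "$n_in" -ne "$n_out" ]; then
  status="MISMATCH"
  echo "Mismatch: $n_in records in, $n_out records out"
else
  status="OK"
  echo "OK: $n_in records in and out"
fi
```

Because wc -l counts newline characters, a file whose last line lacks a trailing newline will report one line fewer than you see in an editor.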
less
Sometimes you do need to read the file, scrolling through it to spot-check values.
Since we cannot load 50GB into RAM, we use a “Pager” called less.
less big_data.txt
less is efficient.
It only loads the specific chunk of the file you are currently looking at.
This allows you to open terabyte-sized files instantly.
You can navigate using the arrow keys or Spacebar, and exit by pressing q.
history
One of the greatest virtues of a programmer is laziness. We hate doing the same thing twice.
You will often construct complex commands with long pipelines, specific flags, and file paths. Five minutes later, you will need to run that command again. Do not retype it.
You can view your entire timeline by typing:
history
This prints a numbered list of your past actions.
If you see that the complex command you want was number 105, you can instantly re-run it by typing !105.
The Search (Magic)
Scanning a list of 1,000 commands is slow. Instead, use the Reverse Search feature.
- Press Ctrl + r
- Start typing a snippet of the command you remember (e.g., ssh).
- The shell will time-travel backwards and show you the most recent command that matches that text.
- Press Enter to run it, or the Right Arrow key to edit it.
This single keyboard shortcut is often the difference between a novice user and a power user.
Open Lab
You are now operating autonomously. The teaching team is circulating to provide “Consulting Support.” If you are stuck for more than 5 minutes on a syntax error, use the Pink Card.
Your assignment (Ticket 01) is to complete Levels 0 through 12 of the Bandit Wargame. This is a standard industry simulation used to train security professionals.
Your proof of work will be the passwords you retrieve. You must submit the passwords for Levels 1, 4, 8, and 13 to Gradescope by Tuesday at 11:59 PM.
Exit Criteria
You are free to leave once you have:
- Successfully logged into Bandit Level 3.
- Retrieved the password for Level 4.