Computing Basics

Ticket: T01 due Jan 20 by 11:59 pm

For most of your life, you have been a consumer of technology. You interact with computers through Graphical User Interfaces (GUIs): the familiar world of windows, icons, and menus. These systems are designed to be intuitive. They guide you down safe, pre-determined paths. They hide the complex machinery of the operating system behind polished buttons, preventing you from making mistakes.

But this safety comes at a cost: abstraction. A GUI is a simplified layer that sits between you and the machine. It limits you to the actions the software developer decided you might need. If there isn’t a button for the task you want to do, you cannot do it.

As a computational biologist, you are no longer a consumer. You are transitioning from being a passenger on an airplane to being the pilot in the cockpit. You need access to the controls. You need to manipulate thousands of files, automate complex pipelines, and manage massive datasets on remote supercomputers. You cannot do this by clicking a mouse ten thousand times.

We interact with the machine directly via the command-line interface (CLI). In the CLI, we do not click; we type. We issue precise text commands that the computer executes exactly as written. This shift unlocks two critical capabilities essential for modern science:

  1. Automation: A GUI requires you to repeat a manual action for every file. In the CLI, you can write a single command to process one file, and then loop that command over ten thousand files with no further manual work.
  2. Reproducibility: Science must be reproducible. It is nearly impossible to document exactly which sequence of twenty menus you clicked to get a result. A script, however, is a perfect, permanent record of your analysis. It allows anyone, anywhere, to verify your work.
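
As a minimal sketch of the looping idea, the commands below run one command over every matching file. The scratch directory and the sample_*.fa file names are made up for illustration, and wc -l merely stands in for a real analysis command.

```shell
# Hypothetical setup: a scratch directory holding three empty FASTA files.
workdir="$(mktemp -d)"
touch "$workdir/sample_1.fa" "$workdir/sample_2.fa" "$workdir/sample_3.fa"

# One command, wrapped in a loop, runs once per matching file.
count=0
for f in "$workdir"/*.fa; do
    wc -l "$f"                 # stand-in for a real analysis command
    count=$((count + 1))
done
echo "Processed $count files"
```

The same loop works unchanged whether the pattern matches three files or ten thousand. Saved to a file, it also becomes the written record that the reproducibility point asks for.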

The terminal window may feel stark compared to the polished apps on your phone. It is unforgiving, but it is also limitless. Mastering this environment is your first step toward true technical competency.

The Shell

Imagine trying to write an essay by selecting words from a drop-down menu. You could eventually construct sentences, but you would be limited to the words the menu designer chose for you. This is what it is like to use a GUI: the point-and-click environment you use daily. It is intuitive, but it restricts you to pre-defined actions.

As a computational biologist, you need precision and power. You need to speak to the computer directly, without a menu getting in the way. You need the Command Line Interface (CLI). In the CLI, you type text commands, and the computer obeys. It is the difference between pointing at a picture of a meal and telling the chef exactly how you want your steak cooked.

The “black box” you type into is powered by two distinct components working in tandem. It is vital to distinguish between them:

  1. The Kernel: This is the core of the operating system. It controls the hardware—the CPU, memory, and hard drive. It is the “chef” in the kitchen doing the actual work.
  2. The Shell: This is the program that listens to you. It takes your text command, interprets it, and tells the kernel what to do. It is the “waiter” taking your order.

The Shell operates in a continuous cycle known as a REPL (Read-Evaluate-Print Loop).

  1. Read: The Shell waits for you to type a command and press Enter.
  2. Evaluate: It figures out what program you want to run.
  3. Print: It runs the program and displays the result on your screen.
  4. Loop: It displays a prompt (usually a $) and waits for your next command.
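
The cycle can be imitated in a few lines of shell. This is only a toy sketch, not how Bash or Zsh is actually implemented: the two echo commands are fed in from a here-document instead of a keyboard, and the history variable is an illustrative name.

```shell
# A toy read-evaluate-print loop.
history=""
while IFS= read -r cmd; do         # Read: take one line of input
    result="$(eval "$cmd")"        # Evaluate: run it as a command
    echo "$result"                 # Print: show the result
    history="$history$result;"     # (remember what was printed)
done <<'EOF'
echo hello
echo world
EOF
# Loop: the while statement repeats until the input runs out.
```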

Example

Let’s look at what actually happens when you type the common command ls (which stands for “list”):

  1. You type: ls and hit Enter.
  2. The Shell reads: It sees the text “ls”.
  3. The Shell interprets: It searches your computer for a program named ls. It finds it and tells the Kernel, “Run this program.”
  4. The Kernel executes: It starts the ls program, which reads the current directory and sends back the names of your files.
  5. The Shell prints: You see a list of your files (e.g., lab_report.docx, data.csv).
  6. The Shell loops: The prompt returns, ready for your next instruction.
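
You can ask the Shell where it found ls. A small sketch: the exact location differs between systems (commonly /bin/ls or /usr/bin/ls), and in an interactive session where ls is aliased, the alias definition may be printed instead of a path.

```shell
# Ask the Shell what it would run for the name "ls".
ls_path="$(command -v ls)"
echo "The Shell resolves ls to: $ls_path"
```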

Newcomers often confuse the Terminal with the Shell. The Terminal is the window on your screen. It accepts keyboard input and displays text. It is just a wrapper (like a television set). The Shell is the program running inside that window, processing your commands (like the broadcast showing on the TV).

Alex’s favorite terminal is Ghostty, an open-source project written in Zig by Mitchell Hashimoto and run as a non-profit.

Course Standards and Setup

Biology software lives on Linux. To ensure compatibility, we enforce a strict environment standard for this course.

For Windows Users

You must install the Windows Subsystem for Linux (WSL). This is not a simulator; it runs a genuine Linux kernel inside Windows.

Warning

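Do not use PowerShell or Command Prompt for course work. They use different syntax and logic, and running course commands in them will lead to immediate errors. The only exception is the one-time WSL installation command, which must be run in PowerShell.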

Microsoft provides helpful installation instructions. Open PowerShell as an administrator and run the following command to install WSL.

wsl --install

You will then have to set up a new Linux username and password. From this point forward, all commands should be executed in your WSL.

Once you are logged in, you can update your WSL with the following bash command.

sudo apt update && sudo apt upgrade

Install the following applications in WSL to make your class experience smoother.

sudo apt-get install ssh git helix nano vim wget curl build-essential htop python3 python3-pip

Then you should install pixi using the following command.

curl -fsSL https://pixi.sh/install.sh | sh
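
To confirm the installs, a short check like the one below may help. It is a sketch: the tool list mirrors the commands above, hx is the Helix command used later in this chapter, and pixi may only be found after you open a new terminal.

```shell
# Report which of the course tools the Shell can currently find.
checked=0
for tool in ssh git hx nano vim wget curl htop python3 pixi; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "ok:      $tool"
    else
        echo "MISSING: $tool"
    fi
    checked=$((checked + 1))
done
echo "Checked $checked tools"
```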

You can still use Visual Studio Code with WSL through its WSL extension.

For macOS Users

macOS is built on Unix, the cousin of Linux. You can use the default Terminal.app found in your Applications folder.

For Linux Users

You are already home. No preparation is required.

You will encounter different shells, most commonly Bash (Bourne Again Shell) and Zsh (Z Shell). For this course, they are effectively identical. If you are a beginner, stick to the default shell provided by your operating system (Bash for WSL, Zsh for modern macOS).
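
If you are unsure which shell you have, the SHELL variable is a reasonable first check. Note the assumption: it records your login shell, which is usually, but not always, the shell you are currently typing into.

```shell
# Print the login shell; fall back to "unknown" if the variable is unset.
login_shell="${SHELL:-unknown}"
echo "Login shell: $login_shell"
```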

Learning Resources

We do not use a static textbook for this section. The Shell is a tool best learned by doing.

The Missing Semester of Your CS Education

Read “Course overview + the shell” and “Shell Tools and Scripting.” Focus on the concepts; you do not need to complete the exercises yet.

Software Carpentry: Introducing the Shell

Read this entire lesson to reinforce the basics of navigation and file manipulation.

The File System

To run an analysis, you must first answer a simple question: Where is the data?

In a Graphical User Interface (GUI), you answer this by double-clicking folders until you see the file you want. You navigate visually. In the CLI, you are blind. You cannot “see” the folders around you; you must know exactly where you are and exactly where you are going. If you understand the file system, you can instantly pinpoint any file among millions. If you do not, you will spend hours staring at a “File Not Found” error.

The Inverted Tree

Some of you are used to Windows, which splits storage into separate islands called drives (e.g., C:, D:). Unix-like systems (Linux and macOS) are different. They act as a single, unified landmass. We structure this landmass as an inverted tree.

  • The Root (/): This is the very top of the hierarchy. Every file, directory, hard drive, and process branches out from this single point.
  • Nodes: Every file or directory is a “node” on this tree.

Root vs. Home

To survive in this environment, you must distinguish between two locations:

  1. The Root (/): Think of this as the lobby of a high-security apartment building. It contains the machinery that keeps the building running (system files). You can look around, but you are generally forbidden from changing anything here.
  2. Home (~): Think of this as your personal apartment (/home/alex or /Users/alex). This is your sanctuary. You have full permission to create, edit, and destroy files here.

Key Idea: The Tilde (~)

In the Shell, the tilde symbol (~) is a shorthand for your home directory. If your user path is /home/student, then ~/data is exactly the same as /home/student/data.
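
A quick way to convince yourself, assuming the HOME variable is set (as it is in a normal login session):

```shell
# The Shell expands an unquoted ~ before the command ever sees it.
tilde_path=~/data
full_path="$HOME/data"
echo "$tilde_path"
echo "$full_path"
```

Both lines print the same path. Note that the tilde is only expanded when it is unquoted: "~/data" inside double quotes stays a literal tilde.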

Absolute vs. Relative Paths

A Path is simply an address: a string of text that tells the computer how to reach a specific file. There are only two ways to write an address. Understanding the difference is the most critical skill in this chapter.

Absolute Paths

An absolute path is like a GPS coordinate. It always describes the exact same location, no matter where you are currently standing.

  • Rule: It always starts with the Root slash (/).
  • Example: /home/alex/data/dna.fa

Relative Paths

A relative path is like giving turn-by-turn directions. It describes a location relative to where you are standing right now.

  • Rule: It does not start with a slash.
  • Example: data/dna.fa (Translation: “Look in the folder I am currently in, find the ‘data’ folder, and look inside.”)

Example

Imagine you are currently working in your home folder (/home/alex) and you want to list the files in a folder called experiments.

Scenario A: Using the Absolute Path

You type: ls /home/alex/experiments

You are giving the computer the full postal address. It ignores where you are standing and goes straight to that specific address.

Scenario B: Using the Relative Path

You type: ls experiments

You are saying, “Look right next to me.” Since you are standing in /home/alex, the computer looks for /home/alex/experiments.

If you change directories to /tmp and type ls experiments, the computer will look for /tmp/experiments. Unless that folder happens to exist, the command fails: the relative path has not changed, but the place it is measured from has.
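
The two scenarios can be replayed safely in a scratch directory. This is a sketch under stated assumptions: the alex folder created by mktemp stands in for /home/alex, and run1.txt is a made-up file.

```shell
# Hypothetical layout in a scratch directory, standing in for /home/alex.
orig_dir="$(pwd)"
base="$(mktemp -d)"
mkdir -p "$base/alex/experiments"
touch "$base/alex/experiments/run1.txt"

# Standing in "alex": both spellings reach the same folder.
cd "$base/alex"
rel_listing="$(ls experiments)"
abs_listing="$(ls "$base/alex/experiments")"

# Standing one level up: the relative path no longer resolves.
cd "$base"
if ls experiments >/dev/null 2>&1; then
    rel_elsewhere="found"
else
    rel_elsewhere="not found"
fi
abs_elsewhere="$(ls "$base/alex/experiments")"

echo "relative, from alex:     $rel_listing"
echo "absolute, from alex:     $abs_listing"
echo "relative, from elsewhere: $rel_elsewhere"
echo "absolute, from elsewhere: $abs_elsewhere"
cd "$orig_dir"
```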

Caution

The “File Not Found” Error

When the Shell cannot find a file (the message usually reads No such file or directory), do not panic. It almost always means you provided a Relative Path while standing in the wrong directory.

  1. Check your current location using the pwd (Print Working Directory) command.
  2. Ask yourself: “Is the file actually inside this folder?”
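
The two checks translate directly into commands. In this sketch the scratch directory is empty on purpose, and data/dna.fa is a hypothetical relative path that does not exist there.

```shell
# Step 1: where am I standing?
scratch="$(mktemp -d)"
cd "$scratch"
pwd

# Step 2: does the relative path exist from here?
target="data/dna.fa"
if [ -e "$target" ]; then
    status="exists"
else
    status="missing"
fi
echo "$target is $status when viewed from $(pwd)"
```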

Software Carpentry: Navigating Files and Directories

Read the entire lesson to practice moving through the tree.

Software Carpentry: Working With Files and Directories

Read the entire lesson to learn how to manipulate the nodes of the tree.

Configuration

When you move into a new lab, the first thing you do is organize your bench. You place your pipettes on the right, your notebook on the left, and your reagents on the top shelf. You set it up so you can work without having to think.

The Shell allows you to do the same thing. It is not just a tool; it is a fully programmable environment. You can change how it looks, how it behaves, and how it understands your commands.

Dotfiles

How does the computer remember your preferences? It relies on Dotfiles. These are simple text files that sit in your home directory. They are named with a leading period (e.g., .bashrc or .zshrc).

Why the dot? By convention, ls and most file browsers skip any file whose name begins with a period, so the file is effectively “hidden.” This prevents your home folder from looking cluttered, keeping these configuration gears turning silently in the background.
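
You can watch the hiding happen in a scratch directory. The two file names below are made up for the demonstration.

```shell
# One ordinary file and one dotfile in an otherwise empty directory.
demo="$(mktemp -d)"
touch "$demo/notes.txt" "$demo/.hidden_config"

plain="$(ls "$demo")"        # default listing skips dotfiles
all="$(ls -A "$demo")"       # -A includes them (but not . and ..)

echo "ls shows:    $plain"
echo "ls -A shows: $all"
```

Running ls -A (or ls -a) in your own home directory will reveal the dotfiles already living there.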

The most critical of these are the Run Control (rc) files.

  • For Bash: ~/.bashrc
  • For Zsh: ~/.zshrc

Think of the rc file as the Shell’s “morning routine.” Every single time you open a new terminal window, the Shell wakes up, reads this file from top to bottom, and executes every command inside it before it lets you type a single word.

Caution

When you edit your .bashrc or .zshrc, the changes do not apply to your current open window (because the morning routine already happened!). You must close the terminal and open a new one, or type source ~/.bashrc (or source ~/.zshrc if you use Zsh) to force the Shell to re-read the file.
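
To see re-reading in action without touching your real rc file, the sketch below uses a throwaway file. COURSE_GREETING is a made-up variable name, and the dot command (.) is the portable spelling of source.

```shell
# A throwaway file standing in for ~/.bashrc.
rc_demo="$(mktemp)"
echo 'export COURSE_GREETING="hello from the rc file"' > "$rc_demo"

# Writing the file changes nothing in the current session yet.
echo "before: ${COURSE_GREETING:-not set}"

# Re-read the file in the current session (same as: source "$rc_demo").
. "$rc_demo"
echo "after:  $COURSE_GREETING"
```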

What Do We Configure?

We typically use the rc file to control three major behaviors.

Environment Variables

These are global settings that act like the “physics” of your shell universe. They tell other programs how to behave.

  • Example: export EDITOR=nano
  • Intuition: This tells your system, “Whenever a program needs me to edit text, launch Nano (not Vim).”
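
What makes the setting visible to other programs is the export keyword. A small sketch, in which LOCAL_ONLY is a made-up variable used for contrast and sh -c plays the role of "some other program":

```shell
export EDITOR=nano        # exported: child programs inherit it
LOCAL_ONLY=secret         # not exported: stays inside this shell

# Start a child process and ask what it can see.
child_editor="$(sh -c 'echo "$EDITOR"')"
child_local="$(sh -c 'echo "${LOCAL_ONLY:-unset}"')"

echo "child sees EDITOR as:     $child_editor"
echo "child sees LOCAL_ONLY as: $child_local"
```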

Aliases

Aliases are your personal shortcuts. They allow you to map a long, complex command to a short keystroke.

  • The Problem: You find yourself typing ls -lah (a long listing of all files, including hidden ones, with human-readable sizes) fifty times a day.
  • The Fix: You add this line to your rc file: alias ll="ls -lah"
  • The Result: Now, you simply type ll, and the Shell expands it for you. This saves thousands of keystrokes over a semester.
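
You can try the alias in your current session before committing it to your rc file. One caveat for this sketch: shells only expand aliases when used interactively, so inside a script the alias is defined but typing ll would not work.

```shell
# Define the shortcut for this session only.
alias ll="ls -lah"

# Ask the Shell what ll now stands for.
alias_def="$(alias ll)"
echo "$alias_def"
```

To remove it again, run unalias ll.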

The PATH

The $PATH variable is the source of the most common frustration for beginners: the dreaded Command not found error. To fix it, you must understand how the Shell finds programs.

Imagine you ask a friend to find a specific book. You hand them a list of five libraries and say, “Check these libraries in this exact order. Bring me the first copy you find.” The PATH is that list of libraries. It is an ordered list of directories where the Shell looks for programs.

Example

Let’s look at what happens when you type python and hit Enter:

The Shell checks your PATH

It sees a list like:

  1. /usr/local/bin
  2. /usr/bin
  3. /bin

The Hunt Begins

  • It looks in /usr/local/bin. Is there a file named python here? No.
  • It looks in /usr/bin. Is there a file named python here? No.
  • It looks in /bin. Is there a file named python here? Yes.

Execution

It stops searching and runs the program found in /bin.

If you install a new bioinformatics tool in a custom folder (e.g., /home/alex/my_tools) but you do not add that folder to your $PATH, the Shell will never look there. It will check the standard folders, fail to find the tool, and give up. You must append your new folder to the search list in your rc file: export PATH="$PATH:/home/alex/my_tools"
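
The whole story can be reproduced in a scratch directory. This is a sketch: my_course_tool is a made-up program name, and the temporary folder stands in for /home/alex/my_tools.

```shell
# Print the search list, one directory per line, in search order.
echo "$PATH" | tr ':' '\n'

# Create a tiny program in a folder that is not on the PATH.
tools="$(mktemp -d)"
printf '#!/bin/sh\necho "my_course_tool ran"\n' > "$tools/my_course_tool"
chmod +x "$tools/my_course_tool"

# Before: the Shell cannot find it.
if command -v my_course_tool >/dev/null 2>&1; then
    before="found"
else
    before="not found"
fi

# Append the folder to the search list, then look again.
export PATH="$PATH:$tools"
after="$(command -v my_course_tool)"
output="$(my_course_tool)"

echo "before: $before"
echo "after:  $after"
echo "output: $output"
```

Because the change is made with export in the current session, it disappears when the terminal closes. Putting the export line in your rc file is what makes it permanent.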

Software Carpentry: Shell Scripts (Variables)

Read the section explaining variables. While this lesson focuses on scripts, the concept of assigning values to names (variables) is identical to how you configure your environment.

The Missing Semester: Shell Tools and Scripting

Read the “Shell Scripting” section up to “Shell Tools.” The explanation of aliases and dotfiles here is excellent and industry-standard.

Text Editors

On your personal laptop, editing a file is trivial. You double-click an icon, a window pops up, and you use your mouse to highlight text and click “File > Save.” You rely on a GUI.

But as a computational biologist, your work lives on high-performance computing clusters (supercomputers). These machines are often hundreds of miles away. When you connect to them, you do not get a window. You do not get a mouse. You do not get a scrollbar.

You get a black box with blinking text.

To write code or configure software in this environment, you must use a Terminal-Based Editor. This program runs entirely inside the Shell. Mastering one of these tools is not optional; it is the only way to communicate with the machine effectively. While tools like VS Code Remote exist, connections break. When they do, you must be able to fix your code directly in the terminal.

The Options

There are two main categories of editors you will encounter.

Basic

Nano is the “safe default” installed on most Unix-like systems. It operates like a very primitive Notepad: you type, and letters appear. You don’t need to memorize commands to start. The trade-off is that it is basic and lacks powerful features for complex coding.

Modal

Vim (and its modern successor, Neovim) is the standard for system administrators and serious programmers. It is famous for its steep learning curve, which stems from one core concept: Modes.

  • Insert Mode: The keyboard acts like a typewriter. You press j, and the letter “j” appears.
  • Normal Mode: The keyboard acts like a control panel. You press j, and the cursor moves down one line. You press dd, and it deletes the current line.

Once you master the “language” of Vim, you can edit text at the speed of thought, without your fingers ever leaving the home row.

Alex strictly uses Helix. It is a “post-modern” modal editor written in Rust. It maintains the speed of Vim but replaces archaic keybindings with a more logical, visual selection model (Selection -> Action). It is faster, smarter, and comes with modern features like multi-cursors built-in by default.

Which one should I learn?

If you are panicked and just need to change a single line of text, use Nano. If you want to become proficient at working in a terminal environment long-term, you must learn a modal editor.

Nano Tutorial

Just understand the basics.

VimTutor

If you choose the Vim path, type vimtutor directly into your terminal. It is a built-in interactive program that teaches you by having you edit a real file.

Helix Tutor

If you choose to follow Alex’s lead, install Helix and run hx --tutor to launch the interactive learning environment.

The Remote Server (SSH)

You have mastered the Shell on your own machine. Now, it is time to leave your laptop behind. In computational biology, your personal computer is rarely powerful enough to do the heavy lifting. Instead, we use it merely as a remote control. The actual work happens on massive, customized supercomputers located in data centers hundreds of miles away.

To bridge the gap between your laptop (the client) and the supercomputer (the server), we use the Secure Shell (SSH) protocol. SSH creates an encrypted tunnel through the internet. When you run SSH, your terminal window stops talking to your local operating system and starts sending your commands through this tunnel to the remote server.

Because the interface looks identical (a black box with text), it is dangerously easy to forget which machine you are controlling. You might think you are deleting a temporary test file on your laptop, only to realize you just wiped a critical dataset on the lab server. Always check your prompt.
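
If the prompt is ambiguous, two standard commands settle the question. This is a sketch of the habit, not a course requirement; the ssh line in the comment uses a placeholder username and hostname.

```shell
# Connecting looks like this (placeholder names, not a real server):
#   ssh username@server.example.edu

# Before any destructive command, confirm who and where you are.
current_user="$(id -un)"
current_host="$(uname -n)"
echo "You are $current_user on $current_host"
```

Run it once on your laptop and once after connecting to a server, and compare the host names.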

Authentication: Keys vs. Passwords

How do you prove to the remote server that you are allowed inside? Historically, you would type a username and a password. In professional engineering, we reject this method.

  1. Passwords can be brute-forced or stolen.
  2. You cannot write a script to upload data at 3:00 AM if a human needs to wake up and type a password.

Instead, we use Cryptographic Key Pairs.

Alex’s Soap Box

Post-quantum cryptography has been making some very interesting progress, especially with the release of standardized post-quantum algorithms from NIST.

Think of this system like a custom physical lock and a key.

  • The Public Key (The Lock): You upload this to the server. You can give this to anyone. It is effectively saying, “Here is a lock that only I can open. Please install this on your door.”
  • The Private Key (The Key): This lives on your laptop. You never give it to anyone. It is the only thing in the universe that can turn the lock.

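When you try to log in, the server sees your username and looks at the Public Key (lock) you installed. It challenges your laptop: “I am locking this message. If you are who you say you are, use your Private Key to unlock it.” Your laptop solves the puzzle instantly and returns the result. The server grants you access without you ever having to type a password. (Under the hood, your laptop signs a one-time challenge with the Private Key and the server checks that signature against the Public Key; the Private Key itself never leaves your machine.)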

We will generate an Ed25519 key pair. This is the modern, high-performance key type that has largely replaced the older RSA keys for SSH. Its keys are much smaller, it is faster, and it is widely recommended as the more secure default.

Your Private Key (id_ed25519) is your digital identity. NEVER email it. NEVER upload it to GitHub. NEVER share it with a collaborator. If you expose this file, you must treat your identity as compromised, delete the key, and generate a new one immediately.
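
For a risk-free look at what a key pair is, the sketch below generates a throwaway pair in a scratch directory rather than in ~/.ssh. The file name id_ed25519_demo and the email comment are made up, and the empty passphrase is for the demo only; protect a real key with a passphrase.

```shell
# Demo only: write the pair into a scratch directory so no real key
# can be overwritten.
keydir="$(mktemp -d)"

if command -v ssh-keygen >/dev/null 2>&1; then
    # -t key type, -C comment label, -f output file,
    # -N passphrase (empty here), -q quiet output
    ssh-keygen -q -t ed25519 -C "student@example.com" \
        -f "$keydir/id_ed25519_demo" -N ""

    ls "$keydir"                        # the private key and its .pub partner
    cat "$keydir/id_ed25519_demo.pub"   # public half: safe to share
else
    echo "ssh-keygen is not installed"
fi
```

The .pub file is a single line beginning with ssh-ed25519; that line is the “lock” you hand to a server. The file without the extension is the private key and is the one that must never leave your machine.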

GitHub Docs: Generating a new SSH key

Follow the instructions for your specific OS. When asked for the key type, select Ed25519. You do not need to add the key to GitHub yet; we just need the file generated on your machine.

DigitalOcean: SSH Essentials

Read the “SSH Overview” section. Pay close attention to the diagrams explaining the encryption tunnel and the handshake process.
