Virtual Environments

Ticket: T03 due Jan 27 by 11:59 pm

Imagine you spend three weeks refining a Python script for your research. You debug the logic, polish the figures, and verify the statistical output. It runs perfectly on your laptop. Proud of your work, you email the script to your Principal Investigator (PI) or a collaborator at another university.

An hour later, you get a reply: “It doesn’t work.”

They see an ImportError. Or, far worse, the code runs but produces completely different scientific results. You stare at your screen, baffled. You type the command again. It works for you. You reply, “It works on my machine.”

This phrase is the punchline of a joke nobody wants to hear. In computational science, if your code only runs on your laptop, it is useless.

The failure happens because your script does not exist in a vacuum. Your code floats on a sea of invisible dependencies: the specific version of Python you installed, the numerical libraries (like NumPy) you downloaded months ago, and the operating system tools underlying them all. When you send your script to a collaborator, you send only the tip of the iceberg. You assume their computer has the exact same submerged foundation as yours. It almost never does.

If your colleague runs numpy version 1.21 and you wrote your code using version 2.0, functions may have changed names or arguments. If they use a Mac and you use Linux, the underlying math libraries might round floating-point numbers differently. These subtle discrepancies cause reproducibility to vanish.

To solve this, we cannot just ship code. We must ship the context. We need a way to wrap your project in a protective bubble, a virtual environment that contains not just your script, but the specific version of Python, the exact libraries, and the precise tools it requires to function. This ensures that anyone, on any machine, can step inside your bubble and see exactly what you see.

Reliability requires isolation. Never change the global state of your computer to solve a local problem for one project. You must explicitly define and package the environment alongside the code.

The Ecosystem

Before we manage our tools, we must understand the supply chain. When you type a command to install software, it does not magically appear from the ether. It is downloaded from a specific repository, similar to how apps are downloaded from an App Store. In the Python world, there are two primary supply chains, and understanding the difference between them is critical for scientific computing.

PyPI

The Python Package Index (PyPI) is the default repository for the Python community. When you use the standard pip install command, you are usually pulling from here.

Think of PyPI as a massive furniture warehouse selling flat-pack kits. When you order a package, PyPI sends you the blueprints and the raw materials (the source code). Your computer then acts as the factory: it must read the blueprints, compile the code, and assemble the final product before it can be used.

For simple tools written entirely in Python, this works seamlessly. However, scientific computing relies on heavy machinery. Libraries like NumPy, PyTorch, and SciPy are not just Python scripts; they are wrappers for highly optimized mathematical engines written in low-level languages such as C, C++, and Fortran.

If you try to install these from PyPI, your computer is suddenly expected to know how to compile complex Fortran code. If you lack the correct compilers or system headers, as most standard laptops do, the installation fails with a wall of cryptic red error text. You are trying to build a jet engine in your garage with only a hammer.

Conda

Conda was created specifically to solve this “compilation problem” for scientists. Unlike PyPI, Conda is a binary package manager.

Returning to our furniture analogy, Conda delivers the furniture fully assembled. The authors of the package have already run the compilers on a dedicated server farm. They package the resulting “binary” (the ready-to-run software) and upload it. When you install via Conda, your computer simply downloads the finished product and places it in the correct folder. It does not need to know what a C++ compiler is.
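
For example, installing NumPy through the classic Conda command pulls a pre-compiled build straight from the community channel; no compiler ever runs on your machine:

conda install -c conda-forge numpy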

This distinction is why tools like Pixi build upon the Conda ecosystem. By using Conda packages, Pixi lets you install Python, C++, and R tools seamlessly, so the heavy mathematical lifting runs instantly on Windows, macOS, and Linux without requiring you to be a systems administrator.

Channels

Within the Conda ecosystem, packages are organized into “channels.” You can think of a channel as a specific aisle in a grocery store or a specific vendor in a marketplace.

  • defaults: This is the commercial aisle maintained by Anaconda, Inc. It is stable but moves slowly. The software here is often several versions behind the cutting edge.
  • conda-forge: This is the community-run aisle. It is the gold standard for scientific research. It is updated rapidly, contains a massive variety of packages, and is maintained by thousands of volunteers. In this course, and in most modern research, we almost exclusively use conda-forge, as the snippet below shows.
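
In a Pixi project, this preference is a single line in the manifest. A newly created project declares it for you (we will see the full file shortly):

channels = ["conda-forge"]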

Metapackages and Complex Dependencies

Some scientific tools are so complex that they are not just a single piece of software, but a collection of strictly coordinated parts.

Consider PyTorch, a library for deep learning. It requires the Python front-end, the underlying C++ tensor library, and potentially the NVIDIA CUDA libraries to talk to your graphics card. If these versions do not match perfectly, the software breaks.

To handle this, ecosystem maintainers use metapackages. A metapackage is like a travel itinerary. It contains no code itself; it is simply a list of other packages that must be installed together. When you install a complex tool like pytorch or mojo, you are often installing a metapackage that automatically fetches the correct, compatible versions of all the underlying drivers and libraries for your specific hardware. This ensures that you get the “GPU-accelerated” version if you have a GPU, without needing to wire up the drivers yourself.
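
In practice, requesting the tool by name is usually all that is required. As a sketch (the exact set of packages resolved depends on your platform and hardware):

pixi add pytorch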

The Tooling Landscape

If you search online for “how to manage Python projects,” you will not find one answer. You will find a dozen. The Python ecosystem has evolved through a fractured history of competing standards, leaving beginners confused about which tool to reach for.

To understand why we use Pixi, we must briefly survey the tools that came before it and the specific problems they never fully solved for scientists.

The Others

Pip & Venv

These are the default tools built into Python. They are lightweight and standard. However, they are strictly Python-only. They cannot manage the Python version (you must install Python separately), nor can they handle non-Python dependencies such as C++ compilers or system libraries. If your project needs ffmpeg to process video or gcc to compile code, pip cannot help you.
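
For contrast, the classic built-in workflow looks like this. Note that it silently assumes a suitable Python is already installed on the machine:

python -m venv .venv
source .venv/bin/activate    # on Windows: .venv\Scripts\activate
pip install numpy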

Conda

As discussed, Conda solved the “binary” problem for scientists. It handles Python versions and non-Python libraries excellently. However, the traditional conda command-line tool became notorious for being slow and “heavy.” Its dependency solver could take minutes (or hours) to calculate complex environments, and it often encouraged a workflow in which users created a single, bloated environment for all their work, leading to conflicts.

Poetry & UV

In recent years, new tools like Poetry and UV emerged to modernize the experience. They are fast and offer excellent features for pure software development. However, they often struggle to bridge the gap between “pure Python” software and the messy reality of computational research, which often involves a mix of languages and system-level tools.

Pixi

In this course, we use Pixi. It is not just “another tool”; it is a unification of the best features from the entire history of package management. Pixi is a modern package manager built on the Conda ecosystem, but it is written in Rust. This makes Pixi orders of magnitude faster than traditional Conda tools.

Why is Pixi the best tool for this course?

  1. Unlike pip or poetry, Pixi is not limited to Python. It supports Python, R, C++, Zig, MATLAB, Rust, Julia, Mojo, Java, generic system tools, … the list goes on. If your analysis pipeline needs a Python script, an R plotting library, and a C++ compiler, Pixi manages all of them in a single file.
  2. Pixi treats your project directory as an island. It installs dependencies locally within the project folder (in a hidden .pixi directory), rather than in a central location on your hard drive, so your project never depends on the global state of your machine. You could have five different projects using five different versions of Python and GCC, and they would never know the others exist.
  3. For those familiar with Rust’s cargo or JavaScript’s npm, Pixi brings that same developer-friendly experience to science. It simplifies complex tasks into intuitive commands, allowing you to focus on your research rather than fighting your terminal.

The Lock File

In the world of package management, there is a dangerous ambiguity between what you want and what you get. To solve this, Pixi relies on two distinct files that work in tandem: the manifest and the lock file. Understanding the difference between them is the difference between “hoping” your code works and “knowing” it works.

The Grocery List vs. The Receipt

Think of your project like a trip to the grocery store to bake an apple pie.

The Manifest (pixi.toml) is your Grocery List. On it, you write “Apples” and “Flour.” You might be specific and say “Apples must be red,” but you don’t care if they are Gala or Fuji, nor do you care which specific farm they came from. You just need apples to make the pie. In Pixi, this file contains your loose, high-level requirements (e.g., python >= 3.9 or numpy). It states your intent.
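
In pixi.toml, the grocery list is a handful of loose constraints; a sketch (your own names and versions will differ):

[dependencies]
python = ">=3.9"
numpy = "*"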

The Lock File (pixi.lock) is your receipt. When you leave the store, you have a precise record of exactly what was purchased. The receipt does not say “Apples”; it says “Gala Apples, SKU #4135, Harvest Batch 29, Supplier X.” It records the exact, immutable reality of what ended up in your basket. In Pixi, this file records the specific version, the build string, and the cryptographic hash of every single package installed (e.g., python 3.9.12-h12345_0).

Reliability lives in the receipt

Novices often confuse these two, thinking the pixi.toml is the most important file because it is the one humans edit. They are wrong. The pixi.lock file is the source of truth. It is the guarantee of reproducibility.

If you send your pixi.toml (Grocery List) to a collaborator, their computer goes to the store and might buy “Fuji Apples” instead of “Gala Apples” because the store inventory changed since you last shopped. They followed your instructions, but they ended up with a different environment. Your code breaks.

However, if you send them your pixi.lock (Receipt), Pixi forces their computer to acquire the same artifacts that you used. It creates a mathematically identical clone of your environment on their machine.
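
Concretely, a collaborator only needs to fetch your project and ask Pixi to install; the repository URL below is hypothetical:

git clone https://github.com/your-lab/my_project.git
cd my_project
pixi install   # reads pixi.lock and recreates the environment exactly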

Pixi is designed to prioritize safety: it keeps the lock file up to date automatically so that your workspace always remains consistent. Whenever you run an installation or update, Pixi solves the environment and immediately freezes the solution into this lock file.

The pixi.toml defines what your project needs. The pixi.lock defines what your project is. You must always commit both to version control, as sketched below. The lock file ensures that if your code runs today, it can be rebuilt years from now, on any supported machine, from exactly the same packages.
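
In practice, the two files travel together in every commit:

git add pixi.toml pixi.lock
git commit -m "Pin the project environment"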

Zero to Hero with Pixi

Now that we understand the theory, let’s build a real environment. We will create a project, install the tools needed for scientific analysis, and run code without ever touching your computer’s system-wide settings.

Initialization

First, we create the workspace. Open your terminal. We will create a new directory for our research project called my_project.

pixi init my_project

If you look inside the new folder (cd my_project), you will see a single file: pixi.toml. This command did not install Python. It did not change your system’s PATH. It simply placed a manifest file in a folder, marking this specific directory as a future Pixi workspace. It is currently a blank slate.
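
The generated manifest is short. It looks roughly like this (field names and defaults vary slightly between Pixi versions, and your own platform is filled in automatically):

[project]
name = "my_project"
channels = ["conda-forge"]
platforms = ["linux-64"]

[dependencies]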

Adding Packages

Now we need to populate our environment. We want to use Python, and we want the numpy library for calculations.

cd my_project
pixi add python numpy

You will see output indicating that Pixi is “Resolving” and “Downloading.” This is the engine at work.

  1. Pixi looked at your request (“I want python and numpy”) and checked the conda-forge channel for compatible versions.
  2. It calculated the exact solution and wrote it to pixi.lock.
  3. It downloaded the binary packages and unpacked them into a hidden folder: .pixi/envs/default.

Your project now has its own private installation of Python and NumPy inside that hidden folder. The rest of your computer still doesn’t know they exist.
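
If your analysis requires specific versions, pass a constraint to pixi add; quote any spec containing > or < so your shell does not interpret it (the versions here are only examples):

pixi add python=3.12 "numpy>=1.26"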

Running Code

This is the most critical workflow shift. In older tools, you would “activate” an environment, changing your shell session to use the new tools until you deactivated it. In Pixi, we prefer to execute commands through the environment.

Let’s create a simple script. Create a file named calc.py:

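# Import the project's private NumPy and report which version was found.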
import numpy as np
print(f"Success! NumPy version: {np.__version__}")

Now, run it:

pixi run python calc.py

When you type pixi run, Pixi temporarily modifies the environment variables (like PATH) only for the duration of that single command.

  1. It locates the python executable inside the hidden .pixi folder.
  2. It runs your script using that Python.
  3. It instantly tears down the environment context once the script finishes.

This prevents the “forgot to deactivate” error. You never accidentally run a script with the wrong Python because you must explicitly ask Pixi to run it every time.
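
You can verify the isolation yourself. The path printed should point inside your project's hidden .pixi folder, not at any system-wide Python (the exact path differs by operating system):

pixi run python -c "import sys; print(sys.executable)"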

The Shell

Sometimes, you need to explore. You might want to open the Python interpreter (REPL) to test a line of code, or you might need to run multiple commands in a row to debug a problem. For this, we step inside the bubble.

pixi shell

Your terminal prompt will change. You are now “inside” the active environment.

If you type which python (on Mac/Linux) or Get-Command python (on Windows), you will see it points to the .pixi directory, not your system install. You can run python directly, import libraries, and inspect files manually.

When you are done exploring, type exit to leave the shell. You will return to your normal system prompt, and the Pixi environment will vanish from your path, leaving your computer clean.
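
A typical exploratory session therefore looks like this (the exact path will differ on your machine):

pixi shell
which python    # points into .pixi/envs/default, not your system install
python calc.py  # runs with the project's private interpreter
exit            # back to your normal prompt; the bubble vanishes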

Use pixi run for your official work (running pipelines, generating figures). It is precise and reproducible. Use pixi shell only for human tasks (debugging, exploring, learning). This discipline prevents “stateful” errors where you forget which environment you are currently in.
