Remote Computing

Biology operates on a scale our eyes cannot reach. Proteins fold, enzymes catalyze reactions, and drugs bind to receptors—all driven by the chaotic, high-speed dance of atoms. To understand these processes, we often cannot rely solely on wet-lab experiments, which give us averages over billions of molecules. We need to see the individual actors. We need a virtual microscope.

Molecular Dynamics (MD) simulations serve as this microscope. By defining the physical laws that govern how atoms interact, we can calculate their movements over time. This allows us to watch a protein fold or a drug find its binding pocket in atomic detail. However, these simulations come at a steep cost. Calculating the forces between thousands of atoms for millions of time steps requires immense computational power.

This certification challenges you to bridge the gap between theory and high-performance computing. You will build a primitive MD engine from scratch using Python to simulate a box of interacting particles. You will then deploy this simulation to the university’s supercomputer, leaving the graphical interface on your laptop behind to run on the raw power of a remote Linux cluster.

The Lennard-Jones Potential

To simulate how atoms move, we must first understand how they interact. Atoms are not hard billiard balls; they are soft, sticky spheres. They have a “comfort zone.” If two atoms get too close, their electron clouds overlap and repel each other violently. If they drift too far apart, weak attractive forces (van der Waals forces) try to pull them back together.

We model this interaction mathematically using the Lennard-Jones Potential. This simple equation captures the essence of atomic interaction:

V(r)=4ϵ[(σr)12(σr)6] V(r) = 4\epsilon \left[ \left( \frac{\sigma}{r} \right)^{12} - \left( \frac{\sigma}{r} \right)^{6} \right]

Here, rr represents the distance between two particles. The term (σ/r)12(\sigma/r)^{12} accounts for the repulsion at short ranges (it shoots up to infinity as particles get close, preventing them from fusing). The term (σ/r)6(\sigma/r)^{6} accounts for the attraction at long ranges. The parameters ϵ\epsilon (depth of the potential well) and σ\sigma (the distance where the potential is zero) define the specific chemical properties of the atom.

The Lennard-Jones potential tells us the energy landscape. To find the force pushing on an atom, we look at the slope of this landscape. A steep slope means a strong force. Mathematically, force is the negative gradient of the potential: F=VF = -\nabla V.

The Engine

Knowing the forces is only half the battle. We must convert these forces into movement. We do this by slicing time into tiny, discrete chunks called time steps (Δt\Delta t).

Imagine you are watching a movie frame by frame. In Frame 1, you know where an atom is and how fast it is moving. You also calculate the force acting on it using the Lennard-Jones equation. Newton’s Second Law (F=maF=ma) allows you to calculate the acceleration.

With the acceleration known, you can predict where the atom will be in Frame 2. You update the position based on the current velocity, and you update the velocity based on the current acceleration. This cycle repeats thousands or even millions of times.

  1. Calculate Forces: Look at all particle pairs and compute the push/pull using Lennard-Jones.
  2. Update Positions: Move particles based on their velocity.
  3. Update Velocities: Change speeds based on the forces/acceleration.
  4. Advance Time: Move to the next frame (t+Δtt + \Delta t).

This loop is the heartbeat of every molecular simulation engine, from the simple script you will write to the massive software packages running on national supercomputers.

Controlling the Temperature

In our simulation, “temperature” is not a static measurement. It is microscopic chaos. Temperature is simply the physical manifestation of kinetic energy—how violently the atoms jiggle. If we allow our simulation to run naturally, the total energy remains constant, but the temperature fluctuates wildly as potential energy converts to kinetic energy and back.

This presents a problem. Most real-world chemistry occurs at a specific, constant temperature, such as a beaker sitting on a hot plate. To mimic this reality, we cannot let the system evolve in isolation. We must connect it to a virtual “heat bath”—a massive reservoir of energy that acts as a thermal buffer.

We achieve this connection using the Andersen Thermostat. Unlike a crude velocity scaler that multiplies every particle’s speed by a fixed number, the Andersen Thermostat acts like a subtle, stochastic environment. It models the system as if it were constantly colliding with imaginary particles from the heat bath.

Implementing the Algorithm

To implement this, you must determine which particles collide with the heat bath during a given time step. We control this via a parameter called the collision frequency (ν\nu), which represents the average rate of collisions per unit of time.

First, we calculate the probability (PP) that any single particle undergoes a collision during the current time step (Δt\Delta t):

P=ν×ΔtP = \nu \times \Delta t

We then iterate through every particle in the system. For each particle, we generate a random number between 0 and 1. If that random number is less than PP, the particle has “collided” with the bath. We immediately replace its velocity with a new value generated from a random normal distribution defined by our target temperature. If the random number is greater than PP, the particle is unaffected and continues moving based on Newton’s laws.

Worked Example

Imagine we are simulating a gas of Argon atoms with a target temperature of 300 K. We set our collision frequency (ν\nu) to 0.1 collisions per femtosecond, and our time step (Δt\Delta t) is 1.0 fs.

First, we calculate the collision probability:

P=0.1×1.0=0.1P = 0.1 \times 1.0 = 0.1

This means there is a 10% chance for any specific atom to be reset in this step. Now, consider two particles, Atom A and Atom B:

  1. Atom A: We generate a random number, say 0.05. Since 0.05<0.10.05 < 0.1, Atom A hits the heat bath. We ignore its previous speed and assign it a new, random thermal velocity corresponding to 300 K.
  2. Atom B: We generate a random number, say 0.85. Since 0.85>0.10.85 > 0.1, Atom B is safe. It retains its current velocity and continues on its trajectory.

By applying this process to all particles at every step, the system naturally settles into the correct temperature distribution without the artificial artifacts introduced by simple velocity scaling.

The Skeleton Code

Here is a skeleton script to get you started.


import os

try:
    import numpy as np
    import numpy.typing as npt
except ImportError as e:
    raise ImportError(
        "Make sure you install numpy by running `pip install numpy`"
    ) from e

try:
    import scipy.constants as const
except ImportError as e:
    raise ImportError(
        "Make sure you install scipy by running `pip install scipy`"
    ) from e

try:
    import zarr
except ImportError as e:
    raise ImportError("Make sure you install zarr by running `pip install zarr`") from e

try:
    import polars as pl
except ImportError as e:
    raise ImportError(
        "Make sure you install polars by running `pip install polars`"
    ) from e

os.chdir(path=os.path.dirname(os.path.abspath(__file__)))

# ==================================================================================
# PARAMETERS (HARD-CODED)
# ==================================================================================

# System Parameters
N_PARTICLES = 100
BOX_SIZE = 20.0  # Angstroms
TEMPERATURE = 300.0  # Kelvin

# Simulation Settings
N_STEPS = 10000
DT = 1.0  # femtoseconds (1e-15 s)
ZARR_FILENAME = "simulation.zarr"
PARQUET_FILENAME = "simulation_stats.parquet"

# Particle Properties (Argon)
# Epsilon in kJ/mol
EPSILON = 0.997
# Sigma in Angstroms
SIGMA = 3.4
# Mass in Atomic Mass Units (u)
MASS = 39.948

# Student ID (Seed)
# TODO: Replace with your actual Pitt Student ID
STUDENT_ID = 42

# ==================================================================================
# UNIT CONVERSION HELPERS
# ==================================================================================
# MD simulations require strict unit consistency.
# We are using: Distance=Angstrom, Energy=kJ/mol, Mass=AMU, Time=fs.
# You must convert constants to match this system.

# Boltzmann constant in kJ/(mol*K)
# scipy.constants.R is Joules/(mol*K). Divide by 1000 for kJ.
KB = const.R / 1000.0

# Kinetic Energy Conversion
# When calculating KE = 0.5 * m * v^2 using AMU and A/fs,
# the result is in "Simulation Energy Units".
# Multiply by this factor to get kJ/mol.
# Factor derivation: 1e5 (velocity scaling) squared / 1000 (mass scaling) = 1e7/1000 -> 1e4
CONV_SIM_ENERGY_TO_KJMOL = 1e4

# Force/Acceleration Conversion
# When calculating F = ma, we need 'a' in A/fs^2.
# F is calculated in kJ/mol/A. Mass is in AMU.
# This factor converts (Force / Mass) into (Acceleration).
# It is the inverse of the energy conversion factor: 1 / 1e4 = 1e-4.
CONV_KJMOL_FORCE_TO_ACC = 1e-4

# Thermal Velocity Initialization
# When calculating sigma = sqrt(KB * T / Mass), the units are sqrt(kJ/g).
# This factor converts that result (approx m/s) into Angstrom/fs.
CONV_SQRT_KJG_TO_AFS = 1e-2


# ==================================================================================
# CORE FUNCTIONS
# ==================================================================================


def initialize_positions(n: int, box_len: float) -> npt.NDArray[np.float64]:
    """
    Places particles in the simulation box using a Simple Cubic (SC) lattice.

    This method is preferred over random placement for liquid/solid densities
    because it guarantees that no two particles start closer than the lattice
    spacing. This prevents 'energy explosions' caused by overlapping atoms
    in the Lennard-Jones potential.

    Args:
        n: The total number of particles to place.
        box_len: The length of one side of the cubic simulation box (Angstroms).

    Returns:
        A (n, 3) array of particle positions in x, y, z.
    """
    # Determine grid dimensions
    particles_per_side = int(np.ceil(n ** (1 / 3.0)))
    spacing = box_len / particles_per_side

    # Create a 1D array of coordinates for a single axis
    # effectively: [0.5, 1.5, 2.5, ...] * spacing
    axis_indices = np.arange(particles_per_side)
    axis_coords = (axis_indices * spacing) + (spacing / 2.0)

    # meshgrid creates dense coordinate matrices
    # indexing='ij' ensures matrix indexing (row, col, depth) rather than Cartesian (x, y)
    x, y, z = np.meshgrid(axis_coords, axis_coords, axis_coords, indexing="ij")

    # Stack combines them into shape (M, M, M, 3)
    # Reshape flattens the grid dimensions into a list of points: (M*M*M, 3)
    lattice = np.stack((x, y, z), axis=-1).reshape(-1, 3)

    # The lattice will likely have slightly more slots (M^3) than N.
    # We just slice off the excess.
    return lattice[:n]


def initialize_velocities(n: int, temp: float, mass: float) -> npt.NDArray[np.float64]:
    """
    Initializes velocities based on the Maxwell-Boltzmann distribution.

    This function generates random velocities from a Gaussian distribution
    defined by temperature and mass. It also subtracts the center-of-mass
    velocity to ensure the system does not drift.

    Args:
        n: Number of particles.
        temp: Temperature in Kelvin.
        mass: Particle mass in AMU.

    Returns:
        Shape (n, 3) containing vx, vy, vz in Angstroms/fs.
    """
    velocities = np.zeros((n, 3))

    # Calculate Standard Deviation (Sigma) for the Maxwell-Boltzmann distribution.
    sigma = np.sqrt((KB * temp) / mass) * CONV_SQRT_KJG_TO_AFS

    # Generate random velocities from normal distribution
    velocities = np.random.normal(loc=0.0, scale=sigma, size=(n, 3))

    # Remove Center of Mass (CoM) drift to keep box stationary
    average_velocity = np.mean(velocities, axis=0)
    velocities -= average_velocity

    return velocities


def compute_forces_and_energy(
    pos: npt.NDArray[np.float64], box_len: float, epsilon: float, sigma: float
) -> tuple[npt.NDArray[np.float64], float]:
    """
    Computes the Lennard-Jones forces and total potential energy.

    Args:
        pos: Shape (n, 3) positions in Angstroms.
        box_len: Box side length in Angstroms.
        epsilon: LJ well depth in kJ/mol.
        sigma: LJ zero-crossing distance in Angstroms.

    Returns:
        Shape (n, 3) forces in (kJ/mol/A).
        Total system potential energy in kJ/mol.
    """
    # Compute Distance Vectors (Broadcasting)
    # Shape becomes (n, n, 3). delta[i, j] is the vector pointing from j to i
    delta = pos[:, np.newaxis, :] - pos[np.newaxis, :, :]

    # TODO: Implement Minimum Image Convention (PBC)
    # Ensure that delta reflects the shortest distance across periodic boundaries.
    # Hint: Use np.round().
    # delta -= ...
    raise NotImplementedError("Implement Minimum Image Convention")

    # Compute Squared Distances
    r_sq = np.sum(delta**2, axis=2)

    # Handle Self-Interaction (Diagonal)
    np.fill_diagonal(r_sq, np.inf)

    # TODO: Calculate Force Scalar and Potential Energy using LJ 12-6 Potential
    # forces = ...
    # potential_energy = ...

    raise NotImplementedError("Implement LJ Force and Energy calculations")

    return forces, potential_energy


def andersen_thermostat(
    vel: npt.NDArray[np.float64],
    target_temp: float,
    mass: float,
    dt: float,
    collision_freq: float = 0.1,
) -> npt.NDArray[np.float64]:
    """
    Applies the Andersen Thermostat.

    Randomly selects particles to 'collide' with a heat bath, resetting
    their velocities to a Maxwell-Boltzmann distribution.

    Args:
        vel: Current velocities.
        target_temp: Desired temperature (Kelvin).
        mass: Particle mass (AMU).
        dt: Time step (fs).
        collision_freq: Frequency of collisions (1/fs).

    Returns:
        Velocities with stochastic updates applied.
    """
    n_particles = len(vel)

    # Calculate probability of collision for this step
    prob = collision_freq * dt

    # TODO: Implement Stochastic Collision Logic
    # 1. Generate a boolean mask identifying which particles collide (random < prob)
    # 2. For those particles, generate new velocities from a Normal Distribution
    #    (Mean=0, Scale=sigma based on target_temp and mass)
    # 3. Update 'vel' using the mask.

    # hit_mask = ...

    raise NotImplementedError("Implement Andersen Thermostat logic")

    return vel


def calculate_kinetic_energy(vel: npt.NDArray[np.float64], mass: float) -> float:
    """
    Calculates the total Kinetic Energy of the system in kJ/mol.

    Args:
        vel: Shape (n, 3) velocities in Angstrom/fs.
        mass: Particle mass in AMU.

    Returns:
        Total Kinetic Energy in kJ/mol.
    """
    # Square the velocities (v_x^2 + v_y^2 + v_z^2)
    # Result is an array of shape (n,) containing v^2 for each particle
    v_sq = np.sum(vel**2, axis=1)

    # Sum over all particles
    sum_v_sq = np.sum(v_sq)

    # Calculate Total Kinetic Energy
    # Formula: KE = 0.5 * m * sum(v^2)
    ke_total = 0.5 * mass * sum_v_sq * CONV_SIM_ENERGY_TO_KJMOL

    return ke_total


def velocity_verlet_part1(
    pos: npt.NDArray[np.float64],
    vel: npt.NDArray[np.float64],
    forces: npt.NDArray[np.float64],
    mass: float,
    dt: float,
) -> tuple[npt.NDArray[np.float64], npt.NDArray[np.float64]]:
    r"""
    Executes the first half of the Velocity Verlet integration algorithm.

    The Velocity Verlet algorithm is a symplectic integrator that is time-reversible
    and offers second-order stability. This function updates the
    velocities to the half-step ($t + 0.5\Delta t$) and positions to the full
    step ($t + \Delta t$).

    Mathematical formulation:

    $$ v(t + 0.5 \Delta t) = v(t) + 0.5 \cdot a(t) \cdot \Delta t $$

    $$ r(t + \Delta t) = r(t) + v(t + 0.5 \Delta t) \cdot \Delta t $$

    Args:
        pos: The current particle positions.
            Shape: (N_particles, 3). Units: Angstroms.
        vel: The current particle velocities.
            Shape: (N_particles, 3). Units: Angstroms/fs.
        forces: The forces acting on particles at the current positions.
            Shape: (N_particles, 3). Units: kJ/(mol*Angstrom).
        mass: The mass of the particles. Units: Atomic Mass Units (u).
        dt: The integration time step. Units: femtoseconds (fs).

    Returns:
        Positions at $t + \Delta t$.

        Velocities at $t + 0.5\Delta t$.

    Note:
        This function applies `CONV_KJMOL_FORCE_TO_ACC` internally to convert
        Force/Mass into Acceleration (Angstrom/fs^2).
    """
    # Calculate acceleration from current forces
    acc = (forces * CONV_KJMOL_FORCE_TO_ACC) / mass

    # TODO: Implement Part 1 of Velocity Verlet
    # vel_half = ...
    # pos_new = ...

    raise NotImplementedError("Implement Velocity Verlet Part 1")

    return pos_new, vel_half


def velocity_verlet_part2(
    vel_half: npt.NDArray[np.float64],
    forces_new: npt.NDArray[np.float64],
    mass: float,
    dt: float,
) -> npt.NDArray[np.float64]:
    r"""
    Executes the second half of the Velocity Verlet integration algorithm.

    This function completes the integration step by updating the velocities
    from the half-step ($t + 0.5\Delta t$) to the full step ($t + \Delta t$)
    using forces calculated at the new positions.

    Mathematical formulation:

    $$
    v(t + \Delta t) = v(t + 0.5 \Delta t) + 0.5 \cdot a(t + \Delta t) \cdot \Delta t
    $$

    Args:
        vel_half: The velocities at the half-step ($t + 0.5\Delta t$).
            Typically returned by `velocity_verlet_part1`.
            Shape: (N_particles, 3). Units: Angstroms/fs.
        forces_new: The forces calculated at the new positions ($t + \Delta t$).
            Shape: (N_particles, 3). Units: kJ/(mol*Angstrom).
        mass: The mass of the particles.
            Units: Atomic Mass Units (u).
        dt: The integration time step.
            Units: femtoseconds (fs).

    Returns:
        Velocities at the full step $t + \Delta t$.
            Shape: (N_particles, 3). Units: Angstroms/fs.
    """
    acc_new = (forces_new * CONV_KJMOL_FORCE_TO_ACC) / mass

    # TODO: Implement Part 2 of Velocity Verlet
    # vel_new = ...

    raise NotImplementedError("Implement Velocity Verlet Part 2")

    return vel_new


# ==================================================================================
# MAIN SIMULATION LOOP
# ==================================================================================


def main():
    # Set seed for reproducibility
    np.random.seed(STUDENT_ID)

    print(f"Starting Simulation with {N_PARTICLES} atoms for {N_STEPS} steps.")

    # Initialization
    pos = initialize_positions(N_PARTICLES, BOX_SIZE)
    vel = initialize_velocities(N_PARTICLES, TEMPERATURE, MASS)

    traj_pos = np.zeros((N_STEPS, N_PARTICLES, 3), dtype=np.float64)
    traj_vel = np.zeros((N_STEPS, N_PARTICLES, 3), dtype=np.float64)

    stats_data = {
        "step": np.zeros(N_STEPS, dtype=np.uint32),
        "time_fs": np.zeros(N_STEPS, dtype=np.float64),
        "potential_energy": np.zeros(N_STEPS, dtype=np.float64),
        "kinetic_energy": np.zeros(N_STEPS, dtype=np.float64),
        "total_energy": np.zeros(N_STEPS, dtype=np.float64),
        "temperature": np.zeros(N_STEPS, dtype=np.float64),
    }

    # Initial force calculation
    # Forces are in (kJ/mol / Angstrom)
    try:
        forces, pe = compute_forces_and_energy(pos, BOX_SIZE, EPSILON, SIGMA)
    except NotImplementedError as e:
        print(f"Stopping: {e}")
        return

    # --------------------------------------------------------------------------
    # TIME INTEGRATION LOOP
    # --------------------------------------------------------------------------
    for step in range(N_STEPS):
        try:
            pos, vel_half = velocity_verlet_part1(pos, vel, forces, MASS, DT)
            pos = np.mod(pos, BOX_SIZE)  # Periodic Boundary Conditions
            forces_new, pe = compute_forces_and_energy(pos, BOX_SIZE, EPSILON, SIGMA)
            vel = velocity_verlet_part2(vel_half, forces_new, MASS, DT)
            vel = andersen_thermostat(vel, TEMPERATURE, MASS, DT, collision_freq=0.1)
        except NotImplementedError as e:
            print(f"\nSimulation loop failed at step {step}: {e}")
            print("Please implement the missing physics functions.")
            return

        # Update references for next loop
        forces = forces_new

        # Store Data
        traj_pos[step] = pos
        traj_vel[step] = vel

        # Calculate Statistics
        ke = calculate_kinetic_energy(vel, MASS)
        dof = 3 * N_PARTICLES - 3
        current_temp = (2.0 * ke) / (dof * KB)

        stats_data["step"][step] = step
        stats_data["time_fs"][step] = step * DT
        stats_data["potential_energy"][step] = pe
        stats_data["kinetic_energy"][step] = ke
        stats_data["total_energy"][step] = pe + ke
        stats_data["temperature"][step] = current_temp

        if step % 100 == 0:
            print(f"Step {step}/{N_STEPS} complete. Temp: {current_temp:.2f} K")

    print(f"Saving trajectory to {ZARR_FILENAME}...")
    store = zarr.storage.LocalStore(ZARR_FILENAME)
    root = zarr.group(store=store, overwrite=True)
    root.create_array("positions", data=traj_pos)
    root.create_array("velocities", data=traj_vel)

    print(f"Saving statistics to {PARQUET_FILENAME}...")
    df = pl.DataFrame(stats_data)
    df.write_parquet(PARQUET_FILENAME)
    print("Done.")


if __name__ == "__main__":
    main()

The Development Workflow

High-performance computing (HPC) resources are powerful but also shared and remote. You do not have the luxury of a graphical interface, and you often have to wait in a queue for your code to run. If you wait two hours for a job to start, only to have it crash instantly because of a single typo, you have wasted your day.

To avoid this frustration, professional engineers follow a strict protocol: Develop Locally, Deploy Remotely.

This means you must write, test, and debug your logic on your own personal computer before you ever log in to the cluster. The cluster is for production runs, heavy calculations that take hours, not for finding syntax errors.

  1. Create the skeleton script lj-sim.py to your personal machine.
  2. Open the script and implement your functions.
  3. If the script crashes, fix the logic on your machine. Continue this cycle until the script runs to completion and produces the output files.
  4. Write your SLURM submission script and enqueue the job.

The Deliverable

Please submit the following files to the corresponding gradescope assignment.

  • The completed lj-sim.py file with all requested implementations.
  • The submit.slurm file that you used to submit the sbatch job. The script must charge the biosc1640-2026s account and call the crc-job-stats command at the end.
  • The output file produced by your completed sbatch job.
Last updated on