# UOA Cloud Server: Setup and Usage Guide
This guide walks through everything you need to start running deep learning workloads on the UOA compute cluster — from initial login to submitting GPU jobs and monitoring training.
## 0. First-Time Setup

When you first get access to your machine, point your data and home directories at the large `/data/<UPI>/` partition so that datasets, checkpoints, and virtual environments don't fill up the small system disk.
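A minimal sketch of that setup, assuming `/data/<UPI>` is the mount point (the directory names and environment variables below are illustrative, not prescribed by the cluster):

```shell
# Replace jdoe001 with your own UPI
export UPI=jdoe001
export DATA_DIR="/data/$UPI"

# Create working directories on the large partition
mkdir -p "$DATA_DIR/datasets" "$DATA_DIR/checkpoints" "$DATA_DIR/venvs"

# Redirect common caches off the small system disk (names assume pip/Hugging Face)
export PIP_CACHE_DIR="$DATA_DIR/pip_cache"
export HF_HOME="$DATA_DIR/hf_cache"
```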
Tip: add these lines to your `~/.bashrc` or `~/.zshrc` so they persist across sessions.
## 1. SSH Login
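A typical login command; the hostname below is a placeholder, so substitute the address you were given:

```shell
# Replace <UPI> with your UPI and the hostname with the actual server address
ssh <UPI>@uoa-server.example.ac.nz
```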
Replace `<UPI>` with your University of Auckland UPI (e.g., `jdoe001`).
## 2. Python Environment with uv

We use uv for fast, reproducible Python environment management.

### Install uv
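uv's standalone installer is the usual route on a machine where you lack root access:

```shell
# Download and run the official uv installer (installs into ~/.local/bin)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Verify the install
uv --version
```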
### Set Up an Environment

If the project has a `pyproject.toml`:
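For example (the project path is an assumption; use your own checkout location):

```shell
cd /data/$UPI/my-project      # assumed project location
uv sync                       # creates .venv/ and installs all dependencies
source .venv/bin/activate
```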
uv creates the virtual environment and installs all dependencies in one step.
If the project only has a `requirements.txt`:
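In that case, create the environment and install into it in two steps:

```shell
uv venv .venv                        # create an empty virtual environment
source .venv/bin/activate
uv pip install -r requirements.txt   # install the pinned dependencies
```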
## 3. Submitting GPU Jobs with SLURM

To run training on a GPU, submit a shell script to the SLURM scheduler; don't run heavy workloads directly on the login node.

### Example Job Script
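A sketch of a single-GPU job script; the resource values, partition defaults, and file paths are assumptions, so adjust them to your cluster's configuration:

```shell
#!/bin/bash
#SBATCH --job-name=train
#SBATCH --gres=gpu:1              # request one GPU
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=12:00:00
#SBATCH --output=slurm-%j.out     # %j expands to the job ID

# Activate the project environment (assumed path) and launch training
source /data/$USER/my-project/.venv/bin/activate
python train.py --data /data/$USER/datasets
```

Submit it with `sbatch`, and SLURM writes stdout/stderr to the `--output` file as the job runs.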
### Common SLURM Commands

| Action | Command |
|---|---|
| Submit a job | `sbatch <script>.sh` |
| Check your jobs | `squeue -u $USER` |
| Cancel a job | `scancel <job_id>` |
## 4. Monitoring Your Workflow

Watch your job queue in real time:
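For example:

```shell
# Refresh the queue view every 2 seconds (Ctrl-C to exit)
watch -n 2 squeue -u $USER
```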
Tail a log file to follow output or errors live:
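For example, assuming SLURM's default-style output filename:

```shell
# Stream new lines as the job writes them (Ctrl-C to stop)
tail -f slurm-<job_id>.out
```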
This is the fastest way to debug a running job without waiting for it to finish.
## 5. Downloading Output Files

For terminal users who want to pull results back to a local machine:
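A sketch of the command, run from your local machine; the hostname and remote path are placeholders:

```shell
# Copy a checkpoint from the server to the current local directory
scp <UPI>@uoa-server.example.ac.nz:/data/<UPI>/checkpoints/model.pt ./
```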
The pattern is `scp <user>@<host>:<remote_path> <local_path>`.