

GPU Cluster User Guide

Quick Start

  1. Log in to the cluster via SSH. (Recommended SSH clients:
    PuTTY or MobaXterm for Windows; iTerm2 or Terminus for Mac.)

    For example:

    ssh [username]@[hostname]

    IMPORTANT! If this is your first time logging in to the system, IMMEDIATELY set a strong password for your account with the command: passwd [username]

  2. Attach to your Docker container using the container ID you received: sudo docker attach [container-id]
  3. You are now a sudo user in the container. Get started with your program!
  4. Detach from the container with the key sequence Ctrl+P followed by Ctrl+Q. Exiting the container (e.g., with the exit command) stops it, so use the detach sequence if you want to keep the container running.

Start the container if you accidentally exit it:
sudo docker start [container-id]

Restart the container if required:
sudo docker restart [container-id]

Copy files from or to the container:
sudo docker cp [OPTIONS] [container-id]:[src_path] [dest_path]
sudo docker cp [OPTIONS] [src_path] [container-id]:[dest_path]

More details at https://docs.docker.com/engine/reference/commandline/cp/
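
When several files have to be copied in one pass, the docker cp calls above can be scripted. The following is a minimal Python sketch, not part of the cluster tooling: the helper names docker_cp_command and copy_from_container, and the container ID and paths in the usage comment, are hypothetical. It assumes sudo docker works for your account as in the commands above (sudo may prompt for your password).

```python
import subprocess


def docker_cp_command(container_id, src_path, dest_path, to_container=False):
    """Build the argument list for one 'sudo docker cp' call.

    to_container=False copies container -> host (the first form above);
    to_container=True copies host -> container (the second form above).
    """
    if to_container:
        return ["sudo", "docker", "cp", src_path, f"{container_id}:{dest_path}"]
    return ["sudo", "docker", "cp", f"{container_id}:{src_path}", dest_path]


def copy_from_container(container_id, src_paths, dest_dir):
    """Copy each listed path out of the container into dest_dir on the host."""
    for src_path in src_paths:
        # check=True raises CalledProcessError if docker cp fails.
        subprocess.run(docker_cp_command(container_id, src_path, dest_dir), check=True)


# Hypothetical usage (the container ID and paths are placeholders):
# copy_from_container("abc123", ["/workspace/run1.log", "/workspace/model.pt"], "./results")
```

The command is passed as an argument list rather than a shell string, so paths containing spaces need no extra quoting.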

If you need a large amount of storage (>200 GB), please contact the administrator to create a volume for your container, so that you do not need to copy the files.

Tips

  1. Want to keep your program running after a network disconnection?
    Try a session management tool, e.g., Byobu (https://www.byobu.org/home) or screen (https://www.digitalocean.com/community/tutorials/how-to-install-and-use-screen-on-an-ubuntu-cloud-server).
  2. Check the status of GPU devices
    nvidia-smi
  3. Select GPU devices for your program if multiple GPUs are available (separate multiple IDs with commas)
    CUDA_VISIBLE_DEVICES=[GPU-IDs] python your_program.py
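
Tips 2 and 3 can also be combined inside a Python program. The sketch below is illustrative only: the function names are hypothetical, and it assumes that nvidia-smi is on the PATH and reports memory use as plain integers (MiB). It sets CUDA_DEVICE_ORDER=PCI_BUS_ID so that CUDA numbers the devices in the same order as nvidia-smi.

```python
import os
import subprocess


def parse_gpu_memory(csv_text):
    """Parse 'index, memory.used' CSV rows into (index, used_mib) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        gpus.append((int(index), int(used)))
    return gpus


def least_used_gpu(csv_text):
    """Return the index of the GPU with the least memory in use."""
    gpus = parse_gpu_memory(csv_text)
    if not gpus:
        raise RuntimeError("nvidia-smi reported no GPUs")
    return min(gpus, key=lambda gpu: gpu[1])[0]


def query_nvidia_smi():
    """Ask nvidia-smi for per-GPU memory use as headerless CSV."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def select_gpu():
    """Restrict this process to the least-used GPU and return its index."""
    gpu = least_used_gpu(query_nvidia_smi())
    # Make CUDA enumerate devices in nvidia-smi (PCI bus) order.
    os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu


# Hypothetical usage: call select_gpu() at the very top of your_program.py,
# before importing any CUDA-based framework.
```

The environment variables must be set before CUDA is initialized in the process; changing them afterwards has no effect on which devices the program sees.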