
Overview

Austin's own Advanced Micro Devices (AMD) has most generously donated a number of GPU-enabled servers to UT.

While it is still true that AMD GPUs do not support as many 3rd party applications as NVIDIA, they do support many popular Machine Learning (ML) applications such as TensorFlow, PyTorch, and AlphaFold, and Molecular Dynamics (MD) applications such as GROMACS, all of which are installed and ready for use.

Two BRCF research pods have AMD GPU servers available: the Hopefog and Livestrong pods. Their use is restricted to the groups that own those pods. See Livestrong and Hopefog pod AMD servers for specific information.

The BRCF's AMD GPU pod is available for instructional use and for research use by qualifying UT Austin-affiliated PIs. Allocations are granted only to groups that will perform qualifying GPU-enabled workflows. To request an allocation, contact us at rctf-support@utexas.edu and provide the UT EIDs of those who should be granted access.

The ROCm framework

ROCm is AMD's equivalent to the CUDA framework. ROCm is open source, while CUDA is proprietary.

ROCm versions and GPU type

We have multiple versions of the ROCm framework installed in the /opt directory, designated by a version number extension (e.g. /opt/rocm-5.7.2, /opt/rocm-5.2.3). The default version is the one pointed to by the /opt/rocm symbolic link, which is generally the latest version supported on the specific server.
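To see which ROCm versions a given server has installed, and which one is the default, you can inspect /opt directly. A minimal sketch, assuming the /opt/rocm-* layout described above:

```shell
# List the versioned ROCm installs under /opt (if any are present)
ls -d /opt/rocm-* 2>/dev/null || echo "no versioned ROCm installs found"

# Show which versioned install the default /opt/rocm symlink points to
readlink /opt/rocm 2>/dev/null || echo "/opt/rocm is not a symlink (or not present)"
```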

Livestrong and Hopefog pod AMD GPU servers have MI-50 GPUs (livecomp02/03, hfogcomp02/03). As of May 2024, the highest ROCm version supported for the MI-50 GPUs is rocm-5.7.2, which is the last minor version in the ROCm 5.x series. ROCm 5.7.2 is the default ROCm for these MI-50 servers, but a lower ROCm version may be selected.

AMD GPU pod servers have MI-100 GPUs (amdgcomp01/02/03), which support the newer ROCm 6.x series. As of July 2025, the default ROCm for these MI-100 servers is ROCm 6.3.1.

Changing ROCm versions

To specify a particular ROCm version other than the default, set the ROCM_PATH environment variable; for example:

Code Block
export ROCM_PATH=/opt/rocm-5.1.3

You may also need to adjust your LD_LIBRARY_PATH as follows:

Code Block
export LD_LIBRARY_PATH="/opt/rocm-5.1.3/hip/lib:$LD_LIBRARY_PATH"
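One way to keep the two settings consistent is to derive both from a single version variable. A sketch; rocm-5.1.3 is just an example version, so substitute one actually installed under /opt on your server:

```shell
# Pick a ROCm version installed under /opt (example value; adjust as needed)
ROCM_VERSION=5.1.3

# Point ROCM_PATH at the versioned install, then prepend its HIP libraries
# to the dynamic linker search path
export ROCM_PATH="/opt/rocm-${ROCM_VERSION}"
export LD_LIBRARY_PATH="${ROCM_PATH}/hip/lib${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}"

echo "${ROCM_PATH}"   # → /opt/rocm-5.1.3
```

The `${LD_LIBRARY_PATH:+:...}` expansion avoids leaving a stray leading colon when LD_LIBRARY_PATH was previously unset.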

GPU-enabled software

AlphaFold

The AlphaFold protein structure prediction software is available on all AMD GPU servers. The /stor/scratch/AlphaFold directory contains the large required database under the data.4 sub-directory. There is also an AMD example script, /stor/scratch/AlphaFold/alphafold_example_amd.sh, and an alphafold_example_nvidia.sh script on pods that also have NVIDIA GPUs (e.g. the Hopefog pod).

On AMD GPU servers, AlphaFold is implemented by a run_alphafold.py Python script inside a Docker image. See the run_alphafold_rocm.sh and run_multimer_rocm.sh scripts under /stor/scratch/AlphaFold for a complete list of options to that script.

PyTorch and TensorFlow

All pod compute servers have three main Python environments, each managed separately (see About Python and JupyterHub server for more information about these environments):

  • command-line Python 2.7 (python2.7, pip2.7)
  • command-line Python 3.12 (python, python3, python3.12, pip, pip3, pip3.12)
  • web-based JupyterHub which uses the Python 3.12 kernel

We are working to make AMD-GPU-enabled versions of TensorFlow and PyTorch available in all three environments. The current status in each environment is as follows:

GPU-enabled PyTorch is available from command-line python3 on all AMD GPU servers. The status of GPU-enabled TensorFlow on each pod:

  • AMD GPU: command-line python3
  • Livestrong: command-line python3 — you must do this first:
      source /stor/scratch/GPU_info/activate_tensorflow_amd_conda
    AMD-GPU-enabled TensorFlow in JupyterHub is not supported.
  • Hopefog: command-line python3 (currently python3.8; an upgrade is coming soon, after which it will be the same as the Livestrong pod)
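A quick way to check whether the PyTorch in a given environment actually sees the AMD GPUs is a short probe; ROCm builds of PyTorch report their GPUs through the usual torch.cuda API. A sketch — PyTorch may not be installed in every environment:

```shell
# Probe the command-line python3 environment for a GPU-enabled PyTorch.
# ROCm builds of PyTorch expose AMD GPUs via the torch.cuda interface.
python3 - <<'EOF'
try:
    import torch
    print("PyTorch", torch.__version__, "- GPU available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch is not installed in this environment")
EOF
```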

PyTorch/TensorFlow example scripts

...

  • benchmarks/ - a set of MD benchmark files from https://www.mpinat.mpg.de/grubmueller/bench
  • gromacs_amd_example.sh - a simple GROMACS example script taking advantage of the GPU, running the benchMEM.tpr benchmark by default.
  • gromacs_cpu_example.sh - a GROMACS example script using the CPUs only.
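For orientation, the kind of invocation a GPU-enabled GROMACS script wraps looks roughly like the sketch below; this is illustrative only (the actual options are in gromacs_amd_example.sh itself), and the gmx binary exists only on the compute servers, so the call is guarded:

```shell
# Illustrative sketch of a GPU-offloaded GROMACS run on the benchMEM input:
# "-nb gpu" asks mdrun to compute nonbonded interactions on the GPU.
# See gromacs_amd_example.sh for the real, complete set of options.
if command -v gmx >/dev/null 2>&1; then
    gmx mdrun -s benchMEM.tpr -nb gpu
else
    echo "gmx not found; run this on one of the AMD GPU servers"
fi
```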

Resources



Command-line diagnostics

  • GPU usage: rocm-smi
  • CPU and GPU details: rocminfo
  • What ROCm modules are installed: dpkg -l | grep rocm
  • GPU ↔ GPU/CPU communication bandwidth test
    • between GPU2 and CPU: rocm-bandwidth-test -b2,0
    • between GPU3 and GPU4: rocm-bandwidth-test -b3,4
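These diagnostics can be combined into a small status snippet, for example when checking a server before launching a job. A sketch; each command is guarded because the rocm-* tools are only present on the GPU servers:

```shell
# Minimal GPU status report; each tool is guarded because the rocm-*
# utilities exist only on the AMD GPU servers.
for tool in rocm-smi rocminfo; do
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $tool =="
        "$tool" | head -n 20
    else
        echo "$tool: not installed on this machine"
    fi
done

# Installed ROCm packages (Debian/Ubuntu package database)
dpkg -l 2>/dev/null | grep -i rocm || echo "no rocm packages found via dpkg"
```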

...

...