Implementing new functionalities for the LAMMPS GPU Package

Suppose that you have already implemented a pair style class named PairFoo and now want a new class, PairFooGPU, for GPU acceleration. The following steps guide you through adding a new pair style to the GPU package in LAMMPS. New source files go in two places: the Foo class and its exported functions go in the GPU library in lib/gpu (i.e. libgpu.a), and the new /gpu styles go in src/GPU.

1. Addition to the GPU library under `lib/gpu`

You will need to add/implement four source files:

lal_foo.h: header of the class Foo
lal_foo.cpp: contains the implementation of the class Foo
lal_foo.cu: contains the GPU kernel(s) for the force compute, where the computation mirrors what you have in PairFoo.
lal_foo_ext.cpp: creates an instance of the Foo class and exports the necessary functions to be invoked by the GPU version of the pair style to initialize a Foo instance.

A good starting point for a new pair style is the set of corresponding files for the Gauss class in lib/gpu. They show how the per-type arrays are declared and allocated (lal_gauss.h and lal_gauss.cpp), how the kernels are implemented (lal_gauss.cu) and how the exported functions are defined (lal_gauss_ext.cpp).

The class Gauss is derived from BaseAtomic, which handles the host-device transfers of the atom properties (positions and types) and the force computation in two compute() functions. One compute() function is invoked when the atom neighbor lists are copied from the host, the other (with the sub-domain coordinate arguments sublo and subhi) when the atom neighbor lists are built on the device. Once the neighbor lists are ready, both compute() functions invoke the loop() function to launch the force compute kernels. Any child class of BaseAtomic, Gauss in this case, needs to override this virtual loop() function. The pair force kernels are defined in the .cu file: the _fast kernel is invoked when all the pair coefficients can be loaded into the thread block shared memory, i.e., when the number of atom types is smaller than 12.
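The control flow described above can be sketched in plain C++. This is only an illustration of the pattern, not the BaseAtomic interface itself: the class names BaseAtomicSketch and FooSketch, the kernel names k_foo and k_foo_fast, and the member names are hypothetical, and the device work is replaced by simple bookkeeping.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for BaseAtomic: each compute() overload prepares the
// neighbor list in its own way, then delegates to the virtual loop().
class BaseAtomicSketch {
public:
  virtual ~BaseAtomicSketch() = default;

  // Case 1: the neighbor list was built on the host and is copied over.
  void compute(const std::vector<int> &host_neighbors) {
    neighbors_ = host_neighbors;   // stands in for the host-to-device copy
    neighbor_source_ = "host";
    loop();
  }

  // Case 2: the neighbor list is built on the device from the sub-domain bounds.
  void compute(const double sublo[3], const double subhi[3]) {
    (void)sublo;                   // unused in this sketch
    (void)subhi;
    neighbors_.clear();            // stands in for the device-side build
    neighbor_source_ = "device";
    loop();
  }

  const std::string &neighbor_source() const { return neighbor_source_; }

protected:
  // Child classes override this to launch their force kernels.
  virtual void loop() = 0;
  std::vector<int> neighbors_;

private:
  std::string neighbor_source_;
};

// Hypothetical child class playing the role of Foo (or Gauss).
class FooSketch : public BaseAtomicSketch {
public:
  explicit FooSketch(int ntypes) : ntypes_(ntypes) {}
  const std::string &last_kernel() const { return last_kernel_; }
  int launches() const { return launches_; }

protected:
  void loop() override {
    // Threshold quoted in the text above: fewer than 12 atom types means the
    // coefficients fit in shared memory and the _fast kernel is chosen.
    last_kernel_ = (ntypes_ < kMaxSharedTypes) ? "k_foo_fast" : "k_foo";
    ++launches_;                   // a real loop() would launch the kernel here
  }

private:
  static constexpr int kMaxSharedTypes = 12;
  int ntypes_;
  int launches_ = 0;
  std::string last_kernel_;
};
```

Keeping the neighbor-list handling in the base class means a new pair style only has to supply loop() and its kernels.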

The Gauss class also needs to allocate and deallocate memory for the arrays that store pair coefficients, and transfer them to the device in the init() member function. These arrays are data structures provided by the Geryon library that allow for efficient memory management and data transfer between host and device.
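As a rough illustration of what init() has to prepare, the sketch below flattens per-type coefficient tables into one contiguous host array that could then be moved to the device in a single transfer. It does not use the Geryon containers; Coeff4, pack_coeffs and coeff_index are hypothetical names, and the choice of three coefficients (a, b, cutsq) is an assumption made for the example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical 4-component record: one entry per (itype, jtype) pair.
struct Coeff4 {
  float x, y, z, w;
};

// Row-major index of the (itype, jtype) entry in the flattened table.
inline std::size_t coeff_index(int itype, int jtype, int ntypes) {
  return static_cast<std::size_t>(itype) * ntypes + jtype;
}

// Flatten three (ntypes x ntypes) host tables into one contiguous array so
// that every coefficient can be transferred in a single copy.
std::vector<Coeff4> pack_coeffs(int ntypes,
                                const std::vector<std::vector<double>> &a,
                                const std::vector<std::vector<double>> &b,
                                const std::vector<std::vector<double>> &cutsq) {
  std::vector<Coeff4> packed(static_cast<std::size_t>(ntypes) * ntypes,
                             Coeff4{0.0f, 0.0f, 0.0f, 0.0f});
  for (int i = 0; i < ntypes; ++i) {
    for (int j = 0; j < ntypes; ++j) {
      Coeff4 &c = packed[coeff_index(i, j, ntypes)];
      c.x = static_cast<float>(a[i][j]);
      c.y = static_cast<float>(b[i][j]);
      c.z = static_cast<float>(cutsq[i][j]);
      c.w = 0.0f;  // unused slot, kept so each record has four components
    }
  }
  return packed;
}
```

A kernel would read the entry for a pair of types with the same row-major index that coeff_index() computes here.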

2. Addition to src/GPU

Next, you will add an entry point in src/GPU that calls the external functions defined in lal_foo_ext.cpp. You will need to create the PairFooGPU class (pair_foo_gpu.h and pair_foo_gpu.cpp), derived from your PairFoo class. pair_gauss_gpu.h and pair_gauss_gpu.cpp are good examples to start with.

In the PairFooGPU class, you will invoke the external functions exported by lal_foo_ext.cpp at the proper places, in the same way that PairGaussGPU uses the functions provided by lal_gauss_ext.cpp for the Gauss class. For instance, in the init_style() function, you will call the exported init function, which invokes the init() function of the Foo class to allocate memory for the pair coefficients and transfer them to the device. In the PairFooGPU::compute() function, you will call the corresponding function to compute the atom forces, potential energy and virial: the _compute() function for "neigh no" (neighbor lists built on the host) or the _compute_n() function for "neigh yes" (neighbor lists built on the device). The class destructor will invoke the clear() function to release the memory allocated by the GPU library on both host and device.
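The division of labor between the library and the pair style can be sketched as follows. Everything here is a self-contained mock-up under stated assumptions: the function names foo_gpu_init, foo_gpu_compute, foo_gpu_compute_n and foo_gpu_clear are modeled on the _compute()/_compute_n() naming mentioned above but are hypothetical, their argument lists are reduced to a minimum, and the real exported functions take many more arguments.

```cpp
#include <vector>

// --- Library side (the role of lal_foo_ext.cpp); all names hypothetical ---
struct FooDeviceSketch {
  std::vector<double> coeffs;   // stands in for device-resident coefficients
  bool initialized = false;
  int host_neigh_calls = 0;     // calls made with host-built neighbor lists
  int device_neigh_calls = 0;   // calls made with device-built neighbor lists
};

static FooDeviceSketch g_foo;   // one file-scope instance behind the free functions

int foo_gpu_init(const std::vector<double> &host_coeffs) {
  g_foo.coeffs = host_coeffs;   // stands in for allocation plus transfer
  g_foo.initialized = true;
  return 0;
}
void foo_gpu_compute() { ++g_foo.host_neigh_calls; }
void foo_gpu_compute_n() { ++g_foo.device_neigh_calls; }
void foo_gpu_clear() { g_foo = FooDeviceSketch{}; }
const FooDeviceSketch &foo_gpu_state() { return g_foo; }

// --- Pair style side (the role of pair_foo_gpu.cpp); also hypothetical ---
enum class NeighMode { Host, Device };

class PairFooGPUSketch {
public:
  explicit PairFooGPUSketch(NeighMode mode) : mode_(mode) {}

  // The destructor releases what the library allocated.
  ~PairFooGPUSketch() { foo_gpu_clear(); }

  // init_style() hands the pair coefficients to the library.
  void init_style(const std::vector<double> &coeffs) { foo_gpu_init(coeffs); }

  // compute() picks the exported function that matches the neighbor mode.
  void compute() {
    if (mode_ == NeighMode::Device)
      foo_gpu_compute_n();      // "neigh yes"
    else
      foo_gpu_compute();        // "neigh no"
  }

private:
  NeighMode mode_;
};
```

The pair style never touches the Foo instance directly; it only sees the free functions, which is what lets src/GPU be compiled separately from lib/gpu.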

3. Build LAMMPS with the new pair style

For CMake builds, you may need to modify the file cmake/Modules/Packages/GPU.cmake if the new pair style depends on other packages being installed. You then run cmake with the following commands to build LAMMPS with the GPU package, including your new pair style:


    cmake -S cmake -B build -C cmake/presets/basic.cmake -D PKG_GPU=ON -D GPU_API=cuda -D GPU_PREC=mixed
    cmake --build build -j4

By design, your PairFooGPU class depends on PairFoo, so if PairFoo is part of the MISC package, for example, PairFooGPU will be built only when both -D PKG_MISC=ON and -D PKG_GPU=ON are set.

For GNU make builds, you similarly need to modify Install.sh in src/GPU so that the newly added pair style gets installed when users (you) run "make yes-gpu" or "make package-update" from src/. The lines to be added should look similar to those for pair_gauss_gpu.cpp and pair_gauss_gpu.h. Once you are done implementing the PairFooGPU class, you can copy the source files into src/ and rebuild LAMMPS with the updated GPU package:


    make yes-gpu
    make your_machine

If you need additional per-atom properties in the new pair style, then lal_dpd_coul_slater_long.h, lal_sph_lj.h and lal_base_amoeba.h are good examples.

4. Test the new pair style

To test the new pair style, start with a simple input script, in.foo, that uses your PairFoo class. The input script should require minimal setup, run enough timesteps to trigger neighbor list rebuilds, and avoid irrelevant features (such as computes and fixes). It should complete successfully and produce the expected output, such as a steady potential energy and pressure (as a measure of the atom forces). The log file from a parallel run with 4 MPI processes can be saved as a reference.

The next step is to run the input script with the GPU version of the pair style, using the same number of MPI processes as for the CPU-only run:


    mpirun -np 4 /path/to/lmp -in in.foo -sf gpu -pk gpu 1

The output should be almost identical to that of the CPU-only version at the very first time step, and statistically consistent over time. To be careful, you can rebuild the GPU version in double precision mode (-D GPU_PREC=double) and compare the results with the CPU version. For pair styles that involve transcendental functions, deviations from the CPU-only version over time are unavoidable even in double precision. To debug the GPU pair style, switch to 1 MPI process on 1 GPU and rebuild in debug mode (-D GPU_DEBUG=ON) to get more information about the GPU kernel execution. With the CUDA backend (-D GPU_API=cuda), you can use gdb, as well as printf inside the kernels, to debug the GPU code.
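One way to make "almost identical" concrete is to compare a thermo quantity, such as the potential energy, from the two log files with a relative tolerance. The helper below is a minimal sketch; the function names are hypothetical, extracting the values from the log files is left out, and the tolerance is an assumption that you would choose according to the precision mode of the build.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Relative difference between a reference value and a test value, scaled by
// the larger magnitude; returns 0 when both values are exactly zero.
double relative_difference(double reference, double value) {
  const double scale = std::max(std::fabs(reference), std::fabs(value));
  if (scale == 0.0) return 0.0;
  return std::fabs(reference - value) / scale;
}

// Index of the first sample whose relative difference exceeds tol, or -1 if
// all compared samples agree. Only the common prefix of the two series is
// compared.
long first_mismatch(const std::vector<double> &cpu,
                    const std::vector<double> &gpu, double tol) {
  const std::size_t n = std::min(cpu.size(), gpu.size());
  for (std::size_t i = 0; i < n; ++i) {
    if (relative_difference(cpu[i], gpu[i]) > tol) return static_cast<long>(i);
  }
  return -1;
}
```

Checking the first output step with a tight tolerance and later steps only statistically matches the expectation stated above that trajectories drift apart over time.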

Notes

If you want to add new functionality beyond pair styles, you can look at the PPPM class.
To add new bond/angle/dihedral styles, fixes and computes to the GPU package, you need to provide the GPU kernels and the exported functions that handle allocation/deallocation of the data structures and their host-device transfers, similar to what is done for the new pair style here.

Trung D. Nguyen
