c. Run WRF

Create a Slurm sbatch script to run the CONUS 12-km model:

cd /shared/conus_12km/
cat > slurm-wrf-conus12km.sh << EOF
#!/bin/bash

#SBATCH --job-name=WRF
#SBATCH --output=conus-%j.out
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --exclusive

spack load intel-oneapi-mpi
spack load wrf
module load libfabric-aws
wrf_exe=\$(spack location -i wrf)/run/wrf.exe
set -x
ulimit -s unlimited
ulimit -a

export OMP_NUM_THREADS=6
export FI_PROVIDER=efa
export I_MPI_FABRICS=ofi
export I_MPI_OFI_LIBRARY_INTERNAL=0
export I_MPI_OFI_PROVIDER=efa
export I_MPI_PIN_DOMAIN=omp
export KMP_AFFINITY=compact
export I_MPI_DEBUG=4

time mpiexec.hydra -np \$SLURM_NTASKS --ppn \$SLURM_NTASKS_PER_NODE \$wrf_exe
EOF
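
The backslashes before `$` in the script above matter because the heredoc delimiter `EOF` is unquoted: the shell that runs `cat` expands any unescaped `$...` while it writes the file, before Slurm ever sees it. The following is a minimal sketch of that difference. The temporary directory, the two file names, and the placeholder value are made up for the demonstration and are not part of the workshop.

```shell
# Sketch: how an unquoted heredoc treats escaped and unescaped variables.
# demo_dir, the file names, and the placeholder value are hypothetical.
demo_dir=$(mktemp -d)
SLURM_NTASKS="set-at-write-time"

# Unescaped: the current shell substitutes the value while writing the file.
cat > "$demo_dir/unescaped.sh" << EOF
echo $SLURM_NTASKS
EOF

# Escaped: the literal variable reference is written, so it is expanded
# later, when the generated script itself runs.
cat > "$demo_dir/escaped.sh" << EOF
echo \$SLURM_NTASKS
EOF

cat "$demo_dir/unescaped.sh"   # prints: echo set-at-write-time
cat "$demo_dir/escaped.sh"     # prints: echo $SLURM_NTASKS
```

Running `cat slurm-wrf-conus12km.sh` after creating the job script is a quick way to confirm that `$SLURM_NTASKS`, `$SLURM_NTASKS_PER_NODE`, and `$wrf_exe` appear literally in the file.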

The job script above sets environment variables so that Intel MPI ranks and OpenMP threads are pinned to the correct cores and EFA is enabled:

| Environment Variable | Description |
| --- | --- |
| OMP_NUM_THREADS=6 | Number of OpenMP threads per MPI rank. Each node runs 16 MPI ranks with 6 OpenMP threads each, using all 16 x 6 = 96 cores. |
| FI_PROVIDER=efa | Enables EFA. This tells libfabric to use the EFA fabric. |
| I_MPI_FABRICS=ofi | Tells Intel MPI to use libfabric. |
| I_MPI_OFI_LIBRARY_INTERNAL=0 | Tells Intel MPI to use the version of libfabric packaged on the OS rather than the one bundled with Intel MPI. |
| I_MPI_OFI_PROVIDER=efa | Tells Intel MPI to use the libfabric efa provider. |
| I_MPI_PIN_DOMAIN=omp | Sets the pinning domain size equal to OMP_NUM_THREADS, so that each MPI rank is associated with its own non-overlapping domain. |
| KMP_AFFINITY=compact | Assigns OpenMP thread N+1 to a free thread context as close as possible to the thread context where OpenMP thread N was placed. |
| I_MPI_DEBUG=4 | Prints debugging information, including process pinning. |
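
The layout arithmetic behind these settings can be checked with a few lines of shell. This is only a sketch: the variable names are illustrative, and the rank-to-core mapping in the loop assumes that pin domains are laid out consecutively on each node, which may not match the actual placement. The pinning report printed by `I_MPI_DEBUG=4` in the job output is the authoritative source.

```shell
# Sketch: hybrid MPI + OpenMP layout arithmetic for this job.
nodes=2              # from #SBATCH --nodes
ranks_per_node=16    # from #SBATCH --ntasks-per-node
omp_threads=6        # from OMP_NUM_THREADS

cores_per_node=$((ranks_per_node * omp_threads))
total_ranks=$((nodes * ranks_per_node))
total_cores=$((nodes * cores_per_node))

echo "MPI ranks:      $total_ranks"      # 32
echo "Cores per node: $cores_per_node"   # 96
echo "Total cores:    $total_cores"      # 192

# Assumption: domains are consecutive, so node-local rank r would own
# cores r*omp_threads through r*omp_threads + omp_threads - 1.
for rank in 0 1 15; do
  first=$((rank * omp_threads))
  last=$((first + omp_threads - 1))
  echo "local rank $rank -> cores $first-$last"
done
```

Under that assumption the last rank on a node (local rank 15) would cover cores 90-95, so the 16 domains tile all 96 cores without overlap.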

Submit the job:

sbatch slurm-wrf-conus12km.sh
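
On success, sbatch prints a confirmation line of the form `Submitted batch job <id>`, and that ID is what replaces `%j` in the output file name `conus-%j.out`. The sketch below shows one way to pull the ID out of that line. The helper name `job_id_from_sbatch` and the job number in the sample line are hypothetical.

```shell
# Sketch: extract the job ID from sbatch's confirmation line.
# job_id_from_sbatch is a hypothetical helper, not part of Slurm.
job_id_from_sbatch() {
  # The confirmation reads "Submitted batch job <id>"; the ID is field 4.
  printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Sample confirmation line with a made-up job number.
sample="Submitted batch job 12345"
job_id=$(job_id_from_sbatch "$sample")

echo "Job ID:      $job_id"
echo "Output file: conus-${job_id}.out"
```

On the cluster, the same helper could wrap the real submission, for example `job_id=$(job_id_from_sbatch "$(sbatch slurm-wrf-conus12km.sh)")`. Alternatively, `sbatch --parsable` prints the job ID without the surrounding text.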

Monitor the job’s status with squeue:

squeue

Using 192 cores (2 nodes x 16 MPI ranks x 6 OpenMP threads), the job took 4 minutes 17 seconds to complete.
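
The run time comes from the `time` prefix on the `mpiexec.hydra` line. Bash writes its `real`, `user`, and `sys` timings to standard error, which Slurm sends to the same `conus-<jobid>.out` file when only `--output` is given. The sketch below converts a `real` value to whole seconds for easier comparison between runs. The helper name `real_to_seconds` is hypothetical, and the sample line is reconstructed from the run time quoted above with a made-up fractional part.

```shell
# Sketch: convert bash's "real XmY.YYYs" timing into whole seconds.
# real_to_seconds is a hypothetical helper; the fraction is truncated.
real_to_seconds() {
  # Split on "m" and "s": field 1 is minutes, field 2 is seconds.
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%d\n", $1 * 60 + $2 }'
}

# Sample timing line (bash separates the fields with a tab; awk's default
# field splitting handles tabs and spaces alike).
sample_line="real 4m17.000s"
elapsed=$(printf '%s\n' "$sample_line" | awk '$1 == "real" { print $2 }')

echo "Elapsed: $(real_to_seconds "$elapsed") seconds"   # 257
```

Against a real run, the sample line would be replaced by the job's output file, for example `awk '$1 == "real" { print $2 }' conus-<jobid>.out`.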