Description

This academic paper, published September 25, 2025, **evaluates the performance and portability** of the novel **Mojo programming language** for high-performance computing (**HPC**) scientific kernels on modern GPUs. The researchers compare Mojo's performance against vendor-specific baselines (**CUDA on NVIDIA H100** and **HIP on AMD MI300A** GPUs) using four workloads: two memory-bound (a seven-point stencil and BabelStream) and two compute-bound (miniBUDE and Hartree–Fock). The paper finds that Mojo's performance is highly competitive for memory-bound kernels, particularly on AMD GPUs, but notes performance gaps in compute-bound kernels due to the current **lack of fast-math optimizations** and limitations with atomic operations. Overall, the work suggests Mojo has significant potential to **close performance and productivity gaps** in the fragmented Python ecosystem by leveraging its **MLIR-based compile-time** architecture for GPU programming.
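To give a sense of what a memory-bound workload looks like, here is a minimal NumPy sketch of a seven-point stencil (this is an illustration, not the paper's Mojo, CUDA, or HIP kernels; the coefficients `c0` and `c1` are illustrative assumptions). Each output point is a weighted sum of a cell and its six axis neighbors, so the kernel reads far more memory than it computes, which is why stencils are typically limited by memory bandwidth rather than arithmetic throughput.

```python
import numpy as np

def seven_point_stencil(u, c0=0.4, c1=0.1):
    """One seven-point stencil sweep over the interior of a 3D grid.

    Each interior cell is updated from itself and its six axis
    neighbors (x +/- 1, y +/- 1, z +/- 1); boundary cells are left
    unchanged. Reading seven values to produce one output gives the
    low arithmetic intensity typical of memory-bound kernels.
    Coefficients are illustrative, not taken from the paper.
    """
    out = u.copy()
    out[1:-1, 1:-1, 1:-1] = (
        c0 * u[1:-1, 1:-1, 1:-1]
        + c1 * (u[:-2, 1:-1, 1:-1] + u[2:, 1:-1, 1:-1]    # x neighbors
                + u[1:-1, :-2, 1:-1] + u[1:-1, 2:, 1:-1]  # y neighbors
                + u[1:-1, 1:-1, :-2] + u[1:-1, 1:-1, 2:]) # z neighbors
    )
    return out

# A point source spreads to its neighbors after one sweep.
grid = np.zeros((5, 5, 5))
grid[2, 2, 2] = 1.0
result = seven_point_stencil(grid)
```

A GPU implementation assigns one thread per interior cell and performs the same seven reads and one write, so tuning centers on memory access patterns (coalescing, caching) rather than floating-point tricks, which matches the paper's finding that Mojo is already competitive on this class of kernel.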

Source:

https://www.arxiv.org/pdf/2509.21039