Publication

Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method

Abstract

Nonuniform grid refinement plays a fundamental role in simulating realistic flows with a multitude of length scales. We introduce the first GPU-optimized implementation of this technique in the context of the lattice Boltzmann method (LBM). Our approach focuses on enhancing GPU performance while minimizing memory access bottlenecks. We employ kernel fusion techniques to optimize memory access patterns, reduce synchronization overhead, and minimize kernel launch latencies. Additionally, our implementation ensures efficient memory management, resulting in lower memory requirements than baseline LBM implementations that were designed for distributed systems. By enabling grid refinement on a single GPU, our implementation allows simulations of unprecedented domain size (e.g., 1596 x 840 x 840) on one A100 40 GB GPU. We validate our code against published experimental data. Our optimization improves the performance of the baseline algorithm by 1.3x to 2x. We also compare against current state-of-the-art solutions for grid refinement LBM and show an order of magnitude speedup.
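To give a rough sense of why the quoted domain size is notable for a 40 GB device, the following back-of-the-envelope sketch estimates the memory a uniform grid of that size would need. The abstract does not state the lattice model, the floating-point precision, or the number of population buffers, so the D3Q19 lattice (19 populations per cell), single precision, and single buffer below are assumptions chosen to give a conservative lower bound. It also assumes the quoted size refers to the effective resolution at the finest level, and it treats 40 GB as 40 x 10^9 bytes.

```python
def uniform_grid_bytes(nx, ny, nz, q=19, bytes_per_value=4, copies=1):
    """Bytes needed to store LBM populations on a uniform nx*ny*nz grid.

    q, bytes_per_value and copies are illustrative assumptions
    (D3Q19, float32, one buffer); they are not taken from the paper.
    """
    return nx * ny * nz * q * bytes_per_value * copies


# Domain size quoted in the abstract.
nx, ny, nz = 1596, 840, 840
cells = nx * ny * nz

# Lower-bound estimate for a uniform grid under the assumptions above.
needed = uniform_grid_bytes(nx, ny, nz)

# Nominal capacity of an A100 40 GB GPU, taken here as 40 * 10^9 bytes.
gpu_capacity = 40 * 10**9

print(f"cells: {cells:,}")
print(f"uniform-grid populations: {needed / 1e9:.1f} GB")
print(f"fits on one 40 GB GPU: {needed <= gpu_capacity}")
```

Under these assumptions, a uniform grid at this resolution needs roughly twice the device capacity for the populations alone, which is consistent with the abstract's point that grid refinement is what makes such a domain feasible on a single GPU.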


Associated Researchers

Ahmed Mahmoud

Senior Research Scientist

Hesam Salehipour

Principal Computational Physics Research Scientist

Massimiliano Meneghin

Senior Principal Research Scientist
