Optimized GPU Implementation of Grid Refinement in Lattice Boltzmann Method


Nonuniform grid refinement plays a fundamental role in simulating realistic flows that span a multitude of length scales. We introduce the first GPU-optimized implementation of this technique in the context of the lattice Boltzmann method (LBM). Our approach focuses on enhancing GPU performance while minimizing memory-access bottlenecks. We employ kernel fusion to optimize memory access patterns, reduce synchronization overhead, and minimize kernel launch latencies. Additionally, our implementation manages memory efficiently, requiring less memory than baseline LBM implementations designed for distributed systems. By enabling grid refinement on a single GPU, our implementation supports simulations of unprecedented domain size (e.g., 1596 × 840 × 840) on a single A100 40 GB GPU. We validate our code against published experimental data. Our optimizations improve the performance of the baseline algorithm by 1.3x to 2x, and we demonstrate an order-of-magnitude speedup over state-of-the-art grid-refinement LBM solutions.
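To illustrate the kernel-fusion idea mentioned above: an unfused LBM step runs collision and streaming as separate passes, materializing a post-collision field in between; a fused step reads each distribution once and writes the streamed result directly, halving memory traffic. The sketch below is a minimal CPU analogue in NumPy on a D2Q9 lattice with periodic boundaries, not the authors' GPU kernel; the function name and setup are illustrative assumptions.

```python
import numpy as np

# D2Q9 lattice: discrete velocities and quadrature weights
c = np.array([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
              [1, 1], [-1, 1], [-1, -1], [1, -1]])
w = np.array([4/9] + [1/9]*4 + [1/36]*4)

def fused_collide_stream(f, omega=1.0):
    """One LBM step with BGK collision and streaming fused into a single pass.

    On a GPU, this fusion means each distribution value is read and written
    once per time step instead of storing an intermediate post-collision
    field between two kernels. Illustrative sketch only (periodic domain).
    """
    rho = f.sum(axis=0)                      # macroscopic density
    u = np.tensordot(c.T, f, axes=1) / rho   # macroscopic velocity (2, nx, ny)
    usq = u[0]**2 + u[1]**2
    f_new = np.empty_like(f)
    for i in range(9):
        cu = c[i, 0] * u[0] + c[i, 1] * u[1]
        feq = w[i] * rho * (1 + 3*cu + 4.5*cu**2 - 1.5*usq)  # equilibrium
        post = f[i] + omega * (feq - f[i])                   # BGK relaxation
        # "push" streaming: shift along the lattice direction immediately,
        # instead of launching a second streaming kernel
        f_new[i] = np.roll(post, shift=(c[i, 0], c[i, 1]), axis=(0, 1))
    return f_new
```

A quick usage check: starting from a uniform equilibrium state with a small perturbation, repeated fused steps conserve total mass, as any correct collide-stream implementation must.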


Related Publications



Neon: A Multi-GPU Programming Model for Grid-based Computations

We present Neon, a new programming model for grid-based computation…



XLB: A Differentiable Massively Parallel Lattice Boltzmann Library in Python

This research introduces the XLB library, a scalable Python-based…



RXMesh: A GPU Mesh Data Structure

We propose a new static high-performance mesh data structure for…
