Tutorials
This section provides step-by-step tutorials for building practical applications with LuisaCompute.
Table of Contents
ShaderToy-Style Mandelbrot - A gentle introduction
Path Tracing Renderer - Global illumination with ray tracing
MPM Fluid Simulation - Material Point Method for fluids
Conway’s Game of Life - Cellular automata with ping-pong buffering
Wave Equation Simulation - Interactive PDE solver
Image Processing Pipeline - Multi-pass filters and effects
N-Body Simulation - Gravitational particle physics
Fire Particle System - Physics-based fire simulation
Reaction-Diffusion - Gray-Scott pattern formation
Voxel Ray Tracer - Real-time voxel rendering
Black Hole Renderer - Interstellar-style gravitational lensing
ShaderToy-Style Mandelbrot Set
Let’s start with a simple yet visually appealing example: rendering the Mandelbrot set. This mimics the style of ShaderToy, a popular platform for fragment shaders.
Final Result
A colorful visualization of the Mandelbrot set with smooth coloring.
Step-by-Step Implementation
Step 1: Project Setup
Create a new project with the following structure:
mandelbrot/
├── CMakeLists.txt
└── main.cpp
CMakeLists.txt:
cmake_minimum_required(VERSION 3.23)
project(mandelbrot LANGUAGES CXX)
set(CMAKE_CXX_STANDARD 20)
# Add LuisaCompute
add_subdirectory(LuisaCompute)
add_executable(mandelbrot main.cpp)
target_link_libraries(mandelbrot PRIVATE luisa::compute)
Step 2: Basic Structure
main.cpp - Initial Setup:
#include <luisa/luisa-compute.h>
#include <luisa/dsl/sugar.h>
#include <stb/stb_image_write.h>
using namespace luisa;
using namespace luisa::compute;
int main(int argc, char *argv[]) {
// Step 1: Create context and device
Context context{argv[0]};
Device device = context.create_device("cuda"); // or "cpu", "metal", "dx"
Stream stream = device.create_stream();
// Step 2: Create an image to render to
static constexpr uint2 resolution = make_uint2(1024u, 1024u);
// BYTE4 storage so the raw 8-bit pixels can be written straight to PNG
Image<float> image = device.create_image<float>(
PixelStorage::BYTE4, resolution.x, resolution.y);
// We'll add the kernel here...
return 0;
}
Step 3: Understanding the Mandelbrot Set
The Mandelbrot set is defined by the iteration:
z_{n+1} = z_n² + c
Where c is a complex number corresponding to pixel coordinates. The point c belongs to the set if the sequence stays bounded forever; since any orbit with |z| > 2 is guaranteed to diverge, we use |z| > 2 as the escape test.
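Before writing the GPU kernel, the escape-time iteration is easy to sanity-check on the CPU. A plain-C++ sketch of the same loop the kernel will run (the function name is just for illustration):

```cpp
#include <cassert>

// CPU reference for the escape-time iteration: returns how many
// iterations z stayed within |z|^2 <= 4 (up to max_iter).
int mandelbrot_iters(double cx, double cy, int max_iter) {
    double zx = 0.0, zy = 0.0;
    int n = 0;
    for (int i = 0; i < max_iter; i++) {
        double nx = zx * zx - zy * zy + cx; // Re(z^2 + c)
        double ny = 2.0 * zx * zy + cy;     // Im(z^2 + c)
        zx = nx;
        zy = ny;
        if (zx * zx + zy * zy > 4.0) break; // escaped
        n++;
    }
    return n;
}
```

Points inside the set (like c = 0 or c = -1) exhaust the iteration budget, while points outside escape quickly; the GPU kernel below is the same loop expressed in the DSL.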
Step 4: Implementing the Kernel
// Inside main(), after creating the image:
Kernel2D mandelbrot_kernel = [&](ImageFloat image) noexcept {
// Get pixel coordinates
UInt2 coord = dispatch_id().xy();
Float2 uv = make_float2(coord) / make_float2(resolution);
// Map to complex plane: centered at (-0.5, 0), zoomed in
Float2 c = (uv - 0.5f) * 3.0f - make_float2(0.5f, 0.0f);
// Mandelbrot iteration
Float2 z = make_float2(0.0f);
Float n = 0.0f;
static constexpr int M = 100; // Maximum iterations
$for (i, M) {
// z = z² + c
Float2 z_new = make_float2(
z.x * z.x - z.y * z.y + c.x,
2.0f * z.x * z.y + c.y
);
z = z_new;
// Check divergence
$if (dot(z, z) > 4.0f) {
$break;
};
n += 1.0f;
};
// Color using cosine palette (Inigo Quilez's technique)
Float t = n / static_cast<float>(M);
Float3 color = 0.5f + 0.5f * cos(6.28318f * (t + make_float3(0.0f, 0.33f, 0.67f)));
// Write to image
image.write(coord, make_float4(color, 1.0f));
};
Step 5: Compile and Execute
// Compile the kernel
auto shader = device.compile(mandelbrot_kernel);
// Execute
stream << shader(image).dispatch(resolution)
<< synchronize();
// Save result
luisa::vector<uint8_t> pixels(resolution.x * resolution.y * 4);
stream << image.copy_to(pixels.data())
<< synchronize();
stbi_write_png("mandelbrot.png", resolution.x, resolution.y, 4, pixels.data(), 0);
LUISA_INFO("Saved to mandelbrot.png");
Complete Code
Click to expand complete code
#include <luisa/luisa-compute.h>
#include <luisa/dsl/sugar.h>
#include <stb/stb_image_write.h>
using namespace luisa;
using namespace luisa::compute;
int main(int argc, char *argv[]) {
Context context{argv[0]};
Device device = context.create_device("cuda");
Stream stream = device.create_stream();
static constexpr uint2 resolution = make_uint2(1024u, 1024u);
Image<float> image = device.create_image<float>(
PixelStorage::BYTE4, resolution.x, resolution.y); // BYTE4 matches the PNG output
Kernel2D mandelbrot_kernel = [&](ImageFloat image) noexcept {
UInt2 coord = dispatch_id().xy();
Float2 uv = make_float2(coord) / make_float2(resolution);
Float2 c = (uv - 0.5f) * 3.0f - make_float2(0.5f, 0.0f);
Float2 z = make_float2(0.0f);
Float n = 0.0f;
static constexpr int M = 100;
$for (i, M) {
Float2 z_new = make_float2(
z.x * z.x - z.y * z.y + c.x,
2.0f * z.x * z.y + c.y
);
z = z_new;
$if (dot(z, z) > 4.0f) { $break; };
n += 1.0f;
};
Float t = n / static_cast<float>(M);
Float3 color = 0.5f + 0.5f * cos(6.28318f * (t + make_float3(0.0f, 0.33f, 0.67f)));
image.write(coord, make_float4(color, 1.0f));
};
auto shader = device.compile(mandelbrot_kernel);
stream << shader(image).dispatch(resolution) << synchronize();
luisa::vector<uint8_t> pixels(resolution.x * resolution.y * 4);
stream << image.copy_to(pixels.data()) << synchronize();
stbi_write_png("mandelbrot.png", resolution.x, resolution.y, 4, pixels.data(), 0);
return 0;
}
Exercises
Zoom Animation: Modify c to animate over time
Julia Set: Keep c constant and vary the initial z
Smooth Coloring: Use n - log2(log2(dot(z, z))) for smoother gradients
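The smooth-coloring exercise can be prototyped on the CPU first. A hedged sketch of the standard fractional-iteration trick (exact constants vary between implementations):

```cpp
#include <cassert>
#include <cmath>

// Smooth (fractional) escape count. For escaping points, the
// n - log2(log2(|z|^2)) term removes the visible banding between
// integer iteration counts.
double mandelbrot_smooth(double cx, double cy, int max_iter) {
    double zx = 0.0, zy = 0.0;
    int n = 0;
    for (int i = 0; i < max_iter; i++) {
        double nx = zx * zx - zy * zy + cx;
        double ny = 2.0 * zx * zy + cy;
        zx = nx;
        zy = ny;
        if (zx * zx + zy * zy > 4.0) break;
        n++;
    }
    double r2 = zx * zx + zy * zy;
    if (r2 <= 4.0) return double(n); // never escaped: plain count
    return n - std::log2(std::log2(r2)); // fractional part smooths bands
}
```

The smooth value lies near the integer count but varies continuously across neighboring pixels, which is what removes the bands.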
Path Tracing Renderer
Now let’s build a more sophisticated renderer using ray tracing. We’ll implement a basic path tracer for the Cornell Box scene.
Final Result
A physically-based rendered image with global illumination.
Step-by-Step Implementation
Step 1: Scene Setup
We need geometry (triangles), materials, and acceleration structures:
#include <luisa/luisa-compute.h>
#include <luisa/dsl/sugar.h>
#include <luisa/runtime/rtx/accel.h>
#include <luisa/runtime/rtx/mesh.h>
#include <stb/stb_image_write.h>
using namespace luisa;
using namespace luisa::compute;
// Cornell Box geometry data
static constexpr float3 vertices[] = {
// Floor
{0.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 1.0f}, {0.0f, 0.0f, 1.0f},
// Ceiling
{0.0f, 1.0f, 0.0f}, {1.0f, 1.0f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f, 1.0f},
// ... more vertices
};
static constexpr uint3 triangles[] = {
{0, 1, 2}, {0, 2, 3}, // Floor
// ... more triangles
};
Step 2: Create Acceleration Structure
// Create buffers
Buffer<float3> vertex_buffer = device.create_buffer<float3>(num_vertices);
Buffer<Triangle> triangle_buffer = device.create_buffer<Triangle>(num_triangles);
// Upload data
stream << vertex_buffer.copy_from(vertices)
<< triangle_buffer.copy_from(triangles)
<< synchronize();
// Create mesh and acceleration structure
Mesh mesh = device.create_mesh(vertex_buffer, triangle_buffer);
Accel accel = device.create_accel();
accel.emplace_back(mesh, make_float4x4(1.0f)); // Identity transform
stream << mesh.build() << accel.build() << synchronize();
Step 3: Helper Callables
// Linear congruential generator for random numbers
Callable lcg = [](UInt &state) noexcept {
constexpr uint a = 1664525u;
constexpr uint c = 1013904223u;
state = a * state + c;
return cast<float>(state) / cast<float>(std::numeric_limits<uint>::max());
};
// Cosine-weighted hemisphere sampling
Callable cosine_sample_hemisphere = [](Float2 u) noexcept {
Float r = sqrt(u.x);
Float theta = 2.0f * pi * u.y;
return make_float3(r * cos(theta), r * sin(theta), sqrt(1.0f - u.x));
};
// Build orthonormal basis from normal
Callable make_onb = [](Float3 normal) noexcept {
Float3 b1 = ite(abs(normal.y) < 0.999f,
normalize(cross(normal, make_float3(0.0f, 1.0f, 0.0f))),
make_float3(1.0f, 0.0f, 0.0f));
Float3 b2 = cross(normal, b1);
return make_float3x3(b1, b2, normal);
};
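These callables are plain math, so they can be verified host-side before tracing them into a kernel. A CPU stand-in for the two samplers above (these mirror, but are not, the DSL callables):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// CPU mirror of the LCG callable: advances state, returns [0, 1].
float lcg_cpu(uint32_t &state) {
    state = 1664525u * state + 1013904223u;
    return float(state) / float(UINT32_MAX);
}

struct Vec3 { float x, y, z; };

// CPU mirror of cosine-weighted hemisphere sampling (local z-up).
// The z component sqrt(1 - u1) makes the result a unit vector:
// r^2 + (1 - u1) = u1 + 1 - u1 = 1.
Vec3 cosine_sample_cpu(float u1, float u2) {
    float r = std::sqrt(u1);
    float theta = 2.0f * 3.14159265f * u2;
    return {r * std::cos(theta), r * std::sin(theta), std::sqrt(1.0f - u1)};
}
```

Checking that every sample is a unit vector in the upper hemisphere catches most sign and normalization mistakes before they turn into subtly wrong renders.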
Step 4: The Path Tracing Kernel
Kernel2D path_tracer = [&](ImageFloat image, ImageUInt seed_image,
AccelVar accel, UInt frame_index) noexcept {
set_block_size(16u, 16u, 1u);
UInt2 coord = dispatch_id().xy();
Float2 resolution = make_float2(dispatch_size().xy());
// Initialize random seed
$if (frame_index == 0u) {
seed_image.write(coord, make_uint4(coord.x * 1234567u + coord.y));
};
UInt seed = seed_image.read(coord).x;
// Generate camera ray
Float2 uv = (make_float2(coord) + make_float2(lcg(seed), lcg(seed))) / resolution;
Float2 ndc = uv * 2.0f - 1.0f;
Float3 origin = make_float3(0.0f, 0.0f, 3.0f);
Float3 direction = normalize(make_float3(ndc.x, ndc.y, -1.0f));
// Path tracing loop
Float3 radiance = def(make_float3(0.0f));
Float3 throughput = def(make_float3(1.0f));
$for (bounce, 5u) { // Maximum 5 bounces
// Trace ray
Var<Ray> ray = make_ray(origin, direction);
Var<TriangleHit> hit = accel.intersect(ray, {});
$if (hit->miss()) { $break; };
// Get hit information (this assumes the vertices are stored
// unindexed, three per triangle, in the same order as triangle_buffer)
Float3 p0 = vertex_buffer->read(hit->prim * 3u + 0u);
Float3 p1 = vertex_buffer->read(hit->prim * 3u + 1u);
Float3 p2 = vertex_buffer->read(hit->prim * 3u + 2u);
Float3 p = hit->interpolate(p0, p1, p2);
Float3 n = normalize(cross(p1 - p0, p2 - p0));
// Simple diffuse material (white)
Float3 albedo = make_float3(0.8f);
// Add emission if we hit a light (light_triangle_start is a
// host-side constant marking where emissive triangles begin)
$if (hit->prim >= light_triangle_start) {
radiance += throughput * make_float3(10.0f); // Emissive
$break;
};
// Sample next direction (cosine-weighted)
Float2 u = make_float2(lcg(seed), lcg(seed));
Float3 local_dir = cosine_sample_hemisphere(u);
Var<Float3x3> onb = make_onb(n);
Float3 new_dir = onb * local_dir;
// Update throughput: with cosine-weighted sampling the pdf is
// cos(theta)/pi, which exactly cancels the diffuse BRDF's
// cos(theta)/pi factor, leaving just the albedo
throughput *= albedo;
// Russian roulette
Float p_survive = max(throughput.x, max(throughput.y, throughput.z));
$if (lcg(seed) > p_survive) { $break; };
throughput *= 1.0f / p_survive;
// Setup next ray
origin = p + n * 1e-4f;
direction = new_dir;
};
// Accumulate over frames
Float4 prev = image.read(coord);
Float count = prev.w + 1.0f;
Float3 avg = lerp(prev.xyz(), radiance, 1.0f / count);
image.write(coord, make_float4(avg, count));
seed_image.write(coord, make_uint4(seed));
};
Step 5: Rendering Loop
// Create images
Image<float> framebuffer = device.create_image<float>(
PixelStorage::FLOAT4, resolution.x, resolution.y);
Image<uint> seed_image = device.create_image<uint>(
PixelStorage::INT1, resolution.x, resolution.y);
auto renderer = device.compile(path_tracer);
// Render multiple frames for accumulation
for (uint frame = 0; frame < 100; frame++) {
stream << renderer(framebuffer, seed_image, accel, frame).dispatch(resolution)
<< synchronize();
}
// Tonemap and save
Kernel2D tonemap = [&](ImageFloat hdr, ImageFloat ldr) noexcept {
UInt2 coord = dispatch_id().xy();
Float3 color = hdr.read(coord).xyz();
// Simple Reinhard tonemapping
color = color / (color + 1.0f);
// Gamma correction
color = pow(color, 1.0f / 2.2f);
ldr.write(coord, make_float4(color, 1.0f));
};
Image<float> ldr = device.create_image<float>(
PixelStorage::BYTE4, resolution.x, resolution.y);
stream << device.compile(tonemap)(framebuffer, ldr).dispatch(resolution)
<< synchronize();
// Save...
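Reinhard plus gamma is a pure per-channel function, so it is easy to verify on the CPU. A minimal sketch of the same mapping the tonemap kernel applies:

```cpp
#include <cassert>
#include <cmath>

// Reinhard tonemap followed by gamma 2.2, per channel.
// Maps [0, infinity) into [0, 1), so highlights never clip.
float tonemap_channel(float hdr) {
    float ldr = hdr / (hdr + 1.0f);    // Reinhard: compresses highlights
    return std::pow(ldr, 1.0f / 2.2f); // gamma correction for display
}
```

Reinhard maps 1.0 to 0.5 before gamma, and the curve is monotonic, so relative brightness ordering is preserved.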
Key Concepts Explained
Path Tracing: Simulates light transport by tracing random paths from camera
Russian Roulette: Probabilistically terminates paths to avoid infinite recursion
Cosine Sampling: Importance sampling for diffuse surfaces
Accumulation: Multiple samples averaged for noise reduction
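Russian roulette stays unbiased because surviving paths are reweighted by 1/p. A small CPU sketch demonstrates this: averaging the reweighted estimator over a uniform grid of random numbers recovers the original value exactly (the names here are illustrative):

```cpp
#include <cassert>
#include <cmath>

// One Russian-roulette step: the path survives when u <= p_survive,
// and the surviving contribution is scaled by 1/p_survive so the
// expected value is unchanged.
double rr_estimate(double value, double p_survive, double u) {
    if (u > p_survive) return 0.0; // path terminated
    return value / p_survive;      // reweight the survivor
}
```

With p = 0.6, six out of ten uniformly spaced samples survive, each weighted by 1/0.6, so the average is the original value: termination saves work without darkening the image.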
MPM Fluid Simulation
Material Point Method (MPM) is a hybrid Lagrangian-Eulerian method for simulating continua. We’ll implement a basic 3D fluid simulation.
Final Result
An interactive 3D fluid simulation with particles.
Step-by-Step Implementation
Step 1: Simulation Parameters
static constexpr int n_grid = 64; // Grid resolution
static constexpr uint n_particles = 50000; // Number of particles
static constexpr float dx = 1.0f / n_grid; // Grid spacing
static constexpr float dt = 8e-5f; // Time step
static constexpr float p_vol = (dx * 0.5f) * (dx * 0.5f) * (dx * 0.5f);
static constexpr float p_mass = 1.0f * p_vol;
static constexpr float E = 400.0f; // Young's modulus
static constexpr float gravity = 9.8f;
Step 2: Data Structures
// Particle data (Lagrangian)
Buffer<float3> x = device.create_buffer<float3>(n_particles); // Positions
Buffer<float3> v = device.create_buffer<float3>(n_particles); // Velocities
Buffer<float3x3> C = device.create_buffer<float3x3>(n_particles); // Affine momentum
Buffer<float> J = device.create_buffer<float>(n_particles); // Volume ratio
// Grid data (Eulerian)
Buffer<float4> grid = device.create_buffer<float4>(n_grid * n_grid * n_grid);
// xyz = momentum, w = mass
Step 3: Grid Clearing Kernel
auto clear_grid = device.compile<3>([&] {
set_block_size(8, 8, 1);
UInt idx = dispatch_id().x + dispatch_id().y * n_grid
+ dispatch_id().z * n_grid * n_grid;
grid->write(idx, make_float4(0.0f));
});
Step 4: Particle-to-Grid Transfer
This is the heart of MPM - transferring particle data to the grid:
auto particle_to_grid = device.compile<1>([&] {
set_block_size(64, 1, 1);
UInt p = dispatch_id().x;
// Particle position in grid coordinates
Float3 Xp = x->read(p) / dx;
Int3 base = make_int3(Xp - 0.5f);
Float3 fx = Xp - make_float3(base);
// Quadratic B-spline weights, computed per axis (the lambda takes
// a scalar Float so it can be called with fx.x, fx.y, fx.z below)
auto w = [&](Float fx) {
return make_float3(
0.5f * (1.5f - fx) * (1.5f - fx),
0.75f - (fx - 1.0f) * (fx - 1.0f),
0.5f * (fx - 0.5f) * (fx - 0.5f)
);
};
Float3 wx = w(fx.x), wy = w(fx.y), wz = w(fx.z);
// Stress-based force (simplified for fluids)
Float stress = -4.0f * dt * E * p_vol * (J->read(p) - 1.0f) / (dx * dx);
Float3x3 affine = make_float3x3(stress, 0.0f, 0.0f,
0.0f, stress, 0.0f,
0.0f, 0.0f, stress);
Float3 vp = v->read(p);
// Scatter to 3x3x3 grid nodes
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
for (int k = 0; k < 3; k++) {
Float3 offset = make_float3(i, j, k);
Float3 dpos = (offset - fx) * dx;
Float weight = wx[i] * wy[j] * wz[k];
Float3 v_add = weight * (p_mass * vp + affine * dpos);
UInt idx = (base.x + i) + (base.y + j) * n_grid
+ (base.z + k) * n_grid * n_grid;
// Atomic add to grid
grid->atomic(idx).x.fetch_add(v_add.x);
grid->atomic(idx).y.fetch_add(v_add.y);
grid->atomic(idx).z.fetch_add(v_add.z);
grid->atomic(idx).w.fetch_add(weight * p_mass);
}
}
}
});
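A good sanity check for the quadratic B-spline weights is the partition-of-unity property: for any particle position, the three weights along each axis sum to 1, so mass is conserved by the transfer. A CPU sketch:

```cpp
#include <cassert>
#include <cmath>

// Quadratic B-spline weights for one axis; fx is the particle's
// fractional offset from the base node, which lies in [0.5, 1.5).
void bspline_weights(double fx, double w[3]) {
    w[0] = 0.5 * (1.5 - fx) * (1.5 - fx);
    w[1] = 0.75 - (fx - 1.0) * (fx - 1.0);
    w[2] = 0.5 * (fx - 0.5) * (fx - 0.5);
}
```

Expanding the three quadratics shows the linear and quadratic terms cancel, leaving exactly 1; if they did not, P2G followed by G2P would gain or lose mass every sub-step.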
Step 5: Grid Velocity Update
auto grid_update = device.compile<3>([&] {
set_block_size(8, 8, 1);
Int3 coord = make_int3(dispatch_id().xyz());
UInt idx = coord.x + coord.y * n_grid + coord.z * n_grid * n_grid;
Float4 v_and_m = grid->read(idx);
Float3 v = v_and_m.xyz();
Float m = v_and_m.w;
// Normalize by mass
v = ite(m > 0.0f, v / m, v);
// Gravity
v.y -= dt * gravity;
// Boundary conditions
static constexpr int bound = 3;
// Component-wise select: use & and | (not && and ||) on boolean vectors
v = ite(((coord < bound) & (v < 0.0f)) |
((coord > n_grid - bound) & (v > 0.0f)),
make_float3(0.0f), v);
grid->write(idx, make_float4(v, m));
});
Step 6: Grid-to-Particle Transfer
auto grid_to_particle = device.compile<1>([&] {
set_block_size(64, 1, 1);
UInt p = dispatch_id().x;
Float3 Xp = x->read(p) / dx;
Int3 base = make_int3(Xp - 0.5f);
Float3 fx = Xp - make_float3(base);
// Same quadratic B-spline weights as in P2G:
// compute wx, wy, wz exactly as above
Float3 new_v = def(make_float3(0.0f));
Float3x3 new_C = def(make_float3x3(0.0f));
// Gather from grid
for (int i = 0; i < 3; i++) {
for (int j = 0; j < 3; j++) {
for (int k = 0; k < 3; k++) {
Float3 offset = make_float3(i, j, k);
Float3 dpos = (offset - fx) * dx;
Float weight = wx[i] * wy[j] * wz[k];
UInt idx = (base.x + i) + (base.y + j) * n_grid
+ (base.z + k) * n_grid * n_grid;
Float3 g_v = grid->read(idx).xyz();
new_v += weight * g_v;
new_C += 4.0f * weight * outer_product(g_v, dpos) / (dx * dx);
}
}
}
// Update particle state
v->write(p, new_v);
x->write(p, x->read(p) + new_v * dt);
J->write(p, J->read(p) * (1.0f + dt * trace(new_C)));
C->write(p, new_C);
});
Step 7: Simulation Loop
// Initialize particles
luisa::vector<float3> x_init(n_particles);
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<float> dis(0.2f, 0.8f);
for (uint i = 0; i < n_particles; i++) {
x_init[i] = make_float3(dis(gen), dis(gen), dis(gen));
}
stream << x.copy_from(x_init.data())
<< v.copy_from(luisa::vector<float3>(n_particles, make_float3(0.0f)).data())
<< J.copy_from(luisa::vector<float>(n_particles, 1.0f).data())
<< synchronize();
// Simulation loop
Window window{"MPM Fluid", 1024, 1024};
Swapchain swapchain = device.create_swapchain(stream, {
.display = window.native_display(),
.window = window.native_handle(),
.size = make_uint2(1024u, 1024u),
});
Image<float> display = device.create_image<float>(
swapchain.backend_storage(), make_uint2(1024u, 1024u));
while (!window.should_close()) {
CommandList cmd;
// MPM sub-steps
for (int i = 0; i < 25; i++) {
cmd << clear_grid().dispatch(n_grid, n_grid, n_grid)
<< particle_to_grid().dispatch(n_particles)
<< grid_update().dispatch(n_grid, n_grid, n_grid)
<< grid_to_particle().dispatch(n_particles);
}
// Render particles to display
cmd << clear_display().dispatch(1024, 1024)
<< draw_particles().dispatch(n_particles)
<< swapchain.present(display);
stream << cmd.commit();
window.poll_events();
}
Key Concepts Explained
Lagrangian vs Eulerian: Particles track material; grid handles physics
APIC (Affine Particle-in-Cell): Uses affine momentum matrix for better accuracy
Transfer: P2G (particle to grid) and G2P (grid to particle) are the core operations
MLS-MPM: Moving Least Squares MPM for stability
Conway’s Game of Life
Cellular automata are fascinating discrete systems. Let’s implement Conway’s Game of Life, a classic example that demonstrates ping-pong buffering - a technique where we alternate between two buffers to avoid read-write conflicts.
Final Result
An interactive simulation where cells evolve based on simple rules, creating complex emergent patterns.
Step-by-Step Implementation
Step 1: Understanding the Rules
Conway’s Game of Life follows three simple rules:
Survival: Live cells with 2-3 neighbors survive
Birth: Dead cells with exactly 3 neighbors become alive
Death: All other live cells die (and dead cells stay dead)
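The three rules collapse into a single boolean expression, which is exactly what the kernel computes. A CPU sketch of the per-cell rule:

```cpp
#include <cassert>

// Conway's rule for one cell: alive next step iff it has exactly
// 3 live neighbors, or it is already alive and has exactly 2.
bool next_state(bool alive, int neighbors) {
    return neighbors == 3 || (alive && neighbors == 2);
}
```

This is the same `(state & c0) | c1` expression the kernel writes back, just in host C++.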
Step 2: The ImagePair Structure
We’ll use two images that swap each frame:
struct ImagePair {
Image<uint> prev;
Image<uint> curr;
ImagePair(Device &device, PixelStorage storage, uint width, uint height) noexcept
: prev{device.create_image<uint>(storage, width, height)},
curr{device.create_image<uint>(storage, width, height)} {}
void swap() noexcept { std::swap(prev, curr); }
};
Step 3: The Update Kernel
// Helper to read cell state
Callable read_state = [](ImageUInt prev, UInt2 uv) noexcept {
return prev.read(uv).x == 255u;
};
Kernel2D kernel = [&](ImageUInt prev, ImageUInt curr) noexcept {
set_block_size(16, 16, 1);
UInt count = def(0u);
UInt2 uv = dispatch_id().xy();
UInt2 size = dispatch_size().xy();
Bool state = read_state(prev, uv);
Int2 p = make_int2(uv);
// Check all 8 neighbors with toroidal wrapping
for (int dy = -1; dy <= 1; dy++) {
for (int dx = -1; dx <= 1; dx++) {
if (dx != 0 || dy != 0) {
Int2 q = p + make_int2(dx, dy) + make_int2(size);
Bool neighbor = read_state(prev, make_uint2(q) % size);
count += ite(neighbor, 1u, 0u);
}
}
}
// Apply Conway's rules
Bool c0 = count == 2u;
Bool c1 = count == 3u;
curr.write(uv, make_uint4(make_uint3(ite((state & c0) | c1, 255u, 0u)), 255u));
};
Step 4: Display Kernel
We upscale the low-res simulation for better visibility:
Kernel2D display_kernel = [&](ImageUInt in_tex, ImageFloat out_tex) noexcept {
set_block_size(16, 16, 1);
UInt2 uv = dispatch_id().xy();
UInt2 coord = uv / 4u; // 4x upscaling
UInt4 value = in_tex.read(coord);
out_tex.write(uv, make_float4(value) / 255.0f);
};
Step 5: Initialization and Main Loop
// Initialize with random state (25% alive)
luisa::vector<uint> host_image(width * height);
for (auto &v : host_image) {
auto x = (rng() % 4u == 0u) * 255u;
v = x * 0x00010101u | 0xff000000u; // White or black
}
stream << image_pair.prev.copy_from(host_image.data()) << synchronize();
// Main loop
while (!window.should_close()) {
stream << shader(image_pair.prev, image_pair.curr).dispatch(width, height)
<< display_shader(image_pair.curr, display).dispatch(width * 4u, height * 4u)
<< swap_chain.present(display);
image_pair.swap(); // Ping-pong: swap buffers
std::this_thread::sleep_for(std::chrono::milliseconds{30});
window.poll_events();
}
Key Concepts
Ping-Pong Buffering: We read from prev and write to curr, then swap
Toroidal Boundary: % size makes the grid wrap around (edges connect)
Neighborhood: The 8 surrounding cells determine each cell's fate
Exercises
Different Rules: Try HighLife (born with 3 or 6 neighbors)
Pattern Library: Implement a pattern loader for gliders, blinkers, etc.
Multi-State: Add fading effects for recently-dead cells
Wave Equation Simulation
Simulate realistic water ripples using the finite difference method to solve the 2D wave equation. This demonstrates PDE solving and interactive GPU computing with beautiful visual results.
Final Result
An interactive water simulation where you can click and drag to create waves that propagate realistically, interfere with each other, and gradually dampen. The rendering includes caustics effects and foam at wave crests.
The Physics
The wave equation: ∂²u/∂t² = c²(∂²u/∂x² + ∂²u/∂y²)
Discretized form: u_next = 2*u_curr - u_prev + c²*dt²*laplacian(u)
For stability, the CFL condition requires: c² * dt² < 0.5
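The discrete update is a three-term recurrence, so it is worth checking on the CPU before porting it to a kernel. A sketch of one grid point's update plus the CFL stability test (the constants follow this tutorial's parameters):

```cpp
#include <cassert>
#include <cmath>

// One wave-equation step for a single cell, given its previous and
// current heights and the discrete Laplacian of the current field.
double wave_step(double u_prev, double u_curr, double laplacian,
                 double c, double dt, double damping) {
    double u_next = 2.0 * u_curr - u_prev + (c * c * dt * dt) * laplacian;
    return u_next * damping; // energy loss per step
}

// CFL stability condition for the 2D wave equation with dx = 1.
bool is_stable(double c, double dt) { return c * c * dt * dt < 0.5; }
```

A flat field (zero Laplacian, no motion, no damping) must stay flat, and the tutorial's c = 0.2, dt = 1.0 sits comfortably inside the stability bound.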
Step-by-Step Implementation
Step 1: Setup Three Buffers
We need three height buffers for time integration (past, present, future):
Image<float> height_prev = device.create_image<float>(PixelStorage::FLOAT1, width, height);
Image<float> height_curr = device.create_image<float>(PixelStorage::FLOAT1, width, height);
Image<float> height_next = device.create_image<float>(PixelStorage::FLOAT1, width, height);
Step 2: The Wave Solver Kernel
Kernel2D wave_step = [&](ImageFloat prev, ImageFloat curr, ImageFloat next) noexcept {
set_block_size(16, 16, 1);
Var uv = dispatch_id().xy();
Var size = dispatch_size().xy();
Var u_curr = curr.read(uv).x;
// Sample neighbors with clamping (not wrapping)
auto sample = [&](Int2 offset) noexcept {
Int2 p = make_int2(uv) + offset;
p.x = clamp(p.x, 0, cast<int>(size.x) - 1);
p.y = clamp(p.y, 0, cast<int>(size.y) - 1);
return curr.read(make_uint2(p)).x;
};
Var u_left = sample(make_int2(-1, 0));
Var u_right = sample(make_int2(1, 0));
Var u_up = sample(make_int2(0, -1));
Var u_down = sample(make_int2(0, 1));
// Laplacian: sum of neighbors - 4*center
Var laplacian = u_left + u_right + u_up + u_down - 4.0f * u_curr;
// Read previous height
Var u_prev = prev.read(uv).x;
// Wave equation integration
Var u_next = 2.0f * u_curr - u_prev + (c * c * dt * dt) * laplacian;
// Apply damping for energy loss
u_next *= damping;
u_next = clamp(u_next, -2.0f, 2.0f);
next.write(uv, make_float4(u_next, 0.0f, 0.0f, 1.0f));
};
Step 3: Interactive Droplet Drop
// Setup mouse callbacks
window.set_mouse_callback([&mouse_down, &mouse_pos](MouseButton, Action a, float2 p) noexcept {
if (a == Action::ACTION_PRESSED) {
mouse_down = true;
} else if (a == Action::ACTION_RELEASED) {
mouse_down = false;
}
mouse_pos = p;
});
// Droplet kernel - creates Gaussian ripple
Kernel2D drop_droplet = [](ImageFloat height, UInt2 center, Float strength) noexcept {
set_block_size(16, 16, 1);
Var uv = dispatch_id().xy();
Var dx = cast<float>(cast<int>(uv.x) - cast<int>(center.x));
Var dy = cast<float>(cast<int>(uv.y) - cast<int>(center.y));
Var dist_sq = dx * dx + dy * dy;
Var radius = 15.0f;
Var gaussian = exp(-dist_sq / (radius * radius * 0.5f));
Var current = height.read(uv).x;
height.write(uv, make_float4(current + strength * gaussian, 0.0f, 0.0f, 1.0f));
};
Step 4: Rendering with Caustics Effect
Kernel2D render_kernel = [](ImageFloat height, ImageFloat output, Float time) noexcept {
set_block_size(16, 16, 1);
Var uv = dispatch_id().xy();
Var size = dispatch_size().xy();
Var h = height.read(uv).x;
// Sample neighbors for slope (caustics effect); `sample` is the
// same clamped neighbor-read helper defined in the wave solver kernel
Var h_left = sample(make_int2(-2, 0));
Var h_right = sample(make_int2(2, 0));
Var h_up = sample(make_int2(0, -2));
Var h_down = sample(make_int2(0, 2));
// Slope magnitude creates caustic highlights
Var slope_x = abs(h_right - h_left);
Var slope_y = abs(h_down - h_up);
Var slope = sqrt(slope_x * slope_x + slope_y * slope_y);
// Water color palette
Var deep_color = make_float3(0.0f, 0.1f, 0.3f);
Var shallow_color = make_float3(0.0f, 0.4f, 0.6f);
Var foam_color = make_float3(0.9f, 0.95f, 1.0f);
Var highlight_color = make_float3(0.6f, 0.8f, 1.0f);
// Base color based on wave height
Var t = clamp(h * 0.5f + 0.5f, 0.0f, 1.0f);
Var base_color = lerp(deep_color, shallow_color, t);
// Add caustics based on slope
Var caustics = smoothstep(0.1f, 0.5f, slope);
base_color = lerp(base_color, highlight_color, caustics * 0.5f);
// Foam at wave peaks
Var foam = smoothstep(0.5f, 1.0f, h);
base_color = lerp(base_color, foam_color, foam * 0.7f);
output.write(uv, make_float4(base_color, 1.0f));
};
Step 5: Main Simulation Loop
while (!window.should_close()) {
// Handle mouse interaction
if (mouse_down) {
// Host-side code: use std::clamp and static_cast (cast<> is DSL-only)
uint sim_x = std::clamp(static_cast<uint>(mouse_pos.x / scale), 0u, width - 1u);
uint sim_y = std::clamp(static_cast<uint>(mouse_pos.y / scale), 0u, height - 1u);
// Interpolate for smooth trails when dragging
stream << drop_shader(height_curr, make_uint2(sim_x, sim_y), -0.5f)
.dispatch(width, height);
}
// Update wave simulation (multiple steps per frame)
for (int step = 0; step < 4; step++) {
stream << wave_shader(height_prev, height_curr, height_next).dispatch(width, height);
std::swap(height_prev, height_curr);
std::swap(height_curr, height_next);
}
// Render
stream << render(height_curr, display, time).dispatch(width, height)
<< swap_chain.present(display);
}
Key Concepts
Finite Differences: Approximate spatial derivatives using neighbor samples
Time Integration: Three-buffer scheme for second-order time integration (Verlet-style)
Stability: CFL condition requires careful parameter tuning
Caustics: Surface slope creates realistic water shimmer
Tips for Better Results
Parameters: Try c = 0.2f, dt = 1.0f, damping = 0.995f for nice propagation
Multiple Steps: Run 4 simulation steps per frame for faster wave propagation
Smooth Trails: Interpolate droplet positions when dragging mouse
Clamped Boundaries: Edges act as walls (wave reflection) rather than wrapping
Exercises
Variable Depth: Make wave speed vary by location for refraction effects
Obstacles: Add static obstacles that block/reflect waves
Rain: Add periodic random droplets for ambient activity
Underwater: Add objects beneath the surface with distortion
Image Processing Pipeline
Build a multi-stage image processing pipeline demonstrating separable filters, convolution, and multi-pass rendering.
Final Result
A real-time image processing demo showing Gaussian blur, Sobel edge detection, and creative compositing.
Step-by-Step Implementation
Step 1: Generate Test Pattern
Kernel2D generate_pattern = [](ImageFloat image) noexcept {
set_block_size(16, 16, 1);
Var uv = make_float2(dispatch_id().xy()) / make_float2(dispatch_size().xy());
// Checkerboard
Var checker = sin(uv.x * 20.0f) * sin(uv.y * 20.0f);
// Concentric circles
Var center = uv - make_float2(0.5f);
Var radius = length(center);
Var circles = sin(radius * 50.0f);
// Combine patterns
Var pattern = checker * 0.3f + circles * 0.4f;
// Add "face-like" features
Var eye1 = exp(-length(uv - make_float2(0.35f, 0.4f)) * 50.0f);
Var eye2 = exp(-length(uv - make_float2(0.65f, 0.4f)) * 50.0f);
Var mouth = exp(-pow(uv.y - 0.6f - 0.1f * pow(uv.x - 0.5f, 2.0f), 2.0f) * 200.0f);
Var final_val = pattern * 0.5f + 0.5f + eye1 * 0.3f + eye2 * 0.3f + mouth * 0.2f;
image.write(dispatch_id().xy(), make_float4(make_float3(final_val), 1.0f));
};
Step 2: Separable Gaussian Blur
Instead of a 2D convolution (81 samples), we do two 1D passes (18 samples total):
// Horizontal pass
Kernel2D gaussian_blur_h = [](ImageFloat input, ImageFloat output) noexcept {
set_block_size(16, 16, 1);
Var uv = dispatch_id().xy();
Var size = dispatch_size().xy();
// Gaussian weights (sigma = 2.0, normalized to sum to 1). Plain
// host-side arrays work here because the loop is unrolled when the
// kernel is traced.
constexpr std::array<float, 9> weights{0.028f, 0.066f, 0.124f, 0.180f, 0.204f,
0.180f, 0.124f, 0.066f, 0.028f};
constexpr std::array<int, 9> offsets{-4, -3, -2, -1, 0, 1, 2, 3, 4};
Var sum = make_float3(0.0f);
for (uint i = 0u; i < 9u; i++) {
Var sample_uv = make_uint2(
clamp(cast<int>(uv.x) + offsets[i], 0, cast<int>(size.x) - 1),
uv.y);
sum += input.read(sample_uv).xyz() * weights[i];
}
output.write(uv, make_float4(sum, 1.0f));
};
// Vertical pass (similar structure)
Kernel2D gaussian_blur_v = [](ImageFloat input, ImageFloat output) noexcept { ... };
Step 3: Sobel Edge Detection
Kernel2D sobel_edge = [](ImageFloat input, ImageFloat output) noexcept {
set_block_size(16, 16, 1);
Var uv = dispatch_id().xy();
Var size = dispatch_size().xy();
auto sample = [&](Int2 offset) noexcept {
Var sample_uv = make_uint2(
clamp(cast<int>(uv.x) + offset.x, 0, cast<int>(size.x) - 1),
clamp(cast<int>(uv.y) + offset.y, 0, cast<int>(size.y) - 1));
return input.read(sample_uv).x; // Luminance
};
// Sobel X kernel: [-1 0 1; -2 0 2; -1 0 1]
Var gx = sample(make_int2(-1, -1)) * -1.0f +
sample(make_int2(1, -1)) * 1.0f +
sample(make_int2(-1, 0)) * -2.0f +
sample(make_int2(1, 0)) * 2.0f +
sample(make_int2(-1, 1)) * -1.0f +
sample(make_int2(1, 1)) * 1.0f;
// Sobel Y kernel: [-1 -2 -1; 0 0 0; 1 2 1]
Var gy = sample(make_int2(-1, -1)) * -1.0f +
sample(make_int2(0, -1)) * -2.0f +
sample(make_int2(1, -1)) * -1.0f +
sample(make_int2(-1, 1)) * 1.0f +
sample(make_int2(0, 1)) * 2.0f +
sample(make_int2(1, 1)) * 1.0f;
// Gradient magnitude and direction
Var magnitude = sqrt(gx * gx + gy * gy);
Var angle = atan2(gy, gx) / 3.14159f * 0.5f + 0.5f;
output.write(uv, make_float4(magnitude, angle, 0.0f, 1.0f));
};
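To convince yourself the taps are right, apply them to a horizontal ramp f(x, y) = x on the CPU: the X gradient is a constant 8 and the Y gradient is 0. A sketch (the helper name is illustrative):

```cpp
#include <cassert>
#include <cmath>

// Sobel gradients at the center of a 3x3 patch sampled from f,
// where f is a function of integer offsets (dx, dy) from the center.
template<typename F>
void sobel(F f, double &gx, double &gy) {
    // X kernel: [-1 0 1; -2 0 2; -1 0 1]
    gx = -f(-1, -1) + f(1, -1)
         - 2.0 * f(-1, 0) + 2.0 * f(1, 0)
         - f(-1, 1) + f(1, 1);
    // Y kernel: [-1 -2 -1; 0 0 0; 1 2 1]
    gy = -f(-1, -1) - 2.0 * f(0, -1) - f(1, -1)
         + f(-1, 1) + 2.0 * f(0, 1) + f(1, 1);
}
```

On the ramp each X tap pairs a -w sample at x-1 with a +w sample at x+1, so the contributions sum to (1+2+1)·2 = 8, while the Y taps cancel exactly.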
Step 4: Creative Compositing
Kernel2D composite = [](ImageFloat original, ImageFloat blurred,
ImageFloat edges, ImageFloat output, Float edge_intensity) noexcept {
set_block_size(16, 16, 1);
Var uv = dispatch_id().xy();
Var orig = original.read(uv);
Var blur = blurred.read(uv);
Var edge = edges.read(uv);
// Unsharp mask sharpening
Var sharpen_amount = 1.5f;
Var sharpened = orig.xyz() + (orig.xyz() - blur.xyz()) * sharpen_amount;
sharpened = clamp(sharpened, 0.0f, 1.0f);
// Edge overlay with cyan color
Var edge_color = make_float3(0.0f, 0.8f, 1.0f) * edge.x * edge_intensity;
Var final_color = lerp(sharpened, edge_color, edge.x * 0.5f);
// Vignette effect
Var coord = make_float2(uv) / make_float2(dispatch_size().xy());
Var center_dist = length(coord - make_float2(0.5f));
Var vignette = 1.0f - center_dist * 0.5f;
output.write(uv, make_float4(final_color * vignette, 1.0f));
};
Step 5: Pipeline Execution
// Create intermediate images
Image<float> source_image = device.create_image<float>(PixelStorage::FLOAT4, width, height);
Image<float> temp_image = device.create_image<float>(PixelStorage::FLOAT4, width, height);
Image<float> blurred_image = device.create_image<float>(PixelStorage::FLOAT4, width, height);
Image<float> edge_image = device.create_image<float>(PixelStorage::FLOAT4, width, height);
// Generate source
stream << pattern_shader(source_image).dispatch(width, height);
// Run pipeline once
stream << blur_h_shader(source_image, temp_image).dispatch(width, height)
<< blur_v_shader(temp_image, blurred_image).dispatch(width, height)
<< sobel_shader(source_image, edge_image).dispatch(width, height);
// Animate in main loop
while (!window.should_close()) {
float edge_intensity = (sinf(time) + 1.0f) * 0.5f; // Pulsing effect
time += 1.0f / 60.0f;
stream << composite_shader(source_image, blurred_image, edge_image,
display, edge_intensity).dispatch(width, height)
<< swap_chain.present(display);
window.poll_events();
}
Key Concepts
Separable Filters: Decompose 2D convolution into two 1D passes for efficiency
Multi-Pass Rendering: Chain multiple kernels for complex effects
Unsharp Mask: Sharpening using original + (original - blurred) * amount
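Separability is easy to verify numerically: a 2D Gaussian tap is the product of two 1D taps, so the 81-tap 2D kernel is the outer product of the 9-tap 1D kernel. A sketch using a normalized 9-tap Gaussian (the weight values here are illustrative):

```cpp
#include <cassert>
#include <cmath>

// A normalized 9-tap 1D Gaussian (sigma ~ 2).
static const double w1[9] = {0.028, 0.066, 0.124, 0.180, 0.204,
                             0.180, 0.124, 0.066, 0.028};

double sum_1d() {
    double s = 0.0;
    for (double w : w1) s += w;
    return s;
}

// Sum of the 81 2D weights, built as the outer product of the 1D
// taps; it must equal the square of the 1D sum.
double sum_2d() {
    double s = 0.0;
    for (int i = 0; i < 9; i++)
        for (int j = 0; j < 9; j++) s += w1[i] * w1[j];
    return s;
}
```

Because the 2D kernel factors this way, two 1D passes (18 samples per pixel) produce the same result as the full 2D convolution (81 samples per pixel).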
Exercises
Bloom Effect: Add a bright-pass filter before blur
Custom Kernels: Implement emboss or motion blur filters
Live Input: Use camera input instead of procedural pattern
N-Body Gravitational Simulation
Simulate thousands of stars interacting through gravity. This demonstrates particle systems, tile-based optimization, and 3D rendering on the GPU.
Final Result
A rotating galaxy simulation with thousands of gravitationally interacting particles.
Step-by-Step Implementation
Step 1: Particle Structure
struct Particle {
float3 position;
float3 velocity;
float mass;
float pad[3]; // Alignment padding
};
LUISA_STRUCT(Particle, position, velocity, mass, pad) {};
Step 2: Galaxy Initialization
// Initialize in a disk-like galaxy configuration
luisa::vector<Particle> host_particles(n_particles);
for (uint i = 0u; i < n_particles; i++) {
float radius = dist_radius(rng);
float angle = dist_angle(rng);
float height = (rng() / float(UINT32_MAX) - 0.5f) * 0.1f;
// Position in disk
float3 pos{
radius * cosf(angle),
height,
radius * sinf(angle)};
// Tangential velocity for orbital motion
float orbital_speed = sqrtf(G * 1000.0f / radius);
float3 vel{
-orbital_speed * sinf(angle),
0.0f,
orbital_speed * cosf(angle)};
host_particles[i] = Particle{
.position = pos,
.velocity = vel,
.mass = dist_mass(rng),
.pad = {0.0f, 0.0f, 0.0f}};
}
stream << particles_read.copy_from(host_particles.data()) << synchronize();
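The orbital speed used above comes from balancing gravity against centripetal acceleration. A minimal CPU sketch of that math (plain C++, with illustrative names and normalized units):

```cpp
#include <cassert>
#include <cmath>

// For a circular orbit, the gravitational pull G*M/r^2 must equal the
// centripetal acceleration v^2/r, giving the tangential speed
// v = sqrt(G*M / r) used when seeding the galaxy disk.
inline float orbital_speed(float G, float central_mass, float radius) {
    return std::sqrt(G * central_mass / radius);
}
```

Seeding every particle with this speed (perpendicular to its radius vector) makes the disk start in an approximately stable rotation instead of collapsing inward.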
Step 3: N-Body Physics Kernel
Kernel1D nbody_kernel = [&](BufferVar<Particle> read_buf,
BufferVar<Particle> write_buf) noexcept {
set_block_size(tile_size);
Var idx = dispatch_id().x;
Var p = read_buf.read(idx);
// Accumulate gravitational forces
Var force = make_float3(0.0f);
// Tile loop - process particles in chunks
for (uint tile = 0u; tile < n_particles; tile += tile_size) {
for (uint j = 0u; j < tile_size; j++) {
Var other_idx = tile + j;
// Skip self-interaction to avoid infinity
$if (other_idx != idx) {
Var other = read_buf.read(other_idx);
Var r = other.position - p.position;
Var dist_sq = dot(r, r) + softening * softening;
Var dist = sqrt(dist_sq);
Var f = G * p.mass * other.mass / dist_sq;
force += f * r / dist; // F = G*m1*m2/r² * direction
};
}
}
// Euler integration
Var new_vel = p.velocity + force / p.mass * dt;
Var new_pos = p.position + new_vel * dt;
write_buf.write(idx, Particle{
.position = new_pos,
.velocity = new_vel,
.mass = p.mass,
.pad = {0.0f, 0.0f, 0.0f}});
};
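The force accumulation is easier to verify on the CPU. This is a plain C++ sketch of the same softened Newtonian gravity (not the GPU kernel; the `Vec3` type and function names are illustrative):

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Softened gravitational force on particle i from all others.
// The softening term keeps the denominator finite when two
// particles come arbitrarily close.
inline Vec3 gravity_on(int i, const Vec3 *pos, const float *mass, int n,
                       float G, float softening) {
    Vec3 force{0.0f, 0.0f, 0.0f};
    for (int j = 0; j < n; j++) {
        if (j == i) continue; // skip self-interaction
        Vec3 r{pos[j].x - pos[i].x, pos[j].y - pos[i].y, pos[j].z - pos[i].z};
        float dist_sq = r.x * r.x + r.y * r.y + r.z * r.z
                      + softening * softening;
        float dist = std::sqrt(dist_sq);
        float f = G * mass[i] * mass[j] / dist_sq; // |F| = G*m1*m2/r^2
        force.x += f * r.x / dist;
        force.y += f * r.y / dist;
        force.z += f * r.z / dist;
    }
    return force;
}
```

A quick sanity check: two equal masses should pull on each other with equal and opposite forces (Newton's third law), which is exactly what the symmetric pairwise sum produces.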
Step 4: 3D Rendering Kernel
Kernel2D render_kernel = [&](BufferVar<Particle> particles,
ImageFloat image, Float rot_x, Float rot_y) noexcept {
set_block_size(16, 16, 1);
Var uv = dispatch_id().xy();
Var size = dispatch_size().xy();
// Clear background
image.write(uv, make_float4(0.02f, 0.02f, 0.05f, 1.0f));
Var color = make_float3(0.0f);
// Project a strided subset of particles (visiting all of them per
// pixel would be too slow for a brute-force splatting loop)
for (uint i = 0u; i < n_particles; i += 64u) {
Var p = particles.read(i);
// Rotation around Y axis
Var cos_y = cos(rot_y);
Var sin_y = sin(rot_y);
Var x1 = p.position.x * cos_y - p.position.z * sin_y;
Var z1 = p.position.x * sin_y + p.position.z * cos_y;
Var y1 = p.position.y;
// Rotation around X axis
Var cos_x = cos(rot_x);
Var sin_x = sin(rot_x);
Var y2 = y1 * cos_x - z1 * sin_x;
Var z2 = y1 * sin_x + z1 * cos_x;
// Perspective projection
Var distance = 5.0f + z2;
Var scale = 2.0f / distance;
Var screen_x = (x1 * scale + 1.0f) * 0.5f * cast<float>(size.x);
Var screen_y = (y2 * scale + 1.0f) * 0.5f * cast<float>(size.y);
// Gaussian glow
Var dx = cast<float>(uv.x) - screen_x;
Var dy = cast<float>(uv.y) - screen_y;
Var d_sq = dx * dx + dy * dy;
Var intensity = exp(-d_sq / (50.0f * scale));
// Color based on particle index
Var particle_color = make_float3(
0.8f + 0.2f * sin(cast<float>(i)),
0.6f + 0.3f * cos(cast<float>(i) * 1.3f),
1.0f);
color += particle_color * intensity * 0.1f;
}
// Motion blur effect
Var old = image.read(uv).xyz();
Var final_color = lerp(old, min(color, 1.0f), 0.3f);
image.write(uv, make_float4(final_color, 1.0f));
};
Step 5: Main Loop
while (!window.should_close()) {
// Auto-rotate
rot_y += 0.002f;
// Physics update (double buffering)
stream << nbody_shader(particles_read, particles_write).dispatch(n_particles);
std::swap(particles_read, particles_write);
// Render every few frames
if (frame % 3u == 0u) {
stream << render_shader(particles_read, display, rot_x, rot_y)
.dispatch(width, height)
<< swap_chain.present(display);
}
frame++;
}
Key Concepts
Softening: Add small epsilon to prevent division by zero at close range
Tile-Based Processing: Process in chunks for cache efficiency
Double Buffering: Read from one buffer, write to another, then swap
Gaussian Splatting: Render particles as soft glows rather than points
Exercises
Barnes-Hut: Implement O(n log n) hierarchical approximation
Collision Detection: Add particle collisions
Different Forces: Try Coulomb (electrostatic) or Lennard-Jones (molecular)
Fire Particle System
Create a mesmerizing fire simulation using 65,000 GPU particles with physics-based motion, procedural turbulence, and temperature-driven color gradients. Watch as flames dance and flicker with realistic fluid-like behavior.
Final Result
A captivating campfire-like effect with:
White-hot cores at the base where particles are born
Yellow-orange flames rising and swirling upward
Red embers cooling as they float higher
Procedural turbulence creating realistic flickering motion
Interactive wind (press SPACE) that bends the flames sideways
The particles cycle through their lifecycle endlessly—born hot and bright, rising with buoyancy, cooling as they age, fading to smoke, then respawning to keep the fire alive.
Key Concepts
Particle Lifecycle Management: Each particle tracks its own state:
struct FireParticle {
float3 position; // Position in 3D space
float lifetime; // Seconds remaining before respawn
float3 velocity; // Current movement direction
float temperature; // 1.0 = white hot, 0.0 = cold
float size; // Particle radius for rendering
};
This structure enables individual particle behavior while the GPU processes thousands in parallel.
Procedural Turbulence: We use a simple 3D noise function to create realistic turbulent motion:
Callable noise3d = [](Float3 p) noexcept {
Var i = floor(p);
Var f = fract(p);
f = f * f * (3.0f - 2.0f * f);
// ... trilinear interpolation
};
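The elided trilinear interpolation can be filled in like this. Below is a CPU sketch of the same kind of value noise (plain C++; the hash function and its constants are my assumptions, not necessarily the tutorial's exact ones): hash the eight lattice corners, then blend them with the smoothstep-faded fractional coordinate `f*f*(3 - 2*f)`, just as the Callable begins to do.

```cpp
#include <cassert>
#include <cmath>

// Illustrative lattice hash: pseudo-random value in [0, 1).
inline float hash3(float x, float y, float z) {
    float h = std::sin(x * 127.1f + y * 311.7f + z * 74.7f) * 43758.5453f;
    return h - std::floor(h);
}

inline float lerpf(float a, float b, float t) { return a + (b - a) * t; }

inline float noise3d(float px, float py, float pz) {
    float ix = std::floor(px), iy = std::floor(py), iz = std::floor(pz);
    float fx = px - ix, fy = py - iy, fz = pz - iz;
    // Smoothstep fade so the gradient vanishes at lattice points
    fx = fx * fx * (3.0f - 2.0f * fx);
    fy = fy * fy * (3.0f - 2.0f * fy);
    fz = fz * fz * (3.0f - 2.0f * fz);
    // Trilinear interpolation of the 8 corner hashes
    float c000 = hash3(ix, iy, iz),         c100 = hash3(ix + 1, iy, iz);
    float c010 = hash3(ix, iy + 1, iz),     c110 = hash3(ix + 1, iy + 1, iz);
    float c001 = hash3(ix, iy, iz + 1),     c101 = hash3(ix + 1, iy, iz + 1);
    float c011 = hash3(ix, iy + 1, iz + 1), c111 = hash3(ix + 1, iy + 1, iz + 1);
    float x00 = lerpf(c000, c100, fx), x10 = lerpf(c010, c110, fx);
    float x01 = lerpf(c001, c101, fx), x11 = lerpf(c011, c111, fx);
    float y0 = lerpf(x00, x10, fy), y1 = lerpf(x01, x11, fy);
    return lerpf(y0, y1, fz);
}
```

Sampling this noise at a slowly drifting position (as the update kernel does with `p.position * 2.0f + time`) yields smooth, continuous turbulence rather than white noise.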
Temperature-Based Coloring: The color gradient follows real fire physics:
White: Hottest (1.0)
Yellow: Hot (0.75-1.0)
Orange: Warm (0.5-0.75)
Red: Cool (0.25-0.5)
Black: Cold (< 0.25)
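One way to realize this gradient is a piecewise ramp per channel: red saturates first, green follows, and blue only appears near white-hot. The stops below are my assumption of how `get_fire_color` could be written (plain C++ sketch, not the tutorial's exact code):

```cpp
#include <cassert>

struct Color { float r, g, b; };

inline float clamp01(float x) {
    return x < 0.0f ? 0.0f : (x > 1.0f ? 1.0f : x);
}

// Hypothetical temperature-to-color ramp: black -> red -> orange ->
// yellow -> white as t rises from 0 to 1.
inline Color get_fire_color(float t) {
    t = clamp01(t);
    float r = clamp01(t * 4.0f);           // full red above t = 0.25
    float g = clamp01((t - 0.25f) * 2.0f); // yellow by t = 0.75
    float b = clamp01((t - 0.75f) * 4.0f); // white at t = 1.0
    return Color{r, g, b};
}
```

Because each channel is a clamped linear ramp with a different onset, a single scalar temperature drives the whole fire palette.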
Step-by-Step Implementation
Step 1: Particle Update Kernel
Kernel1D update_kernel = [&](BufferVar<FireParticle> particles,
Float time, Float wind) noexcept {
set_block_size(256);
Var idx = dispatch_id().x;
Var p = particles.read(idx);
$if (p.lifetime > 0.0f) {
// Apply gravity (negative = buoyancy)
p.velocity.y += gravity * dt;
// Add turbulence
Var noise_pos = p.position * 2.0f + time;
p.velocity.x += (noise3d(noise_pos) - 0.5f) * 2.0f * dt;
// Wind effect
p.velocity.x += wind * dt;
// Update position
p.position += p.velocity * dt;
// Cool down
p.temperature -= dt * 0.3f;
p.lifetime -= dt;
}
$else {
// Respawn with random properties
Var seed = idx + cast<uint>(time * 1000.0f);
// ... spawn new particle
};
particles.write(idx, p);
};
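The lifecycle logic in that kernel is worth tracing on the CPU. This plain C++ sketch mirrors the same update (buoyancy, cooling, aging, respawn) with turbulence and wind omitted; I use an upward-positive `buoyancy` parameter where the kernel folds the same effect into a negative gravity constant, and the respawn values are illustrative:

```cpp
#include <cassert>

struct FireParticleSketch {
    float y;           // height
    float vy;          // vertical velocity
    float temperature; // 1 = white hot, 0 = cold
    float lifetime;    // seconds remaining
};

inline void update_particle(FireParticleSketch &p, float dt, float buoyancy) {
    if (p.lifetime > 0.0f) {
        p.vy += buoyancy * dt;      // hot air pushes the particle upward
        p.y += p.vy * dt;
        p.temperature -= dt * 0.3f; // cool as it ages
        p.lifetime -= dt;
    } else {
        // Respawn at the base, hot and fresh (spawn randomness omitted)
        p.y = 0.0f;
        p.vy = 1.0f;
        p.temperature = 1.0f;
        p.lifetime = 2.0f;
    }
}
```

Running this repeatedly shows the cycle the tutorial describes: particles rise while cooling, die out, and are immediately reborn at the base so the fire never goes out.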
Step 2: Rendering with Additive Blending
Kernel2D render_kernel = [&](BufferVar<FireParticle> particles,
ImageFloat image) noexcept {
// For each pixel, accumulate contributions from a strided subset of
// particles (visiting all 65k per pixel would be too slow)
Var color = make_float3(0.0f);
for (uint i = 0u; i < n_particles; i += 256u) {
Var p = particles.read(i);
// Project to screen space
Var screen_pos = project(p.position);
Var dist = length(pixel_pos - screen_pos);
// Gaussian intensity falloff
Var intensity = exp(-dist * dist / (p.size * size_scale));
// Temperature-based color
Var fire_color = get_fire_color(p.temperature);
color += fire_color * intensity;
}
image.write(uv, make_float4(color, 1.0f));
};
Exercises
Smoke: Add a smoke trail that rises from cooling particles
Sparks: Add smaller, faster particles that shoot upward occasionally
Interactive: Make the fire react to mouse position as a wind source
Reaction-Diffusion Simulation
Watch mathematical magic unfold as beautiful organic patterns emerge from simple chemical rules. The Gray-Scott model simulates how two chemicals interact to create mesmerizing textures resembling coral reefs, zebra stripes, fingerprints, and more.
What You’ll See
Starting from a tiny seed in the center, watch as intricate patterns grow and evolve across the screen:
Coral Growth: Branching, tree-like structures spreading outward
Fingerprint Patterns: Meandering, maze-like lines
Leopard Spots: Isolated dots that grow and stabilize
Stripes: Parallel waves that propagate across the field
Press keys 1-4 to switch between pattern types in real-time and observe how the same equations produce wildly different results!
The Gray-Scott Model
Two virtual chemicals dance on the grid:
U (substrate, shown as blue): The “food” that V feeds on
V (activator, shown as red/yellow): The “predator” that consumes U and spreads
The mathematical rules that create life-like patterns:
du/dt = Du * ∇²u - u*v² + F*(1-u) // U diffuses and gets consumed
dv/dt = Dv * ∇²v + u*v² - (F+k)*v // V diffuses, grows, and dies
Where:
Du, Dv: How fast each chemical spreads (U spreads faster than V)
F (Feed): Rate that fresh U is added
k (Kill): Rate that V is removed
u*v²: The reaction where V consumes U
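These equations can be stepped directly on the CPU to see the dynamics before writing the GPU kernel. The sketch below (plain C++, not the tutorial's code; grid size, constants, and periodic boundaries are illustrative) applies one explicit Euler step of the same du/dt and dv/dt with a standard 5-point Laplacian:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Tiny Gray-Scott grid with wrap-around (periodic) boundaries.
struct GrayScott {
    int n;
    std::vector<float> u, v;
    explicit GrayScott(int size)
        : n(size), u(size * size, 1.0f), v(size * size, 0.0f) {}
    float &U(int x, int y) { return u[((y + n) % n) * n + (x + n) % n]; }
    float &V(int x, int y) { return v[((y + n) % n) * n + (x + n) % n]; }

    void step(float F, float k, float Du, float Dv, float dt) {
        std::vector<float> nu = u, nv = v; // double buffer, like ping-pong images
        for (int y = 0; y < n; y++)
            for (int x = 0; x < n; x++) {
                float uc = U(x, y), vc = V(x, y);
                // 5-point Laplacian: sum of neighbors minus 4 * center
                float u_lap = U(x-1,y) + U(x+1,y) + U(x,y-1) + U(x,y+1) - 4.0f * uc;
                float v_lap = V(x-1,y) + V(x+1,y) + V(x,y-1) + V(x,y+1) - 4.0f * vc;
                float reaction = uc * vc * vc; // the u*v^2 term
                nu[y * n + x] = uc + dt * (Du * u_lap - reaction + F * (1.0f - uc));
                nv[y * n + x] = vc + dt * (Dv * v_lap + reaction - (F + k) * vc);
            }
        u = nu;
        v = nv;
    }
};
```

Seeding a small patch of V in a field of U and stepping repeatedly reproduces the instability that grows into the coral, spot, and stripe patterns.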
Pattern Types
By adjusting feed rate (F) and kill rate (k), we get different patterns:
| Pattern | F | k | Description |
|---|---|---|---|
| Coral | 0.035 | 0.06 | Branching coral-like growth |
| Fingerprint | 0.037 | 0.06 | Meandering lines |
| Spots | 0.03 | 0.062 | Isolated spots |
| Stripes | 0.03 | 0.054 | Parallel stripes |
Step-by-Step Implementation
Step 1: Chemical State Storage
// Two buffers for ping-pong rendering
Image<float> u_prev, u_curr; // Inhibitor
Image<float> v_prev, v_curr; // Activator
Step 2: Reaction-Diffusion Kernel
Kernel2D rd_step = [&](ImageFloat u_in, ImageFloat v_in,
ImageFloat u_out, ImageFloat v_out,
Float F, Float k, Float Du, Float Dv) noexcept {
set_block_size(16, 16, 1);
Var uv = dispatch_id().xy();
// Sample current state
Var u = u_in.read(uv).x;
Var v = v_in.read(uv).x;
// Compute Laplacians (5-point stencil: neighbors minus 4x center)
Var u_lap = sample(u_in, uv, {-1,0}) + sample(u_in, uv, {1,0}) +
sample(u_in, uv, {0,-1}) + sample(u_in, uv, {0,1}) - 4.0f * u;
Var v_lap = sample(v_in, uv, {-1,0}) + sample(v_in, uv, {1,0}) +
sample(v_in, uv, {0,-1}) + sample(v_in, uv, {0,1}) - 4.0f * v;
// Reaction term
Var reaction = u * v * v;
// Update
Var u_new = u + dt * (Du * u_lap - reaction + F * (1.0f - u));
Var v_new = v + dt * (Dv * v_lap + reaction - (F + k) * v);
u_out.write(uv, make_float4(u_new, 0, 0, 1));
v_out.write(uv, make_float4(v_new, 0, 0, 1));
};
Step 3: Visualization
Kernel2D visualize = [&](ImageFloat u, ImageFloat v, ImageFloat output) {
Var uv = dispatch_id().xy();
Var u_val = u.read(uv).x;
Var v_val = v.read(uv).x;
// U = blue, V = red/yellow
Var color = make_float3(v_val * 1.5f, // Red
v_val * u_val * 0.5f, // Green
u_val * 0.8f); // Blue
output.write(uv, make_float4(color, 1.0f));
};
Key Concepts
Reaction: V consumes U when they meet (u*v² term)
Diffusion: U spreads faster than V (Du > Dv)
Feed/Kill: Constant injection of U, removal of V
Instability: Small perturbations grow into patterns
Exercises
Multiscale: Use different F/k values in different regions
3D Extension: Extend to 3D and render with volume ray marching
Interactive Brush: Paint V values with mouse to guide pattern growth
Voxel Ray Tracer
Explore a procedurally generated 3D voxel world in real-time! This example renders a Minecraft-style landscape with terrain, trees, and floating islands using ray marching through a voxel grid. Use arrow keys to rotate the view and W/S to zoom.
What You’ll See
A charming voxel scene featuring:
Rolling Hills: Procedurally generated terrain with sine-wave elevation
Green Grass Tops: Different block types for grass, dirt, and stone
Trees: Scattered throughout with wood trunks and leafy canopies
Floating Magic Orb: A mysterious purple sphere hovering above the landscape
Interactive Camera: Rotate with arrow keys, zoom with W/S
How Ray Voxel Traversal Works
Unlike traditional polygon rendering, we trace rays through a 3D grid:
Ray-Box Intersection: Calculate where the camera ray enters the voxel grid bounding box
DDA Traversal: Use the Digital Differential Analyzer algorithm to step through voxels in order along the ray, similar to Bresenham’s line algorithm in 3D
Hit Detection: At each voxel, check if it’s solid. If yes, render its color. If no, continue stepping.
This technique, used in games like Minecraft and Teardown, allows rendering millions of voxels efficiently without storing explicit geometry.
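The DDA traversal described above can be sketched in plain C++ (this is an illustrative CPU version in the style of Amanatides–Woo grid traversal, not the tutorial's kernel; it assumes the ray origin is already inside the grid):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Hit { int x, y, z; };

// Step through an n^3 voxel grid along the ray, visiting voxels in order.
// Returns true and fills `hit` with the first solid voxel, false on a miss.
inline bool dda_trace(const std::vector<int> &voxels, int n,
                      float ox, float oy, float oz,
                      float dx, float dy, float dz, Hit &hit) {
    int vx = int(std::floor(ox)), vy = int(std::floor(oy)), vz = int(std::floor(oz));
    int sx = dx > 0 ? 1 : -1, sy = dy > 0 ? 1 : -1, sz = dz > 0 ? 1 : -1;
    // Ray-parameter distance needed to cross one whole voxel per axis
    float tdx = dx != 0 ? std::fabs(1.0f / dx) : 1e30f;
    float tdy = dy != 0 ? std::fabs(1.0f / dy) : 1e30f;
    float tdz = dz != 0 ? std::fabs(1.0f / dz) : 1e30f;
    // Distance to the first voxel boundary per axis
    float tx = dx != 0 ? ((vx + (sx > 0 ? 1 : 0)) - ox) / dx : 1e30f;
    float ty = dy != 0 ? ((vy + (sy > 0 ? 1 : 0)) - oy) / dy : 1e30f;
    float tz = dz != 0 ? ((vz + (sz > 0 ? 1 : 0)) - oz) / dz : 1e30f;
    while (vx >= 0 && vx < n && vy >= 0 && vy < n && vz >= 0 && vz < n) {
        if (voxels[(vz * n + vy) * n + vx] > 0) { hit = {vx, vy, vz}; return true; }
        // Step along the axis whose next boundary is closest
        if (tx < ty && tx < tz) { vx += sx; tx += tdx; }
        else if (ty < tz)       { vy += sy; ty += tdy; }
        else                    { vz += sz; tz += tdz; }
    }
    return false; // left the grid without hitting anything
}
```

Each iteration advances exactly one voxel, so the cost per ray is proportional to the number of voxels crossed, never to the total voxel count.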
Step-by-Step Implementation
Step 1: Voxel Grid Storage
static constexpr uint grid_size = 64;
Buffer<uint> voxel_grid = device.create_buffer<uint>(
grid_size * grid_size * grid_size);
// Voxel types: 0=empty, 1=dirt, 2=grass, 3=stone, etc.
Step 2: Ray-Box Intersection
Callable intersect_box = [](Float3 ray_origin, Float3 ray_dir,
Float3 box_min, Float3 box_max) {
// Slab method for AABB intersection
Var tmin = (box_min - ray_origin) / ray_dir;
Var tmax = (box_max - ray_origin) / ray_dir;
// ... swap if necessary
return make_float2(entry_t, exit_t);
};
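The "swap if necessary" part of the slab method, elided above, looks like this in a complete CPU sketch (plain C++, illustrative names; it relies on IEEE float division so a zero direction component yields infinities rather than errors):

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Interval { float entry, exit; bool hit; };

// Slab method: intersect the ray with each pair of axis-aligned planes,
// take the largest entry and the smallest exit. The ray hits the box
// iff entry <= exit and the exit is in front of the origin.
inline Interval intersect_box(float ox, float oy, float oz,
                              float dx, float dy, float dz,
                              float min_x, float min_y, float min_z,
                              float max_x, float max_y, float max_z) {
    const float o[3] = {ox, oy, oz}, d[3] = {dx, dy, dz};
    const float bmin[3] = {min_x, min_y, min_z};
    const float bmax[3] = {max_x, max_y, max_z};
    float tmin = -1e30f, tmax = 1e30f;
    for (int i = 0; i < 3; i++) {
        float t1 = (bmin[i] - o[i]) / d[i]; // +-inf when d[i] == 0 (IEEE)
        float t2 = (bmax[i] - o[i]) / d[i];
        if (t1 > t2) std::swap(t1, t2); // "swap if necessary"
        tmin = std::max(tmin, t1);
        tmax = std::min(tmax, t2);
    }
    return Interval{tmin, tmax, tmin <= tmax && tmax >= 0.0f};
}
```

The returned `entry` value is what the DDA traversal uses as its starting `t` when the camera sits outside the voxel grid.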
Step 3: DDA Traversal
Callable trace_ray = [](Float3 origin, Float3 dir, BufferVar<uint> voxels) {
// Find entry point
Var box_hit = intersect_box(origin, dir, grid_min, grid_max);
$if (box_hit.x < 0.0f) { return miss_color; };
// Initialize DDA
Var t = box_hit.x;
Var vx = floor(origin.x + dir.x * t);
// ... calculate step direction and initial distances
// Traverse
$for (step, max_steps) {
Var voxel = read_voxel(voxels, vx, vy, vz);
$if (voxel > 0u) {
// Hit! Return voxel color
return get_voxel_color(voxel);
};
// Step to next voxel
$if (next_tx < next_ty & next_tx < next_tz) {
vx += step_x;
next_tx += delta_tx;
}
// ... similar for y, z
};
return sky_color; // Missed everything
};
Step 4: Camera and Rendering
Kernel2D render = [&](ImageFloat image, Float3 cam_pos,
Float2 cam_rot, BufferVar<uint> voxels) {
Var uv = dispatch_id().xy();
Var ndc = (make_float2(uv) / resolution) * 2.0f - 1.0f;
// Generate ray direction from camera
Var ray_dir = normalize(make_float3(
ndc.x * aspect * fov,
-ndc.y * fov,
-1.0f
));
// Apply camera rotation
ray_dir = rotate_y(ray_dir, cam_rot.x);
ray_dir = rotate_x(ray_dir, cam_rot.y);
// Trace and write color
Var color = trace_ray(cam_pos, ray_dir, voxels);
image.write(uv, make_float4(color, 1.0f));
};
Exercises
Lighting: Add directional lighting with voxel self-shadowing
Materials: Add emission for some voxel types (glowing blocks)
Destruction: Add ability to remove voxels with mouse clicks
LOD: Implement level-of-detail for distant chunks
Black Hole Renderer
Experience the mind-bending visual effects of Einstein’s theory of general relativity! This example renders a Schwarzschild black hole with realistic gravitational lensing, an accretion disk, and relativistic Doppler effects—just like in the movie Interstellar.
What You’ll See
The Event Horizon: A perfectly black sphere where light cannot escape
Gravitational Lensing: Stars behind the black hole bend around it, creating a distorted “Einstein ring”
The Photon Sphere: A thin bright ring just outside the event horizon where light orbits the black hole
Accretion Disk: A swirling disk of superheated matter orbiting the black hole
Doppler Beaming: The side of the disk rotating toward you appears brighter (blueshifted), while the receding side appears dimmer (redshifted)
Left Mouse Drag: Orbit around the black hole
Right Mouse Drag: Roll/rotate the view (rotate around viewing direction)
Scroll Wheel or +/-: Zoom in/out
Watch how the gravitational lensing distorts the background starfield!
The Physics
Gravitational Lensing: Massive objects bend spacetime, causing light to curve. The bending angle follows:
acceleration = -1.5 * GM / r³ * position
(General relativity bends light about twice as strongly as Newtonian gravity would predict; the 1.5 factor bakes this into the acceleration)
Photon Sphere: At r = 1.5 × Rs (Schwarzschild radius), light can orbit the black hole in unstable circular paths.
Accretion Disk Physics:
Temperature: Hotter near the center (~T ∝ r^(-3/4))
Doppler Effect: Approaching matter appears brighter/bluer
Gravitational Redshift: Light loses energy climbing out of the gravity well
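A single integration step of the bent photon path can be checked on the CPU. This plain C++ sketch (illustrative names, normalized units) applies the same `-1.5 * GM / r³ * position` acceleration and renormalizes the direction; with zero mass the photon must travel in a straight line:

```cpp
#include <cassert>
#include <cmath>

struct V3 { float x, y, z; };

// One photon step: bend the direction by the GR acceleration term,
// renormalize (light speed is constant), then advance the position.
inline V3 bend_step(V3 pos, V3 &dir, float bh_mass, float dt) {
    float r = std::sqrt(pos.x * pos.x + pos.y * pos.y + pos.z * pos.z);
    float a = -1.5f * bh_mass / (r * r * r); // gravitational bending factor
    V3 nd{dir.x + a * pos.x * dt,
          dir.y + a * pos.y * dt,
          dir.z + a * pos.z * dt};
    float len = std::sqrt(nd.x * nd.x + nd.y * nd.y + nd.z * nd.z);
    dir = V3{nd.x / len, nd.y / len, nd.z / len}; // keep |dir| = 1
    return V3{pos.x + dir.x * dt, pos.y + dir.y * dt, pos.z + dir.z * dt};
}
```

Since the black hole sits at the origin, a photon passing above it (positive y) should have its direction nudged downward, toward the mass, which is what the sign of the acceleration term produces.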
Step-by-Step Implementation
Step 1: Ray Setup with Camera Transformation
// Camera position in orbit around black hole
Var cam_pos = make_float3(
distance * sin(rot_y) * cos(rot_x),
distance * sin(rot_x),
distance * cos(rot_y) * cos(rot_x)
);
// Ray direction through each pixel
Var ray_dir = normalize(forward + ndc.x * right * fov + ndc.y * up * fov);
Step 2: Gravitational Ray Marching
$for (step, max_steps) {
Var r = length(pos);
// Hit event horizon?
$if (r < bh_radius) { $break; };
// Gravitational bending (general relativity)
Var accel = -1.5f * bh_mass / (r * r * r) * pos;
dir = normalize(dir + accel * dt);
// Step forward
pos = pos + dir * dt;
};
Step 3: Accretion Disk Rendering
The accretion disk is a thin sheet in the XZ plane. To render it correctly with proper depth ordering, we detect when the ray crosses the plane and determine whether we’re hitting the front (visible) or back (hidden behind black hole) side:
// Track previous position to detect plane crossing
Var prev_y = cam_pos.y;
$for (step, max_steps) {
// ... ray marching loop ...
// Check if we crossed the disk plane (XZ plane at y=0)
Var crossed_plane = (prev_y > 0.0f & pos.y <= 0.0f) |
(prev_y < 0.0f & pos.y >= 0.0f);
$if (crossed_plane & r > inner_radius & r < outer_radius) {
// Interpolate to find exact intersection point
Var t = abs(prev_y) / (abs(prev_y) + abs(pos.y));
Var intersect_pos = prev_pos * (1.0f - t) + pos * t;
Var intersect_r = length(intersect_pos);
// Calculate disk color (temperature, Doppler beaming, etc.)
// ... color calculation code ...
// Determine if this is front or back side
// Front side: ray moving toward the plane from camera
Var moving_toward_plane = (cam_pos.y > 0.0f & pos.y < prev_y) |
(cam_pos.y < 0.0f & pos.y > prev_y);
$if (moving_toward_plane) {
// Front side - use this sample
disk_color = local_disk_color;
hit_disk = true;
}
$else {
// Back side - only use if front wasn't hit
// (handles edge case of camera inside disk radius)
$if (!hit_disk) { disk_color = local_disk_color; };
};
};
prev_y = pos.y; // Update for next iteration
// ... continue ray marching ...
};
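The interpolation factor used above is simple enough to verify in isolation. A plain C++ sketch (illustrative helper names): when two consecutive samples straddle y = 0, blending them with t = |prev_y| / (|prev_y| + |y|) lands exactly on the disk plane.

```cpp
#include <cassert>
#include <cmath>

// Blend factor at which the segment [prev_y, y] crosses y = 0,
// assuming the two samples have opposite signs.
inline float crossing_t(float prev_y, float y) {
    return std::fabs(prev_y) / (std::fabs(prev_y) + std::fabs(y));
}

// Linear interpolation, applied per coordinate to recover the
// exact intersection point on the disk plane.
inline float interp(float a, float b, float t) {
    return a * (1.0f - t) + b * t;
}
```

Applying `interp` with the same `t` to the x and z coordinates gives the radius at which the ray pierced the disk, which then feeds the temperature and Doppler shading.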
Step 4: Background Starfield
// Procedural stars for lensing demonstration
auto get_star_color = [&](Float3 rd) noexcept {
Var star_noise = fract(sin(rd.x * 123.45f + ...) * 43758.5453f);
Var star_brightness = ite(star_noise > 0.995f, star_noise, 0.0f);
return make_float3(star_brightness);
};
Key Concepts
Geodesic Ray Tracing: Light follows curved paths in curved spacetime
Relativistic Doppler Effect: Motion toward/away affects brightness and color
Gravitational Redshift: Light loses energy escaping gravity
Photon Sphere: Where light can orbit the black hole
Depth-Ordered Rendering: The accretion disk must be rendered with proper front/back ordering so the back side (behind the black hole) doesn’t incorrectly show through the event horizon
Exercises
Spinning Black Hole: Add frame-dragging effects (Kerr metric) for a rotating black hole
Multiple Black Holes: Simulate a binary black hole system
Time Dilation: Visualize clock slowing near the event horizon
Gravitational Waves: Add ripples to the accretion disk from merging black holes
Summary
Congratulations! You’ve explored eleven fascinating GPU computing tutorials that demonstrate the breadth of what’s possible with LuisaCompute:
Visual Effects & Rendering
Mandelbrot: Mathematical beauty through iterative complex numbers
Path Tracer: Physically-based global illumination with ray tracing
Voxel Ray Tracer: Real-time ray marching through 3D voxel grids
Black Hole: Gravitational lensing and relativistic ray tracing (Interstellar-style)
Physics Simulations
MPM: Material Point Method for fluid/solid simulation
Wave Equation: Interactive water ripples with caustics rendering
N-Body: Gravitational galaxy simulation with 4,000+ particles
Pattern Formation & Cellular Automata
Game of Life: Classic cellular automata with emergent behavior
Reaction-Diffusion: Gray-Scott chemical patterns mimicking nature
Image Processing & Particles
Image Processing: Multi-pass Gaussian blur and edge detection
Fire Particles: 65,000 temperature-driven animated particles
Interactive Features Summary
| Tutorial | Interaction |
|---|---|
| Wave Equation | Click & drag to create ripples |
| N-Body | Mouse drag to rotate, scroll/+/- to zoom |
| Game of Life | Watch evolution, R to reset |
| Fire | SPACE to toggle wind |
| Reaction-Diffusion | 1-4 to switch patterns, R to reset |
| Voxel Ray Tracer | Arrow keys to rotate, W/S to zoom |
| Black Hole | Left drag=Orbit, Right drag=Roll, Scroll/+/-=Zoom |
What You’ve Learned
Mandelbrot: Basic kernel structure, loops, and image output
Path Tracer: Ray tracing, acceleration structures, sampling
MPM: Complex simulation with multiple kernels, atomic operations
Game of Life: Cellular automata, ping-pong buffering
Wave Equation: PDE solving, interactive simulation
Image Processing: Multi-pass pipelines, convolution
N-Body: Particle systems, tile-based optimization
Fire Particles: Particle systems with lifecycle management and procedural noise
Reaction-Diffusion: Coupled PDEs and pattern formation
Voxel Ray Tracer: 3D grid traversal and ray-box intersection
Black Hole: Gravitational lensing, geodesic ray tracing, relativistic effects
Performance Tips
Use appropriate block sizes: 16×16 for 2D kernels, 256 for 1D
Minimize memory transfers: Keep data on GPU when possible
Batch operations: Use CommandList for multiple dispatches
Profile with tools: Use Nsight, RenderDoc, or vendor tools
Next Steps
Explore the LuisaCompute tests for more examples
Check LuisaRender for production rendering
Try implementing your own simulations!