Tutorials

This section provides step-by-step tutorials for building practical applications with LuisaCompute.

Table of Contents

  1. ShaderToy-Style Mandelbrot - A gentle introduction

  2. Path Tracing Renderer - Global illumination with ray tracing

  3. MPM Fluid Simulation - Material Point Method for fluids

  4. Conway’s Game of Life - Cellular automata with ping-pong buffering

  5. Wave Equation Simulation - Interactive PDE solver

  6. Image Processing Pipeline - Multi-pass filters and effects

  7. N-Body Simulation - Gravitational particle physics

  8. Fire Particle System - Physics-based fire simulation

  9. Reaction-Diffusion - Gray-Scott pattern formation

  10. Voxel Ray Tracer - Real-time voxel rendering

  11. Black Hole Renderer - Interstellar-style gravitational lensing


ShaderToy-Style Mandelbrot Set

Let’s start with a simple yet visually appealing example: rendering the Mandelbrot set. This mimics the style of ShaderToy, a popular platform for fragment shaders.

Final Result

A colorful visualization of the Mandelbrot set with smooth coloring.

Step-by-Step Implementation

Step 1: Project Setup

Create a new project with the following structure:

mandelbrot/
├── CMakeLists.txt
└── main.cpp

CMakeLists.txt:

cmake_minimum_required(VERSION 3.23)
project(mandelbrot LANGUAGES CXX)

set(CMAKE_CXX_STANDARD 20)

# Add LuisaCompute
add_subdirectory(LuisaCompute)

add_executable(mandelbrot main.cpp)
target_link_libraries(mandelbrot PRIVATE luisa::compute)

Step 2: Basic Structure

main.cpp - Initial Setup:

#include <luisa/luisa-compute.h>
#include <luisa/dsl/sugar.h>
#include <stb/stb_image_write.h>

using namespace luisa;
using namespace luisa::compute;

int main(int argc, char *argv[]) {
    // Step 1: Create context and device
    Context context{argv[0]};
    Device device = context.create_device("cuda");  // or "cpu", "metal", "dx"
    Stream stream = device.create_stream();
    
    // Step 2: Create an image to render to
    static constexpr uint2 resolution = make_uint2(1024u, 1024u);
    Image<float> image = device.create_image<float>(
        PixelStorage::BYTE4, resolution.x, resolution.y);  // 8-bit channels, so pixels can be copied straight into a byte buffer for PNG output
    
    // We'll add the kernel here...
    
    return 0;
}

Step 3: Understanding the Mandelbrot Set

The Mandelbrot set is defined by the iteration:

z_{n+1} = z_n² + c

Where c is a complex number corresponding to pixel coordinates. If the sequence remains bounded (|z| ≤ 2), the point is in the set.
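
Before moving to GPU code, the escape-time loop is easy to sanity-check on the host. Below is a minimal plain-C++ sketch of the same iteration the kernel will record (the helper name `mandelbrot_iters` is invented here, not part of LuisaCompute):

```cpp
#include <cassert>

// Host-side sketch of the escape-time iteration. Returns how many iterations
// z stayed bounded (|z|^2 <= 4) before diverging, capped at max_iters.
int mandelbrot_iters(float cx, float cy, int max_iters) {
    float zx = 0.0f, zy = 0.0f;
    int n = 0;
    for (int i = 0; i < max_iters; i++) {
        float zx_new = zx * zx - zy * zy + cx;  // Re(z^2 + c)
        float zy_new = 2.0f * zx * zy + cy;     // Im(z^2 + c)
        zx = zx_new;
        zy = zy_new;
        if (zx * zx + zy * zy > 4.0f) break;    // diverged: |z| > 2
        n++;
    }
    return n;
}
```

Points inside the set (e.g. c = 0, or the period-2 point c = -1) reach the iteration cap, while points outside (e.g. c = 1) escape within a few steps.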

Step 4: Implementing the Kernel

// Inside main(), after creating the image:

Kernel2D mandelbrot_kernel = [&](ImageFloat image) noexcept {
    // Get pixel coordinates
    UInt2 coord = dispatch_id().xy();
    Float2 uv = make_float2(coord) / make_float2(resolution);
    
    // Map to complex plane: centered at (-0.5, 0), zoomed in
    Float2 c = (uv - 0.5f) * 3.0f - make_float2(0.5f, 0.0f);
    
    // Mandelbrot iteration
    Float2 z = make_float2(0.0f);
    Float n = 0.0f;
    static constexpr int M = 100;  // Maximum iterations
    
    $for (i, M) {
        // z = z² + c
        Float2 z_new = make_float2(
            z.x * z.x - z.y * z.y + c.x,
            2.0f * z.x * z.y + c.y
        );
        z = z_new;
        
        // Check divergence
        $if (dot(z, z) > 4.0f) {
            $break;
        };
        n += 1.0f;
    };
    
    // Color using cosine palette (Inigo Quilez's technique)
    Float t = n / static_cast<float>(M);
    Float3 color = 0.5f + 0.5f * cos(6.28318f * (t + make_float3(0.0f, 0.33f, 0.67f)));
    
    // Write to image
    image.write(coord, make_float4(color, 1.0f));
};

Step 5: Compile and Execute

// Compile the kernel
auto shader = device.compile(mandelbrot_kernel);

// Execute
stream << shader(image).dispatch(resolution)
       << synchronize();

// Save result
luisa::vector<uint8_t> pixels(resolution.x * resolution.y * 4);
stream << image.copy_to(pixels.data())
       << synchronize();

stbi_write_png("mandelbrot.png", resolution.x, resolution.y, 4, pixels.data(), 0);
LUISA_INFO("Saved to mandelbrot.png");

Complete Code

Click to expand complete code
#include <luisa/luisa-compute.h>
#include <luisa/dsl/sugar.h>
#include <stb/stb_image_write.h>

using namespace luisa;
using namespace luisa::compute;

int main(int argc, char *argv[]) {
    Context context{argv[0]};
    Device device = context.create_device("cuda");
    Stream stream = device.create_stream();
    
    static constexpr uint2 resolution = make_uint2(1024u, 1024u);
    Image<float> image = device.create_image<float>(
        PixelStorage::BYTE4, resolution.x, resolution.y);  // 8-bit channels for direct PNG output
    
    Kernel2D mandelbrot_kernel = [&](ImageFloat image) noexcept {
        UInt2 coord = dispatch_id().xy();
        Float2 uv = make_float2(coord) / make_float2(resolution);
        
        Float2 c = (uv - 0.5f) * 3.0f - make_float2(0.5f, 0.0f);
        Float2 z = make_float2(0.0f);
        Float n = 0.0f;
        static constexpr int M = 100;
        
        $for (i, M) {
            Float2 z_new = make_float2(
                z.x * z.x - z.y * z.y + c.x,
                2.0f * z.x * z.y + c.y
            );
            z = z_new;
            $if (dot(z, z) > 4.0f) { $break; };
            n += 1.0f;
        };
        
        Float t = n / static_cast<float>(M);
        Float3 color = 0.5f + 0.5f * cos(6.28318f * (t + make_float3(0.0f, 0.33f, 0.67f)));
        image.write(coord, make_float4(color, 1.0f));
    };
    
    auto shader = device.compile(mandelbrot_kernel);
    stream << shader(image).dispatch(resolution) << synchronize();
    
    luisa::vector<uint8_t> pixels(resolution.x * resolution.y * 4);
    stream << image.copy_to(pixels.data()) << synchronize();
    stbi_write_png("mandelbrot.png", resolution.x, resolution.y, 4, pixels.data(), 0);
    
    return 0;
}

Exercises

  1. Zoom Animation: Modify c to animate over time

  2. Julia Set: Keep c constant and vary initial z

  3. Smooth Coloring: Use n - log2(log2(dot(z,z))) for smoother gradients


Path Tracing Renderer

Now let’s build a more sophisticated renderer using ray tracing. We’ll implement a basic path tracer for the Cornell Box scene.

Final Result

A physically-based rendered image with global illumination.

Step-by-Step Implementation

Step 1: Scene Setup

We need geometry (triangles), materials, and acceleration structures:

#include <luisa/luisa-compute.h>
#include <luisa/dsl/sugar.h>
#include <luisa/runtime/rtx/accel.h>
#include <luisa/runtime/rtx/mesh.h>
#include <stb/stb_image_write.h>

using namespace luisa;
using namespace luisa::compute;

// Cornell Box geometry data
static constexpr float3 vertices[] = {
    // Floor
    {0.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 0.0f}, {1.0f, 0.0f, 1.0f}, {0.0f, 0.0f, 1.0f},
    // Ceiling
    {0.0f, 1.0f, 0.0f}, {1.0f, 1.0f, 0.0f}, {1.0f, 1.0f, 1.0f}, {0.0f, 1.0f, 1.0f},
    // ... more vertices
};

static constexpr uint3 triangles[] = {
    {0, 1, 2}, {0, 2, 3},  // Floor
    // ... more triangles
};

Step 2: Create Acceleration Structure

// Create buffers
Buffer<float3> vertex_buffer = device.create_buffer<float3>(num_vertices);
Buffer<Triangle> triangle_buffer = device.create_buffer<Triangle>(num_triangles);

// Upload data
stream << vertex_buffer.copy_from(vertices)
       << triangle_buffer.copy_from(triangles)
       << synchronize();

// Create mesh and acceleration structure
Mesh mesh = device.create_mesh(vertex_buffer, triangle_buffer);
Accel accel = device.create_accel();
accel.emplace_back(mesh, make_float4x4(1.0f));  // Identity transform

stream << mesh.build() << accel.build() << synchronize();

Step 3: Helper Callables

// Linear congruential generator for random numbers
Callable lcg = [](UInt &state) noexcept {
    constexpr uint a = 1664525u;
    constexpr uint c = 1013904223u;
    state = a * state + c;
    return cast<float>(state) / cast<float>(std::numeric_limits<uint>::max());
};

// Cosine-weighted hemisphere sampling
Callable cosine_sample_hemisphere = [](Float2 u) noexcept {
    Float r = sqrt(u.x);
    Float theta = 2.0f * constants::pi * u.y;
    return make_float3(r * cos(theta), r * sin(theta), sqrt(1.0f - u.x));
};

// Build orthonormal basis from normal
Callable make_onb = [](Float3 normal) noexcept {
    Float3 b1 = ite(abs(normal.y) < 0.999f, 
                     normalize(cross(normal, make_float3(0.0f, 1.0f, 0.0f))),
                     make_float3(1.0f, 0.0f, 0.0f));
    Float3 b2 = cross(normal, b1);
    return make_float3x3(b1, b2, normal);
};
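
Both helpers can be checked with plain host code before recording them as Callables. A sketch (hypothetical names, standard C++ only): the LCG output always lands in [0, 1], and a cosine-weighted sample is unit length by construction, since |s|² = r²cos²θ + r²sin²θ + (1 − u.x) = u.x + (1 − u.x) = 1.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Host-side mirror of the LCG Callable: advances the state and maps it to [0, 1].
float lcg_host(std::uint32_t &state) {
    state = 1664525u * state + 1013904223u;
    return static_cast<float>(state) /
           static_cast<float>(std::numeric_limits<std::uint32_t>::max());
}

// Squared length of the hemisphere sample generated from (u1, u2) in [0, 1)^2.
float hemisphere_sample_len_sq(float u1, float u2) {
    const float pi = 3.14159265358979f;
    float r = std::sqrt(u1);
    float theta = 2.0f * pi * u2;
    float x = r * std::cos(theta), y = r * std::sin(theta);
    float z = std::sqrt(1.0f - u1);
    return x * x + y * y + z * z;
}
```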

Step 4: The Path Tracing Kernel

Kernel2D path_tracer = [&](ImageFloat image, ImageUInt seed_image, 
                           AccelVar accel, UInt frame_index) noexcept {
    set_block_size(16u, 16u, 1u);
    
    UInt2 coord = dispatch_id().xy();
    Float2 resolution = make_float2(dispatch_size().xy());
    
    // Initialize random seed
    $if (frame_index == 0u) {
        seed_image.write(coord, make_uint4(coord.x * 1234567u + coord.y));
    };
    
    UInt seed = seed_image.read(coord).x;
    
    // Generate camera ray
    Float2 uv = (make_float2(coord) + make_float2(lcg(seed), lcg(seed))) / resolution;
    Float2 ndc = uv * 2.0f - 1.0f;
    
    Float3 origin = make_float3(0.0f, 0.0f, 3.0f);
    Float3 direction = normalize(make_float3(ndc.x, ndc.y, -1.0f));
    
    // Path tracing loop
    Float3 radiance = def(make_float3(0.0f));
    Float3 throughput = def(make_float3(1.0f));
    
    $for (bounce, 5u) {  // Maximum 5 bounces
        // Trace ray
        Var<Ray> ray = make_ray(origin, direction);
        Var<TriangleHit> hit = accel.intersect(ray, {});
        
        $if (hit->miss()) { $break; };
        
        // Get hit information: look up the triangle's vertex indices first
        // (triangles share vertices, so we must go through the index buffer)
        auto tri = triangle_buffer->read(hit->prim);
        Float3 p0 = vertex_buffer->read(tri.i0);
        Float3 p1 = vertex_buffer->read(tri.i1);
        Float3 p2 = vertex_buffer->read(tri.i2);
        Float3 p = hit->interpolate(p0, p1, p2);
        Float3 n = normalize(cross(p1 - p0, p2 - p0));
        
        // Simple diffuse material (white)
        Float3 albedo = make_float3(0.8f);
        
        // Add emission if we hit the light
        // (light_triangle_start is a host-side constant: index of the first emissive triangle)
        $if (hit->prim >= light_triangle_start) {
            radiance += throughput * make_float3(10.0f);  // Emissive
            $break;
        };
        
        // Sample next direction (cosine-weighted)
        Float2 u = make_float2(lcg(seed), lcg(seed));
        Float3 local_dir = cosine_sample_hemisphere(u);
        Var<Float3x3> onb = make_onb(n);
        Float3 new_dir = onb * local_dir;
        
        // Update throughput. With cosine-weighted sampling the pdf is cos_theta / pi,
        // which exactly cancels the cos_theta and 1/pi factors of the diffuse BRDF,
        // leaving only the albedo:
        throughput *= albedo;
        
        // Russian roulette (survival probability capped at 1)
        Float p_survive = min(max(throughput.x, max(throughput.y, throughput.z)), 1.0f);
        $if (lcg(seed) > p_survive) { $break; };
        throughput *= 1.0f / p_survive;
        
        // Setup next ray
        origin = p + n * 1e-4f;
        direction = new_dir;
    };
    
    // Accumulate over frames
    Float4 prev = image.read(coord);
    Float count = prev.w + 1.0f;
    Float3 avg = lerp(prev.xyz(), radiance, 1.0f / count);
    image.write(coord, make_float4(avg, count));
    
    seed_image.write(coord, make_uint4(seed));
};
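
The last few lines of the kernel implement a running average: avg_k = lerp(avg_{k−1}, x_k, 1/k), which equals the arithmetic mean of all samples seen so far, so the image converges as frames accumulate. A host-side sketch of the same recurrence (illustrative name, plain C++):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Incremental mean, same recurrence as the kernel's per-frame accumulation:
// avg <- lerp(avg, x, 1/count).
float running_average(const std::vector<float> &samples) {
    float avg = 0.0f, count = 0.0f;
    for (float x : samples) {
        count += 1.0f;
        avg = avg + (x - avg) * (1.0f / count);  // lerp(avg, x, 1/count)
    }
    return avg;
}
```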

Step 5: Rendering Loop

// Create images
Image<float> framebuffer = device.create_image<float>(
    PixelStorage::FLOAT4, resolution.x, resolution.y);
Image<uint> seed_image = device.create_image<uint>(
    PixelStorage::INT1, resolution.x, resolution.y);

auto renderer = device.compile(path_tracer);

// Render multiple frames for accumulation
for (uint frame = 0; frame < 100; frame++) {
    stream << renderer(framebuffer, seed_image, accel, frame).dispatch(resolution)
           << synchronize();
}

// Tonemap and save
Kernel2D tonemap = [&](ImageFloat hdr, ImageFloat ldr) noexcept {
    UInt2 coord = dispatch_id().xy();
    Float3 color = hdr.read(coord).xyz();
    // Simple Reinhard tonemapping
    color = color / (color + 1.0f);
    // Gamma correction
    color = pow(color, 1.0f / 2.2f);
    ldr.write(coord, make_float4(color, 1.0f));
};

Image<float> ldr = device.create_image<float>(
    PixelStorage::BYTE4, resolution.x, resolution.y);
stream << device.compile(tonemap)(framebuffer, ldr).dispatch(resolution)
       << synchronize();

// Save...

Key Concepts Explained

  1. Path Tracing: Simulates light transport by tracing random paths from camera

  2. Russian Roulette: Probabilistically terminates paths to avoid infinite recursion

  3. Cosine Sampling: Importance sampling for diffuse surfaces

  4. Accumulation: Multiple samples averaged for noise reduction


MPM Fluid Simulation

Material Point Method (MPM) is a hybrid Lagrangian-Eulerian method for simulating continua. We’ll implement a basic 3D fluid simulation.

Final Result

An interactive 3D fluid simulation with particles.

Step-by-Step Implementation

Step 1: Simulation Parameters

static constexpr int n_grid = 64;           // Grid resolution
static constexpr uint n_particles = 50000;  // Number of particles
static constexpr float dx = 1.0f / n_grid;  // Grid spacing
static constexpr float dt = 8e-5f;          // Time step
static constexpr float p_vol = (dx * 0.5f) * (dx * 0.5f) * (dx * 0.5f);
static constexpr float p_mass = 1.0f * p_vol;
static constexpr float E = 400.0f;          // Young's modulus
static constexpr float gravity = 9.8f;

Step 2: Data Structures

// Particle data (Lagrangian)
Buffer<float3> x = device.create_buffer<float3>(n_particles);  // Positions
Buffer<float3> v = device.create_buffer<float3>(n_particles);  // Velocities
Buffer<float3x3> C = device.create_buffer<float3x3>(n_particles);  // Affine momentum
Buffer<float> J = device.create_buffer<float>(n_particles);    // Volume ratio

// Grid data (Eulerian)
Buffer<float4> grid = device.create_buffer<float4>(n_grid * n_grid * n_grid);
// xyz = momentum, w = mass

Step 3: Grid Clearing Kernel

auto clear_grid = device.compile<3>([&] {
    set_block_size(8, 8, 1);
    UInt idx = dispatch_id().x + dispatch_id().y * n_grid 
               + dispatch_id().z * n_grid * n_grid;
    grid->write(idx, make_float4(0.0f));
});
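
All the grid kernels flatten a 3D coordinate into a 1D buffer index as idx = x + y·n + z·n·n. A host-side sketch of this mapping and its inverse (helper names invented for illustration):

```cpp
#include <cassert>
#include <cstdint>

struct Coord3 { std::uint32_t x, y, z; };

// Flatten a 3D grid coordinate into a linear buffer index.
std::uint32_t flatten(Coord3 c, std::uint32_t n) {
    return c.x + c.y * n + c.z * n * n;
}

// Inverse mapping: recover the 3D coordinate from the linear index.
Coord3 unflatten(std::uint32_t idx, std::uint32_t n) {
    return {idx % n, (idx / n) % n, idx / (n * n)};
}
```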

Step 4: Particle-to-Grid Transfer

This is the heart of MPM - transferring particle data to the grid:

auto particle_to_grid = device.compile<1>([&] {
    set_block_size(64, 1, 1);
    UInt p = dispatch_id().x;
    
    // Particle position in grid coordinates
    Float3 Xp = x->read(p) / dx;
    Int3 base = make_int3(Xp - 0.5f);
    Float3 fx = Xp - make_float3(base);
    
    // Quadratic B-spline weights (applied per axis, so fx is a scalar)
    auto w = [&](Float fx) {
        return make_float3(
            0.5f * (1.5f - fx) * (1.5f - fx),
            0.75f - (fx - 1.0f) * (fx - 1.0f),
            0.5f * (fx - 0.5f) * (fx - 0.5f)
        );
    };
    Float3 wx = w(fx.x), wy = w(fx.y), wz = w(fx.z);
    
    // Stress-based force (simplified for fluids)
    Float stress = -4.0f * dt * E * p_vol * (J->read(p) - 1.0f) / (dx * dx);
    Float3x3 affine = make_float3x3(stress, 0.0f, 0.0f,
                                     0.0f, stress, 0.0f,
                                     0.0f, 0.0f, stress);
    Float3 vp = v->read(p);
    
    // Scatter to 3x3x3 grid nodes
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            for (int k = 0; k < 3; k++) {
                Float3 offset = make_float3(i, j, k);
                Float3 dpos = (offset - fx) * dx;
                Float weight = wx[i] * wy[j] * wz[k];
                
                Float3 v_add = weight * (p_mass * vp + affine * dpos);
                UInt idx = (base.x + i) + (base.y + j) * n_grid 
                          + (base.z + k) * n_grid * n_grid;
                
                // Atomic add to grid
                grid->atomic(idx).x.fetch_add(v_add.x);
                grid->atomic(idx).y.fetch_add(v_add.y);
                grid->atomic(idx).z.fetch_add(v_add.z);
                grid->atomic(idx).w.fetch_add(weight * p_mass);
            }
        }
    }
});
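
The quadratic B-spline weights form a partition of unity: for any fractional position fx in [0.5, 1.5] the three weights sum to exactly 1, which is what makes the P2G scatter conserve mass. A host-side sketch checking this (plain C++, same formulas as the kernel):

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Per-axis quadratic B-spline weights for a fractional position fx in [0.5, 1.5].
std::array<float, 3> bspline_weights(float fx) {
    return {0.5f * (1.5f - fx) * (1.5f - fx),
            0.75f - (fx - 1.0f) * (fx - 1.0f),
            0.5f * (fx - 0.5f) * (fx - 0.5f)};
}
```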

Step 5: Grid Velocity Update

auto grid_update = device.compile<3>([&] {
    set_block_size(8, 8, 1);
    Int3 coord = make_int3(dispatch_id().xyz());
    
    UInt idx = coord.x + coord.y * n_grid + coord.z * n_grid * n_grid;
    Float4 v_and_m = grid->read(idx);
    
    Float3 v = v_and_m.xyz();
    Float m = v_and_m.w;
    
    // Normalize by mass
    v = ite(m > 0.0f, v / m, v);
    
    // Gravity
    v.y -= dt * gravity;
    
    // Boundary conditions: zero out velocity components pushing into a wall
    // (componentwise vector comparisons, so use & and | rather than && and ||)
    static constexpr int bound = 3;
    Bool3 into_lower = (coord < bound) & (v < 0.0f);
    Bool3 into_upper = (coord > n_grid - bound) & (v > 0.0f);
    v = ite(into_lower | into_upper, make_float3(0.0f), v);
    
    grid->write(idx, make_float4(v, m));
});

Step 6: Grid-to-Particle Transfer

auto grid_to_particle = device.compile<1>([&] {
    set_block_size(64, 1, 1);
    UInt p = dispatch_id().x;
    
    Float3 Xp = x->read(p) / dx;
    Int3 base = make_int3(Xp - 0.5f);
    Float3 fx = Xp - make_float3(base);
    
    // Same weights as P2G
    // ... weight computation ...
    
    Float3 new_v = def(make_float3(0.0f));
    Float3x3 new_C = def(make_float3x3(0.0f));
    
    // Gather from grid
    for (int i = 0; i < 3; i++) {
        for (int j = 0; j < 3; j++) {
            for (int k = 0; k < 3; k++) {
                Float3 offset = make_float3(i, j, k);
                Float3 dpos = (offset - fx) * dx;
                Float weight = wx[i] * wy[j] * wz[k];
                
                UInt idx = (base.x + i) + (base.y + j) * n_grid 
                          + (base.z + k) * n_grid * n_grid;
                Float3 g_v = grid->read(idx).xyz();
                
                new_v += weight * g_v;
                new_C += 4.0f * weight * outer_product(g_v, dpos) / (dx * dx);
            }
        }
    }
    
    // Update particle state
    v->write(p, new_v);
    x->write(p, x->read(p) + new_v * dt);
    J->write(p, J->read(p) * (1.0f + dt * trace(new_C)));
    C->write(p, new_C);
});

Step 7: Simulation Loop

// Initialize particles
luisa::vector<float3> x_init(n_particles);
std::random_device rd;
std::mt19937 gen(rd());
std::uniform_real_distribution<float> dis(0.2f, 0.8f);

for (uint i = 0; i < n_particles; i++) {
    x_init[i] = make_float3(dis(gen), dis(gen), dis(gen));
}

stream << x.copy_from(x_init.data())
       << v.copy_from(luisa::vector<float3>(n_particles, make_float3(0.0f)).data())
       << J.copy_from(luisa::vector<float>(n_particles, 1.0f).data())
       << synchronize();

// Simulation loop
Window window{"MPM Fluid", 1024, 1024};
Swapchain swapchain = device.create_swapchain(stream, {
    .display = window.native_display(),
    .window = window.native_handle(),
    .size = make_uint2(1024u, 1024u),
});
Image<float> display = device.create_image<float>(
    swapchain.backend_storage(), make_uint2(1024u, 1024u));

while (!window.should_close()) {
    CommandList cmd;
    
    // MPM sub-steps
    for (int i = 0; i < 25; i++) {
        cmd << clear_grid().dispatch(n_grid, n_grid, n_grid)
            << particle_to_grid().dispatch(n_particles)
            << grid_update().dispatch(n_grid, n_grid, n_grid)
            << grid_to_particle().dispatch(n_particles);
    }
    
    // Render particles to display
    cmd << clear_display().dispatch(1024, 1024)
        << draw_particles().dispatch(n_particles)
        << swapchain.present(display);
    
    stream << cmd.commit();
    window.poll_events();
}

Key Concepts Explained

  1. Lagrangian vs Eulerian: Particles track material; grid handles physics

  2. APIC (Affine Particle-in-Cell): Uses affine momentum matrix for better accuracy

  3. Transfer: P2G (particle to grid) and G2P (grid to particle) are the core operations

  4. MLS-MPM: Moving Least Squares MPM for stability


Conway’s Game of Life

Cellular automata are fascinating discrete systems. Let’s implement Conway’s Game of Life, a classic example that demonstrates ping-pong buffering - a technique where we alternate between two buffers to avoid read-write conflicts.

Final Result

An interactive simulation where cells evolve based on simple rules, creating complex emergent patterns.

Step-by-Step Implementation

Step 1: Understanding the Rules

Conway’s Game of Life follows three simple rules:

  1. Survival: Live cells with 2-3 neighbors survive

  2. Birth: Dead cells with exactly 3 neighbors become alive

  3. Death: All other cells die or stay dead
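
The three rules collapse into a single boolean expression, which is exactly what the update kernel computes: a cell is alive next step iff it has exactly three neighbors, or it is alive now and has exactly two. A plain C++ sketch (hypothetical helper name):

```cpp
#include <cassert>

// Conway's rules as one expression: birth on 3 neighbors,
// survival on 2 or 3, death otherwise.
bool next_state(bool alive, int neighbors) {
    return neighbors == 3 || (alive && neighbors == 2);
}
```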

Step 2: The ImagePair Structure

We’ll use two images that swap each frame:

struct ImagePair {
    Image<uint> prev;
    Image<uint> curr;
    ImagePair(Device &device, PixelStorage storage, uint width, uint height) noexcept
        : prev{device.create_image<uint>(storage, width, height)},
          curr{device.create_image<uint>(storage, width, height)} {}
    void swap() noexcept { std::swap(prev, curr); }
};

Step 3: The Update Kernel

// Helper to read cell state
Callable read_state = [](ImageUInt prev, UInt2 uv) noexcept {
    return prev.read(uv).x == 255u;
};

Kernel2D kernel = [&](ImageUInt prev, ImageUInt curr) noexcept {
    set_block_size(16, 16, 1);
    UInt count = def(0u);
    UInt2 uv = dispatch_id().xy();
    UInt2 size = dispatch_size().xy();
    Bool state = read_state(prev, uv);
    Int2 p = make_int2(uv);
    
    // Check all 8 neighbors with toroidal wrapping
    for (int dy = -1; dy <= 1; dy++) {
        for (int dx = -1; dx <= 1; dx++) {
            if (dx != 0 || dy != 0) {
                Int2 q = p + make_int2(dx, dy) + make_int2(size);
                Bool neighbor = read_state(prev, make_uint2(q) % size);
                count += ite(neighbor, 1u, 0u);
            }
        }
    }
    
    // Apply Conway's rules
    Bool c0 = count == 2u;
    Bool c1 = count == 3u;
    curr.write(uv, make_uint4(make_uint3(ite((state & c0) | c1, 255u, 0u)), 255u));
};

Step 4: Display Kernel

We upscale the low-res simulation for better visibility:

Kernel2D display_kernel = [&](ImageUInt in_tex, ImageFloat out_tex) noexcept {
    set_block_size(16, 16, 1);
    UInt2 uv = dispatch_id().xy();
    UInt2 coord = uv / 4u;  // 4x upscaling
    UInt4 value = in_tex.read(coord);
    out_tex.write(uv, make_float4(value) / 255.0f);
};

Step 5: Initialization and Main Loop

// Initialize with random state (~25% alive)
std::mt19937 rng{std::random_device{}()};
luisa::vector<uint> host_image(width * height);
for (auto &v : host_image) {
    auto x = (rng() % 4u == 0u) * 255u;
    v = x * 0x00010101u | 0xff000000u;  // White (alive) or black (dead), opaque alpha
}
stream << image_pair.prev.copy_from(host_image.data()) << synchronize();

// Main loop
while (!window.should_close()) {
    stream << shader(image_pair.prev, image_pair.curr).dispatch(width, height)
           << display_shader(image_pair.curr, display).dispatch(width * 4u, height * 4u)
           << swap_chain.present(display);
    image_pair.swap();  // Ping-pong: swap buffers
    std::this_thread::sleep_for(std::chrono::milliseconds{30});
    window.poll_events();
}

Key Concepts

  1. Ping-Pong Buffering: We read from prev and write to curr, then swap

  2. Toroidal Boundary: % size makes the grid wrap around (edges connect)

  3. Neighborhood: The 8 surrounding cells determine each cell’s fate
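
The toroidal wrap from concept 2 works because the kernel adds `size` before taking the modulo, keeping the index non-negative for offsets of −1 so that edge cells see neighbors on the opposite edge. A host-side sketch of the same arithmetic:

```cpp
#include <cassert>

// Toroidal neighbor index: offset is -1, 0, or +1; adding size first
// keeps the value non-negative before the modulo.
int wrap(int p, int offset, int size) {
    return (p + offset + size) % size;
}
```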

Exercises

  1. Different Rules: Try HighLife (born with 3 or 6 neighbors)

  2. Pattern Library: Implement a pattern loader for gliders, blinkers, etc.

  3. Multi-State: Add fading effects for recently-dead cells


Wave Equation Simulation

Simulate realistic water ripples using the finite difference method to solve the 2D wave equation. This demonstrates PDE solving and interactive GPU computing with beautiful visual results.

Final Result

An interactive water simulation where you can click and drag to create waves that propagate realistically, interfere with each other, and gradually dampen. The rendering includes caustics effects and foam at wave crests.

The Physics

The wave equation: ∂²u/∂t² = c²(∂²u/∂x² + ∂²u/∂y²)

Discretized form: u_next = 2*u_curr - u_prev + c²*dt²*laplacian(u)

For stability, the CFL condition requires: c² · dt² ≤ 0.5 (with grid spacing dx = 1)
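
The discretized update can be exercised on the host in 1D before writing the 2D kernel. A minimal sketch (plain C++, damping omitted, dx = 1; names invented here). One useful sanity check: a flat surface has zero Laplacian everywhere, so it stays flat.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// One step of the 1D wave equation with clamped boundaries:
// u_next = 2*u_curr - u_prev + c^2*dt^2 * (u[i-1] + u[i+1] - 2*u[i]).
template<std::size_t N>
std::array<float, N> wave_step_1d(const std::array<float, N> &prev,
                                  const std::array<float, N> &curr,
                                  float c, float dt) {
    std::array<float, N> next{};
    for (std::size_t i = 0; i < N; i++) {
        float left = curr[i > 0 ? i - 1 : 0];          // clamped boundary
        float right = curr[i + 1 < N ? i + 1 : N - 1]; // clamped boundary
        float lap = left + right - 2.0f * curr[i];
        next[i] = 2.0f * curr[i] - prev[i] + c * c * dt * dt * lap;
    }
    return next;
}
```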

Step-by-Step Implementation

Step 1: Setup Three Buffers

We need three height buffers for time integration (past, present, future):

Image<float> height_prev = device.create_image<float>(PixelStorage::FLOAT1, width, height);
Image<float> height_curr = device.create_image<float>(PixelStorage::FLOAT1, width, height);
Image<float> height_next = device.create_image<float>(PixelStorage::FLOAT1, width, height);

Step 2: The Wave Solver Kernel

Kernel2D wave_step = [&](ImageFloat prev, ImageFloat curr, ImageFloat next) noexcept {
    set_block_size(16, 16, 1);
    Var uv = dispatch_id().xy();
    Var size = dispatch_size().xy();

    Var u_curr = curr.read(uv).x;

    // Sample neighbors with clamping (not wrapping)
    auto sample = [&](Int2 offset) noexcept {
        Int2 p = make_int2(uv) + offset;
        p.x = clamp(p.x, 0, cast<int>(size.x) - 1);
        p.y = clamp(p.y, 0, cast<int>(size.y) - 1);
        return curr.read(make_uint2(p)).x;
    };

    Var u_left = sample(make_int2(-1, 0));
    Var u_right = sample(make_int2(1, 0));
    Var u_up = sample(make_int2(0, -1));
    Var u_down = sample(make_int2(0, 1));

    // Laplacian: sum of neighbors - 4*center
    Var laplacian = u_left + u_right + u_up + u_down - 4.0f * u_curr;

    // Read previous height
    Var u_prev = prev.read(uv).x;

    // Wave equation integration
    Var u_next = 2.0f * u_curr - u_prev + (c * c * dt * dt) * laplacian;
    
    // Apply damping for energy loss
    u_next *= damping;
    u_next = clamp(u_next, -2.0f, 2.0f);

    next.write(uv, make_float4(u_next, 0.0f, 0.0f, 1.0f));
};

Step 3: Interactive Droplet Drop

// Setup mouse callbacks
window.set_mouse_callback([&mouse_down, &mouse_pos](MouseButton, Action a, float2 p) noexcept {
    if (a == Action::ACTION_PRESSED) {
        mouse_down = true;
    } else if (a == Action::ACTION_RELEASED) {
        mouse_down = false;
    }
    mouse_pos = p;
});

// Droplet kernel - creates Gaussian ripple
Kernel2D drop_droplet = [](ImageFloat height, UInt2 center, Float strength) noexcept {
    set_block_size(16, 16, 1);
    Var uv = dispatch_id().xy();

    Var dx = cast<float>(cast<int>(uv.x) - cast<int>(center.x));
    Var dy = cast<float>(cast<int>(uv.y) - cast<int>(center.y));
    Var dist_sq = dx * dx + dy * dy;
    
    Var radius = 15.0f;
    Var gaussian = exp(-dist_sq / (radius * radius * 0.5f));

    Var current = height.read(uv).x;
    height.write(uv, make_float4(current + strength * gaussian, 0.0f, 0.0f, 1.0f));
};

Step 4: Rendering with Caustics Effect

Kernel2D render_kernel = [](ImageFloat height, ImageFloat output, Float time) noexcept {
    set_block_size(16, 16, 1);
    Var uv = dispatch_id().xy();
    Var size = dispatch_size().xy();

    Var h = height.read(uv).x;

    // Clamped neighbor sampling (same helper as in the wave solver kernel)
    auto sample = [&](Int2 offset) noexcept {
        Int2 p = make_int2(uv) + offset;
        p.x = clamp(p.x, 0, cast<int>(size.x) - 1);
        p.y = clamp(p.y, 0, cast<int>(size.y) - 1);
        return height.read(make_uint2(p)).x;
    };

    // Sample neighbors for slope (caustics effect)
    Var h_left = sample(make_int2(-2, 0));
    Var h_right = sample(make_int2(2, 0));
    Var h_up = sample(make_int2(0, -2));
    Var h_down = sample(make_int2(0, 2));

    // Slope magnitude creates caustic highlights
    Var slope_x = abs(h_right - h_left);
    Var slope_y = abs(h_down - h_up);
    Var slope = sqrt(slope_x * slope_x + slope_y * slope_y);

    // Water color palette
    Var deep_color = make_float3(0.0f, 0.1f, 0.3f);
    Var shallow_color = make_float3(0.0f, 0.4f, 0.6f);
    Var foam_color = make_float3(0.9f, 0.95f, 1.0f);
    Var highlight_color = make_float3(0.6f, 0.8f, 1.0f);

    // Base color based on wave height
    Var t = clamp(h * 0.5f + 0.5f, 0.0f, 1.0f);
    Var base_color = lerp(deep_color, shallow_color, t);

    // Add caustics based on slope
    Var caustics = smoothstep(0.1f, 0.5f, slope);
    base_color = lerp(base_color, highlight_color, caustics * 0.5f);

    // Foam at wave peaks
    Var foam = smoothstep(0.5f, 1.0f, h);
    base_color = lerp(base_color, foam_color, foam * 0.7f);

    output.write(uv, make_float4(base_color, 1.0f));
};

Step 5: Main Simulation Loop

while (!window.should_close()) {
    // Handle mouse interaction
    if (mouse_down) {
        uint sim_x = std::clamp(static_cast<uint>(mouse_pos.x / scale), 0u, width - 1u);
        uint sim_y = std::clamp(static_cast<uint>(mouse_pos.y / scale), 0u, height - 1u);
        
        // Interpolate for smooth trails when dragging
        stream << drop_shader(height_curr, make_uint2(sim_x, sim_y), -0.5f)
                      .dispatch(width, height);
    }

    // Update wave simulation (multiple steps per frame)
    for (int step = 0; step < 4; step++) {
        stream << wave_shader(height_prev, height_curr, height_next).dispatch(width, height);
        std::swap(height_prev, height_curr);
        std::swap(height_curr, height_next);
    }

    // Render
    stream << render(height_curr, display, time).dispatch(width, height)
           << swap_chain.present(display);
}

Key Concepts

  1. Finite Differences: Approximate spatial derivatives using neighbor samples

  2. Time Integration: Three-buffer scheme for second-order time integration (Verlet-style)

  3. Stability: CFL condition requires careful parameter tuning

  4. Caustics: Surface slope creates realistic water shimmer

Tips for Better Results

  1. Parameters: Try c = 0.2f, dt = 1.0f, damping = 0.995f for nice propagation

  2. Multiple Steps: Run 4 simulation steps per frame for faster wave propagation

  3. Smooth Trails: Interpolate droplet positions when dragging mouse

  4. Clamped Boundaries: Edges act as walls (wave reflection) rather than wrapping

Exercises

  1. Variable Depth: Make wave speed vary by location for refraction effects

  2. Obstacles: Add static obstacles that block/reflect waves

  3. Rain: Add periodic random droplets for ambient activity

  4. Underwater: Add objects beneath the surface with distortion


Image Processing Pipeline

Build a multi-stage image processing pipeline demonstrating separable filters, convolution, and multi-pass rendering.

Final Result

A real-time image processing demo showing Gaussian blur, Sobel edge detection, and creative compositing.

Step-by-Step Implementation

Step 1: Generate Test Pattern

Kernel2D generate_pattern = [](ImageFloat image) noexcept {
    set_block_size(16, 16, 1);
    Var uv = make_float2(dispatch_id().xy()) / make_float2(dispatch_size().xy());

    // Checkerboard
    Var checker = sin(uv.x * 20.0f) * sin(uv.y * 20.0f);
    
    // Concentric circles
    Var center = uv - make_float2(0.5f);
    Var radius = length(center);
    Var circles = sin(radius * 50.0f);
    
    // Combine patterns
    Var pattern = checker * 0.3f + circles * 0.4f;
    
    // Add "face-like" features
    Var eye1 = exp(-length(uv - make_float2(0.35f, 0.4f)) * 50.0f);
    Var eye2 = exp(-length(uv - make_float2(0.65f, 0.4f)) * 50.0f);
    Var mouth = exp(-pow(uv.y - 0.6f - 0.1f * pow(uv.x - 0.5f, 2.0f), 2.0f) * 200.0f);

    Var final_val = pattern * 0.5f + 0.5f + eye1 * 0.3f + eye2 * 0.3f + mouth * 0.2f;
    image.write(dispatch_id().xy(), make_float4(make_float3(final_val), 1.0f));
};

Step 2: Separable Gaussian Blur

Instead of a 2D convolution (81 samples), we do two 1D passes (18 samples total):

// Horizontal pass
Kernel2D gaussian_blur_h = [](ImageFloat input, ImageFloat output) noexcept {
    set_block_size(16, 16, 1);
    Var uv = dispatch_id().xy();
    Var size = dispatch_size().xy();

    // Gaussian weights (sigma = 2.0), 9-tap 1D kernel
    constexpr float weights[9] = {0.006f, 0.028f, 0.084f, 0.168f, 0.224f,
                                  0.168f, 0.084f, 0.028f, 0.006f};
    constexpr int offsets[9] = {-4, -3, -2, -1, 0, 1, 2, 3, 4};

    Var sum = make_float3(0.0f);
    for (uint i = 0u; i < 9u; i++) {  // host loop, unrolled while recording the kernel
        Var sample_uv = make_uint2(
            clamp(cast<int>(uv.x) + offsets[i], 0, cast<int>(size.x) - 1),
            uv.y);
        sum += input.read(sample_uv).xyz() * weights[i];
    }
    output.write(uv, make_float4(sum, 1.0f));
};

// Vertical pass (similar structure)
Kernel2D gaussian_blur_v = [](ImageFloat input, ImageFloat output) noexcept { ... };
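
The reason the two-pass version matches the full 2D convolution is separability: a Gaussian kernel factors as K[i][j] = k[i]·k[j], so filtering rows first and then columns gives the same result as a single 2D pass. A small host-side check with the integer 3-tap kernel [1, 2, 1] (illustrative, smaller than the 9-tap kernel above, and exact in integer arithmetic):

```cpp
#include <array>
#include <cassert>

// Full 2D convolution at the center pixel of a 3x3 patch: 9 taps.
int conv2d_center(const std::array<std::array<int, 3>, 3> &img) {
    static constexpr int k[3] = {1, 2, 1};
    int sum = 0;
    for (int i = 0; i < 3; i++)
        for (int j = 0; j < 3; j++)
            sum += k[i] * k[j] * img[i][j];
    return sum;
}

// Same result via two 1D passes: 3 taps per row, then 3 taps vertically.
int conv_separable_center(const std::array<std::array<int, 3>, 3> &img) {
    static constexpr int k[3] = {1, 2, 1};
    int rows[3] = {};
    for (int i = 0; i < 3; i++)                          // horizontal pass
        for (int j = 0; j < 3; j++) rows[i] += k[j] * img[i][j];
    int sum = 0;
    for (int i = 0; i < 3; i++) sum += k[i] * rows[i];   // vertical pass
    return sum;
}
```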

Step 3: Sobel Edge Detection

Kernel2D sobel_edge = [](ImageFloat input, ImageFloat output) noexcept {
    set_block_size(16, 16, 1);
    Var uv = dispatch_id().xy();
    Var size = dispatch_size().xy();

    auto sample = [&](Int2 offset) noexcept {
        Var sample_uv = make_uint2(
            clamp(cast<int>(uv.x) + offset.x, 0, cast<int>(size.x) - 1),
            clamp(cast<int>(uv.y) + offset.y, 0, cast<int>(size.y) - 1));
        return input.read(sample_uv).x;  // Luminance
    };

    // Sobel X kernel: [-1 0 1; -2 0 2; -1 0 1]
    Var gx = sample(make_int2(-1, -1)) * -1.0f +
             sample(make_int2(1, -1)) * 1.0f +
             sample(make_int2(-1, 0)) * -2.0f +
             sample(make_int2(1, 0)) * 2.0f +
             sample(make_int2(-1, 1)) * -1.0f +
             sample(make_int2(1, 1)) * 1.0f;

    // Sobel Y kernel: [-1 -2 -1; 0 0 0; 1 2 1]
    Var gy = sample(make_int2(-1, -1)) * -1.0f +
             sample(make_int2(0, -1)) * -2.0f +
             sample(make_int2(1, -1)) * -1.0f +
             sample(make_int2(-1, 1)) * 1.0f +
             sample(make_int2(0, 1)) * 2.0f +
             sample(make_int2(1, 1)) * 1.0f;

    // Gradient magnitude and direction
    Var magnitude = sqrt(gx * gx + gy * gy);
    Var angle = atan2(gy, gx) / 3.14159f * 0.5f + 0.5f;
    
    output.write(uv, make_float4(magnitude, angle, 0.0f, 1.0f));
};
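
The stencil can be sanity-checked on the CPU: on a horizontal luminance ramp, Sobel X reports a constant gradient (8 × the per-pixel slope) and Sobel Y reports zero. A plain C++ sketch, ignoring borders:

```cpp
#include <cassert>
#include <cmath>

// Sobel gradients at (x, y) on a w-wide single-channel image (no border handling).
void sobel(const float *img, int w, int x, int y, float &gx, float &gy) {
    auto s = [&](int dx, int dy) { return img[(y + dy) * w + (x + dx)]; };
    // X kernel: [-1 0 1; -2 0 2; -1 0 1]
    gx = -s(-1, -1) + s(1, -1) - 2.0f * s(-1, 0) + 2.0f * s(1, 0) - s(-1, 1) + s(1, 1);
    // Y kernel: [-1 -2 -1; 0 0 0; 1 2 1]
    gy = -s(-1, -1) - 2.0f * s(0, -1) - s(1, -1) + s(-1, 1) + 2.0f * s(0, 1) + s(1, 1);
}
```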

Step 4: Creative Compositing

Kernel2D composite = [](ImageFloat original, ImageFloat blurred,
                       ImageFloat edges, ImageFloat output, Float edge_intensity) noexcept {
    set_block_size(16, 16, 1);
    Var uv = dispatch_id().xy();

    Var orig = original.read(uv);
    Var blur = blurred.read(uv);
    Var edge = edges.read(uv);

    // Unsharp mask sharpening
    Var sharpen_amount = 1.5f;
    Var sharpened = orig.xyz() + (orig.xyz() - blur.xyz()) * sharpen_amount;
    sharpened = clamp(sharpened, 0.0f, 1.0f);

    // Edge overlay with cyan color
    Var edge_color = make_float3(0.0f, 0.8f, 1.0f) * edge.x * edge_intensity;
    Var final_color = lerp(sharpened, edge_color, edge.x * 0.5f);

    // Vignette effect
    Var coord = make_float2(uv) / make_float2(dispatch_size().xy());
    Var center_dist = length(coord - make_float2(0.5f));
    Var vignette = 1.0f - center_dist * 0.5f;

    output.write(uv, make_float4(final_color * vignette, 1.0f));
};

Step 5: Pipeline Execution

// Create intermediate images
Image<float> source_image = device.create_image<float>(PixelStorage::FLOAT4, width, height);
Image<float> temp_image = device.create_image<float>(PixelStorage::FLOAT4, width, height);
Image<float> blurred_image = device.create_image<float>(PixelStorage::FLOAT4, width, height);
Image<float> edge_image = device.create_image<float>(PixelStorage::FLOAT4, width, height);

// Generate source
stream << pattern_shader(source_image).dispatch(width, height);

// Run pipeline once
stream << blur_h_shader(source_image, temp_image).dispatch(width, height)
       << blur_v_shader(temp_image, blurred_image).dispatch(width, height)
       << sobel_shader(source_image, edge_image).dispatch(width, height);

// Animate in main loop
while (!window.should_close()) {
    float edge_intensity = (sinf(time) + 1.0f) * 0.5f;  // Pulsing effect
    stream << composite_shader(source_image, blurred_image, edge_image,
                               display, edge_intensity).dispatch(width, height)
           << swap_chain.present(display);
}

Key Concepts

  1. Separable Filters: Decompose 2D convolution into two 1D passes for efficiency

  2. Multi-Pass Rendering: Chain multiple kernels for complex effects

  3. Unsharp Mask: Sharpening using original + (original - blurred) * amount
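
Concept 3 in one line of arithmetic: the blur removes high frequencies, so adding back the difference exaggerates them. A tiny scalar sketch:

```cpp
#include <cassert>

// Unsharp mask on one sample: original + (original - blurred) * amount.
float unsharp(float orig, float blurred, float amount) {
    return orig + (orig - blurred) * amount;
}
```

Flat regions (where original equals blurred) are untouched; edges overshoot in both directions, which is why the kernel clamps the result to [0, 1].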

Exercises

  1. Bloom Effect: Add a bright-pass filter before blur

  2. Custom Kernels: Implement emboss or motion blur filters

  3. Live Input: Use camera input instead of procedural pattern


N-Body Gravitational Simulation

Simulate thousands of stars interacting through gravity. This demonstrates particle systems, tile-based optimization, and 3D rendering on the GPU.

Final Result

A rotating galaxy simulation with thousands of gravitationally interacting particles.

Step-by-Step Implementation

Step 1: Particle Structure

struct Particle {
    float3 position;
    float3 velocity;
    float mass;
    float pad[3];  // Alignment padding
};

LUISA_STRUCT(Particle, position, velocity, mass, pad) {};

Step 2: Galaxy Initialization

// Initialize in a disk-like galaxy configuration
luisa::vector<Particle> host_particles(n_particles);
for (uint i = 0u; i < n_particles; i++) {
    float radius = dist_radius(rng);
    float angle = dist_angle(rng);
    float height = (rng() / float(UINT32_MAX) - 0.5f) * 0.1f;

    // Position in disk
    float3 pos{
        radius * cosf(angle),
        height,
        radius * sinf(angle)};

    // Tangential velocity for orbital motion
    float orbital_speed = sqrtf(G * 1000.0f / radius);
    float3 vel{
        -orbital_speed * sinf(angle),
        0.0f,
        orbital_speed * cosf(angle)};

    host_particles[i] = Particle{
        .position = pos,
        .velocity = vel,
        .mass = dist_mass(rng),
        .pad = {0.0f, 0.0f, 0.0f}};
}
stream << particles_read.copy_from(host_particles.data()) << synchronize();
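
The tangential speed used above is the circular-orbit condition: the gravitational pull G·M/r² must equal the centripetal acceleration v²/r, so v = sqrt(G·M/r). A quick host-side check (the values are arbitrary):

```cpp
#include <cassert>
#include <cmath>

// Circular-orbit speed around a central mass M at radius r.
float orbital_speed(float G, float M, float r) {
    return std::sqrt(G * M / r);
}
```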

Step 3: N-Body Physics Kernel

Kernel1D nbody_kernel = [&](BufferVar<Particle> read_buf, 
                            BufferVar<Particle> write_buf) noexcept {
    set_block_size(tile_size);
    Var idx = dispatch_id().x;
    Var p = read_buf.read(idx);

    // Accumulate gravitational forces
    Var force = make_float3(0.0f);

    // Tile loop - process particles in chunks
    for (uint tile = 0u; tile < n_particles; tile += tile_size) {
        for (uint j = 0u; j < tile_size; j++) {
            Var other_idx = tile + j;
            
            // Skip self-interaction (the softening term already bounds the denominator)
            if_(other_idx != idx, [&] {
                Var other = read_buf.read(other_idx);
                Var r = other.position - p.position;
                Var dist_sq = dot(r, r) + softening * softening;
                Var dist = sqrt(dist_sq);
                Var f = G * p.mass * other.mass / dist_sq;
                force += f * r / dist;  // F = G*m1*m2/r² * direction
            });
        }
    }

    // Euler integration
    Var new_vel = p.velocity + force / p.mass * dt;
    Var new_pos = p.position + new_vel * dt;

    write_buf.write(idx, Particle{
        .position = new_pos,
        .velocity = new_vel,
        .mass = p.mass,
        .pad = {0.0f, 0.0f, 0.0f}});
};
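
The same all-pairs accumulation can be run on the CPU to check a physical invariant: pairwise forces are equal and opposite, so they sum to zero and total momentum is conserved. A minimal sketch (the softening value is illustrative):

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct P3 { float x, y, z; };

// Accumulate softened gravitational forces for every particle (O(n^2)).
std::vector<P3> all_pairs_forces(const std::vector<P3> &pos,
                                 const std::vector<float> &mass,
                                 float G, float softening) {
    std::vector<P3> force(pos.size(), P3{0, 0, 0});
    for (size_t i = 0; i < pos.size(); i++) {
        for (size_t j = 0; j < pos.size(); j++) {
            if (i == j) continue;
            float rx = pos[j].x - pos[i].x;
            float ry = pos[j].y - pos[i].y;
            float rz = pos[j].z - pos[i].z;
            float d2 = rx * rx + ry * ry + rz * rz + softening * softening;
            float inv_d = 1.0f / std::sqrt(d2);
            float f = G * mass[i] * mass[j] / d2;  // magnitude: G*m1*m2/r^2
            force[i].x += f * rx * inv_d;          // times the unit direction
            force[i].y += f * ry * inv_d;
            force[i].z += f * rz * inv_d;
        }
    }
    return force;
}
```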

Step 4: 3D Rendering Kernel

Kernel2D render_kernel = [&](BufferVar<Particle> particles, 
                             ImageFloat image, Float rot_x, Float rot_y) noexcept {
    set_block_size(16, 16, 1);
    Var uv = dispatch_id().xy();
    Var size = dispatch_size().xy();

    // Clear background
    image.write(uv, make_float4(0.02f, 0.02f, 0.05f, 1.0f));

    Var color = make_float3(0.0f);

    // Project particles (only every 64th, to keep the per-pixel loop cheap)
    for (uint i = 0u; i < n_particles; i += 64u) {
        Var p = particles.read(i);

        // Rotation around Y axis
        Var cos_y = cos(rot_y);
        Var sin_y = sin(rot_y);
        Var x1 = p.position.x * cos_y - p.position.z * sin_y;
        Var z1 = p.position.x * sin_y + p.position.z * cos_y;
        Var y1 = p.position.y;

        // Rotation around X axis
        Var cos_x = cos(rot_x);
        Var sin_x = sin(rot_x);
        Var y2 = y1 * cos_x - z1 * sin_x;
        Var z2 = y1 * sin_x + z1 * cos_x;

        // Perspective projection
        Var distance = 5.0f + z2;
        Var scale = 2.0f / distance;
        Var screen_x = (x1 * scale + 1.0f) * 0.5f * cast<float>(size.x);
        Var screen_y = (y2 * scale + 1.0f) * 0.5f * cast<float>(size.y);

        // Gaussian glow
        Var dx = cast<float>(uv.x) - screen_x;
        Var dy = cast<float>(uv.y) - screen_y;
        Var d_sq = dx * dx + dy * dy;
        Var intensity = exp(-d_sq / (50.0f * scale));

        // Color based on particle index
        Var particle_color = make_float3(
            0.8f + 0.2f * sin(cast<float>(i)),
            0.6f + 0.3f * cos(cast<float>(i) * 1.3f),
            1.0f);

        color += particle_color * intensity * 0.1f;
    }

    // Motion blur effect
    Var old = image.read(uv).xyz();
    Var final_color = lerp(old, min(color, 1.0f), 0.3f);
    image.write(uv, make_float4(final_color, 1.0f));
};

Step 5: Main Loop

while (!window.should_close()) {
    // Auto-rotate
    rot_y += 0.002f;

    // Physics update (double buffering)
    stream << nbody_shader(particles_read, particles_write).dispatch(n_particles);
    std::swap(particles_read, particles_write);

    // Render every few frames
    if (frame % 3u == 0u) {
        stream << render_shader(particles_read, display, rot_x, rot_y)
                      .dispatch(width, height)
               << swap_chain.present(display);
    }
    frame++;
}

Key Concepts

  1. Softening: Add small epsilon to prevent division by zero at close range

  2. Tile-Based Processing: Process in chunks for cache efficiency

  3. Double Buffering: Read from one buffer, write to another, then swap

  4. Gaussian Splatting: Render particles as soft glows rather than points

Exercises

  1. Barnes-Hut: Implement O(n log n) hierarchical approximation

  2. Collision Detection: Add particle collisions

  3. Different Forces: Try Coulomb (electrostatic) or Lennard-Jones (molecular)


Fire Particle System

Create a mesmerizing fire simulation using 65,000 GPU particles with physics-based motion, procedural turbulence, and temperature-driven color gradients. Watch as flames dance and flicker with realistic fluid-like behavior.

Final Result

A captivating campfire-like effect with:

  • White-hot cores at the base where particles are born

  • Yellow-orange flames rising and swirling upward

  • Red embers cooling as they float higher

  • Procedural turbulence creating realistic flickering motion

  • Interactive wind (press SPACE) that bends the flames sideways

The particles cycle through their lifecycle endlessly—born hot and bright, rising with buoyancy, cooling as they age, fading to smoke, then respawning to keep the fire alive.

Key Concepts

Particle Lifecycle Management: Each particle tracks its own state:

struct FireParticle {
    float3 position;    // Position in 3D space
    float lifetime;     // Seconds remaining before respawn
    float3 velocity;    // Current movement direction
    float temperature;  // 1.0 = white hot, 0.0 = cold
    float size;         // Particle radius for rendering
};

This structure enables individual particle behavior while the GPU processes thousands in parallel.

Procedural Turbulence: We use a simple 3D noise function to create realistic turbulent motion:

Callable noise3d = [](Float3 p) noexcept {
    Var i = floor(p);
    Var f = fract(p);
    f = f * f * (3.0f - 2.0f * f);
    // ... trilinear interpolation
};
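
The elided interpolation can be filled in with hash-based value noise; a plain C++ equivalent of what the Callable would compute (the hash constants are common shader folklore, not part of LuisaCompute):

```cpp
#include <cassert>
#include <cmath>

// Hash a lattice point to a pseudo-random value in [0, 1).
float hash3(float x, float y, float z) {
    float h = std::sin(x * 127.1f + y * 311.7f + z * 74.7f) * 43758.5453f;
    return h - std::floor(h);
}

// Value noise: smooth trilinear blend of hashed lattice corners, in [0, 1).
float noise3d(float x, float y, float z) {
    float ix = std::floor(x), iy = std::floor(y), iz = std::floor(z);
    float fx = x - ix, fy = y - iy, fz = z - iz;
    // Smoothstep fade: f * f * (3 - 2f)
    fx = fx * fx * (3.0f - 2.0f * fx);
    fy = fy * fy * (3.0f - 2.0f * fy);
    fz = fz * fz * (3.0f - 2.0f * fz);
    auto lerp = [](float a, float b, float t) { return a + (b - a) * t; };
    float c000 = hash3(ix, iy, iz),         c100 = hash3(ix + 1, iy, iz);
    float c010 = hash3(ix, iy + 1, iz),     c110 = hash3(ix + 1, iy + 1, iz);
    float c001 = hash3(ix, iy, iz + 1),     c101 = hash3(ix + 1, iy, iz + 1);
    float c011 = hash3(ix, iy + 1, iz + 1), c111 = hash3(ix + 1, iy + 1, iz + 1);
    return lerp(lerp(lerp(c000, c100, fx), lerp(c010, c110, fx), fy),
                lerp(lerp(c001, c101, fx), lerp(c011, c111, fx), fy), fz);
}
```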

Temperature-Based Coloring: The color gradient follows real fire physics:

  • White: Hottest (1.0)

  • Yellow: Hot (0.75-1.0)

  • Orange: Warm (0.5-0.75)

  • Red: Cool (0.25-0.5)

  • Black: Cold (< 0.25)
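
That ramp can be sketched as a piecewise-linear gradient. Below is a hypothetical `get_fire_color` in plain C++; the breakpoints follow the list above, but the exact RGB stops are a design choice, not taken from the tutorial's source:

```cpp
#include <cassert>
#include <algorithm>

struct RGB { float r, g, b; };

// Piecewise-linear fire ramp: black -> red -> orange -> yellow -> white.
RGB fire_color(float t) {
    t = std::clamp(t, 0.0f, 1.0f);
    auto mix = [](RGB a, RGB b, float u) {
        return RGB{a.r + (b.r - a.r) * u, a.g + (b.g - a.g) * u, a.b + (b.b - a.b) * u};
    };
    if (t < 0.25f) return mix(RGB{0, 0, 0}, RGB{1.0f, 0.1f, 0.0f}, t / 0.25f);
    if (t < 0.5f)  return mix(RGB{1.0f, 0.1f, 0.0f}, RGB{1.0f, 0.5f, 0.0f}, (t - 0.25f) / 0.25f);
    if (t < 0.75f) return mix(RGB{1.0f, 0.5f, 0.0f}, RGB{1.0f, 1.0f, 0.2f}, (t - 0.5f) / 0.25f);
    return mix(RGB{1.0f, 1.0f, 0.2f}, RGB{1, 1, 1}, (t - 0.75f) / 0.25f);
}
```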

Step-by-Step Implementation

Step 1: Particle Update Kernel

Kernel1D update_kernel = [&](BufferVar<FireParticle> particles, 
                             Float time, Float wind) noexcept {
    set_block_size(256);
    Var idx = dispatch_id().x;
    Var p = particles.read(idx);

    $if (p.lifetime > 0.0f) {
        // Apply gravity (negative = buoyancy)
        p.velocity.y += gravity * dt;
        
        // Add turbulence
        Var noise_pos = p.position * 2.0f + time;
        p.velocity.x += (noise3d(noise_pos) - 0.5f) * 2.0f * dt;
        
        // Wind effect
        p.velocity.x += wind * dt;
        
        // Update position
        p.position += p.velocity * dt;
        
        // Cool down
        p.temperature -= dt * 0.3f;
        p.lifetime -= dt;
    }
    $else {
        // Respawn with random properties
        Var seed = idx + cast<uint>(time * 1000.0f);
        // ... spawn new particle
    };

    particles.write(idx, p);
};

Step 2: Rendering with Additive Blending

Kernel2D render_kernel = [&](BufferVar<FireParticle> particles, 
                             ImageFloat image) noexcept {
    // For each pixel, accumulate contributions from all particles
    Var color = make_float3(0.0f);
    
    for (uint i = 0u; i < n_particles; i += 256u) {
        Var p = particles.read(i);
        
        // Project to screen space
        Var screen_pos = project(p.position);
        Var dist = length(pixel_pos - screen_pos);
        
        // Gaussian intensity falloff
        Var intensity = exp(-dist_sq / (p.size * size_scale));
        
        // Temperature-based color
        Var fire_color = get_fire_color(p.temperature);
        color += fire_color * intensity;
    }
    
    image.write(uv, make_float4(color, 1.0f));
};

Exercises

  1. Smoke: Add a smoke trail that rises from cooling particles

  2. Sparks: Add smaller, faster particles that shoot upward occasionally

  3. Interactive: Make the fire react to mouse position as a wind source


Reaction-Diffusion Simulation

Watch mathematical magic unfold as beautiful organic patterns emerge from simple chemical rules. The Gray-Scott model simulates how two chemicals interact to create mesmerizing textures resembling coral reefs, zebra stripes, fingerprints, and more.

What You’ll See

Starting from a tiny seed in the center, watch as intricate patterns grow and evolve across the screen:

  • Coral Growth: Branching, tree-like structures spreading outward

  • Fingerprint Patterns: Meandering, maze-like lines

  • Leopard Spots: Isolated dots that grow and stabilize

  • Stripes: Parallel waves that propagate across the field

Press keys 1-4 to switch between pattern types in real-time and observe how the same equations produce wildly different results!

The Gray-Scott Model

Two virtual chemicals dance on the grid:

  • U (substrate, shown as blue): The “food” that feeds the reaction and is consumed by it

  • V (activator, shown as red/yellow): The “predator” that consumes U and spreads

The mathematical rules that create life-like patterns:

du/dt = Du * ∇²u - u*v² + F*(1-u)   // U diffuses and gets consumed
dv/dt = Dv * ∇²v + u*v² - (F+k)*v   // V diffuses, grows, and dies

Where:

  • Du, Dv: How fast each chemical spreads (U spreads faster than V)

  • F (Feed): Rate that fresh U is added

  • k (Kill): Rate that V is removed

  • u*v²: The reaction where V consumes U
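
One useful consequence of these equations: the state u = 1, v = 0 (all food, no activator) is a fixed point, which is why the simulation must be seeded with a patch of V. A scalar sketch of the reaction terms (diffusion omitted, i.e. ∇² = 0):

```cpp
#include <cassert>

// One explicit Euler step of the Gray-Scott reaction terms (no diffusion).
void gray_scott_step(float &u, float &v, float F, float k, float dt) {
    float reaction = u * v * v;          // V consumes U
    float du = -reaction + F * (1.0f - u);
    float dv = reaction - (F + k) * v;
    u += dt * du;
    v += dt * dv;
}
```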

Pattern Types

By adjusting feed rate (F) and kill rate (k), we get different patterns:

Pattern      | F     | k     | Description
------------ | ----- | ----- | ---------------------------
Coral        | 0.035 | 0.06  | Branching coral-like growth
Fingerprint  | 0.037 | 0.06  | Meandering lines
Spots        | 0.03  | 0.062 | Isolated spots
Stripes      | 0.03  | 0.054 | Parallel stripes

Step-by-Step Implementation

Step 1: Chemical State Storage

// Two buffers for ping-pong rendering
Image<float> u_prev, u_curr;  // Inhibitor
Image<float> v_prev, v_curr;  // Activator

Step 2: Reaction-Diffusion Kernel

Kernel2D rd_step = [&](ImageFloat u_in, ImageFloat v_in,
                       ImageFloat u_out, ImageFloat v_out,
                       Float F, Float k, Float Du, Float Dv) noexcept {
    set_block_size(16, 16, 1);
    Var uv = dispatch_id().xy();

    // Sample current state
    Var u = u_in.read(uv).x;
    Var v = v_in.read(uv).x;

    // Compute Laplacians (5-point stencil): sum of neighbors minus 4x center
    Var u_lap = sample(u_in, uv, {-1, 0}) + sample(u_in, uv, {1, 0}) +
                sample(u_in, uv, {0, -1}) + sample(u_in, uv, {0, 1}) - 4.0f * u;
    Var v_lap = sample(v_in, uv, {-1, 0}) + sample(v_in, uv, {1, 0}) +
                sample(v_in, uv, {0, -1}) + sample(v_in, uv, {0, 1}) - 4.0f * v;

    // Reaction term
    Var reaction = u * v * v;

    // Update
    Var u_new = u + dt * (Du * u_lap - reaction + F * (1.0f - u));
    Var v_new = v + dt * (Dv * v_lap + reaction - (F + k) * v);

    u_out.write(uv, make_float4(u_new, 0, 0, 1));
    v_out.write(uv, make_float4(v_new, 0, 0, 1));
};

Step 3: Visualization

Kernel2D visualize = [&](ImageFloat u, ImageFloat v, ImageFloat output) {
    Var uv = dispatch_id().xy();
    Var u_val = u.read(uv).x;
    Var v_val = v.read(uv).x;
    
    // U = blue, V = red/yellow
    Var color = make_float3(v_val * 1.5f,  // Red
                           v_val * u_val * 0.5f,  // Green
                           u_val * 0.8f);  // Blue
    
    output.write(uv, make_float4(color, 1.0f));
};

Key Concepts

  1. Reaction: V consumes U when they meet (u*v² term)

  2. Diffusion: U spreads faster than V (Du > Dv)

  3. Feed/Kill: Constant injection of U, removal of V

  4. Instability: Small perturbations grow into patterns

Exercises

  1. Multiscale: Use different F/k values in different regions

  2. 3D Extension: Extend to 3D and render with volume ray marching

  3. Interactive Brush: Paint V values with mouse to guide pattern growth


Voxel Ray Tracer

Explore a procedurally generated 3D voxel world in real-time! This example renders a Minecraft-style landscape with terrain, trees, and floating islands using ray marching through a voxel grid. Use arrow keys to rotate the view and W/S to zoom.

What You’ll See

A charming voxel scene featuring:

  • Rolling Hills: Procedurally generated terrain with sine-wave elevation

  • Green Grass Tops: Different block types for grass, dirt, and stone

  • Trees: Scattered throughout with wood trunks and leafy canopies

  • Floating Magic Orb: A mysterious purple sphere hovering above the landscape

  • Interactive Camera: Rotate with arrow keys, zoom with W/S

How Ray Voxel Traversal Works

Unlike traditional polygon rendering, we trace rays through a 3D grid:

  1. Ray-Box Intersection: Calculate where the camera ray enters the voxel grid bounding box

  2. DDA Traversal: Use the Digital Differential Analyzer algorithm to step through voxels in order along the ray, similar to Bresenham’s line algorithm in 3D

  3. Hit Detection: At each voxel, check if it’s solid. If yes, render its color. If no, continue stepping.

This technique, used in games like Minecraft and Teardown, allows rendering millions of voxels efficiently without storing explicit geometry.

Step-by-Step Implementation

Step 1: Voxel Grid Storage

static constexpr uint grid_size = 64;
Buffer<uint> voxel_grid = device.create_buffer<uint>(
    grid_size * grid_size * grid_size);

// Voxel types: 0=empty, 1=dirt, 2=grass, 3=stone, etc.

Step 2: Ray-Box Intersection

Callable intersect_box = [](Float3 ray_origin, Float3 ray_dir,
                            Float3 box_min, Float3 box_max) {
    // Slab method for AABB intersection
    Var tmin = (box_min - ray_origin) / ray_dir;
    Var tmax = (box_max - ray_origin) / ray_dir;
    // ... swap if necessary
    return make_float2(entry_t, exit_t);
};

Step 3: DDA Traversal

Callable trace_ray = [](Float3 origin, Float3 dir, BufferVar<uint> voxels) {
    // Find entry point
    Var box_hit = intersect_box(origin, dir, grid_min, grid_max);
    $if (box_hit.x < 0.0f) { return miss_color; };

    // Initialize DDA
    Var t = box_hit.x;
    Var vx = floor(origin.x + dir.x * t);
    // ... calculate step direction and initial distances

    // Traverse
    $for (step, max_steps) {
        Var voxel = read_voxel(voxels, vx, vy, vz);
        $if (voxel > 0u) {
            // Hit! Return voxel color
            return get_voxel_color(voxel);
        };
        
        // Step to next voxel
        $if (next_tx < next_ty & next_tx < next_tz) {
            vx += step_x;
            next_tx += delta_tx;
        }
        // ... similar for y, z
    };
    
    return sky_color;  // Missed everything
};
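
The elided step setup follows the classic grid-traversal scheme. Here is a self-contained CPU version (plain C++, not DSL code; names like `dda_trace` are illustrative) showing the full bookkeeping:

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Step a ray through an n*n*n voxel grid; return the first solid voxel's
// linear index, or -1 on a miss. The ray origin is assumed inside the grid.
int dda_trace(float ox, float oy, float oz, float dx, float dy, float dz,
              const std::vector<unsigned char> &grid, int n) {
    const float inf = std::numeric_limits<float>::infinity();
    int vx = int(std::floor(ox)), vy = int(std::floor(oy)), vz = int(std::floor(oz));
    int sx = dx > 0 ? 1 : -1, sy = dy > 0 ? 1 : -1, sz = dz > 0 ? 1 : -1;
    // Ray distance between successive voxel boundaries on each axis
    float tdx = dx != 0 ? std::abs(1.0f / dx) : inf;
    float tdy = dy != 0 ? std::abs(1.0f / dy) : inf;
    float tdz = dz != 0 ? std::abs(1.0f / dz) : inf;
    // Ray distance to the first boundary crossing on each axis
    float tmx = dx != 0 ? (dx > 0 ? vx + 1 - ox : ox - vx) * tdx : inf;
    float tmy = dy != 0 ? (dy > 0 ? vy + 1 - oy : oy - vy) * tdy : inf;
    float tmz = dz != 0 ? (dz > 0 ? vz + 1 - oz : oz - vz) * tdz : inf;
    while (vx >= 0 && vx < n && vy >= 0 && vy < n && vz >= 0 && vz < n) {
        int idx = vx + vy * n + vz * n * n;
        if (grid[idx] != 0) return idx;  // hit a solid voxel
        // Advance along whichever axis crosses its next boundary first
        if (tmx < tmy && tmx < tmz) { vx += sx; tmx += tdx; }
        else if (tmy < tmz)         { vy += sy; tmy += tdy; }
        else                        { vz += sz; tmz += tdz; }
    }
    return -1;  // left the grid without a hit
}
```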

Step 4: Camera and Rendering

Kernel2D render = [&](ImageFloat image, Float3 cam_pos, 
                      Float2 cam_rot, BufferVar<uint> voxels) {
    Var uv = dispatch_id().xy();
    Var ndc = (make_float2(uv) / resolution) * 2.0f - 1.0f;
    
    // Generate ray direction from camera
    Var ray_dir = normalize(make_float3(
        ndc.x * aspect * fov,
        -ndc.y * fov,
        -1.0f
    ));
    
    // Apply camera rotation
    ray_dir = rotate_y(ray_dir, cam_rot.x);
    ray_dir = rotate_x(ray_dir, cam_rot.y);
    
    // Trace and write color
    Var color = trace_ray(cam_pos, ray_dir, voxels);
    image.write(uv, make_float4(color, 1.0f));
};

Exercises

  1. Lighting: Add directional lighting with voxel self-shadowing

  2. Materials: Add emission for some voxel types (glowing blocks)

  3. Destruction: Add ability to remove voxels with mouse clicks

  4. LOD: Implement level-of-detail for distant chunks


Black Hole Renderer

Experience the mind-bending visual effects of Einstein’s theory of general relativity! This example renders a Schwarzschild black hole with realistic gravitational lensing, an accretion disk, and relativistic Doppler effects—just like in the movie Interstellar.

What You’ll See

  • The Event Horizon: A perfectly black sphere where light cannot escape

  • Gravitational Lensing: Stars behind the black hole bend around it, creating a distorted “Einstein ring”

  • The Photon Sphere: A thin bright ring just outside the event horizon where light orbits the black hole

  • Accretion Disk: A swirling disk of superheated matter orbiting the black hole

  • Doppler Beaming: The side of the disk rotating toward you appears brighter (blueshifted), while the receding side appears dimmer (redshifted)

  • Left Mouse Drag: Orbit around the black hole

  • Right Mouse Drag: Roll/rotate the view (rotate around viewing direction)

  • Scroll Wheel or +/-: Zoom in/out

Watch how the gravitational lensing distorts the background starfield!

The Physics

Gravitational Lensing: Massive objects curve spacetime, so light follows bent paths. In this approximation, each ray segment is deflected by the acceleration:

acceleration = -1.5 * GM / r³ * position

(The 1.5 factor comes from the Schwarzschild null-geodesic equation; it reproduces general relativity's light deflection, which is twice the Newtonian prediction.)

Photon Sphere: At r = 1.5 × Rs (Schwarzschild radius), light can orbit the black hole in unstable circular paths.

Accretion Disk Physics:

  • Temperature: Hotter near the center (approximately T ∝ r^(-3/4) for a thin disk)

  • Doppler Effect: Approaching matter appears brighter/bluer

  • Gravitational Redshift: Light loses energy climbing out of the gravity well

Step-by-Step Implementation

Step 1: Ray Setup with Camera Transformation

// Camera position in orbit around black hole
Var cam_pos = make_float3(
    distance * sin(rot_y) * cos(rot_x),
    distance * sin(rot_x),  
    distance * cos(rot_y) * cos(rot_x)
);

// Ray direction through each pixel
Var ray_dir = normalize(forward + ndc.x * right * fov + ndc.y * up * fov);

Step 2: Gravitational Ray Marching

$for (step, max_steps) {
    Var r = length(pos);
    
    // Hit event horizon?
    $if (r < bh_radius) { $break; };
    
    // Gravitational bending (general relativity)
    Var accel = -1.5f * bh_mass / (r * r * r) * pos;
    dir = normalize(dir + accel * dt);
    
    // Step forward
    pos = pos + dir * dt;
};
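
The same loop in plain C++ demonstrates the deflection: a photon passing above the mass picks up a downward velocity component while staying at unit speed. The mass, step size, and step count here are illustrative:

```cpp
#include <cassert>
#include <cmath>

struct V3 { float x, y, z; };

// March a photon past a point mass at the origin using the pseudo-GR
// acceleration a = -1.5 * M / r^3 * pos, renormalizing the direction so
// the "light speed" stays constant.
V3 march_photon(V3 pos, V3 dir, float M, float dt, int steps) {
    for (int i = 0; i < steps; i++) {
        float r = std::sqrt(pos.x * pos.x + pos.y * pos.y + pos.z * pos.z);
        float a = -1.5f * M / (r * r * r);
        dir.x += a * pos.x * dt;
        dir.y += a * pos.y * dt;
        dir.z += a * pos.z * dt;
        float len = std::sqrt(dir.x * dir.x + dir.y * dir.y + dir.z * dir.z);
        dir.x /= len; dir.y /= len; dir.z /= len;  // keep the direction unit-length
        pos.x += dir.x * dt; pos.y += dir.y * dt; pos.z += dir.z * dt;
    }
    return dir;
}
```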

Step 3: Accretion Disk Rendering

The accretion disk is a thin sheet in the XZ plane. To render it correctly with proper depth ordering, we detect when the ray crosses the plane and determine whether we’re hitting the front (visible) or back (hidden behind black hole) side:

// Track previous position to detect plane crossing
Var prev_y = cam_pos.y;

$for (step, max_steps) {
    // ... ray marching loop ...
    
    // Check if we crossed the disk plane (XZ plane at y=0)
    Var crossed_plane = (prev_y > 0.0f & pos.y <= 0.0f) | 
                        (prev_y < 0.0f & pos.y >= 0.0f);
    
    $if (crossed_plane & r > inner_radius & r < outer_radius) {
        // Interpolate to find exact intersection point
        Var t = abs(prev_y) / (abs(prev_y) + abs(pos.y));
        Var intersect_pos = prev_pos * (1.0f - t) + pos * t;
        Var intersect_r = length(intersect_pos);
        
        // Calculate disk color (temperature, Doppler beaming, etc.)
        // ... color calculation code ...
        
        // Determine if this is front or back side
        // Front side: ray moving toward the plane from camera
        Var moving_toward_plane = (cam_pos.y > 0.0f & pos.y < prev_y) | 
                                  (cam_pos.y < 0.0f & pos.y > prev_y);
        
        $if (moving_toward_plane) {
            // Front side - use this sample
            disk_color = local_disk_color;
            hit_disk = true;
        }
        $else {
            // Back side - only use if front wasn't hit
            // (handles edge case of camera inside disk radius)
            $if (!hit_disk) { disk_color = local_disk_color; };
        };
    };
    
    prev_y = pos.y;  // Update for next iteration
    // ... continue ray marching ...
};

Step 4: Background Starfield

// Procedural stars for lensing demonstration
auto get_star_color = [&](Float3 rd) noexcept {
    Var star_noise = fract(sin(rd.x * 123.45f + ...) * 43758.5453f);
    Var star_brightness = ite(star_noise > 0.995f, star_noise, 0.0f);
    return make_float3(star_brightness);
};

Key Concepts

  1. Geodesic Ray Tracing: Light follows curved paths in curved spacetime

  2. Relativistic Doppler Effect: Motion toward/away affects brightness and color

  3. Gravitational Redshift: Light loses energy escaping gravity

  4. Photon Sphere: Where light can orbit the black hole

  5. Depth-Ordered Rendering: The accretion disk must be rendered with proper front/back ordering so the back side (behind the black hole) doesn’t incorrectly show through the event horizon

Exercises

  1. Spinning Black Hole: Add frame-dragging effects (Kerr metric) for a rotating black hole

  2. Multiple Black Holes: Simulate a binary black hole system

  3. Time Dilation: Visualize clock slowing near the event horizon

  4. Gravitational Waves: Add ripples to the accretion disk from merging black holes


Summary

Congratulations! You’ve explored eleven fascinating GPU computing tutorials that demonstrate the breadth of what’s possible with LuisaCompute:

Visual Effects & Rendering

  1. Mandelbrot: Mathematical beauty through iterative complex numbers

  2. Path Tracer: Physically-based global illumination with ray tracing

  3. Voxel Ray Tracer: Real-time ray marching through 3D voxel grids

  4. Black Hole: Gravitational lensing and relativistic ray tracing (Interstellar-style)

Physics Simulations

  1. MPM: Material Point Method for fluid/solid simulation

  2. Wave Equation: Interactive water ripples with caustics rendering

  3. N-Body: Gravitational galaxy simulation with 4,000+ particles

Pattern Formation & Cellular Automata

  1. Game of Life: Classic cellular automata with emergent behavior

  2. Reaction-Diffusion: Gray-Scott chemical patterns mimicking nature

Image Processing & Particles

  1. Image Processing: Multi-pass Gaussian blur and edge detection

  2. Fire Particles: 65,000 temperature-driven animated particles

Interactive Features Summary

Tutorial           | Interaction
------------------ | ------------------------------------------------------
Wave Equation      | Click & drag to create ripples
N-Body             | Mouse drag to rotate, scroll/+/- to zoom
Game of Life       | Watch evolution, R to reset
Fire               | SPACE to toggle wind
Reaction-Diffusion | 1-4 to switch patterns, R to reset
Voxel Ray Tracer   | Arrow keys to rotate, W/S to zoom
Black Hole         | Left drag = orbit, right drag = roll, scroll/+/- = zoom

What You’ve Learned

  1. Mandelbrot: Basic kernel structure, loops, and image output

  2. Path Tracer: Ray tracing, acceleration structures, sampling

  3. MPM: Complex simulation with multiple kernels, atomic operations

  4. Game of Life: Cellular automata, ping-pong buffering

  5. Wave Equation: PDE solving, interactive simulation

  6. Image Processing: Multi-pass pipelines, convolution

  7. N-Body: Particle systems, tile-based optimization

  8. Fire Particles: Particle systems with lifecycle management and procedural noise

  9. Reaction-Diffusion: Coupled PDEs and pattern formation

  10. Voxel Ray Tracer: 3D grid traversal and ray-box intersection

  11. Black Hole: Gravitational lensing, geodesic ray tracing, relativistic effects

Performance Tips

  1. Use appropriate block sizes: 16×16 for 2D kernels, 256 for 1D

  2. Minimize memory transfers: Keep data on GPU when possible

  3. Batch operations: Use CommandList for multiple dispatches

  4. Profile with tools: Use Nsight, RenderDoc, or vendor tools

Next Steps