Wave Lang Compiler Stack

Understanding the technology behind Wave Lang's high-performance GPU code generation. From Python DSL to optimized kernels.

Compilation Pipeline

Wave Lang transforms your Python code through multiple optimization stages to generate efficient GPU kernels.

🐍

Python DSL

Write high-level tensor operations using familiar Python syntax with type hints and decorators.
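
To make this concrete, here is a minimal sketch of a Wave Lang kernel. The import paths, symbol helpers, and call signatures below are assumptions drawn from the project's published examples and may differ between releases, so treat the exact names as a sketch rather than reference documentation.

```python
# Minimal sketch of the Wave Lang DSL. Import paths and signatures are assumed
# from the project's published examples and may differ between releases.
import wave_lang.kernel.lang as tkl
import wave_lang.kernel.wave as tkw

# Symbolic dimensions and tile sizes; concrete values are bound at compile time.
M, N = tkl.sym.M, tkl.sym.N
BLOCK_M, BLOCK_N = tkl.sym.BLOCK_M, tkl.sym.BLOCK_N
ADDRESS_SPACE = tkl.sym.ADDRESS_SPACE

# Scheduling decisions live in constraint objects, not in the kernel body.
constraints = [
    tkw.WorkgroupConstraint(M, BLOCK_M, 0),  # tile M across workgroup dim 0
    tkw.WorkgroupConstraint(N, BLOCK_N, 1),  # tile N across workgroup dim 1
    tkw.WaveConstraint(M, BLOCK_M),          # one wave covers a full M tile
    tkw.HardwareConstraint(threads_per_wave=64, vector_shapes={M: 1, N: 1}),
]

@tkw.wave(constraints)
def add(
    a: tkl.Memory[M, N, ADDRESS_SPACE, tkl.f16],
    b: tkl.Memory[M, N, ADDRESS_SPACE, tkl.f16],
    c: tkl.Memory[M, N, ADDRESS_SPACE, tkl.f16],
):
    # Pure element-wise math: no thread indices, no shared-memory bookkeeping.
    a_reg = tkw.read(a)
    b_reg = tkw.read(b)
    tkw.write(a_reg + b_reg, c)
```

The decorated function is the whole kernel: the body is the math, and the constraints list is the schedule. The rest of this page unpacks that split.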

🔍

Analysis

Parse and analyze the computation graph, inferring shapes and data flow and identifying optimization opportunities.

⚙️

IR Generation

Lower the computation to IREE's MLIR-based intermediate representation and apply hardware-agnostic optimizations.

GPU Kernel

Generate optimized machine code that executes efficiently on your target GPU architecture.

Architecture Overview

The Wave Lang compiler stack, from top to bottom:

• Wave Lang Python DSL: high-level tensor operations with Python syntax
• Kernel Logic Separation: pure computational logic separated from scheduling concerns
• Constraint-Based Scheduling: declarative constraints define tiling, memory layout, and parallelization
• IREE Compiler Infrastructure: MLIR-based optimization passes and code generation
• Target Backends: CUDA, ROCm, Vulkan, and CPU code generation
• Runtime System: efficient kernel execution and memory management

The Wave Lang Philosophy: Separation of Concerns

What sets Wave Lang apart is how it separates kernel logic from scheduling and tiling decisions, making GPU programming both simpler and more fun to experiment with.

🧠

Pure Kernel Logic

Write clean, mathematical expressions that focus purely on the computation you want to perform. No need to think about thread blocks, shared memory, or memory coalescing patterns.
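
As an illustration, here is roughly what the pure logic of a tiled matrix multiply looks like, sketched after the project's published GEMM example and reusing the imports and symbols from the sketch above. The names (tkw.reduction, tkw.mma, tkl.Register) are assumptions that may vary between releases, and the gemm_constraints schedule it refers to appears under the next card.

```python
# Pure GEMM logic, sketched after the project's published GEMM example.
# Reuses the imports and symbols from the first sketch; all names are
# assumptions. `gemm_constraints` is the schedule shown in the next sketch.
K = tkl.sym.K  # the reduction dimension

@tkw.wave(gemm_constraints)
def gemm(
    a: tkl.Memory[M, K, ADDRESS_SPACE, tkl.f16],
    b: tkl.Memory[N, K, ADDRESS_SPACE, tkl.f16],
    c: tkl.Memory[M, N, ADDRESS_SPACE, tkl.f32],
):
    # Accumulator tile held in registers, initialized to zero.
    c_reg = tkl.Register[M, N, tkl.f32](0.0)

    # Walk the K dimension; the tile size per step comes from the constraints,
    # not from anything written here.
    @tkw.reduction(K, init_args=[c_reg])
    def repeat(acc: tkl.Register[M, N, tkl.f32]) -> tkl.Register[M, N, tkl.f32]:
        a_reg = tkw.read(a)
        b_reg = tkw.read(b)
        return tkw.mma(a_reg, b_reg, acc)  # matrix-multiply-accumulate

    tkw.write(repeat, c)
```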

⚙️

Declarative Constraints

Specify how you want the computation scheduled through simple constraint objects. Control tiling, the memory hierarchy, and parallelization without mixing them into your algorithm.
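
Continuing the GEMM sketch, the schedule is just a list of constraint objects. The constructor arguments below are assumptions based on the project's published examples.

```python
# Declarative schedule for the GEMM sketch above. Constructor arguments are
# assumptions based on the project's published examples.
BLOCK_K = tkl.sym.BLOCK_K

gemm_constraints = [
    # Distribute M and N tiles across the two workgroup grid dimensions.
    tkw.WorkgroupConstraint(M, BLOCK_M, 0),
    tkw.WorkgroupConstraint(N, BLOCK_N, 1),
    # Step through K in BLOCK_K-sized tiles inside the kernel's reduction.
    tkw.TilingConstraint(K, BLOCK_K),
    # Split each workgroup tile across waves (a 2 x 2 wave arrangement here).
    tkw.WaveConstraint(M, BLOCK_M / 2),
    tkw.WaveConstraint(N, BLOCK_N / 2),
    # Bind to hardware: wavefront width and the matrix-core instruction shape.
    tkw.HardwareConstraint(
        threads_per_wave=64,
        mma_type=tkw.MMAType.F32_16x16x16_F16,
    ),
]
```

Nothing in the kernel body above had to change to express any of this.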

🎨

Easy Experimentation

Try different scheduling strategies by simply changing constraint parameters. There is no need to rewrite your kernel logic: the same mathematical code works with any scheduling approach.
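
In practice that experimentation can be as small as rebinding the tile-size symbols. The dictionaries below are an illustrative sketch: in Wave Lang they would be supplied to the compiler as hyperparameter substitutions, and the exact compile entry point varies by release, so it is omitted here.

```python
# Illustrative sketch: two schedules for the same GEMM logic, differing only in
# the values bound to the tile-size symbols at compile time.
schedule_a = {BLOCK_M: 64, BLOCK_N: 64, BLOCK_K: 32}     # smaller tiles
schedule_b = {BLOCK_M: 128, BLOCK_N: 128, BLOCK_K: 32}   # wider tiles, more data reuse
```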

🔄

Portable Performance

The same kernel logic can be optimized for different hardware by adjusting constraints. Move from development to production, or AMD to NVIDIA, with just constraint changes.
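
As one concrete example, AMD CDNA GPUs execute 64-lane wavefronts while NVIDIA GPUs use 32-thread warps; in the sketches above that difference is a single hardware-constraint parameter. The parameter name is assumed from the project's examples, and which backends are supported is a separate question from how the constraint is written.

```python
# Illustrative only: the vendor difference in execution width is one parameter.
hw_amd_cdna = tkw.HardwareConstraint(threads_per_wave=64)  # CDNA wavefront: 64 lanes
hw_nvidia = tkw.HardwareConstraint(threads_per_wave=32)    # warp: 32 threads
```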

Traditional vs Wave Lang Approach

❌ Traditional CUDA/HIP C++

Kernel logic mixed with:
• Thread indexing calculations
• Shared memory management
• Coalescing optimizations
• Block size considerations
• Hardware-specific tuning

✅ Wave Lang

Kernel: Pure math expressions
Constraints: Scheduling decisions
• WorkgroupConstraint
• TilingConstraint
• HardwareConstraint
• WaveConstraint

Built on Proven Technology

Wave Lang leverages industry-leading compiler infrastructure for maximum performance and reliability.

• MLIR: multi-level IR framework
• IREE: runtime and compiler
• PyTorch: tensor interface
• LLVM: code generation
• ROCm: AMD GPU support

How It Works

🧠

Symbolic Computation

Wave Lang uses symbolic variables to represent tensor dimensions, enabling compile-time optimization and automatic kernel specialization based on actual input shapes.
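
Conceptually, this works the way symbolic algebra does. The snippet below uses plain sympy rather than Wave Lang's internal classes, purely to illustrate how index arithmetic can be analyzed symbolically and then specialized once concrete values are bound.

```python
# Conceptual illustration with plain sympy (not Wave Lang internals).
import sympy as sp

M, BLOCK_M, wg, tid = sp.symbols("M BLOCK_M wg tid", integer=True, nonnegative=True)

# The row a given thread touches, expressed symbolically:
row = wg * BLOCK_M + tid

# Specializing for one schedule folds the tile size into a constant:
specialized = row.subs({BLOCK_M: 64})
print(specialized)  # tid + 64*wg

# With tid < BLOCK_M and wg < M / BLOCK_M, row < M always holds, so a compiler
# reasoning over these symbols can drop the bounds check for exact tilings.
```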

🔄

Graph Optimization

The compiler analyzes the entire computation graph to identify fusion opportunities, eliminate redundant operations, and optimize memory access patterns.
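
To see what fusion buys, here is a vendor-neutral illustration in plain Python and NumPy (not Wave Lang internals): the unfused version materializes an intermediate array after every operation, while a fused kernel computes the whole expression in a single pass over each element.

```python
# Conceptual sketch of operator fusion (plain Python/NumPy, not Wave internals).
import numpy as np

x = np.random.rand(8).astype(np.float32)

# Unfused dataflow: scale -> add -> relu, each pass writing an intermediate.
t1 = x * 2.0
t2 = t1 + 1.0
y_unfused = np.maximum(t2, 0.0)

# What a fused kernel computes per element, in one pass and with no temporaries:
y_fused = np.array([max(v * 2.0 + 1.0, 0.0) for v in x], dtype=np.float32)

assert np.allclose(y_unfused, y_fused)
```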

🎪

Automatic Tiling

Wave Lang automatically determines optimal tile sizes for different operations based on hardware characteristics and memory constraints of the target GPU.
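
The core arithmetic behind those decisions is easy to see. The helper below is hypothetical (not part of Wave Lang) and just checks the budget every tiling choice has to respect: the operand tiles staged for a workgroup must fit in its shared memory (LDS on AMD GPUs, commonly 64 KiB).

```python
# Hypothetical helper, not part of Wave Lang: does a GEMM tile fit in shared memory?
def tile_fits_in_lds(block_m, block_n, block_k, bytes_per_elem=2, lds_bytes=64 * 1024):
    a_tile = block_m * block_k * bytes_per_elem  # staged tile of A (fp16 by default)
    b_tile = block_n * block_k * bytes_per_elem  # staged tile of B
    return a_tile + b_tile <= lds_bytes

print(tile_fits_in_lds(128, 128, 32))   # True:  2 * 128*32 * 2 B = 16 KiB
print(tile_fits_in_lds(256, 256, 128))  # False: 2 * 256*128 * 2 B = 128 KiB
```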

⚖️

Load Balancing

Smart work distribution ensures all GPU compute units are utilized efficiently, minimizing idle time and maximizing throughput.

Want to learn more?

Dive deeper into Wave Lang's compiler architecture and optimization techniques.

Read Documentation | View Source Code