The most painful wheel to compile, now prebuilt and ready to install. Get both flash-attn 2.7.3 and 2.8.3, guaranteed to work out of the box on A100 and L4 with Colab's current stack: Python 3.12, CUDA 12.8, PyTorch 2.10.
flash-attn 2.8.3: the latest stable release. Full Flash Attention 2 with improved kernel dispatch, better memory efficiency, and support for GQA (grouped-query) and MQA (multi-query) attention patterns. Ideal for newer models like Z-Image, SDXL, and Flux.
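If GQA and MQA are new to you, the idea is that several query heads share one key/value head. Here is a minimal sketch of that head mapping in plain Python, assuming the contiguous grouping flash-attn documents for `flash_attn_func` (with 6 query heads and 2 KV heads, query heads 0 to 2 use KV head 0 and heads 3 to 5 use KV head 1). The helper names `kv_head_for_query_head` and `head_mapping` are made up for illustration and are not part of flash-attn.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Return the index of the key/value head that a given query head reads from.

    Hypothetical helper for illustration only; not a flash-attn API.
    """
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("number of query heads must be divisible by number of KV heads")
    if not 0 <= q_head < n_q_heads:
        raise IndexError("query head index out of range")
    # Each KV head serves one contiguous group of query heads.
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


def head_mapping(n_q_heads, n_kv_heads):
    """List the KV head used by every query head, in order."""
    return [kv_head_for_query_head(h, n_q_heads, n_kv_heads) for h in range(n_q_heads)]


print(head_mapping(6, 2))  # GQA: two groups of three query heads
print(head_mapping(4, 1))  # MQA: every query head shares the single KV head
print(head_mapping(4, 4))  # standard multi-head attention: one KV head per query head
```

With one KV head this is MQA, with as many KV heads as query heads it is ordinary multi-head attention, and anything in between is GQA.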
flash-attn 2.7.3: the widely used stable release. Compatible with the broadest range of models and diffusers versions. Perfect for TRELLIS.2, Stable Diffusion, and workflows that pin to 2.7.x.
| Spec | Guaranteed |
|---|---|
| GPU | A100, L4 |
| Platform | Google Colab (Linux x86_64) |
| Python | 3.12 |
| CUDA | 12.8 |
| PyTorch | 2.10 |
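Before installing, it can help to confirm that your runtime matches the table above. The following is a rough sketch, not an official checker: `SPEC`, `check_environment`, and `describe_current_environment` are hypothetical names, the spec values are copied from the table, and the GPU match assumes the device name contains `A100` or `L4` as a separate token (for example `NVIDIA A100-SXM4-40GB` or `NVIDIA L4`).

```python
import platform
import re
import sys

# Values copied from the compatibility table; update them if the table changes.
SPEC = {
    "python": (3, 12),
    "torch": "2.10",
    "cuda": "12.8",
    "gpus": ("A100", "L4"),
    "machine": "x86_64",
}


def check_environment(python_version, torch_version, cuda_version, gpu_name, machine):
    """Return a list of mismatches against SPEC; an empty list means all checks pass."""
    problems = []
    if tuple(python_version[:2]) != SPEC["python"]:
        problems.append(f"Python {python_version[0]}.{python_version[1]} is not 3.12")
    # Reduce e.g. "2.10.0+cu128" to "2.10" before comparing.
    torch_major_minor = ".".join(torch_version.split("+")[0].split(".")[:2])
    if torch_major_minor != SPEC["torch"]:
        problems.append(f"PyTorch {torch_version} is not {SPEC['torch']}.x")
    if cuda_version != SPEC["cuda"]:
        problems.append(f"CUDA {cuda_version} is not {SPEC['cuda']}")
    # Match whole tokens so that "L40S" is not mistaken for "L4".
    tokens = re.split(r"[\s-]+", gpu_name)
    if not any(gpu in tokens for gpu in SPEC["gpus"]):
        problems.append(f"GPU '{gpu_name}' is not an A100 or L4")
    if machine != SPEC["machine"]:
        problems.append(f"machine '{machine}' is not {SPEC['machine']}")
    return problems


def describe_current_environment():
    """Collect the values check_environment needs from the running session."""
    import torch  # imported lazily so the checker itself works without PyTorch

    gpu_name = torch.cuda.get_device_name(0) if torch.cuda.is_available() else ""
    return {
        "python_version": sys.version_info,
        "torch_version": torch.__version__,
        "cuda_version": torch.version.cuda,
        "gpu_name": gpu_name,
        "machine": platform.machine(),
    }
```

In a Colab cell you would call `check_environment(**describe_current_environment())` and look for an empty list. A non-empty list names each mismatch, such as a T4 runtime or an older PyTorch build.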
State-of-the-art text-to-image generation. Flash Attention 2.8.3 powers efficient inference: generate 1024×1024 images in seconds on an L4.
Flash Attention 2.7.3 and 2.8.3, prebuilt for Colab's A100 and L4 GPUs. Five dollars. Zero compiling.