
Cactus Blog

Deep dives into on-device AI, inference optimization, and the engineering behind Cactus.

Latest
Transcription | Hybrid AI

Sub-150ms Transcription with Cloud-Level Accuracy: Why We Built a Hybrid Engine

How Cactus combines on-device and cloud inference for real-time speech transcription to achieve sub-150ms latency and handle noisy audio.

Roman Shemet | February 27, 2026 | 5 min read
Models | Applications

The Sweet Spot for Mac Code Use: Reviewing LFM2 24B MoE A2B with Cactus

LFM2-24B-A2B features 24B total parameters but only activates 2B during inference. We break down the MoE architecture, GQA, gated convolutions, and show how to run it locally with Cactus.

Noah Cylich & Henry Ndubuaku | February 24, 2026 | 10 min read

Hybrid inference for modern applications.


© 2026 Cactus Compute. All rights reserved.