Wednesday, July 16, 2025

Hello, Gemma! - Part 1: The Build

Hello, Gemma3n!

Jetson + MatFormer + PLE Caching + Audio Input

Bringing Google latest open-source model, Gemma3n, to NVIDIA Jetson Orin to enable on-device, live audio chat!


Google Gemma 3n is nothing short of incredible! In addition to incredible multi-modal performance in a tiny, efficient package, they also managed to add multi-language audio input!!

 

Design Considerations: 

  • Python + transformers + torch
    • Audio input is so new, we'll have to leverage the latest tranformers package from HuggingFace to leverage it.
  • jetson-containers
    • To make things easy, I'll build a container with the latest transformers for the Jetson using jetson-containers
  • Piper for efficient on device text-to-speech 

 …video coming when I get a chance shrink it…

Full build and details on GitHub: GregariousEngineering/hello-gemma 

Up Next!

  • Wake word and query completion detection
  • Internet access
  • Remote LLMs 


No comments:

Post a Comment

BrainBit Flappy Bird! No Coding Edition

Gemini 3 Pro + Antigravity Google released a multi-model that excels at agentic coding, and a new IDE to go with it! Naturally, I need to te...