Posts

Showing posts from July, 2025

Hello, Gemma! - Part 1: The Build

Hello, Gemma3n! Jetson + MatFormer + PLE Caching + Audio Input

Bringing Google's latest open-source model, Gemma 3n, to the NVIDIA Jetson Orin to enable on-device, live audio chat!

Google's Gemma 3n is nothing short of incredible! In addition to strong multimodal performance in a tiny, efficient package, it also adds multi-language audio input!

Design Considerations:

Python + transformers + torch: Audio input is so new that we'll need the latest transformers package from HuggingFace to use it.

jetson-containers: To make things easy, I'll build a container with the latest transformers for the Jetson using jetson-containers.

Piper: For efficient on-device text-to-speech.

…video coming when I get a chance to shrink it…

Full build and details on GitHub: GregariousEngineering/hello-gemma

Up Next!

Wake word and query completion detection
Internet access
Remote LLMs

🦙🦙🦙 Llama Panel 🦙🦙🦙

A multi-model agentic app that provides internet-backed, consensus-based answers from a configurable panel of LLMs (ReAct, Self- and Cross-Consistency)

Open Source Greatness in Numbers

Open-source LLMs have come a long way and can now challenge closed-source models! Excitingly, several instruction-tuned open-source models are now out and proving quite capable. Moreover, thanks to great improvements in efficiency and quantization, capable models are now small enough to run several on one system. Building on the idea that groups outperform individuals on various cognitive tasks, I set out to create a simple application to see if these latest models perform better as a group!

Python Leading Llama Leading Llamas 🐍🦙🔎🦙🦙🦙

For simplicity, I built the app in Python on top of Ollama. A large, capable model, ideally instruction-tuned, leads the investigation, querying the panel of models and the internet until a reasonable consensus is reached!

Sample Run!

gregarious@hal9000 : ~/llama_panel $ ./lla...
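
To make the "query the panel until a reasonable consensus is reached" idea concrete, here is a minimal sketch of a majority-vote loop. It is not the app's actual code: the names `find_consensus`, `poll_panel`, `normalize`, and `refine`, the 0.6 agreement threshold, and the round limit are all assumptions made for illustration, and the panelists are plain Python callables standing in for real model calls (for example, calls made through Ollama).

```python
from collections import Counter
from typing import Callable, Optional

# A panelist is any callable that maps a question to an answer string.
# In a real app each one would wrap a model call; here they are stand-ins.
Panelist = Callable[[str], str]


def normalize(answer: str) -> str:
    """Trim whitespace and trailing punctuation, then lowercase,
    so that trivially different answers count as the same vote."""
    return answer.strip().strip(".!").lower()


def poll_panel(question: str, panel: list[Panelist]) -> Counter:
    """Ask every panelist once and tally the normalized answers."""
    return Counter(normalize(panelist(question)) for panelist in panel)


def find_consensus(
    question: str,
    panel: list[Panelist],
    threshold: float = 0.6,   # hypothetical: fraction of the panel that must agree
    max_rounds: int = 3,      # hypothetical: give up after this many polls
    refine: Optional[Callable[[str, Counter], str]] = None,
) -> Optional[str]:
    """Poll the panel until one answer reaches the agreement threshold.

    `refine` is a hypothetical hook where a leading model could rewrite
    the question (or add search results) between rounds.
    Returns the winning answer, or None if no consensus is reached.
    """
    if not panel:
        raise ValueError("panel must contain at least one panelist")
    for _ in range(max_rounds):
        votes = poll_panel(question, panel)
        answer, count = votes.most_common(1)[0]
        if count / len(panel) >= threshold:
            return answer
        if refine is not None:
            question = refine(question, votes)
    return None
```

With three stand-in panelists that answer "Paris.", " paris", and "Lyon", two of the three votes agree after normalization, which clears the 0.6 threshold. Exact string matching is the simplest possible agreement test; free-form model answers would likely need a looser comparison, such as asking the leading model to judge whether two answers are equivalent.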