live · guardian armed · NIT Codeday 2026 · v0.1 · open source · MIT

A voice agent for the Mac.

by Adeeb Bashir · 11th grade · Crescent Public School

Built so blind users can run their laptop hands-free. Speak. Your Mac does it. Speaks back in your cloned voice. End-to-end in ~2.3 seconds.

25 voice-callable tools · 3 languages (en · हिन्दी · کٲشُر) · Solo build · 24 days
~2.3s end-to-end voice round-trip
25 tools the agent can call
3 languages: en · hi · ks
3.5k lines of Python
the process

From your voice, back to your voice.

Seven stages. All local where it matters. End-to-end in ~2.3 seconds.

on-device · cloud · hybrid
01 mic capture: sounddevice · 16 kHz · mono PCM (on-device, live)
→ streaming
02 VAD: silero · webrtc · 30 ms windows (on-device, 30ms)
→ 30ms chunk
03 speech-to-text: faster-whisper · base · INT8 · multilingual (hybrid, 0.4s)
→ transcript
04 LLM: gemini-2.5-flash · openrouter · tool loop (cloud, 0.9s)
→ tool_call
05 tool exec: 25 tools · applescript · cliclick (on-device, 1.6s)
→ tool result
06 text-to-speech: elevenlabs turbo v2.5 · your cloned voice (cloud, 0.5s)
→ audio bytes
07 playback: afplay · streamed · mp3 chunks (on-device, stream)
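To make the first two stages concrete, here is a minimal sketch of how 16 kHz mono audio could be cut into 30 ms windows before it reaches a VAD. It assumes 16-bit PCM (the page does not state the sample width), and `split_frames` is a hypothetical helper, not bigpi's actual code.

```python
SAMPLE_RATE = 16_000   # Hz, from the mic capture stage
FRAME_MS = 30          # window length from the VAD stage
BYTES_PER_SAMPLE = 2   # assumption: 16-bit PCM, not stated on this page

FRAME_SAMPLES = SAMPLE_RATE * FRAME_MS // 1000   # 480 samples per window
FRAME_BYTES = FRAME_SAMPLES * BYTES_PER_SAMPLE   # 960 bytes per window


def split_frames(pcm: bytes) -> list[bytes]:
    """Cut a mono PCM buffer into whole 30 ms frames, dropping any partial tail."""
    usable = len(pcm) - len(pcm) % FRAME_BYTES
    return [pcm[i:i + FRAME_BYTES] for i in range(0, usable, FRAME_BYTES)]
```

Under these assumptions, one second of audio yields 33 full frames, and each frame would be handed to the VAD on its own.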
latency budget · total ~2.3s
STT: groq whisper · 400ms
LLM: gemini 2.5 flash · 900ms
tool: applescript · find · 600ms
TTS: elevenlabs turbo · 500ms
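As a quick check on the budget, the four listed figures can be added up and broken into shares. This is plain arithmetic on the numbers above, nothing measured. The sum comes to 2.4 s, close to the ~2.3 s headline.

```python
# Per-stage figures copied from the latency budget above (milliseconds).
budget_ms = {"STT": 400, "LLM": 900, "tool": 600, "TTS": 500}

total_ms = sum(budget_ms.values())                      # 2400
shares = {stage: ms / total_ms for stage, ms in budget_ms.items()}
slowest = max(budget_ms, key=budget_ms.get)             # stage with the largest share

for stage, ms in budget_ms.items():
    print(f"{stage:>4}: {ms:4d} ms  ({shares[stage]:.1%})")
print(f"total: {total_ms / 1000:.1f} s, slowest stage: {slowest}")
```

On these figures the LLM call is the largest single slice, at 37.5% of the round-trip.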
end-to-end trace · 3.4s total · live example
YOU →
"find the pdf about kidneys and open it"
STT (Groq, 0.4s) →
transcript: "find the pdf about kidneys and open it"
BRAIN (Gemini 2.5 Flash, 0.9s) →
tool_call: find(query="kidneys", kind="pdf", open_it=true)
TOOL (mdfind + BGE rerank, 1.6s) →
top match: kidney_report_2024.pdf (0.83) — opened.
TTS (ElevenLabs, 0.5s) →
"opened kidney report."
who it's for

Built for the people who need it.

Accessibility-first. Screen readers tell you what's there — bigpi does it for you.

Blind users

Screen readers tell you what's there. bigpi does it for you. Voice in, action out, voice back.

Mobility impaired

Hands-free Mac from bed, from across the room, or by Telegram voice note when you're not even home.

Hands-busy days

Cooking, driving, bandaged, multitasking. Anyone whose hands aren't on the keyboard right now.

the stack

Eight components. One agent.

Each layer is best-in-class for what it does. Nothing here is decoration.

Google via OpenRouter

Gemini 2.5 Flash

The brain. Tool-calling loop, multilingual, fast.

used for agent reasoning
ElevenLabs

ElevenLabs Turbo v2.5

The voice. My actual voice, cloned and streamed.

used for voice synthesis
Groq

Whisper Large v3 Turbo

Cloud STT for Hindi & Kashmiri voice notes.

used for multilingual STT
SYSTRAN · OpenAI

faster-whisper base

Local STT on Apple Silicon. Sub-second English.

used for local STT
DeepInfra

BGE base-en-v1.5

Semantic file search reranker for the `find` tool.

used for file embeddings
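The reranking step behind `find` can be pictured roughly like this. The sketch assumes candidates arrive with precomputed embedding vectors and that the score is cosine similarity. Both are assumptions, since the page only says that mdfind results are reranked with BGE. `cosine` and `rerank` are hypothetical names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec: list[float],
           candidates: list[tuple[str, list[float]]]) -> list[tuple[str, float]]:
    """Score each (path, embedding) pair against the query, best match first."""
    scored = [(path, cosine(query_vec, vec)) for path, vec in candidates]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In this picture a keyword search narrows the field cheaply, and the embedding score only has to order a short candidate list.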
Brave

Brave Search API

Web search backbone. DDG fallback when rate-limited.

used for web search
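The fallback behaviour described for web search might look something like the sketch below. The `RateLimited` exception and the two backend callables are hypothetical stand-ins. The real Brave and DuckDuckGo clients are not shown on this page.

```python
from typing import Callable

SearchFn = Callable[[str], list[str]]


class RateLimited(Exception):
    """Hypothetical error raised by a search backend when it is throttled."""


def search_with_fallback(query: str, primary: SearchFn, fallback: SearchFn) -> list[str]:
    """Try the primary backend; switch to the fallback only on rate limiting."""
    try:
        return primary(query)
    except RateLimited:
        return fallback(query)
```

Catching only the rate-limit error keeps other failures (bad API key, network down) visible instead of silently masking them.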
macOS

AppleScript + cliclick

The hands. Drives every Mac app from the terminal.

used for system control
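Driving an app from the terminal usually means handing a one-line script to `osascript`. Here is a minimal sketch of how an `open_app`-style tool could build that command. The helper names are hypothetical, and actually running it only works on macOS.

```python
import subprocess


def applescript_quote(text: str) -> str:
    """Escape backslashes and double quotes for an AppleScript string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def open_app_command(app_name: str) -> list[str]:
    """Build the argv for `osascript -e 'tell application "<name>" to activate'`."""
    script = f'tell application "{applescript_quote(app_name)}" to activate'
    return ["osascript", "-e", script]


def open_app(app_name: str) -> None:
    """Run the command; raises CalledProcessError if osascript fails (macOS only)."""
    subprocess.run(open_app_command(app_name), check=True)
```

Passing the script as a separate argv element, rather than through a shell string, avoids a second layer of quoting for app names that come from speech.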
Telegram

Telegram Bot API

Remote bridge. Send a voice note, your Mac obeys.

used for remote bridge
Glued together with Python, ffmpeg, and ~3.5k lines of code.
No frameworks. No agent libraries. The whole loop is hand-written so it stays under 2.3s.
25 tools wired

Everything bigpi can actually do.

Every tool is one LLM tool-call away. Mix and match in a single voice note.
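"One tool-call away" can be read as a small registry that maps a tool name to a Python function. The sketch below assumes the model's call arrives as JSON with `name` and `arguments` keys, which is an illustrative shape and not necessarily the wire format bigpi uses. The `find` body is a placeholder that mirrors the trace earlier on this page.

```python
import json
from typing import Callable

TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so the LLM can call it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def find(query: str, kind: str = "", open_it: bool = False) -> str:
    # Placeholder: a real version would search files and optionally open the match.
    action = "opened" if open_it else "found"
    return f"{action} best {kind or 'file'} match for {query!r}"


def dispatch(tool_call_json: str) -> str:
    """Look up the requested tool and call it with the supplied arguments."""
    call = json.loads(tool_call_json)
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"unknown tool: {call['name']}"
    return fn(**call.get("arguments", {}))
```

The string each tool returns would be fed back to the model as the tool result, which is what lets several calls chain inside a single voice note.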

Vision 4 · Find & files 2 · Communication 6 · Productivity 4 · Web 3 · System 6
Vision · 4/4 wired
see: webcam describer
read: OCR via webcam
describe_screen: narrate window
read_screen: targeted Q&A
Find & files · 2/2 wired
find: semantic search
search_files: mdfind keywords
Communication · 6/6 wired
compose_email: polished draft
send_email: Gmail compose
send_imessage: iMessage
send_whatsapp: WhatsApp
send_discord_dm: Discord DM
lookup_contact: Contacts.app
Productivity · 4/4 wired
calendar_today: today's events
calendar_add: voice schedule
reminder_add: quick capture
get_weather: umbrella verdict
Web · 3/3 wired
web_search: Brave / DDG
web_fetch: URL → text
web_ask: mini-Perplexity
System · 6/6 wired
run_on_mac: any shell cmd
type_text: into focused field
open_url: any URL
open_app: launch Mac app
play_apple_music: Music.app
play_spotify: Spotify
25 tools · voice-callable
3 languages · en · हिन्दी · کٲشُر
~2.3s latency · end-to-end
the builder

Solo. 24 days. Built by a high-schooler.

hi, i'm

Adeeb Bashir

11th grade · Crescent Public School · NIT Codeday 2026

I built bigpi alone over 24 days — between school and homework — because screen readers aren't enough. My grandfather needed someone to actually do things on his laptop, not just read them. So I made that someone. Architected the full system, paired with Claude Code on boilerplate. 52 commits. ~3,500 lines of Python.

grade 11 · 24 days · 52 commits · 3.5k lines

"Screen readers tell you what's there. I wanted something that does it for you."

— adeeb, 11th grade
live demo · try it now

Scan. Speak. Watch my Mac respond.

Send my Telegram bot a voice note from your phone. My Mac runs bigpi and replies in my cloned voice. Guardian terminates on anything suspicious.

step 1 · scan
Open my Telegram bot
t.me/bhrh2hr9u50985904oiwjhsfgbbot
step 2 · speak
Send a voice note · en · हिन्दी · کٲشُر · <90s
bigpi bot · online · armed
you · 2:14 pm · voice note (0:04)
bigpi · 2:14 pm · "opened kidney report."
cloned voice · 2:14 pm · voice note (0:02)
01 Scan / open bot: @bhrh2hr9u…bot
02 Send /auth <pin>: owner-paired · guardian
03 Voice note in: voice note out · cloned
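The `/auth <pin>` step suggests a simple owner-pairing gate in front of the agent. The sketch below is one way such a gate could work, assuming the first user to send the correct PIN becomes the owner and everyone else is refused. `OwnerGate` and `parse_auth` are hypothetical, and the page does not describe how guardian itself decides what is suspicious.

```python
import hmac
from typing import Optional


def parse_auth(message_text: str) -> Optional[str]:
    """Return the PIN from a '/auth <pin>' message, or None for anything else."""
    parts = message_text.strip().split(maxsplit=1)
    if len(parts) == 2 and parts[0] == "/auth":
        return parts[1].strip()
    return None


class OwnerGate:
    """Pairs exactly one user ID with the bot after a correct PIN."""

    def __init__(self, pin: str) -> None:
        self._pin = pin
        self.owner_id: Optional[int] = None

    def handle(self, user_id: int, message_text: str) -> bool:
        """Return True if this user is (or has just become) the paired owner."""
        if self.owner_id is not None:
            return user_id == self.owner_id
        pin = parse_auth(message_text)
        # compare_digest avoids leaking how many leading characters matched.
        if pin is not None and hmac.compare_digest(pin.encode(), self._pin.encode()):
            self.owner_id = user_id
            return True
        return False
```

Once an owner is paired, later messages are checked by user ID alone, so a second person who learns the PIN still cannot take over the session in this sketch.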
things people actually ask
"what's on my screen"
"read this label"
"find my resume"
"text mom — running late"
"weather, umbrella?"
"calendar today"
"draft a thank-you email"
"play sad music"
live now · owner-only · guardian armed · multilingual