✅ Day 3 — Voice Input and Real-Time Categorization
We built full voice input from scratch, meaning you can now:
- Record your voice
- Transcribe it instantly using OpenAI’s Whisper API
- Get automatic tags and a group category from GPT
- Save everything into Supabase — audio, text, tags, and metadata
This is how you take an idea from your brain, speak it out loud, and have AI organize it for you.
🎤 Step 1 — Add Voice Recording in the Browser
We used the Web Audio API and MediaRecorder to capture the mic stream from the browser.
You can start with this simple prompt:
```plaintext
Create a fullscreen modal in React that records mic input using the browser. Show a waveform animation while recording. After stopping, convert the recording into a blob and upload it to Supabase. Display a success message once uploaded.
```
Implementation Tips

- Use `navigator.mediaDevices.getUserMedia({ audio: true })` to access the mic
- Use `MediaRecorder` to capture the audio
- Create a dynamic canvas or SVG to show a waveform if you want the UI to feel responsive
📍 If using Cursor, just type that prompt as a code comment and hit Ask Cursor
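As a rough sketch, the core recording logic might look like this; `startRecording` and `stopRecording` are illustrative names, not from the project:

```typescript
// Minimal mic-recording sketch: request the mic, record, and resolve a Blob on stop.
let mediaRecorder: MediaRecorder | null = null
let chunks: Blob[] = []

async function startRecording(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  mediaRecorder = new MediaRecorder(stream)
  chunks = []
  mediaRecorder.ondataavailable = (e) => chunks.push(e.data)
  mediaRecorder.start()
}

function stopRecording(): Promise<Blob> {
  return new Promise((resolve) => {
    if (!mediaRecorder) return
    mediaRecorder.onstop = () => {
      // Chromium-based browsers typically record webm; adjust the type if you convert later.
      resolve(new Blob(chunks, { type: 'audio/webm' }))
      mediaRecorder?.stream.getTracks().forEach((track) => track.stop())
    }
    mediaRecorder.stop()
  })
}
```

Wire those two functions to the modal's record and stop buttons, then hand the resulting blob to the Supabase upload in Step 2.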
🔐 Step 2 — Set Up Supabase Storage for Audio
Inside your Supabase project:
- Go to Storage in the left sidebar
- Click New Bucket
- Name it something like `voice_uploads`
- Set the privacy to Public so you can play back audio later

Inside that bucket, audio files will be uploaded as `.webm` or `.mp3`, depending on your conversion.
Make sure to install the Supabase client and use this pattern:
```typescript
const { data, error } = await supabase.storage
  .from('voice_uploads')
  .upload(`recordings/${filename}.mp3`, audioBlob)
```
📌 Tip: You can get the public URL using:
```typescript
const { data } = supabase.storage
  .from('voice_uploads')
  .getPublicUrl(`recordings/${filename}.mp3`)
```

Then save that public URL (available as `data.publicUrl`) in your database.
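A minimal sketch of that last step, assuming the row already exists and its id is in a `thoughtId` variable (an illustrative name):

```typescript
// Store the public audio URL on the matching thoughts row.
// data.publicUrl comes from getPublicUrl above; thoughtId is an assumed variable.
const { error: updateError } = await supabase
  .from('thoughts')
  .update({ audio_url: data.publicUrl })
  .eq('id', thoughtId)

if (updateError) console.error('Failed to save audio_url:', updateError.message)
```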
✍️ Step 3 — Save Audio URL and Transcript to Supabase
We added two new fields to our `thoughts` table:

| Column | Type | Notes |
|---|---|---|
| `transcript` | text | From Whisper API |
| `audio_url` | text | Public file link from Supabase bucket |
SQL to Add Fields (optional)
If you didn’t add them before:
```sql
ALTER TABLE thoughts ADD COLUMN transcript text;
ALTER TABLE thoughts ADD COLUMN audio_url text;
```
📍 In Supabase, go to SQL Editor → New Query, paste that in, then run it.
🧠 Step 4 — Transcribe Using Whisper API
We used OpenAI’s Whisper API to transcribe the audio before sending it to GPT for tagging.
Here's how to do it:
- Convert your audio blob to a `File` object (see the sketch below)
- Send it to the Whisper endpoint using a `multipart/form-data` request
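A one-line sketch for that conversion; the filename is just an example, so match the extension to your blob's actual type:

```typescript
// Wrap the recorded blob in a File so the multipart upload has a filename Whisper can read.
const audioFile = new File([audioBlob], 'recording.webm', { type: audioBlob.type })
```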
```typescript
const formData = new FormData()
formData.append('file', audioFile)
formData.append('model', 'whisper-1')

const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
  method: 'POST',
  headers: { Authorization: `Bearer ${OPENAI_API_KEY}` },
  body: formData
})

const { text } = await response.json()
```
Now you’ve got your transcript stored in `text`.
🤖 Step 5 — Auto-Tag and Categorize with GPT
Now take that transcript and pass it to GPT to generate tags and group categories.
Prompt we used:
```plaintext
You are an expert assistant for organizing thoughts. Take the following raw thought and return:
- A comma-separated list of relevant tags
- A one-word or short-phrase group that the thought belongs to

Return only JSON in this format:
{ "tags": ["tag1", "tag2", "tag3"], "group": "example group" }
```
Save this data in Supabase by updating the row that matches the thought’s `id`.
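Here is a rough sketch of that round trip, assuming you call OpenAI's Chat Completions endpoint with the prompt above stored in a `TAGGING_PROMPT` constant, that the model name is your own choice, and that your `thoughts` table has `tags` and `group` columns (all of these are assumptions, not confirmed project details):

```typescript
// Ask GPT for tags and a group, then write the result back to the matching row.
const gptResponse = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${OPENAI_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',                          // assumed model; any chat model works
    messages: [
      { role: 'system', content: TAGGING_PROMPT }, // the prompt shown above
      { role: 'user', content: text }              // the Whisper transcript
    ]
  })
})

const completion = await gptResponse.json()
const { tags, group } = JSON.parse(completion.choices[0].message.content)

// Update the row that matches the thought's id (thoughtId is an assumed variable).
const { error } = await supabase
  .from('thoughts')
  .update({ tags, group })
  .eq('id', thoughtId)
```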
✅ Day 3 Checklist
```markdown
- [x] Create browser mic recorder in React
- [x] Show waveform animation during recording
- [x] Upload audio to Supabase bucket
- [x] Store audio_url in thoughts table
- [x] Transcribe audio using OpenAI Whisper API
- [x] Store transcript field in Supabase
- [x] Generate tags and group with GPT
- [x] Update the thought row with AI results
```
💡 Tip — Debugging Audio and Whisper
- If Whisper returns an empty result, double-check that you’re sending `audio/webm` or `audio/mpeg`
- Convert your audio blob to `mp3` if you’re running into format issues
- Supabase will not play private audio links unless you set the bucket to Public
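As a quick sanity check before uploading (a sketch, not part of the original flow), log what the browser actually produced:

```typescript
// Inspect the recording before sending it to Supabase or Whisper.
console.log('MIME type:', audioBlob.type, 'size (bytes):', audioBlob.size)

if (!['audio/webm', 'audio/mpeg'].includes(audioBlob.type)) {
  console.warn('Unexpected audio format; Whisper may reject it')
}
```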
💭 Why We Did This
Typing is great for some ideas, but most people think out loud.
Voice capture unlocks speed. Transcription adds structure. GPT makes it usable.
With this flow, your brain can finally dump ideas naturally without worrying about structure — the AI will handle that part for you.
📩 Want the Code and Prompts?
Every step of this voice input flow is documented in the newsletter.
You’ll get:
- Full browser mic recorder logic
- Whisper API integration
- Supabase upload and table setup
- GPT tagging prompts
- Cursor-ready comments to auto-generate each part