✅ Day 3 — Voice Input and Real-Time Categorization
We built full voice input from scratch, meaning you can now:
- Record your voice
- Transcribe it instantly using OpenAI’s Whisper API
- Get automatic tags and a group category from GPT
- Save everything into Supabase — audio, text, tags, and metadata
This is how you take an idea from your brain, speak it out loud, and have AI organize it for you.
🎤 Step 1 — Add Voice Recording in the Browser
We used the Web Audio API and MediaRecorder to capture the mic stream from the browser.
You can start with this simple prompt:
```plaintext
Create a fullscreen modal in React that records mic input using the browser. Show a waveform animation while recording. After stopping, convert the recording into a blob and upload it to Supabase. Display a success message once uploaded.
```
Implementation Tips

- Use `navigator.mediaDevices.getUserMedia({ audio: true })` to access the mic
- Use `MediaRecorder` to capture the audio
- Create a dynamic canvas or SVG to show a waveform if you want the UI to feel responsive
📍 If using Cursor, just type that prompt as a code comment and hit Ask Cursor
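As a rough sketch, the core recording logic might look like this; `startRecording` and `stopRecording` are illustrative names, not from the project:

```typescript
// Minimal mic-recording sketch: request the mic, record, and resolve a Blob on stop.
let mediaRecorder: MediaRecorder | null = null
let chunks: Blob[] = []

async function startRecording(): Promise<void> {
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  mediaRecorder = new MediaRecorder(stream)
  chunks = []
  mediaRecorder.ondataavailable = (e) => chunks.push(e.data)
  mediaRecorder.start()
}

function stopRecording(): Promise<Blob> {
  return new Promise((resolve) => {
    if (!mediaRecorder) return
    mediaRecorder.onstop = () => {
      // Chromium-based browsers typically record webm; adjust the type if you convert later.
      resolve(new Blob(chunks, { type: 'audio/webm' }))
      mediaRecorder?.stream.getTracks().forEach((track) => track.stop())
    }
    mediaRecorder.stop()
  })
}
```

Wire those two functions to the modal's record and stop buttons, then hand the resulting blob to the Supabase upload in Step 2.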
🔐 Step 2 — Set Up Supabase Storage for Audio
Inside your Supabase project:
- Go to Storage in the left sidebar
- Click New Bucket
- Name it something like `voice_uploads`
- Set the privacy to Public so you can play back audio later

Inside that bucket, audio files will be uploaded as `.webm` or `.mp3`, depending on your conversion.
Make sure to install the Supabase client and use this pattern:
```typescript
const { data, error } = await supabase.storage
  .from('voice_uploads')
  .upload(`recordings/${filename}.mp3`, audioBlob)
```
📌 Tip: You can get the public URL using:
```typescript
const { data } = supabase.storage
  .from('voice_uploads')
  .getPublicUrl(`recordings/${filename}.mp3`)
```

Then save that public URL (available as `data.publicUrl`) in your database.
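A minimal sketch of that last step, assuming the row already exists and its id is in a `thoughtId` variable (an illustrative name):

```typescript
// Store the public audio URL on the matching thoughts row.
// data.publicUrl comes from getPublicUrl above; thoughtId is an assumed variable.
const { error: updateError } = await supabase
  .from('thoughts')
  .update({ audio_url: data.publicUrl })
  .eq('id', thoughtId)

if (updateError) console.error('Failed to save audio_url:', updateError.message)
```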
✍️ Step 3 — Save Audio URL and Transcript to Supabase
We added two new fields to our `thoughts` table:

| Column | Type | Notes |
|---|---|---|
| `transcript` | text | From Whisper API |
| `audio_url` | text | Public file link from Supabase bucket |
SQL to Add Fields (optional)
If you didn’t add them before:
```sql
ALTER TABLE thoughts ADD COLUMN transcript text;
ALTER TABLE thoughts ADD COLUMN audio_url text;
```
📍 In Supabase, go to SQL Editor → New Query, paste that in, then run it.
🧠 Step 4 — Transcribe Using Whisper API
We used OpenAI’s Whisper API to transcribe the audio before sending it to GPT for tagging.
Here's how to do it:
- Convert your audio blob to a `File` object (see the sketch below)
- Send it to the Whisper endpoint using a `multipart/form-data` request
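A one-line sketch for that conversion; the filename is just an example, so match the extension to your blob's actual type:

```typescript
// Wrap the recorded blob in a File so the multipart upload has a filename Whisper can read.
const audioFile = new File([audioBlob], 'recording.webm', { type: audioBlob.type })
```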
```typescript
const formData = new FormData()
formData.append('file', audioFile)
formData.append('model', 'whisper-1')

const response = await fetch('https://api.openai.com/v1/audio/transcriptions', {
  method: 'POST',
  headers: { Authorization: `Bearer ${OPENAI_API_KEY}` },
  body: formData
})

const { text } = await response.json()
```
Now you’ve got your transcript stored in `text`.
🤖 Step 5 — Auto-Tag and Categorize with GPT
Now take that transcript and pass it to GPT to generate tags and group categories.
Prompt we used:
```plaintext
You are an expert assistant for organizing thoughts. Take the following raw thought and return:
- A comma-separated list of relevant tags
- A one-word or short-phrase group that the thought belongs to

Return only JSON in this format:
{ "tags": ["tag1", "tag2", "tag3"], "group": "example group" }
```
Save this data in Supabase by updating the row that matches the thought’s `id`.
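Here is a rough sketch of that round trip, assuming you call OpenAI's Chat Completions endpoint with the prompt above stored in a `TAGGING_PROMPT` constant, that the model name is your own choice, and that your `thoughts` table has `tags` and `group` columns (all of these are assumptions, not confirmed project details):

```typescript
// Ask GPT for tags and a group, then write the result back to the matching row.
const gptResponse = await fetch('https://api.openai.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${OPENAI_API_KEY}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({
    model: 'gpt-4o-mini',                          // assumed model; any chat model works
    messages: [
      { role: 'system', content: TAGGING_PROMPT }, // the prompt shown above
      { role: 'user', content: text }              // the Whisper transcript
    ]
  })
})

const completion = await gptResponse.json()
const { tags, group } = JSON.parse(completion.choices[0].message.content)

// Update the row that matches the thought's id (thoughtId is an assumed variable).
const { error } = await supabase
  .from('thoughts')
  .update({ tags, group })
  .eq('id', thoughtId)
```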
✅ Day 3 Checklist
```markdown
- [x] Create browser mic recorder in React
- [x] Show waveform animation during recording
- [x] Upload audio to Supabase bucket
- [x] Store audio_url in thoughts table
- [x] Transcribe audio using OpenAI Whisper API
- [x] Store transcript field in Supabase
- [x] Generate tags and group with GPT
- [x] Update the thought row with AI results
```
💡 Tip — Debugging Audio and Whisper
- If Whisper returns an empty result, double-check that you’re sending `audio/webm` or `audio/mpeg`
- Convert your audio blob to `mp3` if you’re running into format issues
- Supabase will not play private audio links unless you set the bucket to Public
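As a quick sanity check before uploading (a sketch, not part of the original flow), log what the browser actually produced:

```typescript
// Inspect the recording before sending it to Supabase or Whisper.
console.log('MIME type:', audioBlob.type, 'size (bytes):', audioBlob.size)

if (!['audio/webm', 'audio/mpeg'].includes(audioBlob.type)) {
  console.warn('Unexpected audio format; Whisper may reject it')
}
```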
💭 Why We Did This
Typing is great for some ideas, but most people think out loud.
Voice capture unlocks speed. Transcription adds structure. GPT makes it usable.
With this flow, your brain can finally dump ideas naturally without worrying about structure — the AI will handle that part for you.
📩 Want the Code and Prompts?
Every step of this voice input flow is documented in the newsletter.
You’ll get:
- Full browser mic recorder logic
- Whisper API integration
- Supabase upload and table setup
- GPT tagging prompts
- Cursor-ready comments to auto-generate each part