OpenAI audio transcription and synthesis in React Native
I was recently (Jan 2025) building a React Native app that recorded a user’s voice, passed that audio to OpenAI for transcription so it could be processed, and then later passed the resulting processed text to OpenAI for speech synthesis.
It worked fine when testing in the browser, but I ran into problems when I ran it on an Android phone, where posting and accessing the audio files proved more complicated than expected.
This was one of those moments where Cursor AI (in this case Claude) tells you an answer that’s wrong, then constantly changes its answer, forgets what it has already tried and suggests it again later on, and occasionally gives you the same approach you had before asking it in the first place. One of those situations where the AI has no damn idea but will pretend that it does.
Here’s some code that describes the issue and how I eventually solved it.
Recording the audio
I won’t go into this in detail, as it uses the well-documented Expo AV module.
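For context, the recording itself is set up and started more or less like this (a minimal sketch following the expo-av docs; the permission calls and the preset name are the standard ones from those docs rather than my exact code):
import { Audio } from 'expo-av';
// Ask for microphone permission and allow recording on iOS
await Audio.requestPermissionsAsync();
await Audio.setAudioModeAsync({
  allowsRecordingIOS: true,
  playsInSilentModeIOS: true,
});
// Start recording with the built-in high quality preset
const { recording } = await Audio.Recording.createAsync(
  Audio.RecordingOptionsPresets.HIGH_QUALITY
);
// ...and when the user is done speaking: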
await recording.stopAndUnloadAsync();
Transcription on Web
This was easy enough. I used the standard OpenAI functions from the npm openai module.
import OpenAI from "openai";
//////////////////////////
// Reference my API key and set up openAI (In this case in the front end)
const apiKey = process.env.EXPO_PUBLIC_OPENAI_API_KEY;
const openai = new OpenAI({ apiKey: apiKey, dangerouslyAllowBrowser: true });
//////////////////////////
// Using the recording stopped above
const uri = recording.getURI();
// Fetch the recording from its local URI and wrap it in a File
const fetchResponse = await fetch(uri);
const audioBlob = await fetchResponse.blob();
const audioFile = new File([audioBlob], 'audio.mp3', { type: 'audio/mpeg' });
if (!audioFile) throw new Error('No audio file available');
// Send the file to Whisper for transcription
const response = await openai.audio.transcriptions.create({
  model: "whisper-1",
  language: "en",
  file: audioFile,
});
// return response.text
Transcription on Android
To pass in an audio file that OpenAI would accept, I had to use an uploadAsync call through Expo’s FileSystem module to manually post it. The standard OpenAI function above wouldn’t work on Android for me.
import OpenAI from "openai";
import * as FileSystem from 'expo-file-system';
//////////////////////////
const apiKey = process.env.EXPO_PUBLIC_OPENAI_API_KEY;
// Note: the SDK client isn't actually used for the upload below; only the raw API key is
const openai = new OpenAI({ apiKey: apiKey, dangerouslyAllowBrowser: true });
const openaiUrl = "https://api.openai.com/v1/audio/transcriptions";
//////////////////////////
// Using the recording stopped above
const audioUri = recording.getURI();
const response = await FileSystem.uploadAsync(
openaiUrl,
audioUri,
{
headers: {
Authorization: `Bearer ${apiKey}`,
},
// Options specifying how to upload the file.
httpMethod: 'POST',
uploadType: FileSystem.FileSystemUploadType.MULTIPART,
fieldName: 'file',
mimeType: 'audio/mpeg',
parameters: {
model: 'whisper-1',
},
}
);
const responseBody = JSON.parse(response.body);
// return responseBody.text;
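Once both paths worked, it felt natural to wrap them in a single function that branches on the platform, much like the speech synthesis code further down. Here’s a rough sketch of how that could look (the transcribeRecording name and its shape are mine for illustration, not from the original code):
import OpenAI from "openai";
import { Audio } from 'expo-av';
import { Platform } from "react-native";
import * as FileSystem from 'expo-file-system';

const apiKey = process.env.EXPO_PUBLIC_OPENAI_API_KEY;
const openai = new OpenAI({ apiKey: apiKey, dangerouslyAllowBrowser: true });
const openaiUrl = "https://api.openai.com/v1/audio/transcriptions";

// Hypothetical wrapper: takes a stopped expo-av recording, returns the transcript text
export async function transcribeRecording(recording: Audio.Recording): Promise<string> {
  const uri = recording.getURI();
  if (!uri) throw new Error('No recording URI available');
  if (Platform.OS === 'web') {
    // Web: fetch the blob from the local URI and hand a File to the openai SDK
    const fetchResponse = await fetch(uri);
    const audioBlob = await fetchResponse.blob();
    const audioFile = new File([audioBlob], 'audio.mp3', { type: 'audio/mpeg' });
    const transcription = await openai.audio.transcriptions.create({
      model: "whisper-1",
      language: "en",
      file: audioFile,
    });
    return transcription.text;
  } else {
    // Android: post the file manually with expo-file-system's uploadAsync
    const uploadResponse = await FileSystem.uploadAsync(openaiUrl, uri, {
      headers: { Authorization: `Bearer ${apiKey}` },
      httpMethod: 'POST',
      uploadType: FileSystem.FileSystemUploadType.MULTIPART,
      fieldName: 'file',
      mimeType: 'audio/mpeg',
      parameters: { model: 'whisper-1' },
    });
    return JSON.parse(uploadResponse.body).text;
  }
}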
After getting the transcription and processing it successfully, I then had to deal with OpenAI and audio files again…
Synthesising speech
This time the code is a little more complete. I’ve wrapped it in a function and shown the if statement that branches based on the platform.
import OpenAI from "openai";
import { Audio } from 'expo-av';
import { Platform } from "react-native";
import * as FileSystem from 'expo-file-system';
import { Buffer } from 'buffer';
////////////////////////////////////////////////////////////
const apiKey = process.env.EXPO_PUBLIC_OPENAI_API_KEY;
const openai = new OpenAI({ apiKey: apiKey, dangerouslyAllowBrowser: true });
////////////////////////////////////////////////////////////
export async function synthesizeSpeech(text: string): Promise<void> {
try {
// This asks OpenAI to convert the text to audio
const response = await openai.audio.speech.create({
model: "tts-1",
voice: "alloy",
input: text,
});
// We have to treat web and Android differently
if (Platform.OS === 'web') {
const audioBlob = new Blob([await response.arrayBuffer()], {
type: 'audio/mp3'
});
const audioUri = URL.createObjectURL(audioBlob);
const { sound } = await Audio.Sound.createAsync({ uri: audioUri });
await sound.playAsync();
} else {
// I tested this on Android, but it may be the same for iOS
// Download the audio data first
const audioData = await response.arrayBuffer();
const base64Audio = Buffer.from(audioData).toString('base64');
// Save to a temporary file
const fileUri = `${FileSystem.documentDirectory}temp-audio.mp3`;
await FileSystem.writeAsStringAsync(fileUri, base64Audio, {
encoding: FileSystem.EncodingType.Base64,
});
// Play the local file
const { sound } = await Audio.Sound.createAsync(
{ uri: fileUri },
{ shouldPlay: false }
);
await sound.playAsync();
}
} catch (error) {
console.error('Error in synthesizeSpeech:', error);
throw error;
}
}
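Calling it is then just a matter of awaiting it with whatever text came back from the processing step (processedText here is a placeholder name, not something from the code above):
await synthesizeSpeech(processedText);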
A Buffer gotcha (or it got me, anyway)
Note that Buffer is imported. Your typing system probably won’t show an error if you don’t, but you need to npm install buffer and import it at the top of the file for mobile.
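In other words:
npm install buffer
and then, at the top of the file that uses it:
import { Buffer } from 'buffer';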
Some of Cursor’s suggested alternatives
Using the URL from the response data
Cursor suggested this and I thought it had become way easier. The response data provides a URL!!! Unfortunately, this means that you’re trying to download the audio file directly from OpenAI’s servers, which, even if you provide your API key, isn’t allowed.
// Can't do this because it tries to download from the OpenAI servers
const { sound } = await Audio.Sound.createAsync(
{
uri: response.url,
// Even with this it's not allowed
// headers: { 'Authorization': `Bearer ${apiKey}` }
},
{ shouldPlay: false }
);
await sound.playAsync();
Using response.blob
When Cursor told me I couldn’t use arrayBuffer on Android, it suggested using the blob straight from the response. Unfortunately, you still need to turn it into a URI, and createObjectURL can’t be used either.
// This won't work either, because createObjectURL can't be used on Android.
const audioData = await response.blob();
const audioUri = URL.createObjectURL(audioData);
const { sound } = await Audio.Sound.createAsync({ uri: audioUri });
await sound.playAsync();
There may be cleaner ways to do this, but while figuring this out I couldn’t find any useful information online, so hopefully this post helps someone else.
Thanks…
I also dissect and speculate on design and development.
Digging into subtle details and implications, and exploring broad perspectives and potential paradigm shifts.
Check out my conceptual articles on Substack or find my latest below.
You can also find me on Threads, Bluesky, Mastodon, or X for more diverse posts about ongoing projects.