OpenAI audio transcription and synthesis in React Native

I was recently (Jan 2025) building a React Native app that recorded a user’s voice, passed that audio to OpenAI for transcription so it could be processed, and then passed the resulting processed text back to OpenAI for speech synthesis.

It worked fine when testing in the browser, but I ran into problems when I ran it on an Android phone, where posting and accessing the audio files proved more complicated than expected.

This was one of those moments where Cursor AI (in this case Claude) tells you an answer that’s wrong, then constantly changes its answer, forgets what it’s already tried and suggests it again later on, and occasionally gives you the same approach you had before asking it in the first place. One of those situations where the AI has no damn idea but will pretend that it does.

Here’s some code that describes the issue and how I eventually got it solved.


Recording the audio

I won’t go into this in detail as it uses the well-documented Expo AV module.
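For context, here’s roughly how that recording gets created. This is a minimal sketch following the standard Expo AV pattern (the HIGH_QUALITY preset and audio mode options are just the library defaults, nothing specific to this app):

import { Audio } from 'expo-av';

// Ask for microphone permission and allow recording on iOS
await Audio.requestPermissionsAsync();
await Audio.setAudioModeAsync({
    allowsRecordingIOS: true,
    playsInSilentModeIOS: true,
});

// Create and start a recording with the default high-quality preset
const { recording } = await Audio.Recording.createAsync(
    Audio.RecordingOptionsPresets.HIGH_QUALITY
);

Then, when the user finishes speaking: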

await recording.stopAndUnloadAsync();

Transcription on Web

This was easy enough. I used the standard OpenAI functions from the openai npm module.

import OpenAI from "openai";

//////////////////////////

// Reference my API key and set up OpenAI (in this case in the front end)
const apiKey = process.env.EXPO_PUBLIC_OPENAI_API_KEY;
const openai = new OpenAI({ apiKey: apiKey, dangerouslyAllowBrowser: true });

//////////////////////////

// Using the recording stopped above
const uri = recording.getURI();
if (!uri) throw new Error('No recording URI available');

const response = await fetch(uri);
const audioBlob = await response.blob();

const audioFile = new File([audioBlob], 'audio.mp3', { type: 'audio/mpeg' });

const transcription = await openai.audio.transcriptions.create({
    model: "whisper-1",
    language: "en",
    file: audioFile,
});

// return transcription.text

Transcription on Android

To pass in an audio file that OpenAI would accept, I had to use an uploadAsync call through Expo’s file system module to manually post it. The standard OpenAI function above wouldn’t work on Android for me.

import * as FileSystem from 'expo-file-system';

//////////////////////////

// No OpenAI client is needed here; we POST directly to the transcription endpoint
const apiKey = process.env.EXPO_PUBLIC_OPENAI_API_KEY;
const openaiUrl = "https://api.openai.com/v1/audio/transcriptions";

//////////////////////////

// Using the recording stopped above
const audioUri = recording.getURI();

const response = await FileSystem.uploadAsync(
    openaiUrl,
    audioUri,
    {
        headers: {
            Authorization: `Bearer ${apiKey}`,
        },

        // Options specifying how to upload the file.
        httpMethod: 'POST',
        uploadType: FileSystem.FileSystemUploadType.MULTIPART,
        fieldName: 'file',
        mimeType: 'audio/mpeg',
        parameters: {
            model: 'whisper-1',
        },
    }
);
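
// uploadAsync resolves even for non-2xx responses rather than throwing,
// so it's worth checking the status before trusting the body
if (response.status !== 200) {
    throw new Error(`Transcription failed (${response.status}): ${response.body}`);
}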

const responseBody = JSON.parse(response.body);

// return responseBody.text;

After getting the transcription and processing it successfully, I then had to deal with OpenAI and audio files again…

Synthesising speech

This time the code is a little more complete. I’ve wrapped it in a function and shown the if statement that branches based on the platform.

import OpenAI from "openai";
import { Audio } from 'expo-av';
import { Platform } from "react-native";
import * as FileSystem from 'expo-file-system';
import { Buffer } from 'buffer';

////////////////////////////////////////////////////////////

const apiKey = process.env.EXPO_PUBLIC_OPENAI_API_KEY;
const openai = new OpenAI({ apiKey: apiKey, dangerouslyAllowBrowser: true });

////////////////////////////////////////////////////////////

export async function synthesizeSpeech(text: string): Promise<void> {
    try {

        // This asks OpenAI to convert the text to audio
        const response = await openai.audio.speech.create({
            model: "tts-1",
            voice: "alloy",
            input: text,
        });

        // We have to treat web and Android differently
        if (Platform.OS === 'web') {

            const audioBlob = new Blob([await response.arrayBuffer()], {
                type: 'audio/mp3'
            });
            const audioUri = URL.createObjectURL(audioBlob);
            const { sound } = await Audio.Sound.createAsync({ uri: audioUri });
            await sound.playAsync();

        } else {
            // I tested this on Android, but it may be the same for iOS

            // Download the audio data first
            const audioData = await response.arrayBuffer();
            const base64Audio = Buffer.from(audioData).toString('base64');
            // Save to a temporary file
            const fileUri = `${FileSystem.documentDirectory}temp-audio.mp3`;
            await FileSystem.writeAsStringAsync(fileUri, base64Audio, {
                encoding: FileSystem.EncodingType.Base64,
            });
            // Play the local file
            const { sound } = await Audio.Sound.createAsync(
                { uri: fileUri },
                { shouldPlay: false }
            );
            await sound.playAsync();
        }

    } catch (error) {
        console.error('Error in synthesizeSpeech:', error);
        throw error;
    }
}
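Calling it is then just await synthesizeSpeech(someText). One thing the function above doesn’t do is clean up after itself, so repeated calls may leave loaded sounds in memory. A minimal sketch of one way to handle that with expo-av’s status callback (this would sit inside each branch, right after createAsync):

// Unload the sound and free its resources once playback finishes
sound.setOnPlaybackStatusUpdate((status) => {
    if (status.isLoaded && status.didJustFinish) {
        sound.unloadAsync();
    }
});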

A Buffer gotcha (or got me, anyway)

Note that Buffer is imported. Your type system probably won’t show an error if you leave it out, but you need to npm install buffer and import it at the top of the file for mobile.
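That is, in the project root:

npm install buffer

…and then, as shown at the top of the file above:

import { Buffer } from 'buffer';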

Some of Cursor’s suggested alternatives

Using the URL from the response data

Cursor suggested this and I thought it had become way easier. The response data provides a URL! Unfortunately, this means you’re trying to download the audio file directly from OpenAI’s servers, which, even if you provide your API key, isn’t allowed.

// Can't do this because it tries to download from the OpenAI servers
const { sound } = await Audio.Sound.createAsync(
    {
        uri: response.url,
        // Even with this it's not allowed
        // headers: { 'Authorization': `Bearer ${apiKey}` }
    },
    { shouldPlay: false }
);
await sound.playAsync();

Using response.blob

When Cursor told me I couldn’t use arrayBuffer on Android, it suggested using the blob straight from the response. Unfortunately, you still need to turn it into a URI, and createObjectURL can’t be used either.

// This won't work either, because createObjectURL can't be used on Android.
const audioData = await response.blob();
const audioUri = URL.createObjectURL(audioData);
const { sound } = await Audio.Sound.createAsync({ uri: audioUri });
await sound.playAsync();

There may be cleaner ways to do this, but while figuring this out, I couldn’t find any useful information online, so hopefully this post helps someone else.

Thanks…
