
Automatic transcription of audio/video files

The problem: we have a video or an audio file, and we’d like a text file that contains a transcript of what is said.

There are many online solutions, but I’m wary of sending audio or video files to random services, so I wrote a bit of code that uses Whisper, OpenAI’s transcription service.

It can be argued that OpenAI does not solve the privacy issue either, but these providers are often just wrappers around OpenAI, so I’d rather cut out the middleman.

Whisper can only process files smaller than 25 MB, so we need a compressed file format: WAV files are too large for long recordings. In my experience there is some headroom: a one-hour video meeting turns into a 12 MB MP3 file.
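
To make the size constraint concrete, here is a minimal sketch that checks a file against the limit and estimates the size of a constant-bitrate recording. The helper names (fits_upload_limit, estimated_size_bytes) are hypothetical, and reading "25 MB" as 25 * 1000 * 1000 bytes is a conservative assumption on my part, not something confirmed by OpenAI.

```python
import os

# Assumption: the 25 MB limit is read conservatively as 25 * 1000 * 1000 bytes.
WHISPER_MAX_BYTES = 25 * 1000 * 1000


def fits_upload_limit(path, max_bytes=WHISPER_MAX_BYTES):
    """Return True if the file at `path` is strictly smaller than `max_bytes`."""
    return os.path.getsize(path) < max_bytes


def estimated_size_bytes(duration_seconds, bitrate_kbps):
    """Estimate the size in bytes of a constant-bitrate audio file."""
    # kbit/s -> bit/s (x1000), then bits -> bytes (/8)
    return duration_seconds * bitrate_kbps * 1000 // 8
```

For example, estimated_size_bytes(3600, 32) gives 14400000, so one hour at 32 kbit/s stays well under the limit, which is in the same ballpark as the 12 MB file mentioned above.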

Here is how to extract the audio part from a video recording:

ffmpeg -i some-video.mp4 -vn -ar 16000 -ac 1 -f mp3 audio.mp3
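
If you prefer to drive the extraction from Python, the same command can be wrapped with subprocess. This is only a sketch: the function names are hypothetical, and it assumes ffmpeg is installed and available on the PATH. The flags are exactly the ones from the command above.

```python
import subprocess


def build_ffmpeg_command(video_path, audio_path):
    """Build the ffmpeg argument list for the extraction command."""
    return [
        "ffmpeg",
        "-i", video_path,  # input video file
        "-vn",             # drop the video stream
        "-ar", "16000",    # resample the audio to 16 kHz
        "-ac", "1",        # downmix to a single (mono) channel
        "-f", "mp3",       # force the MP3 output format
        audio_path,
    ]


def extract_audio(video_path, audio_path):
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)
```

Calling extract_audio("some-video.mp4", "audio.mp3") should then behave like the shell command. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.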

Then, we can write a little piece of code to do the transcription:

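pip install python-dotenv openai

import openai, os, time
import readline
import atexit
from dotenv import load_dotenv
import getpass

# Load OPENAI_API_KEY from a local .env file into the environment
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# Note: openai.Audio.transcribe is the interface of openai versions before 1.0
audio_file = open("./audio.mp3", "rb")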
transcript = openai.Audio.transcribe("whisper-1", audio_file, response_format="text",