The problem: we have a video or an audio file, and we’d like a text file that contains a transcript of what is said.
There are many online solutions, but I’m a bit wary of sending audio or video files to random services, so I wrote a bit of code that uses OpenAI’s Whisper service directly.
It can be argued that the privacy issues are not solved by using OpenAI, but these providers are often just wrappers around OpenAI, so I’d rather cut out the middleman.
Whisper can only process files smaller than 25 MB, so we need a compressed file format. WAV does not work for long recordings. In my experience there is some headroom: a 1-hour video meeting turns into a 12 MB MP3 file.
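To see why WAV runs out of room so quickly, here is a rough back-of-the-envelope sketch. It assumes uncompressed 16-bit PCM at 16 kHz mono and that the 25 MB limit is counted as 25 * 1024 * 1024 bytes; the helper names are mine, not part of any library.

```python
import os

# Assumption: the 25 MB limit is counted as 25 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def wav_size_bytes(duration_s: float, sample_rate: int = 16000,
                   channels: int = 1, bytes_per_sample: int = 2) -> int:
    """Approximate size of uncompressed PCM audio (file header ignored)."""
    return int(duration_s * sample_rate * channels * bytes_per_sample)


def fits_upload_limit(path: str, limit: int = MAX_UPLOAD_BYTES) -> bool:
    """Return True if the file at `path` is strictly smaller than `limit`."""
    return os.path.getsize(path) < limit


one_hour_wav = wav_size_bytes(3600)
print(f"1 h of 16 kHz mono 16-bit WAV: {one_hour_wav / 1e6:.1f} MB")
# prints "1 h of 16 kHz mono 16-bit WAV: 115.2 MB"
```

So an hour of WAV is several times over the limit, while the 12 MB MP3 mentioned above passes comfortably. Calling `fits_upload_limit("audio.mp3")` before uploading is a cheap way to catch an oversized file early.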
Here is how to extract the audio track from a video recording:
ffmpeg -i some-video.mp4 -vn -ar 16000 -ac 1 -f mp3 audio.mp3
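If you would rather stay in Python for this step too, the same command can be run through `subprocess`. This is only a sketch: it assumes `ffmpeg` is installed and on the PATH, and the two function names are made up for the example.

```python
import subprocess


def build_ffmpeg_command(video_path: str, audio_path: str) -> list[str]:
    """Build the same ffmpeg invocation as above, as an argument list."""
    return [
        "ffmpeg",
        "-i", video_path,  # input video
        "-vn",             # drop the video stream
        "-ar", "16000",    # resample to 16 kHz
        "-ac", "1",        # downmix to mono
        "-f", "mp3",       # force MP3 output
        audio_path,
    ]


def extract_audio(video_path: str, audio_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_ffmpeg_command(video_path, audio_path), check=True)


# Usage (needs a real video file, so it is left commented out):
# extract_audio("some-video.mp4", "audio.mp3")
```

Passing the arguments as a list rather than a single string avoids shell quoting problems with file names that contain spaces.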
Then we can write a short piece of code to do the transcription:
pip install python-dotenv "openai<1"
import os

import openai
from dotenv import load_dotenv

# Read OPENAI_API_KEY from a local .env file.
load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

# openai.Audio.transcribe is the interface of openai versions before 1.0.
with open("./audio.mp3", "rb") as audio_file:
    transcript = openai.Audio.transcribe(
        "whisper-1",
        audio_file,
        response_format="text",
        language="fr",  # language spoken in the recording
    )

print(transcript)
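Since the goal was a text file rather than console output, the last step is to write the result to disk. A minimal sketch, assuming `transcript` is a plain string (which is what I would expect with `response_format="text"`); `save_transcript` and the default file name are hypothetical.

```python
from pathlib import Path


def save_transcript(text: str, path: str = "transcript.txt") -> int:
    """Write the transcript as UTF-8 and return the number of characters written."""
    return Path(path).write_text(text, encoding="utf-8")


# Usage, with the `transcript` variable from the script above:
# save_transcript(transcript)
```

Setting the encoding explicitly matters here because a French transcript is full of accented characters, and the platform default is not always UTF-8.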