
How to Convert PDF to Audiobook Using OpenAI’s Text-to-Speech API | Anurag Kumar | Coins | November 2023

The OpenAI text-to-speech API has a limit of 4096 characters per request, so let’s create a function called split_text that breaks the extracted text into smaller chunks, each within the limit, to ensure compatibility with the API. The process is as follows:

  1. Split the text into sentences.
  2. Add sentences to the current chunk one at a time, as long as adding the next sentence stays within the character limit.
  3. When the limit would be exceeded, save the current chunk and start a new chunk with the next sentence.
  4. Continue until every sentence has been assigned to a chunk.
def split_text(text, max_chunk_size=4096):
    chunks = []  # List to hold the chunks of text
    current_chunk = ""  # String to build the current chunk

    # Split the text into sentences and iterate through them
    for sentence in text.split('.'):
        sentence = sentence.strip()  # Remove leading/trailing whitespace
        if not sentence:
            continue  # Skip empty sentences

        # Check if adding the sentence would exceed the max chunk size
        if len(current_chunk) + len(sentence) + 1 <= max_chunk_size:
            current_chunk += sentence + "."  # Add sentence to current chunk
        else:
            chunks.append(current_chunk)  # Add the current chunk to the list
            current_chunk = sentence + "."  # Start a new chunk

    # Add the last chunk if it's not empty
    if current_chunk:
        chunks.append(current_chunk)

    return chunks

# Function Usage
chunks = split_text(plain_text)

# Printing each chunk with its number
for i, chunk in enumerate(chunks, 1):
    print(f"Chunk {i}:\n{chunk}\n---\n")
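As a quick sanity check, here is the same chunking logic applied to a tiny sample with a deliberately small limit (the sample text and max_chunk_size=40 are illustrative only):

```python
def split_text(text, max_chunk_size=4096):
    chunks = []
    current_chunk = ""
    for sentence in text.split('.'):
        sentence = sentence.strip()
        if not sentence:
            continue
        # Add the sentence if it still fits within the limit
        if len(current_chunk) + len(sentence) + 1 <= max_chunk_size:
            current_chunk += sentence + "."
        else:
            chunks.append(current_chunk)
            current_chunk = sentence + "."
    if current_chunk:
        chunks.append(current_chunk)
    return chunks

text = "First sentence. Second one here. Third sentence ends it."
chunks = split_text(text, max_chunk_size=40)
print(chunks)
# → ['First sentence.Second one here.', 'Third sentence ends it.']
```

One quirk worth noting: sentences are rejoined without a space after the period; appending ". " instead of "." would preserve the original spacing.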

Next, the text_to_speech function converts text to audio using OpenAI’s text-to-speech API. It performs the following steps:

  1. Initializes the OpenAI client to interact with the API.
  2. Sends a request to the audio API with the given text, model, and voice parameters; the model parameter controls the quality of the speech, while the voice parameter selects the voice type.
  3. Receives the audio response from the API and streams it to the specified output file.

⚠️ I have my OpenAI API key set in my environment variables (OPENAI_API_KEY). Otherwise, you will need to pass an API key to the client explicitly.

# Importing necessary modules
from pathlib import Path
import openai

def text_to_speech(input_text, output_file, model="tts-1-hd", voice="nova"):
    # Initialize the OpenAI client
    client = openai.OpenAI()

    # Make a request to OpenAI's Audio API with the given text, model, and voice
    response = client.audio.speech.create(
        model=model,  # Model for text-to-speech quality
        voice=voice,  # Voice type
        input=input_text  # The text to be converted into speech
    )

    # Define the path for the output audio file
    speech_file_path = Path(output_file)

    # Stream the audio response to the specified file
    response.stream_to_file(speech_file_path)

    # Print confirmation message after saving the audio file
    print(f"Audio saved to {speech_file_path}")

Convert text chunks to audio files

Let’s define a convert_chunks_to_audio function that runs each chunk of text through text_to_speech and saves the resulting audio files. The steps are as follows:

  1. Iterates over the chunks of text.
  2. For each chunk, generates a file name for the output audio file within the specified output folder.
  3. Converts each text chunk to an audio file using the text_to_speech function defined previously.
  4. Saves the path of each created audio file in a list.
# Importing necessary modules
import os

def convert_chunks_to_audio(chunks, output_folder):
    audio_files = []  # List to store the paths of generated audio files

    # Iterate over each chunk of text
    for i, chunk in enumerate(chunks):
        # Define the path for the output audio file
        output_file = os.path.join(output_folder, f"chunk_{i+1}.mp3")

        # Convert the text chunk to speech and save as an audio file
        text_to_speech(chunk, output_file)

        # Append the path of the created audio file to the list
        audio_files.append(output_file)

    return audio_files  # Return the list of audio file paths

# Function Usage
output_folder = "chunks" # Define the folder to save audio chunks
audio_files = convert_chunks_to_audio(chunks, output_folder) # Convert chunks to audio files
print(audio_files) # print list of all the audio files generated

Note: Make sure the output folder exists before running the code; in our example, it is a folder named chunks.
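If the folder might not exist yet, it can also be created up front (a small sketch; the folder name chunks matches the example above):

```python
import os

output_folder = "chunks"

# Create the folder if it does not already exist;
# exist_ok avoids an error when it is already there.
os.makedirs(output_folder, exist_ok=True)
```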

The combine_audio_with_moviepy function combines multiple audio clips into a single audio file using the moviepy library. It follows these steps:

  1. Iterates through the files in the specified folder, keeping only the .mp3 files.
  2. Creates an AudioFileClip object for each audio file and adds it to a list.
  3. Once all audio clips have been collected, merges them into one continuous clip with concatenate_audioclips.
  4. Writes the combined clip to the output file.
# Importing necessary modules from moviepy
from moviepy.editor import concatenate_audioclips, AudioFileClip
import os

def combine_audio_with_moviepy(folder_path, output_file):
    audio_clips = []  # List to store the audio clips

    # Iterate through each file in the given folder
    for file_name in sorted(os.listdir(folder_path)):
        if file_name.endswith('.mp3'):
            # Construct the full path of the audio file
            file_path = os.path.join(folder_path, file_name)
            print(f"Processing file: {file_path}")

            try:
                # Create an AudioFileClip object for each audio file
                clip = AudioFileClip(file_path)
                audio_clips.append(clip)  # Add the clip to the list
            except Exception as e:
                # Print any errors encountered while processing the file
                print(f"Error processing file {file_path}: {e}")

    # Check if there are any audio clips to combine
    if audio_clips:
        # Concatenate all the audio clips into a single clip
        final_clip = concatenate_audioclips(audio_clips)
        # Write the combined clip to the specified output file
        final_clip.write_audiofile(output_file)
        print(f"Combined audio saved to {output_file}")
    else:
        print("No audio clips to combine.")

# Function Usage
combine_audio_with_moviepy('chunks', 'combined_audio.mp3') # Combine audio files in 'chunks' folder
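One caveat with sorting file names via sorted(os.listdir(...)): it sorts lexicographically, so with ten or more chunks, chunk_10.mp3 would sort before chunk_2.mp3 and the audio would be stitched together out of order. A numeric sort key avoids this (a sketch, assuming the chunk_<n>.mp3 naming used earlier):

```python
import re

def numeric_key(file_name):
    # Sort by the first run of digits in the name;
    # names without a number sort first.
    match = re.search(r"\d+", file_name)
    return int(match.group()) if match else -1

names = ["chunk_10.mp3", "chunk_2.mp3", "chunk_1.mp3"]
print(sorted(names, key=numeric_key))
# → ['chunk_1.mp3', 'chunk_2.mp3', 'chunk_10.mp3']
```

Passing key=numeric_key to the sorted() call inside combine_audio_with_moviepy would keep the chunks in their original order.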

I created an image in Canva that I would render as a video while audio played in the background.

The create_mp4_with_image_and_audio function combines an image and an audio file into an MP4 video. This is especially useful when a static image must accompany an audio track, such as for presentations or YouTube videos. The function performs the following steps:

  1. Loads the audio file as an AudioFileClip.
  2. Creates a video clip from the specified image using ImageClip, setting its duration to match the length of the audio.
  3. Sets the frames per second (fps) of the video clip.
  4. Assigns the audio clip to the video clip’s audio track.
  5. Writes the final video clip to the output file using the specified video and audio codecs.
from moviepy.editor import AudioFileClip, ImageClip

def create_mp4_with_image_and_audio(image_file, audio_file, output_file, fps=24):
    # Load the audio file
    audio_clip = AudioFileClip(audio_file)

    # Create a video clip from an image
    video_clip = ImageClip(image_file, duration=audio_clip.duration)

    # Set the fps for the video clip
    video_clip = video_clip.set_fps(fps)

    # Set the audio of the video clip as the audio clip
    video_clip = video_clip.set_audio(audio_clip)

    # Write the result to a file
    video_clip.write_videofile(output_file, codec='libx264', audio_codec='aac')

# Example usage
image_file = 'cover_image.png' # Replace with the path to your image
audio_file = 'combined_audio.mp3' # The combined audio file
output_file = 'output_video.mp4' # Output MP4 file
create_mp4_with_image_and_audio(image_file, audio_file, output_file)

And that’s it. Once this code has finished running, your audiobook will be created.