This repository contains the script pipeline.py, which transforms YouTube videos into engaging, conversational avatar videos using Sieve APIs. The goal is to automate repurposing video content into interactive dialogues between two talking avatars, ideal for storytelling, education, or dynamic presentations.
The pipeline.py script does the following:
- Download YouTube Video: Downloads the video using Sieve's youtube_to_mp4 function.
- Summarize Content: Converts the video content into a conversational-style summary between two speakers.
- Text-to-Speech Conversion: Uses Sieve's TTS API to convert the summarized dialogue into speech.
- Talking Avatar Generation: Creates two distinct avatars to narrate the conversation using Sieve's portrait-avatar API.
- Merge Video Clips: Combines individual video segments into a final video using ffmpeg.
- Repurpose Content: Convert lengthy videos into bite-sized, conversational narratives.
- Interactive Presentations: Make content more engaging with avatars.
- Time Efficiency: Summarization saves time while retaining the core message.
- Creative Possibilities: Perfect for storytelling, education, or marketing.
- pipeline.py: the script implementing the 'video2dialogue' tool.
- parallelized_version/pipeline.py: a more efficient, parallelized version of the same script.
- examples/: contains a sample output video and the generated summary text.
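As a rough illustration of what a parallelized variant can look like (this is not the code in parallelized_version/pipeline.py, which is not reproduced here): each turn's clip is independent of the others, so the per-turn jobs can run concurrently as long as the results are collected in the original order. The sketch below uses Python's standard ThreadPoolExecutor; `render_turn` is a hypothetical callable that produces one clip for one turn, and `max_workers=4` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def render_in_parallel(
    turns: Sequence[T],
    render_turn: Callable[[T], R],  # hypothetical: renders one clip for one turn
    max_workers: int = 4,           # arbitrary default, tune to your quota
) -> List[R]:
    """Run render_turn on every turn concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map yields results in input order, regardless of
        # which job happens to finish first.
        return list(pool.map(render_turn, turns))
```

Threads are a reasonable fit here on the assumption that each job mostly waits on a remote API call rather than doing local computation.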
- Python 3.7+
- Sieve Python Client
- ffmpeg
- Clone this repository:
git clone https://github.com/yourusername/video2dialogue.git
cd video2dialogue
- Install dependencies:
pip install sievedata
- Authenticate with Sieve:
sieve login
- Run the script: Ensure pipeline.py is in your project folder, then execute the pipeline with:
python pipeline.py
- Output: The final video featuring talking avatars will be saved in your project directory. Logs and job statuses can be monitored on the Sieve dashboard.
The script follows these steps, with the main part outlined below:
- Download YouTube Video:
import sieve
youtube_to_mp4 = sieve.function.get("sieve/youtube_to_mp4")
output_video = youtube_to_mp4.run(url, resolution="highest-available", include_audio=True)
- Summarize as Conversation:
visual_summarizer = sieve.function.get("sieve/visual-qa")
summary_as_conversation = visual_summarizer.run(output_video, prompt="Summarize into a dialogue between 2 people.", fps=1)
Choosing an appropriate prompt is important, since it determines whether the summary comes back as a two-speaker dialogue.
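The summary is returned as text, so it has to be split into speaker turns before speech can be generated. Below is a minimal sketch, assuming each turn starts with a label followed by a colon, such as "Speaker 1: ..." (the actual output format depends on the prompt and is not specified in this README). `parse_dialogue` is a hypothetical helper, not part of pipeline.py or the Sieve client.

```python
import re
from typing import List, Tuple

# Assumed line format: "<label>: <utterance>", e.g. "Speaker 1: Hello".
TURN_RE = re.compile(r"^\s*([^:\n]{1,40}?)\s*:\s*(.+?)\s*$")


def parse_dialogue(summary: str) -> List[Tuple[str, str]]:
    """Split a dialogue-style summary into (speaker, text) turns.

    Lines without a "label:" prefix are appended to the previous turn;
    unlabeled lines before the first turn are dropped.
    """
    turns: List[Tuple[str, str]] = []
    for line in summary.splitlines():
        if not line.strip():
            continue  # skip blank lines between turns
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            speaker, text = turns[-1]
            turns[-1] = (speaker, f"{text} {line.strip()}")
    return turns
```

A continuation line that itself contains a colon would be mistaken for a new speaker, so a stricter pattern (for example, matching only the two expected labels) may be preferable once the real output format is known.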
- Text-to-Speech and Avatar Generation:
tts = sieve.function.get("sieve/tts")
portrait_avatar = sieve.function.get("sieve/portrait-avatar")
We run these Sieve functions iteratively, once per turn of the conversation, to generate the corresponding avatar videos. For different speakers in the conversation, pass different voices to the tts function and different avatar images to the portrait-avatar function.
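The per-turn loop described above could be organized roughly as follows. This is a sketch under stated assumptions: `turns` is a list of (speaker, text) pairs, the voice names and avatar image paths are placeholders, and `synthesize` and `animate` are hypothetical callables standing in for the calls to the tts and portrait-avatar functions, whose exact arguments are not shown in this README.

```python
from typing import Any, Callable, Dict, List, Sequence, Tuple

# Placeholder per-speaker settings; voice names and image paths are assumptions.
SPEAKER_CONFIG: Dict[str, Dict[str, str]] = {
    "Speaker 1": {"voice": "voice_a", "avatar": "avatar_1.png"},
    "Speaker 2": {"voice": "voice_b", "avatar": "avatar_2.png"},
}


def render_turns(
    turns: Sequence[Tuple[str, str]],
    synthesize: Callable[[str, str], Any],  # hypothetical: (text, voice) -> audio
    animate: Callable[[Any, str], Any],     # hypothetical: (audio, image) -> clip
    config: Dict[str, Dict[str, str]] = SPEAKER_CONFIG,
) -> List[Any]:
    """Produce one avatar clip per turn, in conversation order."""
    clips = []
    for speaker, text in turns:
        settings = config[speaker]  # raises KeyError for an unknown speaker
        audio = synthesize(text, settings["voice"])
        clips.append(animate(audio, settings["avatar"]))
    return clips
```

Passing the two steps in as callables keeps the speaker-to-voice and speaker-to-image mapping in one place, and lets the loop be exercised with stand-in functions before spending API calls.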
- Merge Video Clips:
ffmpeg -f concat -safe 0 -i file_list.txt -c copy output.mp4
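The command above reads the clip order from file_list.txt, which uses the concat demuxer's format of one `file '<path>'` line per clip. A minimal sketch of generating that file and invoking ffmpeg from Python follows; the helper names and the clip file names in the test data are hypothetical, and only the ffmpeg arguments come from the command shown above.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def concat_list_text(clips: Sequence[str]) -> str:
    """Return the contents of a concat-demuxer list file for the clips."""
    lines = []
    for clip in clips:
        # Inside single quotes, a literal ' is written as '\'' in ffmpeg's syntax.
        escaped = str(clip).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def concat_command(list_path: str, output: str = "output.mp4") -> List[str]:
    """Build the same ffmpeg invocation as the command shown above."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_path, "-c", "copy", output]


def merge_clips(clips: Sequence[str], output: str = "output.mp4",
                list_path: str = "file_list.txt") -> None:
    """Write the list file, then run ffmpeg (which must be on PATH)."""
    Path(list_path).write_text(concat_list_text(clips), encoding="utf-8")
    subprocess.run(concat_command(list_path, output), check=True)
```

Because `-c copy` concatenates without re-encoding, this works cleanly only when all clips share the same codec and stream parameters.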
For a detailed explanation, follow the tutorial here.
For a complete working example, see the demo here.
Special thanks to Sieve for their powerful APIs that made this project possible.