A Flask web application that automatically transcribes YouTube videos to text using OpenAI's Whisper model. The tool provides real-time progress tracking, timestamps in transcriptions, and a clean interface for managing transcripts.
- 🎥 Transcribe YouTube videos using just the URL
- 🔄 Real-time progress tracking with status updates
- ⏱️ Timestamps included in transcriptions
- 💾 Automatic saving and organization of transcripts
- 📥 Easy download of transcription files
- 📚 Sidebar history of all transcribed videos
- 🎯 Clean, intuitive web interface
- 🔍 Video title and URL metadata in transcripts
Before you begin, ensure you have the following installed:
- Python 3.8 or higher
- FFmpeg (required for audio processing)
- Git (for cloning the repository)
Install FFmpeg:
- Ubuntu/Debian:
  sudo apt update
  sudo apt install ffmpeg
- macOS (using Homebrew):
  brew install ffmpeg
- Windows: Download from FFmpeg website
Clone the repository:
  git clone https://github.com/morriswong/ytotxt.git
  cd ytotxt
Create and activate a virtual environment:
  python -m venv venv
  # On Windows:
  venv\Scripts\activate
  # On macOS/Linux:
  source venv/bin/activate
Install Python dependencies:
  pip install -r requirements.txt
Start the application:
  python app.py
Access the web interface:
- Open your browser and go to http://localhost:5000
- You'll see the main transcription interface
Transcribe a video:
- Paste a YouTube URL into the input field
- Click "Transcribe"
- Watch the real-time progress:
  - Getting video information
  - Downloading audio
  - Generating transcript
View and Download:
- View the transcript directly on the results page
- Download the transcript as a text file
- Start a new transcription or return to the main page
ytotxt/
├── app.py # Flask application and route handlers
├── yttotxt.py # Core transcription and YouTube download logic
├── requirements.txt # Python package dependencies
├── templates/ # HTML templates
│ ├── index.html # Main page template
│ └── result.html # Results page template
└── downloads/ # Storage for downloaded files and transcripts
Video Processing:
- Extracts video information using yt-dlp
- Downloads only the audio stream for efficiency
- Converts audio to required format using FFmpeg
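The video processing steps above could be sketched with yt-dlp's Python API roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual code: the function names `build_audio_options` and `download_audio`, the MP3 target format, and the `downloads/<video id>/` file layout are hypothetical.

```python
from pathlib import Path


def build_audio_options(output_dir: str) -> dict:
    """Build yt-dlp options that fetch only audio and convert it with FFmpeg.

    Hypothetical helper: the MP3 codec and the file layout are assumptions.
    """
    return {
        # Prefer an audio-only stream; fall back to the best combined format.
        "format": "bestaudio/best",
        # Store each download under <output_dir>/<video id>/audio.<ext>.
        "outtmpl": str(Path(output_dir) / "%(id)s" / "audio.%(ext)s"),
        # Have yt-dlp run FFmpeg after the download to extract the audio.
        "postprocessors": [
            {"key": "FFmpegExtractAudio", "preferredcodec": "mp3"}
        ],
        "quiet": True,
    }


def download_audio(url: str, output_dir: str = "downloads") -> dict:
    """Download the audio for `url` and return basic video metadata."""
    # Imported lazily so the options helper works without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_audio_options(output_dir)) as ydl:
        info = ydl.extract_info(url, download=True)
    return {"id": info["id"], "title": info["title"], "url": url}
```

Calling `download_audio("<YouTube URL>")` requires network access and FFmpeg on the PATH; the returned dictionary carries the title and URL that can later be written into the transcript.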
Transcription:
- Uses OpenAI's Whisper model for speech recognition
- Processes audio in chunks for memory efficiency
- Generates timestamps for each segment
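As an illustration of the transcription step, the sketch below loads a Whisper model, transcribes an audio file, and formats each returned segment with its timestamps. The helper names (`format_timestamp`, `segments_to_text`, `transcribe`) and the `[HH:MM:SS - HH:MM:SS]` line format are assumptions for this example and may differ from what `yttotxt.py` does.

```python
def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (fractions truncated)."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def segments_to_text(segments: list) -> str:
    """Turn Whisper-style segments into one timestamped line per segment."""
    lines = []
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} - {end}] {seg['text'].strip()}")
    return "\n".join(lines)


def transcribe(audio_path: str, model_name: str = "base") -> str:
    """Transcribe an audio file and return timestamped text."""
    # Imported lazily: openai-whisper is heavy and needs FFmpeg on the PATH.
    import whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(audio_path)
    # Each segment is a dict with "start", "end" (seconds) and "text".
    return segments_to_text(result["segments"])
```

Passing a smaller `model_name` such as `"tiny"` trades accuracy for lower memory use.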
File Management:
- Creates unique directories for each video
- Saves transcripts with metadata
- Organizes downloads by video ID
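One way the file management described above might look is shown below: a directory per video ID, with the title and URL written as a header above the transcript. The function name `save_transcript`, the `transcript.txt` filename, and the header layout are assumptions made for illustration only.

```python
from pathlib import Path


def save_transcript(video_id: str, title: str, url: str,
                    transcript: str, base_dir: str = "downloads") -> Path:
    """Write a transcript with a metadata header into <base_dir>/<video_id>/."""
    video_dir = Path(base_dir) / video_id
    # Create the per-video directory; reuse it if it already exists.
    video_dir.mkdir(parents=True, exist_ok=True)

    # Hypothetical header format: title and URL, then a blank line.
    header = f"Title: {title}\nURL: {url}\n\n"
    path = video_dir / "transcript.txt"
    path.write_text(header + transcript, encoding="utf-8")
    return path
```

Keying the directory on the video ID means transcribing the same video again overwrites the earlier transcript rather than creating a duplicate.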
- Flask: Web framework for the application
- yt-dlp: Advanced YouTube video downloader
- whisper: OpenAI's speech recognition model
- FFmpeg: Audio processing library
To contribute to the project:
- Fork the repository
- Create a feature branch
- Make your changes
- Submit a pull request
Common issues and solutions:
- FFmpeg not found: Ensure FFmpeg is installed and in your system PATH
- Memory errors: Try using a smaller model in yttotxt.py (e.g., "tiny" or "base")
- Download errors: Check your internet connection and video availability
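To check the first issue quickly, a short script can confirm whether FFmpeg is visible on the PATH that Python sees. This is a small diagnostic sketch; the function names `find_ffmpeg` and `ffmpeg_status` are hypothetical and not part of the project.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if not on PATH."""
    return shutil.which("ffmpeg")


def ffmpeg_status() -> str:
    """Describe whether FFmpeg can be found on the current PATH."""
    path = find_ffmpeg()
    if path is None:
        return "FFmpeg not found: install it and add it to your system PATH"
    return f"FFmpeg found at {path}"


if __name__ == "__main__":
    print(ffmpeg_status())
```

Run it inside the same virtual environment and shell you use to start the application, since the PATH can differ between terminals.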
This project is licensed under the MIT License. See the LICENSE file for details.
- OpenAI Whisper for the transcription model
- yt-dlp for YouTube video downloading
- Flask for the web framework
If you encounter any issues or have questions:
- Check the troubleshooting section
- Open an issue on GitHub
- Provide detailed information about your problem
Made with ❤️ for the open-source community