A tool for managing RunPod serverless model scheduling with a Streamlit web interface.
Problem: RunPod serverless models suffer from cold starts: after an idle period, the first request is slow because the model has to be activated before it can respond.
Solution: This project sends periodic requests to keep serverless models "warm" and ready for inference, significantly reducing response times.
Architecture:
- Streamlit Frontend: Easy-to-use web interface for configuration
- Cronjob Scheduler: Sends periodic requests to RunPod serverless inference servers
- Slack Integration: Reports scheduling results and status updates to Slack channels
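As a rough illustration of what one periodic keep-warm request could look like, the sketch below builds a POST against RunPod's public serverless route (`https://api.runpod.ai/v2/<endpoint_id>/run`). The function name `build_warmup_request` and the `{"prompt": "ping"}` payload are assumptions made for this example; the project's own client in core/runpod_api.py may be structured differently, and the input shape depends on the model's handler.

```python
import json
import urllib.request

# Base URL of RunPod's public serverless API.
RUNPOD_API_BASE = "https://api.runpod.ai/v2"


def build_warmup_request(endpoint_id, api_key, payload):
    """Build (but do not send) a POST to the endpoint's /run route.

    Hypothetical helper: `payload` must match whatever the deployed
    model's handler expects under the "input" key.
    """
    body = json.dumps({"input": payload}).encode("utf-8")
    return urllib.request.Request(
        url=f"{RUNPOD_API_BASE}/{endpoint_id}/run",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending the request is then a single call such as `urllib.request.urlopen(req, timeout=30)`; keeping construction separate from sending makes the request easy to inspect without touching the network.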
# Update pip to the latest version
pip install --upgrade pip
# Create and activate virtual environment for the project
cd runpod-serverless-supervisor
python -m venv runpod-serverless-supervisor
source runpod-serverless-supervisor/bin/activate
# Install runtime dependencies
pip install -e .
# For development (includes code quality tools)
pip install -e ".[dev]"
cp template/settings.example.json config/settings.json
cp template/.env.example .env
Configure config/settings.json for UI defaults and available models:
- ui.max_interval: Maximum interval allowed in minutes (default: 1440)
- ui.default_from_time: Default start time for schedules (default: "07:30")
- ui.default_to_time: Default end time for schedules (default: "16:30")
- ui.default_interval: Default interval in minutes (default: 60)
- ui.auto_refresh_seconds: UI refresh rate in seconds (default: 5)
- ui.timezone: Default timezone for scheduling (default: "Asia/Seoul")
- models: Array of available RunPod model names to choose from
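To make the role of these keys concrete, here is a minimal sketch of reading config/settings.json and falling back to the documented defaults for any missing ui key. The `load_settings` function and the interval sanity check are assumptions for illustration, not the project's actual loader in core/env_settings.py.

```python
import json
from pathlib import Path

# Defaults taken from the key descriptions above.
UI_DEFAULTS = {
    "max_interval": 1440,
    "default_from_time": "07:30",
    "default_to_time": "16:30",
    "default_interval": 60,
    "auto_refresh_seconds": 5,
    "timezone": "Asia/Seoul",
}


def load_settings(path="config/settings.json"):
    """Return (ui, models), filling missing ui keys with the defaults."""
    raw = json.loads(Path(path).read_text(encoding="utf-8"))
    ui = {**UI_DEFAULTS, **raw.get("ui", {})}
    models = list(raw.get("models", []))
    # Assumed sanity check: the default interval must fit under the maximum.
    if not 0 < ui["default_interval"] <= ui["max_interval"]:
        raise ValueError("default_interval must be between 1 and max_interval")
    return ui, models
```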
Configure the .env file with your actual values:
Required Settings:
- RUNPOD_API_KEY: Your RunPod API key (required)
Slack Integration (Optional):
- SLACK_WEBHOOK_URL: Slack webhook URL for general notifications
- SLACK_ENABLED: Enable/disable Slack notifications (default: true)
- SLACK_CHANNEL: Slack channel for notifications (default: #runpod-alerts)
- SLACK_USERNAME: Bot username (default: RunPod Supervisor)
- SLACK_ICON_EMOJI: Bot emoji (default: :robot_face:)
- SLACK_BOT_TOKEN: Bot token for Web API-based threaded messaging
- SLACK_MENTION_USER: User ID for critical failure mentions (format: U1234567890)
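The sketch below shows one way these variables could be read and validated once they are present in the process environment. It assumes the .env file has already been loaded by some other means, and the function name `load_env_config` and the returned dictionary keys are hypothetical.

```python
import os


def load_env_config(environ=os.environ):
    """Collect the documented variables, applying the documented defaults."""
    api_key = environ.get("RUNPOD_API_KEY", "").strip()
    if not api_key:
        # The only setting documented as required.
        raise RuntimeError("RUNPOD_API_KEY is required")
    return {
        "runpod_api_key": api_key,
        "slack_enabled": environ.get("SLACK_ENABLED", "true").strip().lower() == "true",
        "slack_webhook_url": environ.get("SLACK_WEBHOOK_URL"),
        "slack_channel": environ.get("SLACK_CHANNEL", "#runpod-alerts"),
        "slack_username": environ.get("SLACK_USERNAME", "RunPod Supervisor"),
        "slack_icon_emoji": environ.get("SLACK_ICON_EMOJI", ":robot_face:"),
        "slack_bot_token": environ.get("SLACK_BOT_TOKEN"),
        "slack_mention_user": environ.get("SLACK_MENTION_USER"),
    }
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without changing the real environment.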
streamlit run web_interface.py
The app opens automatically in your browser. If it does not, go to http://localhost:8501.
To use a different port, pass the --server.port option, for example: streamlit run web_interface.py --server.port 8502
The page may appear as follows:
- Time Settings
  - From: Schedule start time (default: 07:30)
  - To: Schedule end time (default: 16:30)
  - Interval: Call interval in minutes (default: 60 minutes)
- Model Configuration
  - Target URL: Enter RunPod endpoint ID
  - Model: Select model to use
- Scheduler Control
  - ▶️ START: Start scheduler
  - ⏹️ STOP: Stop scheduler
- Status Monitoring
  - Real-time active model count
  - Detailed status table for each model
Here's how the scheduler works in practice:
- Configure and Start: Set your schedule (From and To) and press START
- Status Updates: The status table immediately shows "🟢 Running" for your model
- Automatic Scheduling: Between From and To, requests are sent every Interval to keep your serverless model warm
- Slack Notifications: All scheduling activities are logged to Slack in real-time. The following example is just one case tested only on the Qwen/Qwen2.5-32B-Instruct model.
- Persistent Operation: The scheduler continues running daily until you stop it or terminate the Streamlit app
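To illustrate how From, To, and Interval combine, the following sketch lists the times of day at which a request would fire. It assumes the first request is sent at From and that To is inclusive; the project itself schedules through cron, so its exact boundary behavior may differ. The function name `scheduled_times` is hypothetical.

```python
from datetime import datetime, timedelta


def scheduled_times(from_time, to_time, interval_minutes):
    """List the HH:MM times at which a request would fire in one day."""
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")
    current = datetime.strptime(from_time, "%H:%M")
    end = datetime.strptime(to_time, "%H:%M")
    times = []
    # Step from From to To (inclusive) in Interval-sized increments.
    while current <= end:
        times.append(current.strftime("%H:%M"))
        current += timedelta(minutes=interval_minutes)
    return times
```

With the default settings (07:30 to 16:30, every 60 minutes), this yields ten request times per day under the stated assumptions.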
- Cold Start Prevention: Keep serverless models warm with periodic requests
- Real-time Monitoring: Automatic status updates with live dashboard
- Multi-model Support: Schedule multiple models simultaneously
- Automatic Cronjob Management: Persists through system restarts
- Immediate Testing: Performs connection test immediately on START
- Configurable Timezone: Support for multiple timezones worldwide
- Intuitive UI: Color-coded status indicators with Streamlit interface
- Enhanced Slack Integration: Web API-based threaded messaging with mention notifications for critical alerts
- Parallel Processing: Optimized concurrent scheduling for improved performance and reduced latency
- Intelligent Retry Logic: Automatic retry mechanisms with exponential backoff for API failures
- Cold Start Handling: Specialized handling for serverless model initialization delays
- On-demand Testing: Immediate model validation capabilities through the web interface
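The retry behavior listed above can be sketched as follows. This is a generic exponential backoff loop rather than the project's actual implementation; the function name `call_with_backoff`, the attempt count, and the base delay are all assumptions.

```python
import time


def call_with_backoff(func, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call func(); on failure wait base_delay * 2**attempt, then retry.

    The exception from the final attempt is re-raised to the caller.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

The `sleep` parameter is injectable so the waiting can be replaced in tests; a cold endpoint that fails twice and then answers would be retried after waits of 1 and 2 seconds with the defaults shown.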
runpod-serverless-supervisor/
├── web_interface.py # Streamlit web app (main)
├── runpod_cronjob.py # Cronjob execution script
├── core/
│   ├── env_settings.py # Environment & settings management
│   ├── scheduler_manager.py # Scheduler configuration management
│   └── runpod_api.py # RunPod API client
├── utils/
│   ├── cronjob_utils.py # Cronjob management utilities
│   └── slack_utils.py # Slack notification utilities
├── config/
│   ├── settings.json # UI settings & model list
│   └── scheduler_config.json # Dynamic scheduler state (auto-generated)
├── template/
│   ├── settings.example.json # Settings template file
│   └── .env.example # Environment variables template
├── .env # Environment variables (API keys, Slack config)
├── requirements.txt # Package dependencies
└── runpod_cronjob.log # Cronjob execution logs (auto-generated)
- .env: Contains API keys and Slack webhook configuration
- config/settings.json: UI defaults, model list, and timezone settings
- config/scheduler_config.json: Dynamic scheduler state (auto-generated)
All configurations are managed through the web interface.
# Required Configuration
RUNPOD_API_KEY=your_runpod_api_key_here
# Slack Integration (Optional)
SLACK_WEBHOOK_URL=your_slack_webhook_url # General notifications
SLACK_ENABLED=true
SLACK_CHANNEL=#runpod-alerts
SLACK_USERNAME=RunPod Supervisor
SLACK_BOT_TOKEN=xoxb-your-bot-token-here # Threaded messaging
SLACK_MENTION_USER=U1234567890 # Critical alerts
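As a hedged illustration of how the Slack values above might be used, the sketch below builds a webhook request with the standard library. The function name `build_slack_request` is hypothetical, and the project's own utils/slack_utils.py may work differently. Note that the channel, username, and icon_emoji overrides are only honored by legacy incoming webhooks; app-based webhooks ignore them.

```python
import json
import urllib.request


def build_slack_request(webhook_url, text, channel="#runpod-alerts",
                        username="RunPod Supervisor",
                        icon_emoji=":robot_face:", mention_user=None):
    """Build (but do not send) a POST for a Slack incoming webhook."""
    if mention_user:
        # Slack renders <@USER_ID> as a user mention.
        text = f"<@{mention_user}> {text}"
    payload = {
        "text": text,
        "channel": channel,
        "username": username,
        "icon_emoji": icon_emoji,
    }
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```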
{
"ui": {
"max_interval": 1440,
"default_from_time": "07:30",
"default_to_time": "16:30",
"default_interval": 60,
"timezone": "Asia/Seoul"
},
"models": ["model1", "model2", "model3"]
}
- Buttons not responding: Refresh the page in your browser
- Cronjob not working: Check system cron service status with systemctl status cron
- API connection failed: Verify Target URL and model settings in the web interface
- Timezone issues: Check timezone setting in config/settings.json
- Slack notifications not working: Verify SLACK_WEBHOOK_URL in .env file
- Cronjob logs: Check runpod_cronjob.log for execution details
- Streamlit logs: Check terminal output where web app is running
- Configuration issues: Verify all files in config/ directory exist
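For the configuration check in particular, a small script along these lines can report which user-provided files are missing. The file list is an assumption based on the setup steps (config/scheduler_config.json is auto-generated, so it is not checked), and `missing_files` is a hypothetical helper rather than part of the project.

```python
from pathlib import Path

# Files the setup steps ask you to create by copying the templates.
EXPECTED_FILES = (".env", "config/settings.json")


def missing_files(project_root="."):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files()
    print("Missing:", ", ".join(missing) if missing else "none")
```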
- Python: 3.8 or higher
- System: macOS, Linux (Windows with WSL)
- Dependencies: All packages listed in requirements.txt
This project uses automated code quality tools:
# Install development dependencies
pip install -r requirements-dev.txt
# Install pre-commit hooks
pre-commit install
# Run code formatting and linting
ruff check . # Linting
ruff format . # Formatting
black . # Additional formatting
isort . # Import sorting
# Run all pre-commit hooks manually
pre-commit run --all-files
# Security check
bandit -r .
The repository includes CI/CD workflows that automatically:
- Run code linting and formatting checks
- Perform security scans
- Validate code quality on every push and PR
- Threaded Slack Notifications: Structured failure alerts and mentions with improved readability through thread organization
- Parallel Model Processing: Optimized concurrent scheduling architecture for enhanced performance and reduced latency
- Cold Start Management: Automated handling of serverless model initialization delays
- On-demand Testing: Immediate model validation capabilities through the web interface
- Intelligent Retry Logic: Enhanced error recovery with detailed failure notifications and exponential backoff
- Time Formatting Improvements: Consistent time display and optimized timezone handling
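The parallel model processing mentioned above could look roughly like the following thread-pool sketch, in which one slow or failing model does not block the others. The names `ping_all` and `ping` are hypothetical, and the project's real scheduling code may use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def ping_all(models, ping, max_workers=4):
    """Run ping(model) concurrently; map each model to its result or exception."""
    results = {}
    if not models:
        return results
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(ping, model): model for model in models}
        for future in as_completed(futures):
            model = futures[future]
            try:
                results[model] = future.result()
            except Exception as exc:
                # Record the failure instead of aborting the other models.
                results[model] = exc
    return results
```

Threads suit this workload because each warm-up call spends most of its time waiting on the network rather than using the CPU.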
Quick Start: Simply run streamlit run web_interface.py to access all features.