All notable changes to LocalLab will be documented in this file.
- Enhanced error handling and reliability in both clients
- Added timeout handling to sync client streaming methods
- Improved event loop cleanup and resource management
- Added connection state validation
- Added retry mechanism for streaming operations
- Added comprehensive logging throughout both clients
- Added proper cleanup of resources on client closure
- Fixed potential memory leaks in event loop handling
- Fixed thread cleanup in synchronous client
- Improved error propagation between async and sync clients
- Added proper timeout handling in streaming operations
- Enhanced connection state management
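
The timeout and retry behaviour above is internal to the clients, but a rough sketch of the pattern looks like the following; the `stream_generate` method name and its `timeout` parameter are assumptions for illustration, not the documented API.

```python
import time

def stream_with_retry(client, prompt, retries=3, timeout=30.0):
    """Retry a streaming call with a per-attempt timeout and simple backoff.

    Simplified sketch: a real implementation would also track connection
    state and avoid re-emitting tokens already yielded by a failed attempt.
    """
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            # `timeout` bounds how long each attempt may wait for tokens.
            for token in client.stream_generate(prompt, timeout=timeout):
                yield token
            return
        except (TimeoutError, ConnectionError) as exc:
            last_error = exc
            time.sleep(min(2 ** attempt, 10))  # back off before the next attempt
    raise last_error if last_error else TimeoutError("streaming failed after retries")
```
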
- Fixed package structure to avoid duplicate exports
- Updated version numbers to be consistent across all files
- Fixed imports in sync_client.py to use correct package name
- Improved package import reliability
- Ensured both LocalLabClient and SyncLocalLabClient are properly exported
- Fixed SyncLocalLabClient not being exported from locallab_client package
- Added proper exports for both LocalLabClient and SyncLocalLabClient in the package `__init__.py`
- Ensured both sync and async clients are available through the main package import
- Renamed Python client package from `locallab-client` to `locallab_client` for better import compatibility
- Updated client package version to 0.3.0
- Changed client package structure to use direct imports instead of nested packages
- Improved client package documentation with correct import examples
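
After the rename to `locallab_client` and the flattened package layout, both clients should be importable directly from the top-level package. A minimal sketch, assuming the export names listed above; the URL and port are placeholders:

```python
from locallab_client import LocalLabClient, SyncLocalLabClient

# String URLs are accepted directly by the constructors.
async_client = LocalLabClient("http://localhost:8000")    # async/await API
sync_client = SyncLocalLabClient("http://localhost:8000")  # blocking API
```
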
- Fixed server shutdown issues when pressing Ctrl+C
- Improved error handling during server shutdown process
- Enhanced handling of asyncio.CancelledError during shutdown
- Added proper handling for asyncio.Server objects during shutdown
- Reduced duplicate log messages during shutdown
- Added clean shutdown banner for better user experience
- Improved task cancellation with proper timeout handling
- Enhanced force exit mechanism to ensure clean termination
- Added a dedicated synchronous client (`SyncLocalLabClient`) that doesn't require async/await (see the usage sketch below)
- Added automatic session closing to prevent resource leaks
- Added proper resource management with context managers
- Simplified client API with separate async and sync clients
- Updated documentation to clearly explain both client options
- Fixed issue with unclosed client sessions causing warnings
- Improved error handling in streaming responses
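
A hedged usage sketch of the synchronous client with context-manager cleanup; the `generate` method name is illustrative rather than the documented API.

```python
from locallab_client import SyncLocalLabClient

# The context manager closes the underlying session on exit, which is what
# prevents the "unclosed client session" warnings mentioned above.
with SyncLocalLabClient("http://localhost:8000") as client:
    text = client.generate("Write a haiku about GPUs")  # illustrative method name
    print(text)
```
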
- Added unified client API that works both with and without async/await
- Implemented automatic session closing to the Python client
- Added proper resource management with atexit handlers and finalizers
- Improved error handling in the Python client
- Added synchronous context manager support (`with` statement)
- Simplified client API - same methods work in both sync and async contexts
- Updated Python client to track activity and close inactive sessions
- Enhanced client session management to prevent resource leaks
- Improved client package version to 0.2.0
- Fixed issue with unclosed client sessions causing warnings
- Improved error propagation in streaming responses
- Removed all response formatting from streaming generation
- Simplified token streaming to provide raw, unformatted tokens
- Removed text cleaning and formatting from all generation endpoints
- Improved error handling in streaming responses
- Optimized streaming generation for low-resource computers
- Implemented token-level streaming with proper error handling
- Added memory monitoring and adaptive token generation
- Enhanced error recovery mechanisms for streaming generation
- Improved client-side error handling for streaming responses
- Fixed issue with streaming generation stopping unexpectedly
- Improved error reporting in streaming responses
- Added timeout handling to prevent hanging during streaming
- Enhanced memory management to prevent OOM errors
- Optimized token generation for better performance on low-resource computers
- Reduced default max_length for streaming to conserve memory
- Improved token buffering for smoother streaming experience
- Enhanced Python client with better error handling for streaming
- Added proper error message propagation from server to client
- Added context awareness to streaming generation
- Enhanced streaming response quality with context tracking
- Improved streaming response coherence by maintaining conversation history
- Updated documentation with streaming context examples
- Fixed streaming response formatting issues
- Improved error handling in streaming generation
- Enhanced token cleanup for better readability
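
On the client side, the error propagation and timeout handling above surface as ordinary exceptions while iterating the stream. A sketch, with `stream_generate` and its parameters again assumed for illustration:

```python
from locallab_client import SyncLocalLabClient

with SyncLocalLabClient("http://localhost:8000") as client:
    try:
        # Tokens arrive raw and unformatted; server-side errors are propagated
        # to the client instead of silently ending the stream.
        for token in client.stream_generate("Tell me a story", max_length=256):
            print(token, end="", flush=True)
    except Exception as exc:
        print(f"\nStreaming failed: {exc}")
```
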
- Fixed Python client initialization error "'str' object has no attribute 'headers'"
- Updated client package to handle string URLs in constructor
- Bumped client package version to 1.0.2
- Updated documentation with correct client initialization examples
- Fixed HuggingFace token handling and validation in model loading
- Fixed ngrok token environment variable usage to use the official `NGROK_AUTHTOKEN` name
- Fixed token storage and retrieval in config and environment variables
- Improved CLI UX for token input and management
- Removed token masking for better visibility
- Show current token values when available
- Added proper token validation
- Enhanced token handling across the package
- Standardized environment variable names
- Better string handling for token values
- Consistent token validation
- Better error messages for token-related issues
- Improved networking setup with proper token handling
- Updated environment variable names to use official standards (see the sketch below):
  - `NGROK_AUTHTOKEN` for the ngrok token
  - `HUGGINGFACE_TOKEN` for the HuggingFace token
- Standardized token management functions in config.py
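
For example, the standardized variable names can be set before starting the server (values are placeholders):

```python
import os

os.environ["NGROK_AUTHTOKEN"] = "<your-ngrok-auth-token>"
os.environ["HUGGINGFACE_TOKEN"] = "<your-huggingface-token>"
```
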
- Fixed critical error with ngrok URL handling in Google Colab
- Fixed NgrokTunnel type error during server initialization
- Improved error messages for ngrok connection issues
- Updated footer design for better visibility
- Clarified URL usage in documentation (localhost vs ngrok)
- Simplified footer design in server output
- Enhanced ngrok tunnel setup process with better error handling
- Updated documentation to clearly distinguish between local and ngrok URLs
- Added support for HuggingFace token through CLI and environment variables
- Interactive prompt for HuggingFace token when required
- Secure token handling in configuration
- Improved error messages for model loading issues
- Made HuggingFace token optional but with interactive prompt when needed
- Enhanced model loading process with better token handling
- Updated documentation with HuggingFace token configuration details
- Fixed critical issue with BERT model loading by removing device_map for BERT models
- Added proper BERT model configuration for text generation
- Improved model loading process with better architecture detection
- Enhanced error handling for different model architectures
- Fixed memory management for CPU-only environments
- Added automatic model type detection and configuration
- Improved compatibility with various model architectures
- Enhanced error messages for better debugging
- Added support for BERT models in text generation mode
- Implemented automatic model architecture detection
- Added proper model-specific configurations
- Enhanced memory optimization for different model types
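
A simplified sketch of architecture-aware loading along these lines, assuming Hugging Face `transformers`; the actual detection and configuration logic in LocalLab is more involved:

```python
from transformers import AutoConfig, AutoModel, AutoModelForCausalLM

def load_model(model_id: str, device_map: str = "auto"):
    config = AutoConfig.from_pretrained(model_id)
    if "bert" in config.model_type.lower():
        # BERT-style models: no device_map (per the fix above) and a
        # model-specific configuration for text generation.
        return AutoModel.from_pretrained(model_id)
    return AutoModelForCausalLM.from_pretrained(model_id, device_map=device_map)
```
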
- Fixed critical issue with server not terminating properly when Ctrl+C is pressed
- Improved process termination by using os._exit() instead of sys.exit() for clean shutdown
- Added CPU compatibility by disabling quantization when CUDA is not available
- Fixed bitsandbytes error for CPU-only systems with clear warning messages
- Enhanced user experience with better error handling for non-GPU environments
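
The CPU check that gates quantization can be sketched roughly as follows; the flag name is illustrative:

```python
import torch

def resolve_quantization(enable_quantization: bool) -> bool:
    """Disable bitsandbytes quantization when CUDA is unavailable."""
    if enable_quantization and not torch.cuda.is_available():
        print("WARNING: CUDA not available - disabling quantization for CPU-only mode.")
        return False
    return enable_quantization
```
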
- Added beautiful footer section with author information and social media links
- Included GitHub, Twitter, and Instagram links in the footer
- Added project repository link with star request
- Enhanced server startup display with comprehensive information
- Fixed critical issue with server not shutting down properly when Ctrl+C is pressed
- Improved signal handling in ServerWithCallback class to ensure clean shutdown
- Enhanced main_loop method to respond faster to shutdown signals
- Implemented more robust server shutdown process with proper resource cleanup
- Added additional logging during shutdown to help diagnose issues
- Increased shutdown timeout to allow proper cleanup of all resources
- Fixed multiple shutdown attempts when Ctrl+C is pressed repeatedly
- Ensured all server components are properly closed during shutdown
- Enhanced server compatibility with different versions of uvicorn
- Improved lifespan initialization with comprehensive fallback mechanisms
- Fixed server startup issues with newer versions of uvicorn (0.34.0+)
- Added robust error handling for lifespan initialization
- Implemented multiple initialization strategies for different uvicorn versions
- Improved logging during server startup to better diagnose initialization issues
- Enhanced server stability with proper error recovery during startup
- Fixed "Using NoopLifespan" warning by properly initializing lifespan components
- Ensured compatibility with both older and newer versions of uvicorn
- Improved server reliability in various Python environments
- Fixed critical issue with SimpleTCPServer not properly handling API requests
- Implemented proper ASGI server in SimpleTCPServer for handling API requests
- Added support for uvicorn's H11Protocol for better request handling
- Improved fallback server implementation with proper HTTP request parsing
- Fixed API documentation to show correct URLs based on environment
- Fixed API examples to show local URL or ngrok URL based on configuration
- Ensured server works correctly in both local and Google Colab environments
- Fixed import error: "cannot import name 'get_system_info' from 'locallab.utils.system'"
- Added backward compatibility function for system information retrieval
- Ensured proper display of system resources during server startup
- Enhanced compatibility between UI components and system utilities
- Improved error handling during server startup display
- Added graceful error recovery for UI component failures
- Ensured server continues to run even if display components fail
- Enhanced robustness of startup process with comprehensive error handling
- Added fallback mechanisms for all UI components to handle import errors
- Improved system resource display with multiple fallback options
- Enhanced model information display with graceful degradation
- Ensured server can start even with missing or incompatible dependencies
- Added minimal mode fallback server for critical initialization failures
- Implemented comprehensive error handling for configuration loading
- Created fallback endpoints for basic server functionality
- Added detailed error reporting in minimal mode
- Enhanced server resilience with multi-level fallback mechanisms
- Fixed critical error: "'Server' object has no attribute 'start'"
- Implemented robust SimpleTCPServer as a fallback when TCPServer import fails
- Added direct socket handling for maximum compatibility across environments
- Enhanced server startup process to handle different server implementations
- Improved error handling in server shutdown process
- Added graceful fallback for servers without start/shutdown methods
- Enhanced compatibility with different versions of uvicorn
- Improved server stability with better error recovery mechanisms
- Added comprehensive error handling for socket operations
- Implemented non-blocking socket I/O for better performance
- Added direct fallback to SimpleTCPServer when server.start() fails
- Improved Google Colab integration with better error handling
- Enhanced event loop handling for different Python environments
- Fixed critical error: "'Config' object has no attribute 'server_class'"
- Implemented custom startup method that doesn't rely on config.server_class
- Fixed import issues in Google Colab by properly exposing start_server in `__init__.py`
- Enhanced compatibility with different versions of uvicorn
- Improved server initialization for more reliable startup
- Added direct TCPServer initialization for better compatibility
- Implemented fallback mechanisms for TCPServer import to handle different uvicorn versions
- Added multiple import paths for TCPServer to ensure compatibility across all environments
- Enhanced error handling during server initialization
- Improved Google Colab integration with better import structure
- Added custom main_loop implementation with robust error handling
- Implemented graceful shutdown mechanism for all server components
- Enhanced server stability with improved error recovery
- Fixed critical error: "'NoneType' object has no attribute 'startup'"
- Implemented NoopLifespan class as a fallback when all lifespan initialization attempts fail
- Ensured server can start even when lifespan initialization fails
- Added proper error handling for startup and shutdown events
- Enhanced server stability across different environments and uvicorn versions
- Added robust error recovery during server startup process
- Overrode uvicorn's startup and shutdown methods to add additional error handling
- Improved logging for lifespan-related errors to aid in troubleshooting
- Added graceful fallback mechanisms for all critical server operations
- Ensured clean server shutdown even when lifespan shutdown fails
- Fixed critical error: "LifespanOn.init() takes 2 positional arguments but 3 were given"
- Enhanced lifespan initialization to handle different uvicorn versions with varying parameter requirements
- Implemented comprehensive parameter testing for all lifespan classes to ensure compatibility
- Added detailed logging for lifespan initialization to aid in troubleshooting
- Improved error handling for all lifespan-related operations
- Fixed critical error with LifespanOn initialization: "`LifespanOn.__init__()` got an unexpected keyword argument 'logger'"
- Improved compatibility with different versions of uvicorn by properly handling lifespan initialization
- Enhanced error handling for different lifespan implementations
- Added graceful fallbacks when lifespan initialization fails
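
A condensed sketch of the compatibility logic described in the entries above: try the `LifespanOn` signatures used by different uvicorn versions and fall back to a no-op lifespan when every attempt fails. The details are simplified and the attribute names are assumptions.

```python
class NoopLifespan:
    """Fallback lifespan used when no real implementation can be initialized."""

    def __init__(self, app):
        self.app = app

    async def startup(self):   # no-op startup event
        pass

    async def shutdown(self):  # no-op shutdown event
        pass


def init_lifespan(config):
    try:
        from uvicorn.lifespan.on import LifespanOn
    except ImportError:
        return NoopLifespan(config.app)
    # Different uvicorn versions accept different constructor arguments.
    for args in ((config,), (config, None)):
        try:
            return LifespanOn(*args)
        except TypeError:
            continue
    return NoopLifespan(config.app)
```
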
- Fixed critical server startup error related to uvicorn lifespan initialization
- Fixed 'Config' object has no attribute 'logger' error during server startup
- Fixed 'Config' object has no attribute 'loaded_app' error
- Improved compatibility with different versions of uvicorn
- Enhanced error handling during server startup
- Fixed banner display functions to work with the latest server implementation
- Fixed critical issue with `locallab start` failing due to uvicorn lifespan module errors
- Fixed `locallab config` command not properly prompting for new settings when reconfiguring
- Significantly improved CLI startup speed with optimized imports and conditional loading
- Enhanced configuration system to include all available options (cache, logging, etc.)
- Improved compatibility with different Python versions and environments
- Added better error handling for ngrok authentication token
- Fixed event loop handling for both local and Google Colab environments
- Removed "What's New" sections from documentation in favor of directing users to the changelog
- Restored option to skip advanced configuration settings for better user experience
- Fixed critical issue with `locallab config` command not being respected when running `locallab start`
- Enhanced configuration system to properly load and apply saved settings
- Improved user experience by showing current configuration before prompting for changes
- Added clear feedback when configuration is saved and how to use it
- Fixed critical server startup error related to missing 'lifespan' attribute in ServerWithCallback class
- Fixed KeyError in 'locallab info' command by properly handling RAM information
- Significantly improved CLI startup speed through lazy loading of imports
- Enhanced error handling in system information display
- Fixed environment variable conflicts between CLI configuration and OS environment variables
- Improved configuration system to properly handle both CLI and environment variable settings
- Optimized server startup process for faster response time
- Reduced unnecessary operations during CLI startup for better performance
- Improved memory usage reporting with proper unit conversion (GB instead of MB)
- Enhanced ServerWithCallback class with proper lifespan initialization
- Updated configuration system to use a unified approach for all settings
- Enhanced CLI with interactive configuration wizard
- Added persistent configuration storage
- Implemented environment detection for smart defaults
- Added command groups: start, config, info
- Added support for configuring optimizations through CLI
- Improved Google Colab integration with context-aware prompts
- Added system information command
- Improved streaming generation quality to match non-streaming responses
- Added proper stopping conditions for streaming to prevent endless generation
- Implemented repetition detection to stop low-quality streaming responses
- Reduced token chunk size for better quality control in streaming mode
- Ensured consistent generation parameters between streaming and non-streaming modes
- Added memory monitoring to prevent CUDA out of memory errors
- Implemented adaptive token generation for streaming responses
- Added CUDA memory configuration with expandable segments
- Fixed torch.compile() errors by adding proper error handling and fallback to eager mode
- Fixed early stopping warning by correctly setting num_beams parameter
- Improved streaming generation with smaller token chunks for more responsive output
- Added memory-aware generation that adapts to available GPU resources
- Implemented error recovery for out-of-memory situations during generation
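
The `torch.compile()` fallback mentioned above amounts to a guarded call; a minimal sketch:

```python
import torch

def maybe_compile(model):
    """Use torch.compile() when possible, otherwise keep eager mode."""
    try:
        return torch.compile(model)
    except Exception as exc:  # unsupported backend, Python version, etc.
        print(f"torch.compile() failed ({exc}); falling back to eager mode.")
        return model
```
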
- Fixed issue with banners (running banner, system instructions, model configuration, API documentation) repeating in the console at regular intervals
- Added flag to ensure startup information is only displayed once during server initialization
- Improved server callback handling to prevent duplicate banner displays
- Fixed the environment configuration display by removing the duplicated section.
- Added comprehensive API documentation display on server startup with curl examples
- Added model configuration section that displays current model and optimization settings
- Added system instructions section showing the current prompt template
- Improved environment variable handling for model configuration
- Enhanced server startup logging with detailed optimization settings
- Added support for reading HUGGINGFACE_MODEL environment variable to specify model
- Redesigned modern ASCII art banners for a more aesthetic interface
- Improved UI with cleaner banner separations and better readability
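
Selecting the model through the environment can then look like this; the model id is a placeholder, and calling `start_server()` with no arguments is an assumption about its defaults:

```python
import os

os.environ["HUGGINGFACE_MODEL"] = "microsoft/phi-2"  # placeholder model id

from locallab import start_server

start_server()  # reads HUGGINGFACE_MODEL and the optimization settings above
```
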
- Fixed parameter mismatch in text generation endpoints by properly handling the `max_new_tokens` parameter
- Resolved coroutine awaiting issues in streaming generation endpoints
- Fixed async generator handling in the `stream_chat` and `generate_stream` functions
- Enhanced error handling in streaming responses to provide better error messages
- Improved compatibility between route parameters and model manager methods
- Added missing dependencies in `setup.py`: huggingface_hub, pynvml, and typing_extensions
- Improved dependency management with dev extras for testing packages
- Enhanced error handling for GPU memory detection
- Fixed circular import issues between modules
- Improved error handling in system utilities
- Enhanced compatibility with Google Colab environments
- New model loading endpoint that accepts model_id in the request body at `/models/load`
- `format_chat_messages` function to properly format chat messages for the model
- CLI function to support command-line usage with a click interface
- Properly awaiting async `generate_text` in the chat completion endpoint
- Fixed async generator handling in the `generate_stream` function
- Fixed streaming in the `stream_chat` function to correctly send server-sent events
- Properly escaped newline characters in the streaming response
- Added missing dependencies in `setup.py`: colorama, python-multipart, websockets, psutil, and nest-asyncio
- `get_network_interfaces` function to retrieve information about available network interfaces
- `get_public_ip` async function to retrieve the public IP address of the machine
- Adapter methods in `ModelManager` (`generate_text` and `generate_stream`) to maintain API compatibility with route handlers
- Import error for `get_public_ip` and `get_network_interfaces` functions
- Naming mismatch between route handlers and `ModelManager` methods
- New dependencies in `setup.py`: `netifaces` and `httpx` (used in the sketch below)
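
A rough sketch of what these helpers could look like with the `netifaces` and `httpx` dependencies; the public-IP service URL is an assumption, not necessarily what LocalLab uses:

```python
import httpx
import netifaces

def get_network_interfaces() -> dict:
    """Collect the addresses reported for each local network interface."""
    return {name: netifaces.ifaddresses(name) for name in netifaces.interfaces()}

async def get_public_ip() -> str:
    """Fetch the machine's public IP address (service URL is illustrative)."""
    async with httpx.AsyncClient() as client:
        response = await client.get("https://api.ipify.org")
        return response.text.strip()
```
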
- Fixed API endpoint errors for `/models/available` and other model endpoints
- Resolved parameter error in the `get_model_generation_params()` function
- Improved error handling for model optimization settings through environment variables
- Fixed circular import issues between routes and core modules
- Enhanced Flash Attention warning message to be more informative
- Added new `get_gpu_info()` function for detailed GPU monitoring
- Added improved system resource endpoint with detailed GPU metrics
- Added robust environment variable handling for optimization settings
- Made optimization flags more robust by checking for empty string values
- Improved fallback handling for missing torch packages
- Enhanced server startup logs with better optimization information
- Fixed critical server startup error in Google Colab environment with uvicorn callback configuration
- Resolved "'list' object is not callable" error by properly implementing the callback_notify as an async function
- Enhanced server startup sequence for better compatibility with both local and Colab environments
- Improved custom server implementation to handle callbacks more robustly
- Fixed circular import issue between core/app.py and routes/system.py by updating system.py to use get_request_count from logger module directly
- Made Flash Attention warning less alarming by changing it from a warning to an info message with better explanation
- Enhanced get_system_info endpoint with cleaner code and better organization
- Fixed potential issues with GPU info retrieval through better error handling
- Comprehensive environment check system that validates:
  - Python version compatibility
  - CUDA/GPU availability and configuration
  - Ngrok token presence when running in Google Colab
- Improved error handling with detailed error messages and suggestions
- Clear instructions for setting up ngrok authentication token
- Complete removal of the deprecated monolithic `main.py` file
- Enhanced ngrok setup process with better authentication handling:
  - Automatic detection of the auth token from environment variables
  - Clear error messages when the auth token is missing
  - Improved token validation and connection process
- Parameter renamed from `ngrok` to `use_ngrok` for clarity
- More readable ASCII art for the initializing banner
- Improved documentation about ngrok requirements for Google Colab
- Fixed circular import issues between core/app.py and routes modules
- Fixed ngrok authentication flow to properly use auth token from environment variables
- Fixed error with missing torch import in the server.py file
- Added graceful handling of missing torch module to prevent startup failures
- Improved error messages when server fails to start
- Better exception handling throughout the codebase
- Clear ASCII art status indicators ("INITIALIZING" and "RUNNING") showing server state
- Warning messages that prevent users from making API requests before the server is ready
- Callback mechanism to display the "RUNNING" banner only when the server is fully operational
- New dedicated logger module with comprehensive features:
  - Colorized console output for different log levels
  - Server status tracking (initializing, running, error, shutting_down)
  - Request tracking with detailed metrics
  - Model loading/unloading metrics
  - Performance monitoring for slow requests
- API documentation for logger module with usage examples
- Completely refactored the codebase into a more modular structure:
  - Split main.py into smaller, focused modules
  - Created separate directories for routes, UI components, utilities, and core functionality
  - Improved import structure to prevent circular dependencies
  - Better organization of server startup and API functionality
- Enhanced model loading process with proper timing and status updates
- Improved error handling throughout the application
- Better request metrics in response headers
- Removed old logger.py in favor of the new dedicated logger module
- Complete removal of health checks and validation when setting up ngrok tunnels
- Fixed issue where logs did not appear correctly due to server starting in a separate process
- Simplified ngrok setup process to run without validation to prevent connection errors during startup
- Improved server startup flow to be more direct without background health checks or API validation
- Reorganized startup sequence to work properly with ngrok, enhancing compatibility with Colab
- Removed the background process workflow for server startup. The server now runs directly in the main process, ensuring that all logs (banner, model details, system resources, etc.) are displayed properly.
- Simplified the startup process by directly calling uvicorn.run(), with optional ngrok setup if the server is run in Google Colab.
- Added utility function `is_port_in_use(port: int) -> bool` to check if a port is already in use (see the sketch below).
- Added async utility function `load_model_in_background(model_id: str)` to load the model asynchronously in the background while managing the global loading flag.
- Updated server startup functions to incorporate these utilities, ensuring proper port management and asynchronous model loading.
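
The port check is small enough to sketch in full; a common implementation matching the signature above:

```python
import socket

def is_port_in_use(port: int) -> bool:
    """Return True if something is already listening on the given local port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        return sock.connect_ex(("127.0.0.1", port)) == 0
```
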
- Extended the initial wait time in start_server from 5 to 15 seconds to allow the server ample time to initialize, especially in Google Colab environments.
- Increased health check timeout to 120 seconds for ngrok mode and 60 seconds for local mode to accommodate slower startups.
- Added detailed logging during health checks to aid in debugging startup issues.
- Improved logging across startup: the banner, model details, configuration, system resources, API documentation, quick start guide, and footer are now fully logged and printed.
- Updated the start_server function to extend the health check timeout to 60 seconds in Google Colab (when using ngrok) and to set an environment variable to trigger the Colab branch in run_server_proc.
- Modified startup_event to load the model in the background, ensuring that the server's /health endpoint becomes available in time and that logging output is complete.
- Updated GitHub Actions workflow to install the Locallab package along with its runtime dependencies in CI, ensuring that all required packages are available for proper testing.
- Refactored `run_server_proc` in the spawned process to initialize a dedicated logger ("locallab.spawn") to avoid inheriting SemLock objects from a fork context.
- Ensured that the log queue is created using the multiprocessing spawn context, preventing runtime errors in Google Colab.
- Updated Mermaid diagrams in `README.md` and `docs/colab/README.md` to enclose node labels in double quotes, resolving parse errors in GitHub rendering.
- Removed duplicate architecture diagrams from the root `README.md` file.