Run powerful AI models directly in your browser - no internet required!
LocalGPT is a browser-based AI assistant that runs entirely on your device without requiring an internet connection (after initial model download). It's specifically designed for users in rural areas with limited internet access, allowing you to have AI assistance even when completely offline.
Note: You must first use LocalGPT while online to download the model (400MB-900MB). After downloading, you can use it completely offline. See the Offline Usage Guide for details.
Screenshot: LocalGPT running in a browser with offline capability
Screenshot: LocalGPT running in a browser with online capability
- 100% Local Processing: All AI processing happens on your device - no data is sent to external servers
- True Offline Support: Once the model is downloaded, the app works completely offline
- Robust Caching: Models are cached locally for reliable offline use
- Hardware Optimized: Automatically adapts to your device's capabilities
- Network Status Aware: Shows online/offline status and adapts accordingly
- Performance Monitoring: Real-time CPU/GPU usage metrics
- Modern UI: Clean, responsive interface that works on desktop and mobile devices
Important: You must first download the model while online before using offline. See the Offline Usage Guide for details.
IMPORTANT NOTICE: LocalGPT runs AI models directly on your device, which means responses are MUCH slower than cloud-based AI services.
- Initial responses typically take 30 seconds to 5 minutes depending on your hardware
- Subsequent responses in the same session may be slightly faster
- Smaller models respond faster than larger models
- This is normal and expected for on-device AI processing
- The tradeoff is complete privacy and offline capability
If you need faster responses, consider using a smaller model (see Model Information section below).
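Because tokens are generated one at a time on your device, streaming the response lets you read partial output while the rest is still being generated, which makes the wait feel much shorter. The sketch below shows roughly what that looks like with the WebLLM chat API; it assumes an already-initialized engine (see the loading sketch further below), and the `streamReply` helper is illustrative, not part of LocalGPT's code.

```typescript
// Illustrative sketch (not LocalGPT's actual code): stream a reply from an
// already-initialized WebLLM engine so the UI can show partial output while
// the model is still generating.
import type { MLCEngine } from "@mlc-ai/web-llm";

async function streamReply(engine: MLCEngine, question: string): Promise<string> {
  const chunks = await engine.chat.completions.create({
    messages: [{ role: "user", content: question }],
    stream: true, // deltas arrive as each token is generated on-device
  });

  let reply = "";
  for await (const chunk of chunks) {
    reply += chunk.choices[0]?.delta?.content ?? "";
    // e.g. update the chat bubble here with the partial `reply`
  }
  return reply;
}
```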
1. Clone this repository:

   ```bash
   git clone https://github.com/subhadeeproy3902/localgpt.git
   cd localgpt
   ```

2. Install dependencies:

   ```bash
   npm install
   # or
   yarn install
   # or
   bun install
   ```

3. Run the development server:

   ```bash
   npm run dev
   # or
   yarn dev
   # or
   bun dev
   ```

4. Open http://localhost:3000 in your browser
LocalGPT uses WebLLM, a technology that allows running AI language models directly in your browser using WebGPU.
- When you first open the app, it will download the AI model (approximately 900MB)
- This initial download requires an internet connection
- The download may take several minutes depending on your connection speed
- A progress bar will show the download status
- The model is cached in your browser's IndexedDB storage
- On future visits, the model loads from cache - no download needed
- You can use the app completely offline once the model is cached
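For reference, this download-then-cache flow maps onto the WebLLM engine API roughly as sketched below. This is a simplified illustration rather than LocalGPT's exact code; the model ID shown is the current default, and the `loadModel` helper is an assumption for the example.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Simplified sketch of the download-then-cache flow: on the first visit the
// weights are fetched over the network; on later visits WebLLM serves them
// from the browser's local storage (IndexedDB).
async function loadModel() {
  const engine = await CreateMLCEngine("Llama-3.2-1B-Instruct-q4f16_1-MLC", {
    initProgressCallback: (report) => {
      // report.progress runs from 0 to 1 and report.text describes the current
      // step - the kind of callback that drives a download progress bar.
      console.log(`${Math.round(report.progress * 100)}% - ${report.text}`);
    },
  });
  return engine;
}
```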
LocalGPT currently uses the Llama 3.2 1B model (quantized version), which offers a good balance between performance and resource requirements. The model:
- Requires approximately 900MB of download
- Uses approximately 2GB of memory when running
- Automatically adapts to your device's capabilities
- Will run on most modern computers and high-end mobile devices
You can customize LocalGPT to use different models based on your hardware capabilities. Simply edit the `constant/modelConfig.ts` file and change the `modelId` to one of the following:
- `SmolLM2-135M-Instruct-q0f16-MLC` (~360MB VRAM, very fast)
- `SmolLM2-360M-Instruct-q4f16_1-MLC` (~380MB VRAM, fast)
- `Llama-3.2-1B-Instruct-q4f16_1-MLC` (~880MB VRAM, good balance)
- `gemma-2-2b-it-q4f16_1-MLC-1k` (~1.6GB VRAM, good quality)
- `Phi-3.5-mini-instruct-q4f16_1-MLC-1k` (~2.5GB VRAM, high quality)
- `Llama-3.2-3B-Instruct-q4f16_1-MLC` (~2.3GB VRAM, high quality)
- `gemma-2-2b-it-q4f16_1-MLC` (~1.9GB VRAM, high quality)
- `Llama-3.1-8B-Instruct-q4f16_1-MLC-1k` (~4.6GB VRAM, excellent quality)
- `Phi-3.5-mini-instruct-q4f16_1-MLC` (~3.7GB VRAM, excellent quality)
- `Mistral-7B-Instruct-v0.3-q4f16_1-MLC` (~4.6GB VRAM, excellent quality)
To change the model:

1. Open the file `constant/modelConfig.ts`
2. Change the `modelId` value to your preferred model from the list above
3. Optionally update the `modelName` for display purposes
4. Save the file and restart the application
Example:

```typescript
// For a smaller, faster model
export const modelId = "SmolLM2-360M-Instruct-q4f16_1-MLC";
export const modelName = "SmolLM 360M";

// For a larger, higher quality model
export const modelId = "Phi-3.5-mini-instruct-q4f16_1-MLC";
export const modelName = "Phi-3.5 Mini";
```
Choose a model that matches your hardware capabilities for the best experience. Smaller models will load faster and respond more quickly, while larger models generally provide higher quality responses.
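If you would rather not hard-code a model, one option is a small heuristic that picks an ID from the list above based on what the browser reports about the device. The sketch below is purely illustrative: the `pickModelId` helper and the memory thresholds are assumptions, not part of LocalGPT, and `navigator.deviceMemory` is only available in Chromium-based browsers.

```typescript
// Illustrative heuristic (not part of LocalGPT): choose a model ID from the
// list above based on how much RAM the browser reports.
export function pickModelId(): string {
  // navigator.deviceMemory is Chromium-only; assume a low-end device if absent.
  const memoryGB =
    (navigator as Navigator & { deviceMemory?: number }).deviceMemory ?? 2;

  if (memoryGB <= 2) return "SmolLM2-360M-Instruct-q4f16_1-MLC"; // fast, ~380MB VRAM
  if (memoryGB <= 4) return "Llama-3.2-1B-Instruct-q4f16_1-MLC"; // default balance
  return "Phi-3.5-mini-instruct-q4f16_1-MLC";                    // higher quality
}
```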
CRITICAL: YOU MUST FIRST USE LOCALGPT ONLINE TO DOWNLOAD THE MODEL BEFORE USING IT OFFLINE!
The LLM model must be downloaded and cached while you have an internet connection before offline use is possible. This is a mandatory first step.
1. First Visit (MUST BE ONLINE):
   - Open LocalGPT while connected to the internet
   - Wait for the model to fully download (100% on the progress bar)
   - The download is approximately 400MB-900MB depending on the model
   - The download may take several minutes on slower connections
   - You'll see "Your LocalGPT has been set" when the model is ready
   - Test the app by sending a few messages to verify it's working

2. Going Offline:
   - After the model is fully downloaded and cached, you can disconnect from the internet
   - The app will detect offline status and load the model from cache (see the sketch after this list)
   - You'll see "Using model (cached and ready for offline use)" at the bottom
   - All functionality will continue to work without internet

3. Troubleshooting Offline Issues:
   - If you see "Offline mode - model not available", it means you didn't complete the online download
   - You MUST connect to the internet and let the model fully download first
   - If the model doesn't load offline after downloading, try clearing your browser cache and redownloading
   - Some browsers limit offline storage - Chrome works best for offline use
   - The app includes diagnostic information at the bottom to help identify issues
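For the curious, online/offline detection of the kind described above can be wired up with the standard browser events. The sketch below is an illustrative example (the `watchNetworkStatus` helper is an assumption, not LocalGPT's actual code):

```typescript
// Sketch: report connectivity changes so the UI can switch between
// "online" and "offline - model served from cache" states.
type StatusListener = (online: boolean) => void;

export function watchNetworkStatus(onChange: StatusListener): () => void {
  const notify = () => onChange(navigator.onLine);

  window.addEventListener("online", notify);
  window.addEventListener("offline", notify);
  notify(); // report the initial state immediately

  // Return a cleanup function, e.g. for use in a React useEffect.
  return () => {
    window.removeEventListener("online", notify);
    window.removeEventListener("offline", notify);
  };
}
```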
To get the best experience with LocalGPT:
1. Choose the Right Model: Select a model that matches your hardware capabilities:
   - For slower devices: Use `SmolLM2-135M-Instruct-q0f16-MLC` or `SmolLM2-360M-Instruct-q4f16_1-MLC`
   - For average devices: Use `Llama-3.2-1B-Instruct-q4f16_1-MLC` (default)
   - For powerful devices: Try `Phi-3.5-mini-instruct-q4f16_1-MLC` or larger models

2. Use a Modern Browser: Chrome or Edge with WebGPU support works best

3. Hardware Acceleration: Ensure your browser has hardware acceleration enabled:
   - Chrome: Settings → System → Use hardware acceleration when available
   - Edge: Settings → System and performance → Use hardware acceleration when available

4. Patience: The first response after loading may be slower as the model warms up

5. Keep Questions Concise: Shorter questions generally receive faster responses

6. Close Other Tabs: Reducing browser load can improve performance

7. Restart Occasionally: If performance degrades over time, refresh the page
- Browser: Chrome 113+ or Edge 113+ with WebGPU support
- OS: Windows 10/11, macOS, Linux, or Android
- RAM: 4GB minimum, 8GB+ recommended
- GPU: Integrated graphics minimum, dedicated GPU recommended
- Storage: 1GB free space for model caching
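To check whether a browser meets the WebGPU requirement above before loading a model, a quick feature test could look like this (an illustrative sketch; the `hasWebGPU` helper is not part of LocalGPT):

```typescript
// Sketch: check whether the browser exposes a usable WebGPU adapter before
// attempting to download and initialize a model.
export async function hasWebGPU(): Promise<boolean> {
  // navigator.gpu is only present in WebGPU-capable browsers (Chrome/Edge 113+).
  const gpu = (navigator as { gpu?: { requestAdapter(): Promise<unknown | null> } }).gpu;
  if (!gpu) return false;
  try {
    const adapter = await gpu.requestAdapter();
    return adapter !== null; // null means no compatible GPU was found
  } catch {
    return false;
  }
}
```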
This project is licensed under the MIT LICENSE - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
For questions or feedback, please open an issue on GitHub or reach out to the maintainer:
- GitHub: @subhadeeproy3902