Secure Data Analysis System - a Python-based solution for analyzing CSV data files with large language models (LLMs) in a fully isolated Docker container environment. By combining the power of OpenAI models such as o3-mini with the security of Docker containerization, the system offers cutting-edge data insights together with strong, container-level protection.
- Secure File Access & Code Execution: Executes code within isolated Docker containers, ensuring a safe and controlled environment for data analysis.
- Intelligent Agents: Utilizes OpenAI's language models to deliver sophisticated data understanding and processing.
- Extensible Tool System: Easily extend functionality through a modular tool interface.
- Built-in Data Analysis Support: Comes pre-equipped with libraries like `pandas`, `numpy`, `matplotlib`, and `seaborn` for comprehensive data exploration.
- Detailed Logging & Error Tracking: Provides in-depth logging to facilitate efficient debugging and monitoring.
- Clean, Object-Oriented Architecture: Designed with a clear separation of concerns, aiding scalability and maintainability.
The diagram below outlines the system's layered architecture:

```mermaid
flowchart TD
%% Entry Layer
subgraph "Entry Layer"
CLI["CLI (secure_analyzer.py)"]
end
%% Agent Layer
subgraph "Agent Layer"
FAA["FileAccessAgent"]
PEA["PythonExecAgent"]
Base["BaseAgent (abstract)"]
end
%% Tool Layer
subgraph "Tool Layer"
FAT["FileAccessTool"]
PCT["PythonCodeInterpreterTool"]
TI["ToolInterface (abstract)"]
end
%% Service Layer
subgraph "Service Layer"
LMI["LanguageModelInterface"]
OLM["OpenAI Language Model"]
OCF["OpenAI Client Factory"]
end
%% Execution Layer
subgraph "Execution Layer"
DC["Docker Container"]
end
%% Utility Layer
subgraph "Utility Layer"
Logger["Logger"]
OAU["OpenAI Util"]
end
%% Relationships
CLI -->|"initiates"| FAA
CLI -->|"initiates"| PEA
FAA -->|"performs file read using"| FAT
PEA -->|"executes code using"| PCT
FAA -->|"logs"| Logger
PEA -->|"logs"| Logger
PEA -->|"calls"| LMI
LMI -->|"routes request to"| OLM
OLM -->|"utilizes"| OCF
FAA -->|"executes in"| DC
PEA -->|"executes in"| DC
%% Utilities are used across the system
LMI -->|"assists with"| OAU
%% Inheritance and Interface relationships (informational - not directional)
Base --- FAA
Base --- PEA
TI --- FAT
TI --- PCT
%% Click Events
click CLI "https://github.com/imsharad/openai-code-interpreter/blob/main/secure_analyzer.py"
click FAA "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/registry/agents/file_access_agent.py"
click PEA "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/registry/agents/python_code_exec_agent.py"
click LMI "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/services/language_model_interface.py"
click OLM "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/services/openai_language_model.py"
click OCF "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/services/openai_factory.py"
click DC "https://github.com/imsharad/openai-code-interpreter/tree/main/resources/docker"
click FAT "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/registry/tools/file_access_tool.py"
click PCT "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/registry/tools/python_code_interpreter_tool.py"
click Logger "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/utils/logger.py"
click OAU "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/utils/openai_util.py"
%% Styles
class CLI cli
class FAA,PEA agent
class LMI,OLM,OCF service
class FAT,PCT tool
class DC docker
class Logger,OAU util
%% Class Definitions
classDef cli fill:#ffebcd,stroke:#8b4513,stroke-width:2px;
classDef agent fill:#ffcccc,stroke:#cc0000,stroke-width:2px;
classDef service fill:#ccffcc,stroke:#008000,stroke-width:2px;
classDef tool fill:#ffffcc,stroke:#cccc00,stroke-width:2px;
classDef docker fill:#cce5ff,stroke:#0056b3,stroke-width:2px;
classDef util fill:#e6ccff,stroke:#800080,stroke-width:2px;
```
The main components are organized as follows:

- Agent System (`resources/registry/agents/`):
  - BaseAgent: The abstract base class for all agents.
  - FileAccessAgent: Manages secure file operations.
  - PythonExecAgent: Oversees code generation and execution within Docker containers.
- Services (`resources/object_oriented_agents/services/`):
  - LanguageModelInterface: An abstract interface for interacting with language models.
  - OpenAILanguageModel: A concrete implementation that leverages the OpenAI API.
  - OpenAIClientFactory: Handles the creation and management of OpenAI API clients.
- Utils (`resources/object_oriented_agents/utils/`):
  - `logger.py`: Provides a centralized logging system.
  - `openai_util.py`: Offers utility functions for interacting with the OpenAI API.
The end-to-end analysis flow:

```mermaid
sequenceDiagram
participant User
participant CLI
participant FileAgent
participant CodeAgent
participant LLM
participant Docker
User->>CLI: Run analysis
CLI->>Docker: Start container
CLI->>FileAgent: Read file
FileAgent->>CLI: File content
CLI->>CodeAgent: Generate & run code
CodeAgent->>LLM: Request code
LLM-->>CodeAgent: Return code
CodeAgent->>Docker: Execute code
Docker-->>CodeAgent: Results
CodeAgent->>CLI: Analysis results
CLI->>User: Display results
```
Prerequisites:

- Python 3.10+
- Docker
- An active OpenAI API key
Installation:

- Clone the Repository:

  ```bash
  git clone https://github.com/yourusername/secure-data-analysis.git
  cd secure-data-analysis
  ```

- Install Dependencies:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: .\venv\Scripts\activate
  pip install -r requirements.txt
  ```

- Configure the OpenAI API:

  ```bash
  export OPENAI_API_KEY='your-api-key-here'
  ```
Run the analyzer with a CSV file and your question:

```bash
python secure_analyzer.py --file your_data.csv --question "What are the monthly trends?"
```
Security features:

- Executes all code in isolated Docker containers.
- Runs with a non-root user.
- Enforces strict resource limits and network isolation.
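For illustration, this kind of container-level isolation can be expressed with the Docker SDK for Python; the image name, user, and limits below are assumptions rather than the project's actual configuration:

```python
# Illustrative sketch using the Docker SDK for Python (docker-py); the image
# name, user, and resource limits are assumptions, not the project's settings.
import docker

client = docker.from_env()

output = client.containers.run(
    image="python-sandbox:latest",        # assumed analysis image
    command=["python", "-c", "print('hello from the sandbox')"],
    user="sandboxuser",                   # non-root user inside the container
    network_disabled=True,                # no network access
    mem_limit="512m",                     # cap memory usage
    nano_cpus=1_000_000_000,              # roughly one CPU core
    remove=True,                          # delete the container when done
)
print(output.decode())
```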
```python
# Example from FileAccessAgent
def safe_file_access(self, filename: str) -> str:
    if not self._is_safe_path(filename):
        return "Error: Invalid file path"
    return self._read_file_safely(filename)
```
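The `_is_safe_path` check itself is not shown above; a minimal sketch of that kind of validation, assuming data files must live under a dedicated directory, might look like this (not the project's actual implementation):

```python
# Hypothetical path check; the real _is_safe_path may differ.
from pathlib import Path

ALLOWED_DIR = Path("/app/data").resolve()   # assumed data directory

def is_safe_path(filename: str) -> bool:
    """Reject paths that escape ALLOWED_DIR, e.g. via '..' or absolute paths."""
    resolved = (ALLOWED_DIR / filename).resolve()
    return resolved.is_relative_to(ALLOWED_DIR) and resolved != ALLOWED_DIR
```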
```python
# From OpenAILanguageModel
def generate_completion(
    self,
    model: str,
    messages: List[Dict[str, str]],
    tools: Optional[List[Dict[str, Any]]] = None,
    reasoning_effort: Optional[str] = None
) -> Dict[str, Any]:
    try:
        response = self.openai_client.chat.completions.create(
            model=model,
            messages=messages,
            tools=tools
        )
        return response
    except Exception as e:
        self.logger.error(f"OpenAI call failed: {str(e)}", exc_info=True)
        raise
```
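Assuming an `OpenAILanguageModel` instance (here called `language_model`), a call through this method might look like the following; the model name and prompt are placeholders:

```python
# Illustrative call; `language_model` is an assumed OpenAILanguageModel instance.
response = language_model.generate_completion(
    model="o3-mini",
    messages=[{"role": "user", "content": "Summarize the monthly sales trend."}],
)
print(response.choices[0].message.content)  # OpenAI response object, as returned above
```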
Example analyses:

```bash
python secure_analyzer.py \
  --file sales_data.csv \
  --question "Show me the average sales by quarter"
```

```bash
python secure_analyzer.py \
  --file traffic_data.csv \
  --question "Create a line plot showing accidents over time_of_day"
```
Implement the `ToolInterface` to add custom capabilities:

```python
from typing import Any, Dict

from ...object_oriented_agents.core_classes.tool_interface import ToolInterface

class CustomAnalysisTool(ToolInterface):
    def get_definition(self) -> Dict[str, Any]:
        return {
            "function": {
                "name": "custom_analysis",
                "description": "Performs custom data analysis",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "analysis_type": {"type": "string"},
                        "columns": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["analysis_type", "columns"]
                }
            }
        }

    def run(self, arguments: Dict[str, Any]) -> str:
        # Implementation
        pass
```
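For illustration only, a concrete `run` could dispatch on `analysis_type` and return its result as a string; the hard-coded data source and the supported analysis type below are assumptions, not part of the project:

```python
import pandas as pd

class CustomAnalysisTool(ToolInterface):
    # get_definition() as defined above ...

    def run(self, arguments: Dict[str, Any]) -> str:
        # Assumed data source; a real tool would receive or locate its data explicitly.
        df = pd.read_csv("your_data.csv")
        columns = arguments["columns"]
        if arguments["analysis_type"] == "describe":
            return df[columns].describe().to_string()
        return f"Unsupported analysis_type: {arguments['analysis_type']}"
```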
Extend the base agent to create custom agents:

```python
from ...object_oriented_agents.core_classes.base_agent import BaseAgent

class CustomAnalysisAgent(BaseAgent):
    def __init__(
        self,
        developer_prompt: str = "Your custom prompt here",
        model_name: str = "gpt-4",
        logger=None,
        language_model_interface=None
    ):
        super().__init__(
            developer_prompt=developer_prompt,
            model_name=model_name,
            logger=logger,
            language_model_interface=language_model_interface
        )
        self.setup_tools()

    def setup_tools(self) -> None:
        self.tool_manager.register_tool(CustomAnalysisTool())
```
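An illustrative use of the custom agent, assuming `task()` can be called with just the user prompt (the wiring here mirrors the constructor defaults above):

```python
# Illustrative usage; assumes task() accepts a single user prompt.
agent = CustomAnalysisAgent(model_name="gpt-4")
result = agent.task("Run a custom analysis of the price and quantity columns")
print(result)
```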
Our project uses a hierarchical logging system to streamline monitoring and debugging:

```python
from ...object_oriented_agents.utils.logger import get_logger

logger = get_logger(__name__)

# Usage examples
logger.info("Starting analysis...")
logger.debug("Processing file: %s", filename)
logger.error("Error in analysis", exc_info=True)
```
Log Configuration:

- Log level: configurable (DEBUG, INFO, ERROR)
- Output: both console and file (`logs.log`)
- Format: `%(asctime)s - %(name)s - %(levelname)s - %(message)s`
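A `get_logger` along these lines would satisfy that configuration; the handler setup below is a sketch, not the project's exact implementation in `logger.py`:

```python
# Sketch of a get_logger matching the configuration above; details may differ.
import logging

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    logger = logging.getLogger(name)
    if not logger.handlers:                      # avoid adding duplicate handlers
        formatter = logging.Formatter(LOG_FORMAT)
        for handler in (logging.StreamHandler(), logging.FileHandler("logs.log")):
            handler.setFormatter(formatter)
            logger.addHandler(handler)
        logger.setLevel(level)
    return logger
```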
We welcome contributions! To get started:
- Fork the repository.
- Create your feature branch (`git checkout -b feature/NewFeature`).
- Commit your changes (`git commit -m 'Add NewFeature'`).
- Push to your branch (`git push origin feature/NewFeature`).
- Open a Pull Request.
- Adhere to the PEP 8 style guide.
- Include unit tests for new features.
- Update the documentation as needed.
- Maintain type hints and clear commit messages.
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for their groundbreaking language models.
- Docker for delivering robust containerization.
- All contributors and users who help improve this project.
Sharad Jain - @seekingtroooth - sharadsfo@gmail.com
Project Link: https://github.com/Imsharad/openai-code-interpreter
To help illustrate the architecture and workflow of the project, the following Mermaid diagrams offer a detailed look at the components and their interactions.
This diagram shows the core classes, their relationships, and how the agent system is structured:
```mermaid
classDiagram
class BaseAgent {
+developer_prompt: str
+model_name: str
+messages: ChatMessages
+task(user_task: str, tool_call_enabled: bool): str
}
class FileAccessAgent {
+setup_tools()
}
class PythonExecAgent {
+setup_tools()
}
BaseAgent <|-- FileAccessAgent
BaseAgent <|-- PythonExecAgent
class ChatMessages {
+add_developer_message(content: str)
+add_user_message(content: str)
+add_assistant_message(content: str)
}
class ToolManager {
+register_tool(tool: ToolInterface)
+get_tool_definitions()
}
class LanguageModelInterface {
<<interface>>
+generate_completion(model: str, messages: List, tools?: List, reasoning_effort?: str): Dict
}
class OpenAILanguageModel {
+generate_completion(model: str, messages: List, tools?: List, reasoning_effort?: str): Dict
}
LanguageModelInterface <|.. OpenAILanguageModel
BaseAgent --> ChatMessages
BaseAgent --> ToolManager
BaseAgent --> LanguageModelInterface
```
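To make these relationships concrete, a simplified `task()` along the lines implied by the diagram might look like this; the attribute and accessor names and the absence of tool-call handling are assumptions, not the real `BaseAgent` behavior:

```python
# Simplified sketch of the task() flow implied by the class diagram; the real
# BaseAgent presumably also executes tool calls returned by the model.
def task(self, user_task: str, tool_call_enabled: bool = True) -> str:
    self.messages.add_user_message(user_task)
    tools = self.tool_manager.get_tool_definitions() if tool_call_enabled else None
    response = self.language_model_interface.generate_completion(
        model=self.model_name,
        messages=self.messages.get_messages(),   # assumed accessor on ChatMessages
        tools=tools,
    )
    reply = response.choices[0].message.content
    self.messages.add_assistant_message(reply)
    return reply
```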
The next diagram traces in detail how an analysis request flows through the components, from initial file access to code execution and the presentation of results:
```mermaid
sequenceDiagram
participant User
participant CLI as "CLI (secure_analyzer.py)"
participant FA as "FileAccessAgent"
participant CA as "PythonExecAgent"
participant TM as "ToolManager"
participant LLM as "LLM Interface"
participant Docker as "Docker Container"
User->>CLI: Start analysis with CSV & question
CLI->>FA: Call task(file_prompt)
FA->>TM: Invoke safe_file_access(filename)
TM->>Docker: Read file content securely
Docker-->>TM: Return file content
TM-->>FA: Deliver file content
FA-->>CLI: Provide file context
CLI->>CA: Call task(question with context)
CA->>LLM: Generate code using LLM
LLM-->>CA: Return generated code
CA->>TM: Execute code via execute_python_code
TM->>Docker: Run Python code in container
Docker-->>TM: Return execution result
TM-->>CA: Forward execution result
CA-->>CLI: Return final analysis result
CLI->>User: Display analysis result
```
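Expressed in code, that orchestration could look roughly like the following; the prompts and constructor calls are illustrative, not copied from `secure_analyzer.py`:

```python
# Rough sketch of the flow above; prompts and constructor arguments are illustrative.
csv_file = "your_data.csv"
question = "What are the monthly trends?"

file_agent = FileAccessAgent()
code_agent = PythonExecAgent()

# Step 1: read the CSV securely and capture its context for the code agent.
file_context = file_agent.task(f"Read {csv_file} and describe its columns")

# Step 2: generate and execute analysis code inside the Docker container.
result = code_agent.task(f"{question}\n\nFile context:\n{file_context}")
print(result)
```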
The final diagram provides an overview of the high-level data flow within the system, from user input to the execution results returned to the user:
```mermaid
graph LR
A[User Input: CSV file + Question]:::user
B["CLI (secure-analyzer.py)"]:::cli
C[FileAccessAgent]:::agent
D[PythonExecAgent]:::agent
E[ToolManager]:::service
F["LLM (OpenAI API)"]:::external
G[Docker Container]:::infra
H[OpenAIClientFactory]:::service
I[LanguageModelInterface]:::interface
J[Logger]:::util
K[SecurityValidator]:::security
L[BaseAgent]:::abstract
M[ChatMessages]:::data
N[ToolInterface]:::interface
O[PythonExecTool]:::tool
P[FileAccessTool]:::tool
subgraph Agents
C
D
L
end
subgraph Services
E
H
I
end
subgraph Security
K
G
end
subgraph "Core Components"
M
N
end
%% Main flow
A --> B
B --> C & D
C & D --> E
E --> K
K --> E
E --> G & I
I --> H
H --> F
F --> H
H --> I
I --> E
G --> E
E --> D
D --> B
B --> A
%% Class relationships
L --> C & D
L --> M
L --> I
E --> O & P
O & P --> N
J --> C & D & E
classDef user fill:#c9f7d4,stroke:#2b6620;
classDef cli fill:#fff4de,stroke:#a2790d;
classDef agent fill:#d3ddfa,stroke:#1c3a94;
classDef service fill:#e1d5f7,stroke:#5015a0;
classDef external fill:#ffd8d8,stroke:#c90000;
classDef infra fill:#d1f0f6,stroke:#0a708a;
classDef util fill:#f0f0f0,stroke:#666666;
classDef interface fill:#fce1e4,stroke:#d60047;
classDef security fill:#ffebcc,stroke:#e67e00;
classDef abstract fill:#f8f9fa,stroke:#adb5bd,stroke-dasharray: 5 5;
classDef data fill:#e3f2fd,stroke:#1976d2;
classDef tool fill:#f3e5f5,stroke:#9c27b0;
click G href "https://docs.docker.com/engine/security/" "Docker Security Docs"
click K href "#security-features" "Project Security Details"
note["π Security Layers:
1. Path validation
2. Docker isolation
3. LLM output sanitization
4. Resource limits"]:::security
note -.- G