
πŸ›‘οΈ Secure Data Analysis System

Secure Data Analysis System is a Python-based solution for analyzing data files (CSV) with large language models (LLMs) inside a fully isolated Docker container environment. By combining OpenAI models such as o3-mini with Docker containerization, the system delivers cutting-edge data insights while keeping all generated code safely contained.

✨ Features

  • 🔒 Secure File Access & Code Execution: Executes code within isolated Docker containers, ensuring a safe and controlled environment for data analysis.
  • 🤖 Intelligent Agents: Utilizes OpenAI's language models to deliver sophisticated data understanding and processing.
  • 🛠️ Extensible Tool System: Easily extend functionality through a modular tool interface.
  • 📊 Built-in Data Analysis Support: Comes pre-equipped with libraries like pandas, numpy, matplotlib, and seaborn for comprehensive data exploration.
  • 🔍 Detailed Logging & Error Tracking: Provides in-depth logging to facilitate efficient debugging and monitoring.
  • 🏗️ Clean, Object-Oriented Architecture: Designed with a clear separation of concerns, aiding in scalability and maintainability.

πŸ›οΈ Architecture

flowchart TD
    %% Entry Layer
    subgraph "Entry Layer"
        CLI["CLI (secure_analyzer.py)"]
    end

    %% Agent Layer
    subgraph "Agent Layer"
        FAA["FileAccessAgent"]
        PEA["PythonExecAgent"]
        Base["BaseAgent (abstract)"]
    end

    %% Tool Layer
    subgraph "Tool Layer"
        FAT["FileAccessTool"]
        PCT["PythonCodeInterpreterTool"]
        TI["ToolInterface (abstract)"]
    end

    %% Service Layer
    subgraph "Service Layer"
        LMI["LanguageModelInterface"]
        OLM["OpenAI Language Model"]
        OCF["OpenAI Client Factory"]
    end

    %% Execution Layer
    subgraph "Execution Layer"
        DC["Docker Container"]
    end

    %% Utility Layer
    subgraph "Utility Layer"
        Logger["Logger"]
        OAU["OpenAI Util"]
    end

    %% Relationships
    CLI -->|"initiates"| FAA
    CLI -->|"initiates"| PEA

    FAA -->|"performs file read using"| FAT
    PEA -->|"executes code using"| PCT

    FAA -->|"logs"| Logger
    PEA -->|"logs"| Logger

    PEA -->|"calls"| LMI
    LMI -->|"routes request to"| OLM
    OLM -->|"utilizes"| OCF

    FAA -->|"executes in"| DC
    PEA -->|"executes in"| DC

    %% Utilities are used across the system
    LMI -->|"assists with"| OAU

    %% Inheritance and Interface relationships (informational - not directional)
    Base --- FAA
    Base --- PEA
    TI --- FAT
    TI --- PCT

    %% Click Events
    click CLI "https://github.com/imsharad/openai-code-interpreter/blob/main/secure_analyzer.py"
    click FAA "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/registry/agents/file_access_agent.py"
    click PEA "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/registry/agents/python_code_exec_agent.py"
    click LMI "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/services/language_model_interface.py"
    click OLM "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/services/openai_language_model.py"
    click OCF "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/services/openai_factory.py"
    click DC "https://github.com/imsharad/openai-code-interpreter/tree/main/resources/docker"
    click FAT "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/registry/tools/file_access_tool.py"
    click PCT "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/registry/tools/python_code_interpreter_tool.py"
    click Logger "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/utils/logger.py"
    click OAU "https://github.com/imsharad/openai-code-interpreter/blob/main/resources/object_oriented_agents/utils/openai_util.py"

    %% Styles
    class CLI cli
    class FAA,PEA agent
    class LMI,OLM,OCF service
    class FAT,PCT tool
    class DC docker
    class Logger,OAU util

    %% Class Definitions
    classDef cli fill:#ffebcd,stroke:#8b4513,stroke-width:2px;
    classDef agent fill:#ffcccc,stroke:#cc0000,stroke-width:2px;
    classDef service fill:#ccffcc,stroke:#008000,stroke-width:2px;
    classDef tool fill:#ffffcc,stroke:#cccc00,stroke-width:2px;
    classDef docker fill:#cce5ff,stroke:#0056b3,stroke-width:2px;
    classDef util fill:#e6ccff,stroke:#800080,stroke-width:2px;

Core Components

  1. Agent System (resources/registry/agents/):

    • BaseAgent: The abstract base class for all agents.
    • FileAccessAgent: Manages secure file operations.
    • PythonExecAgent: Oversees code generation and execution within Docker containers.
  2. Services (resources/object_oriented_agents/services/):

    • LanguageModelInterface: An abstract interface for interacting with language models.
    • OpenAILanguageModel: A concrete implementation that leverages the OpenAI API.
    • OpenAIClientFactory: Handles the creation and management of OpenAI API clients.
  3. Utils (resources/object_oriented_agents/utils/):

    • logger.py: Provides a centralized logging system.
    • openai_util.py: Offers utility functions for interacting with the OpenAI API.
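The two abstract contracts at the heart of this layering can be sketched as follows. This is an illustrative sketch, not the repository's actual source: the class names and the `generate_completion` signature are taken from this README, but method bodies and docstrings are assumptions.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional

class ToolInterface(ABC):
    """Contract every tool implements (sketch; exact signatures assumed)."""

    @abstractmethod
    def get_definition(self) -> Dict[str, Any]:
        """Return the function-calling definition the LLM sees."""

    @abstractmethod
    def run(self, arguments: Dict[str, Any]) -> str:
        """Execute the tool with parsed arguments and return its output."""

class LanguageModelInterface(ABC):
    """Contract a concrete backend such as OpenAILanguageModel fulfills."""

    @abstractmethod
    def generate_completion(
        self,
        model: str,
        messages: List[Dict[str, str]],
        tools: Optional[List[Dict[str, Any]]] = None,
        reasoning_effort: Optional[str] = None,
    ) -> Dict[str, Any]:
        """Send messages to the model and return the raw completion."""
```

Because both are abstract base classes, agents can depend on the interface while tests or alternative backends swap in their own implementations.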

Component Interactions Sequence Diagram

sequenceDiagram
    participant User
    participant CLI
    participant FileAgent
    participant CodeAgent
    participant LLM
    participant Docker

    User->>CLI: Run analysis
    CLI->>Docker: Start container
    CLI->>FileAgent: Read file
    FileAgent->>CLI: File content
    CLI->>CodeAgent: Generate & run code
    CodeAgent->>LLM: Request code
    LLM-->>CodeAgent: Return code
    CodeAgent->>Docker: Execute code
    Docker-->>CodeAgent: Results
    CodeAgent->>CLI: Analysis results
    CLI->>User: Display results

🚀 Getting Started

Prerequisites

  • Python 3.10+
  • Docker
  • An active OpenAI API key

Installation

  1. Clone the Repository:

    git clone https://github.com/Imsharad/openai-code-interpreter.git
    cd openai-code-interpreter
  2. Install Dependencies:

    python -m venv venv
    source venv/bin/activate  # On Windows: .\venv\Scripts\activate
    pip install -r requirements.txt
  3. Configure the OpenAI API:

    export OPENAI_API_KEY='your-api-key-here'
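A fail-fast check at startup avoids a confusing mid-run API error when the key is missing. A minimal sketch (the environment-variable name matches the export above; the helper itself is hypothetical):

```python
import os

def require_api_key() -> str:
    """Fail fast with a clear message if the OpenAI key is missing."""
    api_key = os.environ.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; export it before running the analyzer"
        )
    return api_key
```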

Basic Usage

Run the analyzer with a CSV file and your question:

python secure_analyzer.py --file your_data.csv --question "What are the monthly trends?"
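The two flags above can be parsed with argparse; a minimal illustration (the actual secure_analyzer.py may define additional options):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the invocation shown above."""
    parser = argparse.ArgumentParser(
        description="Analyze a CSV file securely with an LLM"
    )
    parser.add_argument("--file", required=True,
                        help="Path to the CSV file to analyze")
    parser.add_argument("--question", required=True,
                        help="Natural-language question about the data")
    return parser
```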


🔒 Security Features

1. Docker Isolation

  • Executes all code in isolated Docker containers.
  • Runs with a non-root user.
  • Enforces strict resource limits and network isolation.
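As an illustration of the flags that enforce these properties, a hardened container launch might look like the following. This is a sketch only: the image name, UID, mount paths, and limit values are assumptions, not the repository's actual configuration.

```shell
# Hypothetical hardening flags (image name and paths are assumptions)
docker run --rm \
  --user 1000:1000 \
  --network none \
  --memory 512m \
  --cpus 1 \
  --read-only \
  -v "$(pwd)/data:/app/data:ro" \
  python-sandbox:latest python /app/run_analysis.py

# --user       run as a non-root UID/GID
# --network    "none" removes all network access from inside the container
# --memory / --cpus  enforce resource limits
# --read-only  immutable root filesystem; the data mount is read-only
```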

2. File Access Security

# Example from FileAccessAgent
def safe_file_access(self, filename: str) -> str:
    if not self._is_safe_path(filename):
        return "Error: Invalid file path"
    return self._read_file_safely(filename)
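The repository's `_is_safe_path` is not shown here; one plausible implementation rejects path traversal and unexpected file types. The sandbox directory and allowed suffixes below are assumptions for illustration:

```python
from pathlib import Path

ALLOWED_DIR = Path("/app/data")     # hypothetical sandbox directory
ALLOWED_SUFFIXES = {".csv"}         # only CSV files are analyzed

def is_safe_path(filename: str) -> bool:
    """Reject paths that escape the sandbox or have unexpected types."""
    candidate = (ALLOWED_DIR / filename).resolve()
    try:
        # Raises ValueError if the resolved path is outside ALLOWED_DIR,
        # e.g. for "../../etc/passwd" or an absolute path.
        candidate.relative_to(ALLOWED_DIR.resolve())
    except ValueError:
        return False
    return candidate.suffix in ALLOWED_SUFFIXES
```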

3. Language Model Security

# From OpenAILanguageModel
def generate_completion(
    self,
    model: str,
    messages: List[Dict[str, str]],
    tools: Optional[List[Dict[str, Any]]] = None,
    reasoning_effort: Optional[str] = None
) -> Dict[str, Any]:
    try:
        kwargs: Dict[str, Any] = {"model": model, "messages": messages}
        if tools:
            kwargs["tools"] = tools
        if reasoning_effort is not None:
            # Reasoning models (e.g. o3-mini) accept a reasoning_effort hint
            kwargs["reasoning_effort"] = reasoning_effort
        response = self.openai_client.chat.completions.create(**kwargs)
        return response
    except Exception as e:
        self.logger.error(f"OpenAI call failed: {str(e)}", exc_info=True)
        raise

📖 Example Workflows

1. Basic Data Analysis

python secure_analyzer.py \
    --file sales_data.csv \
    --question "Show me the average sales by quarter"

2. Advanced Analysis with Visualization

python secure_analyzer.py \
    --file traffic_data.csv \
    --question "Create a line plot showing accidents over time_of_day"

πŸ› οΈ Development Guide

Creating a New Tool

Implement the ToolInterface to add custom capabilities:

from typing import Any, Dict

from ...object_oriented_agents.core_classes.tool_interface import ToolInterface

class CustomAnalysisTool(ToolInterface):
    def get_definition(self) -> Dict[str, Any]:
        return {
            "function": {
                "name": "custom_analysis",
                "description": "Performs custom data analysis",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "analysis_type": {"type": "string"},
                        "columns": {"type": "array", "items": {"type": "string"}}
                    },
                    "required": ["analysis_type", "columns"]
                }
            }
        }

    def run(self, arguments: Dict[str, Any]) -> str:
        # Implementation
        pass

Creating a New Agent

Extend the base agent to create custom agents:

from ...object_oriented_agents.core_classes.base_agent import BaseAgent

class CustomAnalysisAgent(BaseAgent):
    def __init__(
        self,
        developer_prompt: str = "Your custom prompt here",
        model_name: str = "gpt-4",
        logger=None,
        language_model_interface=None
    ):
        super().__init__(
            developer_prompt=developer_prompt,
            model_name=model_name,
            logger=logger,
            language_model_interface=language_model_interface
        )
        self.setup_tools()

    def setup_tools(self) -> None:
        self.tool_manager.register_tool(CustomAnalysisTool())

πŸ“ Logging System

Our project utilizes a hierarchical logging system to streamline monitoring and debugging:

from ...object_oriented_agents.utils.logger import get_logger

logger = get_logger(__name__)

# Usage examples
logger.info("Starting analysis...")
logger.debug("Processing file: %s", filename)
logger.error("Error in analysis", exc_info=True)

Log Configuration:

  • Log Level: Configurable (DEBUG, INFO, ERROR)
  • Output: Both console and file (logs.log)
  • Format: %(asctime)s - %(name)s - %(levelname)s - %(message)s
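Given the configuration above, the helper in logger.py plausibly looks something like the following sketch. The format string and the logs.log filename come from this README; everything else is an assumption, not the actual source:

```python
import logging

LOG_FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"

def get_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Return a logger writing to both the console and logs.log."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already configured; avoid attaching duplicate handlers
        return logger
    logger.setLevel(level)
    formatter = logging.Formatter(LOG_FORMAT)

    console = logging.StreamHandler()
    console.setFormatter(formatter)
    logger.addHandler(console)

    file_handler = logging.FileHandler("logs.log")
    file_handler.setFormatter(formatter)
    logger.addHandler(file_handler)
    return logger
```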

🤝 Contributing

We welcome contributions! To get started:

  1. Fork the repository.
  2. Create your feature branch (git checkout -b feature/NewFeature).
  3. Commit your changes (git commit -m 'Add NewFeature').
  4. Push to your branch (git push origin feature/NewFeature).
  5. Open a Pull Request.

Development Guidelines

  • Adhere to the PEP 8 style guide.
  • Include unit tests for new features.
  • Update the documentation as needed.
  • Maintain type hints and clear commit messages.

📄 License

This project is licensed under the MIT License – see the LICENSE file for details.

πŸ™ Acknowledgments

  • OpenAI for their groundbreaking language models.
  • Docker for delivering robust containerization.
  • All contributors and users who help improve this project.

📞 Contact

Sharad Jain – @seekingtroooth – sharadsfo@gmail.com

Project Link: https://github.com/Imsharad/openai-code-interpreter

📚 Additional Resources

📊 Detailed Diagrams

To help illustrate the architecture and workflow of the project, the following Mermaid diagrams offer a detailed look at the components and their interactions.

1. Component Class Diagram

This diagram shows the core classes, their relationships, and how the agent system is structured:

classDiagram
    class BaseAgent {
        +developer_prompt: str
        +model_name: str
        +messages: ChatMessages
        +task(user_task: str, tool_call_enabled: bool): str
    }
    class FileAccessAgent {
        +setup_tools()
    }
    class PythonExecAgent {
        +setup_tools()
    }
    BaseAgent <|-- FileAccessAgent
    BaseAgent <|-- PythonExecAgent

    class ChatMessages {
        +add_developer_message(content: str)
        +add_user_message(content: str)
        +add_assistant_message(content: str)
    }

    class ToolManager {
        +register_tool(tool: ToolInterface)
        +get_tool_definitions()
    }

    class LanguageModelInterface {
        <<interface>>
        +generate_completion(model: str, messages: List, tools?: List, reasoning_effort?: str): Dict
    }
    class OpenAILanguageModel {
        +generate_completion(model: str, messages: List, tools?: List, reasoning_effort?: str): Dict
    }
    LanguageModelInterface <|.. OpenAILanguageModel

    BaseAgent --> ChatMessages
    BaseAgent --> ToolManager
    BaseAgent --> LanguageModelInterface

2. Detailed Sequence Diagram

This diagram describes the detailed flow of an analysis request through the components, from initial file access to code execution and results presentation:

sequenceDiagram
    participant User
    participant CLI as "CLI (secure_analyzer.py)"
    participant FA as "FileAccessAgent"
    participant CA as "PythonExecAgent"
    participant TM as "ToolManager"
    participant LLM as "LLM Interface"
    participant Docker as "Docker Container"

    User->>CLI: Start analysis with CSV & question
    CLI->>FA: Call task(file_prompt)
    FA->>TM: Invoke safe_file_access(filename)
    TM->>Docker: Read file content securely
    Docker-->>TM: Return file content
    TM-->>FA: Deliver file content
    FA-->>CLI: Provide file context
    CLI->>CA: Call task(question with context)
    CA->>LLM: Generate code using LLM
    LLM-->>CA: Return generated code
    CA->>TM: Execute code via execute_python_code
    TM->>Docker: Run Python code in container
    Docker-->>TM: Return execution result
    TM-->>CA: Forward execution result
    CA-->>CLI: Return final analysis result
    CLI->>User: Display analysis result

3. High-Level Data Flow Diagram

This diagram provides an overview of the high-level data flow within the system, from user input to the execution results returned to the user:

graph LR
    A[User Input: CSV file + Question]:::user
    B["CLI (secure_analyzer.py)"]:::cli
    C[FileAccessAgent]:::agent
    D[PythonExecAgent]:::agent
    E[ToolManager]:::service
    F["LLM (OpenAI API)"]:::external
    G[Docker Container]:::infra
    H[OpenAIClientFactory]:::service
    I[LanguageModelInterface]:::interface
    J[Logger]:::util
    K[SecurityValidator]:::security
    L[BaseAgent]:::abstract
    M[ChatMessages]:::data
    N[ToolInterface]:::interface
    O[PythonExecTool]:::tool
    P[FileAccessTool]:::tool
    
    subgraph Agents
        C
        D
        L
    end
    
    subgraph Services
        E
        H
        I
    end
    
    subgraph Security
        K
        G
    end

    subgraph Core Components
        M
        N
    end

    %% Main flow
    A --> B
    B --> C & D
    C & D --> E
    E --> K
    K --> E
    E --> G & I
    I --> H
    H --> F
    F --> H
    H --> I
    I --> E
    G --> E
    E --> D
    D --> B
    B --> A
    
    %% Class relationships
    L --> C & D
    L --> M
    L --> I
    E --> O & P
    O & P --> N
    J --> C & D & E

    classDef user fill:#c9f7d4,stroke:#2b6620;
    classDef cli fill:#fff4de,stroke:#a2790d;
    classDef agent fill:#d3ddfa,stroke:#1c3a94;
    classDef service fill:#e1d5f7,stroke:#5015a0;
    classDef external fill:#ffd8d8,stroke:#c90000;
    classDef infra fill:#d1f0f6,stroke:#0a708a;
    classDef util fill:#f0f0f0,stroke:#666666;
    classDef interface fill:#fce1e4,stroke:#d60047;
    classDef security fill:#ffebcc,stroke:#e67e00;
    classDef abstract fill:#f8f9fa,stroke:#adb5bd,stroke-dasharray: 5 5;
    classDef data fill:#e3f2fd,stroke:#1976d2;
    classDef tool fill:#f3e5f5,stroke:#9c27b0;

    click G href "https://docs.docker.com/engine/security/" "Docker Security Docs"
    click K href "#security-features" "Project Security Details"
    
    note["🔒 Security Layers:
    1. Path validation
    2. Docker isolation
    3. LLM output sanitization
    4. Resource limits"]:::security
    note -.- G
