A Next.js-based web application for validating and cleansing employee data using AI-powered analysis. The system provides real-time validation, duplicate detection, and data standardization capabilities while maintaining data privacy and security.
- 📤 Drag-and-drop file upload interface
- 🧪 Try it out: You can load the included example file employee_records.xlsx to test the application functionality.
- 🤖 AI-powered data validation and cleansing
- 🔍 Fuzzy duplicate detection with configurable thresholds
- 📊 Interactive data grid with inline editing
- 📝 Change tracking and history
- 📑 Standardized Excel export
- 🔒 No permanent data storage (in-memory processing)
- ⚡ Real-time validation and feedback
- 🎯 Configurable validation rules and thresholds
- Frontend Framework: Next.js 14 with TypeScript
- Styling: TailwindCSS
- AI Integration: Anthropic API
- File Processing: SheetJS
- State Management: React Context
- UI Components: shadcn/ui
- Charts: Recharts
- Development Tools: ESLint (Airbnb config)
- Node.js 18.0.0 or higher
- npm or yarn
- Anthropic API key
- Clone the repository:
git clone https://github.com/yourusername/ai-data-cleansing.git
cd ai-data-cleansing
- Install dependencies:
npm install
# or
yarn install
- Create a .env.local file in the root directory:
ANTHROPIC_API_KEY=your_api_key_here
- Start the development server:
npm run dev
# or
yarn dev
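Next.js loads .env.local automatically and only exposes variables prefixed with NEXT_PUBLIC_ to the browser, so ANTHROPIC_API_KEY stays server-side. As a minimal sketch of failing fast when the key is missing (the `requireEnv` helper is hypothetical and not part of this repository):

```typescript
// Hypothetical helper: returns the named variable or throws a clear error.
// The environment is passed in explicitly so the function is easy to test.
function requireEnv(
  env: Record<string, string | undefined>,
  name: string
): string {
  const value = env[name];
  if (value === undefined || value.trim() === "") {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage in server-side code (for example, a route handler):
// const apiKey = requireEnv(process.env, "ANTHROPIC_API_KEY");
```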
The application follows a modular architecture with clear separation of concerns:
- DataContext: Manages the application's data state
- ConfigContext: Handles validation and processing configurations
- FileUpload: Handles file upload and initial processing
- DataGrid: Interactive data display and editing interface
- ValidationIndicator: Visual feedback for validation status
- anthropicService: Handles AI-powered validation
- excelProcessor: Processes Excel file uploads
- employeeValidator: Implements validation rules
- useAIValidation: Manages AI validation process
- useDuplicateDetection: Handles duplicate record detection
- useValidation: Implements client-side validation rules
- useFileUpload: Manages file upload process
The system processes employee records with the following structure:
interface EmployeeRecord {
  firstName: string;
  lastName: string;
  dob: string;
  email: string;
  // ... (see types/employee/EmployeeRecord.ts for full structure)
}
The system implements various validation rules:
- Required field validation
- Email format validation
- Phone number format standardization
- Date format validation
- Business logic validation (e.g., FT/PT status vs. hours)
- Salary calculations
- Address standardization
- Duplicate detection
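A few of these rules can be sketched as a single function that returns a list of issues. This is an illustration only, not the project's employeeValidator: the `employmentType` and `weeklyHours` fields, the ISO `YYYY-MM-DD` date format, and the 30-hour full-time threshold are all assumptions.

```typescript
// Hypothetical subset of EmployeeRecord. `employmentType` and `weeklyHours`
// are assumed field names, not taken from the project's type definition.
interface EmployeeRecordSketch {
  firstName: string;
  lastName: string;
  dob: string;
  email: string;
  employmentType?: "FT" | "PT";
  weeklyHours?: number;
}

interface ValidationIssue {
  field: keyof EmployeeRecordSketch;
  message: string;
}

// Deliberately simple pattern: one "@", no whitespace, a dot in the domain.
const EMAIL_PATTERN = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Assumed threshold: full-time means at least 30 hours per week.
const FULL_TIME_MIN_HOURS = 30;

// Accepts only real calendar dates written as YYYY-MM-DD (assumed format).
function isIsoDate(value: string): boolean {
  const match = /^(\d{4})-(\d{2})-(\d{2})$/.exec(value);
  if (!match) return false;
  const year = Number(match[1]);
  const month = Number(match[2]);
  const day = Number(match[3]);
  const date = new Date(Date.UTC(year, month - 1, day));
  return (
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day
  );
}

function validateRecord(record: EmployeeRecordSketch): ValidationIssue[] {
  const issues: ValidationIssue[] = [];

  // Required field validation.
  const required: (keyof EmployeeRecordSketch)[] = [
    "firstName", "lastName", "dob", "email",
  ];
  for (const field of required) {
    const value = record[field];
    if (typeof value !== "string" || value.trim() === "") {
      issues.push({ field, message: `${field} is required` });
    }
  }

  // Email format validation (skipped when the field is empty).
  if (record.email.trim() !== "" && !EMAIL_PATTERN.test(record.email)) {
    issues.push({ field: "email", message: "email is not a valid address" });
  }

  // Date format validation (skipped when the field is empty).
  if (record.dob.trim() !== "" && !isIsoDate(record.dob)) {
    issues.push({ field: "dob", message: "dob must be a valid YYYY-MM-DD date" });
  }

  // Business logic: FT/PT status must agree with weekly hours.
  if (record.employmentType !== undefined && record.weeklyHours !== undefined) {
    const isFullTimeHours = record.weeklyHours >= FULL_TIME_MIN_HOURS;
    if ((record.employmentType === "FT") !== isFullTimeHours) {
      issues.push({
        field: "employmentType",
        message: "employment type does not match weekly hours",
      });
    }
  }

  return issues;
}
```

Returning issues rather than throwing lets a grid display every problem in a row at once.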
Validation and processing settings can be adjusted through the UI or by modifying:
- src/constants/validation.ts: Validation thresholds and rules
- src/constants/duplication.ts: Duplicate detection settings
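To illustrate how a duplicate detection threshold might behave, here is a minimal sketch based on Levenshtein edit distance over normalized full names. The README does not state which similarity measure the project uses, so the algorithm, the `findDuplicatePairs` name, and the 0.85 default are assumptions.

```typescript
interface NamedRecord {
  firstName: string;
  lastName: string;
}

// Classic dynamic-programming Levenshtein distance, keeping two rows.
function levenshtein(a: string, b: string): number {
  let previous = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const current = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      current.push(
        Math.min(
          previous[j] + 1, // deletion
          current[j - 1] + 1, // insertion
          previous[j - 1] + cost // substitution
        )
      );
    }
    previous = current;
  }
  return previous[b.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means nothing in common.
function similarity(a: string, b: string): number {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Lowercase and collapse whitespace so formatting differences do not count.
function nameKey(record: NamedRecord): string {
  return `${record.firstName} ${record.lastName}`
    .toLowerCase()
    .replace(/\s+/g, " ")
    .trim();
}

// Returns index pairs [i, j] whose name similarity meets the threshold.
// The 0.85 default is an assumed value, not the project's setting.
function findDuplicatePairs(
  records: NamedRecord[],
  threshold = 0.85
): [number, number][] {
  const keys = records.map(nameKey);
  const pairs: [number, number][] = [];
  for (let i = 0; i < keys.length; i++) {
    for (let j = i + 1; j < keys.length; j++) {
      if (similarity(keys[i], keys[j]) >= threshold) {
        pairs.push([i, j]);
      }
    }
  }
  return pairs;
}
```

The pairwise comparison is quadratic in the number of records, which is acceptable for a sketch. Raising the threshold reduces false positives at the cost of missing looser matches.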
- Code Style
  - Follow Airbnb ESLint configuration
  - Use TypeScript strict mode
  - Implement proper error handling
  - Document complex logic
  - Use meaningful variable names
- Git Workflow
  - Feature branches
  - Pull request reviews
  - Conventional commits
  - Version tagging
- Testing
  - Unit tests for validation rules
  - Integration tests for file processing
  - UI component testing
  - End-to-end validation flows
- No permanent storage of sensitive data
- All processing done in-memory
- Secure API key handling
- Input sanitization
- Rate limiting implementation
- File type validation
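As one possible shape for file type validation, the sketch below checks both the extension and the leading bytes of an upload. It relies on the fact that .xlsx files are ZIP archives and therefore begin with the bytes `PK\x03\x04`. The `looksLikeXlsx` name is hypothetical, and passing this check does not prove the file is a well-formed workbook.

```typescript
// ZIP local file header signature: "P", "K", 0x03, 0x04.
const ZIP_MAGIC = [0x50, 0x4b, 0x03, 0x04];

// Hypothetical pre-check: true only when the name ends in .xlsx AND the
// content starts with the ZIP signature.
function looksLikeXlsx(fileName: string, header: Uint8Array): boolean {
  const hasExtension = fileName.toLowerCase().endsWith(".xlsx");
  const hasMagic =
    header.length >= ZIP_MAGIC.length &&
    ZIP_MAGIC.every((byte, i) => header[i] === byte);
  return hasExtension && hasMagic;
}

// Usage in the browser with a File from a drop or input event:
// const header = new Uint8Array(await file.slice(0, 4).arrayBuffer());
// if (!looksLikeXlsx(file.name, header)) { /* reject the upload */ }
```

Checking content as well as the extension catches files that were simply renamed.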
- Chunked data processing
- Progressive loading
- Debounced validations
- Memory management
- Caching strategies
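Chunked processing and debouncing can be sketched with three small helpers. These are generic illustrations with hypothetical names (`chunk`, `processInChunks`, `debounce`), not the project's own utilities.

```typescript
// Splits an array into consecutive batches of at most `size` items.
function chunk<T>(items: readonly T[], size: number): T[][] {
  if (!Number.isInteger(size) || size <= 0) {
    throw new RangeError("size must be a positive integer");
  }
  const chunks: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Runs `worker` on one batch at a time and reports progress after each,
// which keeps memory use bounded and allows progressive loading in the UI.
async function processInChunks<T, R>(
  items: readonly T[],
  size: number,
  worker: (batch: T[]) => Promise<R[]>,
  onProgress?: (done: number, total: number) => void
): Promise<R[]> {
  const results: R[] = [];
  let done = 0;
  for (const batch of chunk(items, size)) {
    results.push(...(await worker(batch)));
    done += batch.length;
    onProgress?.(done, items.length);
  }
  return results;
}

// Delays `fn` until `waitMs` has passed without another call, so validation
// runs once after the user stops typing rather than on every keystroke.
function debounce<A extends unknown[]>(
  fn: (...args: A) => void,
  waitMs: number
): (...args: A) => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return (...args: A) => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(() => fn(...args), waitMs);
  };
}
```

Processing batches sequentially also spaces out API calls, which helps when the upstream service is rate limited.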
The system implements comprehensive error handling:
- Validation errors with clear messages
- Processing failures with recovery options
- API error handling with retries
- User-friendly error notifications
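API error handling with retries is commonly implemented with exponential backoff. The following is a minimal sketch under assumed defaults (3 attempts, 500 ms base delay); the `withRetry` and `backoffDelay` names are hypothetical, and the README does not specify the project's actual retry policy.

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs: number;
}

// Delay before the retry that follows attempt number `attempt` (1-based):
// base, 2 * base, 4 * base, and so on.
function backoffDelay(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** (attempt - 1);
}

// Calls `operation` until it succeeds or the attempts run out, waiting
// longer between each failure. The last error is rethrown on exhaustion.
async function withRetry<T>(
  operation: () => Promise<T>,
  options: RetryOptions = { maxAttempts: 3, baseDelayMs: 500 }
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= options.maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < options.maxAttempts) {
        const delay = backoffDelay(attempt, options.baseDelayMs);
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Usage, where `validateBatch` stands in for any call that may fail:
// const result = await withRetry(() => validateBatch(records));
```

This sketch retries on every error. A production version would typically retry only transient failures, such as rate limiting or server errors, and surface validation or authentication errors immediately.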
The application can be deployed using Vercel:
npm run build
vercel deploy
Environment variables must be configured in the deployment platform.
- Fork the repository
- Create your feature branch (git checkout -b feature/AmazingFeature)
- Commit your changes (git commit -m 'Add some AmazingFeature')
- Push to the branch (git push origin feature/AmazingFeature)
- Open a Pull Request
2025 by Innovategy Oy is licensed under CC BY 4.0
For support, please open an issue in the GitHub repository or contact the development team.
- Anthropic for providing the AI capabilities
- SheetJS for Excel file processing
- shadcn/ui for UI components
- The Next.js team for the amazing framework