An AI-powered recruitment system that uses semantic matching, a FAISS vector index, and Gemini AI to find best-fit candidates.
- **Semantic Resume Matching** - goes beyond keyword matching using AI embeddings
- **Fast Vector Search** - lightning-fast similarity search with FAISS
- **AI-Powered Explanations** - Gemini AI generates detailed match explanations
- **Multi-Format Support** - process PDF, DOCX, and TXT resume files
- **Clean Architecture** - modular microservices design with FastAPI + Streamlit
- **Analytics Dashboard** - comprehensive insights and matching analytics
- **Real-Time Processing** - instant resume processing and matching
- **Advanced Filtering** - filter by skills, experience, location, and more
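The semantic matching above boils down to comparing embedding vectors rather than keywords. A minimal NumPy sketch of the idea, with random vectors standing in for real mxbai embeddings:

```python
import numpy as np

# Random stand-ins for real 1024-dim mxbai embeddings (illustrative only).
rng = np.random.default_rng(42)
job_vec = rng.normal(size=1024)          # embedding of the job description
resume_vecs = rng.normal(size=(3, 1024)) # embeddings of three resumes

def cosine_sim(query, batch):
    """Cosine similarity between one vector and a batch of vectors."""
    query = query / np.linalg.norm(query)
    batch = batch / np.linalg.norm(batch, axis=1, keepdims=True)
    return batch @ query

scores = cosine_sim(job_vec, resume_vecs)
ranking = np.argsort(scores)[::-1]  # indices of resumes, best match first
```

FAISS performs the same similarity search, but over millions of vectors with an optimized index instead of a brute-force matrix product.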
```
ai_recruitr/
├── backend/         # FastAPI microservices
│   ├── services/    # Core business logic
│   ├── api/         # REST API endpoints
│   └── models/      # Pydantic schemas
├── frontend/        # Streamlit UI
│   ├── pages/       # UI pages
│   └── components/  # Reusable components
├── config/          # Configuration
├── data/            # Data storage
└── utils/           # Utilities
```

| Component | Technology |
|---|---|
| Backend | FastAPI + Python 3.9+ |
| Frontend | Streamlit |
| Embeddings | mxbai-embed-large-v1 (Hugging Face) |
| Vector DB | FAISS |
| LLM | Google Gemini |
| Resume Parsing | PyMuPDF, python-docx |
| Data Processing | Pandas, NumPy |
- Python 3.9 or higher
- Git
- Google Gemini API key
```bash
git clone https://github.com/yourusername/ai-recruitr.git
cd ai-recruitr
```

Create and activate a virtual environment:

```bash
python -m venv venv

# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```

Create a `.env` file in the project root:

```bash
cp .env.example .env
```

Edit `.env` and add your API keys:

```env
# Required: Google Gemini API Key
GEMINI_API_KEY=your_gemini_api_key_here

# Optional: Customize settings
API_HOST=localhost
API_PORT=8000
STREAMLIT_HOST=localhost
STREAMLIT_PORT=8501
LOG_LEVEL=INFO
```

To get a Gemini API key:

- Go to Google AI Studio
- Create a new API key
- Copy and paste it into your `.env` file
```bash
# Terminal 1: Start FastAPI backend
python -m backend.main

# Terminal 2: Start Streamlit frontend
streamlit run frontend/app.py
```

On Windows, use the helper scripts:

```bat
start_backend.bat
start_frontend.bat
```

On macOS/Linux:

```bash
./start_backend.sh
./start_frontend.sh
```

- Streamlit UI: http://localhost:8501
- FastAPI Docs: http://localhost:8000/docs
- API Health: http://localhost:8000/health
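Beyond the browser UI, the backend can be driven programmatically. A minimal stdlib client sketch (assumes the backend is running on the default port; the payload shape mirrors the `match-job` example in the API section):

```python
import json
from urllib import request

# Hypothetical payload for the /api/v1/match-job endpoint.
payload = {
    "job_description": {
        "title": "Senior Python Developer",
        "description": "We are looking for...",
        "skills_required": ["Python", "Django", "PostgreSQL"],
    },
    "top_k": 10,
    "similarity_threshold": 0.7,
}
body = json.dumps(payload).encode()

req = request.Request(
    "http://localhost:8000/api/v1/match-job",
    data=body,
    headers={"Content-Type": "application/json"},
)

# Uncomment once the backend is running:
# with request.urlopen(req) as resp:
#     matches = json.load(resp)
```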
- Navigate to the "Upload Resumes" page
- Drag and drop PDF/DOCX resume files
- Click "Process All Files"
- Wait for processing to complete
- Go to the "Job Matching" page
- Fill in the job description form:
- Job title
- Detailed job description
- Required skills
- Experience level
- Click "Find Matching Resumes"
- Review the matching results
- Visit the "Results & Analytics" page
- View current matching results
- Explore analytics and insights
- Export data in JSON/CSV format
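The JSON/CSV export in the last step can be reproduced with the standard library alone. A sketch with hypothetical match results (field names are illustrative, not the app's exact schema):

```python
import csv
import json

# Hypothetical match results as shown on the analytics page.
matches = [
    {"resume": "alice.pdf", "score": 0.91},
    {"resume": "bob.docx", "score": 0.78},
]

# JSON export
with open("matches.json", "w") as f:
    json.dump(matches, f, indent=2)

# CSV export
with open("matches.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["resume", "score"])
    writer.writeheader()
    writer.writerows(matches)
```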
| Variable | Description | Default |
|---|---|---|
| `GEMINI_API_KEY` | Google Gemini API key | Required |
| `API_HOST` | FastAPI host | `localhost` |
| `API_PORT` | FastAPI port | `8000` |
| `STREAMLIT_HOST` | Streamlit host | `localhost` |
| `STREAMLIT_PORT` | Streamlit port | `8501` |
| `LOG_LEVEL` | Logging level | `INFO` |
| `MAX_FILE_SIZE` | Max upload size (bytes) | `10485760` (10 MB) |
| `TOP_K_MATCHES` | Default max matches | `10` |
| `SIMILARITY_THRESHOLD` | Default similarity threshold | `0.7` |
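These variables are typically read once at startup with the table's defaults as fallbacks. A minimal sketch of that pattern (the actual settings live in `config/settings.py`; this is illustrative, not the project's exact code):

```python
import os

# Read environment variables, falling back to the documented defaults.
API_PORT = int(os.getenv("API_PORT", "8000"))
MAX_FILE_SIZE = int(os.getenv("MAX_FILE_SIZE", str(10 * 1024 * 1024)))  # 10 MB
TOP_K_MATCHES = int(os.getenv("TOP_K_MATCHES", "10"))
SIMILARITY_THRESHOLD = float(os.getenv("SIMILARITY_THRESHOLD", "0.7"))
```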
The system uses:

- Embedding model: `mixedbread-ai/mxbai-embed-large-v1`
- LLM: `gemini-pro`
- Vector dimension: 1024
- Max sequence length: 512 tokens
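The `TOP_K_MATCHES` and `SIMILARITY_THRESHOLD` settings combine as a filter-then-rank step over the similarity scores. A small sketch with made-up scores for six resumes:

```python
import numpy as np

# Hypothetical similarity scores for six resumes against one job posting.
scores = np.array([0.92, 0.65, 0.81, 0.73, 0.40, 0.88])

TOP_K = 3
THRESHOLD = 0.7

# Rank best-first, drop anything below the threshold, keep at most TOP_K.
order = np.argsort(scores)[::-1]
selected = [int(i) for i in order if scores[i] >= THRESHOLD][:TOP_K]
# Resume 3 (0.73) clears the threshold but is cut by the top-k limit.
```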
Upload a resume:

```bash
curl -X POST "http://localhost:8000/api/v1/upload-resume" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@resume.pdf"
```

Match a job description:

```bash
curl -X POST "http://localhost:8000/api/v1/match-job" \
  -H "Content-Type: application/json" \
  -d '{
    "job_description": {
      "title": "Senior Python Developer",
      "description": "We are looking for...",
      "skills_required": ["Python", "Django", "PostgreSQL"]
    },
    "top_k": 10,
    "similarity_threshold": 0.7
  }'
```

Count stored resumes:

```bash
curl "http://localhost:8000/api/v1/resumes/count"
```

```
ai_recruitr/
├── backend/
│   ├── __init__.py
│   ├── main.py                  # FastAPI application
│   ├── api/
│   │   ├── __init__.py
│   │   └── routes.py            # API endpoints
│   ├── models/
│   │   ├── __init__.py
│   │   └── schemas.py           # Pydantic models
│   └── services/
│       ├── __init__.py
│       ├── embedding_service.py # mxbai embeddings
│       ├── faiss_service.py     # Vector database
│       ├── gemini_service.py    # Gemini LLM
│       └── resume_parser.py     # Resume processing
├── frontend/
│   ├── __init__.py
│   ├── app.py                   # Streamlit main app
│   ├── pages/
│   │   ├── __init__.py
│   │   ├── upload_resume.py     # Upload interface
│   │   ├── job_matching.py      # Matching interface
│   │   └── results.py           # Analytics dashboard
│   └── components/
│       ├── __init__.py
│       └── ui_components.py     # Reusable UI components
├── config/
│   ├── __init__.py
│   └── settings.py              # Configuration
├── data/
│   ├── resumes/                 # Uploaded resumes
│   ├── faiss_index/             # FAISS index files
│   └── processed/               # Processed data
├── utils/
│   ├── __init__.py
│   └── helpers.py               # Utility functions
├── requirements.txt             # Python dependencies
├── .env.example                 # Environment template
├── .gitignore                   # Git ignore rules
└── README.md                    # This file
```

Problem: Missing or invalid Gemini API key.
Solution:

```bash
# Check your .env file
cat .env

# Ensure GEMINI_API_KEY is set
echo $GEMINI_API_KEY
```

Problem: FAISS installation fails on some systems.
Solution:

```bash
# Try installing the CPU version explicitly
pip install faiss-cpu==1.7.4

# On macOS with Apple Silicon:
conda install -c pytorch faiss-cpu
```

Problem: PDF text extraction returns empty content.
Solution:
- Ensure PDFs are text-based, not scanned images
- Try converting PDFs to text format first
- Check file permissions
Problem: Frontend can't connect to FastAPI backend.
Solution:
```bash
# Check if the backend is running
curl http://localhost:8000/health

# Verify ports in the .env file
grep -E "(API_PORT|STREAMLIT_PORT)" .env
```

Problem: Embedding generation takes too long.
Solution:
- Check whether a GPU is available
- Reduce the batch size during processing
- Consider a smaller embedding model for testing
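Batching keeps memory bounded and lets the embedding model amortize per-call overhead. A sketch of the batching pattern with a stand-in embedder (`embed_fn` is hypothetical; in the real service it would wrap the mxbai model):

```python
import numpy as np

def embed_in_batches(texts, embed_fn, batch_size=16):
    """Embed texts in fixed-size batches to bound memory use."""
    chunks = []
    for i in range(0, len(texts), batch_size):
        chunks.append(embed_fn(texts[i:i + batch_size]))
    return np.vstack(chunks)

# Stand-in embedder returning fixed 1024-dim vectors (illustrative only).
fake_embed = lambda batch: np.ones((len(batch), 1024))

vecs = embed_in_batches([f"resume {i}" for i in range(40)], fake_embed, batch_size=16)
```

Tuning `batch_size` down trades throughput for a smaller memory footprint, which helps on CPU-only machines.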
Enable debug logging:
```bash
# Set in .env
LOG_LEVEL=DEBUG

# Or run with debug logging
python -m backend.main --log-level DEBUG
```

- Change default ports
- Set up proper CORS origins
- Use environment-specific API keys
- Enable HTTPS
- Implement rate limiting
- Add authentication
- Secure file uploads
- Monitor API usage
- Implement data retention policies
- Add resume deletion functionality
- Encrypt sensitive data
- Audit API access
- Comply with GDPR/privacy laws
- Database: Replace FAISS with Pinecone/Weaviate for production
- Caching: Add Redis for embedding caching
- Queue: Use Celery for async processing
- Load Balancing: Deploy with multiple API instances
- Multi-language Support: Add language detection
- Resume Scoring: Implement comprehensive scoring
- Bias Detection: Add fairness checking
- Integration: Connect with LinkedIn, ATS systems
- Real-time Updates: WebSocket for live updates
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
```bash
# Install development dependencies
pip install -r requirements-dev.txt

# Run tests
pytest tests/

# Format code
black .
isort .

# Lint code
flake8 .
```

This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for mxbai embeddings
- Google for Gemini LLM
- Facebook Research for FAISS
- FastAPI team
- Streamlit team
- Email: support@ai-recruitr.com
- Discord: AI Recruitr Community
- Issues: GitHub Issues
- Documentation: Full Docs

Made with ❤️ for smarter recruiting