An intelligent Telegram bot that extracts event information from web pages using Playwright for browser automation and OpenAI GPT-4o for intelligent data extraction.
- 🌐 Universal Web Scraping: Handles JavaScript-heavy sites (Lu.ma, Meetup, etc.) with Playwright
- 🧠 AI-Powered Extraction: Uses GPT-4o for intelligent event/update data extraction
- 📊 Airtable Integration: Automatically saves events and updates to organized tables
- ⚡ Fast Processing: ~5-10 second response times
- 🛡️ Robust Error Handling: Graceful failures with helpful user feedback
- 📈 Weekly Summaries: Generate newsletter-style event and update summaries
User Input (URL) → Playwright (Render Page) → OpenAI (Extract Data) → Airtable (Save) → User Feedback - Playwright: Handles modern JavaScript-heavy event platforms
- OpenAI GPT-4o: Intelligent, context-aware data extraction
- Direct Integration: No third-party scraping services, full control
- Cost Effective: Only OpenAI API costs (~$20-50/month typical usage)
/start- Welcome message and usage guide/weeklyweave- Generate weekly summary of events and updates
event: https://lu.ma/event-link event: https://meetup.com/group/events/123456 event: https://eventbrite.com/e/event-name-123456 update: https://techcrunch.com/article-link update: Just wanted to share that our meetup went great! - Lu.ma events - Full dynamic content support
- Meetup.com - Comprehensive event details
- News sites - TechCrunch, Wired, etc.
- Simple event pages - Static HTML sites
- Blog posts - Personal and corporate blogs
- Eventbrite - May be blocked due to anti-bot measures
- Facebook Events - Requires authentication
- LinkedIn Events - Anti-scraping protection
TELEGRAM_BOT_TOKEN=your_telegram_bot_token OPENAI_API_KEY=your_openai_api_key AIRTABLE_API_KEY=your_airtable_api_key AIRTABLE_BASE_ID=your_airtable_base_id AIRTABLE_TABLE_NAME=EventsAIRTABLE_TABLE_ID=optional_events_table_id AIRTABLE_VIEW_ID=optional_events_view_id AIRTABLE_UPDATES_TABLE_NAME=Updates AIRTABLE_UPDATES_TABLE_ID=optional_updates_table_id AIRTABLE_UPDATES_VIEW_ID=optional_updates_view_id- Fork this repository
- Connect to Render
- Set environment variables
- Deploy as Worker service
# Build image docker build -t weavebot . # Run container docker run -d \ --name weavebot \ -e TELEGRAM_BOT_TOKEN=your_token \ -e OPENAI_API_KEY=your_key \ -e AIRTABLE_API_KEY=your_key \ -e AIRTABLE_BASE_ID=your_base_id \ -e AIRTABLE_TABLE_NAME=Events \ weavebot# Install dependencies pip install -r requirements.txt # Install Playwright browsers playwright install chromium # Set environment variables in .env file cp .env.example .env # Edit .env with your keys # Run the bot python bot.py- Event Title (Text)
- Description (Long Text)
- Start Datetime (Date/Time)
- End Datetime (Date/Time)
- Location (Text)
- Link (URL)
- Content (Long Text)
- Received At (Date/Time - auto-generated)
- Cold Start: ~5-10 seconds
- Warm Processing: ~3-5 seconds
- Memory Usage: ~150-200MB
- Browser Overhead: Minimal (headless Chromium)
This version removes ScrapeGraphAI in favor of a cleaner architecture:
- Complex setup with multiple dependencies
- ScrapeGraphAI reliability issues
- Credit-based pricing confusion
- Performance overhead
- Direct Playwright + OpenAI integration
- Predictable OpenAI-only costs
- Better error handling and logging
- Faster processing times
WeaveBot includes a comprehensive test suite with 22 tests covering all functionality:
# Run all tests python3 run_tests.py all # Run only unit tests (fast) python3 run_tests.py unit # Run with coverage report python3 run_tests.py coverage- ✅ Date validation and formatting
- ✅ OpenAI data extraction with mocking
- ✅ Playwright browser automation
- ✅ Airtable integration and data mapping
- ✅ Newsletter generation and formatting
- ✅ End-to-end workflow testing
- ✅ Comprehensive error handling
- GitHub Actions: Automated testing on push/PR
- Multiple Python versions: 3.9, 3.10, 3.11
- Code quality: Linting with flake8, black, isort
- Coverage reporting: Integrated with Codecov
See Testing Guide for detailed documentation.
WeaveBot/ ├── bot.py # Main bot logic ├── test_bot.py # Comprehensive test suite ├── run_tests.py # Test runner script ├── pytest.ini # Test configuration ├── requirements.txt # Python dependencies ├── Dockerfile # Container configuration ├── render.yaml # Render deployment config ├── docs/ # Documentation │ ├── testing.md # Testing guide │ └── python-revert-analysis.md └── README.md # This file - Event Processing:
scrape_event_data()+extract_event_data_with_openai() - Update Processing:
scrape_update_data()+extract_update_data_with_openai() - Browser Automation:
get_html_with_playwright() - Data Storage:
save_event_to_airtable()+save_update_to_airtable()
Bot not responding
- Check Telegram bot token
- Verify internet connectivity
- Check logs for error messages
Scraping failures
- Some sites block automated access
- Try different event platforms (Lu.ma, Meetup)
- Check if URL is accessible manually
Airtable errors
- Verify API key and base ID
- Check table names match exactly
- Ensure required fields exist in tables
The bot provides detailed logging for debugging:
# View logs in production docker logs weavebot # Local development python bot.py # Logs print to consoleTrack your bot usage:
- Successful Events: Check Airtable Events table
- Updates Processed: Check Airtable Updates table
- Error Rates: Monitor application logs
- Response Times: Built-in timing logs
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - see LICENSE file for details
For issues or questions:
- Check the troubleshooting section
- Review application logs
- Open a GitHub issue with details
Built with ❤️ using Python, Playwright, and OpenAI GPT-4o