A comprehensive system for collecting, analyzing, moderating, and sharing events relevant to non-profit organizations.
This project is a complete event management system that:
- Scrapes Events: Collects event information from various websites
- Analyzes with AI: Uses an LLM to extract structured data and determine event relevance
- Provides Moderation: Web interface for reviewing and approving events
- Syncs to Calendar: Synchronizes approved events with a Nextcloud calendar
- Displays Events: Website for showcasing approved events
Detailed documentation for each component of the system is available in the docs directory:
- Installation Guide - Complete guide to setting up and using the system
- Scraper Documentation - Details on the event scraper component
- ICS Import Documentation - Guide to importing events from ICS calendars
- Analyzer Documentation - Information about the LLM analysis component
- Sync Documentation - Details on the Directus-Nextcloud sync
- Moderation Interface - Guide to the moderation web interface
- Website Documentation - Information about the website component
- CSS Customization - How to customize the website appearance
- Python 3.6+
- Directus instance for data storage
- Nextcloud with Calendar app for event sharing
- OpenAI API key for LLM analysis
- Web server for hosting the moderation interface (optional)
- Clone the repository:

      git clone https://github.com/yourusername/Event-Scraper.git
      cd Event-Scraper

- Install dependencies:

      # Create a virtual environment (recommended)
      python -m venv venv
      source venv/bin/activate  # On Windows: venv\Scripts\activate

      # Install required packages
      pip install -r requirements.txt

- Create a .env file with your credentials (see .env.example for a template)
- Run the system components:

      # Event Management System
      python events/event_scraper.py
      python events/ics_import.py
      python events/event_analyzer.py
      python events/calendar_sync.py

      # Fördermittel (Funding) System
      python foerdermittel/foerdermittel_scraper.py
      python foerdermittel/foerdermittel_analyzer.py
      python foerdermittel/foerdermittel_importer.py

      # Moderate content using Directus admin interface
      # Access at: https://calapi.buerofalk.de/admin
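
The credentials step above can be sketched as follows. This is a minimal, standard-library-only reader for `KEY=VALUE` lines, not the project's actual loader (which may well use a library such as python-dotenv). The variable names in the usage note are assumptions for illustration; `.env.example` remains the authoritative list.

```python
import os
from pathlib import Path


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Remove optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path=".env"):
    """Load the file if it exists; never override variables already set."""
    env_file = Path(path)
    if not env_file.exists():
        return {}
    values = parse_env(env_file.read_text(encoding="utf-8"))
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

A script could then call `load_env()` once at startup and read, for example, `os.environ.get("OPENAI_API_KEY")` (hypothetical name) afterwards.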
The master script provides a convenient way to run all components:
    ./run_system.sh {command}

Available commands:
- scrape: Run the scraper once
- analyze: Run the LLM analysis once
- sync: Start the sync service (continuous)
- sync-once: Run the sync service once and exit
- clean: Clean the Nextcloud calendar
- all: Run scraper and analysis, then start sync service in background
- stop: Stop all background services
- status: Check the status of all services
- setup-cron: Set up cron jobs for automation
- remove-cron: Remove cron jobs
    python event_scraper.py [options]

Options:
- --config, -c: Path to configuration file (default: config/sources.json)
- --directus-config, -d: Path to Directus configuration file (default: config/directus.json)
- --output, -o: Output directory for scraped data (default: data)
- --max-events, -m: Maximum events to scrape per source (-1 for all)
- --verbose, -v: Enable verbose logging
- --no-directus: Disable Directus database integration
- --save-html: Save HTML files to disk
- --cache-dir: Directory to store cache files (default: .cache)
- --clear-cache: Clear URL cache before running
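
For readers who want to see how these scraper options fit together, here is a rough `argparse` sketch that mirrors the flags and defaults listed above. It is an illustration, not the script's actual source: the help texts are paraphrased, and the default of `-1` for `--max-events` is an assumption, since only its meaning ("all") is documented.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented event_scraper.py options."""
    parser = argparse.ArgumentParser(description="Scrape events from configured sources")
    parser.add_argument("--config", "-c", default="config/sources.json",
                        help="Path to configuration file")
    parser.add_argument("--directus-config", "-d", default="config/directus.json",
                        help="Path to Directus configuration file")
    parser.add_argument("--output", "-o", default="data",
                        help="Output directory for scraped data")
    # Assumption: -1 (scrape everything) is the default; the docs only say -1 means "all".
    parser.add_argument("--max-events", "-m", type=int, default=-1,
                        help="Maximum events to scrape per source (-1 for all)")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable verbose logging")
    parser.add_argument("--no-directus", action="store_true",
                        help="Disable Directus database integration")
    parser.add_argument("--save-html", action="store_true",
                        help="Save HTML files to disk")
    parser.add_argument("--cache-dir", default=".cache",
                        help="Directory to store cache files")
    parser.add_argument("--clear-cache", action="store_true",
                        help="Clear URL cache before running")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```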
    python event_analyzer.py [options]

Options:
- --limit, -l: Maximum number of items to process (default: 10)
- --batch, -b: Batch size for processing (default: 3)
- --flag-mismatches, -f: Flag events where LLM determination doesn't match human feedback
- --only-flag, -o: Only flag mismatches without processing new events
- --log-file: Path to log file for LLM extraction results (default: llm_extraction.log)
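
The idea behind `--flag-mismatches` can be illustrated with a small comparison function. This is only a sketch of the concept: the field names `llm_relevant` and `human_relevant` are hypothetical and do not necessarily match the Directus schema the analyzer uses.

```python
def find_mismatches(events):
    """Return ids of events whose LLM verdict disagrees with human feedback.

    Field names are hypothetical placeholders for illustration.
    """
    mismatched = []
    for event in events:
        human = event.get("human_relevant")
        if human is None:
            # No human feedback recorded yet, so there is nothing to compare.
            continue
        if event.get("llm_relevant") != human:
            mismatched.append(event["id"])
    return mismatched


if __name__ == "__main__":
    sample = [
        {"id": 1, "llm_relevant": True, "human_relevant": True},
        {"id": 2, "llm_relevant": True, "human_relevant": False},
        {"id": 3, "llm_relevant": False, "human_relevant": None},
    ]
    print(find_mismatches(sample))
```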
    python calendar_sync.py [options]

Options:
- --clean: Clean Nextcloud calendar by removing all non-Directus events
- --sync-once: Run sync once and exit (this is now the default behavior)
- --schedule: Enable hourly scheduling (disabled by default)
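
The run-once default and the opt-in `--schedule` mode can be sketched as a small loop. This is a hedged illustration of the documented behavior, not the code in `calendar_sync.py`; `sync_once_fn`, `max_runs`, and the injectable `sleep` are hypothetical names added so the loop can be exercised without waiting an hour.

```python
import time


def run_sync(sync_once_fn, schedule=False, interval_seconds=3600,
             max_runs=None, sleep=time.sleep):
    """Run one sync by default; with schedule=True, repeat every interval.

    max_runs is a hypothetical safety cap, useful mainly for testing.
    Returns the number of sync runs performed.
    """
    runs = 0
    while True:
        sync_once_fn()
        runs += 1
        if not schedule or (max_runs is not None and runs >= max_runs):
            return runs
        sleep(interval_seconds)


if __name__ == "__main__":
    # Default behavior: a single run, then exit.
    run_sync(lambda: print("syncing Directus -> Nextcloud"))
```

Keeping the loop separate from the sync function means the same `sync_once_fn` serves both modes, and a cron job can simply call the run-once path.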
Event moderation is handled through the Directus admin interface at https://calapi.buerofalk.de/admin. The admin interface provides:
- Filtering and sorting events
- Bulk approval/rejection operations
- Custom fields and workflows
- User permission management
    Event-Scraper/
    ├── events/                          # Event management system
    │   ├── event_scraper.py             # Main event scraper
    │   ├── event_analyzer.py            # LLM-based event analysis
    │   ├── ics_import.py                # ICS calendar import
    │   ├── calendar_sync.py             # Nextcloud calendar sync
    │   ├── feedback_analyzer.py         # Feedback analysis
    │   └── migrate_to_tags.py           # Tag migration utility
    │
    ├── foerdermittel/                   # Funding opportunity system
    │   ├── foerdermittel_scraper.py     # Funding program scraper
    │   ├── foerdermittel_analyzer.py    # LLM relevance analysis
    │   ├── foerdermittel_importer.py    # Import to Directus
    │   ├── README.md                    # Fördermittel documentation
    │   └── config/                      # Funding sources config
    │
    ├── shared/                          # Shared utilities
    │   ├── __init__.py
    │   └── directus_client.py           # Directus API client
    │
    ├── website/                         # Public event website (GitHub Pages)
    │   ├── src/                         # Svelte components
    │   └── public/                      # Built static files
    │
    ├── config/                          # Shared configuration
    │   ├── directus.json.example
    │   └── nextcloud.json.example
    │
    ├── docs/                            # Documentation
    ├── scripts/                         # Utility scripts
    └── run_system.sh                    # Master control script

A new feature has been added to import events from ICS calendar files:
- Created a dedicated script (ics_import.py) for importing events from ICS calendars
- Added support for HumHub and other ICS calendar sources
- Implemented a configuration system for managing multiple ICS sources
- Events are imported directly into the Directus database for processing by the analyzer
- See ICS Import Documentation for details
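
To make the ICS import step more concrete, the following sketch pulls `VEVENT` properties out of raw ICS text using only the standard library. It is a simplified illustration, not the logic in `ics_import.py`: it handles line unfolding and strips property parameters, but it does not treat nested components such as `VALARM` separately, does not decode escaped characters, and does not parse dates. A real importer would more likely rely on a dedicated ICS library.

```python
def parse_ics_events(ics_text):
    """Extract VEVENT properties from ICS text as a list of dicts."""
    # Unfold continuation lines: RFC 5545 folds long lines by starting
    # the continuation with a single space or tab.
    unfolded = []
    for line in ics_text.splitlines():
        if line[:1] in (" ", "\t") and unfolded:
            unfolded[-1] += line[1:]
        else:
            unfolded.append(line)

    events = []
    current = None
    for line in unfolded:
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            if current is not None:
                events.append(current)
            current = None
        elif current is not None and ":" in line:
            name, _, value = line.partition(":")
            # Drop parameters such as ";TZID=Europe/Berlin" from the name.
            current[name.split(";")[0].upper()] = value
    return events
```

Each returned dict (keys like `SUMMARY`, `DTSTART`, `UID`) could then be mapped onto whatever fields the Directus events collection expects.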
The event categorization system has been completely redesigned:
- Removed the legacy category-based system in favor of a more flexible tag-based approach
- Updated the LLM prompt to generate normalized, consistent tags
- Implemented tag grouping (topic, format, audience, cost)
- Added tag frequency filtering to show only commonly used tags
- Improved the UI with consistent styling for tags and time filters
- Enhanced the event cards to display end times alongside start times
- Fixed currency display to use proper Euro symbol (€)
These changes provide a more intuitive and flexible way to organize and filter events.
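
Tag frequency filtering and grouping, as described above, might look roughly like this. The sketch assumes each event is a dict with a `tags` list and that a tag-to-group mapping exists somewhere; the `min_count` threshold, the mapping, and the `other` fallback group are all hypothetical choices for illustration.

```python
from collections import Counter, defaultdict


def frequent_tags(events, min_count=2):
    """Return tags used by at least min_count events, most frequent first."""
    # set() ensures a tag repeated within one event is counted once.
    counts = Counter(tag for event in events for tag in set(event.get("tags", [])))
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, count in ranked if count >= min_count]


def group_tags(tags, tag_groups):
    """Bucket tags by group; tags without a known group go under 'other'."""
    grouped = defaultdict(list)
    for tag in tags:
        grouped[tag_groups.get(tag, "other")].append(tag)
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical mapping onto the four documented groups.
    groups = {"fundraising": "topic", "workshop": "format",
              "volunteers": "audience", "free": "cost"}
    events = [
        {"tags": ["workshop", "free"]},
        {"tags": ["workshop", "fundraising"]},
        {"tags": ["free", "workshop"]},
    ]
    print(group_tags(frequent_tags(events), groups))
```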
The date extraction in the LLM analysis script has been improved:
- Removed regex-based date extraction to rely solely on the LLM's extraction capabilities
- Fixed registration link extraction to only match valid URLs
- Added comprehensive logging for better debugging
- Improved override logic to prioritize LLM-extracted dates
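
One way to accept only valid URLs as registration links is to validate whatever the LLM returns before storing it. The helper below is a sketch under that assumption; the function name and the exact acceptance rules (absolute `http`/`https` URLs with a host, no spaces) are illustrative and may differ from what the analyzer does.

```python
from urllib.parse import urlparse


def valid_registration_link(candidate):
    """Return the candidate if it is an absolute http(s) URL, else None."""
    if not isinstance(candidate, str):
        return None
    candidate = candidate.strip()
    if not candidate or " " in candidate:
        return None
    try:
        parsed = urlparse(candidate)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6 hosts).
        return None
    if parsed.scheme in ("http", "https") and parsed.netloc:
        return candidate
    return None
```

Free-text answers such as "see website" or a `mailto:` address are rejected, so the field stays empty rather than holding a non-link.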
The calendar_sync.py script behavior has been changed:
- Now runs once and exits by default (no continuous scheduling)
- Added --schedule flag to explicitly enable hourly scheduling if needed
- Updated documentation in docs/sync.md with new options and examples
- Added instructions for stopping the sync service if it's running in the background
See Sync Documentation for more details on these changes.
The project has been reorganized for better clarity and maintainability:
- Renamed scripts to follow consistent naming conventions
- Consolidated documentation into a central docs directory
- Archived obsolete files and deprecated code
- Updated file references in documentation and scripts