exlab-code/digikal

Event Scraper & Management System

A comprehensive system for collecting, analyzing, moderating, and sharing events relevant to non-profit organizations.

Overview

This project is a complete event management system that:

  1. Scrapes Events: Collects event information from various websites
  2. Analyzes with AI: Uses an LLM to extract structured data and determine event relevance
  3. Provides Moderation: Web interface for reviewing and approving events
  4. Syncs to Calendar: Synchronizes approved events with a Nextcloud calendar
  5. Displays Events: Website for showcasing approved events

Documentation

Detailed documentation for each component of the system is available in the docs directory.

System Requirements

  • Python 3.6+
  • Directus instance for data storage
  • Nextcloud with Calendar app for event sharing
  • OpenAI API key for LLM analysis
  • Web server for hosting the moderation interface (optional)

Quick Setup

  1. Clone the repository:

    git clone https://github.com/yourusername/Event-Scraper.git
    cd Event-Scraper
  2. Install dependencies:

    # Create a virtual environment (recommended)
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate

    # Install required packages
    pip install -r requirements.txt
  3. Create a .env file with your credentials (see .env.example for template)

  4. Run the system components:

    # Event Management System
    python events/event_scraper.py
    python events/ics_import.py
    python events/event_analyzer.py
    python events/calendar_sync.py

    # Fördermittel (Funding) System
    python foerdermittel/foerdermittel_scraper.py
    python foerdermittel/foerdermittel_analyzer.py
    python foerdermittel/foerdermittel_importer.py

    # Moderate content using Directus admin interface
    # Access at: https://calapi.buerofalk.de/admin
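The event-side commands in step 4 are separate scripts run in a fixed order. A minimal sketch of a Python wrapper that runs them one after another and stops at the first failure could look like this. The names `EVENT_PIPELINE`, `build_commands`, and `run_pipeline`, as well as the injectable `runner` parameter, are assumptions made for illustration; the repository's own orchestration is run_system.sh.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Script paths copied from the Quick Setup commands, in the order listed there.
EVENT_PIPELINE = [
    "events/event_scraper.py",
    "events/ics_import.py",
    "events/event_analyzer.py",
    "events/calendar_sync.py",
]


def build_commands(scripts: Sequence[str], python: str = sys.executable) -> List[List[str]]:
    """Turn script paths into argv lists suitable for subprocess.run."""
    return [[python, script] for script in scripts]


def run_pipeline(commands: Sequence[Sequence[str]],
                 runner: Callable[..., object] = subprocess.run) -> int:
    """Run each command in order and return how many completed.

    `runner` is injectable (an illustrative choice, not taken from the
    project) so the function can be exercised without running real scripts.
    With check=True, subprocess.run raises CalledProcessError on a non-zero
    exit status, which stops the pipeline at the first failing step.
    """
    completed = 0
    for argv in commands:
        runner(list(argv), check=True)
        completed += 1
    return completed
```

Under these assumptions, calling `run_pipeline(build_commands(EVENT_PIPELINE))` from the repository root would execute the four event steps in sequence.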

Command Line Arguments

Master Script (run_system.sh)

The master script provides a convenient way to run all components:

./run_system.sh {command}

Available commands:

  • scrape - Run the scraper once
  • analyze - Run the LLM analysis once
  • sync - Start the sync service (continuous)
  • sync-once - Run the sync service once and exit
  • clean - Clean the Nextcloud calendar
  • all - Run scraper and analysis, then start sync service in background
  • stop - Stop all background services
  • status - Check the status of all services
  • setup-cron - Set up cron jobs for automation
  • remove-cron - Remove cron jobs

Scraper (event_scraper.py)

python event_scraper.py [options]

Options:

  • --config, -c - Path to configuration file (default: config/sources.json)
  • --directus-config, -d - Path to Directus configuration file (default: config/directus.json)
  • --output, -o - Output directory for scraped data (default: data)
  • --max-events, -m - Maximum events to scrape per source (-1 for all)
  • --verbose, -v - Enable verbose logging
  • --no-directus - Disable Directus database integration
  • --save-html - Save HTML files to disk
  • --cache-dir - Directory to store cache files (default: .cache)
  • --clear-cache - Clear URL cache before running
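For orientation, the options above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the option list, not the script's actual source: the help texts are paraphrased, and the default of `-1` for `--max-events` is an assumption, since the list only states that `-1` means "all".

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented event_scraper.py options."""
    parser = argparse.ArgumentParser(prog="event_scraper.py")
    parser.add_argument("--config", "-c", default="config/sources.json",
                        help="Path to configuration file")
    parser.add_argument("--directus-config", "-d", default="config/directus.json",
                        help="Path to Directus configuration file")
    parser.add_argument("--output", "-o", default="data",
                        help="Output directory for scraped data")
    # Assumed default: the documentation only says -1 means "all".
    parser.add_argument("--max-events", "-m", type=int, default=-1,
                        help="Maximum events to scrape per source (-1 for all)")
    parser.add_argument("--verbose", "-v", action="store_true",
                        help="Enable verbose logging")
    parser.add_argument("--no-directus", action="store_true",
                        help="Disable Directus database integration")
    parser.add_argument("--save-html", action="store_true",
                        help="Save HTML files to disk")
    parser.add_argument("--cache-dir", default=".cache",
                        help="Directory to store cache files")
    parser.add_argument("--clear-cache", action="store_true",
                        help="Clear URL cache before running")
    return parser
```

With this sketch, `build_parser().parse_args(["-m", "5", "--no-directus"])` yields `max_events == 5` and `no_directus == True`, while all other options keep their documented defaults.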

LLM Analysis (event_analyzer.py)

python event_analyzer.py [options]

Options:

  • --limit, -l - Maximum number of items to process (default: 10)
  • --batch, -b - Batch size for processing (default: 3)
  • --flag-mismatches, -f - Flag events where LLM determination doesn't match human feedback
  • --only-flag, -o - Only flag mismatches without processing new events
  • --log-file - Path to log file for LLM extraction results (default: llm_extraction.log)
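To illustrate how `--limit` and `--batch` interact, and what flagging a mismatch could mean, here is a small sketch. It is not the analyzer's real code: the function names and the event fields `llm_relevant` and `human_relevant` are hypothetical stand-ins for whatever the Directus schema actually uses.

```python
from typing import Iterator, List


def batched(items: List[dict], limit: int = 10, batch_size: int = 3) -> Iterator[List[dict]]:
    """Yield at most `limit` items in chunks of `batch_size`.

    The defaults mirror the documented --limit and --batch defaults.
    """
    selected = items[:limit]
    for start in range(0, len(selected), batch_size):
        yield selected[start:start + batch_size]


def find_mismatches(events: List[dict]) -> List[int]:
    """Return ids of events whose LLM verdict disagrees with human feedback.

    Events without human feedback (human_relevant is None or missing) are
    skipped. Both field names are hypothetical.
    """
    return [
        event["id"]
        for event in events
        if event.get("human_relevant") is not None
        and event.get("llm_relevant") != event["human_relevant"]
    ]
```

With the default values, twelve pending events would be cut to ten and processed in batches of 3, 3, 3, and 1.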

Sync Events (calendar_sync.py)

python calendar_sync.py [options]

Options:

  • --clean - Clean Nextcloud calendar by removing all non-Directus events
  • --sync-once - Run sync once and exit (this is now the default behavior)
  • --schedule - Enable hourly scheduling (disabled by default)
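The two behaviors described by these options can be sketched as follows: `--clean` amounts to a set difference between the UIDs in the calendar and those known to Directus, and `--schedule` wraps a single sync run in an hourly loop. The function names, and the `max_runs` and `sleep` parameters (included only to make the loop testable), are assumptions rather than the script's real interface.

```python
import time
from typing import Callable, Iterable, Optional, Set


def events_to_remove(calendar_uids: Iterable[str], directus_uids: Iterable[str]) -> Set[str]:
    """UIDs present in the calendar but unknown to Directus."""
    return set(calendar_uids) - set(directus_uids)


def run_sync(sync_once: Callable[[], None],
             schedule: bool = False,
             interval_seconds: int = 3600,
             max_runs: Optional[int] = None,
             sleep: Callable[[float], None] = time.sleep) -> int:
    """Run `sync_once` a single time, or repeatedly when `schedule` is True.

    Returns the number of sync runs performed. `max_runs` and `sleep` are
    hypothetical hooks for testing; without `max_runs` a scheduled loop
    runs until the process is stopped.
    """
    runs = 0
    while True:
        sync_once()
        runs += 1
        if not schedule or (max_runs is not None and runs >= max_runs):
            return runs
        sleep(interval_seconds)
```

Keeping the run-once path as the default mirrors the documented behavior, where scheduling has to be requested explicitly.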

Event Moderation

Event moderation is handled through the Directus admin interface at https://calapi.buerofalk.de/admin. The admin interface provides:

  • Filtering and sorting events
  • Bulk approval/rejection operations
  • Custom fields and workflows
  • User permission management

Project Structure

    Event-Scraper/
    ├── events/                        # Event management system
    │   ├── event_scraper.py           # Main event scraper
    │   ├── event_analyzer.py          # LLM-based event analysis
    │   ├── ics_import.py              # ICS calendar import
    │   ├── calendar_sync.py           # Nextcloud calendar sync
    │   ├── feedback_analyzer.py       # Feedback analysis
    │   └── migrate_to_tags.py         # Tag migration utility
    │
    ├── foerdermittel/                 # Funding opportunity system
    │   ├── foerdermittel_scraper.py   # Funding program scraper
    │   ├── foerdermittel_analyzer.py  # LLM relevance analysis
    │   ├── foerdermittel_importer.py  # Import to Directus
    │   ├── README.md                  # Fördermittel documentation
    │   └── config/                    # Funding sources config
    │
    ├── shared/                        # Shared utilities
    │   ├── __init__.py
    │   └── directus_client.py         # Directus API client
    │
    ├── website/                       # Public event website (GitHub Pages)
    │   ├── src/                       # Svelte components
    │   └── public/                    # Built static files
    │
    ├── config/                        # Shared configuration
    │   ├── directus.json.example
    │   └── nextcloud.json.example
    │
    ├── docs/                          # Documentation
    ├── scripts/                       # Utility scripts
    └── run_system.sh                  # Master control script

Recent Updates (April 2025)

Added ICS Calendar Import

A new feature has been added to import events from ICS calendar files:

  • Created a dedicated script (ics_import.py) for importing events from ICS calendars
  • Added support for HumHub and other ICS calendar sources
  • Implemented a configuration system for managing multiple ICS sources
  • Events are imported directly into the Directus database for processing by the analyzer
  • See ICS Import Documentation for details
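As a rough picture of what importing from an ICS source involves, the sketch below extracts `VEVENT` blocks from calendar text using only the standard library. It is a deliberately simplified illustration and not the code in ics_import.py: it ignores time zones, recurrence rules, and quoted property parameters, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Optional


def unfold(text: str) -> List[str]:
    """Join folded lines; in iCalendar, continuation lines start with a space or tab."""
    lines: List[str] = []
    for raw in text.splitlines():
        if raw[:1] in (" ", "\t") and lines:
            lines[-1] += raw[1:]
        elif raw:
            lines.append(raw)
    return lines


def parse_events(text: str) -> List[Dict[str, str]]:
    """Return one dict per VEVENT, keyed by property name without parameters."""
    events: List[Dict[str, str]] = []
    current: Optional[Dict[str, str]] = None
    for line in unfold(text):
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT" and current is not None:
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            name, value = line.split(":", 1)
            # Drop parameters such as ";TZID=Europe/Berlin" from the name.
            current[name.split(";", 1)[0].upper()] = value
    return events


def parse_dt(value: str) -> datetime:
    """Parse a date-time or plain date value into a naive datetime.

    A trailing "Z" (UTC marker) is stripped and time zones are ignored,
    which is a simplification for this sketch.
    """
    if "T" in value:
        return datetime.strptime(value.rstrip("Z"), "%Y%m%dT%H%M%S")
    return datetime.strptime(value, "%Y%m%d")
```

A production importer would more likely rely on a dedicated iCalendar library to handle the cases this sketch leaves out.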

Migrated from Categories to Tags-Based System

The event categorization system has been completely redesigned:

  • Removed the legacy category-based system in favor of a more flexible tag-based approach
  • Updated the LLM prompt to generate normalized, consistent tags
  • Implemented tag grouping (topic, format, audience, cost)
  • Added tag frequency filtering to show only commonly used tags
  • Improved the UI with consistent styling for tags and time filters
  • Enhanced the event cards to display end times alongside start times
  • Fixed currency display to use the proper Euro symbol (€)

These changes provide a more intuitive and flexible way to organize and filter events.
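The tag frequency filtering mentioned above can be pictured with a short sketch. The four group names come from the list; everything else is an assumption for illustration, including the event shape (a dict mapping each group to a list of tags), the normalization rule, and the `min_count` threshold.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Group names taken from the update notes above.
TAG_GROUPS = ("topic", "format", "audience", "cost")


def normalize(tag: str) -> str:
    """Lowercase and collapse whitespace so 'Online ' and 'online' count as one tag."""
    return " ".join(tag.lower().split())


def frequent_tags(events: Iterable[Dict[str, List[str]]],
                  group: str,
                  min_count: int = 2) -> List[str]:
    """Return tags of one group used by at least `min_count` events.

    Results are ordered by descending frequency, then alphabetically.
    The threshold and the event shape are hypothetical.
    """
    counts: Counter = Counter()
    for event in events:
        # A set ensures a tag repeated within one event is counted once.
        counts.update({normalize(tag) for tag in event.get(group, [])})
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [tag for tag, count in ranked if count >= min_count]
```

Filtering per group keeps rarely used tags out of the filter UI while leaving them attached to the events themselves.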

Improved Date Extraction in LLM Analysis

The date extraction in the LLM analysis script has been improved:

  • Removed regex-based date extraction to rely solely on the LLM's extraction capabilities
  • Fixed registration link extraction to only match valid URLs
  • Added comprehensive logging for better debugging
  • Improved override logic to prioritize LLM-extracted dates
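One way to accept only valid URLs as registration links is to check the candidate string with `urllib.parse` before storing it. The sketch below shows that idea under stated assumptions: the function name and the exact acceptance rules (absolute `http` or `https` URL, no whitespace, a dot in the host) are illustrative, and the analyzer's actual check may differ.

```python
from typing import Optional
from urllib.parse import urlparse


def valid_registration_link(candidate: Optional[str]) -> Optional[str]:
    """Return the stripped candidate if it looks like an absolute http(s) URL, else None."""
    if not candidate:
        return None
    candidate = candidate.strip()
    # Free text such as "see website" contains whitespace and is rejected.
    if not candidate or any(ch.isspace() for ch in candidate):
        return None
    try:
        parts = urlparse(candidate)
    except ValueError:
        # urlparse raises ValueError for some malformed inputs (e.g. bad IPv6 brackets).
        return None
    if parts.scheme in ("http", "https") and "." in parts.netloc:
        return candidate
    return None
```

Returning `None` instead of a best-effort guess keeps free-text answers from the LLM out of the link field.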

Modified Sync Script Behavior

The default behavior of calendar_sync.py has changed:

  • Now runs once and exits by default (no continuous scheduling)
  • Added --schedule flag to explicitly enable hourly scheduling if needed
  • Updated documentation in docs/sync.md with new options and examples
  • Added instructions for stopping the sync service if it's running in the background

See Sync Documentation for more details on these changes.

Project Reorganization

The project has been reorganized for better clarity and maintainability:

  • Renamed scripts to follow consistent naming conventions
  • Consolidated documentation into a central docs directory
  • Archived obsolete files and deprecated code
  • Updated file references in documentation and scripts

License

MIT License
