This project is a part of CLEF-CheckThat! Lab's Task2 (2025). Given a noisy, unstructured social media post, the task is to simplify it into a concise form. This system leverages advanced AI models to transform complex, informal claims into clear, normalized statements suitable for fact-checking and analysis.
Demo.mp4
- π Web Application: https://nikhil-kadapala.github.io/clef2025-checkthat-lab-task2/
- β‘ API Backend: Available via the web application interface
- π Python SDK: In development - will be released soon for programmatic access
- Interactive Chat Interface: Real-time claim normalization with streaming responses
- Batch Evaluation: Upload datasets for comprehensive evaluation with multiple models
- Model Support: GPT-4, Claude, Gemini, Llama, and Grok models
- Real-time Progress: WebSocket-based live evaluation tracking
- Self-Refine & Cross-Refine: Advanced refinement algorithms
- METEOR Scoring: Automatic evaluation with detailed metrics
- Modern UI: Responsive design with dark theme
- RESTful Endpoints: Clean API for claim normalization
- WebSocket Support: Real-time evaluation progress updates
- Multiple Models: Support for 8+ AI models
- Streaming Responses: Efficient real-time text generation
- CORS Configured: Ready for cross-origin requests
The system follows a sophisticated pipeline to normalize social media claims:
- Input Processing: Receives noisy, unstructured social media posts
- Model Selection: Chooses from multiple AI models (GPT-4, Claude, Gemini, etc.)
- Normalization: Applies selected prompting strategy:
- Zero-shot: Direct claim normalization
- Few-shot: Example-based learning
- Chain-of-Thought: Step-by-step reasoning
- Self-Refine: Iterative improvement process
- Cross-Refine: Multi-model collaborative refinement
- Evaluation: Automated METEOR scoring for quality assessment
- Output: Clean, normalized claims ready for fact-checking
- Frontend: React + TypeScript + Vite + Tailwind CSS
- Backend: FastAPI + WebSocket + Streaming
- AI Models: OpenAI GPT, Anthropic Claude, Google Gemini, Meta Llama, xAI Grok
- Evaluation: METEOR scoring (nltk) with pandas + numpy
- Deployment: GitHub Pages + Render
- Node.js (v18 or higher)
- Python (v3.8 or higher)
- API Keys for chosen models (OpenAI, Anthropic, Gemini, xAI)
Follow these steps to get the application running locally for development and testing.
The project includes automation scripts that handle the entire setup process:
# 1. Clone the repository git clone <repository-url> cd clef2025-checkthat-lab-task2 # 2. Set up environment variables (see below) # 3. Run automated setup ./setup-project.sh # 4. Start the application ./run-project.shπ Note: You only need to run
./setup-project.shonce for initial setup. After that, use./run-project.shto start the application.
Set these before running the setup script:
# Linux/macOS: export OPENAI_API_KEY="your-openai-key" export ANTHROPIC_API_KEY="your-anthropic-key" export GEMINI_API_KEY="your-gemini-key" export GROK_API_KEY="your-grok-key" # Windows (PowerShell): $env:OPENAI_API_KEY="your-openai-key" $env:ANTHROPIC_API_KEY="your-anthropic-key" $env:GEMINI_API_KEY="your-gemini-key" $env:GROK_API_KEY="your-grok-key"setup-project.sh:
- β Detects your OS (Linux/macOS/Windows)
- β Terminates conflicting processes on port 5173
- β Installs Node.js dependencies for the frontend
- β Fixes npm vulnerabilities automatically
- β Creates Python virtual environment
- β
Installs Python dependencies with
uv(faster) or falls back topip - β Handles cross-platform compatibility
run-project.sh:
- π Starts both frontend and backend simultaneously
- π― Frontend runs on
http://localhost:5173 - π― Backend runs on
http://localhost:8000 - π Graceful shutdown with
Ctrl+C - π Shows process IDs for monitoring
If you prefer manual setup or encounter issues with the scripts:
Backend:
# Create and activate virtual environment python -m venv .venv source .venv/bin/activate # Linux/macOS # .venv\Scripts\activate # Windows # Install Python dependencies pip install -r requirements.txtFrontend:
cd src/app npm installBackend & Frontend:
# Start both servers ./run-project.shAlternatively, you can run them separately:
# Terminal 1: Backend cd src/api && python main.py # Terminal 2: Frontend cd src/app && npm run devOpen your browser and navigate to http://localhost:5173 to see the application.
| Provider | Model | Free Tier | API Key Required |
|---|---|---|---|
| Together.ai | Llama 3.3 70B | β | β |
| OpenAI | GPT-4o, GPT-4.1 | β | β |
| Anthropic | Claude 3.7 Sonnet | β | β |
| Gemini 2.5 Pro, Flash | β | β | |
| xAI | Grok 3 | β | β |
- Zero-shot: Direct claim normalization
- Few-shot: Example-based learning
- Zero-shot-CoT: Chain-of-thought reasoning
- Few-shot-CoT: Examples with reasoning
- Self-Refine: Iterative improvement
- Cross-Refine: Multi-model refinement
Input: "The government is hiding something from us!" Output: "Government transparency concerns have been raised by citizens regarding public information access." METEOR Score: 0.847 # Start both frontend and backend ./run-project.shVisit http://localhost:5173 to access the interactive web interface.
cd src/api python main.pycurl -X POST "http://localhost:8000/chat" \ -H "Content-Type: application/json" \ -d '{ "user_query": "The government is hiding something from us!", "model": "meta-llama/Llama-3.3-70B-Instruct-Turbo-Free" }'# Default usage (Llama 3.3 70B, Zero-shot) python src/claim_norm.py # Custom configuration python src/claim_norm.py -m OpenAI -p Zero-Shot-CoT -it 1For production deployment, you can run the application in production mode:
# Set environment variables export OPENAI_API_KEY="your-openai-key" export ANTHROPIC_API_KEY="your-anthropic-key" export GEMINI_API_KEY="your-gemini-key" export GROK_API_KEY="your-grok-key" # Build frontend for production cd src/app npm run build # Start backend in production mode cd ../api python main.pyπ Note: Docker support is not currently implemented. The application runs natively using Python and Node.js.
The frontend is automatically deployed to GitHub Pages:
- Build:
npm run deploy - Commit and push changes
- GitHub Pages serves from
/docsfolder
The FastAPI backend is deployed to a cloud hosting service with standard configuration:
- Runtime: Python FastAPI application
- Environment: Production environment with API keys configured
- Features: CORS enabled, WebSocket support, streaming responses
- Frontend: React + TypeScript + Vite + Tailwind CSS
- Backend: FastAPI + WebSocket + Streaming
- Evaluation: METEOR scoring with pandas/numpy
- Models: Multiple AI providers with unified interface
- TypeScript for type safety
- ESLint + Prettier for code formatting
- Python type hints
- Error handling and logging
# Make scripts executable chmod +x setup-project.sh run-project.sh# Use Git Bash or WSL bash setup-project.sh bash run-project.sh # Or install WSL if not available wsl --installThe scripts automatically handle port conflicts, but if you encounter issues:
# Kill processes on port 5173 (frontend) # Linux/macOS: lsof -ti:5173 | xargs kill -9 # Windows: netstat -ano | findstr :5173 taskkill /PID <PID> /F # Kill processes on port 8000 (backend) # Linux/macOS: lsof -ti:8000 | xargs kill -9 # Windows: netstat -ano | findstr :8000 taskkill /PID <PID> /F- Ensure environment variables are set before running scripts
- Check for typos in environment variable names
- Verify API keys are valid and have sufficient quota
- Clear node_modules and reinstall dependencies
- Check Node.js version (requires v18+)
- Update npm to latest version:
npm install -g npm@latest
- Ensure virtual environment is activated
- Install missing dependencies:
pip install -r requirements.txt - Check Python version (requires v3.8+)
clef2025-checkthat-lab-task2/ βββ src/ β βββ api/ # FastAPI backend (deployed to Render) β β βββ main.py # Main API server with WebSocket support β βββ app/ # Full-stack web application β β βββ client/ # React frontend application β β β βββ src/ β β β β βββ components/ # React components β β β β βββ contexts/ # React contexts β β β β βββ lib/ # Utility libraries β β β β βββ pages/ # Page components β β β βββ index.html # Main HTML template β β βββ server/ # Development server (Express + Vite) β β βββ shared/ # Shared types and utilities β β β βββ types.ts # TypeScript type definitions β β β βββ prompts.ts # Prompt templates β β β βββ schema.ts # Database schema β β βββ vite.config.ts # Vite configuration β β βββ package.json # Frontend dependencies β βββ utils/ # ML utilities and model interfaces β β βββ evaluate.py # Evaluation logic with METEOR scoring β β βββ self_refine.py # Self-refinement algorithms β β βββ get_model_response.py # Model API orchestration β β βββ prompts.py # Python prompt templates β β βββ gpt.py # OpenAI GPT integration β β βββ llama.py # Llama model integration β β βββ claude.py # Anthropic Claude integration β β βββ gemini.py # Google Gemini integration β β βββ grok.py # xAI Grok integration β βββ data/ # Dataset results and cache β βββ claim_norm.py # CLI tool (legacy interface) βββ data/ # Main datasets β βββ dev.csv # Development dataset β βββ test.csv # Test dataset β βββ dev_data.jsonl # JSONL format development data β βββ dev_data_fixed.jsonl # Corrected development data βββ docs/ # Production build (GitHub Pages) βββ requirements.txt # Python dependencies βββ README.md # Project documentation Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
This project is licensed under the MIT License. See the LICENSE file for more details.
If you use this project in your research, please cite:
@misc{nkadapala-clef2025-checkthat-task2, title={Claim Extraction and Normalization for CLEF-CheckThat! Lab Task 2}, author={Nikhil Kadapala}, year={2025}, url={https://github.com/nikhil-kadapala/clef2025-checkthat-lab-task2} }I'll help you build a comprehensive Python SDK for claim normalization with a refinement loop using DeepEval's G-Eval metric. Let me first explore your current codebase to understand the existing structure and then provide you with a detailed implementation plan.
[8 tools called]
Based on my analysis of your codebase, I can see you have a solid foundation with LLM integrations, claim normalization logic, and partial DeepEval integration. However, there's a syntax error in your refine.py file and several areas that need enhancement for a production-ready SDK. Here's my comprehensive implementation plan:
First, let's fix the syntax error in api/services/refine.py:
# Line 74-76 needs completion if feedback is not None and feedback.score > 0.5: # Add refinement logic here refine_user_prompt = f""" ## Original Query {original_query} ## Current Response {current_claim} ## Feedback {feedback} ## Task Please refine the response based on the feedback provided. """ current_claim = self._refine_claim(refine_user_prompt, self.refine_sys_prompt)Create a new SDK module structure:
clef_claim_norm_sdk/ βββ __init__.py βββ core/ β βββ __init__.py β βββ normalizer.py # Main ClaimNormalizer class β βββ refiner.py # Enhanced refinement logic β βββ evaluator.py # DeepEval G-Eval integration β βββ session_manager.py # Session and state management βββ models/ β βββ __init__.py β βββ llm_providers.py # LLM provider abstractions β βββ schemas.py # Pydantic models β βββ config.py # Configuration management βββ utils/ β βββ __init__.py β βββ decorators.py # Custom decorators β βββ logging.py # Structured logging β βββ metrics.py # Performance metrics β βββ exceptions.py # Custom exceptions βββ api/ β βββ __init__.py β βββ client.py # SDK client interface β βββ async_client.py # Async client interface βββ examples/ βββ basic_usage.py βββ advanced_refinement.py βββ batch_processing.py # clef_claim_norm_sdk/api/client.py from typing import List, Optional, Dict, Any, Union from dataclasses import dataclass from ..core.normalizer import ClaimNormalizer from ..core.refiner import ClaimRefiner from ..core.evaluator import GEvalEvaluator from ..models.config import SDKConfig, ModelConfig, RefinementConfig from ..utils.decorators import retry, log_execution, monitor_performance @dataclass class ClaimResult: """Result container for claim processing""" original_claim: str normalized_claim: str confidence_score: float refinement_iterations: int evaluation_metrics: Dict[str, float] metadata: Dict[str, Any] class ClaimNormalizationSDK: """ Main SDK interface for claim normalization with refinement and evaluation. Example: >>> from clef_claim_norm_sdk import ClaimNormalizationSDK >>> sdk = ClaimNormalizationSDK( ... model_provider="openai", ... model_name="gpt-4", ... api_key="your-api-key" ... ) >>> result = sdk.normalize_claim("The government is hiding something!") >>> print(result.normalized_claim) """ def __init__( self, model_provider: str = "openai", model_name: str = "gpt-4", api_key: Optional[str] = None, config: Optional[SDKConfig] = None, **kwargs ): self.config = config or SDKConfig.from_dict(kwargs) self.normalizer = ClaimNormalizer( model_provider=model_provider, model_name=model_name, api_key=api_key ) self.evaluator = GEvalEvaluator( model_provider=model_provider, model_name=model_name, api_key=api_key ) self.refiner = ClaimRefiner( normalizer=self.normalizer, evaluator=self.evaluator, config=self.config.refinement ) @retry(max_attempts=3, backoff_factor=2) @log_execution @monitor_performance def normalize_claim( self, claim: str, enable_refinement: bool = True, custom_criteria: Optional[List[str]] = None ) -> ClaimResult: """ Normalize a single claim with optional refinement. Args: claim: Raw claim text to normalize enable_refinement: Whether to use refinement loop custom_criteria: Custom evaluation criteria for G-Eval Returns: ClaimResult with normalized claim and metrics """ # Implementation here pass @log_execution @monitor_performance def normalize_batch( self, claims: List[str], batch_size: int = 10, enable_refinement: bool = True, progress_callback: Optional[callable] = None ) -> List[ClaimResult]: """ Process multiple claims in batches. Args: claims: List of raw claims to normalize batch_size: Number of claims to process simultaneously enable_refinement: Whether to use refinement loop progress_callback: Optional callback for progress updates Returns: List of ClaimResult objects """ # Implementation here pass# clef_claim_norm_sdk/core/evaluator.py from typing import List, Dict, Any, Optional import asyncio from deepeval.metrics import GEval from deepeval.test_case import LLMTestCase, LLMTestCaseParams from deepeval.models import GPTModel, AnthropicModel, GeminiModel from ..utils.decorators import async_retry, log_execution from ..utils.exceptions import EvaluationError class GEvalEvaluator: """ G-Eval based evaluator for claim quality assessment. Supports multiple evaluation criteria and custom scoring. """ DEFAULT_CRITERIA = [ "Clarity: Is the claim clear and unambiguous?", "Verifiability: Can this claim be fact-checked with reliable sources?", "Specificity: Is the claim specific rather than vague?", "Neutrality: Is the claim presented in a neutral, factual manner?", "Completeness: Does the claim contain sufficient context to be understood?" ] def __init__( self, model_provider: str, model_name: str, api_key: str, criteria: Optional[List[str]] = None, threshold: float = 0.7 ): self.model_provider = model_provider self.model_name = model_name self.api_key = api_key self.criteria = criteria or self.DEFAULT_CRITERIA self.threshold = threshold self._setup_model() def _setup_model(self): """Initialize the evaluation model based on provider""" if self.model_provider.lower() == "openai": self.model = GPTModel(model=self.model_name, api_key=self.api_key) elif self.model_provider.lower() == "anthropic": self.model = AnthropicModel(model=self.model_name, api_key=self.api_key) elif self.model_provider.lower() == "gemini": self.model = GeminiModel(model=self.model_name, api_key=self.api_key) else: raise ValueError(f"Unsupported model provider: {self.model_provider}") @async_retry(max_attempts=3) @log_execution async def evaluate_claim_quality( self, original_claim: str, normalized_claim: str, custom_criteria: Optional[List[str]] = None ) -> Dict[str, float]: """ Evaluate normalized claim quality using G-Eval. Args: original_claim: Original raw claim normalized_claim: Processed claim to evaluate custom_criteria: Override default evaluation criteria Returns: Dictionary with evaluation scores """ try: criteria = custom_criteria or self.criteria evaluation_results = {} for criterion in criteria: metric = GEval( name=f"Claim_Quality_{criterion.split(':')[0]}", criteria=criterion, evaluation_params=[ LLMTestCaseParams.INPUT, LLMTestCaseParams.ACTUAL_OUTPUT ], model=self.model ) test_case = LLMTestCase( input=original_claim, actual_output=normalized_claim ) # Run evaluation await metric.a_measure(test_case) criterion_name = criterion.split(':')[0].lower() evaluation_results[criterion_name] = metric.score # Calculate overall score evaluation_results['overall_score'] = sum(evaluation_results.values()) / len(evaluation_results) return evaluation_results except Exception as e: raise EvaluationError(f"G-Eval evaluation failed: {str(e)}") def meets_threshold(self, scores: Dict[str, float]) -> bool: """Check if evaluation scores meet the quality threshold""" return scores.get('overall_score', 0) >= self.threshold# clef_claim_norm_sdk/core/refiner.py from typing import Optional, Dict, Any, List import asyncio from ..models.config import RefinementConfig from ..utils.decorators import log_execution, monitor_performance from ..utils.exceptions import RefinementError class ClaimRefiner: """ Advanced claim refinement with G-Eval feedback loop. """ def __init__( self, normalizer, evaluator, config: RefinementConfig ): self.normalizer = normalizer self.evaluator = evaluator self.config = config @log_execution @monitor_performance async def refine_claim( self, original_claim: str, initial_normalized: str, custom_criteria: Optional[List[str]] = None ) -> Dict[str, Any]: """ Refine claim through iterative G-Eval feedback. Args: original_claim: Original raw claim initial_normalized: Initial normalization attempt custom_criteria: Custom evaluation criteria Returns: Dictionary with final claim and refinement metadata """ current_claim = initial_normalized iteration = 0 refinement_history = [] while iteration < self.config.max_iterations: # Evaluate current claim scores = await self.evaluator.evaluate_claim_quality( original_claim, current_claim, custom_criteria ) refinement_history.append({ 'iteration': iteration, 'claim': current_claim, 'scores': scores }) # Check if quality threshold is met if self.evaluator.meets_threshold(scores): break # Generate refinement feedback feedback = self._generate_feedback(scores, original_claim, current_claim) # Apply refinement refined_claim = await self.normalizer.refine_with_feedback( original_claim, current_claim, feedback ) current_claim = refined_claim iteration += 1 return { 'final_claim': current_claim, 'iterations': iteration, 'final_scores': scores, 'history': refinement_history, 'converged': self.evaluator.meets_threshold(scores) } def _generate_feedback( self, scores: Dict[str, float], original_claim: str, current_claim: str ) -> str: """Generate specific feedback based on G-Eval scores""" feedback_parts = [] for criterion, score in scores.items(): if criterion == 'overall_score': continue if score < self.config.improvement_threshold: feedback_parts.append( f"Improve {criterion}: Current score {score:.2f} is below threshold. " f"Focus on making the claim more {criterion.lower()}." ) return "\n".join(feedback_parts)# clef_claim_norm_sdk/models/config.py from pydantic import BaseModel, Field, validator from typing import Dict, List, Optional, Any from enum import Enum class ModelProvider(str, Enum): OPENAI = "openai" ANTHROPIC = "anthropic" GEMINI = "gemini" XAI = "xai" class RefinementConfig(BaseModel): """Configuration for refinement process""" max_iterations: int = Field(default=3, ge=1, le=10) improvement_threshold: float = Field(default=0.6, ge=0.0, le=1.0) convergence_threshold: float = Field(default=0.8, ge=0.0, le=1.0) early_stopping: bool = True class SDKConfig(BaseModel): """Main SDK configuration""" # Model settings model_provider: ModelProvider = ModelProvider.OPENAI model_name: str = "gpt-4" temperature: float = Field(default=0.1, ge=0.0, le=2.0) max_tokens: int = Field(default=1000, ge=1) # Refinement settings refinement: RefinementConfig = Field(default_factory=RefinementConfig) # Evaluation settings evaluation_criteria: Optional[List[str]] = None quality_threshold: float = Field(default=0.7, ge=0.0, le=1.0) # Performance settings request_timeout: int = Field(default=30, ge=1) max_retries: int = Field(default=3, ge=0, le=10) batch_size: int = Field(default=10, ge=1, le=100) # Logging and monitoring enable_logging: bool = True log_level: str = "INFO" enable_metrics: bool = True @classmethod def from_dict(cls, config_dict: Dict[str, Any]) -> 'SDKConfig': """Create config from dictionary""" return cls(**config_dict) @classmethod def from_file(cls, config_path: str) -> 'SDKConfig': """Load config from YAML/JSON file""" import json import yaml from pathlib import Path path = Path(config_path) if path.suffix.lower() == '.json': with open(path) as f: config_dict = json.load(f) elif path.suffix.lower() in ['.yml', '.yaml']: with open(path) as f: config_dict = yaml.safe_load(f) else: raise ValueError(f"Unsupported config file format: {path.suffix}") return cls.from_dict(config_dict)# clef_claim_norm_sdk/utils/decorators.py import functools import time import asyncio import logging from typing import Any, Callable, Optional from .exceptions import RetryExhaustedError, SDKError from .metrics import performance_monitor logger = logging.getLogger(__name__) def retry(max_attempts: int = 3, backoff_factor: float = 2, exceptions: tuple = (Exception,)): """Retry decorator with exponential backoff""" def decorator(func: Callable) -> Callable: @functools.wraps(func) def wrapper(*args, **kwargs): last_exception = None for attempt in range(max_attempts): try: return func(*args, **kwargs) except exceptions as e: last_exception = e if attempt < max_attempts - 1: wait_time = backoff_factor ** attempt logger.warning(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait_time}s...") time.sleep(wait_time) else: logger.error(f"All {max_attempts} attempts failed") raise RetryExhaustedError(f"Failed after {max_attempts} attempts") from last_exception return wrapper return decorator def async_retry(max_attempts: int = 3, backoff_factor: float = 2, exceptions: tuple = (Exception,)): """Async retry decorator with exponential backoff""" def decorator(func: Callable) -> Callable: @functools.wraps(func) async def wrapper(*args, **kwargs): last_exception = None for attempt in range(max_attempts): try: return await func(*args, **kwargs) except exceptions as e: last_exception = e if attempt < max_attempts - 1: wait_time = backoff_factor ** attempt logger.warning(f"Attempt {attempt + 1} failed: {e}. Retrying in {wait_time}s...") await asyncio.sleep(wait_time) else: logger.error(f"All {max_attempts} attempts failed") raise RetryExhaustedError(f"Failed after {max_attempts} attempts") from last_exception return wrapper return decorator def log_execution(func: Callable) -> Callable: """Log function execution with timing""" @functools.wraps(func) def wrapper(*args, **kwargs): start_time = time.time() logger.info(f"Starting {func.__name__}") try: result = func(*args, **kwargs) execution_time = time.time() - start_time logger.info(f"Completed {func.__name__} in {execution_time:.2f}s") return result except Exception as e: execution_time = time.time() - start_time logger.error(f"Failed {func.__name__} after {execution_time:.2f}s: {e}") raise return wrapper def monitor_performance(func: Callable) -> Callable: """Monitor function performance metrics""" @functools.wraps(func) def wrapper(*args, **kwargs): return performance_monitor.track_execution(func, *args, **kwargs) return wrapper def validate_inputs(**validation_rules): """Validate function inputs""" def decorator(func: Callable) -> Callable: @functools.wraps(func) def wrapper(*args, **kwargs): # Perform validation based on rules for param_name, rule in validation_rules.items(): if param_name in kwargs: value = kwargs[param_name] if not rule(value): raise ValueError(f"Invalid value for {param_name}: {value}") return func(*args, **kwargs) return wrapper return decorator# clef_claim_norm_sdk/api/async_client.py import aiohttp import asyncio from typing import List, Dict, Any, Optional, AsyncGenerator from ..models.schemas import ClaimResult, BatchProcessRequest from ..utils.exceptions import APIError class AsyncClaimNormalizationClient: """ Async client for remote API access to claim normalization service. """ def __init__( self, base_url: str, api_key: Optional[str] = None, timeout: int = 30 ): self.base_url = base_url.rstrip('/') self.api_key = api_key self.timeout = aiohttp.ClientTimeout(total=timeout) async def __aenter__(self): self.session = aiohttp.ClientSession(timeout=self.timeout) return self async def __aexit__(self, exc_type, exc_val, exc_tb): await self.session.close() async def normalize_claim( self, claim: str, enable_refinement: bool = True, custom_criteria: Optional[List[str]] = None ) -> ClaimResult: """Normalize a single claim via API""" url = f"{self.base_url}/api/v1/normalize" payload = { "claim": claim, "enable_refinement": enable_refinement, "custom_criteria": custom_criteria } headers = {"Content-Type": "application/json"} if self.api_key: headers["Authorization"] = f"Bearer {self.api_key}" async with self.session.post(url, json=payload, headers=headers) as response: if response.status == 200: data = await response.json() return ClaimResult(**data) else: error_msg = await response.text() raise APIError(f"API request failed: {response.status} - {error_msg}") async def normalize_batch_stream( self, claims: List[str], batch_size: int = 10, enable_refinement: bool = True ) -> AsyncGenerator[ClaimResult, None]: """Stream batch normalization results""" url = f"{self.base_url}/api/v1/normalize/batch/stream" payload = { "claims": claims, "batch_size": batch_size, "enable_refinement": enable_refinement } headers = {"Content-Type": "application/json"} if self.api_key: headers["Authorization"] = f"Bearer {self.api_key}" async with self.session.post(url, json=payload, headers=headers) as response: if response.status == 200: async for line in response.content: if line: data = await response.json() yield ClaimResult(**data) else: error_msg = await response.text() raise APIError(f"Stream request failed: {response.status} - {error_msg}")# setup.py from setuptools import setup, find_packages with open("README.md", "r", encoding="utf-8") as fh: long_description = fh.read() with open("requirements.txt", "r", encoding="utf-8") as fh: requirements = [line.strip() for line in fh if line.strip() and not line.startswith("#")] setup( name="clef-claim-norm-sdk", version="1.0.0", author="Your Name", author_email="your.email@domain.com", description="Professional SDK for claim normalization with G-Eval refinement", long_description=long_description, long_description_content_type="text/markdown", url="https://github.com/yourusername/clef-claim-norm-sdk", packages=find_packages(), classifiers=[ "Development Status :: 4 - Beta", "Intended Audience :: Developers", "Intended Audience :: Science/Research", "License :: OSI Approved :: MIT License", "Operating System :: OS Independent", "Programming Language :: Python :: 3", "Programming Language :: Python :: 3.8", "Programming Language :: Python :: 3.9", "Programming Language :: Python :: 3.10", "Programming Language :: Python :: 3.11", "Topic :: Scientific/Engineering :: Artificial Intelligence", "Topic :: Text Processing :: Linguistic", ], python_requires=">=3.8", install_requires=requirements, extras_require={ "dev": [ "pytest>=7.0.0", "pytest-asyncio>=0.21.0", "pytest-cov>=4.0.0", "black>=22.0.0", "ruff>=0.0.200", "mypy>=1.0.0", ], "docs": [ "sphinx>=5.0.0", "sphinx-rtd-theme>=1.0.0", "myst-parser>=0.18.0", ], }, entry_points={ "console_scripts": [ "clef-claim-norm=clef_claim_norm_sdk.cli:main", ], }, include_package_data=True, package_data={ "clef_claim_norm_sdk": ["data/*.json", "templates/*.txt"], }, )# tests/test_sdk_integration.py import pytest import asyncio from clef_claim_norm_sdk import ClaimNormalizationSDK from clef_claim_norm_sdk.models.config import SDKConfig, RefinementConfig @pytest.mark.asyncio class TestSDKIntegration: @pytest.fixture def sdk_config(self): return SDKConfig( model_provider="openai", model_name="gpt-4", refinement=RefinementConfig( max_iterations=2, improvement_threshold=0.6 ) ) @pytest.fixture def sdk(self, sdk_config): return ClaimNormalizationSDK( config=sdk_config, api_key="test-api-key" ) async def test_single_claim_normalization(self, sdk): """Test basic claim normalization""" result = await sdk.normalize_claim( "The government is hiding alien technology!" ) assert result.normalized_claim assert result.confidence_score >= 0 assert result.refinement_iterations >= 0 assert 'overall_score' in result.evaluation_metrics async def test_batch_processing(self, sdk): """Test batch claim processing""" claims = [ "The government is hiding alien technology!", "COVID vaccines contain microchips", "Climate change is a hoax" ] results = await sdk.normalize_batch(claims, batch_size=2) assert len(results) == len(claims) for result in results: assert result.normalized_claim assert isinstance(result.evaluation_metrics, dict) async def test_custom_criteria(self, sdk): """Test custom evaluation criteria""" custom_criteria = [ "Scientific accuracy: Is the claim scientifically plausible?", "Evidence requirement: Does the claim require specific evidence?" ] result = await sdk.normalize_claim( "Water boils at 100Β°C at sea level", custom_criteria=custom_criteria ) assert 'scientific_accuracy' in result.evaluation_metrics assert 'evidence_requirement' in result.evaluation_metricsThis implementation plan provides:
- ποΈ Robust Architecture: Modular design with clear separation of concerns
- π§ Industry Standards: Proper decorators, error handling, logging, and monitoring
- π G-Eval Integration: Comprehensive evaluation with custom criteria support
- π Advanced Refinement: Iterative improvement loop with convergence detection
- π API Ready: Both local SDK and remote API client capabilities
- π¦ Distribution Ready: Proper packaging for PyPI publication
- π§ͺ Testing Coverage: Comprehensive test suite for reliability
- π Documentation: Clear examples and API documentation
Implementation Timeline: 4 weeks to complete all phases
Next Immediate Steps:
- Fix the syntax error in
refine.py - Set up the new SDK project structure
- Implement the core
ClaimNormalizerclass - Integrate G-Eval evaluation
- Create comprehensive tests
Would you like me to start implementing any specific component, or would you prefer to discuss any aspects of this plan in more detail?