MindTrial: Evaluate and compare AI language models (LLMs) on text-based tasks with optional file/image attachments and tool use. Supports multiple providers (OpenAI, Google, Anthropic, DeepSeek, Mistral AI, xAI, Alibaba, Moonshot AI), custom tasks in YAML, and HTML/CSV reports.
nlp opensource openai xai artificial-intelligence-projects anthropic ai-tool qwen deepseek mistral-ai llm-evaluation-framework google-gemini-ai llm-benchmarking moonshot-ai language-models-ai llm-comparison ai-benchmark ai-evaluation-tools grok-ai ai-model-comparison
Updated Nov 21, 2025 - Go