feat(refactoring): Support Structured Logging (JSON) #30170

Merged
crazywoola merged 17 commits into langgenius:main from 41tair:feat/log-formatter on Jan 4, 2026

Conversation

@41tair (Member) commented Dec 25, 2025

Important

  1. Make sure you have read our contribution guidelines
  2. Ensure there is an associated issue and you have been assigned to it
  3. Use the correct syntax to link this PR: Fixes #<issue number>.

Fixes #30169

Summary

What

This PR introduces structured logging support (JSON format) to the backend service. It standardizes log outputs, making them machine-readable and consistent.

Why

To better integrate with modern log aggregation and observability systems (e.g., ELK, Datadog, Loki). Structured logging eliminates the need for complex regex parsing and improves query capabilities on log fields.

Key Changes

  • Refactored logging implementation to support JSON serialization.
  • Unified log field naming (e.g., level, msg, ts, trace_id).
  • Added configuration to toggle between console (text) and json formats (defaulting to text for dev environments).
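
The key changes above could be sketched roughly as follows. This is a minimal illustration only, not the code merged in this PR: `JsonFormatter` and `build_formatter` are hypothetical names, the field set (`ts`, `level`, `msg`, `trace_id`) is taken from the list above, and the extra `exc` field is an assumption.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter that serializes each record as one JSON object."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            # ISO 8601 timestamp in UTC, derived from the record's creation time.
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "msg": record.getMessage(),
            # trace_id is expected to be attached by a filter; empty if absent.
            "trace_id": getattr(record, "trace_id", ""),
        }
        if record.exc_info:
            # Assumed field name for the formatted traceback.
            payload["exc"] = self.formatException(record.exc_info)
        return json.dumps(payload, ensure_ascii=False)


def build_formatter(output_format: str) -> logging.Formatter:
    """Return a JSON formatter for "json", otherwise a plain text formatter."""
    if output_format.lower() == "json":
        return JsonFormatter()
    return logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
```

A handler would then be configured with something like `handler.setFormatter(build_formatter("json"))`, with the format string read from whatever setting the application exposes.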

Screenshots

Before After
image image

Checklist

  • This change requires a documentation update, included: Dify Document
  • I understand that this PR may be closed in case there was no previous discussion or issues. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
  • I ran dev/reformat (backend) and cd web && npx lint-staged (frontend) to appease the lint gods
@dosubot dosubot bot added size:XXL This PR changes 1000+ lines, ignoring generated files. 💪 enhancement New feature or request labels Dec 25, 2025
@gemini-code-assist (Contributor) commented

Summary of Changes

Hello @41tair, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the backend's logging infrastructure to support structured JSON logging. The primary motivation is to enhance observability and simplify log analysis by providing machine-readable logs that are easier to parse, query, and aggregate in modern logging systems. It introduces a new logging module with context variables for request and trace IDs, configurable output formats, and improved integration with OpenTelemetry for distributed tracing, ensuring a unified and consistent logging experience across the application, including Flask requests and Celery tasks.

Highlights

  • Structured Logging (JSON): Introduced comprehensive support for structured logging in JSON format, making log outputs machine-readable and consistent for better integration with log aggregation systems like ELK, Datadog, and Loki.
  • Configurable Log Output Format: Added a new configuration option, LOG_OUTPUT_FORMAT, allowing users to easily toggle between human-readable 'text' logs (default for development) and 'json' structured logs for production environments.
  • Unified Log Context and Field Naming: Implemented a new core.logging module that provides request-scoped context variables for request_id and trace_id, ensuring consistent log field naming (e.g., level, msg, ts, trace_id, span_id, identity).
  • Distributed Tracing Integration: Enhanced distributed tracing by integrating OpenTelemetry context into log records and injecting W3C traceparent headers into outgoing HTTP requests (via ssrf_proxy and plugin daemon calls), even when OpenTelemetry instrumentation might not fully cover specific call paths.
  • Improved Exception Handling in OpenTelemetry: Modified the OpenTelemetry exception logging handler to record exceptions directly on the current active span, rather than creating new spans, ensuring better trace context consistency for error reporting.
  • Celery Task Context Initialization: Ensured that Celery tasks also initialize a logging context, allowing for consistent request_id and trace_id generation within asynchronous background jobs.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist (Contributor) left a comment

Code Review

This pull request introduces a robust structured logging implementation, which is a fantastic improvement for observability. The new core.logging module is well-structured, using contextvars for request context and providing a flexible JSON formatter. The changes are thoughtfully applied across the application, including Flask request hooks and Celery tasks, ensuring consistent trace and request IDs. My review includes several suggestions, primarily focused on moving local imports to the top level of modules for better code style and maintainability. I also noted a small opportunity for code simplification in the new JSON formatter and a suggestion to enhance test coverage for one of the identity filters. Overall, this is a high-quality contribution that significantly enhances the application's logging capabilities.

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@QuantumGhost QuantumGhost requested a review from fatelei December 25, 2025 10:21
@crazywoola (Member) left a comment

Copilot AI (Contributor) left a comment

Pull request overview

This PR introduces structured logging support with JSON format to improve integration with modern log aggregation and observability platforms (ELK, Datadog, Loki, etc.). The implementation standardizes log outputs with consistent field naming and provides configurable format switching between human-readable text and machine-readable JSON.

Key Changes

  • Introduced new logging infrastructure with context-aware filters, JSON formatter, and request-scoped context variables using Python's contextvars
  • Integrated OpenTelemetry trace propagation through W3C traceparent headers for distributed tracing across services
  • Added configuration option LOG_OUTPUT_FORMAT to toggle between text and json formats (defaults to text for backward compatibility)
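
For readers unfamiliar with the W3C traceparent header mentioned above, its version-00 layout is `00-<trace-id>-<parent-id>-<flags>`, with a 32-hex-digit trace ID, a 16-hex-digit parent (span) ID, and a 2-hex-digit flags field. The sketch below shows one way to build and inject it; `build_traceparent` and `inject_traceparent` are hypothetical helpers, not the functions added in `trace_id_helper.py`.

```python
import re
import secrets
from typing import Dict, Optional

_TRACE_ID_RE = re.compile(r"[0-9a-f]{32}")
_SPAN_ID_RE = re.compile(r"[0-9a-f]{16}")


def build_traceparent(trace_id: str, span_id: Optional[str] = None, sampled: bool = True) -> str:
    """Format a version-00 traceparent value; generates a span ID if none is given."""
    trace_id = trace_id.lower()
    span_id = (span_id or secrets.token_hex(8)).lower()
    # The spec treats malformed or all-zero IDs as invalid.
    if not _TRACE_ID_RE.fullmatch(trace_id) or trace_id == "0" * 32:
        raise ValueError("trace_id must be 32 hex digits and not all zeros")
    if not _SPAN_ID_RE.fullmatch(span_id) or span_id == "0" * 16:
        raise ValueError("span_id must be 16 hex digits and not all zeros")
    flags = "01" if sampled else "00"  # bit 0 is the "sampled" flag
    return f"00-{trace_id}-{span_id}-{flags}"


def inject_traceparent(headers: Dict[str, str], trace_id: str) -> Dict[str, str]:
    """Return a copy of headers with traceparent set unless one is already present."""
    merged = dict(headers)
    merged.setdefault("traceparent", build_traceparent(trace_id))
    return merged
```

Using `setdefault` leaves an existing header untouched, which is one way to avoid overwriting a value already injected by OpenTelemetry instrumentation.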

Reviewed changes

Copilot reviewed 19 out of 20 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| api/core/logging/context.py | New module providing framework-agnostic request context using contextvars for thread-safe logging |
| api/core/logging/filters.py | New logging filters that extract trace/span IDs from OpenTelemetry and user identity from Flask-Login |
| api/core/logging/structured_formatter.py | New JSON formatter that outputs structured logs with standardized fields (ts, severity, trace_id, identity, etc.) |
| api/core/logging/__init__.py | Module initialization exposing logging components |
| api/extensions/ext_logging.py | Refactored to support both text and JSON formats, added filter integration, maintained backward compatibility |
| api/extensions/ext_celery.py | Initialize logging context for Celery tasks similar to Flask request lifecycle |
| api/extensions/otel/instrumentation.py | Simplified exception logging to record on current span instead of creating new spans |
| api/libs/external_api.py | Removed explicit log_exception call to avoid duplicate logging (framework handles this) |
| api/app_factory.py | Initialize request context on each request, inject X-Span-Id header alongside X-Trace-Id |
| api/configs/feature/__init__.py | Added LOG_OUTPUT_FORMAT configuration option |
| api/core/helper/trace_id_helper.py | Added helpers for span ID extraction and W3C traceparent header generation |
| api/core/helper/ssrf_proxy.py | Inject traceparent headers for distributed tracing when OpenTelemetry is disabled |
| api/core/plugin/impl/base.py | Inject traceparent headers for plugin daemon requests |
| api/tests/unit_tests/core/logging/test_context.py | Comprehensive tests for logging context module |
| api/tests/unit_tests/core/logging/test_filters.py | Tests for trace and identity context filters |
| api/tests/unit_tests/core/logging/test_structured_formatter.py | Tests for JSON formatter with various scenarios |
| api/tests/unit_tests/core/logging/test_trace_helpers.py | Tests for trace helper functions |
| api/tests/unit_tests/core/helper/test_ssrf_proxy.py | Updated tests for SSRF proxy with new tracing behavior |
| api/tests/unit_tests/libs/test_external_api.py | Simplified test by removing sys.exc_info mocking (no longer needed) |

@41tair 41tair force-pushed the feat/log-formatter branch from 0e81b2d to ab5df57 Compare December 31, 2025 03:08
@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Jan 4, 2026

8 participants