Commit 507bff9 (1 parent: eff0040)

feat: add DAG parsing profiler tool

4 files changed: +749 -0 lines

composer/tools/README.md (1 addition, 0 deletions)

# Composer Tools

* [Composer DAGs Pausing/Unpausing script](composer_dags.md)
* [Composer DAGs Parsing Profiler tool](parsing_profiler/README.md)
New file (38 additions, 0 deletions)

# 🚀 Composer DAG Linter & Parsing Profiler

## Overview

This Airflow DAG is a specialized **parsing performance profiler** designed to safeguard and optimize your Google Cloud Composer environment.

When triggered, it offloads the resource-intensive DAG parsing process to a temporary, isolated Kubernetes Pod. Its primary goal is to detect **parsing latency issues** and identify heavy top-level code execution without impacting your environment's workload resources. As a byproduct of this analysis, it also validates DAG integrity and catches syntax errors.
## 🌟 Key Features

* **Isolated Execution:** Offloads parsing logic to a separate Pod, protecting the Scheduler from resource contention and crashes.
* **Top-Level Code Profiling:** Detects DAGs that exceed a configurable parse-time threshold and generates a `cProfile` report to identify the specific calls causing delays (e.g., database connections, heavy imports).
* **Smart Image Detection:** Automatically detects the correct worker image for environments with **extra PyPI packages**, ensuring accurate replication of dependencies and **Airflow overrides**.
  * *Note:* If a "Vanilla" (default) environment is detected, the task will **Skip** gracefully and request manual configuration.
* **Parallel Processing:** Leverages multiprocessing to analyze the entire DAG repository efficiently.
* **Cross-Environment Diagnostics:** Capable of scanning unstable or crashing environments by running this tool from a separate, stable Composer instance.

---
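The top-level code profiling idea above can be sketched with the standard library alone. This is an illustrative reconstruction, not the tool's actual code: it executes a DAG file's source under `cProfile`, and captures a report only when parsing exceeds a threshold (the threshold name here is hypothetical).

```python
import cProfile
import io
import pstats
import time

PARSE_TIME_THRESHOLD_SECONDS = 2.0  # hypothetical default, for illustration


def profile_dag_parse(source: str, threshold: float = PARSE_TIME_THRESHOLD_SECONDS):
    """Exec a DAG file's source under cProfile; return (seconds, report or None).

    A report is produced only when parsing is slower than `threshold`,
    mirroring how slow DAGs get flagged with a cProfile breakdown.
    """
    profiler = cProfile.Profile()
    start = time.monotonic()
    profiler.enable()
    # Top-level code runs at import/parse time -- this is what we measure.
    exec(compile(source, "<dag_file>", "exec"), {"__name__": "profiled_dag"})
    profiler.disable()
    elapsed = time.monotonic() - start

    report = None
    if elapsed > threshold:
        buf = io.StringIO()
        # Top 10 calls by cumulative time: heavy imports, DB calls, etc.
        pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(10)
        report = buf.getvalue()
    return elapsed, report


# Example: a DAG whose top-level code sleeps, simulating a slow API call.
slow_dag_source = "import time\ntime.sleep(0.05)\n"
elapsed, report = profile_dag_parse(slow_dag_source, threshold=0.01)
print(f"parse took {elapsed:.3f}s, flagged={report is not None}")
```

In the real tool this runs inside the isolated Pod, so a crashing or slow DAG file never touches the Scheduler's parsing loop.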
## ⚙️ Quick Setup

### 1. Installation

Upload both files to your Composer environment's `dags/` folder:

* `dag_linter_kubernetes_pod.py` (The Orchestrator)
* `linter_core.py` (The Logic Script)

### 2. Configuration

Open `dag_linter_kubernetes_pod.py`. The tool automatically detects your bucket and image, but you can configure limits:
27+
28+
| Variable | Description |
29+
| :--- | :--- |
30+
| `_CONFIG_GCS_BUCKET_NAME` | The bucket containing your DAGs/Plugins. Set to `None` for auto-detection. |
31+
| `_CONFIG_POD_IMAGE` | **CRITICAL:** Path to your Composer Worker image. Set to `None` for auto-detection.<br><br>**Manual Retrieval (for Vanilla envs or troubleshooting):**<br>1. Check Cloud Build logs.<br>2. Inspect `airflow-worker` YAML in GKE (`image:` field).<br>3. **Support:** Customers with a valid package can contact Google Cloud Support for assistance. |
32+
| `_CONFIG_POD_DISK_SIZE` | Ephemeral storage size for the Pod (ensure this fits your repo size). |
33+
| `_CONFIG_PARSE_TIME_THRESHOLD_SECONDS` | Time limit before a DAG is flagged as "slow". |
34+
35+
### 3. Execution

Trigger the DAG **`composer_dag_parser_profile`** manually from the Airflow UI.

Check the Task Logs for the **`profile_and_check_linter`** task to view the integrity report and performance profiles.
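Besides the UI, Airflow 2's stable REST API can trigger a run via `POST /api/v1/dags/{dag_id}/dagRuns`. The sketch below only builds the request; the web server URL is a placeholder, and sending it requires whatever auth fronts your Composer environment's Airflow API:

```python
from typing import Optional


def build_trigger_request(base_url: str, dag_id: str, conf: Optional[dict] = None):
    """Return (url, json_body) for triggering one DAG run via Airflow's stable REST API."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    body = {"conf": conf or {}}
    return url, body


url, body = build_trigger_request(
    "https://example-airflow-webserver",  # placeholder; use your environment's web server URL
    "composer_dag_parser_profile",
)
print(url)
# To actually send it: requests.post(url, json=body, auth=...) with your credentials.
```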