AI-powered facial skin quality analysis
Upload a selfie. Get instant 7-zone scoring, concern heatmaps, and biological age estimation.
Live Demo · Documentation · API Reference
An end-to-end ML system that analyzes facial photographs to produce per-region skin quality scores, concern heatmaps, and estimated biological "skin age" — all from a single phone camera selfie.
The system downloads public face datasets, generates pseudo-labels using classical computer vision (Canny edges, Laplacian variance, CIELAB color analysis), and trains a multi-task EfficientNet-B2 with a U-Net decoder, quality head, and age head. It ships with a FastAPI serving layer and a 5-page Streamlit dashboard featuring zone overlays, heatmap exploration, and before/after comparison.
The data pipeline, end to end:

1. Download 3 datasets: UTKFace (20K), FFHQ (10K), CelebA (20K)
2. Align and extract zones: MediaPipe 468-point face mesh, affine warp to 512x512; quality gating (blur, angle, brightness, occlusion checks); 7 facial zones (forehead, under-eyes, cheeks, nose, chin, crow's feet, nasolabial)
3. Generate pseudo-labels: wrinkle (Canny edges), pigmentation (L* std), redness (a* mean), pore texture (Laplacian); 4-channel pixel-level concern heatmaps at 512x512
4. Stratified splits: 70/15/15 by age decade + ethnicity
5. Multi-task training: 28 quality scores (7 zones x 4 concerns), normalized 0-100; EfficientNet-B2 backbone + 3 heads, two-phase schedule

The model is evaluated against these thresholds after training on pseudo-labeled data:
Skin quality:

| Metric | Target | What It Measures |
|---|---|---|
| Per-zone Quality MAE | ≤ 8 points | Average error on 0-100 quality scores per zone |
| Quality Pearson r | ≥ 0.80 | Correlation between predicted and pseudo-label scores |
| Heatmap SSIM | ≥ 0.70 | Structural similarity of predicted vs pseudo-label heatmaps |

Age estimation:

| Metric | Target | What It Measures |
|---|---|---|
| Overall Age MAE | ≤ 5.0 years | Mean absolute error on the UTKFace test set |
| Age MAE (20-50) | ≤ 4.0 years | Tighter target for the core demographic |

Fairness:

| Metric | Target | What It Measures |
|---|---|---|
| Score Gap | ≤ 6 points | Max quality score difference between any two ethnic groups |
| Age MAE Gap | ≤ 1.5 years | Max age prediction error difference between groups |
| Redness Calibration | Per Fitzpatrick type | Redness scoring adjusted for skin tone |
```
Input (B, 3, 512, 512)
            |
+---------------------------+
| EfficientNet-B2 Backbone  |
| (timm, features_only)     |
+---------------------------+
     |                |
 skip features    GAP pooled
 [16, 24, 48,     (B, 1408)
  120, 352]           |
     |          +-----+------+
     v          v            v
+-----------+ +----------+ +----------+
|  U-Net    | | Quality  | |   Age    |
|  Decoder  | |  Head    | |  Head    |
| 4 blocks  | | FC->512  | | FC->256  |
| + skips   | | ->28 sig | | ->1 ReLU |
+-----------+ +----------+ +----------+
     |             |            |
     v             v            v
  Heatmaps      Quality        Age
(B,4,512,512)   (B,28)        (B,1)
[0,1] per       [0,1] x100    years
concern         = 0-100
```

```
L_total = 1.0 * L_heatmap(MSE) + 2.0 * L_quality(SmoothL1) + 1.5 * L_age(SmoothL1)
```

Quality is weighted highest — accurate zone scores are the core product. Age loss is only computed on UTKFace samples (mixed-label batches via an age_indices tensor).
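The combined loss can be sketched framework-agnostically. This NumPy version is an illustration of the weighting and the age-sample masking, not the project's actual torch implementation in src/models/losses.py:

```python
import numpy as np

def smooth_l1(pred: np.ndarray, target: np.ndarray, beta: float = 1.0) -> float:
    """Huber-style loss: quadratic near zero, linear beyond beta."""
    diff = np.abs(pred - target)
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())

def multitask_loss(pred_heatmaps, gt_heatmaps,
                   pred_quality, gt_quality,
                   pred_age, gt_age, age_indices,
                   w_heatmap=1.0, w_quality=2.0, w_age=1.5) -> float:
    """Weighted sum of the three task losses.

    age_indices marks which samples in the batch carry an age label
    (UTKFace only); the age term averages over just those samples.
    """
    l_heatmap = float(((pred_heatmaps - gt_heatmaps) ** 2).mean())  # MSE
    l_quality = smooth_l1(pred_quality, gt_quality)                 # SmoothL1
    if len(age_indices) > 0:
        l_age = smooth_l1(pred_age[age_indices], gt_age[age_indices])
    else:
        l_age = 0.0  # batch contains no age-labeled samples
    return w_heatmap * l_heatmap + w_quality * l_quality + w_age * l_age
```

With perfect predictions every term vanishes, and a batch without age labels simply drops the age term rather than polluting the gradient.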
| Phase | Backbone | LR | Epochs | Purpose |
|---|---|---|---|---|
| 1 — Warm-up | Frozen | 1e-3 | 3 | Train heads without corrupting pretrained features |
| 2 — Fine-tune | Unfrozen | 5e-5 -> 1e-6 | Up to 30 | End-to-end with cosine annealing + early stopping (patience 7) |
BatchNorm layers in the frozen backbone stay in eval mode via a custom train() override, which prevents their running statistics from being corrupted by the new data distribution.
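The override pattern looks roughly like this. FrozenBackboneWrapper is a hypothetical minimal version for illustration, not the project's backbone.py:

```python
import torch.nn as nn

class FrozenBackboneWrapper(nn.Module):
    """Keeps BatchNorm layers of a frozen backbone in eval mode.

    Even when the whole model is switched to train mode, the backbone's BN
    layers stay in eval mode so their running mean/var are not updated.
    """

    def __init__(self, backbone: nn.Module):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # phase 1: backbone weights stay fixed

    def train(self, mode: bool = True):
        super().train(mode)  # flips every submodule to `mode` first...
        for m in self.backbone.modules():
            if isinstance(m, nn.modules.batchnorm._BatchNorm):
                m.eval()  # ...then forces BN back to eval
        return self

    def forward(self, x):
        return self.backbone(x)
```

Without this, calling model.train() each epoch would silently put the frozen BN layers back into training mode and drift their running statistics.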
| Source | What It Provides | Images | Coverage |
|---|---|---|---|
| UTKFace | Aligned faces with age, gender, ethnicity labels | 20K | Ages 0-116, 5 ethnic groups |
| FFHQ | High-quality 1024x1024 faces (no age labels) | 10K subset | Diverse demographics |
| CelebA | Celebrity faces with attribute annotations | 20K subset | 40 binary attributes |
All images are aligned to 512x512 using MediaPipe face detection + affine transformation (horizontal eye-line, 180px inter-eye distance).
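A minimal sketch of how such an alignment matrix can be derived from two eye landmarks. The eye_y placement fraction is a hypothetical choice for illustration; the project's actual target coordinates may differ:

```python
import numpy as np

def eye_alignment_matrix(left_eye, right_eye,
                         out_size=512, eye_dist=180.0, eye_y=0.4):
    """2x3 affine that rotates, scales, and translates a face so the eye
    line is horizontal and the inter-eye distance is `eye_dist` pixels.

    Returns a matrix in the shape cv2.warpAffine expects.
    """
    left_eye = np.asarray(left_eye, dtype=float)
    right_eye = np.asarray(right_eye, dtype=float)
    dx, dy = right_eye - left_eye
    angle = np.arctan2(dy, dx)               # current eye-line tilt
    scale = eye_dist / np.hypot(dx, dy)      # bring eyes to eye_dist apart
    c, s = scale * np.cos(-angle), scale * np.sin(-angle)
    R = np.array([[c, -s], [s, c]])          # combined rotation + scale
    # Map the eye midpoint to the desired position in the output image.
    src_mid = (left_eye + right_eye) / 2.0
    dst_mid = np.array([out_size / 2.0, out_size * eye_y])
    t = dst_mid - R @ src_mid
    return np.hstack([R, t[:, None]])
```

Applying the matrix to the two eye points yields a horizontal eye line exactly 180 px long, which is the invariant the rest of the pipeline (zone extraction, pseudo-labels) relies on.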
Since no ground-truth cosmetic quality dataset exists, we generate training labels using classical computer vision:
| Concern | Method | Signal |
|---|---|---|
| Wrinkle | Canny edge density per zone | Edge pixels / total pixels after morphological filtering |
| Pigmentation | L* channel std deviation | CIELAB lightness variation within zone |
| Redness | a* channel mean | CIELAB red-green axis intensity |
| Pore/Texture | Laplacian variance + Gabor energy | High-frequency texture roughness |
Scores are normalized to 0-100 using dataset-wide percentile mapping with age adjustment. Pixel-level heatmaps (Canny response, local L* std, local a*, local Laplacian variance) provide spatial supervision for the U-Net decoder.
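The percentile mapping can be sketched like this; percentile_score is an illustrative function (age adjustment omitted for brevity), not the project's pseudo_labels.py:

```python
import numpy as np

def percentile_score(raw_value, reference_values, invert=True):
    """Map a raw concern feature (e.g. Canny edge density) to a 0-100 score
    via its percentile rank in a dataset-wide reference distribution.

    With invert=True, a value at the top of the edge-density distribution
    yields a low score (heavy wrinkling = poor quality).
    """
    reference = np.sort(np.asarray(reference_values, dtype=float))
    rank = np.searchsorted(reference, raw_value, side="right") / len(reference)
    pct = 100.0 * rank
    return 100.0 - pct if invert else pct
```

Percentile mapping makes scores comparable across concerns with very different raw scales (edge densities vs. CIELAB channel statistics), since each is ranked against its own distribution.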
| Zone | Weight | Concerns Assessed | Why It Matters |
|---|---|---|---|
| Forehead | 1.0 | Wrinkle, pigmentation | Horizontal expression lines, age-related laxity |
| Under-eyes | 1.2 | Wrinkle, pigmentation, pore | Earliest zone to show intrinsic aging |
| Cheeks | 1.5 | All 4 concerns | Largest surface area, pore visibility, redness |
| Nose | 0.8 | Redness, pore | Sebaceous activity, pore texture |
| Chin | 0.7 | Wrinkle, pigmentation | Volume loss, jowl formation |
| Crow's feet | 1.0 | Wrinkle | Primary chronological age indicator |
| Nasolabial | 1.0 | Wrinkle, redness | Fold depth strongly correlates with perceived age |
Cheeks carry the highest weight (1.5) — they represent the largest visible skin surface and are assessed across all four concern types.
| Channel | Name | Range | Severity Labels |
|---|---|---|---|
| 0 | Wrinkle | 0.0 - 1.0 | Minimal -> Mild -> Moderate -> Significant |
| 1 | Pigmentation | 0.0 - 1.0 | Minimal -> Mild -> Moderate -> Significant |
| 2 | Redness | 0.0 - 1.0 | Minimal -> Mild -> Moderate -> Significant |
| 3 | Pore/Texture | 0.0 - 1.0 | Minimal -> Mild -> Moderate -> Significant |
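Bucketing a channel value into those severity labels might look like this; the 0.25/0.50/0.75 cut points are illustrative assumptions (the real thresholds live in zones_config.yaml):

```python
def severity_label(value: float) -> str:
    """Map a 0-1 heatmap channel value to one of the four severity labels,
    matching the lowercase severities in the API response."""
    if value < 0.25:
        return "minimal"
    if value < 0.50:
        return "mild"
    if value < 0.75:
        return "moderate"
    return "significant"
```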
```
SkinAge/
├── config/
│   ├── model_config.yaml        # Architecture, loss weights, training schedule
│   ├── data_config.yaml         # Dataset paths, pseudo-label params, augmentation
│   ├── zones_config.yaml        # 7 zones, landmarks, weights, score labels
│   └── api_config.yaml          # Server settings, quality thresholds, inference
├── src/
│   ├── data/
│   │   ├── download.py          # Dataset downloaders with resume support
│   │   ├── face_alignment.py    # MediaPipe detection + affine alignment
│   │   ├── lighting.py          # CLAHE + gray-world white balance
│   │   ├── zone_extraction.py   # 7 zones from 468 landmarks, polygon masks
│   │   ├── pseudo_labels.py     # Classical CV feature extraction + heatmaps
│   │   ├── quality_gate.py      # 6 quality checks with actionable messages
│   │   ├── dataset.py           # PyTorch Dataset, mixed-label collate
│   │   ├── augmentation.py      # Albumentations (no color jitter — skin tone is signal)
│   │   └── splits.py            # Stratified splits by age decade + ethnicity
│   ├── models/
│   │   ├── backbone.py          # EfficientNet-B2 encoder, BN freeze override
│   │   ├── unet_decoder.py      # 4-block decoder with skip connections
│   │   ├── quality_head.py      # FC -> 28 sigmoid outputs
│   │   ├── age_head.py          # FC -> 1 ReLU output
│   │   ├── skinage_model.py     # Full assembly, from_config(), checkpoints
│   │   ├── losses.py            # MultiTaskLoss with mixed-label support
│   │   └── trainer.py           # Two-phase training, mixed precision, early stopping
│   ├── evaluation/
│   │   ├── metrics.py           # MAE, Pearson, SSIM, age metrics
│   │   ├── fairness.py          # Group gaps, Fitzpatrick redness calibration
│   │   └── visualize.py         # Score distributions, correlation matrices
│   ├── api/
│   │   ├── schemas.py           # Pydantic v2 request/response models
│   │   ├── inference.py         # Preprocess -> predict -> postprocess pipeline
│   │   ├── routes.py            # /analyze, /compare, /health endpoints
│   │   └── app.py               # FastAPI factory with lifespan model loading
│   ├── dashboard/
│   │   ├── app.py               # Multi-page Streamlit app
│   │   └── pages/
│   │       ├── live_demo.py         # Upload selfie, gauge chart, score cards
│   │       ├── heatmap_explorer.py  # Full-size overlays, concern toggle, opacity
│   │       ├── comparison.py        # Before/after with delta indicators
│   │       ├── model_internals.py   # Distributions, correlations, fairness
│   │       └── dataset_explorer.py  # Browse by age/ethnicity/score filters
│   └── utils/
│       ├── cielab.py            # RGB <-> CIELAB conversion
│       ├── landmarks.py         # MediaPipe landmark utilities
│       └── reproducibility.py   # Seed setting, device detection
├── scripts/
│   ├── generate_pseudo_labels.py  # Batch pseudo-label generation CLI
│   ├── train.py                   # Training CLI with --resume support
│   ├── evaluate.py                # Evaluation + fairness report CLI
│   ├── fairness_report.py         # Standalone fairness report generator
│   ├── export_onnx.py             # ONNX export with verification
│   ├── serve.py                   # Start FastAPI server
│   └── dashboard.py               # Start Streamlit dashboard
├── tests/                       # Unit + integration tests (>= 65% coverage)
│   ├── conftest.py              # Shared fixtures (dummy tensors, mock model)
│   ├── test_backbone.py         # Backbone encoder tests
│   ├── test_decoder.py          # U-Net decoder tests
│   ├── test_heads.py            # Quality and age head tests
│   ├── test_model.py            # Full model integration tests
│   ├── test_losses.py           # Multi-task loss tests
│   ├── test_dataset.py          # Dataset and collation tests
│   ├── test_utils.py            # Utility module tests
│   └── test_api.py              # API endpoint tests
├── outputs/
│   └── models/                  # Checkpoints, ONNX exports, MediaPipe models
├── Dockerfile                   # Multi-stage build, < 4GB
├── docker-compose.yml           # API + Dashboard services
├── requirements.txt             # All dependencies
├── pyproject.toml               # Project metadata, pytest, mypy, ruff config
└── .gitignore
```

```bash
# Setup
python -m venv venv
venv\Scripts\activate          # Windows
# source venv/bin/activate     # macOS/Linux
pip install -r requirements.txt

# Download datasets
python -m SkinAge.src.data.download --dataset utk_face --output data/raw/
python -m SkinAge.src.data.download --dataset ffhq --output data/raw/ --limit 10000
python -m SkinAge.src.data.download --dataset celeba --output data/raw/ --limit 20000

# Generate pseudo-labels
python scripts/generate_pseudo_labels.py \
    --data-dir data/raw/ \
    --output-dir data/processed/

# Train the model (two-phase: frozen backbone -> full fine-tune)
python scripts/train.py \
    --config config/model_config.yaml \
    --data-dir data/processed/

# Evaluate
python scripts/evaluate.py \
    --checkpoint outputs/models/best_model.pth \
    --data-dir data/processed/

# Export to ONNX
python scripts/export_onnx.py \
    --checkpoint outputs/models/best_model.pth \
    --verify

# Launch the API
python scripts/serve.py --port 8000

# Launch the dashboard
python scripts/dashboard.py
```

```bash
# Build and run everything
docker-compose up --build

# API available at http://localhost:8000
# Dashboard available at http://localhost:8501
```

Upload a selfie and receive a full skin analysis.
```bash
curl -X POST http://localhost:8000/api/v1/analyze \
  -F "file=@selfie.jpg" \
  -F "age=30"
```

Response:
```json
{
  "overall_score": 74.2,
  "predicted_age": 32.1,
  "age_delta": 2.1,
  "zone_scores": [
    {
      "zone": "forehead",
      "composite_score": 78.5,
      "label": "Good",
      "concerns": {
        "wrinkle": {"score": 72.3, "severity": "mild"},
        "pigmentation": {"score": 84.7, "severity": "minimal"}
      }
    },
    {
      "zone": "cheeks",
      "composite_score": 68.1,
      "label": "Fair",
      "concerns": {
        "wrinkle": {"score": 65.2, "severity": "mild"},
        "pigmentation": {"score": 71.0, "severity": "mild"},
        "redness": {"score": 58.3, "severity": "moderate"},
        "pore_texture": {"score": 77.8, "severity": "mild"}
      }
    }
  ],
  "heatmaps": {
    "wrinkle": "data:image/png;base64,...",
    "pigmentation": "data:image/png;base64,...",
    "redness": "data:image/png;base64,...",
    "pore_texture": "data:image/png;base64,..."
  },
  "metadata": {
    "processing_time_ms": 1243,
    "model_version": "1.0.0"
  }
}
```

Compare two images (before/after):
```bash
curl -X POST http://localhost:8000/api/v1/compare \
  -F "before=@before.jpg" \
  -F "after=@after.jpg"
```

The response includes both analyses plus per-zone delta scores with improvement indicators.
```bash
curl http://localhost:8000/api/v1/health
```

```json
{
  "status": "healthy",
  "model_version": "1.0.0",
  "device": "cuda",
  "uptime_seconds": 3621
}
```

Launch with `streamlit run SkinAge/src/dashboard/app.py` — 5 pages:
| Page | What It Shows |
|---|---|
| Live Demo | Upload selfie, zone overlay, score cards with color-coded labels, heatmap thumbnails, gauge chart |
| Heatmap Explorer | Full-size concern overlays, radio toggle between wrinkle/pigmentation/redness/pore, opacity slider |
| Before/After | Side-by-side comparison, delta indicators with color coding, grouped bar chart |
| Model Internals | Pseudo-label distributions, zone score histograms, correlation matrix, fairness metrics |
| Dataset Explorer | Browse by age/ethnicity/score filters, paginated image grid, pseudo-label detail view |
Images that fail any quality check are rejected with actionable guidance before inference:
| Check | Threshold | Rejection Message |
|---|---|---|
| Face detection | Confidence >= 0.70 | "No face detected — ensure your face is clearly visible" |
| Head yaw | <= 25 deg | "Face is turned too far sideways — look straight at the camera" |
| Head pitch | <= 20 deg | "Face is tilted too far up/down — hold the camera at eye level" |
| Blur | Laplacian >= 80 | "Image is too blurry — hold the camera steady" |
| Brightness | 40-220 | "Image is too dark/bright — move to even lighting" |
| Resolution | >= 200x200 | "Image resolution too low — move closer or use a higher-res camera" |
| Landmarks | >= 90% visible | "Face is partially occluded — remove sunglasses, hair, or hands" |
All checks run unconditionally (no short-circuit) so the user can fix everything in one go.
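That collect-all-failures pattern can be sketched as follows, with thresholds mirroring the table above; run_quality_gate and the image_stats dict of precomputed measurements are hypothetical names, not the project's quality_gate.py API:

```python
def run_quality_gate(image_stats: dict) -> list:
    """Run every check and return all failure messages (no short-circuit),
    so one rejection tells the user everything to fix."""
    checks = [
        (image_stats["face_confidence"] >= 0.70,
         "No face detected — ensure your face is clearly visible"),
        (abs(image_stats["yaw_deg"]) <= 25,
         "Face is turned too far sideways — look straight at the camera"),
        (abs(image_stats["pitch_deg"]) <= 20,
         "Face is tilted too far up/down — hold the camera at eye level"),
        (image_stats["laplacian_var"] >= 80,
         "Image is too blurry — hold the camera steady"),
        (40 <= image_stats["brightness"] <= 220,
         "Image is too dark/bright — move to even lighting"),
        (min(image_stats["width"], image_stats["height"]) >= 200,
         "Image resolution too low — move closer or use a higher-res camera"),
        (image_stats["landmark_visibility"] >= 0.90,
         "Face is partially occluded — remove sunglasses, hair, or hands"),
    ]
    return [msg for passed, msg in checks if not passed]
```

An empty list means the image proceeds to inference; a non-empty list becomes the rejection payload.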
The system includes built-in fairness monitoring:
- Ethnicity mapping: UTKFace categories (White, Black, Asian, Indian, Other) mapped to approximate Fitzpatrick types
- Score gap audit: Maximum quality score difference between any two ethnic groups must be <= 6 points
- Age MAE gap: Maximum age prediction error difference between groups must be <= 1.5 years
- Redness calibration: Redness scoring calibrated per Fitzpatrick type to account for natural skin tone variation
- No color jitter: Augmentation pipeline deliberately excludes color jitter — skin tone carries diagnostic signal for redness and pigmentation
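Both gap audits reduce to a maximum pairwise difference across groups. A sketch of that reduction under assumed group-level inputs, not the project's fairness.py:

```python
from itertools import combinations

def max_group_gap(group_values: dict) -> float:
    """Largest absolute difference between any two groups' values
    (mean quality score, or age MAE)."""
    return max(abs(a - b)
               for a, b in combinations(group_values.values(), 2))

def passes_fairness_audit(quality_by_group: dict, age_mae_by_group: dict,
                          score_gap_max: float = 6.0,
                          age_gap_max: float = 1.5) -> bool:
    """Apply the two thresholds from the fairness targets table."""
    return (max_group_gap(quality_by_group) <= score_gap_max
            and max_group_gap(age_mae_by_group) <= age_gap_max)
```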
Generate a full fairness report:
```bash
python scripts/fairness_report.py \
    --checkpoint outputs/models/best_model.pth \
    --data-dir data/processed/ \
    --output-dir outputs/fairness/
```

Produces a Markdown report + JSON data + PNG visualizations (score distributions, group comparisons, redness calibration curves).
All configuration files are in config/ and use YAML format:
| Key | Description | Default |
|---|---|---|
| `backbone.pretrained` | Use ImageNet weights | `true` |
| `backbone.feature_dim` | Backbone output dimension | 1408 |
| `unet_decoder.output_channels` | Heatmap channels (one per concern) | 4 |
| `quality_head.layers` | FC layer sizes | `[1408, 512, 28]` |
| `quality_head.dropout` | Dropout rate | 0.3 |
| `age_head.layers` | FC layer sizes | `[1408, 256, 1]` |
| `loss_weights.heatmap` | Heatmap MSE weight | 1.0 |
| `loss_weights.quality` | Quality SmoothL1 weight | 2.0 |
| `loss_weights.age` | Age SmoothL1 weight | 1.5 |
| Key | Description | Default |
|---|---|---|
| `training.phase1.epochs` | Phase 1 epochs (heads only) | 3 |
| `training.phase1.learning_rate` | Phase 1 LR | 1e-3 |
| `training.phase2.epochs` | Phase 2 max epochs | 30 |
| `training.phase2.learning_rate` | Phase 2 LR | 5e-5 |
| `early_stopping.patience` | Epochs without improvement | 7 |
| `dataloader.batch_size` | Training batch size | 16 |
| `optimizer.name` | Optimizer | AdamW |
| `optimizer.weight_decay` | Weight decay | 1e-4 |
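The phase-2 cosine decay from 5e-5 down to 1e-6 can be written in closed form. This is a sketch of the standard cosine-annealing formula (as implemented by PyTorch's CosineAnnealingLR) for intuition, not the trainer's own code:

```python
import math

def phase2_lr(epoch: int, max_epochs: int = 30,
              lr_max: float = 5e-5, lr_min: float = 1e-6) -> float:
    """Cosine-annealed learning rate for the fine-tuning phase:
    starts at lr_max, decays smoothly to lr_min at max_epochs."""
    t = epoch / max_epochs
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t))
```

Early stopping (patience 7) usually halts training before the schedule reaches its floor.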
```bash
# Run the full test suite
pytest SkinAge/tests/ -v

# Run with coverage report
pytest SkinAge/tests/ --cov=SkinAge/src --cov-report=term-missing

# Run specific test module
pytest SkinAge/tests/test_model.py -v
```

Tests are designed to run without trained models or downloaded datasets — all use mock fixtures and dummy tensors.
For optimized CPU inference in production:
```bash
python scripts/export_onnx.py \
    --checkpoint outputs/models/best_model.pth \
    --output outputs/models/skinage.onnx \
    --opset 17 \
    --verify
```

The ONNX model supports dynamic batch sizes and produces three named outputs: `heatmaps`, `quality`, and `age`. The `--verify` flag runs ONNXRuntime inference and compares against PyTorch outputs (atol=1e-4).
| Category | Tools |
|---|---|
| ML | PyTorch, timm (EfficientNet-B2), torch.amp (mixed precision) |
| Computer Vision | OpenCV, MediaPipe (face mesh, 468 landmarks), scikit-image (SSIM) |
| Data | Albumentations, pandas, NumPy, CIELAB color space |
| API | FastAPI, Pydantic v2, uvicorn |
| Dashboard | Streamlit, matplotlib |
| Production | ONNX, ONNXRuntime, Docker, docker-compose |
| Testing | pytest (>= 65% coverage target) |
| Config | YAML (4 config files: model, data, zones, api) |
| Code Quality | mypy (strict), ruff, isort |
- Pseudo-labels, not ground truth — All quality scores are derived from classical CV features, not dermatologist annotations. V2 will add professional annotation pipelines.
- No video/real-time analysis — Single-image analysis only. Real-time webcam analysis is out of scope for V1.
- Age labels only from UTKFace — FFHQ and CelebA don't carry age labels, so age loss is only computed on ~40% of training batches.
- Ethnicity categories are coarse — UTKFace provides 5 broad categories; finer-grained Fitzpatrick typing would improve redness calibration.
- No mobile deployment — V1 is server-side only. CoreML/TFLite export is planned for V2.
- MediaPipe model files required — Face detection and landmark models must be downloaded separately to `outputs/models/mediapipe/`.
- xG-style proxy for skin quality — Similar to how proxy xG models estimate expected goals from limited data, our pseudo-labels estimate quality from observable texture/color features. Professional annotations would improve accuracy.
MIT
Built with PyTorch, MediaPipe, FastAPI, and Streamlit.