
oci-pull-through

A pull-through cache for OCI container registries. It sits between your container runtime and upstream registries, transparently caching image layers and manifests on first pull. Subsequent pulls for the same content are served from the cache without contacting the upstream registry.

This exists because pulling the same images repeatedly across a fleet of machines is wasteful. Rate limits, network latency, and registry outages compound the problem. A pull-through cache eliminates redundant transfers and provides a degree of resilience against upstream unavailability for previously-cached content.

How it works

The proxy implements the OCI Distribution Spec read path. The upstream registry hostname is encoded in the request path:

GET /v2/{registry}/{image}/manifests/{reference}
GET /v2/{registry}/{image}/blobs/{digest}

For example, pulling ghcr.io/org/app:v1.2.3 through the proxy running on cache.internal:8080:

docker pull cache.internal:8080/ghcr.io/org/app:v1.2.3
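
The path mapping is mechanical: the first segment after /v2/ names the upstream host, and the rest is the image path. A minimal sketch of that rewrite (illustrative names, not the project's actual code):

```go
package main

import (
	"fmt"
	"strings"
)

// upstreamURL rewrites a proxied request path such as
// /v2/ghcr.io/org/app/manifests/v1.2.3 into the upstream URL
// https://ghcr.io/v2/org/app/manifests/v1.2.3.
func upstreamURL(proxyPath string) (string, error) {
	rest, ok := strings.CutPrefix(proxyPath, "/v2/")
	if !ok {
		return "", fmt.Errorf("not an OCI path: %s", proxyPath)
	}
	host, remainder, ok := strings.Cut(rest, "/")
	if !ok {
		return "", fmt.Errorf("missing image path: %s", proxyPath)
	}
	return fmt.Sprintf("https://%s/v2/%s", host, remainder), nil
}

func main() {
	u, _ := upstreamURL("/v2/ghcr.io/org/app/manifests/v1.2.3")
	fmt.Println(u) // https://ghcr.io/v2/org/app/manifests/v1.2.3
}
```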

On a cache miss, the proxy fetches from the upstream registry and simultaneously streams the response to the client and the cache store. The client is never blocked by cache writes -- if the cache store is slow or fails, the client stream continues uninterrupted.

On a cache hit with the S3 backend, the proxy returns an HTTP 307 redirect to a presigned S3 URL. The client fetches the blob directly from S3, removing the proxy from the data path entirely. This avoids the double-bandwidth penalty (S3→proxy→client) that streaming would incur. The OCI distribution spec explicitly allows 307 redirects for blob GETs, and Docker/containerd clients handle them correctly.

The filesystem backend continues to stream directly from disk (with full Range/206 support via http.ServeContent).

All upstream response headers (excluding hop-by-hop headers) are stored alongside the cached object and replayed on cache hits, making the proxy transparent to clients that depend on headers like ETag or Accept-Ranges.

Caching behaviour

Content-addressed objects (blobs and manifests resolved by digest) are immutable. They are always cached and served with Cache-Control: public, max-age=31536000, immutable.

Tag references are mutable -- a tag can point to a different digest at any time. Caching of tag manifests is therefore optional and controlled by configuration:

| Scenario | Cached | Condition |
| --- | --- | --- |
| Blob (/blobs/sha256:...) | Always | Immutable |
| Manifest by digest | Always | Immutable |
| Manifest by tag | Configurable | CACHE_TAG_MANIFESTS=true |
| Manifest by latest tag | Configurable | CACHE_TAG_MANIFESTS=true and CACHE_LATEST_TAG=true |

When tag manifests are cached, they are served with Cache-Control: public, max-age=2419200 (28 days). The latest tag uses a shorter Cache-Control: public, max-age=3600 (1 hour) to balance freshness with upstream rate limits.
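
The three policies above reduce to a small decision, sketched here (illustrative, not the project's actual code):

```go
package main

import "fmt"

// cacheControl mirrors the caching policy: content addressed by digest
// is immutable, ordinary tags get 28 days, and latest gets 1 hour.
func cacheControl(reference string, isTag bool) string {
	switch {
	case !isTag: // blob or manifest addressed by digest
		return "public, max-age=31536000, immutable"
	case reference == "latest":
		return "public, max-age=3600"
	default:
		return "public, max-age=2419200"
	}
}

func main() {
	fmt.Println(cacheControl("sha256:abc", false))
	fmt.Println(cacheControl("v1.2.3", true))
	fmt.Println(cacheControl("latest", true))
}
```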

Non-2xx upstream responses are forwarded to the client as-is and are never cached.

Proxy modes

The proxy operates in one of two mutually exclusive modes, controlled by the required PROXY_MODE environment variable. Leaving it unset or setting an unrecognized value is a startup error.

Proxy Mode: transparent

Maximum availability, no auth enforcement. The proxy is an unauthenticated cache that serves whatever it has. Ideal for private Kubernetes clusters where the network is the security boundary.

  • /v2/ check: tries upstream; if unreachable, returns a static 200 OK so clients can proceed with cached content.
  • HEAD (cache hit): served immediately from cache. Auth header ignored.
  • GET (cache hit): served from cache (S3 redirect or FS stream) immediately. Auth header ignored.
  • Cache miss with upstream down: 502 Bad Gateway — can't serve what we don't have.
  • Cache miss with upstream up: forwarded to upstream with the client's auth header. Response is tee-streamed to cache.

Security implications:

  • Any client that can reach the proxy can pull any cached content.
  • No token validation occurs on cache hits.
  • The /v2/ auth challenge is forwarded when upstream is up (clients still authenticate with upstream on cache misses), but when upstream is down auth is skipped entirely.
  • Secure this mode with network policy, private subnets, or an authenticating reverse proxy in front.

Proxy Mode: authenticated

Auth is always validated against upstream. The cache accelerates delivery of large layers but never bypasses access control. Upstream must be reachable for all requests.

  • /v2/ check: always forwarded to upstream. If unreachable → 502 Bad Gateway.
  • HEAD: always forwarded to upstream with the client's auth. Cache is not consulted — upstream HEAD is lightweight and gives the freshest headers.
  • GET (cache hit): before serving from cache, a HEAD request is sent to upstream for the same resource with the client's auth:
    • Upstream 200 → auth valid, serve body from cache.
    • Upstream 401/403 → forwarded to client (auth rejected).
    • Upstream 404 → forwarded to client (resource removed upstream).
    • Upstream unreachable → 502 (no degraded fallback).
  • GET (cache miss): forwarded to upstream with auth. Response is tee-streamed to cache.

Performance characteristics:

  • Every cache-hit GET adds one upstream HEAD round-trip (~100-200ms).
  • But the blob body comes from local S3/FS instead of the internet.
  • For large images (1GB+ layers), the HEAD overhead is negligible compared to bandwidth savings.
  • HEAD requests from clients are always forwarded to upstream (no cache benefit for HEAD).

Configuration

All configuration is via environment variables.

| Variable | Default | Description |
| --- | --- | --- |
| PROXY_MODE | (required) | transparent or authenticated. See Proxy modes. |
| STORAGE_BACKEND | s3 | Storage backend: s3 or fs. |
| LISTEN_ADDR | :8080 (:8443 with TLS) | Listen address. |
| GENERATE_SELF_SIGNED_TLS | false | Generate a self-signed TLS certificate on startup. |
| LOG_LEVEL | info | debug, info, warn, error. |
| CACHE_TAG_MANIFESTS | true | Cache manifests resolved by tag. |
| CACHE_LATEST_TAG | false | Cache the latest tag. |

S3 backend

| Variable | Default | Description |
| --- | --- | --- |
| S3_BUCKET | oci-cache | Bucket name. Auto-created. |
| S3_PREFIX | -- | Key prefix for all objects. Allows multiple proxy instances to share a bucket. |
| S3_FORCE_PATH_STYLE | true | Path-style S3 URLs. |
| S3_LIFECYCLE_DAYS | 28 | Expire cached objects after this many days. 0 disables. |
| AWS_ACCESS_KEY_ID | -- | Standard SDK credential chain. |
| AWS_SECRET_ACCESS_KEY | -- | Standard SDK credential chain. |
| AWS_REGION | -- | Standard SDK credential chain. |
| AWS_ENDPOINT_URL | -- | S3-compatible endpoint override. |

Credentials, region, and endpoint are resolved through the standard AWS SDK default credential chain. IAM instance profiles, ECS task roles, and ~/.aws/credentials all work as expected.

Shared buckets

Multiple proxy instances (each fronting a different upstream registry) can share a single S3 bucket by setting S3_PREFIX:

# Instance 1: ghcr.io proxy
S3_BUCKET=oci-cache S3_PREFIX=ghcr UPSTREAM_REGISTRY=https://ghcr.io ...

# Instance 2: Docker Hub proxy
S3_BUCKET=oci-cache S3_PREFIX=dockerhub UPSTREAM_REGISTRY=https://registry-1.docker.io ...

Objects are stored under {prefix}/blobs/... and {prefix}/manifests/.... The lifecycle policy is scoped to the prefix, so each instance manages its own expiry independently.

Filesystem backend

| Variable | Default | Description |
| --- | --- | --- |
| FS_ROOT | /data/oci-cache | Root directory for cache. |

Objects are stored as files with .meta.json sidecar files containing content metadata and the full set of upstream response headers. Writes are atomic (temp file + rename). The S3 backend uses the same .meta.json sidecar pattern (stored as a separate S3 object alongside the data object) for parity between backends.

Running

Docker Compose (development)

The included docker-compose.yml runs the proxy with SeaweedFS as an S3-compatible backend:

docker compose up

The proxy is available on localhost:8080. SeaweedFS provides S3 on port 8333.

Container image

Images are built with ko using a gcr.io/distroless/static-debian12:nonroot base image.

Build a local image:

KO_DOCKER_REPO=ko.local ko build ./cmd/oci-pull-through

Run it:

docker run -p 8080:8080 \
  -e STORAGE_BACKEND=s3 \
  -e AWS_ENDPOINT_URL=http://your-s3:9000 \
  -e AWS_ACCESS_KEY_ID=access \
  -e AWS_SECRET_ACCESS_KEY=secret \
  ko.local/oci-pull-through

Binary

go build -o oci-pull-through ./cmd/oci-pull-through
STORAGE_BACKEND=fs FS_ROOT=/var/cache/oci ./oci-pull-through

Health check

GET /healthz returns 200 OK when the server is accepting connections.

For scratch containers (no shell, no curl), the binary includes a built-in health check client:

oci-pull-through -healthcheck

This is what the Docker Compose healthcheck uses. Exit code 0 on success, 1 on failure.

API endpoints

| Method | Path | Description |
| --- | --- | --- |
| GET | /healthz | Health check. |
| GET | /v2/ | OCI version check. |
| GET, HEAD | /v2/{reg}/{name}/manifests/{ref} | Manifest. |
| GET, HEAD | /v2/{reg}/{name}/blobs/{digest} | Blob. |
| GET | /v2/{reg}/{name}/referrers/{digest} | Referrers (proxied to upstream). |

The proxy supports multi-segment image names (e.g., /v2/ghcr.io/org/sub/image/manifests/latest).

docker.io is automatically resolved to registry-1.docker.io for upstream requests.

Protocol

By default the proxy serves both HTTP/1.1 and cleartext HTTP/2 (h2c) on the same port. TLS termination is expected to be handled by a reverse proxy or load balancer in front of this service.

Self-signed TLS

Setting GENERATE_SELF_SIGNED_TLS=true generates an in-memory ECDSA P-256 self-signed certificate on startup (valid for 10 years, with SANs for localhost, host.docker.internal, 127.0.0.1, and ::1). The server switches to HTTPS with HTTP/2 and the default listen address changes to :8443.

This is useful for local development where the Docker daemon requires HTTPS to pull from a registry. No certificate files are written to disk.

Docker Desktop (macOS / Windows)

On Docker Desktop the daemon runs inside a Linux VM. The VM's loopback (127.0.0.1 / [::1]) does not reach the host, so localhost:8443 will not work. Use host.docker.internal instead:

docker pull host.docker.internal:8443/docker.io/library/postgres:13

You will also need to add the registry to Docker's insecure registries list (since the certificate is self-signed). In Docker Desktop go to Settings → Docker Engine and add:

{ "insecure-registries": ["host.docker.internal:8443"] }

Then apply and restart Docker Desktop.

Authorization headers from the client are forwarded to the upstream registry as-is. The proxy does not perform authentication or token exchange. If your upstream registry requires authentication, the client must provide valid credentials.

Signals

The process handles SIGINT and SIGTERM for graceful shutdown with a 30-second drain timeout.
