
Hi, I'm Erin Young.


Bioinformatician | Regional Technical Lead | Open Source Contributor

I am a Senior Data Scientist and Technical Lead specializing in high-throughput genomic data engineering. Currently, I serve as the Bioinformatics Regional Resource for the Mountain Region, architecting scalable, reproducible workflows for public health surveillance.

My work focuses on workflow orchestration (Nextflow), containerization (Docker/Singularity), and cloud infrastructure (AWS) to turn petabytes of raw sequencing data into actionable epidemiological insights.


Technical Stack

  • Languages: Python (pandas, SciPy, pysam), R, Groovy, Bash
  • Workflow Orchestration: Nextflow (DSL2), Snakemake, WDL
  • Infrastructure: Docker, Singularity, AWS (Batch, S3, HealthOmics), GitHub Actions
  • Data Engineering: ETL pipeline design, algorithmic benchmarking, metadata governance

Featured Projects

Role: Lead Architect & Maintainer

The standard-of-care SARS-CoV-2 sequencing pipeline used by the CDC and public health laboratories across the US.

  • Tech: Nextflow, Docker, Singularity, AWS Batch.
  • Scale: Orchestrates alignment, variant calling, and lineage classification for thousands of concurrent samples.
  • Impact: CLIA-validated and deployed for real-time genomic surveillance.

Role: Lead Maintainer

A command-line tool for unsupervised machine learning in genomic epidemiology.

  • Tech: Python, scikit-learn (PCA, silhouette analysis), fastcluster.
  • ML Features: Uses Auto-K optimization to mathematically identify lineage thresholds and PCA for cluster validation.
  • Performance: Optimized $O(N^2)$ clustering for large-scale distance matrices.
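
As a rough illustration of the silhouette-based idea mentioned above, the sketch below picks a cluster count K from a precomputed distance matrix by trying several values and keeping the one with the highest mean silhouette score. It is not the tool's actual implementation: the function names (`average_linkage`, `silhouette`, `auto_k`), the toy SNP distance matrix, and the choice of average linkage are all assumptions made for this example, and the pure-Python loops are far slower than an optimized library such as fastcluster.

```python
from itertools import combinations


def average_linkage(dist, k):
    """Toy agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the two candidate clusters.
            d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d /= len(clusters[a]) * len(clusters[b])
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters[b]
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(dist, labels):
    """Mean silhouette score for a labeling of a precomputed distance matrix."""
    groups = {}
    for i, lab in enumerate(labels):
        groups.setdefault(lab, []).append(i)
    total = 0.0
    for i, lab in enumerate(labels):
        own = groups[lab]
        if len(own) == 1:
            continue  # singleton clusters score 0 by convention
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in groups.items()
            if other != lab
        )
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(labels)


def auto_k(dist, k_max):
    """Return the K in [2, k_max] with the highest silhouette score, plus all scores."""
    scores = {
        k: silhouette(dist, average_linkage(dist, k)) for k in range(2, k_max + 1)
    }
    return max(scores, key=scores.get), scores


# Hypothetical SNP distances: samples 0-2 and 3-5 form two tight groups.
toy_snp_dist = [
    [0, 1, 2, 20, 21, 22],
    [1, 0, 1, 21, 20, 21],
    [2, 1, 0, 22, 21, 20],
    [20, 21, 22, 0, 1, 2],
    [21, 20, 21, 1, 0, 1],
    [22, 21, 20, 2, 1, 0],
]

if __name__ == "__main__":
    best_k, scores = auto_k(toy_snp_dist, k_max=4)
    print(best_k, scores)
```

On this toy matrix the search settles on two clusters, since splitting either tight group further lowers the silhouette score. A production version would more plausibly rely on library routines for the linkage and scoring steps.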

Role: Core Maintainer

A community-driven repository for reproducible bioinformatics containers.

  • Tech: Docker, GitHub Actions CI/CD.
  • Impact: Solves the "it works on my machine" problem by providing version-controlled, public-health-grade images.


Pinned

  1. StaPH-B/docker-builds (Public)

    📦 🐳 Dockerfiles and documentation on tools for public health bioinformatics

    Dockerfile · 222 stars · 137 forks

  2. UPHL-BioNGS/Cecret (Public)

    Reference-based consensus creation

    Nextflow · 60 stars · 29 forks

  3. UPHL-BioNGS/Donut_Falls (Public)

    Assembly of Nanopore Sequencing

    Nextflow · 17 stars · 7 forks

  4. update_mash_dist (Public)

    mash works best when given a mash dist file with a bunch of references.

    3 stars · 1 fork

  5. MinkeMap (Public)

    A Python-based Circular Genome Visualization Tool

    Python · 2 stars

  6. heatcluster (Public template)

    Forked from DrB-S/heatcluster

    Creates a heat map with an accompanying cluster map for a SNP matrix

    Python