astrosica/README.md

Hello, I'm Jess! ✨

I'm a Data Scientist and former Astrophysicist working in Fintech.

👩‍💻 About Me

📈 Data Science Advisor and Educator @ fintech startup
📊 Automate cash forecasting, foreign exchange (FX) hedging, and cash reporting with Python
👩🏼‍🏫 Teach corporate treasury teams (including Fortune 500 companies) about AI, ML, and LLMs
📚 PhD in Astronomy & Astrophysics + HBSc in Astronomy & Physics from the University of Toronto
💬 Taught 500+ technical and non-technical students across 17 classes, including classes on Python
📝 Published several quantitative research papers in high-impact journals
👥 Mentored a student on a year-long data project through to publication
🧵 Fun fact: I cross-stitch realistic astronomy observations on Etsy

🛠️ Skills

Languages: Python (scikit-learn, pandas, NumPy, Matplotlib, seaborn, SciPy), SQL (BigQuery, MySQL)
Tools: Git/GitHub, Jupyter, Streamlit, Docker, APIs (Claude, OpenAI), Tableau

🌐 Contact

Email me at jessicacampbell.astro@gmail.com
Connect with me on LinkedIn at linkedin.com/astrosica

Data Science Portfolio

This repository contains my data science portfolio projects, implemented primarily in Python.

Machine Learning (ML) Projects

Applied ML Projects

End-to-end ML projects in Python including standard ML workflows with an emphasis on comparing multiple algorithms, evaluating tradeoffs, and model interpretability.

  • Predicting Credit Card Approvals: Modelled credit card application approvals using demographic and financial features. Trained logistic regression, KNN, and random forest models to compare performance and interpret key drivers of approval decisions.

Core ML Algorithms

Implementation of foundational ML algorithms in Python, including standard pre-processing, feature engineering, hyperparameter tuning, and model evaluation.

  • Predicting Loan Defaults with Random Forest: Built a random forest model to predict loan default likelihood using financial data. Addressed class imbalance (16% default rate) with hyperparameter tuning (optimizing average precision), threshold optimization (optimizing F2-score), and SMOTE resampling.
  • Classifying Anonymized Data with KNN: Built a KNN model to classify anonymized data into two categories, demonstrating the impact of feature scaling and k-value tuning.
  • Predicting Ad Clicks with Logistic Regression: Modelled ad-click likelihood from demographic and behavioural information using logistic regression. Implemented feature engineering (including cyclical temporal feature mapping), multicollinearity reduction, threshold optimization, and model performance testing.
  • Predicting Synthetic Credit Scores with Linear Regression: Modelled synthetic credit scores from financial and demographic features using linear regression. Implemented feature engineering, correlation analysis, multicollinearity reduction, and statistical significance testing to evaluate feature importance and improve model interpretability.
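
Two of the techniques named in the list above, cyclical temporal feature mapping and F2-based threshold optimization, can be sketched with the Python standard library alone. This is a minimal illustration, not code from the projects: the function names (`encode_cyclical`, `fbeta`, `best_threshold`) and the sample labels and scores are assumptions made up for the example.

```python
import math


def encode_cyclical(value, period):
    """Map a periodic value (e.g. hour of day, period=24) onto the unit circle."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def fbeta(y_true, y_pred, beta=2.0):
    """F-beta score; beta=2 weights recall more heavily than precision."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (1 + beta**2) * tp + beta**2 * fn + fp
    return (1 + beta**2) * tp / denom if denom else 0.0


def best_threshold(y_true, scores, beta=2.0):
    """Try each observed score as a cut-off and keep the one with the highest F-beta."""
    best_t, best_f = 0.5, -1.0
    for t in sorted(set(scores)):
        preds = [1 if s >= t else 0 for s in scores]
        f = fbeta(y_true, preds, beta)
        if f > best_f:
            best_t, best_f = t, f
    return best_t, best_f


# Hypothetical labels and predicted probabilities, for illustration only.
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.10, 0.40, 0.35, 0.20, 0.80, 0.60]
threshold, f2 = best_threshold(y_true, scores)

# Hour 23 and hour 0 end up close together once encoded on the circle.
late, midnight, noon = (encode_cyclical(h, 24) for h in (23, 0, 12))
```

With the sine/cosine encoding, hour 23 sits next to hour 0 instead of 23 units away, which is the point of mapping temporal features cyclically. The threshold search favours recall because F2 penalizes missed positives more than false alarms, which suits an imbalanced problem such as default prediction.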

Data Visualization and Reporting

Data analytics projects using SQL, Tableau, and Excel, focusing on data storytelling through dashboards and reports.

  • Insurance Analysis: Developed an interactive Tableau dashboard to report and analyze 70K insurance claims to support marketing and budget decisions.
  • Marketing Analysis: Analyzed 100K e-commerce sales records using SQL (Google BigQuery) and Excel to uncover trends in customer behaviour, reporting sales and marketing metrics using an interactive Tableau dashboard.
  • TTC Delay Analysis: Cleaned and analyzed 40K subway delay records for 2022-2023 using SQL and Tableau, assessing YoY KPIs and delay causes and providing performance improvement recommendations.

Pinned

  1. data-science-portfolio (Public)

    Jupyter Notebook 5

  2. W3-DIT (Public)

    User-based automated data pipeline for data management and analysis of DIT photometry of the W3 giant molecular cloud.

    Python