gassantos/README.md

👋 Hi, I'm Gustavo Alexandre

I am pursuing a PhD in Computing at Fluminense Federal University (UFF). I am a senior Data Science professional with expertise in Machine Learning, Data Analytics, and AI/LLM applications. I combine strong technical foundations with practical experience in both government and academic environments, working on projects involving data-driven decision making, NLP, and explainable AI. My current focus is optimizing LLM fine-tuning for energy efficiency and building interpretable ML solutions.

📌 Current Focus:

  • 🔬 Researching LLM Fine-tuning with focus on Energy Efficiency and Green AI
  • 🔭 Building AI/Analytics solutions at TCERJ for Auditing & Control (Government sector)
  • 👨‍🏫 Teaching Data Analytics at ESPM and Data Science in UFF's graduate program
  • 🌱 Mastering Cloud Computing (Azure & GCP)
  • 💻 Interested in Self-Analytics, NLP, and Explainable AI

🚀 Featured Projects

Hyperparameter Optimization for Language Models

Reproducible implementation of BERT-PLI for exhaustive hyperparameter grid search, with parallel GPU execution, resource tracking, and automatic result analysis. Runs hundreds of hyperparameter combinations with energy monitoring.

Tech Stack: Python, PyTorch, Transformers, CodeCarbon, Weights & Biases
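
As a rough illustration of what an exhaustive grid search enumerates, here is a minimal sketch in plain Python. It is not taken from the project: the `expand_grid` helper and the search-space values are hypothetical, and the training and energy-monitoring steps are left out.

```python
from itertools import product


def expand_grid(search_space):
    """Return every combination of the listed values as a list of dicts."""
    names = list(search_space)
    value_lists = [search_space[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


# Hypothetical search space; names and values are illustrative only.
search_space = {
    "learning_rate": [1e-5, 3e-5, 5e-5],
    "batch_size": [16, 32],
    "epochs": [2, 3],
}

configs = expand_grid(search_space)
print(len(configs))  # 3 * 2 * 2 = 12 combinations
```

Each dict in `configs` is one candidate run. The grid grows multiplicatively with every added hyperparameter, which is why parallel execution and resource tracking matter once the count reaches the hundreds.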

Framework for Evolving Decision Trees

Python library for evolving decision trees with genetic algorithms, enabling automatic optimization of interpretable models.

Tech Stack: Python, Scikit-Learn, NumPy, Genetic Algorithms

Legal Data Preprocessing Pipeline

Modular preprocessing pipeline for legal texts in C++, with parallel orchestration via a dependency graph. Implements sequential and parallel execution with data partitioning, achieving a speedup of up to 5.24x.

Tech Stack: C++17, CMake, Makefile
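
The project itself is written in C++. The sketch below uses Python only to illustrate the general idea of ordering tasks in a dependency graph with a priority queue (Kahn's algorithm with a min-heap). The task names, priorities, and the `schedule` function are hypothetical and are not taken from the repository.

```python
import heapq


def schedule(tasks, deps, priority):
    """Order tasks so prerequisites run first; among ready tasks,
    the one with the lowest priority number runs first."""
    indegree = {t: 0 for t in tasks}
    children = {t: [] for t in tasks}
    for task, prereqs in deps.items():
        for p in prereqs:
            indegree[task] += 1
            children[p].append(task)

    # Min-heap of (priority, task) for tasks with no pending prerequisites.
    ready = [(priority[t], t) for t in tasks if indegree[t] == 0]
    heapq.heapify(ready)

    order = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                heapq.heappush(ready, (priority[child], child))

    if len(order) != len(tasks):
        raise ValueError("dependency cycle detected")
    return order


# Hypothetical preprocessing stages and priorities.
tasks = ["load", "clean", "tokenize", "stats", "merge"]
deps = {"clean": ["load"], "tokenize": ["clean"],
        "stats": ["clean"], "merge": ["tokenize", "stats"]}
priority = {"load": 0, "clean": 0, "tokenize": 1, "stats": 2, "merge": 0}

order = schedule(tasks, deps, priority)
print(order)
```

In this example `tokenize` and `stats` become ready at the same time, so a parallel executor could run them concurrently; the heap only decides which one is dispatched first.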

LLM Fine-tuning System with Energy Monitoring

Complete framework for fine-tuning language models (LLaMA 3.2 3B) with LoRA, including advanced data preprocessing, synchronized energy-consumption monitoring, and CO₂ emissions tracking.

Tech Stack: Python, LangChain, HuggingFace, PyTorch, Transformers, CodeCarbon


💻 Technology Stack

Python & ML Libraries:

LLM & NLP Frameworks: LangChain, HuggingFace, Transformers

Databases & Data Tools:

Web & Visualization: Flask, Plotly, Dash, SQL

Cloud & DevOps: Azure, GCP, Docker

Systems & Low-Level:

Monitoring & Analytics: CodeCarbon, Weights & Biases


🎯 Areas of Expertise

| Area | Description | Technologies |
|---|---|---|
| Machine Learning | Model development, feature engineering, hyperparameter optimization | Scikit-learn, XGBoost, LightGBM, DecisionTree |
| Deep Learning | Neural networks, transfer learning, fine-tuning | TensorFlow, PyTorch, Transformers |
| NLP & LLMs | Text processing, tokenization, LLM fine-tuning, prompt engineering | HuggingFace, LangChain, Gemini, LLaMA |
| Data Analysis | Exploratory analysis, statistical inference, dashboarding | Pandas, NumPy, Plotly, Dash |
| Explainable AI | Model interpretability, SHAP, feature importance | LIME, SHAP, TreeExplainer |
| Green AI | Energy-efficient training, carbon footprint tracking, sustainable ML | CodeCarbon, Energy monitoring |
| Data Engineering | ETL pipelines, data cleaning, preprocessing | Python, SQL, Apache tools |

📚 Publications & Content

📖 Medium: @gassanttos

Check out my articles on Machine Learning, Data Science, and AI topics!


🤝 Get in Touch

LinkedIn GitHub Medium


🎓 Currently Learning

  • 💚 Sustainable AI and Green Computing practices
  • 🔬 Advanced LLM architectures and techniques
  • 🧬 Graph Neural Networks and knowledge graphs
  • ☁️ Cloud Computing (Azure & GCP)

Open to collaborations on ML/NLP projects, open-source contributions, and knowledge sharing!

Pinned

  1. gridsearch-skyband (Public)

    The GridSearch Skyband project optimizes language model pipelines through a reproducible PyTorch implementation. It automates exhaustive hyperparameter searches and parallel GPU execution while mon…

    Python

  2. finetuning-energy (Public)

    LLM fine-tuning system focused on energy efficiency.

    Python

  3. graph_priority_queue (Public)

    Legal-text pipeline that uses a priority queue to schedule tasks in a dependency graph.

    C++

  4. evolvedtree (Public)

    A machine learning model that combines two computational intelligence approaches: Genetic Algorithm and Decision Tree. The name of the model (EvolveDTree) is an acronym for "Evolved Decision …

    Jupyter Notebook 6 5