A deep learning application that predicts molecular melting points from SMILES notation using a custom Message Passing Neural Network (MPNN) architecture with PNA and GIN layers.
Melty Molecule is a molecular property prediction system that uses graph neural networks to predict the melting points of chemical compounds. The model uses a hybrid architecture that combines Principal Neighborhood Aggregation (PNA) and Graph Isomorphism Network (GIN) layers with edge updates and a virtual node for global connectivity.
- Advanced Graph Neural Network: 4-layer MPNN with parallel PNA and GIN operations
- Comprehensive Molecular Features: Atomic properties, bond information, partial charges, and global descriptors
- Interactive Web Interface: User-friendly Streamlit application for real-time predictions
- Robust Feature Engineering: Automatic extraction of 100+ molecular descriptors
- Optimized Performance: Cached model loading and efficient inference
- High Accuracy: Trained on curated molecular datasets with standardized targets
- Message Passing Layers
  - 4 parallel layers of PNA (Principal Neighborhood Aggregation)
  - 4 parallel layers of GIN (Graph Isomorphism Network)
  - Edge feature updates at each layer
  - Virtual node for global graph information
- Feature Processing
  - Node features: Atomic properties (atomic number, degree, hybridization, aromaticity, etc.)
  - Edge features: Bond type, conjugation, ring membership
  - Graph features: 100+ molecular descriptors including topological, electronic, and physicochemical properties
  - Partial charge computation using RDKit
- Output Layer
  - Regression head for melting point prediction
  - Standardized output with mean/std normalization
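
The mean/std normalization of the output can be illustrated with a minimal sketch. The target values are made-up placeholders, and `fit_target_stats`, `standardize`, and `unstandardize` are hypothetical helper names, not functions from this repository.

```python
from statistics import fmean, pstdev

def fit_target_stats(targets):
    """Return (mean, std) of the training targets."""
    return fmean(targets), pstdev(targets)

def standardize(value, mean, std):
    """Map a raw melting point to a zero-mean, unit-variance scale."""
    return (value - mean) / std

def unstandardize(value, mean, std):
    """Map a standardized model output back to degrees Celsius."""
    return value * std + mean

# Placeholder melting points in Celsius (illustrative only, not the training data)
train_targets = [-114.1, 5.5, 135.0]
y_mean, y_std = fit_target_stats(train_targets)

z = standardize(5.5, y_mean, y_std)
restored = unstandardize(z, y_mean, y_std)
print(f"standardized: {z:.3f}, restored: {restored:.1f} C")
```

The same mean and standard deviation have to be reused at inference time, otherwise the unstandardized prediction ends up on the wrong scale.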
SMILES → RDKit Molecule → Graph Representation → Feature Extraction → GNN → Melting Point

- Installation
- Quick Start
- Usage
- Project Structure
- Dataset
- Model Details
- Training
- API Reference
- Contributing
- References
- License
- Citation
- Python 3.8 or higher
- CUDA-compatible GPU (optional, for faster inference)
- Git
```bash
git clone https://github.com/Divyansh900/melty-molecules
cd melty-molecules
```

Clone the SMILES-to-Graph conversion library:

```bash
git clone https://github.com/Divyansh900/SMILES-to-Graph.git
```

```bash
# Using venv
python -m venv venv

# Activate on Linux/Mac
source venv/bin/activate

# Activate on Windows
venv\Scripts\activate
```

```bash
pip install -r requirements.txt
```

You can download the pre-trained model weights directly from Kaggle:
Option 1: Using Kaggle CLI
```bash
# Install Kaggle CLI if not already installed
pip install kaggle

# Configure Kaggle API credentials (place kaggle.json in ~/.kaggle/)
# Download from: https://www.kaggle.com/settings/account

# Download the model
kaggle models instances versions download divyanshvishwkarma/melty-molecules/pyTorch/default/1
unzip melty-molecule-model.zip -d models/
```

Option 2: Manual Download
- Visit the Kaggle dataset page: https://www.kaggle.com/datasets/divyanshvishwkarma/melty-molecule-model
- Click "Download" to get the model weights
- Extract and place `model_weights.pth` in the `models/` directory
Option 3: Using kagglehub (Recommended)
```python
import kagglehub

# Download latest version
path = kagglehub.model_download("divyanshvishwkarma/melty-molecules/pyTorch/default")
print("Path to model files:", path)
```

```bash
streamlit run app.py
```

The application will open in your default browser at http://localhost:8501

```python
from predict import predict_melting_point

smiles = "CCO"  # Ethanol
melting_point = predict_melting_point(smiles)
print(f"Predicted Melting Point: {melting_point:.2f}°C")
```

- Enter SMILES: Input a valid SMILES string in the text field
- Select Examples: Click on example molecules for quick testing
- Predict: Click the "Predict Melting Point" button
- View Results: See the predicted melting point in Celsius, Kelvin, and Fahrenheit
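
The unit conversions behind the results view are simple arithmetic. The sketch below shows the standard formulas. The function names are illustrative and are not taken from `app.py`.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Kelvin is Celsius shifted by 273.15."""
    return celsius + 273.15

def celsius_to_fahrenheit(celsius: float) -> float:
    """Fahrenheit scales Celsius by 9/5 and adds 32."""
    return celsius * 9.0 / 5.0 + 32.0

predicted_c = -110.5  # example prediction in Celsius
print(f"{predicted_c:.2f} C")
print(f"{celsius_to_kelvin(predicted_c):.2f} K")
print(f"{celsius_to_fahrenheit(predicted_c):.2f} F")
```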
```python
import torch
from model import Model
from smiles_to_graph import SMILESToGraph

# Initialize converter
converter = SMILESToGraph(
    feature_level="Comprehensive",
    include_3d=False,
    include_partial_charges=True,
    include_descriptors=True,
    max_atomic_num=100
)

# Load model
deg = torch.tensor([0, 767, 1422, 511, 71])
model = Model("model_weights.pth", deg)
model.eval()

# Convert SMILES to graph
smiles = "c1ccccc1"  # Benzene
graph_data = converter.to_graph(smiles, normalize_descriptors='standardize')
x, node_features, edge_index, edge_attr, graph_features = graph_data.values()

# Prepare tensors
node_feat = torch.FloatTensor(node_features)
edge_list = torch.LongTensor(edge_index).permute(1, 0)
edge_feat = torch.FloatTensor(edge_attr)
batch = torch.zeros(node_feat.shape[0], dtype=torch.long)

# Predict
with torch.no_grad():
    pred = model(node_feat, edge_list, edge_feat, batch=batch)[0].item()

# Unstandardize
# Y_MEAN and Y_STD are the mean and standard deviation of the training
# targets; they must be defined before this line runs.
pred = pred * Y_STD + Y_MEAN
print(f"Predicted Melting Point: {pred:.2f}°C")
```

```
melty-molecule/
│
├── app.py                 # Streamlit web application
├── model.py               # Model architecture definition
├── requirements.txt       # Python dependencies (minimal)
├── README.md              # Project documentation
│
├── model_weights.pth      # Pre-trained weights (download from Kaggle)
│
└── SMILES-to-Graph/       # Cloned dependency for graph conversion
```

This project uses a minimal set of dependencies for optimal performance:
```
torch>=2.0.0              # Deep learning framework
torch-geometric>=2.3.0    # Graph neural networks
rdkit>=2023.3.1           # Molecular processing and cheminformatics
streamlit>=1.28.0         # Web application framework
numpy>=1.24.0             # Numerical computing
```

All required libraries are listed in `requirements.txt`. The project is designed to work with these core dependencies only, which keeps installation simple and minimizes conflicts.
The model is trained on a curated dataset of molecular structures with experimentally measured melting points.
- Total Molecules: ~26,000+ compounds
- Melting Point Range: -200°C to +400°C
- Feature Dimensionality:
  - Node features: 55 per atom
  - Edge features: 11 per bond
- Public chemical databases (ChEMBL for pretraining, Jean-Claude Bradley Open Melting Point Dataset for fine-tuning)
- Experimental measurements from literature
- Quality-filtered and validated entries
The project uses built-in RDKit functionality for molecular graph conversion:
```python
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors, Descriptors

# Convert SMILES to molecule
smiles = "CCO"  # Ethanol
mol = Chem.MolFromSmiles(smiles)

# Extract features using RDKit
# (the extract_* functions are feature-extraction helpers not shown here)
node_features = extract_atom_features(mol)
edge_features = extract_bond_features(mol)
graph_descriptors = extract_molecular_descriptors(mol)
```

The pre-trained model is hosted on Kaggle for easy access and reproducibility.
- Install Kaggle CLI (if not already installed):

  ```bash
  pip install kaggle
  ```

- Set up Kaggle API credentials:
  - Go to https://www.kaggle.com/settings/account
  - Scroll to the "API" section and click "Create New API Token"
  - This downloads `kaggle.json`
  - Place it in `~/.kaggle/` (Linux/Mac) or `C:\Users\<YourUsername>\.kaggle\` (Windows)
  - Set permissions: `chmod 600 ~/.kaggle/kaggle.json` (Linux/Mac)

- Download the model:

  ```bash
  # Download dataset
  kaggle datasets download -d divyanshvishwkarma/melty-molecule-model

  # Extract
  unzip melty-molecule-model.zip -d models/
  ```
Visit the Kaggle dataset page and download manually:
- Dataset URL: https://www.kaggle.com/datasets/divyanshvishwkarma/melty-molecule-model
- Place `model_weights.pth` in the `models/` directory
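
Before loading the model, it can help to confirm that the weights ended up where the code expects them. The following is a minimal sketch using only the standard library. `find_weights` is a hypothetical helper, and the `models/model_weights.pth` location is assumed from the download steps above.

```python
from pathlib import Path

def find_weights(models_dir="models", filename="model_weights.pth"):
    """Return the weights path, or raise a clear error if the file is missing."""
    path = Path(models_dir) / filename
    if not path.is_file():
        raise FileNotFoundError(
            f"{path} not found; download the weights from Kaggle first"
        )
    return path

try:
    weights_path = find_weights()
    print(f"Found weights at {weights_path}")
except FileNotFoundError as err:
    print(err)
```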
```python
Model(
    node_features=54,        # Varies by molecule
    edge_features=11,        # Varies by molecule
    hidden_dim=512,
    num_layers=4,
    aggregators=['mean', 'sum', 'max'],
    scalers=['identity', 'amplification', 'attenuation'],
    deg=tensor([0, 767, 1422, 511, 71]),  # Degree distribution
    edge_updates=True,
    virtual_node=True,
    dropout=0.1
)
```

| Parameter | Value |
|---|---|
| Learning Rate | 1e-5 |
| Batch Size | 128 |
| Epochs | 20 |
| Optimizer | Adam |
| Weight Decay | 1e-4 |
| Scheduler | CosineAnnealing |
| Loss Function | MSE |
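
To make the `aggregators`, `scalers`, and `deg` settings in the model configuration more concrete, here is a minimal pure-Python sketch of PNA-style aggregation for a single node with scalar messages. It assumes the degree histogram is interpreted as in PyTorch Geometric's `PNAConv` (entry `d` counts training nodes of degree `d`) and that every node has at least one neighbour. The repository's own `Model` internals are not shown here and may differ.

```python
import math

# Degree histogram from the model configuration:
# DEG_HIST[d] = number of training nodes with degree d (assumed interpretation).
DEG_HIST = [0, 767, 1422, 511, 71]

def average_log_degree(hist):
    """delta = mean of log(d + 1) over all nodes counted in the histogram."""
    total = sum(hist)
    return sum(count * math.log(d + 1) for d, count in enumerate(hist)) / total

def pna_aggregate(messages, delta):
    """Combine mean/sum/max aggregators with three degree scalers for one node."""
    degree = len(messages)  # assumed >= 1
    aggregated = {
        "mean": sum(messages) / degree,
        "sum": sum(messages),
        "max": max(messages),
    }
    log_deg = math.log(degree + 1)
    scalers = {
        "identity": 1.0,
        "amplification": log_deg / delta,  # boosts high-degree nodes
        "attenuation": delta / log_deg,    # damps high-degree nodes
    }
    return {
        (agg_name, scaler_name): value * scale
        for agg_name, value in aggregated.items()
        for scaler_name, scale in scalers.items()
    }

delta = average_log_degree(DEG_HIST)
features = pna_aggregate([0.5, -1.0, 2.0], delta)  # messages from 3 neighbours
print(f"delta = {delta:.4f}, combined features = {len(features)}")
```

Three aggregators times three scalers give nine values per input channel, which is why PNA layers are usually followed by a linear projection back to the hidden dimension.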
This repository includes the pre-trained model available for download from Kaggle. The model was trained using the following approach:
```python
# Model was trained with these hyperparameters:
learning_rate = 1e-4
batch_size = 32
epochs = 200
optimizer = "AdamW"
scheduler = "ReduceLROnPlateau"
```

The model was trained on a curated dataset of molecular structures with the following pipeline:
- Data Collection: Gathered ~26,000+ molecules with experimental melting points
- Graph Conversion: SMILES → RDKit Molecule → Graph representation
- Feature Extraction: Comprehensive molecular descriptors using RDKit
- Normalization: Standardized node features and target values
- Training: 4-layer MPNN with PNA and GIN aggregation
- Validation: 10% holdout set for model selection
The trained model checkpoint is available on Kaggle for immediate use.
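
The 10% validation holdout from the pipeline can be sketched as follows. The README does not say how the split was drawn, so the seeded random shuffle is an assumption, and `holdout_split` is a hypothetical helper rather than code from this repository.

```python
import random

def holdout_split(items, holdout_fraction=0.1, seed=0):
    """Shuffle a copy of items and split off a validation holdout."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]

# Placeholder identifiers standing in for SMILES strings
molecules = [f"molecule_{i}" for i in range(100)]
train, valid = holdout_split(molecules)
print(len(train), len(valid))
```

Fixing the seed keeps the validation set stable across runs, so model selection compares checkpoints on the same molecules.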
```python
converter = SMILESToGraph(
    feature_level="Comprehensive",
    include_3d=False,
    include_partial_charges=True,
    include_descriptors=True,
    max_atomic_num=100
)
graph_data = converter.to_graph(smiles, normalize_descriptors='standardize')
```

```python
model = Model(model_path, deg)
model.to(device)
model.eval()

# Forward pass
output = model(node_features, edge_index, edge_attr, batch)
```

```python
def predict_melting_point(smiles: str) -> float:
    """
    Predict melting point from SMILES string

    Args:
        smiles: SMILES notation of molecule

    Returns:
        float: Predicted melting point in Celsius
    """
    pass
```

Contributions are welcome! This project focuses on maintaining a minimal dependency footprint while maximizing functionality.
- Fork the repository
- Create a feature branch (`git checkout -b feature/AmazingFeature`)
- Make your changes (ensure compatibility with existing dependencies)
- Test your changes with the Streamlit app
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
- Maintain compatibility with the minimal dependency set (torch, torch-geometric, rdkit, streamlit, numpy)
- Test changes with the web interface before submitting
- Document any new features or changes in the README
- Follow existing code style and structure

- PyTorch: Deep learning framework
  - Website: https://pytorch.org/
  - Documentation: https://pytorch.org/docs/
- PyTorch Geometric: Graph neural network library
  - Paper: Fast Graph Representation Learning with PyTorch Geometric
  - Repository: https://github.com/pyg-team/pytorch_geometric
  - Documentation: https://pytorch-geometric.readthedocs.io/
- RDKit: Cheminformatics and machine learning toolkit
  - Documentation: https://www.rdkit.org/docs/
  - Repository: https://github.com/rdkit/rdkit
- SMILES-to-Graph: Molecular graph conversion utility
  - Repository: https://github.com/Divyansh900/SMILES-to-Graph
  - Used for comprehensive feature extraction from SMILES
- Streamlit: Web application framework
  - Documentation: https://docs.streamlit.io/
  - Repository: https://github.com/streamlit/streamlit
- NumPy: Numerical computing library
  - Website: https://numpy.org/
  - Documentation: https://numpy.org/doc/

- Message Passing Neural Networks
  - Gilmer, J., et al. (2017). "Neural Message Passing for Quantum Chemistry." ICML.
  - arXiv:1704.01212
- Principal Neighbourhood Aggregation
  - Corso, G., et al. (2020). "Principal Neighbourhood Aggregation for Graph Nets." NeurIPS.
  - arXiv:2004.05718
- Graph Networks for Molecular Property Prediction
  - Yang, K., et al. (2019). "Analyzing Learned Molecular Representations for Property Prediction." JCIM.
  - Paper Link
- SMILES Notation
  - Weininger, D. (1988). "SMILES, a chemical language and information system." JCICS.
  - Paper Link

- PubChem: Public chemical database
  - Website: https://pubchem.ncbi.nlm.nih.gov/
- ChEMBL: Bioactive molecules database
  - Website: https://www.ebi.ac.uk/chembl/
- QM9: Quantum chemistry dataset
  - Website: http://quantum-machine.org/datasets/

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
- Thanks to the PyTorch Geometric team for the excellent GNN framework
- RDKit developers for comprehensive cheminformatics tools
- Divyansh900 for the SMILES-to-Graph conversion library
- The open-source community for various tools and libraries
Divyansh Vishwkarma - @divyanshvishwkarma
Project Link: https://github.com/Divyansh900/melty-molecules
If you use this work in your research, please cite:
```bibtex
@software{melty_molecule_2024,
  author = {Divyansh Vishwkarma},
  title = {Melty Molecule: Deep Learning-Powered Molecular Melting Point Prediction},
  year = {2024},
  url = {https://github.com/Divyansh900/melty-molecules}
}
```

- Add support for boiling point prediction
- Implement uncertainty quantification
- Add batch prediction capability
- Support for additional molecular properties
- Multi-task learning for multiple properties
- Docker containerization
- Enhanced visualization of molecular structures
- Export predictions to CSV/Excel
| Molecule | SMILES | Actual MP (°C) | Predicted MP (°C) | Error (°C) |
|---|---|---|---|---|
| Ethanol | CCO | -114.1 | -110.5 | 3.6 |
| Benzene | c1ccccc1 | 5.5 | 8.2 | 2.7 |
| Aspirin | CC(=O)Oc1ccccc1C(=O)O | 135.0 | 142.1 | 7.1 |
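
The Error column is the absolute difference between the predicted and actual values. The short sketch below recomputes it and averages the three rows into a mean absolute error. Three molecules are far too few to serve as a benchmark, so treat the result as arithmetic on the table, not as an accuracy claim.

```python
# (molecule, actual MP, predicted MP) copied from the table, in Celsius
rows = [
    ("Ethanol", -114.1, -110.5),
    ("Benzene", 5.5, 8.2),
    ("Aspirin", 135.0, 142.1),
]

abs_errors = [abs(predicted - actual) for _, actual, predicted in rows]
mae = sum(abs_errors) / len(abs_errors)

for (name, _, _), err in zip(rows, abs_errors):
    print(f"{name}: {err:.1f} C")
print(f"MAE over {len(rows)} molecules: {mae:.2f} C")
```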
Star this repository if you find it helpful!


