This model is part of the paper "Representation learning for multi-modal spatially resolved transcriptomics data".
Authors: Kalin Nonchev, Sonali Andani, Joanna Ficek-Pascual, Marta Nowak, Bettina Sobottka, Tumor Profiler Consortium, Viktor Hendrik Koelzer, and Gunnar Rätsch
The preprint is available here.
- [03.2026] "Towards Cross-Sample Alignment for Multi-Modal Representation Learning in Spatial Transcriptomics" will be presented at the ICLR 2026 workshop on Learning Meaningful Representations of Life
- [09.2026] AESTETIK now supports multi-modal (e.g., H&E images, spatial transcriptomics) and cross-sample integration using Harmony, scVI, etc.
- [08.2024] AESTETIK secured 1st place at the Mammoth International Contest On Omics Sciences in Europe 2024, organized by China National GeneBank, BGI Genomics, MGI, and CODATA (link).
NEW version (June 2025)
- UPDATE: Rewrote AESTETIK using the Lightning framework for improved modularity
- Added: New `fit()`/`predict()` API
- Added: Support for processing multiple samples at once
- Removed: Multiple old methods and parameters in AESTETIK
See full changelog for more details.
Do you want to gain a multi-modal understanding of key biological processes through spatial transcriptomics?
We introduce AESTETIK, a convolutional autoencoder that jointly integrates transcriptomics and morphology information at the spot level with topology at the neighborhood level, learning accurate spot representations that capture biological complexity.
Fig. 1 AESTETIK integrates spatial, transcriptomics, and morphology information to learn accurate spot representations. A: Spatial transcriptomics enables in-depth molecular characterization of samples on a morphology and RNA level while preserving spatial location. B: Workflow of AESTETIK. Initially, the transcriptomics and morphology spot representations are preprocessed. Next, a dimensionality reduction technique (e.g., PCA) is applied. Subsequently, the processed spot representations are clustered separately to acquire labels required for the multi-triplet loss. Afterwards, the modality-specific representations are fused through concatenation and the grid per spot is built. This is used as an input for the autoencoder. Lastly, the spatial-, transcriptomics-, and morphology-informed spot representations are obtained and used for downstream tasks such as clustering, morphology analysis, etc.
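The per-modality preprocessing in the workflow above (dimensionality reduction, separate clustering for the multi-triplet loss labels, then fusion by concatenation) can be sketched with scikit-learn stand-ins. This is a minimal illustration of the data flow, not AESTETIK's actual implementation; the array names, component counts, and cluster numbers are arbitrary assumptions.

```python
# Sketch of the modality-specific preprocessing described in Fig. 1B,
# using scikit-learn as a stand-in for AESTETIK's internal steps.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
transcriptomics = rng.normal(size=(200, 50))  # spots x genes (toy data)
morphology = rng.normal(size=(200, 32))       # spots x image features (toy data)

# 1. Dimensionality reduction per modality (e.g., PCA).
tr_pca = PCA(n_components=10, random_state=0).fit_transform(transcriptomics)
mo_pca = PCA(n_components=10, random_state=0).fit_transform(morphology)

# 2. Cluster each modality separately to obtain the labels
#    required for the multi-triplet loss.
tr_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(tr_pca)
mo_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(mo_pca)

# 3. Fuse the modality-specific representations through concatenation;
#    the per-spot grid built from this is the autoencoder input.
fused = np.concatenate([tr_pca, mo_pca], axis=1)
print(fused.shape)  # (200, 20)
```

In the real model, the fused representations are arranged into a spatial grid per spot before being passed to the autoencoder; the concatenation above only shows the fusion step.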
We can install aestetik directly through pip:

```bash
pip install aestetik
```

We can also create a conda environment with the required packages:

```bash
conda env create --file=environment.yaml
```

We can also install aestetik offline:

```bash
git clone https://github.com/ratschlab/aestetik
cd aestetik
python setup.py install
```

NB: Please ensure you have installed pyvips according to your machine's requirements. We suggest installing pyvips through conda:

```bash
conda install conda-forge::pyvips
```

Please take a look at our example to get started with AESTETIK.
Here is another example notebook with simulated spatial transcriptomics data.
- Justina Dai, Kalin Nonchev, V. Koelzer, and Gunnar Rätsch "Towards Cross-Sample Alignment for Multi-Modal Representation Learning in Spatial Transcriptomics." bioRxiv (2026). DOI
- Kalin Nonchev, Glib Manaiev, V. Koelzer, and Gunnar Rätsch "DeepSpot2Cell: Predicting Virtual Single-Cell Spatial Transcriptomics from H&E images using Spot-Level Supervision." bioRxiv (2025). DOI
- Liping Kang, Qinglong Zhang, Fan Qian, Junyao Liang, and Xiaohui Wu "Benchmarking computational methods for detecting spatial domains and domain-specific spatially variable genes from spatial transcriptomics data." Nucleic Acids Research (2025). DOI
- Kalin Nonchev, Sebastian Dawo, Karina Silina, H. Moch, S. Andani, Tumor Profiler Consortium, V. H. Koelzer, and Gunnar Rätsch "DeepSpot: Leveraging Spatial Context for Enhanced Spatial Transcriptomics Prediction from H&E Images." medRxiv (2025). DOI
This list is automatically updated weekly via GitHub Actions using the Semantic Scholar and OpenCitations APIs.
- DeepSpot — Predicts spatial transcriptomics from H&E images at spot-level (Visium) and single-cell (Xenium) resolution. Uses AESTETIK for cross-sample integration.
- DeepSpot2Cell — Predicts virtual single-cell spatial transcriptomics from H&E images using spot-level supervision.
In case you found our work useful, please consider citing us:
```bibtex
@article{nonchev2024representation,
  title={Representation learning for multi-modal spatially resolved transcriptomics data},
  author={Nonchev, Kalin and Andani, Sonali and Ficek-Pascual, Joanna and Nowak, Marta and Sobottka, Bettina and Tumor Profiler Consortium and Koelzer, Viktor Hendrik and Raetsch, Gunnar},
  journal={medRxiv},
  pages={2024--06},
  year={2024},
  publisher={Cold Spring Harbor Laboratory Press}
}
```

The code for reproducing the paper results can be found here.
In case you have questions, please get in touch with Kalin Nonchev.


