alvaDesc

Calculate, Explore, and Export Molecular Descriptors and Fingerprints

AlvaDesc is a powerful, user-friendly software tool for generating high-quality molecular descriptors and fingerprints for cheminformatics, QSAR/QSPR modeling, read-across, and machine-learning workflows. Offering thousands of 2D and 3D descriptors and multiple fingerprint types, it helps you transform chemical structures into meaningful numerical representations through an intuitive and flexible interface.

Why Use alvaDesc?

Extensive descriptor library: Calculate almost 6,000 well-defined 2D and 3D molecular descriptors covering topological, geometrical, physicochemical and structural classes.

Multiple fingerprint types: Compute customizable path-based and ECFP-style fingerprints along with MACCS keys, providing versatile molecular encodings for similarity searches and predictive modeling.

Reliable and validated implementations: Ensure reproducible descriptor values thanks to thoroughly tested and documented algorithms.

Fast processing of large datasets: Leverage multithreaded execution to deliver the throughput required for intensive computational workflows.

Seamless workflow integration: Work directly with alvaModel, KNIME, and Python to streamline your modeling and analysis workflows.

Open data exchange: Export results to CSV, SDF, or SMILES to easily use your data in external data-science environments.

Flexible for many applications: Molecular descriptors and fingerprints are ideal for QSAR/QSPR modeling, chemical similarity, clustering, read-across, virtual screening and general cheminformatics exploration.

Molecular Descriptors

AlvaDesc 3.0

AlvaDesc calculates almost 6,000 descriptors. More than 4,000 of them are independent of 3-dimensional information, such as constitutional, topological, and pharmacophore descriptors. It includes ETA and atom-type E-state indices together with functional group and fragment counts. Additionally, alvaDesc implements a large number of 3-dimensional descriptors, such as 3D-autocorrelation, Weighted Holistic Invariant Molecular (WHIM), GETAWAY, and Solvent Accessible Surface Area descriptors.

If needed, alvaDesc can calculate partial charges using Gasteiger’s “Partial Equalization of Orbital Electronegativity” (PEOE) method and generate 3D coordinates using Distance Geometry (DG). The initially generated coordinates are then refined through an optimization phase based on the Universal Force Field (UFF).

Molecular Properties, Drug-like and Lead-like Indices

AlvaDesc calculates several model-based physicochemical properties, such as molar refractivity, topological polar surface area (TPSA), molecular volume estimations, three LogP models (Moriguchi, Ghose-Crippen, and Wildman-Crippen octanol-water partition coefficients), a LogP consensus model (LOGPcons), and a LogS aqueous solubility model (ESOL).
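As an illustration of what a model-based property looks like, the sketch below evaluates the linear ESOL equation published by Delaney (2004). It assumes the published coefficients; whether alvaDesc uses exactly this parameterisation is not stated here. The function name and the input values are hypothetical and serve only to show the arithmetic.

```python
def esol_log_s(clogp: float, mol_weight: float,
               rotatable_bonds: int, aromatic_proportion: float) -> float:
    """Estimate log10 of aqueous solubility (mol/L) with Delaney's ESOL equation.

    aromatic_proportion is the fraction of heavy atoms that are aromatic (0..1).
    Coefficients are those of the published model, not taken from alvaDesc.
    """
    return (0.16
            - 0.63 * clogp
            - 0.0062 * mol_weight
            + 0.066 * rotatable_bonds
            - 0.74 * aromatic_proportion)


# Hypothetical input values, chosen only to illustrate the calculation.
example = esol_log_s(clogp=2.0, mol_weight=100.0,
                     rotatable_bonds=1, aromatic_proportion=0.5)
print(f"Estimated LogS: {example:.3f}")
```

The model is a plain weighted sum of four descriptors, which is why it can be computed directly from a descriptor table once the inputs are available.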

To provide a quantitative estimate of the synthetic accessibility of molecules, alvaDesc includes the synthetic accessibility score of drug-like molecules (SAscore).

AlvaDesc provides a substantial list of drug-like and lead-like alerts, including the well-known Lipinski alert index. Because drug-likeness is important when selecting compounds in the early stages of drug discovery, alvaDesc also calculates the quantitative estimate of drug-likeness (QED).
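To make the idea of a drug-like alert concrete, here is a minimal sketch of a Lipinski rule-of-five check. It assumes the commonly cited thresholds (molecular weight above 500, LogP above 5, more than 5 hydrogen-bond donors, more than 10 hydrogen-bond acceptors) and raises an alert when more than one rule is violated. The class and function names are hypothetical, and the exact definition of the alert index in alvaDesc may differ.

```python
from dataclasses import dataclass


@dataclass
class Properties:
    """Hypothetical container for the four rule-of-five inputs."""
    mol_weight: float
    log_p: float
    h_bond_donors: int
    h_bond_acceptors: int


def lipinski_violations(p: Properties) -> int:
    """Count how many of the four rule-of-five thresholds are exceeded."""
    checks = [
        p.mol_weight > 500,
        p.log_p > 5,
        p.h_bond_donors > 5,
        p.h_bond_acceptors > 10,
    ]
    return sum(checks)


def lipinski_alert(p: Properties, max_violations: int = 1) -> bool:
    """Return True when the number of violations exceeds max_violations."""
    return lipinski_violations(p) > max_violations


# Hypothetical property values for two compounds.
small = Properties(mol_weight=350.0, log_p=2.5, h_bond_donors=2, h_bond_acceptors=5)
large = Properties(mol_weight=650.0, log_p=6.2, h_bond_donors=3, h_bond_acceptors=8)
print(lipinski_violations(small), lipinski_alert(small))
print(lipinski_violations(large), lipinski_alert(large))
```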

Molecular Fingerprints

AlvaDesc calculates the MACCS166 fingerprint, the Extended Connectivity Fingerprint (ECFP and ECFPV3), and the Path Fingerprint (PFP), and allows the customisation and calculation of the most widely used hashed molecular fingerprints. The calculation of hashed fingerprints can be tuned not only with respect to fingerprint size, fragment type, and dimensions, but also by defining the atom and bond parameters considered during fragment identification (e.g., atom type, aromaticity, number of attached hydrogen atoms, connectivity).
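The general idea behind a hashed path fingerprint can be sketched in a few lines of Python. This is an illustrative toy, not the alvaDesc algorithm: it assumes the molecule is given as a list of atom symbols plus a list of bonds, enumerates linear paths up to a maximum number of bonds, and hashes each path into a fixed-size bit vector with CRC32. The function name, the input format, and the choice of hash are all assumptions made for this example.

```python
import zlib


def path_fingerprint(atoms, bonds, max_bonds=3, n_bits=64):
    """Toy hashed path fingerprint.

    atoms: list of atom symbols, e.g. ["C", "C", "O"]
    bonds: list of (atom_index, atom_index, bond_symbol) tuples
    Returns (bits, fragments): the bit vector and the set of path strings.
    """
    neighbors = {i: [] for i in range(len(atoms))}
    for a, b, order in bonds:
        neighbors[a].append((b, order))
        neighbors[b].append((a, order))

    fragments = set()

    def extend(path_atoms, path_bonds):
        # Build the token list atom, bond, atom, ... for the current path.
        tokens = [atoms[path_atoms[0]]]
        for order, atom in zip(path_bonds, path_atoms[1:]):
            tokens += [order, atoms[atom]]
        # A path read in either direction is the same fragment.
        forward = "".join(tokens)
        backward = "".join(reversed(tokens))
        fragments.add(min(forward, backward))
        if len(path_bonds) == max_bonds:
            return
        for nxt, order in neighbors[path_atoms[-1]]:
            if nxt not in path_atoms:  # simple paths only, no revisits
                extend(path_atoms + [nxt], path_bonds + [order])

    for start in range(len(atoms)):
        extend([start], [])

    # Hash every fragment to a bit position; collisions are possible.
    bits = [0] * n_bits
    for fragment in fragments:
        bits[zlib.crc32(fragment.encode("utf-8")) % n_bits] = 1
    return bits, fragments


# Ethanol heavy-atom graph: C-C-O
bits, fragments = path_fingerprint(["C", "C", "O"], [(0, 1, "-"), (1, 2, "-")])
print(sorted(fragments))
print(sum(bits), "bits set out of", len(bits))
```

In this sketch, `n_bits` and `max_bonds` play the role of the fingerprint size and fragment dimension, and the atom and bond parameters correspond to what goes into each token: adding aromaticity or hydrogen counts to the atom tokens would make the fragments more specific.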

 

Structural Patterns

AlvaDesc can identify structural pattern occurrences in molecular datasets. A structural pattern refers to a specific arrangement of atoms and bonds within a molecule, identifiable and characterizable as a substructure. These patterns represent recurring molecular features, such as functional groups or distinct atom connectivities. Structural patterns play a pivotal role in cheminformatics, particularly in structure-activity relationship (SAR) studies, pattern recognition, and the prediction of molecular properties and behavior.

In alvaDesc, structural patterns are defined using the SMARTS syntax.

Other Features

A key feature of alvaDesc is its capability to handle both fully connected and non-fully connected molecular structures, such as salts and ionic liquids. All of the descriptor calculation algorithms offer different theoretical approaches for handling such structures.
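To show what "non-fully connected" means in practice, the sketch below splits a molecular graph into its connected components with a breadth-first search. It only detects the separate fragments of a structure such as a salt; it does not reproduce any of the approaches alvaDesc applies to them. The function name and the input format are assumptions for this example.

```python
from collections import deque


def connected_components(n_atoms, bonds):
    """Return the fragments of a molecular graph as sorted lists of atom indices.

    n_atoms: number of atoms in the structure
    bonds: list of (atom_index, atom_index) pairs
    """
    neighbors = [[] for _ in range(n_atoms)]
    for a, b in bonds:
        neighbors[a].append(b)
        neighbors[b].append(a)

    seen = [False] * n_atoms
    components = []
    for start in range(n_atoms):
        if seen[start]:
            continue
        # Breadth-first search from an atom not yet assigned to a fragment.
        seen[start] = True
        queue = deque([start])
        component = []
        while queue:
            atom = queue.popleft()
            component.append(atom)
            for nxt in neighbors[atom]:
                if not seen[nxt]:
                    seen[nxt] = True
                    queue.append(nxt)
        components.append(sorted(component))
    return components


# Sodium acetate heavy atoms: C, C, O, O form the acetate ion; Na is unbonded.
fragments = connected_components(5, [(0, 1), (1, 2), (1, 3)])
print(len(fragments) > 1, fragments)
```

A structure is fully connected when this function returns a single component; salts and ionic liquids return two or more.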

Different tools are provided to carry out a first exploration of your molecular dataset:

  • Molecule structure verification using PubChem and Google Patents services
  • Molecule structure visualisation and filtering
  • Principal Component Analysis (PCA), t-SNE and correlation analysis

Because it can calculate a large number of molecular descriptors, alvaDesc provides variable reduction tools, including the fast V-WSP method (a variable reduction method adapted from space-filling designs).
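The motivation for variable reduction can be illustrated with a much simpler technique than V-WSP: a greedy pairwise-correlation filter. The sketch below is explicitly not the V-WSP algorithm. It keeps a descriptor only if its absolute Pearson correlation with every descriptor already kept stays at or below a threshold. The function names, the threshold, and the descriptor names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def filter_correlated(columns, threshold=0.95):
    """Greedy filter: keep a column if |r| with all kept columns <= threshold.

    columns: dict mapping descriptor name -> list of values (one per molecule).
    Columns are visited in insertion order, so earlier columns take priority.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical descriptor table for four molecules.
table = {
    "MW": [1.0, 2.0, 3.0, 4.0],
    "nAtoms": [2.0, 4.0, 6.0, 8.1],   # almost perfectly correlated with MW
    "logP": [1.0, -1.0, 2.0, 0.0],    # uncorrelated with MW
}
print(filter_correlated(table))
```

The result of a greedy filter depends on column order, which is one reason more principled selection methods exist.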

Platforms

The software is 64-bit and is available for Windows, Linux, and macOS. It is provided both as an easy-to-use command-line tool and as an intuitive graphical interface.

With the release of alvaDesc 3.0, calculations run significantly faster on Apple M-series processors.

Performance comparison of descriptor calculation times for 2D (4,215 descriptors) and all descriptors (5,799 descriptors, including 3D) across three configurations: Intel Mac (x86_64), Apple Silicon via Rosetta 2, and native Apple Silicon. Native Apple Silicon builds significantly reduce computation time, demonstrating the benefits of optimized support for M Series processors.

How to Cite

If you reference alvaDesc in an academic paper or publication, you can find the correct citation for your version by:

  • Running alvaDescGUI and selecting “About alvaDesc” from the menu
  • Using the command alvaDescCLI --cite

Additionally, please consider citing the following papers:

  • Mauri, A. (2020). alvaDesc: A Tool to Calculate and Analyze Molecular Descriptors and Fingerprints. In K. Roy (Ed.), Ecotoxicological QSARs (pp. 801–820). Humana Press Inc. https://doi.org/10.1007/978-1-0716-0150-1_32
  • Mauri, A., & Bertola, M. (2022). Alvascience: A New Software Suite for the QSAR Workflow Applied to the Blood–Brain Barrier Permeability. International Journal of Molecular Sciences, 23(21), 12882. https://doi.org/10.3390/ijms232112882

Related Alvascience resources:

  • The perfect tool to prepare your molecular dataset for alvaDesc is alvaMolecule
  • Create QSAR/QSPR models with alvaModel starting from an alvaDesc project
  • A tutorial showing how to build a QSAR model using Alvascience tools
