Running FeatureSpacePlot[mols] gives me an interesting plot and I would like to understand the method.
Does anyone know, or can anyone point me to a reference page describing, how features are extracted for molecules?
The default feature extraction for molecules is the extended-connectivity fingerprint (ECFP), also known as the "circular fingerprint" or "Morgan fingerprint". You can access it via the undocumented function:
    Chemistry`MoleculeFeatureVector[Molecule["glucose"], "ExtendedConnectivity-v1"]
    (* NumericArray[...] *)

or via the MoleculeFingerprints paclet:

    PacletInstall["WolframChemistry/MoleculeFingerprints"];
    Needs["WolframChemistry`MoleculeFingerprints`"];
    WolframChemistry`MoleculeFingerprints`ExtendedConnectivityFingerprint[Molecule["glucose"]]
    (* NumericArray[...] *)

In both cases the output is a bit vector. It is expected that in version 15.0 this function will be incorporated into the system as MoleculeFingerprint.
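To make "circular" concrete, here is a toy sketch of the Morgan-style iteration behind such fingerprints. It is not the paclet's implementation: the helper names (neighbors, morganStep, circularFingerprint), the radius 2 and the 64-bit length are hypothetical choices of mine, and the initial atom identifier is just the element symbol, whereas ECFP proper starts from a richer set of atom invariants and also uses bond information. The idea shown is that each atom's identifier is repeatedly rehashed together with its neighbors' identifiers, so after k iterations it describes the neighborhood of radius k, and every identifier seen is folded into a fixed-length bit vector.

```mathematica
(* Hypothetical toy input: heavy atoms of ethanol, C-C-O, as labels plus an undirected edge list *)
labels = {"C", "C", "O"};
edges = {{1, 2}, {2, 3}};

(* Neighbor indices of each atom, read off the edge list in both directions *)
neighbors[n_, e_] := Table[Join[Cases[e, {i, j_} :> j], Cases[e, {j_, i} :> j]], {i, n}]

(* One Morgan iteration: each atom's new identifier is a hash of its old identifier
   together with the sorted identifiers of its neighbors *)
morganStep[ids_, nbrs_] := MapThread[Hash[{#1, Sort[ids[[#2]]]}] &, {ids, nbrs}]

(* Collect identifiers from iterations 0..radius and fold them into an nBits-long 0/1 vector *)
circularFingerprint[lab_, e_, radius_, nBits_] := Module[{nbrs, ids, all},
  nbrs = neighbors[Length[lab], e];
  ids = Hash /@ lab; (* iteration 0: hash of the element symbol only *)
  all = NestList[morganStep[#, nbrs] &, ids, radius];
  (* set bit (identifier mod nBits) + 1 for every identifier seen *)
  ReplacePart[ConstantArray[0, nBits], Thread[(Mod[Flatten[all], nBits] + 1) -> 1]]
]

fp = circularFingerprint[labels, edges, 2, 64]
```

With radius 0 only element-type bits are set; each further iteration adds identifiers for larger atom neighborhoods, which is where the name "circular fingerprint" comes from.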
Here is a step-by-step recreation of what FeatureSpacePlot does for molecules: fingerprints, per-feature standardization, t-SNE reduction to 2D, then plotting. Unfortunately, I don't think this pipeline is very well documented.
    Needs["WolframChemistry`MoleculeFingerprints`"]

    mols = Molecule /@ {"decane", "L-glucose", "D-glucose", "L-galactose", "D-galactose",
        "L-fructose", "D-fructose", "nitrogen monoxide", "nitrogen dioxide", "nitric acid"};

    (* Calculate molecular fingerprints *)
    fingerprints = WolframChemistry`MoleculeFingerprints`ExtendedConnectivityFingerprint /@ mols // Normal;

    (* Standardize features column by column; a column that raises a message
       (e.g. a bit that is never set, so the scale is 0) falls back to zeros *)
    standardized = Transpose[
       Quiet[Check[Standardize[#, Mean, Max], ConstantArray[0., Length[#]]]] & /@
        Transpose@N@fingerprints];

    (* t-SNE reduction to 2D *)
    res = DimensionReduce[standardized, 2, Method -> "TSNE", PerformanceGoal -> "Quality"];

    (* Plotting: FeatureSpacePlot in gray, the manual reconstruction on top *)
    Show[
     FeatureSpacePlot[Legended[mols, "FeatureSpacePlot"], PlotStyle -> GrayLevel[0.8],
      LabelingFunction -> Function[mol, mol["MolecularFormula"]],
      PlotMarkers -> {Automatic, Large}],
     ListPlot[Legended[res, "ExtendedConnectivityFingerprint \[Rule] DimensionReduce"]]
    ]
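The standardization line is the least obvious step, so here it is in isolation on a tiny made-up fingerprint matrix (3 molecules by 4 bits, where bit 4 is never set). The helper name standardizeColumns and the toy data are hypothetical; the body is the same expression used in the recreation. Each bit column is shifted by its Mean and scaled with Max, and the Quiet/Check wrapper replaces any column whose standardization emits a message (such as the all-zero column, whose scale is 0) with zeros instead of letting Indeterminate values reach DimensionReduce.

```mathematica
(* Hypothetical toy fingerprints: 3 molecules x 4 bits; bit 4 is never set *)
toy = {{1, 0, 1, 0}, {1, 1, 0, 0}, {1, 0, 0, 0}};

(* Same per-column step as in the recreation: transpose so each bit is a row,
   standardize it with Mean/Max, fall back to zeros on any message, transpose back *)
standardizeColumns[fps_] := Transpose[
  Quiet[Check[Standardize[#, Mean, Max], ConstantArray[0., Length[#]]]] & /@
   Transpose@N@fps]

standardizeColumns[toy]
```

Bits that are set in every molecule or in none carry no information for the embedding and end up as zero columns, so only the varying bits influence the t-SNE layout.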