Function Repository Resource:

MoleculeFingerprintSimilarity

Measure the similarity between two molecules

Contributed by: Jason Biggs

ResourceFunction["MoleculeFingerprintSimilarity"][mol1,mol2]

returns the fingerprint similarity between molecules mol1 and mol2.

Details and Options

ResourceFunction["MoleculeFingerprintSimilarity"] first encodes each molecule as a bit vector (a string of bits, each either on or off) and then computes the similarity between the resulting bit vectors.
ResourceFunction["MoleculeFingerprintSimilarity"] takes the following options (name, default value, description):
"FingerprintType"      "RDKit"       the algorithm to use when encoding the molecule
"SimilarityMeasure"    "Tanimoto"    the bit vector similarity measure to use
The option "FingerprintType" can be any of the following:
"AtomPairs"              atoms are typed by atomic number, number of pi electrons, and vertex degree; every pair of atom types, together with the distance between them, is hashed and the corresponding bit in the fingerprint is set
"MACCSKeys"              166-bit structural key descriptors in which each bit is associated with a SMARTS pattern
"MorganConnectivity"     extended-connectivity fingerprints; atoms are typed by atomic number, heavy-atom degree, mass number, and ring membership, and the neighborhood around each atom is used to set the bits
"MorganFeatures"         atoms are typed by chemical features, such as H-bond acceptor/donor, aromaticity, acidity, etc.
"TopologicalTorsions"    similar to "AtomPairs", but all sets of four consecutively bonded atoms, rather than pairs of atoms, are used to generate the bits
"RDKit"                  identifies all subgraphs within a particular range of sizes, hashes each subgraph to generate a raw bit ID, reduces that raw bit ID modulo the assigned fingerprint size, and then sets the corresponding bit
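The option "SimilarityMeasure" can be any of the following, where $a_o$ denotes the number of on-bits in the bit vector $a$ and $a \& b$ is the bitwise AND of $a$ and $b$:
"Asymmetric"       $(a \& b)_o / \min(a_o, b_o)$
"BraunBlanquet"    $(a \& b)_o / \max(a_o, b_o)$
"Cosine"           $(a \& b)_o / \sqrt{a_o b_o}$
"Dice"             $2 (a \& b)_o / (a_o + b_o)$
"Kulczynski"       $(a \& b)_o (a_o + b_o) / (2 a_o b_o)$
"McConnaughey"     $\left((a \& b)_o (a_o + b_o) - a_o b_o\right) / (a_o b_o)$
"Russel"           $(a \& b)_o / a_o$
"Sokal"            $(a \& b)_o / (2 a_o + 2 b_o - 3 (a \& b)_o)$
"Tanimoto"         $(a \& b)_o / (a_o + b_o - (a \& b)_o)$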

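The two stages described in the details above (set bits from hashed IDs, then compare bit vectors) can be sketched on plain 0/1 lists. This is only an illustration under stated assumptions, not the function's internal implementation: the names foldIntoFingerprint, onBits, commonBits, tanimoto, dice and cosine are hypothetical, the raw IDs are made-up integers rather than real subgraph hashes, and the fingerprint size of 16 is far smaller than anything used in practice. The Tanimoto measure is written in its standard form, shared on-bits divided by the on-bits present in either vector.

```wolfram
(* Hypothetical sketch, not the function's internals: fold made-up raw IDs
   into a fixed-size 0/1 list, then compare two such lists. *)
foldIntoFingerprint[rawIDs_List, size_Integer?Positive] :=
 Module[{bits = ConstantArray[0, size]},
  (* Mod gives a 0-based slot; adding 1 makes it a valid list index *)
  Scan[(bits[[Mod[#, size] + 1]] = 1) &, rawIDs];
  bits]

(* Number of on-bits, and number of on-bits shared by two 0/1 vectors *)
onBits[v_List] := Total[v]
commonBits[u_List, v_List] := Total[u*v]

(* Three of the measures, written directly from their definitions *)
tanimoto[u_, v_] := commonBits[u, v]/(onBits[u] + onBits[v] - commonBits[u, v])
dice[u_, v_] := 2 commonBits[u, v]/(onBits[u] + onBits[v])
cosine[u_, v_] := commonBits[u, v]/Sqrt[onBits[u] onBits[v]]

(* Made-up raw IDs standing in for hashed structural features *)
a = foldIntoFingerprint[{3, 19, 40, 7, 12}, 16];
b = foldIntoFingerprint[{3, 40, 7, 21}, 16];

{tanimoto[a, b], dice[a, b], cosine[a, b]}
(* {3/5, 3/4, 3/4} *)
```

With a size of 16, the raw IDs 3 and 19 land in the same slot, so a ends up with four on-bits from five IDs. Collisions of this kind are why folded fingerprints give an approximate, not exact, measure of shared structure.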
Examples

Basic Examples (2) 

Get the fingerprint similarity between two similar molecules:

In[1]:=
m1 = Molecule["caffeine", IncludeHydrogens -> False];
m2 = Molecule["7-(3-Hydroxypropyl)theophylline", IncludeHydrogens -> False];
ResourceFunction["MoleculeFingerprintSimilarity"][m1, m2]
Out[1]=
In[2]:=
MoleculePlot /@ {m1, m2}
Out[2]=

Get the fingerprint similarity between two dissimilar molecules:

In[3]:=
m1 = Molecule["caffeine", IncludeHydrogens -> False];
m3 = Molecule["2,6-diphenylpyridine", IncludeHydrogens -> False];
ResourceFunction["MoleculeFingerprintSimilarity"][m1, m3]
Out[3]=
In[4]:=
MoleculePlot /@ {m1, m3}
Out[4]=

Scope (1) 

MoleculeFingerprintSimilarity works on molecules created from any source, from MoleculeRecognize to Entity:

In[5]:=
(* img stands for the embedded raster image of a drawn molecular structure, omitted here *)
ResourceFunction["MoleculeFingerprintSimilarity"][MoleculeRecognize[img], Molecule[Entity["Chemical", "LCysteine"]]]
Out[5]=

Options (3) 

The fingerprint method and similarity measure used can greatly affect the calculated similarity. Take two nominally similar molecules:

In[6]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/34dd38ec-ee23-489e-b650-277501df23ef"]
Out[6]=

Measure the similarity using all available fingerprint types and similarity measures:

In[7]:=
smeasures = {"Tanimoto", "Cosine", "Dice", "Kulczynski", "BraunBlanquet", "Sokal", "McConnaughey", "Asymmetric", "Russel"};
fptypes = {"AtomPairs", "MACCSKeys", "MorganConnectivity", "MorganFeatures", "TopologicalTorsions", "RDKit"};
data = Table[
   ResourceFunction["MoleculeFingerprintSimilarity"][m1, m2, "FingerprintType" -> fp, "SimilarityMeasure" -> sim],
   {sim, smeasures}, {fp, fptypes}];
MinMax[Flatten[data]]
Out[7]=

Visualize the results in a table:

In[8]:=
TableForm[data, TableHeadings -> {smeasures, fptypes}]
Out[8]=

Properties and Relations (2) 

With the default settings, MoleculeFingerprintSimilarity returns 0 for completely dissimilar molecules, whose fingerprints share no on-bits:

In[9]:=
ResourceFunction["MoleculeFingerprintSimilarity"][Molecule["methane"], Molecule["ammonia"]]
Out[9]=

MoleculeFingerprintSimilarity returns 1 if the two molecules are identical:

In[10]:=
ResourceFunction["MoleculeFingerprintSimilarity"][ Molecule["caffeine"], Molecule[Entity["Chemical", "Caffeine"]]]
Out[10]=

Possible Issues (1) 

The presence or absence of explicit hydrogens in the molecular graph can influence the computed similarity:

In[11]:=
ResourceFunction["MoleculeFingerprintSimilarity"][Molecule["benzene"], Molecule["anthracene"]]
Out[11]=
In[12]:=
ResourceFunction["MoleculeFingerprintSimilarity"][ Molecule["benzene", IncludeHydrogens -> False], Molecule["anthracene", IncludeHydrogens -> False]]
Out[12]=

Neat Examples (2) 

Create molecules from the list of central nervous system (CNS) agents obtained from PubChem:

In[13]:=
(* Evaluate this cell to get the example input *) CloudGet["https://www.wolframcloud.com/obj/c0b6cc03-cc36-4e56-82c4-6604e0659e5f"]

Find the five nearest molecules to the tranquilizer diazepam:

In[14]:=
diazepam = Molecule[Entity["Chemical", "Diazepam"], IncludeHydrogens -> False]
Out[14]=
In[15]:=
Nearest[CNSagents, diazepam, 5, DistanceFunction -> (1 - ResourceFunction["MoleculeFingerprintSimilarity"][##] &)]
Out[15]=
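
The distance function used with Nearest above is 1 minus the similarity, so the most similar molecules are the nearest ones. A minimal sketch of the same idea on made-up bit vectors follows; tanimotoSim, library and query are hypothetical names introduced only for illustration, and the vectors are not real fingerprints.

```wolfram
(* Hypothetical sketch: rank made-up 0/1 vectors by Tanimoto distance to a query *)
tanimotoSim[u_List, v_List] :=
 With[{c = Total[u*v]}, c/(Total[u] + Total[v] - c)]

library = {{1, 1, 0, 0}, {1, 1, 1, 0}, {0, 0, 1, 1}};
query = {1, 1, 0, 1};

(* 1 - similarity is 0 for identical vectors and 1 when no on-bits are shared *)
Nearest[library, query, 2, DistanceFunction -> (1 - tanimotoSim[##] &)]
(* {{1, 1, 0, 0}, {1, 1, 1, 0}} *)
```

Here the three distances to the query are 1/3, 1/2 and 3/4, so the first two vectors are returned in that order.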

Publisher

JasonB

Version History

  • 1.0.1 – 05 October 2021
  • 1.0.0 – 29 July 2020
