Topological Similarity Searching
Atom pair descriptors can be used to search a collection of molecules for similar compounds. This example shows how to compute the descriptor, an association that counts each pair of atom types together with the number of bonds separating them, and how to use it to compute the distance between two molecules.
Atom pairs are molecular substructures defined by two atoms and the number of bonds along the shortest path between them. The following plot shows three atom pairs, two with four intervening bonds and one with seven intervening bonds.
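The bond distance in an atom pair is a shortest-path length in the molecular graph. Because every bond counts as one step, a breadth-first search from each atom finds all of these distances. The following Python sketch illustrates this. It is not the Wolfram Language implementation. It assumes a molecule is given as a number of heavy atoms plus a list of bonds as index pairs, and the helper name `bond_distances` is hypothetical.

```python
from collections import deque

def bond_distances(num_atoms, bonds):
    """Hypothetical helper: matrix of topological distances (number of bonds
    on the shortest path) between every pair of atoms; None if disconnected."""
    # Build an adjacency list from the undirected bond list.
    neighbors = [[] for _ in range(num_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    dist = [[None] * num_atoms for _ in range(num_atoms)]
    # Breadth-first search from each atom yields shortest path lengths
    # in an unweighted graph.
    for source in range(num_atoms):
        dist[source][source] = 0
        queue = deque([source])
        while queue:
            a = queue.popleft()
            for b in neighbors[a]:
                if dist[source][b] is None:
                    dist[source][b] = dist[source][a] + 1
                    queue.append(b)
    return dist

# Heavy-atom skeleton of ethanol: C0-C1-O2
d = bond_distances(3, [(0, 1), (1, 2)])
print(d[0][2])  # 2
```

Running one search per atom costs O(V·(V+E)). That is negligible for drug-sized molecules.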
Define a function to compute all atom pair instances. The function returns an Association whose keys are triples {distance, atom1, atom2} and whose values give the number of occurrences of each atom pair.
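The Python sketch below mirrors that association by using a `Counter` keyed by `(distance, atom1, atom2)` tuples. It rests on two assumptions: atom labels are any sortable values, and the two labels in a key are sorted so that (d, A, B) and (d, B, A) are counted as one pair. The Wolfram Language function may order its keys differently. The name `atom_pairs` is hypothetical.

```python
from collections import Counter, deque

def atom_pairs(labels, bonds):
    """Hypothetical helper: count atom pairs as {(distance, atom1, atom2): n}.
    labels[i] is a sortable description of atom i; bonds are (i, j) pairs."""
    n = len(labels)
    neighbors = [[] for _ in range(n)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)
    counts = Counter()
    for source in range(n):
        # BFS gives the number of bonds on the shortest path from source.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            a = queue.popleft()
            for b in neighbors[a]:
                if b not in dist:
                    dist[b] = dist[a] + 1
                    queue.append(b)
        for target, bond_count in dist.items():
            if target > source:  # count each unordered pair of atoms once
                # Sort the labels so both orientations map to one key.
                a1, a2 = sorted((labels[source], labels[target]))
                counts[(bond_count, a1, a2)] += 1
    return counts

# Heavy atoms of ethanol, C-C-O
pairs = atom_pairs(["C", "C", "O"], [(0, 1), (1, 2)])
print(pairs)
```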
Compute all the pairs for a molecule. Notice that atom1 and atom2 are each themselves a triple of the form {"AtomicSymbol", "PiElectronCount", "HeavyAtomCoordinationNumber"}.
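One way to build such atom-type triples by hand is sketched below in Python. Several assumptions are involved. Bonds are given in Kekulé form with integer orders 1, 2, or 3, and hydrogens are implicit. Each atom's pi electron count is taken as the sum of (bond order − 1) over its bonds, so a double bond gives each of its atoms one pi electron. The heavy-atom coordination number is the count of heavy-atom neighbors. Wolfram Language computes these properties itself and may handle aromatic systems differently. The helper name `atom_types` is hypothetical.

```python
def atom_types(symbols, bonds):
    """Hypothetical helper: one (symbol, pi_electrons, heavy_neighbors) triple
    per heavy atom. bonds: (i, j, order) with integer orders 1, 2 or 3."""
    pi = [0] * len(symbols)
    degree = [0] * len(symbols)
    for i, j, order in bonds:
        # Assumed convention: each atom of a double bond contributes one
        # pi electron, each atom of a triple bond two.
        pi[i] += order - 1
        pi[j] += order - 1
        # Only heavy atoms appear in the bond list, so this is the
        # heavy-atom coordination number.
        degree[i] += 1
        degree[j] += 1
    return list(zip(symbols, pi, degree))

# Acetic acid heavy atoms: C0-C1(=O2)-O3
types = atom_types(["C", "C", "O", "O"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
print(types)  # [('C', 0, 1), ('C', 1, 3), ('O', 1, 1), ('O', 0, 1)]
```

These triples are sortable tuples, so they can serve directly as atom labels when counting atom pairs.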
The similarity of two molecules is measured by the degree of overlap between the multisets of atom pairs that their associations encode. Create a custom distance function based on the multiset Dice dissimilarity.
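For multisets A and B, the Dice dissimilarity is 1 − 2|A ∩ B| / (|A| + |B|). Here |·| counts elements with multiplicity, and the intersection keeps the smaller count of each shared element. In the Python sketch below, `Counter`'s `&` operator performs exactly that minimum-count intersection. The function name and the choice to return 0 for two empty descriptors are assumptions.

```python
from collections import Counter

def multiset_dice_dissimilarity(a, b):
    """1 - 2|A ∩ B| / (|A| + |B|) for multisets given as Counters."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # assumption: two empty descriptors count as identical
    # Counter & Counter keeps the minimum count of each common key.
    shared = sum((a & b).values())
    return 1 - 2 * shared / total

a = Counter({(1, "C", "C"): 2, (1, "C", "O"): 1})
b = Counter({(1, "C", "C"): 1, (2, "C", "O"): 1})
# shared = 1, total = 5, so the dissimilarity is 1 - 2/5
print(multiset_dice_dissimilarity(a, b))
```

Identical descriptors give 0, and descriptors with no atom pair in common give 1.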
Create molecules from the list of PubChem central nervous system (CNS) agents (extracted 14 Nov 2018). The PubChem CID is stored in the molecule using MetaInformation.
Create a NearestFunction over the CNS agents that uses the molecular distance function.
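The idea behind a nearest function with a custom distance can be sketched in Python as a brute-force scan. The function computes the distance from the query to every stored descriptor and keeps the k smallest. This is not how Wolfram's NearestFunction is implemented internally. The descriptors below are toy values, not real CNS agents, and `make_nearest` is a hypothetical name.

```python
import heapq
from collections import Counter

def multiset_dice_dissimilarity(a, b):
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    return 1 - 2 * sum((a & b).values()) / total

def make_nearest(database, distance):
    """database maps a name (e.g. a PubChem CID) to its atom-pair Counter.
    Returns nearest(query, k): the k names with the smallest distance."""
    def nearest(query, k=1):
        # Brute-force scan over every stored molecule.
        return heapq.nsmallest(
            k, database, key=lambda name: distance(query, database[name]))
    return nearest

# Toy descriptors (hypothetical, not real molecules)
db = {
    "mol_a": Counter({(1, "C", "C"): 3, (2, "C", "C"): 2}),
    "mol_b": Counter({(1, "C", "O"): 1}),
    "mol_c": Counter({(1, "C", "C"): 2, (2, "C", "C"): 1}),
}
nearest = make_nearest(db, multiset_dice_dissimilarity)
query = Counter({(1, "C", "C"): 3, (2, "C", "C"): 2})
print(nearest(query, k=2))  # ['mol_a', 'mol_c']
```

A linear scan is simple and exact. For a set of a few hundred molecules, its cost is dominated by the distance computations themselves.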
Diazepam acts on the central nervous system, so look for similar molecules among the CNS agents.
Find the 10 molecules in the set closest to diazepam.
Use MoleculePlot to visualize the 10 compounds most similar to diazepam.