Adding new features to MDAnalysis


[ #gsoc #mdanalysis #rdkit #python ]

07 August 2020 - Cédric Bouysset

Card image cap

This week I’ve been attending my first ever virtual meeting, the International Symposium on Olfaction and Taste (ISOT for short).
While technically this means I didn’t have a lot of time to implement new things, I still managed to get some work done and I hope that you will learn a lot of things from this blog post.

 

Bringing MD trajectories to RDKit


I finally added coordinates to the converter, so you can now properly read a trajectory:

sel = universe.select_atoms("resname LRT")

for ts in u.trajectory:
    mol = sel.convert_to("RDKIT")

The resulting molecule will contain a conformer with the coordinates of the current frame.
And don’t worry about efficiency, the topology of the molecule is not rebuilt for every frame, it is cached based on the id of the AtomGroup and the arguments that were passed to the converter.

In the end the MDAnalysis AtomGroup is converted to an RDKit molecule once and cached, and to create each conformer the cached molecule is copied and we simply add the current frame’s coordinates.

This pretty much brings an end to the second part of my GSoC project, hooray 🎉

How to create GIFs in Python

Here is a code snippet to create a gif of the trajectory with RDKit:

import MDAnalysis as mda
from rdkit.Chem import Draw
from nglview.datafiles import PDB, XTC

u = mda.Universe(PDB, XTC)
elements = mda.topology.guessers.guess_types(u.atoms.names)
u.add_TopologyAttr('elements', elements)
ag = u.select_atoms("resname LRT")

pngs = []
for ts in u.trajectory:
    mol = ag.convert_to("RDKIT")
    mol = Chem.RemoveHs(mol)
    img = Draw.MolToImage(mol, size=(500,400), legend=f"Frame {ts.frame}")
    pngs.append(img)

img, *imgs = pngs
img.save(fp="traj.gif", format='GIF', append_images=imgs,
         save_all=True, duration=200, loop=0)

The img, *imgs = pngs line is the same as writing img = pngs[0]; imgs = pngs[1:]

Here’s the result (you can convert the .gif to a .webm video online):

Draw.MolToImage returns a PNG image through the PIL library which is automatically installed with RDKit, and PIL can be used for reading and writing images in a variety of formats.
You can modify the time between each frame with the duration parameter by either passing an integer in milliseconds, or a list of int for each frame.

If you’d rather use the SVG drawer to highlight some particular atoms, here’s another snippet. We’ll also rotate the trajectory along the z axis to have a clearer view:

from PIL import Image
import cairosvg
import io

# add a rotation transformation for every timestep of the trajectory
rotate = mda.transformations.rotate.rotateby(30, [0,0,1], ag=ag)
u.trajectory.add_transformations(rotate)

def output_svg(mol, legend="", atoms=[], color=(0,.9,0), size=(700, 400)):
    d = Draw.rdMolDraw2D.MolDraw2DSVG(*size)
    opts = d.drawOptions()
    highlight_atoms = {atom: color for atom in atoms}
    kwargs = dict(
        legend=legend,
        highlightAtoms=atoms,
        highlightAtomColors=highlight_atoms,
    )
    Draw.rdMolDraw2D.PrepareAndDrawMolecule(d, mol, **kwargs)
    d.FinishDrawing()
    return d.GetDrawingText().replace('svg:','')

svgs = []
for ts in u.trajectory:
    mol = ag.convert_to("RDKIT")
    mol = Chem.RemoveHs(mol)
    img = output_svg(mol, legend=f"Frame {ts.frame}", atoms=[8])
    svgs.append(img)

img, *imgs = [Image.open(io.BytesIO(cairosvg.svg2png(bytestring=f))) 
              for f in svgs]
img.save(fp="traj.gif", format='GIF', append_images=imgs,
         save_all=True, duration=200, loop=0)

 

A new API for converters


I’ve also been working on the new API for converters. We currently have converters for ParmEd and RDKit.
We wanted something that could be tab-completed, while not cluttering the namespace of the AtomGroup class (so we can’t just add to_rdkit, to_parmed…etc.) and also keeping the current converter syntax.

Pandas does this really nicely with df.plot, which can both be called with df.plot(kind='scatter', ...) or df.plot.scatter(...), so I decided to follow the same direction.

The trick to using plot as both a method and an object of its own is to use what Pandas calls an accessor. The accessor gives you access to another class in an object oriented manner, so that you don’t have to call df.plot.scatter(df, ...).

Let’s use a simple example to understand how it works, using a very simple molecule class and a class that defines descriptors (based on our favorite chemoinformatics module of course):

from rdkit import Chem
from rdkit.Chem import Descriptors

class DescriptorWrapper:
    """A class to describe molecules"""
    def __init__(self, molecule):
        self.molecule = molecule
    def __call__(self):
        print(f"I am {Chem.MolToInchiKey(self.molecule.rdmol)}, but you can call me {self.molecule.name}")
    def count_atoms(self, heavy_only=False):
        count = self.molecule.rdmol.GetNumHeavyAtoms() if heavy_only else self.molecule.rdmol.GetNumAtoms()
        print(f"I have {count} {'heavy' if heavy_only else ''} atoms")
    def weight(self):
        mw = Descriptors.MolWt(self.molecule.rdmol)
        print(f"I weight {mw:.2f} g/mol")

class Accessor:
    """Gives access to the wrapper class from the parent class"""
    def __init__(self, to_access):
        self.to_access = to_access
    def __get__(self, obj, cls):
        # pass the object that called the class to the wrapper
        return self.to_access(obj)

class Molecule:
    """A class for molecules"""
    def __init__(self, smiles, name):
        self.name = name
        self.rdmol = Chem.MolFromSmiles(smiles)
    describe = Accessor(DescriptorWrapper)

>>> benzene = Molecule("c1ccccc1", name="benzene")
>>> benzene.describe() # __call__ method is used
I am UHOVQNZJYSORNB-UHFFFAOYSA-N, but you can call me benzene
>>> benzene.describe.weight()
I weight 78.11 g/mol
>>> benzene.describe.count_atoms(heavy_only=True)
I have 6 heavy atoms

Every time you go through benzene.describe, it calls the Accessor.__get__(benzene, Molecule) method, which returns DescriptorWrapper(benzene). So in the end, calling benzene.describe() is the same as DescriptorWrapper(benzene)() and so on.

Using the same Accessor class and a ConverterWrapper class which defines the rdkit and parmed methods, I was able to move the code of the old AtomGroup.convert_to method to the ConverterWrapper.__call__ method, and redefine the convert_to method as Accessor(ConverterWrapper). That’s a pretty neat trick to keep methods organized for users!

 

SMARTS selection


Something else I’ve been working on: SMARTS selection!
You can now select atoms based on a SMARTS query, and combine the selection with the other MDAnalysis selections:

u = mda.Universe.from_smiles("Nc1cc(C[C@H]([O-])C=O)c[nH]1")
mol = u.atoms.convert_to.rdkit()
mol.RemoveAllConformers()
for atom in mol.GetAtoms():
    atom.SetProp("atomNote", str(atom.GetIdx()))
mol

A molecule

# carbon atoms in a ring
>>> u.select_atoms("smarts [#6;R]").indices
array([1, 2, 3, 9])
# atoms that are not hydrogen, and not in a ring but connected to a ring 
>>> u.select_atoms("smarts [$([!#1]);$([!R][R])]").indices
array([0, 4])
# carbon and next to a ring
>>> u.select_atoms("type C and smarts [$([!R][R])]").indices
array([4])

Now keep in mind that this selection combines all the resulting matches into one single match of unique atoms. If you want to iterate other C=O bonds match by match for instance, you’d have to use the RDKit converter and then iterate over mol.GetSubstructMatches(Chem.MolFromSmarts('C=O')).

That’s it for this week, I hope you’ve learnt something new 🤓!

Cheers,

Cédric