Tuesday, 7 March 2017

Mapping the Evolution of Enzyme Function

The Institute of Structural and Molecular Biology, which combines the research endeavours of Birkbeck and University College London in these disciplines, runs a programme of weekly research seminars throughout the university terms. Each term’s seminars are linked by a theme, and the theme for the spring term of 2017 has been ‘Bioinformatics and Computational Biology’. Early in February, the Institute was delighted to welcome one of the UK’s foremost structural biologists, Professor Dame Janet Thornton, to give a talk in this series. Thornton was well known to many in the large audience, having spent the whole of the 1980s at Birkbeck, rising to be a professor in the School of Crystallography (now part of Biological Sciences). During the 1990s she held chairs at both Birkbeck and UCL and founded a biotech company, Inpharmatica, before leaving to direct the European Bioinformatics Institute (EBI) at Hinxton, near Cambridge. She has now stepped down from the directorship but maintains an active research group at the EBI.

The topic that Thornton chose to present was one that she has worked on throughout her long career: the structure, function and evolution of enzymes. When she started studying proteins there were probably only about 20 known protein structures. The Protein Data Bank (PDB) now holds well over 120,000, and tens of thousands of these are of enzymes, so there is plenty of data to work with.

And enzymes are particularly easy to work with because their functions are so well characterised. Back in the 1960s the Enzyme Commission assigned a set of four numbers ('EC numbers') to each enzyme. There are six primary enzyme classes, each of which is divided into sub-classes and sub-sub-classes; the final number is a serial number that identifies the individual enzyme, usually by its substrate. So, for example, phosphoinositide phospholipase C is also known as EC 3.1.4.11; the 3 indicates that this enzyme is a hydrolase, the 1 that it acts on ester bonds and the 4 that it is a phosphoric diester hydrolase. The other top-level classes are the oxidoreductases (1); the transferases (2); the lyases (4); the isomerases (5); and the ligases (6). EC numbers define enzyme function rigorously, so referencing them in computer programs is straightforward.
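To make that last point concrete, here is a minimal Python sketch (not something presented in the talk) that splits an EC number into its four levels and names its top-level class, using only the six classes listed above. The `ECNumber` type and its methods are hypothetical names chosen for illustration.

```python
from typing import NamedTuple

# The six top-level EC classes described in the text.
EC_CLASSES = {
    1: "oxidoreductase",
    2: "transferase",
    3: "hydrolase",
    4: "lyase",
    5: "isomerase",
    6: "ligase",
}


class ECNumber(NamedTuple):
    """An EC number split into its four hierarchical levels (hypothetical helper)."""

    enzyme_class: int
    subclass: int
    sub_subclass: int
    serial: int

    @classmethod
    def parse(cls, text: str) -> "ECNumber":
        """Parse strings such as 'EC 3.1.4.11' or '3.1.4.11'."""
        fields = text.replace("EC", "").strip().split(".")
        if len(fields) != 4:
            raise ValueError(f"expected four dot-separated fields, got {text!r}")
        return cls(*(int(f) for f in fields))

    def class_name(self) -> str:
        """Name of the top-level class, e.g. 'hydrolase' for class 3."""
        return EC_CLASSES[self.enzyme_class]

    def __str__(self) -> str:
        return "EC " + ".".join(str(level) for level in self)
```

For instance, `ECNumber.parse("EC 3.1.4.11").class_name()` returns `'hydrolase'`, and the `sub_subclass` field of the same number is 4, the 'phosphoric diester hydrolase' level.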

Thornton and her group chose to focus on those enzymes that have a well-characterised catalytic function that is mainly involved in small-molecule metabolism. All enzymes with these characteristics were grouped into homologous superfamilies (that is, groups of proteins descended from a common evolutionary ancestor) and the members of each superfamily were annotated with EC numbers as a proxy for their function. For example, the superfamily of enzymes that are clearly related to phosphoinositide phospholipase C by structure and function includes not only enzymes classified as EC 3.1.4.11 but also sphingomyelin phosphodiesterases D (EC 3.1.4.41) and phosphatidylinositol diacylglycerol-lyases (EC 4.6.1.13). The two phosphodiesterases have the same chemistry (as specified by the first three numbers of their EC codes) but act on substrates with very different shapes, while the chemistry of the lyases differs significantly from that of the others.
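The rule that enzymes sharing the first three EC numbers share the same chemistry can be expressed as a simple comparison. The Python sketch below is illustrative only: the helper names are hypothetical, and 'same chemistry' is taken to mean agreement on the first three levels, as described above. The example uses three members of the phospholipase C-like superfamily: phosphoinositide phospholipase C (EC 3.1.4.11), sphingomyelin phosphodiesterase D (EC 3.1.4.41) and phosphatidylinositol diacylglycerol-lyase (EC 4.6.1.13).

```python
from itertools import combinations


def parse_ec(text: str) -> tuple[int, ...]:
    """Split 'EC 3.1.4.11' or '3.1.4.11' into a tuple of four integers."""
    fields = text.replace("EC", "").strip().split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields, got {text!r}")
    return tuple(int(f) for f in fields)


def shared_levels(a: str, b: str) -> int:
    """How many leading EC levels two enzymes have in common (0 to 4)."""
    depth = 0
    for x, y in zip(parse_ec(a), parse_ec(b)):
        if x != y:
            break
        depth += 1
    return depth


def same_chemistry(a: str, b: str) -> bool:
    """Treat agreement on the first three levels as 'the same chemistry'."""
    return shared_levels(a, b) >= 3


if __name__ == "__main__":
    members = {
        "phosphoinositide phospholipase C": "EC 3.1.4.11",
        "sphingomyelin phosphodiesterase D": "EC 3.1.4.41",
        "phosphatidylinositol diacylglycerol-lyase": "EC 4.6.1.13",
    }
    # Compare every pair of superfamily members.
    for (name_a, ec_a), (name_b, ec_b) in combinations(members.items(), 2):
        print(f"{name_a} vs {name_b}: "
              f"{shared_levels(ec_a, ec_b)} shared levels, "
              f"same chemistry: {same_chemistry(ec_a, ec_b)}")
```

Comparing EC 3.1.4.11 with EC 3.1.4.41 gives three shared levels (same chemistry), while comparing either phosphodiesterase with EC 4.6.1.13 gives none, because the lyase belongs to a different top-level class.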

In this example, comparing the structures of enzymes with the EC numbers 3.1.4.11 and 3.1.4.41 showed that the active-site residues involved in their reaction mechanism, and the metal ion bound in each that is necessary for catalysis, superimpose very well, but the rest of the active site varies significantly to allow substrates of distinctly different sizes and shapes to bind. In contrast, the lyase (EC 4.6.1.13) has an active site similar in shape to that of EC 3.1.4.11 but no bound metal and different catalytic residues. In this case it is likely that a single amino acid change, removing an aspartic acid residue and therefore a negative charge, has removed the ability of the enzyme to bind a metal ion and thus changed the reaction that it catalyses.

Enough data was available to arrange the enzymes in this superfamily, and in 275 others, into phylogenetic trees, mapping out the evolutionary route taken within each superfamily and cataloguing the changes of function observed. Some of these changes are much more complex than the one outlined above. For example, the analysis showed that five classes of flavin-dependent mono-oxygenases with different chemistry were evolutionarily related. Here, the change in chemistry seems to have arisen not from a simple substitution of one amino acid for another but from a change in the multi-domain architecture of the protein.
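The talk did not go into how changes of function were placed on each tree. As a purely illustrative stand-in, Fitch's small-parsimony algorithm is one standard way to count the minimum number of changes in a discrete trait (here, the top-level EC class) on a fixed tree; it is not necessarily the method the Thornton group used. In this Python sketch a tree is written as nested pairs, with each leaf labelled by its top-level EC class; the function names and the example tree are hypothetical.

```python
from typing import Union

# A rooted binary tree: either a leaf, labelled with its top-level EC class
# (an int from 1 to 6), or a (left, right) pair of subtrees.
Tree = Union[int, tuple]


def fitch(tree: Tree) -> tuple[set[int], int]:
    """Fitch small parsimony for one discrete trait.

    Returns the set of most-parsimonious states for the subtree's root and
    the minimum number of state changes needed within the subtree.
    """
    if isinstance(tree, int):
        return {tree}, 0
    left, right = tree
    left_states, left_changes = fitch(left)
    right_states, right_changes = fitch(right)
    shared = left_states & right_states
    if shared:
        # The children can agree on a state without an extra change.
        return shared, left_changes + right_changes
    # No common state: one change is needed on one of the two branches.
    return left_states | right_states, left_changes + right_changes + 1


def min_class_changes(tree: Tree) -> int:
    """Minimum number of top-level EC class changes implied by the tree."""
    return fitch(tree)[1]


if __name__ == "__main__":
    # Hypothetical superfamily tree: four hydrolases (class 3) and one lyase (class 4).
    toy_tree = (((3, 3), (3, 4)), 3)
    print(min_class_changes(toy_tree))
```

For this toy tree the function reports a single change of top-level class, on the branch leading to the lyase; real analyses would also track changes at the lower EC levels.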

The group constructed an 'EC exchange matrix' from this data to show how many times each top-level EC class had changed into each other class during evolution. While most changes in chemistry left the top-level class (the basic type of the reaction) unchanged, every possible change between top-level classes had occurred at least once in evolutionary history. In fact, 11% of the changes catalogued involved a change of top-level class. The diagram below illustrates this data in a series of six circles, one for each 'original' enzyme class, with the width of each strip indicating the number of transitions from one class to another: for example, the thick red strip going from the 'top' to the 'bottom' of the top left-hand circle illustrates that many transitions from oxidoreductases (class 1) to transferases (class 2) have been observed.
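An exchange matrix of this kind is essentially a table of counts. The minimal Python sketch below assumes that each observed change of function is recorded as an (ancestral EC number, descendant EC number) pair; it tallies changes between the six top-level classes and reports the fraction that crossed class boundaries, which is the kind of quantity the 11% figure describes. The input format and function names are assumptions, the example changes are invented, and EC numbers are given at class level using the standard '-' placeholder for unspecified levels.

```python
from collections import Counter

CLASSES = list(range(1, 7))  # the six top-level EC classes


def top_class(ec: str) -> int:
    """Top-level class of an EC number such as 'EC 3.1.4.11' or '3.-.-.-'."""
    return int(ec.replace("EC", "").strip().split(".")[0])


def exchange_matrix(changes):
    """Count changes of function by (from_class, to_class).

    changes: iterable of (ancestral_ec, descendant_ec) string pairs.
    Returns a 6x6 list of lists; entry [i][j] counts changes from class i+1
    to class j+1, so the diagonal holds changes that kept the top-level class.
    """
    counts = Counter((top_class(a), top_class(b)) for a, b in changes)
    return [[counts[(i, j)] for j in CLASSES] for i in CLASSES]


def fraction_changing_class(matrix):
    """Share of all recorded changes that fall off the diagonal."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    same_class = sum(matrix[k][k] for k in range(len(matrix)))
    return (total - same_class) / total


if __name__ == "__main__":
    # Invented class-level changes, for illustration only.
    toy_changes = [
        ("3.-.-.-", "3.-.-.-"),
        ("3.-.-.-", "4.-.-.-"),
        ("1.-.-.-", "2.-.-.-"),
        ("1.-.-.-", "1.-.-.-"),
    ]
    matrix = exchange_matrix(toy_changes)
    for cls, row in zip(CLASSES, matrix):
        print(cls, row)
    print(f"fraction changing class: {fraction_changing_class(matrix):.2f}")
```

Each row of such a matrix corresponds roughly to one of the six circles in the diagram: it lists how often changes starting from that class ended in each class.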

An overview of functional evolution in enzymes. © Nicholas Furnham & Sergio Martinez Cuesta, EBI

They then looked in much more detail at the changes observed in the catalytic site of each superfamily during evolution, and found that active sites differ in ‘plasticity’. At one extreme there is the TIM barrel ‘superfold’, which is a scaffold that holds amino acids with different chemistry in similar positions to catalyse many different reaction types. At the other extreme, there are seven superfamilies in which the catalytic residues are 100% conserved. It is interesting to try to correlate sequence similarity with ‘functional similarity’, but this runs into the problem of how to define functional identity. With enzymes, any measure of functional similarity will include a contribution from the chemical similarity of the substrates and this is difficult to gauge, particularly as most of the best computational tools were written for commercial drug discovery and are therefore not in the public domain. Preliminary results suggest that there is some correlation, but it is much weaker than that between sequence and structural similarity.

Thornton summed up her lecture by re-stating that evolutionary changes to enzyme substrate specificity are much commoner than those to basic chemistry. Evolution has, however, given rise to an explosion in enzyme function. The EC system has catalogued a total of 2,994 unique enzyme functions, but only 379 different structures (CATH superfamilies) are known to have enzymatic activity. Most enzyme functions will therefore have evolved from another function, with each catalytic activity arising independently only a few times throughout evolutionary history. The evolutionary relationships within enzyme superfamilies are complex and there are many ways in which their function can diverge.
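A quick back-of-envelope check shows the scale of that mismatch. Assuming, simplistically, that functions are spread evenly across the enzymatic superfamilies and that each superfamily began with a single ancestral function:

$$
\frac{2994\ \text{functions}}{379\ \text{superfamilies}} \approx 7.9\ \text{functions per superfamily},
\qquad
2994 - 379 = 2615 \approx 87\%\ \text{of}\ 2994 .
$$

On those assumptions at most 379 functions could be 'original', so at least 2,615 (roughly 87%) must have arisen by divergence from an existing function.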

Much of the work Thornton presented has been described in a 2012 paper in PLoS Computational Biology; its lead author, Nick Furnham from the Thornton group at the EBI, is now a group leader at one of Birkbeck’s neighbouring colleges, the London School of Hygiene and Tropical Medicine. PPS students will learn much more about the structure, function and mechanisms of enzymes in section 10 of the course, ‘Protein Interactions and Function’.

The most recent paper from the Thornton group on this topic is:
Furnham N, Dawson NL, Rahman SA, Thornton JM, Orengo CA. Large-Scale Analysis Exploring Evolution of Catalytic Machineries and Mechanisms in Enzyme Superfamilies. Journal of Molecular Biology 428 (2016) pp. 253-267.