I have lost count of the number of times over the years that I’ve said “Huh, I didn’t think mass spec could do that”. So you’d think that I would be used to this by now, but apparently not, because that was my exact reaction to this new paper. It’s from a team of groups at Leiden, Utrecht, and Jena, and they report a “self-encoded library” technique for some pretty large-scale screening.
It should be noted up front that there are no tags, labels, or isotopic enrichments involved in this. The paper demonstrates screening libraries of up to about 500,000 compounds in their native state. These are produced by pretty straightforward solid-phase synthesis techniques, and the paper shows several of them using reactions like amide formation, SNAr amine displacement, heterocyclic condensations, palladium-catalyzed couplings, and so on. As the world well knows by now, you can make an awful lot of compounds rather quickly through such techniques, and since this isn’t DNA-encoded library technology (where you of course have to use reactions whose conditions are compatible with the oligonucleotide barcode tags), your toolbox is rather large.
These compound libraries are cleaved from the solid beads and then screened against immobilized protein targets using pretty standard affinity-driven techniques to winnow them down to the most potent binders. That takes you up to the hit ID step, and this is where traditionally you’ve needed some sort of Secret Sauce to be able to figure out what your hits really are. Any reasonably sized library made by these combinatorial techniques is going to produce a lot of compounds with overlapping molecular weights, so straight mass spec alone will not be enough to decide that question. You can use various isotopic labeling schemes to improve the signal/noise, and of course there’s the aforementioned DNA encoding. With DEL you have a unique DNA sequence attached to every single compound, and you use the terrifying powers of PCR and modern sequencing to track down and reveal very minute quantities of the best binders. (That’s why DEL collections can easily get up into the tens of millions of possibilities per run.)
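To make that mass-overlap problem concrete, here’s a quick back-of-the-envelope sketch in Python. The scaffold and building-block masses are made up for illustration (they’re not from the paper), but the point carries over: even a tiny three-component library produces plenty of distinct products that land on the same observed mass.

```python
# Toy illustration (hypothetical masses, not the paper's data): why parent mass
# alone can't identify hits in a combinatorial library. Enumerate a small
# three-component library and count mass collisions.
from itertools import product
from collections import Counter

# Hypothetical monoisotopic building-block masses (Da) at three positions
position_a = [71.04, 99.07, 113.08, 147.07, 163.06]
position_b = [57.02, 71.04, 97.05, 113.08, 128.09]
position_c = [85.05, 99.07, 113.08, 127.06, 147.07]

scaffold_mass = 250.10  # hypothetical core mass after cleavage from the bead

masses = Counter()
for a, b, c in product(position_a, position_b, position_c):
    # Round to 2 decimal places as a crude stand-in for instrument resolution
    masses[round(scaffold_mass + a + b + c, 2)] += 1

collisions = {m: n for m, n in masses.items() if n > 1}
print(f"{len(collisions)} of {len(masses)} observed masses map to more than one compound")
```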
In this case, though, the authors do some preliminary work on a few hundred typical compounds from their libraries and look at MS/MS fragmentation patterns. They build up a flow-chart library of the most common fragmentation events for given compound classes, and the resulting software (COMET, the Combinatorial Mass Encoding Tool) looks for hits that have both one of the predicted product masses and an associated fragment mass that matches the software’s predictions. As is usual with such systems, you’re generally willing to throw away a certain number of real hits if you can eliminate a much greater pile of false positives by doing so. But this is the part that I’m surprised works as well as it does! The combination of mass spec hardware throughput and ferocious computational power on the back end comes through again.
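As I read it, the core matching step amounts to requiring agreement on both the parent mass and at least one predicted fragment before a hit is called. Here’s a minimal sketch of that idea in Python; the data structures, mass tolerance, and all of the numbers are my own hypotheticals, not the authors’ actual COMET implementation.

```python
# Sketch of a "match parent mass AND a predicted fragment" hit-calling step.
# Everything here is illustrative, not the published COMET code.
from dataclasses import dataclass

PPM_TOL = 10.0  # hypothetical mass tolerance in parts per million

@dataclass
class LibraryMember:
    name: str
    product_mass: float           # predicted mass of the cleaved library product
    fragment_masses: list[float]  # predicted MS/MS fragments for this compound class

def within_ppm(observed: float, predicted: float, tol_ppm: float = PPM_TOL) -> bool:
    return abs(observed - predicted) / predicted * 1e6 <= tol_ppm

def call_hits(observed_precursor: float, observed_fragments: list[float],
              library: list[LibraryMember]) -> list[str]:
    hits = []
    for member in library:
        if not within_ppm(observed_precursor, member.product_mass):
            continue
        # Requiring an observed fragment as well is what weeds out compounds
        # that merely share the same parent mass.
        if any(within_ppm(f_obs, f_pred)
               for f_pred in member.fragment_masses
               for f_obs in observed_fragments):
            hits.append(member.name)
    return hits

# Toy usage with made-up masses: two library members share a parent mass,
# but only one has a fragment that was actually observed.
library = [
    LibraryMember("A-12", 431.2183, [286.1550, 169.0760]),
    LibraryMember("B-07", 431.2190, [301.1660, 184.0870]),
]
print(call_hits(431.2185, [286.1552, 110.0713], library))  # -> ['A-12']
```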
The paper demonstrates the effectiveness of this system with a test case (good ol’ carbonic anhydrase) and a much more challenging target (flap endonuclease 1). The former seems to have worked very well, but their initial libraries came up blank against the latter. That’s not really a surprising result, since it doesn’t have much of a small-molecule binding pocket, but a focused 4000-compound library incorporating structural features with a better chance of hitting such a dsDNA binding site yielded two sub-micromolar hits. Overall, it looks like you really can get away with this!
This idea looks particularly promising for focused-library screening, I would think. The wide variety of chemistry available and the ability to quickly retool to different scaffolds are real advantages - you can get moving on this sort of thing without even having to work up a traditional assay based on FRET or other fluorescent/luminescent methods, and the readout is a very direct one. The authors estimate that under favorable conditions you could push the screening library size up past a million, which isn’t bad, but remember that these are going to be a million broadly related compounds. For that sort of screening you might still be better off with DEL, especially if you already have some of those libraries built (and most especially if you already have experience with that screening technique, which does have its idiosyncrasies). A real advantage here is that the chemistry opportunities are much wider and there are far fewer steps in the hit ID part of the process. You do have to put in some work training the COMET system to recognize fragmentation patterns, but that doesn’t look too bad (and that data will be useful for the lifetime of that particular screening set).