12 Comments
Patrick Boyle

FWIW, we did this back in my day at Ginkgo, especially if you always sequence winners. Stuff coming out of synthesis was often sequenced by default; if some library members weren't perfect, we'd log the mutations and move them forward anyway rather than wait for another round of synthesis just for a few stragglers. Lots of 80/20 decisions go into keeping a project moving forward quickly.

Daniel Goodwin

A salty/speculative riff here could be that molecular biology was founded by physicists - we needed their incredible rigor to understand these systems with the tiny bit of light we were able to shine on them. Those were the "I don't believe a word of it!" days, led by Max Delbrück et al.

But this essay pushes on the idea that our limitation is different today, so the leading mindset is different. "Extreme throughput over extreme precision" definitely shoves out the physicists and welcomes the engineers.

Armand B. Cognetta III

It's been a while since I did any cloning myself, and it was never my favorite, so I love the idea of making it faster! But IMO, for most of the proteins people work on, this idea is flawed for a few reasons:

1. GFP fluorescence is too reductionist a model for deciding whether most point mutations are deleterious. Real proteins people study can rely on hundreds or thousands of interactions to perform their function, many of them rare edge cases, and these can shift under single mutations in ways that are not obvious. For enzymes, a point mutation might change the catalytic rate for one substrate but not another. Point mutations can also affect protein stability, half-life, subcellular localization, drug binding, etc. We are very far from being able to build a model that broadly predicts this.

2. Cloning, while sometimes annoying, is usually a relatively small part of the study of a given protein. Science is hard enough as it is without adding extra noisy variables just to save a few bucks.

3. >"Selection: if a construct carries a resistance marker and the host survives, the plasmid is intact."

In practice this is often not true.

4. Once a lab finishes a cloning project, the construct is often stored for future use. Every time they reuse it, they'd have to be ok with new mutations cropping up.

I'm sure there are use cases for this, but for approximately zero of the hundreds of cloning projects I've been tangentially involved in over my career would I be ok with moving forward with constructs that contain random mutations.

Symmetrial

Tolerance models, great! Variant atlas, cool.

What is the reason it needs to be an AI model that predicts whether a mutation is acceptable?

Why can’t it be a purpose built statistical model?
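For illustration, a purpose-built statistical model could be as simple as a position-specific scoring matrix built from an alignment of homologs - a minimal sketch, where the toy alignment, pseudocount, and acceptance cutoff are all invented assumptions, not anything from the essay:

```python
# Sketch of a purpose-built statistical model for mutation tolerance:
# a position-specific scoring matrix (PSSM) from a multiple sequence
# alignment of homologs. No deep learning, just conservation stats.
from collections import Counter
import math

def build_pssm(alignment, pseudocount=1.0):
    """Per-column log-odds scores vs. a uniform background (1/20)."""
    n_cols = len(alignment[0])
    pssm = []
    for i in range(n_cols):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        total = sum(counts.values()) + 20 * pseudocount
        col = {aa: math.log(((counts.get(aa, 0) + pseudocount) / total) / 0.05)
               for aa in "ACDEFGHIKLMNPQRSTVWY"}
        pssm.append(col)
    return pssm

def tolerated(pssm, pos, new_aa, cutoff=0.0):
    """Call a substitution tolerated if its log-odds clears the cutoff."""
    return pssm[pos][new_aa] >= cutoff

# Toy alignment of four homologs; column 2 tolerates I/L/V.
aln = ["MKVLT", "MKILT", "MKLLT", "MKVLS"]
pssm = build_pssm(aln)
print(tolerated(pssm, 2, "I"))  # True: isoleucine observed at this column
print(tolerated(pssm, 2, "W"))  # False: tryptophan never observed here
```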

Not sure why you introduced the term "vibe cloning" at all, except to sound hip.

AI slop is not "perceived as low quality"; it is low quality. It's also not "a failure mode and something to be filtered out" - that definition belongs to AI hallucination. Slop and hallucination are not synonymous.

Mike Minson PhD

Very rarely did I ever need to screen more colonies because of a single point mutation. It happened maybe once in the course of cloning several thousand plasmids, and that case was because the ordered fragment carried the mutation and needed to be reordered. Cloning itself is not an error-prone process. So if your solution to addressing cost is allowing slop, you'll address less than 0.01% of the problem. On the other hand, if you address the base cost of sequencing, OR innovate on parallelization, then you can actually start to reduce the cost of cloning. Additionally, a researcher's time is more expensive than the cost of a single failed experiment - they burn about $1,000/day.

An easy way to reduce sequencing costs is to use barcoded primers during colony PCR, pool the amplicons, and send the entire lot for one nanopore run ($15 at Plasmidsaurus), then use the raw data to demultiplex your samples. That way you pay for one reaction and parallelize to get more out of the method.
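The demultiplexing step is a few lines of code - a minimal sketch assuming exact barcode matches at the 5' end of each read. The barcode sequences and file name are placeholders; a real pipeline would tolerate the mismatches typical of raw nanopore reads:

```python
# Minimal demultiplexer for pooled, barcoded colony-PCR amplicons.
# Assumes exact 8-mer barcode matches at the start of each read;
# barcodes and file names here are made up for illustration.

BARCODES = {
    "AGTCAGTC": "clone_01",
    "TCGATCGA": "clone_02",
    "GGCCAATT": "clone_03",
}

def read_fastq(path):
    """Yield (header, sequence) pairs from a FASTQ file."""
    with open(path) as fh:
        while True:
            header = fh.readline().rstrip()
            if not header:
                return
            seq = fh.readline().rstrip()
            fh.readline()  # '+' separator line
            fh.readline()  # quality string
            yield header, seq

def demux(path, bc_len=8):
    """Bin reads by their leading barcode; unmatched reads go to None."""
    bins = {name: [] for name in BARCODES.values()}
    bins[None] = []
    for header, seq in read_fastq(path):
        clone = BARCODES.get(seq[:bc_len])
        bins[clone].append((header, seq))
    return bins

bins = demux("pooled_run.fastq")
for clone, reads in bins.items():
    print(clone, len(reads))
```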

Saurabh Dalvi

Learning which sequence variations are permissible across proteins would need a huge chunk of data, I feel, but it would be interesting to tackle computationally.

Charlotte OBrien Gore

Not sure I agree that sequencing is the longest part... waiting for bacteria to grow, however! Also, is shortening cloning times (by skipping sequencing) in an attempt to keep up with ever-faster timelines going to lead to compromises on quality? I also agree with the other comments: some cloning needs to be hyper-specific, and using GFP as an example is perhaps too reductionist.

I like the general thinking though - always pro reducing experiment timelines!

Madan

Cool idea, but it makes me uneasy 😅 I'd be paranoid about whether that "harmless" mutation might be doing something we don't know about.

zach hensel

"If your GFP carries a Val→Ile substitution at position 117..."

Position 117 is Asp. GFP probably tolerates a number of substitutions here, but based on the structural context I suspect substitutions would have significance in many experiments.

And a different semiconservative valine substitution, V68L (one of the mutations in GFPmut2 and a large fraction of engineered GFP variants), would be important to describe as probably significant rather than burying it in the methods or supplement.

I'm on board with sequence-error-tolerant experimental workflows (which were the standard not very long ago), but I'm more careful in my assumptions about how likely errors are to be significant. It certainly can be a good way to save a bit of time with some organization.

FWIW, my approach when cloning changes plasmid size is: (1) colony PCR; (2) start the next step and send a single colony for sequencing in parallel; (3) go back to the positive colonies on the rare occasions the sequence isn't perfect, if it still makes sense to do so.

Tina Austin

this is so fascinating

Hn

You should check out the work of Rama Ranganathan. His lab does a lot of stuff, including pretty much exactly what you propose here.

Some thoughts regarding the cost of sequencing: is it more expensive to verify that your construct has the sequence you expect, or to run subsequent experiments on trust that it will be okay? Oftentimes the cheaper route is way more expensive in the long run. Interesting essay.
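To make that concrete, the break-even is just an expected-value comparison - every number below is a placeholder, purely to show the logic:

```python
# Back-of-envelope break-even for "sequence now" vs. "trust and proceed".
# Every number here is an illustrative assumption, not data from the essay.
p_bad = 0.02              # chance the unverified construct is actually broken
cost_seq = 15.0           # $ to sequence-verify up front
cost_downstream = 3000.0  # $ of reagents + researcher time on a doomed experiment

expected_cost_skip = p_bad * cost_downstream
print(f"verify: ${cost_seq:.0f}  skip: ${expected_cost_skip:.0f} expected")
# Skipping only wins when p_bad * cost_downstream < cost_seq,
# i.e. when failures are rare AND downstream experiments are cheap.
```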

Mbwanga Sambata

Fairly good models of mutational impact already exist, especially for coding sequence (one simple model is called a "codon table" - see the sketch below), and researchers already accept imperfect sequences depending on the nature of the project.
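That codon-table "model" really is just a lookup - a sketch that classifies a coding point mutation as synonymous or nonsynonymous using the standard genetic code (the toy CDS and the example mutations are invented):

```python
# Classify a coding point mutation as synonymous or nonsynonymous
# using nothing fancier than the standard codon table.

BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}

def classify(cds, pos, new_base):
    """Return 'synonymous' or 'nonsynonymous' for a single substitution."""
    codon_start = (pos // 3) * 3
    old_codon = cds[codon_start:codon_start + 3]
    new_codon = old_codon[:pos % 3] + new_base + old_codon[pos % 3 + 1:]
    old_aa, new_aa = CODON_TABLE[old_codon], CODON_TABLE[new_codon]
    kind = "synonymous" if old_aa == new_aa else "nonsynonymous"
    return f"{old_aa}->{new_aa} ({kind})"

cds = "ATGGTTCTGACC"          # toy CDS: Met-Val-Leu-Thr
print(classify(cds, 5, "A"))  # GTT->GTA, V->V: synonymous
print(classify(cds, 3, "A"))  # GTT->ATT, V->I: nonsynonymous
```

Synonymous vs. nonsynonymous is obviously only the first cut - whether the amino-acid change is tolerated is the harder question - but it filters a lot for free.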

Not every sequence will be perfect in a high-throughput setting, and it doesn't matter much, especially because you usually have some redundancy built in and you'll probably validate the most interesting hits anyway. Conversely, it would be foolish not to do due diligence when the plasmid is the single point of failure in an expensive and time-consuming experiment.

Also, sequencing costs are already low and will probably continue decreasing (Quintara does $5/plasmid for simple plasmids), and no one I know picks 10 clones for sequencing (maybe 2, depending on the complexity, plus you can do colony PCR if it's very unreliable).