Main
DNA encodes the information required for biological systems to carry out a broad range of functions. The understanding of this relationship has sparked inquiries across vast fields of biology and biological engineering as we read, edit and write the genetic information of organisms. Great advancements have been made toward these pursuits, from revolutions in DNA reading through long-read sequencing and the ability to generate terabytes of data from a single run1, to the breakthroughs in DNA editing with the major advancements in CRISPR–Cas technologies over the past decade2. However, writing DNA—the ability to construct DNA of any length, complexity or diversity—lags behind, as DNA oligo synthesis can reach only short lengths and DNA assembly of oligos and short DNA fragments is fundamentally limited3. Although the need for affordable, large and complex synthetic DNA has grown exponentially, DNA construction has not sufficiently improved to meet the scale and efficiency that is required for the age of synthetic genomes, biomaterials, massively multiplexed machine-learning protein language models and directed protein evolution.
De novo construction of DNA relies on synthetic single-stranded DNA (ssDNA) oligos as input, produced either through phosphoramidite synthesis12 or enzymatically using terminal deoxynucleotidyl transferase (TdT)13. Because synthesis is cyclical and each coupling step has limited efficiency, the accuracy and yield of synthesized oligos decrease exponentially with increasing length3. Consequently, de novo production of DNA longer than a few hundred bases requires accurate assembly of these short oligos in the correct order.
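The exponential decline can be illustrated with the standard first-order model of stepwise synthesis, in which every coupling succeeds independently with the same probability. The sketch below is a minimal illustration under two stated assumptions: a constant coupling efficiency, and n − 1 couplings for an n-mer (the first nucleoside is pre-loaded on the support). The efficiencies used are illustrative values, not measurements from this study.

```python
def full_length_fraction(length: int, coupling_efficiency: float) -> float:
    """Expected fraction of full-length product for an oligo of `length` nt.

    Assumes each coupling succeeds independently with the same probability
    and that an n-mer requires n - 1 couplings.
    """
    if length < 1:
        raise ValueError("length must be at least 1")
    if not 0.0 < coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must be in (0, 1]")
    return coupling_efficiency ** (length - 1)


if __name__ == "__main__":
    for eff in (0.990, 0.995):  # illustrative coupling efficiencies
        for n in (60, 120, 200, 500):
            frac = full_length_fraction(n, eff)
            print(f"efficiency={eff:.3f} length={n:4d} full-length={frac:.3f}")
```

Under this model, at 99% coupling efficiency roughly 30% of 120-mers are full length, and this falls below 1% at 500 nt.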
All previous DNA assembly techniques, whether used by nature or invented by humans, rely on the native two-way junction (2WJ) between two complementary ssDNA overhangs (o and its complement o*) to guide assembly. They differ only in how the o/o* overhangs are generated and whether double-stranded DNA (dsDNA) or ssDNA is used as input3,4,5,6,7,8,9,10,11 (Extended Data Fig. 1). In this design, complementation of o/o* directs the assembly, and o/o* are consequently incorporated into the final assembled sequence. Because o/o* serve this dual function, overhang sequences cannot be extensively optimized for mutual exclusivity to maximize assembly efficiency without changing the final synthetic sequence. This intrinsic paradox inevitably produces misassemblies that compound to limit the efficiency, size and complexity of synthetic constructs.
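The paradox can be made concrete. In a 2WJ method, the overhang at each junction is simply the target sequence at that position, so its orthogonality is fixed once the junction is chosen. The sketch below is a simplified illustration, not any published design tool. It extracts the overhangs dictated by a hypothetical target and hypothetical cut positions, then reports the minimum Hamming distance between any overhang and every other overhang or reverse complement. A distance of 0 (identical or palindromic overhangs) predicts misassembly.

```python
from itertools import combinations

# Watson-Crick complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def forced_overhangs(target: str, cut_positions: list[int], width: int) -> list[str]:
    """Overhangs a 2WJ method must use: the target's own bases at each cut."""
    return [target[p:p + width] for p in cut_positions]


def min_crosstalk_distance(ohs: list[str]) -> int:
    """Smallest Hamming distance between any overhang and another overhang,
    or between any overhang and the reverse complement of an overhang
    (including its own, which flags palindromes). 0 means a guaranteed clash."""
    best = len(ohs[0])
    for a, b in combinations(ohs, 2):
        best = min(best, hamming(a, b), hamming(a, revcomp(b)))
    for a in ohs:
        best = min(best, hamming(a, revcomp(a)))
    return best


if __name__ == "__main__":
    # Hypothetical target containing a repeated motif: cutting inside both
    # copies forces the same 4-nt overhang at two junctions.
    target = "ATGAGGTCCATTGCAGGTCCAAT"
    ohs = forced_overhangs(target, [4, 15], 4)
    print(ohs, min_crosstalk_distance(ohs))
```

The only way to raise the distance in this model is to move the cut positions. That is exactly the constraint a design whose assembly instructions are not part of the product avoids.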
To address this fundamental problem, we invented Sidewinder, a DNA-assembly technique based on the DNA three-way junction (3WJ) that can be reliably applied towards the construction of any DNA sequence without limitation. The 3WJ design enables the assembly to be directed by highly optimized sequences that are not present in the final product, facilitating robust assembly independent of the context of the assembled sequence. Here we demonstrate and characterize Sidewinder construction of synthetic DNA in a variety of contexts, such as large multifragment assemblies, highly complex DNA sequences, parallel assembly of distinct constructs, and combinatorial library construction with high diversity coverage across the length of a gene.
Establishing Sidewinder
Sidewinder is fundamentally different from all previous techniques as it relies on information encoded within a third distinct helix to direct assembly between DNA fragments through the formation of a 3WJ (Fig. 1a). The 3WJ is one of the many unique DNA conformations found among a wide variety of non-canonical or artificial DNA interactions used in DNA nanotechnology14 but it has not been used previously in DNA assembly. The third helix, hereafter the Sidewinder helix, orthogonally winds up on the side of the final assembled sequence. The Sidewinder helix is not part of the final assembled sequence and therefore removes constraints on where assembly occurs, what sequences are being assembled and how many DNA fragments can be assembled at once.
Fig. 1: Sidewinder uses 3WJs for true sequence-independent DNA assembly.
a, Sidewinder directs DNA assembly through the formation of the 3WJ. Sidewinder fragments contain short toehold pairs (t/t*) and long, unique barcode pairs (b/b*) (i). Sidewinder DNA assembly is directed by b/b* (ii). The association of b/b* brings together the two fragments, which are further stabilized by t/t* to form the 3WJ (iii). Toehold t* is ligated to the neighbouring fragment X, irreversibly connecting the two fragments (iv). The barcode helix b/b* is removed to restore the 2WJ, resulting in a scarless assembly (v). b, A two-fragment ligation requires complementary t/t* and b/b*. Fragment Y heteroduplex, tagged with fluorophore Cy3 (right), undergoes an assembly with one of four possible fragment X heteroduplexes (left) with either matching or mismatching toehold and barcode. Only the fragment X with both a complementary barcode and a complementary toehold can be successfully ligated to the fluorophore-tagged complex (lane 1). A mismatched toehold x and/or mismatched barcode y results in no ligation (lanes 2–4). c, All four reactions from b (lanes 1–4) and the control (C, fragment Y alone; matching t/t* and b/b* not applicable (NA)) were run on an unstained denaturing TBE–urea gel, which enables tracking of the migration of fluorophore-containing molecules only. The gel depicts the ligation efficiency through the difference in migration of the unligated product (bottom band, control lane C) compared with the ligated product (top band), which is present only in lane 1. nt, nucleotide.
Sidewinder assembly fragments contain unique terminal secondary structures, hereafter referred to as toeholds (t/t*) and Sidewinder barcodes (b/b*) (Fig. 1a). Toeholds t/t* are short, exposed single-stranded sequences that are retained in the final synthetic product; they are analogous to the o/o* overhangs of 2WJ techniques (Extended Data Fig. 1). Sidewinder barcodes b/b* are exposed single-stranded sequences that constitute the Sidewinder helix in the 3WJ. They are absent from the final product and are highly optimized for each specific assembly (Fig. 1a (i)). When Sidewinder fragments are mixed at a temperature above the melting temperature (Tm) of t/t*, the higher Tm of the longer barcodes b/b* directs the fragments to associate specifically (Fig. 1a (ii)). The barcodes b/b* wind up to form the Sidewinder helix, bringing together the toeholds t/t*, which are complementary but unstable on their own at the reaction temperature. The paired toeholds further stabilize the 3WJ, leaving only a nick between the two Sidewinder fragments (Fig. 1a (iii)). The nick is then ligated, irreversibly connecting the two fragments (Fig. 1a (iv) and Extended Data Fig. 2). The 3WJ assembly can then be processed to remove the Sidewinder helix and scarlessly restore the conventional 2WJ, completing the construction of the synthetic DNA product (Fig. 1a (v)).
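The temperature logic (react above the toehold Tm but below the barcode Tm) can be sketched with two textbook composition-only approximations: the Wallace rule for sequences shorter than 14 nt and the common GC-content formula for longer ones. These crude estimates are not the method used in this study, and the 10-nt toehold and 24-nt barcode below are hypothetical sequences chosen only for illustration.

```python
def basic_tm(seq: str) -> float:
    """Rough melting temperature (deg C) from base composition only.

    Wallace rule below 14 nt, GC-content approximation otherwise. Ignores
    salt, strand concentration and nearest-neighbour stacking, so results
    are order-of-magnitude estimates only.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if len(seq) < 14:
        return 2.0 * at + 4.0 * gc
    return 64.9 + 41.0 * (gc - 16.4) / len(seq)


def reaction_window(toehold: str, barcode: str) -> tuple[float, float]:
    """Temperature window above the toehold Tm but below the barcode Tm."""
    low, high = basic_tm(toehold), basic_tm(barcode)
    if low >= high:
        raise ValueError("barcode must be more stable than the toehold")
    return low, high


if __name__ == "__main__":
    toehold = "GATCCAGTAC"        # hypothetical 10-nt toehold
    barcode = "GACT" * 6          # hypothetical 24-nt barcode
    low, high = reaction_window(toehold, barcode)
    print(f"toehold Tm ~{low:.1f} C, barcode Tm ~{high:.1f} C")
```

In practice, a thermodynamic model that accounts for stacking, salt and concentration would replace these formulas. The point of the sketch is only the ordering of the two Tm values.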
With this 3WJ assembly scheme, both complementary toeholds t/t* and complementary barcodes b/b* are required for successful assembly. To demonstrate the feasibility of Sidewinder, four separate two-fragment assemblies were set up with each possible combination of matching or mismatching toeholds and barcodes (Fig. 1b). Conversion of unligated fragments to the ligated product was determined by tracking the migration of a fluorophore-tagged oligo in fragment Y on a denaturing TBE–urea gel (Fig. 1c). An upper band, indicative of ligation at the 3WJ, was seen only when the untagged fragment X had both a toehold and a barcode complementary to fragment Y. All other combinations showed only the lower band, indicative of unligated fragments. By using barcodes b/b*, Sidewinder physically decouples the final assembled sequence from the instructions for assembly. This paradigm allows assembly conditions to be chosen that provide high exclusivity and specificity between DNA fragments.
Scaling Sidewinder
Sidewinder can be robustly scaled to a large number of DNA fragments. This enables large multifragment assemblies without the constraints of conventional methods, such as restriction enzyme recognition sequences or the need to shift junctions to accommodate orthogonal overhangs. Although Sidewinder is theoretically compatible with DNA fragments from any source, here we focus on the assembly of synthetic oligos (Fig. 2). When constructing an entirely synthetic sequence, the length of each Sidewinder fragment is determined by the maximum length of the input oligos; in this study, 120-mers were used for reasons of quality and cost. To design the assembly oligos, the target sequence is first split bioinformatically by choosing the positions of the toeholds (Methods). Barcodes, either prevalidated for exclusivity15 or designed in-house using NUPACK16,17, are then chosen for each toehold and combination of toeholds (Methods).
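The splitting step can be sketched as follows. This minimal version assumes that each coding segment must fit within a hypothetical `max_coding_len`, that segments are split to near-equal length, and that the toehold at each junction is the first `toehold_len` bases of the downstream segment. The function names and the equal-length rule are our own illustration; the actual toehold selection criteria are described in the Methods.

```python
import math


def split_target(target: str, max_coding_len: int, toehold_len: int) -> list[str]:
    """Split `target` into contiguous coding segments of near-equal length.

    Segments tile the target without overlap, so joining them end to end
    (as ligation of the coding oligos would) recovers the original sequence.
    """
    if max_coding_len <= toehold_len:
        raise ValueError("segments must be longer than the toehold")
    n = math.ceil(len(target) / max_coding_len)
    base, extra = divmod(len(target), n)
    segments, start = [], 0
    for i in range(n):
        size = base + (1 if i < extra else 0)  # spread the remainder
        segments.append(target[start:start + size])
        start += size
    return segments


def junction_toeholds(segments: list[str], toehold_len: int) -> list[str]:
    """Toehold at each junction: the first bases of the downstream segment."""
    return [seg[:toehold_len] for seg in segments[1:]]


if __name__ == "__main__":
    target = "ACGT" * 50  # hypothetical 200-nt target
    segs = split_target(target, max_coding_len=80, toehold_len=10)
    print([len(s) for s in segs], junction_toeholds(segs, 10))
```

A real design would also shift junctions so that each toehold has suitable sequence properties. The sketch only guarantees that the segments tile the target and respect the length limit.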
Fig. 2: Sidewinder reliably assembles large multifragment assemblies with high accuracy.
a, Sidewinder fragment i heteroduplexes are generated by annealing ssDNA barcode and coding oligos together to form a stable heteroduplex. b, Sidewinder fragments are individually processed as in a before being mixed together in the assembly reaction (i). The fragments associate with their proper assembly partner through the direction of their high-fidelity barcodes bi/bi* and are ligated subsequent to the formation of the 3WJs, resulting in the 3WJ assembly (ii). All barcode oligos are either displaced or destroyed through DNA polymerase extension of primer pF, restoring the 2WJ throughout the assembly (iii). This conversion step can be integrated as a part of the selective PCR with primers pF and pR to further amplify the assembled Sidewinder product. c, DNA agarose gel depicting the PCR product comparing the industry standard DNA assembly technique from oligos (PCA) to Sidewinder with increasing assembly size and number of fragments. A segment of the LuxABCDE cassette was assembled with 5-, 10- and 20-piece assemblies for both techniques and a 40-piece assembly for Sidewinder only. d, Analysis of the 40-piece Sidewinder assembly Nanopore sequencing reads depicted as a pie chart coloured by the proportion of accurate assemblies (blue) and the proportion of artifacts (grey). e, Analysis of all possible combinations of ligated junctions in the 40-piece Sidewinder assembly Nanopore sequencing data, comparing the number of correctly and incorrectly ligated junctions.
Each assembly fragment is composed of two synthetic oligos. The top ssDNA oligo, the barcode oligo, carries Sidewinder barcodes at both ends. The bottom ssDNA oligo, the coding oligo, is complementary to most of the barcode oligo but offset slightly to expose a toehold at each end of the fragment. Each coding oligo is phosphorylated individually and annealed to its barcode oligo under standard conditions customary to DNA nanotechnology18, yielding a dsDNA Sidewinder fragment heteroduplex with the desired secondary structures at both ends (Fig. 2a).
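The two-oligo layout can be sketched in code. The geometry below is one hypothetical arrangement consistent with the description above, not the authors' design tool. The coding oligo is offset from the barcode oligo by one toehold length, so the coding oligo exposes toehold t at its 5′ end, and the barcode oligo exposes t* for the downstream junction next to its 5′ barcode. All sequences, barcodes and function names are placeholders. The check confirms that the exposed t* on one fragment pairs with the exposed t on its downstream neighbour, and that neighbouring barcodes are complementary.

```python
# Watson-Crick complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def design_fragment(segment: str, next_segment: str, toehold_len: int,
                    barcode_5p: str, barcode_3p: str) -> tuple[str, str]:
    """Return (coding_oligo, barcode_oligo), both written 5'->3'.

    Hypothetical geometry: the coding oligo is the segment itself, so its first
    `toehold_len` bases (toehold t) stay single-stranded. The barcode oligo pairs
    with the rest of the segment plus the first `toehold_len` bases of the next
    segment, so it exposes t* just 3' of its 5' barcode.
    """
    k = toehold_len
    coding_oligo = segment
    paired_region = revcomp(segment[k:] + next_segment[:k])
    barcode_oligo = barcode_5p + paired_region + barcode_3p
    return coding_oligo, barcode_oligo


def exposed_toehold_star(barcode_oligo: str, barcode_5p_len: int, toehold_len: int) -> str:
    """t* exposed on the barcode oligo, immediately 3' of its 5' barcode."""
    return barcode_oligo[barcode_5p_len:barcode_5p_len + toehold_len]


if __name__ == "__main__":
    segs = ["ACGTTGCAAGGCTT", "TTGACCGGAATCGA", "GGCATTACCGATAG"]  # placeholders
    bc_a = "GACTTCAGGATC"  # placeholder barcode for the junction between segs 0 and 1
    bc_b = "TGCAAGTCCTAG"  # placeholder barcode for the junction between segs 1 and 2
    co0, bo0 = design_fragment(segs[0], segs[1], 4, bc_a, "")
    co1, bo1 = design_fragment(segs[1], segs[2], 4, bc_b, revcomp(bc_a))
    # t* on fragment 0 must pair with t on fragment 1's coding oligo.
    print(revcomp(exposed_toehold_star(bo0, len(bc_a), 4)) == co1[:4])
```

In this layout, each barcode oligo is the segment length plus its two barcodes. The fixed maximum oligo length therefore caps how long each coding segment can be.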
The oligo annealing process can be conducted for an arbitrary number of oligo pairs to generate the Sidewinder fragments (Fig. 2b (i)). PAGE extraction can be performed on the Sidewinder fragments to remove unannealed oligos. The individually processed fragments are then mixed, self-assembled and ligated to form the 3WJ assembly (Fig. 2b (ii)). Once the 3WJ assembly is complete, the coding oligos from all fragments have been ligated together into the uninterrupted sequence of the final target construct. In a single step, all barcode oligos are then either displaced or destroyed by DNA polymerase extension of a primer that uses the now-continuous coding strand as a template. This restores the 2WJ throughout the assembly and completes the construction (Fig. 2b (iii)). The polymerase extension can be integrated into the selective PCR that further amplifies the assembled Sidewinder product.
Using Sidewinder, we first demonstrated robust assemblies of increasing size, from 5 to 10, 20 and 40 fragments, to construct a segment of the LuxABCDE operon. Sidewinder produced a single, strong target amplicon of the expected size in all reactions, with no sign of misassemblies (Fig. 2c). To benchmark this result, we set up analogous 5-, 10- and 20-piece assemblies of the Lux operon with four other methods (Extended Data Fig. 3a): polymerase cycling assembly (PCA)3,4, Gibson assembly6, a 4 bp overhang ligation (analogous to Golden Gate assembly7) and a 10 bp overhang ligation (analogous to Sidewinder without the barcodes). These fragments were processed to mirror the Sidewinder workflow, including the final PCR amplification step (Extended Data Fig. 3b,c). All other techniques tested failed beyond five-piece assemblies, except the 10 bp overhang ligation, which produced a clean ten-piece product but failed at 20 pieces (Fig. 2c and Extended Data Fig. 3d). We further applied Sidewinder to construct a distinct series of assemblies of an mGL and mScarlet fusion cassette. PCA again failed beyond a five-piece assembly, whereas Sidewinder succeeded at all reaction sizes (Extended Data Fig. 3e). Gel source data are provided in Supplementary Fig. 1.
Although the gels give a qualitative picture of assembly performance, we performed Nanopore sequencing on our amplicons to confirm robust assembly quantitatively. All Nanopore reads were analysed and assigned to categories on the basis of their characteristics (Methods). We first conducted a fragment-level analysis by compiling and manually analysing Nanopore reads from the 40-fragment Sidewinder assembly of LuxABC. Of 609 reads, 12 (1.97%) were identified as PCR artifacts from primer mispriming, 2 (0.33%) as sequencing artifacts and 6 (0.98%) as barcode artifacts. The remaining 589 reads (96.72%) were Sidewinder products, and 100% of these were correctly assembled 40-piece constructs with all fragments in the correct order (Fig. 2d). All assignments of the raw reads can be found in the Source data. By contrast, of over 5,000 reads for the corresponding 20-piece PCA reaction, not a single read was a correct assembly, and the largest partial assembly contained only 12 fragments (BioProject: PRJNA1201800).
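The category percentages follow directly from the per-read assignments. The sketch below tallies hypothetical per-read labels (the label strings are ours) using the counts reported above. Note that 6/609 is 0.985%, so the last digit printed for barcode artifacts depends on the rounding convention.

```python
from collections import Counter


def category_breakdown(read_labels: list[str]) -> dict[str, tuple[int, float]]:
    """Count reads per category and express each as a percentage of all reads."""
    counts = Counter(read_labels)
    total = sum(counts.values())
    return {label: (n, 100.0 * n / total) for label, n in counts.items()}


if __name__ == "__main__":
    # One label per read, using the counts reported for the 40-piece assembly.
    labels = (["PCR artifact"] * 12 + ["sequencing artifact"] * 2
              + ["barcode artifact"] * 6 + ["Sidewinder product"] * 589)
    for label, (n, pct) in category_breakdown(labels).items():
        print(f"{label:20s} {n:4d} {pct:6.2f}%")
```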
We further applied a separate analysis pipeline to the dataset to reduce the bias introduced by assigning reads at the fragment level. We searched the entire unfiltered raw sequencing data for all instances of ligated 3WJs that could result from either correct or incorrect ligation. In total, we identified 22,533 ligated junctions, all of which were correctly assembled, with zero observed misligations (Fig. 2e and Extended Data Fig. 5).
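The idea behind junction-level analysis can be sketched as a search of every read for every possible fragment-to-fragment junction. A junction is correct when fragment i is followed by fragment i + 1 and incorrect for any other ordered pair. This is a simplified toy, not the pipeline described in the Methods. It assumes error-free exact matching of a fixed window of `flank` bases on each side of the junction and reads already in forward orientation. A real pipeline must tolerate sequencing errors and handle both strands.

```python
def junction_probes(segments: list[str], flank: int) -> dict[tuple[int, int], str]:
    """Map each ordered pair (i, j) to the sequence spanning segment i's end
    joined to segment j's start."""
    return {(i, j): left[-flank:] + right[:flank]
            for i, left in enumerate(segments)
            for j, right in enumerate(segments)}


def count_junctions(reads: list[str], segments: list[str], flank: int) -> tuple[int, int]:
    """Return (correct, incorrect) junction counts across forward-oriented reads."""
    probes = junction_probes(segments, flank)
    correct = incorrect = 0
    for read in reads:
        for (i, j), probe in probes.items():
            hits = read.count(probe)
            if j == i + 1:
                correct += hits
            else:
                incorrect += hits
    return correct, incorrect


def junction_accuracy(correct: int, incorrect: int) -> float:
    """Fraction of observed junctions that are correctly ligated."""
    return correct / (correct + incorrect)


if __name__ == "__main__":
    segs = ["AAAACCCC", "GGGGTTTT", "ACACGTGT"]  # toy fragments
    reads = ["".join(segs), segs[0] + segs[2]]   # one correct, one misassembled
    print(count_junctions(reads, segs, flank=4))
```

The same accuracy function applied to the 31 misassembled junctions out of 13,416 reported later for the identical-toehold assembly gives 99.77%.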
By enabling large multifragment assemblies through the exclusivity and fidelity of 3WJ interactions, Sidewinder lifts the current limit on the number of long oligos that can be assembled in a single reaction. The main constraint on construct size therefore shifts to errors in oligo synthesis and the likelihood of finding a single mutation-free clone.
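The shift in the bottleneck can be seen with simple arithmetic, assuming independent events. If each junction forms correctly with probability f, a correct n-piece assembly has probability f^(n−1). A product of length L with per-base error rate e is nucleotide-perfect with probability (1 − e)^L. In the sketch below, the misconnection rate (1 in 960,617) and per-base error rate (1 in 877) are the values reported for the eGFP library later in this article. The 0.99 per-junction fidelity for a conventional method and the 4,000 bp construct length are hypothetical values for comparison.

```python
def p_all_junctions_correct(n_fragments: int, junction_fidelity: float) -> float:
    """Probability that all n - 1 junctions of an n-piece assembly are correct."""
    return junction_fidelity ** (n_fragments - 1)


def p_error_free(length: int, per_base_error: float) -> float:
    """Probability that a product of `length` bases carries no synthesis error."""
    return (1.0 - per_base_error) ** length


if __name__ == "__main__":
    n = 40
    print("3WJ junctions:", p_all_junctions_correct(n, 1 - 1 / 960_617))
    print("hypothetical 0.99 junctions:", p_all_junctions_correct(n, 0.99))
    print("error-free 4 kb (hypothetical length):", p_error_free(4000, 1 / 877))
```

With junction errors this rare, nearly every 40-piece product is correctly ordered. The fraction of error-free molecules is instead set almost entirely by oligo synthesis errors.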
Sidewinder constructs complex DNA
In addition to reliably assembling far more fragments than previous methods, Sidewinder enables the construction of complex DNA sequences that are otherwise difficult to assemble. First, we assembled the native coding sequence of the human protein apolipoprotein E (APOE), which has a high proportion of guanine and cytosine (GC) bases across the gene. APOE regulates cholesterol transport and maintains lipid homeostasis in the brain, and has allelic polymorphisms associated with increased risks of Alzheimer’s disease and cerebral amyloid angiopathy19. Its coding sequence is 70% GC, with segments of the gene reaching 95% GC content (Fig. 3a). A 12-piece Sidewinder assembly produced a single clean product (Fig. 3b). Of over 4,500 Nanopore reads, 99.89% were Sidewinder products, and again 100% of these were correct assemblies (Fig. 3c). Our junction analysis pipeline identified 50,636 ligated junctions, all of which were correctly ligated (Fig. 3d), demonstrating a robust capacity for assembling sequences with high GC content through the 3WJ.
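The local GC profile in Fig. 3a is a sliding-window average. Below is a minimal sketch using the 20-nt window from the figure legend. The function name is ours, and the window is updated incrementally so the cost is linear in sequence length.

```python
def sliding_gc(seq: str, window: int = 20) -> list[float]:
    """GC fraction of every length-`window` substring, left to right."""
    seq = seq.upper()
    if len(seq) < window:
        return []
    gc = sum(base in "GC" for base in seq[:window])
    fractions = [gc / window]
    for i in range(window, len(seq)):
        # Add the base entering the window, drop the base leaving it.
        gc += (seq[i] in "GC") - (seq[i - window] in "GC")
        fractions.append(gc / window)
    return fractions


if __name__ == "__main__":
    toy = "GCGCGGCCGCGGCGCCGGCG" + "ATATTAATATAATTATATAT"  # toy sequence
    profile = sliding_gc(toy)
    print(f"max {max(profile):.2f}, min {min(profile):.2f}")
```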
Fig. 3: Sidewinder reliably assembles complex DNA sequences with high GC content and high repeats.
a, Graphical representation of the local GC content of a 20-nucleotide sliding window in the coding sequence of human APOE (teal) compared with the GC content of the 10-piece assembly of the Lux cassette (grey). b, DNA agarose gel depicting the APOE high-GC assembly after PCR with a single strong target band. c, Nanopore sequencing analysis of the high-GC Sidewinder assembly depicted as a pie chart coloured by the proportion of accurate assemblies (teal) and the proportion of artifacts (grey). d, Analysis of all possible combinations of ligated junctions in the high-GC assembly Nanopore sequencing data, comparing the number of correctly and incorrectly ligated junctions. e, Self-alignment of a segment of LuxA (i) contrasted with the assembled highly repetitive segment of the h-fibroin protein (ii). Places where at least 8 bases repeat are plotted according to the position in the sequence (x axis) and where it repeats (y axis). The locations of the identical toeholds t/t* are highlighted in dark purple corresponding to their position in the assembly (iii). f, DNA agarose gel depicting the identical toehold assembly after PCR (top band), as well as minor byproducts (bottom band) resulting from mispriming between fragments F1 and F5 during the PCR step and not from misassembly during the Sidewinder reaction. MM, molecular mass. g, Nanopore sequencing analysis of the gel-extracted identical toehold Sidewinder assembly. The pie chart is coloured by the proportion of Sidewinder products (purple) and the proportion of PCR and sequencing artifacts (grey). h, Analysis of all possible combinations of ligated junctions in the identical toehold assembly Nanopore sequencing data, comparing the number of correctly and incorrectly ligated junctions.
In addition to GC-rich sequences, a segment of the highly repetitive silk protein h-fibroin from Glyphotaelius pellucidus was constructed. Silk proteins are of interest owing to their biodegradability and their highly repetitive sequences, which give rise to their unique mechanical properties and potential applications as a biomaterial20,21,22. This segment of h-fibroin was selected to demonstrate Sidewinder’s ability to handle extremely repetitive DNA sequences, which are notoriously difficult to reliably assemble23,24. To further push the limits of this assembly, the Sidewinder fragments used in this construction were designed to use identical t/t* toehold sequences, which are nested within regions of the construct that are dense with repeats (Fig. 3e and Extended Data Fig. 6a). Without Sidewinder’s signature 3WJ b/b* barcodes, this would be analogous to a multifragment assembly in which the o/o* overhangs are completely identical to each other, the assembly of which would be intrinsically impossible to direct accurately using previous methods.
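A self-alignment dot plot like Fig. 3e can be approximated with a k-mer index. Any exact repeat of at least 8 bases contains an identical 8-mer, so recording every pair of positions that share an 8-mer marks the repeats. This is a minimal sketch with 0-based positions; the function name is ours, and the threshold is taken from the figure legend.

```python
from collections import defaultdict


def repeat_dots(seq: str, k: int = 8) -> list[tuple[int, int]]:
    """(i, j) pairs, i != j, where the k-mers starting at i and j are identical."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    dots = []
    for positions in index.values():
        for i in positions:
            for j in positions:
                if i != j:
                    dots.append((i, j))
    return sorted(dots)


if __name__ == "__main__":
    # Each dot is (position in sequence, position where it repeats),
    # matching the x and y axes of the self-alignment plot.
    print(repeat_dots("GGAGCAGGTGGAGCAGGTGGAGCAGG"))
```

For a highly repetitive gene such as the h-fibroin segment, the number of dots grows quickly, which is why toeholds nested in such regions cannot be made unique.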
Sidewinder’s high specificity for proper ligation at the 3WJ enabled this five-piece assembly with identical toeholds, producing a strong assembly product despite these deliberately extreme conditions (Fig. 3f). The DNA agarose gel shows a strong target band for the five-piece identical-toehold construct and only a single minor byproduct. This 200 bp byproduct is a PCR artifact: it appears even when the 3WJ is not ligated, or when only fragment 1 (F1) and F5 are provided as the PCR template without ligation. Sanger sequencing indicates that the byproduct results from mispriming of the F1 toehold to the F5 toehold, and that it was preferentially amplified owing to the bias of PCR towards shorter products25 (Extended Data Fig. 4).
Gel extraction of the correct-size band was used for Nanopore sequencing. The results indicate a highly specific assembly and amplification of the proper five-piece product with an extremely high fidelity—99.52% of Nanopore reads are Sidewinder products, 99.19% of which are correct assemblies and only 0.81% of which show evidence of misligation during assembly (Fig. 3g). The junction analysis identifies only 31 misassembled junctions out of a total of 13,416 sequenced junctions, or 99.77% accuracy per junction for Sidewinder assembly in this deliberately extreme reaction (Fig. 3h).
To provide a reference point for the difficulty of these assemblies, the four previous assembly methods (PCA, Gibson assembly, 4 bp overhang analogue to Golden Gate and 10 bp overhang) were also applied to both of these complex sequences. Analogous fragments were designed using oligos that are compatible with each assembly method as previously described (Extended Data Fig. 3). Again, in all instances for both assemblies, the previous assembly methods do not produce a clean band of the target size (Extended Data Fig. 6b,c). These comparison experiments demonstrate the advantage of Sidewinder’s 3WJ paradigm in assembling complex sequences in addition to large multifragment assemblies.
Sidewinder one-pot parallel assemblies
Sidewinder’s fidelity enables multiple assemblies of distinct constructs simultaneously in the same reaction tube. This could be particularly applicable in the field of AI-facilitated DNA and protein design in which in silico methods generate multiple competing designs that can be difficult and costly to synthesize and evaluate simultaneously in the physical world26,27,28,29.
We combined the Sidewinder fragments for three distinct ten-piece assemblies, each encoding a different colorimetric phenotypic marker: mScarlet, mGL and the chromoprotein aeBlue30. Sidewinder assemblies were conducted for all three constructs simultaneously in the same reaction tube (Fig. 4a). By selective amplification with either construct-specific primer pairs or a universal primer pair for all three constructs, any individual construct or the pool of all three can be dialled out, producing a single, clean target band in all cases (Fig. 4b).
Fig. 4: Sidewinder independently assembles multiple distinct constructs in one pot with high fidelity.
a, Sidewinder fragments for three ten-piece assemblies corresponding to the phenotypic markers mScarlet, mGL and aeBlue were mixed together in the same reaction tube, where they were assembled in parallel. This reaction mix was then used as the template for a PCR reaction that can individually amplify target constructs or universally amplify the pool of all constructs simultaneously. b, DNA agarose gel showing the final PCR products for each of the individual constructs, as well as all three simultaneously (Pool), with a single strong target band. c, Nanopore sequencing analysis of the individual and pool assemblies. The pie charts are coloured by the proportion of Sidewinder products corresponding to mScarlet (red), mGL (green) and aeBlue (blue), as well as the proportion of PCR and sequencing artifacts and incorrect assemblies (grey). d, Analysis of all possible combinations of ligated junctions for the individually amplified construct’s Nanopore sequencing, comparing the number of correctly and incorrectly ligated junctions. e, Assemblies were cloned and transformed. The pool plate was coloured using superimposed images of the plate under ambient light and blue light.
Quantitative assessment by Nanopore sequencing of the pre-clonal Sidewinder PCR products indicates high fidelity for each construct. For mScarlet, mGL and aeBlue, 95.19%, 96.23% and 95.81% of reads, respectively, were Sidewinder products, and 99.9% of these were correct, exclusive assemblies (Fig. 4c). For the combined pool, reads were distributed across all three constructs with low rates of mispriming and misligation (Fig. 4c). To better simulate pooled conditions, the fragments for the parallel assembly were not PAGE-extracted before assembly, which is the likely cause of the observed increase in mispriming. Junction analysis shows very low rates of misligation at the 3WJ for each individually amplified sample, with just 2, 1 and 3 misligated junctions out of more than 23,000 junctions for each construct (Fig. 4d).
Escherichia coli transformation of each dialled-out construct from the parallel assembly yielded only clones of the expected colour (Fig. 4e), and transformation of the pool yielded all three expected phenotypes. A set of 56 non-coloured clones was further characterized by PCR and sequencing to estimate the post-transformation assembly accuracy of the pooled constructs (Extended Data Fig. 7). These in vivo results support the quantitative Nanopore data, which in turn agree with the clean gel bands. Together, they demonstrate the robustness of Sidewinder assembly.
We observed higher rates of misligation at the 3WJ for the parallel and h-fibroin assemblies, which used a set of pregenerated orthogonal barcode sequences15. The high-GC and 40-piece assemblies, whose barcodes were designed bespoke with the NUPACK Python package17, showed lower rates (Extended Data Fig. 8). The minor crosstalk observed between fragments of different genes in the parallel assembly is therefore probably due to suboptimal design of the hand-picked barcode pairs and should be straightforward to address in future experiments.
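NUPACK evaluates full thermodynamic ensembles. As a much cruder illustration (neither NUPACK nor the criterion used in this study), a barcode set can be screened for the longest stretch over which any strand is perfectly complementary to a strand other than its intended partner, including itself. Function names and example barcodes are hypothetical.

```python
# Watson-Crick complement table for uppercase DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_common_substring(a: str, b: str) -> int:
    """Length of the longest substring shared by a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def worst_offtarget_pairing(barcodes: list[str]) -> int:
    """Longest perfect duplex any barcode strand can form with an unintended strand.

    The intended partner of barcode b is revcomp(b). Strands x and y can pair
    over a stretch equal to the longest common substring of x and revcomp(y).
    """
    n = len(barcodes)
    strands = barcodes + [revcomp(b) for b in barcodes]
    worst = 0
    for x in range(len(strands)):
        for y in range(x, len(strands)):
            if y - x == n:
                continue  # intended barcode / anti-barcode pair
            worst = max(worst, longest_common_substring(strands[x], revcomp(strands[y])))
    return worst


if __name__ == "__main__":
    print(worst_offtarget_pairing(["GACTTCAGGATC", "TGCAAGTCCTAG"]))  # hypothetical set
```

A lower score indicates a more orthogonal set. A thermodynamic tool is still needed to weigh mismatches, GC content and secondary structure.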
Sidewinder constructs DNA libraries
Sidewinder can also assemble defined diversities across a large number of positions along the entire length of a DNA sequence to construct combinatorial libraries. In a combinatorial library, each variable position is diversified and assembled into a synthetic sequence with other diversified positions through DNA assembly. These libraries are then sorted, selected or screened for desired functions12,31,32. This approach is particularly useful in protein engineering in which specific codons are varied at known or predicted residues to achieve a modified or improved protein function. Current methods for constructing combinatorial libraries using existing DNA assembly technologies can be limited in various aspects, such as the theoretical library size, coverage, number of positions diversified simultaneously and accuracy of assembly during construction33,34,35,36,37,38,39.
We used Sidewinder to generate a combinatorial library by dividing the gene for the fluorescent protein eGFP into a ten-piece Sidewinder assembly. Predefined codon variations were combinatorially diversified at 17 positions spanning the entire gene, yielding a theoretical library size of 442,368 possible mutation profiles (Fig. 5a,b). The Sidewinder library assembly produced a single strong target band (Fig. 5c), which was then cloned into a plasmid and transformed into E. coli cells. Fractions of the library both before and after cloning were analysed by PacBio sequencing, which provides high-fidelity, single-molecule long reads40.
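The theoretical library size is the product of the number of options at each diversified position. The per-position diversities are given in Fig. 5 rather than in the text. The breakdown below (14 two-option and 3 three-option positions) is one hypothetical split that reproduces 442,368 = 2^14 × 3^3, used only for illustration. The second function gives the expected number of distinct variants under idealized uniform random sampling, an assumption that real libraries do not strictly satisfy.

```python
from math import prod


def library_size(options_per_position: list[int]) -> int:
    """Number of distinct mutation profiles in a full combinatorial library."""
    return prod(options_per_position)


def expected_distinct(size: int, n_draws: int) -> float:
    """Expected number of distinct variants after n uniform random draws."""
    return size * (1.0 - (1.0 - 1.0 / size) ** n_draws)


if __name__ == "__main__":
    options = [2] * 14 + [3] * 3  # hypothetical per-position diversities
    size = library_size(options)
    print(size, round(expected_distinct(size, size)))
```

Under uniform sampling, drawing as many molecules as there are variants covers only about 63% (1 − 1/e) of the library. Reaching high coverage therefore requires sampling well beyond the library size.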
Fig. 5: Sidewinder assembles large combinatorial libraries with high coverage.
a, Sidewinder library fragments generated by annealing a barcode oligo to an arbitrary number of coding oligos containing predefined mutations (coloured diamonds). b, Schematic of the ten-piece assembly for the fluorescent protein library (positions not to scale). c, DNA agarose gel depicts PCR product of library assembly with a single strong target band. d, PacBio sequencing analysis of the pre-clonal Sidewinder library. The pie chart shows the proportion of Sidewinder products (pale orange), partially aligned products (subset of fragments 1–10 in the correct order) and PCR and barcode artifacts (grey). e, Junction analysis of the library PacBio sequencing depicting ligations at the 3WJ. f, Violin plot (n = 646 positions across 3,079,525 molecules) and box and whisker plot (min = 0.9947649; max = 0.9999540; quartile 1 (Q1) = 0.9979082, median = 0.9988624, Q3 = 0.9993220, lower bound = 0.9947649, upper bound = 0.9998737) showing the distribution of per-base accuracies for the oligos used in the assembly, excluding intended library mutation positions and the flanking bases. g, The mutation diversity at the codon level showing pre-cloning experimental distribution (saturated, left) and corresponding theoretical codon distribution (desaturated, right). h, Mutation diversity at the gene level showing the proportion of PacBio reads assigned to each of the possible mutation combinations pre-cloning (pale orange) and post-cloning (orange). i, The sequence space of all possible mutation combinations (grey) and the mutation combinations represented from the pre-clonal sequencing (pale orange), post-clonal sequencing (orange) and those combinations seen in both (dark orange). j, The proportion of all observed variants in the pre-clonal and post-clonal sequencing plotted relative to one another. k, The percentage of diversity achieved considering every combination of N mutation positions across the 17 diversity positions of the Sidewinder library.
l, Fluorescence area versus height plots, showing populations positive for blue, green, yellow and red fluorescence. The proportion of hits identified over the threshold is labelled for each colour.
For the pre-clonal Sidewinder assembly, 98.88% (3,832,803 reads) were correct ten-piece assemblies, 0.41% were partially assembled with the correct connection of a subset of the ten pieces, and 0.71% were composed of PCR and barcode artifacts (Fig. 5d). Reassuringly, the high-fidelity PacBio data are consistent with previous Nanopore data, further supporting the robustness of the Sidewinder assembly in all demonstrated circumstances. Further analysing the PacBio dataset for all instances of misligated junctions revealed only 37 misligated junctions out of 35,542,842 total observed junctions (Fig. 5e). This corresponds to a misconnection rate at the 3WJ of just 1 in 960,617 (Extended Data Fig. 8).
The median error rate of the oligos used for the assembly was calculated to be 10^−2.943 (1 error in 877 bases, or a 99.886% chance of a base being correct) (Fig. 5f). For these oligos, per-base accuracy decreases with increasing oligo length, but accuracy increases across assembly junctions owing to the required ligation at the 3WJ (Extended Data Fig. 9a). These observations suggest that Sidewinder does not introduce additional errors during assembly and may subtly improve oligo fidelity. Given Sidewinder’s high fidelity in multifragment assemblies, using shorter DNA oligos to compose a larger number of Sidewinder fragments may be advantageous for synthesizing nucleotide-perfect genes. On the basis of the observed per-base error rate, an estimated 44.14% of the eGFP variants constructed are expected to be nucleotide-perfect. This estimate compares with an observed 40.88% of nucleotide-perfect post-clonal genes in the PacBio sequencing data. By contrast, just 8.2% nucleotide-perfect clones were reported for a library of a 1 kb gene constructed using PCA33.
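The nucleotide-perfect estimate follows from assuming independent errors at a constant per-base rate: P(perfect) = (1 − e)^L. The sketch below converts the reported log10 error rate and applies it to an assumed length of 717 bp. This length is our assumption, roughly the eGFP coding length, and with it the formula gives about 44%, close to the estimate above.

```python
def per_base_error(log10_error: float) -> float:
    """Convert a log10 per-base error rate (e.g. -2.943) to a probability."""
    return 10.0 ** log10_error


def fraction_perfect(length: int, error_rate: float) -> float:
    """Expected fraction of products with no error, assuming independent errors."""
    return (1.0 - error_rate) ** length


if __name__ == "__main__":
    e = per_base_error(-2.943)
    print(f"1 error in {1 / e:.0f} bases")
    print(f"nucleotide-perfect at 717 bp (assumed length): {fraction_perfect(717, e):.3f}")
```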
The diversity of the combinatorial library can be assessed by analysing mutation profiles (the identities of the deliberately encoded mutations) at the codon, fragment and gene levels, and comparing the theoretical and experimental distributions of mutations at each level. At the codon level, every codon mutation profile is represented in the library, with an average absolute deviation of just 8.23 percentage points from its theoretical proportion of occurrence (Fig. 5g). At the fragment level, all 82 fragment mutation profiles are represented in the final library. The distribution of mutation profiles generally appears to have higher variance for fragments with more possible mutation profiles, such as fragment 4 (n = 36), than for fragments with fewer, such as fragment 2 (n = 2) (Extended Data Fig. 9b). Furthermore, there does not appear to be a decrease in the likelihood of a coding oligo being incorporated when it has more mismatches to the fragment’s barcode oligo (Extended Data Fig. 9c), except when the diversity positions lie close to a junction, as with diversity position 15 (Fig. 5g).
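The codon-level comparison can be sketched as the mean absolute difference, in percentage points, between observed and theoretical codon proportions at a diversified position. This minimal version works on one position. How the reported 8.23-point figure aggregates across positions is not specified here, so the aggregation is left to the caller. The codon counts in the example are hypothetical.

```python
def mean_abs_deviation_pp(observed_counts: dict[str, int],
                          theoretical_props: dict[str, float]) -> float:
    """Mean |observed % - theoretical %| over the codons designed for one position."""
    total = sum(observed_counts.values())
    deviations = [
        abs(100.0 * observed_counts.get(codon, 0) / total - 100.0 * prop)
        for codon, prop in theoretical_props.items()
    ]
    return sum(deviations) / len(deviations)


if __name__ == "__main__":
    # Hypothetical two-codon position with a 50:50 design.
    print(mean_abs_deviation_pp({"GCT": 55, "GAT": 45}, {"GCT": 0.5, "GAT": 0.5}))
```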
At the gene level, out of the 442,368 possible mutation profiles, we observed a nearly identical distribution of occurrences in the mutation profiles of the pre- and post-clonal sequencing and achieved a library coverage of 326,733 and 386,978 variants, respectively, for a combined total of 405,778 variants (307,933 overlap) (Fig. 5h,i). By plotting the proportion of occurrences of the mutation profiles, which are represented in both the pre- and post-clonal sequencing, we see a general trend in which the more highly represented clones before cloning remain highly represented after cloning for this gene (Fig. 5j). The 405,778 variants observed correspond to a total library coverage of >91.7% of the 442,368 possible combinations of the 17 mutation positions. Within these mutation profiles, we observe nearly every possible combination of as many as 15 mutation positions (>99.4%) in the library with continued high representation all the way through every possible combination of 17 positions (Fig. 5k), which is an improvement over a recent comparable construction with Golden Gate34.
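The coverage figures can be reproduced from sets of observed mutation profiles, each encoded as a tuple of option indices (one per diversified position). The combined total is the union of the pre- and post-clonal sets: 326,733 + 386,978 − 307,933 = 405,778, or 91.7% of 442,368. For the N-position analysis, the sketch below assumes Fig. 5k reports, for each subset of N positions, the fraction of possible sub-profiles observed, averaged over all subsets. That averaging choice is our assumption. The cost grows with the number of subsets, so for 17 positions this is slow in pure Python at intermediate N.

```python
from itertools import combinations
from math import prod


def total_coverage(observed_profiles: list[tuple[int, ...]], library_size: int) -> float:
    """Fraction of all possible mutation profiles observed at least once."""
    return len(set(observed_profiles)) / library_size


def k_subset_coverage(observed_profiles: list[tuple[int, ...]],
                      options_per_position: list[int], k: int) -> float:
    """Mean, over all k-position subsets, of the fraction of possible
    sub-profiles (option combinations at those positions) that were observed."""
    observed = set(observed_profiles)
    fractions = []
    for subset in combinations(range(len(options_per_position)), k):
        seen = {tuple(profile[i] for i in subset) for profile in observed}
        possible = prod(options_per_position[i] for i in subset)
        fractions.append(len(seen) / possible)
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    union = 326_733 + 386_978 - 307_933
    print(union, f"{100 * union / 442_368:.1f}%")
```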
We predict that Sidewinder may be suitable for libraries of exceedingly large sizes, primarily limited by the fidelity of the oligos used and the ability to