Enhanced spatial clustering of single-molecule localizations with graph neural networks

Introduction

The identification and analysis of clusters, i.e., data points sharing some similarity, are crucial across many scientific disciplines and technological applications. Clustering algorithms facilitate pattern recognition, data compression, and information retrieval, enabling researchers to uncover hidden structures within complex datasets. A notable application of clustering algorithms is the spatial analysis of single-molecule localization microscopy (SMLM) data1,2,3. Super-resolution techniques, such as stochastic optical reconstruction microscopy (STORM)4, photoactivated localization microscopy (PALM)5, points accumulation for imaging in nanoscale topography (PAINT)6, and their variants, generate spatial point clouds, where each point represents the localization (typically with precision ≲20 nm) of an individual molecule7. These datasets can contain millions of localizations, which allows the application of statistical methods to provide detailed insights into the spatial organization of molecules within biological samples (Fig. 1a). Clustering SMLM data is crucial because it helps identify and group molecules that form specific cellular structures, such as protein nanoclusters8,9,10, chromatin clutches11, focal adhesions12, or nuclear pore complexes13. By clustering these points, researchers can infer molecules’ functional organization and interaction patterns under different conditions or treatments14,15, which is essential for understanding cellular processes at a molecular level.

Fig. 1: Overview of the MIRO-based clustering workflow.

a Illustration of the SMLM image acquisition process for molecules organized in ring-shaped clusters. Molecules appear stochastically as bright fluorescence spots in different frames. The fluorescence intensity profile (inset) is used to precisely determine the molecular centroids. b The cumulative localizations from all frames are then combined to generate the experimental point cloud. c The molecular localizations are represented as a graph that is encoded in a latent representation $\mathcal{G}$, combined with a hidden graph $\mathcal{G}_{\mathrm{h}}^{k}$, and recurrently processed through the MIRO block, $\mathcal{M}$. The hidden node features are used to minimize the loss functions (e.g., $\mathcal{L}_{\mathrm{spot}}$ and $\mathcal{L}_{\mathrm{ring}}$) calculated at each step, providing flexibility to use different ground truths across steps and thus enabling the network to collapse structures at various scales. Finally, the collapsed localizations are postprocessed through a conventional clustering algorithm to group those within the same structure. d The core operations of the MIRO block include the concatenation of the input graph $\mathcal{G}$ with the hidden graph $\mathcal{G}_{\mathrm{h}}^{k}$. The input graph provides semantic information (e.g., the position of localizations forming the same cluster, represented by the shaded circle). In contrast, the hidden graph $\mathcal{G}_{\mathrm{h}}^{k}$ captures relational information between adjacent localizations (represented by the purple area). Information is propagated to generate an updated hidden graph $\mathcal{G}_{\mathrm{h}}^{k+1}$, which is passed together with $\mathcal{G}$ to the next iteration of the MIRO block. A decoder produces displacement vectors from hidden node features that, when summed with the localization coordinates, shift localizations belonging to the same cluster toward a common center, leaving background localizations unaltered.

However, clustering SMLM data presents several challenges. Inherent localization noise, such as false positive identifications, can obscure true molecular patterns. Molecule undercounting and overcounting, where the same molecule is either not detected or detected multiple times due to photophysical effects, can distort the true distribution of molecules9. Molecular structures can be closely spaced and even overlapping, resulting in a high density of localizations that complicates the identification of distinct clusters.

Several algorithms have been specifically proposed for this task16,17,18,19,20,21,22 and their performance has been recently assessed23. Among the methods evaluated in ref. 23, density-based spatial clustering of applications with noise (DBSCAN)24, one of the most popular algorithms used for SMLM data, has been shown to be adaptable to diverse clustering conditions and to provide close-to-optimal performance, comparable to that obtained with the topological mode analysis tool (ToMATo)21 and kernel density estimation (KDE). DBSCAN was also found to be the most robust to multiple blinking23. More recently, DBSCAN has been shown to achieve significantly higher scores than HDBSCAN25 and OPTICS26 across different cluster types27. However, DBSCAN’s performance is highly dependent on the choice of its two parameters: the maximum distance between two points for them to be considered as part of the same cluster (ε); and the minimum number of points that must be within a point’s ε-neighborhood for that point to be considered a core point and thus form a cluster (minPts). These parameters determine what constitutes a cluster and what constitutes noise. Their choice can significantly affect the resulting clusters, and they require careful dataset-specific settings based on heuristic rules18,28 or further analysis27,29.
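To make the role of ε and minPts concrete, the following is a minimal, unoptimized sketch of DBSCAN in plain Python. It is an illustration only and not the implementation used in this work; the function name `dbscan`, the toy coordinates, and the convention that a point counts itself among its ε-neighbors are assumptions made for the example.

```python
import math

def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or -1 for noise."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within eps of point i (point i included).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:  # j is a core point, keep expanding
                queue.extend(reachable)
    return labels

# Hypothetical toy data (arbitrary units): two tight groups and one isolated point.
points = [(0, 0), (5, 0), (0, 5), (100, 100), (105, 100), (100, 105), (300, 300)]
labels_wide = dbscan(points, eps=10, min_pts=3)
labels_narrow = dbscan(points, eps=3, min_pts=3)
```

With ε = 10 the two groups are recovered and the isolated point is labeled as noise, whereas with ε = 3 no point has enough neighbors and everything is labeled as noise, which illustrates how strongly the outcome depends on the parameter choice.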

Moreover, biological clusters corresponding to supramolecular organizations often have non-trivial shapes, such as focal adhesions12 or nuclear pore complexes13. These structures pose additional challenges due to their irregular or complex geometries. Traditional clustering methods work well with symmetric, simply connected, or convex shapes, but often fail with non-symmetric, irregular, or highly complex distributions. These limitations highlight the necessity for improved clustering techniques that can extract meaningful information from SMLM data, ensuring accurate and reliable insights into the molecular architecture of biological samples.

In this paper, we introduce a novel supervised approach to enhance the versatility of clustering algorithms. Our method, MIRO (Multifunctional Integration through Relational Optimization), employs a few-shot (or one-shot) geometric deep learning framework based on recurrent graph neural networks (rGNNs) to learn a transformation that squeezes elements of complex point clouds around a common center (Fig. 1b, c). To achieve this, MIRO assumes that clusters’ general structure and spatial relationships are preserved within a given dataset and uses relational information to make complex data more suitable for conventional clustering techniques. In this way, MIRO transforms the point clouds so that methods for complete clustering (i.e., assigning every localization to a specific cluster or to the non-clustered group23) can achieve enhanced performance, as we demonstrate for DBSCAN on a wide range of datasets with varied cluster shape and symmetry. By enhancing the spatial separation between localizations in adjacent clusters, as well as between clustered and background localizations, MIRO inherently simplifies the parameter selection for DBSCAN and similar methods. Additionally, the recurrent structure of MIRO facilitates a multifunctional representation framework, enabling the simultaneous handling of heterogeneous analysis tasks, such as multiscale clustering, clustering of differently shaped structures, and node-level classification. Unlike traditional pipelines that treat these tasks separately or require manual tuning, MIRO learns a flexible representation that supports diverse objectives in a unified and scalable manner. This multifunctional capability significantly expands the range of biologically relevant insights that can be extracted from a single experiment.

Following a recent benchmark study23, we provide a comprehensive evaluation of MIRO’s performance across various SMLM experimental scenarios, demonstrating its transformative potential for clustering applications. Furthermore, our analysis extends beyond this benchmark, showing that MIRO significantly improves clustering performance in complex and irregular data scenarios.

Results

MIRO workflow

MIRO uses relational information to transform point clouds so that points belonging to the same cluster are brought together. It achieves this by using an rGNN, which incorporates several innovative aspects in its architecture, operational mechanisms, and training process, as described here. A detailed description is provided in the “Methods”.

MIRO is built on an rGNN architecture30. The input to the neural network is a graph representation of individual molecular localizations derived from SMLM experiments31. As shown in Fig. 1a, these localizations are obtained from multiple fluorescence images of the same field of view (FOV), with each image capturing a sparse number of simultaneously emitting fluorophores. Importantly, fluorophores’ emission is stochastic; therefore, a given fluorophore can be detected in multiple frames or not at all. The images are processed to extract the centroid positions of bright features corresponding to molecular localizations. These positions are then drift-corrected and filtered to remove low-quality localizations. Additionally, localizations that are too close together within the same field are discarded, while those that appear in consecutive frames are merged to ensure an accurate representation of distinct molecules.

In the graph representation, each node is associated with a single molecular localization, while edges capture spatial relationships between nodes within the point cloud (Fig. 1b, c). Edges are derived from a Delaunay triangulation and filtered according to a distance threshold to prevent spurious connections in low-density regions. Absolute positional information is not directly used as a node feature but solely to define connectivity. Instead, node features are encoded using Laplacian positional embeddings32, while edge features include the Euclidean distance and a direction vector.
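The edge construction described above can be sketched as follows, assuming NumPy and SciPy are available. This is an illustrative sketch rather than the authors' code: the function name `build_graph`, the undirected edge list, the threshold value, and the random test coordinates are assumptions, and the Laplacian positional embeddings used as node features are not computed here.

```python
import numpy as np
from scipy.spatial import Delaunay

def build_graph(xy, max_edge_length):
    """Return (edges, edge_features) for an (N, 2) array of distinct localizations."""
    tri = Delaunay(xy)
    s = tri.simplices  # (n_triangles, 3) vertex indices
    # Each triangle (i, j, k) contributes the edges i-j, j-k, and k-i.
    pairs = np.concatenate([s[:, [0, 1]], s[:, [1, 2]], s[:, [2, 0]]])
    # Sort each pair and drop duplicates so every undirected edge appears once.
    pairs = np.unique(np.sort(pairs, axis=1), axis=0)
    vec = xy[pairs[:, 1]] - xy[pairs[:, 0]]
    dist = np.linalg.norm(vec, axis=1)
    # Distance threshold: discard long edges that span low-density regions.
    keep = dist <= max_edge_length
    pairs, vec, dist = pairs[keep], vec[keep], dist[keep]
    direction = vec / dist[:, None]  # unit vector from the first to the second node
    features = np.column_stack([dist, direction])  # columns: distance, dx, dy
    return pairs, features

# Hypothetical example: 50 random localizations in a 1000 x 1000 field.
rng = np.random.default_rng(0)
xy = rng.uniform(0.0, 1000.0, size=(50, 2))
edges, edge_features = build_graph(xy, max_edge_length=200.0)
```

Note that the coordinates enter only through the connectivity and the relative edge features, in line with the statement that absolute positions are not used as node features.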

To strengthen the ability to capture complex spatial relationships, the graph is encoded into a higher-level representation $\mathcal{G}$ through a learnable dense layer followed by ReLU activation. This representation serves as the input to a sequence of recurrent steps, where the same MIRO block $\mathcal{M}$ is repeatedly applied, as shown in Fig. 1c. We emphasize that MIRO uses a single-layer architecture. As a result, increasing the number of recurrent steps does not affect the number of learnable parameters. The number of recurrent steps defines the size of the receptive field and therefore needs to be adapted to the density of the point cloud and the complexity of the clustering problem, as discussed in Number of recurrent steps: influence on performance and oversmoothing.

The operations of a MIRO block are schematically illustrated in Fig. 1d. At each recurrent step, the graph $\mathcal{G}$ is concatenated with a “hidden” graph $\mathcal{G}_{\mathrm{h}}^{k}$ having the same structure and with node and edge features initialized to zeros. Similar to the hidden state of a recurrent neural network, $\mathcal{G}_{\mathrm{h}}^{k}$ represents the hidden state of the system and characterizes the underlying processes being modeled, capturing relational information between nearby localizations. Information is propagated to generate an updated hidden graph $\mathcal{G}_{\mathrm{h}}^{k+1}$ that is passed to the next step, together with the unmodified $\mathcal{G}$. In contrast to typical message passing schemes30,33,34, MIRO omits the concatenation of node features with aggregated messages. Instead, hidden node features are updated solely based on hidden edge features (i.e., the messages) to emphasize the immediate structural context of each node. The hidden node features are further decoded through a learnable dense layer to provide, for each molecular localization, a displacement vector in Cartesian space. These displacements are calculated to minimize a loss function $\mathcal{L}$ (see MIRO loss function) that aims to shift localizations belonging to the same cluster toward a common center, while leaving background localizations unaltered.
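A rough NumPy sketch of one such recurrent step is given below to make the data flow explicit. It is not the authors' implementation: the random weights stand in for trained layers, and the feature dimension, the exact form of the edge update, the sum aggregation, the ReLU activations, and all function and variable names are assumptions chosen for illustration. What it is meant to show is that the hidden graph starts at zero, that hidden node features are computed from aggregated hidden edge features only, and that the same weights are reused at every step.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(n_in, n_out):
    """Random weights standing in for a trained dense layer (bias omitted)."""
    return rng.normal(scale=0.1, size=(n_in, n_out))

def relu(x):
    return np.maximum(x, 0.0)

def miro_block(nodes, edges, h_nodes, h_edges, senders, receivers, w_edge, w_node):
    """One recurrent step: (G, G_h^k) -> G_h^{k+1}. Assumed form, for illustration."""
    # Concatenate the input graph with the hidden graph, feature-wise.
    n = np.concatenate([nodes, h_nodes], axis=1)
    e = np.concatenate([edges, h_edges], axis=1)
    # Hidden edge update (the message): each edge sees itself and its two end nodes.
    messages = relu(np.concatenate([e, n[senders], n[receivers]], axis=1) @ w_edge)
    # Hidden node update: aggregated messages only, no concatenation with node features.
    aggregated = np.zeros((nodes.shape[0], messages.shape[1]))
    np.add.at(aggregated, receivers, messages)
    return relu(aggregated @ w_node), messages

# Toy graph: 4 localizations on a chain, with edges in both directions.
n_nodes, d = 4, 8
senders = np.array([0, 1, 1, 2, 2, 3])
receivers = np.array([1, 0, 2, 1, 3, 2])
nodes = rng.normal(size=(n_nodes, d))        # encoded node features of G
edges = rng.normal(size=(len(senders), d))   # encoded edge features of G

w_edge = dense(6 * d, d)  # input is [e, n_sender, n_receiver], each of width 2d
w_node = dense(d, d)
w_dec = dense(d, 2)       # decoder: hidden node features -> (dx, dy)

h_nodes = np.zeros((n_nodes, d))             # hidden graph initialized to zeros
h_edges = np.zeros((len(senders), d))
displacements = []
for k in range(3):        # the same weights are reused at every recurrent step
    h_nodes, h_edges = miro_block(nodes, edges, h_nodes, h_edges,
                                  senders, receivers, w_edge, w_node)
    displacements.append(h_nodes @ w_dec)    # one displacement vector per localization
```

Because the loop reuses `w_edge`, `w_node`, and `w_dec`, running more steps enlarges the receptive field without adding parameters.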

To ensure a meaningful hidden representation and prevent vanishing gradients, at each iteration within an epoch, the loss is further averaged across all recurrent steps30, as schematically shown in Fig. 1c. This approach imposes intermediate corrections to the displacement vectors, helping to maintain the clusters’ structural integrity throughout the recurrent steps. This method also allows for different steps in the process to have different ground truths, enabling the network to learn and adapt to multiscale features, such as the circular clusters ($\mathcal{L}_{\mathrm{spot}}$) and the ring structures ($\mathcal{L}_{\mathrm{ring}}$) shown in the example of Fig. 1c. Such multiscale training enhances MIRO’s ability to handle varying cluster sizes, shapes, and densities within the same dataset, further improving its robustness and accuracy in clustering complex biological data.
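As a compact way to read this, suppose the recurrent steps are indexed by $k = 1, \dots, K$ and $\mathcal{L}^{(k)}$ denotes the loss evaluated on the displacements decoded at step $k$. Both the indexing and the symbols $K$, $\mathcal{L}^{(k)}$, and $\mathcal{L}_{\mathrm{total}}$ are introduced here for illustration only. The step-averaged objective then takes the form:

```latex
\mathcal{L}_{\mathrm{total}} = \frac{1}{K} \sum_{k=1}^{K} \mathcal{L}^{(k)},
\qquad
\mathcal{L}^{(k)} \in \{ \mathcal{L}_{\mathrm{spot}},\ \mathcal{L}_{\mathrm{ring}},\ \dots \}
```

Each $\mathcal{L}^{(k)}$ can be computed against its own ground truth, which is what permits different target scales at different steps. The explicit functional form of the per-step loss is the one given in MIRO loss function and is not reproduced here.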

Notably, MIRO’s training can be effectively performed using a single or a few representative clusters (see MIRO training and augmentations). This approach uses the weak conservation of shape and organization within molecular clusters to boost clustering accuracy. By employing a series of augmentations, the algorithm learns to generalize across a given scenario, enabling robust performance even when trained on minimal data.

MIRO enhances DBSCAN performance

To demonstrate the benefits of using MIRO, we first applied it to simulated datasets, as illustrated in Fig. 2. MIRO is designed as a preprocessing step to enhance the performance of subsequent clustering methods. To assess the performance gains introduced by MIRO, we compared the results of DBSCAN both with and without MIRO preprocessing. We selected DBSCAN for this comparison due to its top performance in benchmark studies23,27 and its widespread use in the literature28.

Fig. 2: MIRO clustering performance on simulated datasets.

Each panel represents results obtained for one dataset: a Scenario 8; b Scenario 8 with blinking; c Scenario 5 with blinking; d Scenario 6 with blinking; e C-shaped clusters; and f ring-shaped clusters. Within each panel, the upper row shows an exemplary FOV with localizations analyzed by DBSCAN alone (left), DBSCAN with MIRO preprocessing (middle), and the ground truth (right). Localizations are color-coded according to their assigned clusters. The bottom row presents scatter plots of the robust variant of the Adjusted Rand Index (ARI†), the intersection over union (IoU), the Jaccard Index for cluster detection (JIc), and the root mean squared relative error in the number of localizations per cluster (RMSREN) calculated over 47 (50 for e and f) different simulations (filled circles), together with their box-and-whisker plots. The central line represents the median, the box edges represent the first and third quartiles, the whiskers extend to the most extreme data points within 1.5 times the interquartile range, and outliers are shown as empty circles. Statistical significance was assessed through a paired one-sided Wilcoxon test. The number of stars represents the level of statistical significance (*p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001; ****p ≤ 0.0001). Exact p values for all statistical comparisons are provided in the accompanying Source Data file. Source data are provided as a Source Data file.

For the benchmark datasets, we used the DBSCAN parameters provided in ref. 23. For MIRO-preprocessed data and other datasets, clustering parameters were optimized using an automated procedure based on the Optuna Python library35, guided by metric-based performance scores. These parameters were consistently applied across all experiments within the same scenario and are summarized in Supplementary Tables 1 and 2. Please refer to DBSCAN parameter selection for a discussion of the parameter choice criteria.

Clustering performance was evaluated using various metrics. The benchmark study23 employed the adjusted Rand index (ARI)36 to evaluate cluster membership and the intersection over union (IoU) to measure the overlap of clusters defined by their convex hulls. However, ARI is known to be highly sensitive to cluster size imbalances37,38, a common issue in SMLM data where non-clustered molecules are often treated as an additional “background” cluster. To handle the effect of imbalance, we employed alternative metrics better suited for these scenarios, including a robust variant of ARI (ARI†)38, adjusted mutual information (AMI)37,39, and ARI calculated excluding non-clustered localizations (ARIc)22. Further details on these metrics can be found in Metrics for performance evaluation.
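To illustrate why the background class matters, the following sketch computes the standard ARI from a contingency table and a variant restricted to clustered localizations. It is a plain-Python illustration, not the evaluation code used here: the function names, the label `-1` for non-clustered localizations, and the choice to exclude localizations that are non-clustered in the ground truth are assumptions, and the robust variant ARI† is not implemented.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(true, pred):
    """Standard ARI between two labelings of the same points (needs >= 2 points)."""
    n = len(true)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(true, pred)).values())
    sum_a = sum(comb(c, 2) for c in Counter(true).values())
    sum_b = sum(comb(c, 2) for c in Counter(pred).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings put everything in one group
    return (sum_ij - expected) / (max_index - expected)

def ari_clustered_only(true, pred, noise=-1):
    """ARI restricted to points that are clustered in the ground truth (assumed rule)."""
    keep = [i for i, t in enumerate(true) if t != noise]
    return adjusted_rand_index([true[i] for i in keep], [pred[i] for i in keep])

# Hypothetical example: two true clusters merged into one, background found correctly.
true = [0, 0, 0, 1, 1, 1, -1, -1, -1, -1]
pred = [0, 0, 0, 0, 0, 0, -1, -1, -1, -1]
ari_all = adjusted_rand_index(true, pred)
ari_c = ari_clustered_only(true, pred)
```

In this toy case the full ARI is about 0.59 because the correctly identified background dominates the pair counts, while the score restricted to clustered points is 0, reflecting that the two clusters were not separated at all.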

In addition to these metrics, we used cluster-level metrics such as the Jaccard Index for cluster detection (JIc), the root mean squared relative error in the number of localizations per cluster (RMSREN), and the root mean squared error in cluster centroid position (RMSEx,y).

In our evaluation, MIRO consistently enhances the performance of DBSCAN across all tested scenarios, as shown in Supplementary Tables 3 and 4. While these results are based on few-shot training (three FOVs, comprising 60 to 300 clusters), we also demonstrate that comparable performance can be achieved with single-shot training, as shown in Supplementary Table 5 for representative scenarios.

First, we discuss MIRO’s performance on selected datasets from the benchmark study23, characterized by different cluster density, size, and shape (Table 1 and Fig. 2a–c). For instance, in Scenario 8 (small symmetrical clusters with two different densities), while the scatter and box plots in Fig. 2a show that MIRO only slightly improves the performance of DBSCAN, this improvement is statistically significant. This scenario represents a case where the performance of DBSCAN is close to optimal; it is therefore not surprising that MIRO only makes a small difference. Specifically, MIRO achieves a medium effect size for ARI† (Cohen’s d = 0.5; paired one-sided Wilcoxon test W = 852, n = 47, p = 9.3 × 10⁻⁴) and a large effect size for IoU (Cohen’s d = 0.89; W = 993, n = 47, p = 5.3 × 10⁻⁷), with this improvement being most pronounced in cluster-level metrics such as JIc (Cohen’s d = 0.92; W = 730, n = 47, p = 1.0 × 10⁻⁶) and RMSREN (Cohen’s d = 0.48; W = 21, n = 47, p = 3.2 × 10⁻¹²).

As expected, the advantage of using MIRO becomes more evident in more challenging conditions. In Scenario 8 with blinking (Fig. 2b), the increased number of localizations due to molecular overcounting introduces more heterogeneity into the data, but MIRO effectively mitigates this effect and significantly improves DBSCAN’s performance (Cohen’s d = 0.75; W = 956, n = 47, p = 5.9 × 10⁻⁶ for ARI†). MIRO further demonstrates its capability to handle additional complexities when, in addition to blinking, the number of clusters is increased, as in Scenario 5 (Cohen’s d = 1.51; W = 1114, n = 47, p = 7.8 × 10⁻¹³ for ARI†, Fig. 2c).

To further highlight MIRO’s ability to manage complex cluster geometries, we evaluated its performance under three additional conditions. First, we examined Scenario 6 with blinking, which includes elliptically shaped clusters. Additionally, we simulated data with C-shaped and ring-shaped clusters (Fig. 2d–f). In these scenarios, MIRO produces a marked enhancement in DBSCAN’s performance by consistently transforming elongated and non-convex shapes into well-defined, compact clusters for the further application of DBSCAN.

We further demonstrate that MIRO outperforms recent supervised methods not included in the benchmark, such as an implementation of the GNN-based framework proposed in ref. 22 (MAGIK-S), as shown in Supplementary Table 6. Details of the implementation are provided in Comparison with a supervised graph-based clustering framework.

Simultaneous clustering and classification of different shapes

MIRO offers the capability of simultaneously handling diverse structural patterns by compressing localizations from different cluster shapes into a uniform representation. This capability enables effective clustering using a single set of parameters across different shapes when applied to algorithms like DBSCAN. The unified representation simplifies parameter tuning and enhances clustering performance. However, this transformation can also lead to challenges in subsequent classification, as the uniform collapse of different shapes may obscure their unique features.

To address this, while transforming various structures into compact forms, MIRO can also generate additional node-level output features that can be used, e.g., for simultaneous cluster shape classification. This dual capability is essential, for instance, for distinguishing among various molecular assemblies within the same biological environment, each exhibiting unique organizational patterns and functional roles.

To evaluate MIRO’s ability to simultaneously cluster and classify different structures, we generated simulated datasets comprising mixtures of circular, elliptical, C-shaped, and ring-shaped clusters (Fig. 3). Each cluster type represents a distinct molecular assembly, characterized by unique spatial properties. MIRO effectively learns to capture these features at the node level. While clustering ensures accurate separation of structures, taking the mode of node-level class predictions within each cluster allows for reliable identification of the corresponding structural type in heterogeneous datasets.
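The final step, taking the mode of the node-level predictions within each cluster, can be sketched in a few lines of plain Python. The function name, the label `-1` for non-clustered localizations, and the example labels are hypothetical and serve only to illustrate the majority vote.

```python
from collections import Counter, defaultdict

def classify_clusters(cluster_ids, node_classes, noise=-1):
    """Assign each cluster the most frequent class among its localizations."""
    votes = defaultdict(Counter)
    for cluster_id, node_class in zip(cluster_ids, node_classes):
        if cluster_id != noise:  # non-clustered localizations do not vote
            votes[cluster_id][node_class] += 1
    # most_common(1) returns the mode; ties go to the class encountered first.
    return {cid: counts.most_common(1)[0][0] for cid, counts in votes.items()}

# Hypothetical example: cluster assignments and per-localization class predictions.
cluster_ids = [0, 0, 0, 1, 1, 1, 1, -1]
node_classes = ["spot", "spot", "ring", "ring", "ring", "ring", "spot", "spot"]
cluster_classes = classify_clusters(cluster_ids, node_classes)
```

A few misclassified localizations inside a cluster, as in this example, do not change the cluster-level label as long as the majority is correct.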

Fig. 3: MIRO’s simultaneous clustering and classification of different shapes.

Results from simulations involving three distinct mixtures of shapes: a–c spots and ellipses, d–f spots and rings, and g–i C-shaped clusters and rings. a, d, g Exemplary fields of view with the mixtures analyzed using DBSCAN with MIRO preprocessing (top) alongside the ground truth (bottom). Localizations are color-coded. In the left column, different colors correspond to different shapes, while non-clustered localizations are shown in gray. In the middle and right columns, localizations forming clusters of specific shapes are color-coded based on their assigned clusters, with other shapes and non-clustered localizations depicted in gray. b, e, h Confusion matrices with the classification accuracy for different structural configurations. The rows represent the true classes, and the columns represent the predicted classes, with F1-scores indicated to assess the overall classification performance. c, f, i Box-and-whisker plots of the robust variant of the Adjusted Rand Index (ARI†), the intersection over union (IoU), the Jaccard Index for cluster detection (JIc), and the root mean squared relative error in the number of localizations per cluster (RMSREN), calculated across 50 simulations (filled circles). The central line in each boxplot represents the median, the box edges correspond to the first and third quartiles, the whiskers extend to the most extreme data points within 1.5 times the interquartile range, and outliers are shown as empty circles. Source data are provided as a Source Data file.

Figure 3 illustrates the results for three distinct mixtures: spots and ellipses (Fig. 3a–c), spots and rings (Fig. 3d–f), and C-shaped clusters and rings (Fig. 3g–i). Overall, the results demonstrate that MIRO’s preprocessing effectively distinguishes between different shapes and accurately assigns localizations to their respective clusters. This enhanced performance is evident in both the confusion matrices (Fig. 3b, e, h), which show higher classification accuracy across all shape combinations, and the clustering metrics (Fig. 3c, f, i). Notably, the clustering metrics indicate that, in several instances, the performance is similar to that obtained for a single shape. This is particularly remarkable considering that no restrictions were imposed on cluster overlap; clusters of different shapes could overlap or be arranged in ways that mimic other shapes, such as aligned spots forming an ellipse or facing C-shapes resembling a ring.

Detecting heterogeneous and dense clusters

In SMLM, fluorophore blinking often results in overcounting, where each molecule produces multiple localizations. This phenomenon creates artificial clusters with dimensions comparable to the localization precision9. Additionally, the natural aggregation of proteins at the nanoscale leads to the formation of structures known as nanoclusters10, which further contributes to clustering.

Accurate clustering analysis is crucial for precisely quantifying the spatial distribution of these nanoclusters. This involves tasks such as determining nanocluster sizes and estimating protein copy numbers within each nanocluster, often in comparison to a reference sample40. High cluster density or supra-cluster organization exacerbates the challenge, as reduced inter-cluster distances and variable localization counts between adjacent clusters can lead to the underestimation of the number of clusters and the overestimation of cluster sizes and molecular content.

MIRO offers substantial improvements for analyzing adjacent clusters in SMLM data. We assessed MIRO’s effectiveness by conducting quantitative tests as a function of the inter-cluster distance. We simulated pairs of clusters with similar sizes but containing different numbers of localizations, located at varying cluster-to-cluster distances. Localizations belonging to the same cluster were spatially arranged according to a 2D Gaussian distribution with width σ. The number of localizations per cluster was drawn from an exponential distribution. Clusters were spaced at various distances as a function of σ. We applied MIRO and DBSCAN to compare the methods’ ability to resolve the clusters, as quantified by the Jaccard Index for cluster detection (JIc). As demonstrated in Fig. 4a, at distances ≤2σ, MIRO significantly improves clustering accuracy compared to DBSCAN, providing a more precise characterization of nanocluster spatial arrangements and thus improving their quantification.
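A minimal sketch of how such a cluster pair could be simulated is shown below, using only the Python standard library. It is not the simulation code used for Fig. 4a: σ is interpreted as the standard deviation of the Gaussian, and the function name, the numerical values, the rounding of the exponentially distributed counts, and the minimum of one localization per cluster are all assumptions.

```python
import random

def simulate_cluster_pair(sigma, separation_in_sigma, mean_count, rng):
    """Two 2D Gaussian clusters whose centers are separation_in_sigma * sigma apart."""
    centers = [(0.0, 0.0), (separation_in_sigma * sigma, 0.0)]
    points, labels = [], []
    for label, (cx, cy) in enumerate(centers):
        # Exponentially distributed count, rounded, with at least one localization.
        n = max(1, round(rng.expovariate(1.0 / mean_count)))
        for _ in range(n):
            points.append((rng.gauss(cx, sigma), rng.gauss(cy, sigma)))
            labels.append(label)
    return points, labels

# Hypothetical values: sigma = 20 nm, centers 2 sigma apart, 20 localizations on average.
rng = random.Random(0)
points, labels = simulate_cluster_pair(sigma=20.0, separation_in_sigma=2.0,
                                       mean_count=20.0, rng=rng)
```

Repeating the call over a range of `separation_in_sigma` values gives the kind of distance sweep on which a detection metric such as JIc can be evaluated.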

Fig. 4: MIRO improves the quantification of dense and heterogeneous clusters.

a Performance comparison of MIRO and DBSCAN in resolving cluster pairs located at varying distances relative to their radius σ. The panel illustrates three examples with different numbers of localizations. The Jaccard Index for cluster detection (JIc), calculated as a function of distance, demonstrates the superior performance achieved using MIRO over DBSCAN alone. b Localization map obtained from a dSTORM image of integrin α5β1 in HeLa cells, analyzed using MIRO. Clustered localizations are represented by opaque symbols, while semi-transparent symbols represent non-clustered localizations. The numbered panels on the right are zoomed-in views of the regions indicated by the arrows, with different colors representing different clusters identified by MIRO (left column), whereas DBSCAN merges adjacent clusters (right column). Scale bar: 5 μm. (Lower inset) Reflection interference contrast image of the cell; darker regions correspond to the membrane adhesion area. Quantification of the clustering obtained by MIRO (orange) and DBSCAN (green): c histogram of cluster radius, d number of localizations per cluster (logarithmic y-scale in the inset), and e the nearest neighbor distance between clusters. Source data are provided as a Source Data file.

Additionally, we applied MIRO to the quantification of molecular organization in experimental data. Using dSTORM images of integrin α5β1 in HeLa cells, we studied receptor organization, which exhibits a spatial hierarchy with molecules arranged in nanoclusters41 that can aggregate to form larger structures that build focal adhesions (FAs)12,15. MIRO processing of molecular localizations allowed for accurate identification of integrin nanoclusters, as shown in Fig. 4b. The cell area, corresponding to the dark region in the reflection interference contrast image (inset of Fig. 4b), reveals a high density of nanoclusters (opaque symbols). The zoomed-in regions 1–3 in Fig. 4b illustrate MIRO’s ability to resolve close individual nanoclusters forming larger structures, whereas DBSCAN merges nearby clusters.

Thanks to the robust identification of the nanoclusters enabled by MIRO, it is then possible to precisely quantify nanocluster size (Fig. 4c), number of localizations per nanocluster (Fig. 4d), and distance between nanoclusters (Fig. 4e). This provides a more accurate and detailed understanding of molecular organization than DBSCAN alone and underscores MIRO’s potential for high-resolution analysis of protein complexes in SMLM. Clusters retrieved by MIRO show a monodisperse distribution of radii, centered at ≈38 nm (Fig. 4c), and a distribution of the number of localizations per cluster with an exponential tail and an average of 17.8 (Fig. 4d), whereas DBSCAN shows spurious longer tails in both distributions, due to the merging of adjacent clusters. As a consequence, Fig. 4e shows that the nearest-neighbor distance between nanoclusters calculated on MIRO-processed data has a peak at ≈100 nm, reflecting cluster proximity that DBSCAN misses due to the merging of adjacent clusters.
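The three per-cluster quantities discussed here can be computed from labeled localizations as in the following plain-Python sketch. It is illustrative only: the function name is hypothetical, the cluster radius is assumed to be the root-mean-square distance of the localizations from the cluster centroid (the definition used for Fig. 4c may differ), and `-1` is assumed to mark non-clustered localizations.

```python
import math
from collections import defaultdict

def cluster_statistics(points, labels, noise=-1):
    """Per-cluster centroid, localization count, radius, and nearest-neighbor distance."""
    groups = defaultdict(list)
    for point, label in zip(points, labels):
        if label != noise:
            groups[label].append(point)
    stats = {}
    for label, pts in groups.items():
        cx = sum(x for x, _ in pts) / len(pts)
        cy = sum(y for _, y in pts) / len(pts)
        # Assumed radius definition: RMS distance of localizations from the centroid.
        radius = math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in pts) / len(pts))
        stats[label] = {"centroid": (cx, cy), "n": len(pts), "radius": radius}
    for label, entry in stats.items():
        # Nearest-neighbor distance measured between cluster centroids.
        others = [math.dist(entry["centroid"], other["centroid"])
                  for key, other in stats.items() if key != label]
        entry["nnd"] = min(others) if others else None
    return stats

# Hypothetical toy input: two small clusters and one non-clustered localization.
toy_points = [(0, 0), (2, 0), (10, 0), (12, 0), (50, 50)]
toy_labels = [0, 0, 1, 1, -1]
toy_stats = cluster_statistics(toy_points, toy_labels)
```

If two adjacent nanoclusters are merged by the clustering step, the merged object enters this calculation with a larger radius, a larger count, and a longer nearest-neighbor distance, which is the bias described above for DBSCAN alone.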

Multiscale clustering of nuclear pore complex

Molecular complexes often exhibit organization across multiple scales, with the nuclear pore complex (NPC) being a paradigmatic example. The NPC is a large molecular channel embedded in the nuclear envelope, regulating the transport of macromolecules between the nucleus and cytoplasm of eukaryotic cells. The NPC consists of more than 30 proteins and has a precise three-dimensional architecture. One of its key components, Nup96, is present in 32 copies per NPC, forming both a cytoplasmic ring and a nucleoplasmic ring. Each ring features 8 corners, with two Nup96 molecules at each corner. When imaged with SMLM, Nup96-labeled NPCs oriented parallel to the focal plane display an annular structure. Since the two rings are nearly aligned, the eightfold symmetry of the NPC is clearly observable and each of the eight corners thus appears as a small cluster of the localizations generated by four Nup96 molecules. Because of its regular arrangement, Nup96 endogenously tagged with commonly used labels has been adopted as a reference protein for the quantitative optimization of super-resolution microscopy workflows13.

The characterization of the nuclear pore complexes from SMLM imaging poses a challenge at two different scales: accurate segmentation of the ring structures and precise identification of the corners. Both tasks are typically tackled separately with ad hoc methods, which are often strongly dependent on algorithmic parameters. However, thanks to its sequential architecture, MIRO enables the simultaneous segmentation of rings and corners.

To demonstrate MIRO’s ability to tackle these challenges simultaneously and quantitatively, we first relied on simulations. We generated synthetic localization maps with structures composed of small symmetrical clusters, each with a random number of localizatio
