Temporal fingerprints for identity matching across fully encrypted domains

Introduction

In an era characterized by digital ubiquity, the nature of human interaction has undergone a profound transformation. Historically, our interactions, whether social or commercial, relied on physical encounters, bound to a singular persona: our physical identity. However, with the emergence of digital platforms, the vast majority of our activities now occur online, facilitating the use of different profiles for diverse objectives and platforms. The rapid advancements in generative artificial intelligence (AI) and large language models (LLMs) have further impacted our digital environments, making content creation effortless and instantaneous. In this digitally dispersed and under-regulated landscape, malicious or deceptive use of multiple online identities has become a growing concern. Indeed, governments and corporations alike face serious threats as entities exploit anonymity and platform diversity to manipulate narratives, influence public opinion, and disrupt markets, posing risks to societal stability, economic integrity1, and national security2,3,4. Detecting and mitigating such threats across platforms presents considerable challenges, as malicious activity is often concealed and fragmented across pseudonymous identities. Addressing these challenges requires robust methodologies for linking profiles associated with the same real-world entities across domains, referred to as the identity matching problem (also known as network alignment or user identity linkage).

Extensive research efforts have demonstrated that content and personal attributes, such as age, gender or usernames are effective for matching profiles across domains. These approaches include both supervised5,6,7,8,9 and unsupervised10,11,12,13 models. However, content-based models face significant challenges nowadays. First, the widespread use of generative AI enables creating numerous variations of the same narrative effortlessly, making it increasingly difficult to detect content originating from the same individual. Additionally, the increasing shift toward encrypted platforms often restricts access to user-generated content and private attributes, especially across different platforms, further limiting the effectiveness of these models.

Furthermore, even in anonymized settings, notable cross-domain identity matching capabilities have been achieved by exploiting the network of interactions14,15,16,17,18,19,20,21,22,23 or individual metadata, such as profile trajectories24. However, such structure-based models face several limitations. Different platforms often exhibit distinct types of connections, leading to unique structural patterns that may hinder the effectiveness of these methods. Additionally, the sheer volume of structural information, especially when considering k-hop neighbors and the dynamic nature of connections, makes comprehensive data collection a difficult task. Even when structural data is completely available, running such models at scale is computationally expensive and even infeasible at times.

Building on the concept of “network of networks”25,26,27, we propose that distinct domains are implicitly connected through profiles controlled by the same entity. These hidden connections enable external events to propagate like shock waves, influencing actions across otherwise disconnected platforms. The temporal traces left by these actions, such as the timing of phone calls, digital transactions, or social media activity, offer measurable indicators of the underlying connections, bridging the gap between implicit links and explicit observable data. Bursty patterns of human behavior28, represented by the gap between consecutive activities of the individual (inter-event times), offer a distinct perspective of human dynamics, emphasizing the frequency of activity, rather than precise timing of individual events. These distributions were previously encountered in many and diverse types of activities, including individual mobility patterns, e-mail communications, instant messaging, web browsing, and mobile phone calls29,30,31,32,33,34.

In this study, we demonstrate that beyond conforming to a heavy-tailed distribution, bursty patterns are uniquely personal and can effectively characterize an individual, even upon acting across different platforms. Interestingly, these patterns provide a distinctive signature, better than the network of interactions and the actual timing of activity, enabling the detection of multiple profiles corresponding to the same individual, across different encrypted domains. We demonstrate the bursty model’s performance on two use-cases: across different financial platforms and across different social platforms. First, we show that our model outperforms various state-of-the-art temporal and structural models, presenting an average area under the receiver operating characteristic (ROC) curve (AUC) of 0.86, across two examined financial marketplaces. We further demonstrate its high stability over time, enabling the correct identification of 35% of profiles even after an entire year, by examining at most 10 candidates for each profile. This suggests that the temporal fingerprints last not only across domains but also for long periods of time. We additionally evaluate the model’s scalability, applying it across 500 different market places, encompassing the activity of over 250k daily traders. The identity matching problem across these domains presents a notably low baseline, as merely 3 out of 1000 randomly selected profile pairs actually correspond to the same individual. Nevertheless, our methodology achieves an average AUC of 0.78 and precision of 96% for the top-100 predictions. Finally, we show that the model is generalizable to other types of domains, establishing an AUC of 0.63 for matching identities across Twitter, Telegram and Instagram data, and achieving AUCs of 0.77 and 0.89 for matching identities across different sub-reddits and across different Telegram channels, respectively.

As the digital world enables the effortless creation of different content based on the same narrative, content-based identity matching models become increasingly hindered. In parallel, the growing adoption of privacy-preserving measures further restricts access to identifying content. Accurately matching identities across platforms under these constraints becomes more challenging, yet it remains essential for uncovering patterns that extend across fragmented digital environments. By leveraging bursty individual patterns, our model provides a robust solution for linking identities within and across encrypted and content-restricted domains. Additionally, compared to other state-of-the-art temporal and structural methods with significantly higher computational complexity, our approach offers a vital advantage in enabling fast and accurate detection at scale. By enabling accurate identity matching across domains, even when content and network structure are absent, the proposed methodology provides a critical building block for detecting coordinated behavior across fragmented systems. Coupled with the suggested broader perspective on the mechanisms driving cross-domain coordination, this work lays a robust foundation for advancing threat detection methodologies in a rapidly evolving digital landscape, where traditional methods often fall short.

Results

Preliminaries

In this study, we aim at identifying individuals across different encrypted domains. Specifically, we aim at learning an identity matching function:

Definition 1

Given \(n\) different domains \({D}_{1},\ldots,{D}_{n}\), the goal of the cross-domain identity matching problem is to learn a function:

$$p:{\bigcup}_{i,j\in [n]}{D}_{i}\times {D}_{j}\to [0,1]$$

such that \(p({u}_{{d}_{1}},{v}_{{d}_{2}})\) represents the probability that the profiles \({u}_{{d}_{1}}\in {D}_{1}\) and \({v}_{{d}_{2}}\in {D}_{2}\) are associated with the same real-world individual (\({u}_{{d}_{1}}={v}_{{d}_{2}}\)).

The vanilla inter-event bursty model

We propose exploiting individual temporal data for linking profiles back to the same individual across different domains. Specifically, we analyze individual bursty patterns, manifested as the time difference between any two consecutive activities of each profile. Formally:

Definition 2

Given a time period \([\tau,\tau+\Delta \tau ]\) and a profile \({u}_{d}\) in domain \(D\), we denote the sequence of their activity times \({A}_{\tau }^{{u}_{d}}\subset \left[\tau,\tau+\Delta \tau \right]\) by:

$${A}_{\tau }^{{u}_{d}}=({t}_{0}^{{u}_{d}},\ldots,{t}_{m}^{{u}_{d}})$$

(1)

An inter-event time period is defined as the time difference between two consecutive activities of \({u}_{d}\):

$$\Delta {t}_{i}^{{u}_{d}}={t}_{i}^{{u}_{d}}-{t}_{i-1}^{{u}_{d}}$$

(2)

The inter-event time sequence is defined by:

$${S}_{\tau }^{{u}_{d}}=(\Delta {t}_{1}^{{u}_{d}},\ldots,\Delta {t}_{m}^{{u}_{d}})$$

(3)

The cumulative distribution function of the inter-event sequence is defined as:

$${Q}_{\tau }^{{u}_{d}}(\Delta t)=\frac{| \{\delta \in {S}_{\tau }^{{u}_{d}}:\delta \le \Delta t\}| }{m}$$

(4)
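As a minimal sketch of Definition 2, the snippet below derives the inter-event sequence and its empirical cumulative distribution from a list of activity timestamps. The function names and the example timestamps are illustrative assumptions and do not come from the study's implementation.

```python
from bisect import bisect_right


def inter_event_times(activity_times):
    """Gaps between consecutive activity times (Eqs. 2 and 3)."""
    ts = sorted(activity_times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def empirical_cdf(gaps):
    """Return Q(dt): the fraction of gaps that are <= dt (Eq. 4)."""
    ordered = sorted(gaps)
    m = len(ordered)
    if m == 0:
        raise ValueError("at least two activity times are needed")

    def q(dt):
        # bisect_right counts how many gaps are <= dt in the sorted list.
        return bisect_right(ordered, dt) / m

    return q


# Hypothetical activity timestamps, in seconds from the start of the day.
times = [0, 5, 7, 30, 31, 32, 90]
gaps = inter_event_times(times)
Q = empirical_cdf(gaps)
print(gaps)   # [5, 2, 23, 1, 1, 58]
print(Q(2))   # 0.5, since three of the six gaps are <= 2
```

Note that the sketch keeps only the gaps and discards the absolute timestamps, which is what distinguishes the bursty representation from models based on the actual timing of activity.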

Our bursty identity matching function \({p}^{ks}\) is based on the similarity between the inter-event time distributions of any two profiles, estimated by the Kolmogorov–Smirnov (KS) statistic:

Definition 3

Let \({u}_{{d}_{1}}\in {D}_{1}^{\tau }\) and \({v}_{{d}_{2}}\in {D}_{2}^{\tau }\) be two profiles with corresponding inter-event time distributions \({Q}_{\tau }^{{u}_{{d}_{1}}}\) and \({Q}_{\tau }^{{v}_{{d}_{2}}}\). The KS statistic is defined as the maximal difference between their distributions:

$$K{S}_{\tau }({u}_{{d}_{1}},{v}_{{d}_{2}})={\sup }_{\Delta t}| {Q}_{\tau }^{{u}_{{d}_{1}}}(\Delta t)-{Q}_{\tau }^{{v}_{{d}_{2}}}(\Delta t)|$$

(5)

We define the corresponding identity matching function \({p}_{\tau }^{ks}\) as:

$${p}_{\tau }^{ks}({u}_{{d}_{1}},{v}_{{d}_{2}})=1-K{S}_{\tau }({u}_{{d}_{1}},{v}_{{d}_{2}})$$

(6)
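Equations (5) and (6) can be illustrated with a short pure-Python sketch. It assumes each profile is already represented by its list of inter-event times; the names `ks_statistic` and `p_ks` and the toy gap lists are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(gaps_u, gaps_v):
    """Two-sample KS statistic: sup over dt of |Q_u(dt) - Q_v(dt)| (Eq. 5)."""
    a, b = sorted(gaps_u), sorted(gaps_v)
    if not a or not b:
        raise ValueError("both profiles need at least one inter-event time")
    # Both empirical CDFs are right-continuous step functions, so the
    # supremum is attained at one of the observed gap values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def p_ks(gaps_u, gaps_v):
    """Identity matching score of Eq. 6: one minus the KS statistic."""
    return 1.0 - ks_statistic(gaps_u, gaps_v)


# Toy inter-event sequences (seconds) for three hypothetical profiles.
profile_a = [1, 1, 2, 5, 23, 58]
profile_b = [1, 2, 2, 6, 20, 60]
profile_c = [300, 400, 500, 600, 700, 800]

score_ab = p_ks(profile_a, profile_b)   # similar gap distributions
score_ac = p_ks(profile_a, profile_c)   # disjoint gap distributions
print(round(score_ab, 3), score_ac)
```

The sorting step dominates the cost per pair. For larger experiments a library routine such as `scipy.stats.ks_2samp` returns the same statistic together with a p value.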

We postulate that different profiles pertaining to the same individual are bound to exhibit synchronization in their activity dynamics, despite acting on different domains. The motivation for this hypothesis is presented in Fig. 1, which shows two pairs of profiles:

A positive pair: \({u}_{{d}_{1}}\) and \({u}_{{d}_{2}}\), corresponding to the same individual \(u\) in financial trading markets \({D}_{\tau }^{1}\) and \({D}_{\tau }^{2}\), respectively, illustrated by orange and cyan markers in Fig. 1a.

A negative pair: \({v}_{{d}_{2}}\) and \({w}_{{d}_{3}}\), corresponding to different individuals in two different financial trading markets (\(v\in {D}_{\tau }^{2},\ w\in {D}_{\tau }^{3}\)), illustrated by red and green markers in Fig. 1a.

Fig. 1: Synchronization of burstiness patterns.

a illustrates the daily networks of three financial trading markets \({D}_{\tau }^{1}\), \({D}_{\tau }^{2}\) and \({D}_{\tau }^{3}\), each corresponding to the trading of a different crypto-token. Profiles \({u}_{{d}_{1}}\) (degree k = 306) and \({u}_{{d}_{2}}\) (degree k = 268) correspond to the same individual (orange and cyan markers). Profiles \({v}_{{d}_{2}}\) (degree k = 249) and \({w}_{{d}_{3}}\) (degree k = 275) pertain to different individuals (red and green markers). b presents the activity times of \({u}_{{d}_{1}}\) and \({u}_{{d}_{2}}\), reaching an activity overlap of 37%. d presents the activity times of \({v}_{{d}_{2}}\) and \({w}_{{d}_{3}}\), reaching an activity overlap of 42%. c depicts the cumulative inter-event time distributions of \({u}_{{d}_{1}}\) and \({u}_{{d}_{2}}\), exemplifying similar distributions (\(K{S}_{\tau }({u}_{{d}_{1}},{u}_{{d}_{2}})=0.031\) with a p value of 0.99). e depicts the cumulative inter-event time distributions of \({v}_{{d}_{2}}\) and \({w}_{{d}_{3}}\), exemplifying significantly different distributions (\(K{S}_{\tau }({v}_{{d}_{2}},{w}_{{d}_{3}})=0.47\) with a p value of 5e−27).

Both pairs present similar degrees (Fig. 1a) and a similar overlap in activity times: 37% for the positive pair and 42% for the negative pair (Fig. 1b, d, respectively). Nevertheless, the negative pair presents significantly different inter-event time distributions, with a KS distance of \(K{S}_{\tau }({v}_{{d}_{2}},{w}_{{d}_{3}})=0.47\) and a p value of 5e−27 (Fig. 1e), while the positive pair presents highly similar inter-event distributions, with \(K{S}_{\tau }({u}_{{d}_{1}},{u}_{{d}_{2}})=0.031\) and a p value of 0.99 (Fig. 1c). This illustrates that burstiness patterns characterize profiles pertaining to the same individual better than basic temporal and structural patterns.

Experiments on financial markets

Identifying collusive trading practices, fraudulent accounts, and coordinated market manipulations, often concealed under multiple pseudonymous profiles to evade detection, is essential for ensuring the integrity of financial systems and maintaining market stability. We examine 2k traders transacting on two different financial trading markets on top of the Ethereum blockchain35,36,37. In this setting, a financial trading market \({D}_{\tau }^{i}\) encompasses all of the trading activity related to the respective crypto-token \({c}_{i}\) during the time period \([\tau,\tau+\Delta \tau ]\), where \(\Delta \tau\) is the length of a single day:

$${D}_{\tau }^{i}=\{u:u\,{\rm{bought}}\,{\rm{or}}\,{\rm{sold}}\,{c}_{i}\,{\rm{in}}\,[\tau,\tau+\Delta \tau ]\}$$

(7)
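To make Eq. (7) concrete, the following sketch groups a flat list of trades into daily per-token markets and collects each trader's activity times within them. The record layout `(timestamp, token, buyer, seller)`, the wallet labels, and the function name are assumptions made for illustration; the study's actual data schema is not described here.

```python
from collections import defaultdict

DAY = 86_400  # length of one day in seconds (the role of delta-tau)


def daily_markets(trades, day_length=DAY):
    """Map (token, day_index) -> {trader: sorted activity times}.

    Each key plays the role of one market D_tau^i, whose members are
    the traders who bought or sold that token during that day.
    """
    markets = defaultdict(lambda: defaultdict(list))
    for timestamp, token, buyer, seller in trades:
        day = int(timestamp // day_length)
        # A set avoids double counting if a wallet trades with itself.
        for trader in {buyer, seller}:
            markets[(token, day)][trader].append(timestamp)
    for profiles in markets.values():
        for times in profiles.values():
            times.sort()
    return markets


# Hypothetical trades: (unix-like timestamp, token, buyer, seller).
trades = [
    (10, "A", "w1", "w2"),
    (50, "A", "w1", "w3"),
    (86_500, "A", "w2", "w1"),
    (20, "B", "w1", "w4"),
]
markets = daily_markets(trades)
print(sorted(markets[("A", 0)]))   # ['w1', 'w2', 'w3']
print(markets[("A", 0)]["w1"])     # [10, 50]
```

In this toy example wallet `w1` appears in both the token A and token B markets on day 0, which is exactly the kind of cross-domain pair the matching function is meant to recover.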

We evaluate the performance of the vanilla inter-event bursty model and compare it to baseline models exploiting structural14,15,19,20 and temporal38,39,40,41 node characteristics (see the “Methods” section for formal definitions). Figure 2 presents the performance analysis of these models. Specifically, we consider 14 days of activity over the two examined financial trading markets \({D}_{\tau }^{1}\) and \({D}_{\tau }^{2}\), and evaluate the performance on each day separately. Figure 2a compares the AUC (ROC curve), and Fig. 2b compares the precision obtained at each threshold of top-ranked profile pairs \(({u}_{{d}_{1}},{v}_{{d}_{2}})\). Both metrics indicate that structure-based models demonstrate limited efficacy in linking profiles across different domains, highlighting the greater significance of temporality in this context (see the “Discussion” section for a thorough discussion). The vanilla inter-event bursty model outperforms all temporal baselines, suggesting that while bursty dynamics are closely related to an individual’s actual activity times, the latter are less effective in capturing the nuances required for an accurate individual fingerprint.

Fig. 2: Performance evaluation for the cross-domain identity matching problem.

Comparing the inter-event bursty model (\({p}^{ks}\)) with temporal (blue shaded) and structural (yellow shaded) baselines. a presents the averaged AUC over 14 daily tests, with error bars standing for standard error. b presents the average precision as a function of the examined number of pair candidates (with ±1 standard error in light background, correspondingly). The inter-event bursty model presents higher performance than baseline methods across all examined metrics.
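The two metrics reported in Fig. 2 can be sketched as follows, assuming each candidate profile pair carries a matching score and a binary ground-truth label. This is an illustrative quadratic-time implementation with hypothetical names and toy data, not the evaluation code used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive pair is scored
    above a random negative pair, with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_k(scores, labels, k):
    """Fraction of true matches among the k highest-scoring pairs."""
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)[:k]
    return sum(1 for _, y in ranked if y) / len(ranked)


# Toy scores for five candidate pairs; label 1 marks a same-individual pair.
scores = [0.97, 0.90, 0.60, 0.55, 0.30]
labels = [1, 0, 1, 0, 0]
auc = roc_auc(scores, labels)
p_at_2 = precision_at_k(scores, labels, 2)
print(round(auc, 3), p_at_2)
```

The pairwise AUC loop is convenient for small examples; at the scale of the multi-domain setting, a rank-based implementation would be needed.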

Stability and robustness

We further wish to evaluate the ability to match profiles across domains even after long periods of time. In particular, given a user \(u\) with a profile \({u}_{{d}_{1}}\in {D}_{1}^{{t}_{0}}\), active during \([{t}_{0},{t}_{0}+\Delta \tau ]\) in domain \({D}_{1}\), we define the similarity of \({u}_{{d}_{1}}\) to other profiles in the second domain after a time delay \(\tau\) as:

$$K{S}_{{t}_{0},\tau }({u}_{{d}_{1}},{v}_{{d}_{2}})={\sup }_{\Delta t}| {Q}_{{t}_{0}}^{{u}_{{d}_{1}}}(\Delta t)-{Q}_{{t}_{0}+\tau }^{{v}_{{d}_{2}}}(\Delta t)|$$

(8)

The corresponding identity matching function is:

$${p}_{{t}_{0},\tau }^{ks}({u}_{{d}_{1}},{v}_{{d}_{2}})=1-K{S}_{{t}_{0},\tau }({u}_{{d}_{1}},{v}_{{d}_{2}})$$

(9)

We define the identification probability of \(u\) at time delay \(\tau\) as the probability that \({p}_{{t}_{0},\tau }^{ks}({u}_{{d}_{1}},{u}_{{d}_{2}})\) ranks among the top-k scores over all candidate matches for \({u}_{{d}_{1}}\). Figure 3a, b present the identification probability as a function of the examined time delay \(\tau\), for k = 5 and k = 10, respectively. While the identification probability decreases with the length of the delay, performance remains rather high, with 25% (35%) of users correctly identified within 5 (10) matches even after a full year, outperforming baseline temporal models.

Fig. 3: Temporal fingerprints stability.

The identification probability of a user depending on the time delay from their initial observation, for k = 5 and k = 10, in (a, b) correspondingly. Despite the evident decrease of the identification probability as the delay between observations increases, the vanilla bursty model (dark red curve) is able to correctly identify 25% of the users within 5 ranks, and 35% within 10 ranks, after a delay of an entire year, outperforming the baseline temporal models (green-shaded curves). Shaded background represents ±1 standard error.
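The top-k identification probability can be sketched as below, assuming that for every source profile the delayed scores against all candidates in the second domain have already been computed. The dictionary layout, the function names, and the pessimistic handling of ties are illustrative assumptions.

```python
def rank_of_true_match(candidate_scores, true_match):
    """1-based rank of the true counterpart when candidates are sorted
    by descending score. Ties are resolved pessimistically: candidates
    with an equal score are counted ahead of the true match."""
    true_score = candidate_scores[true_match]
    ahead = sum(
        1
        for candidate, score in candidate_scores.items()
        if candidate != true_match and score >= true_score
    )
    return 1 + ahead


def identification_probability(all_scores, ground_truth, k):
    """Share of source profiles whose true match is within the top k."""
    if not all_scores:
        raise ValueError("no profiles to evaluate")
    hits = sum(
        1
        for profile, candidate_scores in all_scores.items()
        if rank_of_true_match(candidate_scores, ground_truth[profile]) <= k
    )
    return hits / len(all_scores)


# Hypothetical delayed scores: source profile -> {candidate: score}.
all_scores = {
    "u1": {"a": 0.9, "b": 0.8, "c": 0.2},
    "u2": {"a": 0.7, "b": 0.4, "c": 0.6},
}
ground_truth = {"u1": "a", "u2": "b"}
print(identification_probability(all_scores, ground_truth, k=1))  # 0.5
print(identification_probability(all_scores, ground_truth, k=3))  # 1.0
```

Repeating this computation for increasing delays, with the second profile's distribution taken from the later window, would trace out a curve of the kind shown in Fig. 3.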

Next, we examine the robustness of our suggested model to data omission. Incomplete data is a common challenge in today’s era of massive data generation, making it crucial to understand its implications. We randomly omit a fraction Δ of each profile’s activity times within the different domains and analyze the effect of the omission. Figure 4a presents the cumulative inter-event time distributions of two examined profiles, which pertain to the same individual, after randomly omitting Δ ∈ {50%, 75%, 90%} of their activity, compared to their original distributions. Notably, the pre- and post-omission distributions differ significantly, and the similarity between the two profiles decreases with Δ. Figure 4b depicts the change in the average KS distance as a function of the original KS distance, presenting a non-monotonic effect of data omission on profile similarity. Encouragingly, this indicates that both originally highly similar and originally highly distinct profiles are more robust to data omission than profile pairs that originally had medium-level similarity, which are more prone to be affected by incomplete data. We further estimate the effect of data omission on the performance of the identity matching model. Figure 4c, d depict the AUC and precision, respectively, for each level of Δ, indicating the decrease caused by incomplete data. Despite this expected decrease in performance, the vanilla inter-event bursty model, even under a 90% data omission threshold, outperforms state-of-the-art temporal models applied with only a 50% omission threshold (see Supplementary Fig. 3). This result underscores the model’s relative robustness and its effectiveness in capturing individual fingerprints, even under severe data limitations.

Fig. 4: Robustness to incomplete data.

a depicts the cumulative inter-event time distributions of two examined profiles, which pertain to the same individual, after randomly omitting Δ ∈ {50%, 75%, 90%} of their activity, compared to their original distributions. Notably, the similarity deteriorates as Δ increases. b presents the non-monotonic effect of data omission on the average KS distance change as a function of the original KS distance. c, d present the AUC and the precision of \({p}^{ks}\) under different data omission thresholds, presenting a slight decrease in performance. Error bars (c) and light shaded background (b, d) represent standard error.
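The omission procedure can be illustrated with the sketch below, which thins two activity sequences uniformly at random and recomputes their KS distance. Uniform sampling without replacement is an assumption about how events are dropped, the profiles are synthetic, and all names are hypothetical.

```python
import random
from bisect import bisect_right


def omit_activity(times, fraction, rng):
    """Keep a uniformly random subset, dropping about `fraction` of events."""
    n_keep = len(times) - round(len(times) * fraction)
    return sorted(rng.sample(list(times), n_keep))


def gaps(times):
    """Inter-event times of a sequence of activity timestamps."""
    ts = sorted(times)
    return [later - earlier for earlier, later in zip(ts, ts[1:])]


def ks(a, b):
    """Two-sample KS statistic between two lists of gaps."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


rng = random.Random(0)  # fixed seed so the sketch is reproducible
# Two synthetic profiles with 400 events each, spread over one day.
profile_1 = sorted(rng.uniform(0, 86_400) for _ in range(400))
profile_2 = sorted(rng.uniform(0, 86_400) for _ in range(400))

original = ks(gaps(profile_1), gaps(profile_2))
for fraction in (0.5, 0.75, 0.9):
    thinned_1 = omit_activity(profile_1, fraction, rng)
    thinned_2 = omit_activity(profile_2, fraction, rng)
    change = ks(gaps(thinned_1), gaps(thinned_2)) - original
    print(fraction, len(thinned_1), round(change, 3))
```

Because removing events merges neighbouring gaps, thinning shifts the whole inter-event distribution toward longer times, which is consistent with the visible change in the distributions of Fig. 4a.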

Scalability: Bursty-GNN for temporal similarity networks

To further enhance the vanilla inter-event bursty model, we propose employing a temporal graph neural network (TGNN) on top of cross-domain similarity networks \({G}_{ks}^{\tau }=({V}^{\tau },{E}^{\tau })\), where edges link profiles across different domains and edge weights are given by the KS distance between the inter-event time distributions of the linked profiles. We employ a supervised learning approach to train the TGNN, using edge labels inferred from the KS statistic:

Positive edges: two profiles \({u}_{{d}_{1}}\in {D}_{1}^{\tau }\) and \({v}_{{d}_{2}}\in {D}_{2}^{\tau }\) are linked by a positive edge if \(K{S}_{\tau }({u}_{{d}_{1}},{v}_{{d}_{2}})\le t{h}^{p}\), where \(t{h}^{p}\) is a predefined positive threshold.

Negative edges: two profiles \({u}_{{d}_{1}}\in {D}_{1}^{\tau }\) and \({v}_{{d}_{2}}\in {D}_{2}^{\tau }\) are linked by a negative edge if \(K{S}_{\tau }({u}_{{d}_{1}},{v}_{{d}_{2}})\ge t{h}^{n}\), where \(t{h}^{n}\) is a predefined negative threshold.

We employed \(t{h}^{p}=0.001\) and \(t{h}^{n}=0.98\) as the predefined positive and negative thresholds, respectively. The proposed TGNN setting utilizes labels inferred from the KS statistic and does not rely on actual identity labels. As such, it is applicable to the unsupervised setting we are examining. The Bursty-TGNN learns a latent embedding for all profiles, which is utilized subsequently for a cross-domain edge detection task. Figure 5a illustrates the two-layer Bursty-TGNN employed on the daily similarity networks. An elaborated overview of the TGNN architecture can be found in the Supplementary Materials.
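The pseudo-labeling step can be sketched as a simple thresholding of pairwise KS distances, using the two threshold values stated above. The dictionary layout and profile names are hypothetical, and leaving pairs between the two thresholds unlabeled is an assumption of this sketch; the TGNN itself is not reproduced here.

```python
TH_POS = 0.001  # positive threshold th^p from the text
TH_NEG = 0.98   # negative threshold th^n from the text


def label_edges(ks_by_pair, th_pos=TH_POS, th_neg=TH_NEG):
    """Split cross-domain profile pairs into pseudo-labeled training edges.

    Pairs with KS <= th_pos become positive edges, pairs with
    KS >= th_neg become negative edges, and everything in between
    is left unlabeled (an assumption of this sketch).
    """
    positive, negative = [], []
    for pair, ks_value in ks_by_pair.items():
        if ks_value <= th_pos:
            positive.append(pair)
        elif ks_value >= th_neg:
            negative.append(pair)
    return positive, negative


# Hypothetical KS distances between profiles of two domains.
ks_by_pair = {
    ("u_d1", "v_d2"): 0.0005,
    ("u_d1", "w_d2"): 0.47,
    ("x_d1", "w_d2"): 0.99,
}
positive, negative = label_edges(ks_by_pair)
print(positive)  # [('u_d1', 'v_d2')]
print(negative)  # [('x_d1', 'w_d2')]
```

With thresholds this far apart, only the most confident pairs receive a label, which is what allows training without access to real identity labels.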

Fig. 5: Scalability and the Bursty-TGNN.

a presents the architecture of the 2-layer TGNN on top of temporal similarity networks. b, c present the average AUC and precision of the Bursty-TGNN, the vanilla bursty model and various temporal baseline models on the multi-domain identity matching problem, with the Bursty-TGNN manifesting an evident enhancement to the top-1000 precision. Error bars (b) and light shaded background (c) represent standard error.

In order to evaluate the Bursty-TGNN, we consider an identity matching problem of higher complexity that is not restricted to the two-domain use-case. Besides verifying the performance of the Bursty-TGNN, this setting allows us to examine its scalability by identifying profiles across over 500 financial trading markets with over 250k daily users. We evaluated the vanilla bursty model and the Bursty-TGNN model, comparing them with the temporal baseline models only: the structural baseline models did not scale effectively, preventing us from evaluating their performance in this setting.

The identity matching problem across these domains presents a notably low baseline, as merely 3 out of 1000 randomly selected profile pairs actually correspond to the same individual (dashed horizontal red line in Fig. 5c). Figure 5b, c depict, respectively, the average AUC and precision for the multi-domain setting. Notably, up to the top-200 pairs the vanilla inter-event bursty model outperforms the temporal baselines, reaching almost error-less precision on average. Furthermore, the Bursty-TGNN extension (dashed purple curve, Fig. 5c) presents a performance boost when run on the top-1000 inter-event similarity edges (dashed vertical gray line in Fig. 5c marks the top-1000 threshold), underscoring the evolving role of each profile within the overall network and the hidden potential in this dynamic view for solving the identity matching task.

Experiments on social media

Identifying coordinated inauthentic behavior across social media domains is essential, since malicious entities frequently employ bots, troll farms, or fake accounts to avoid detection and reinforce targeted narratives. These tactics are often aimed at shaping public sentiment, influencing elections, and steering political landscapes. Platforms like Twitter (X) and Telegram, widely used for real-time news sharing, opinion formation, and group coordination, are particularly appealing to actors seeking to amplify influence across diverse audiences. A critical step toward detecting such campaigns is the ability to accurately match identities across distinct platforms, especially when pseudonymity and content encryption hinder profile attribution across domains. In this experiment, we examine 266 profiles: 126 Twitter, 120 Telegram and 20 Instagram, corresponding to 131 different entities, on a weekly basis (Δτ = 7 days), for 8 weeks. Figure 6 presents the performance of the vanilla inter-event model and comparisons to the temporal baselines. The vanilla inter-event model outperforms the baseline models, attesting to the generalizability of the model to different platform types. To further evaluate generalizability, we conducted two additional identity matching experiments. The first experiment consists of matching profiles across different Reddit forums (sub-reddits), and the second one involved matching profiles across different Telegram channels. These analyses, presented in Supplementary Fig. 4, yielded higher performance than the Twitter-Telegram-Instagram setting and provide additional evidence for the generalizability of burstiness-based identity matching across diverse social media contexts.

Fig. 6: Performance evaluation of the inter-event similarity method (\({p}^{ks}\)) for identity matching across social media platforms and comparison to temporal baselines.

a depicts the averaged AUC over 8 weekly tests, with error bars signifying the standard error. b presents the average precision as a function of the examined number of pair candidates (with ±1 standard error in light background, correspondingly). The inter-event bursty model presents similar AUC and higher precision compared with the baseline temporal models.

Discussion

The widespread accessibility of generative AI and LLMs has made text generation remarkably fast and effortless, placing powerful tools within the reach of all. These technological advancements have led to a significant rise in manipulative cross-platform activity including the spread of sophisticated influence campaigns, which are now far more challenging to detect using content-based models due to their versatile and easily adaptable nature. Privacy restrictions, though essential for safeguarding individuals, hinder the analysis of cross-domain patterns and correlations, which are crucial for identifying such coordinated activities. Countering these evolving threats across disconnected domains requires accurate identity linkage as a foundational step.

This study underscores the potential of temporal patterns, specifically individual burstiness, as a robust and scalable framework for cross-domain identity matching. We demonstrate that individual bursty dynamics form temporal fingerprints that persist across different platforms and over long periods of time, enabling accurate cross-domain identity matching. Our model outperforms state-of-the-art temporal and structure-based models in two experimental settings (Figs. 2 and 6), proving its generalizability. We demonstrated that the suggested model is stable over time, and is able to correctly identify 35% of the users even after an entire year, by examining at most 10 candidates for each profile (Fig. 3b). We further assessed the model’s robustness to incomplete data and its impact on individual temporal fingerprints. We demonstrated that despite omitting various fractions of the individual’s activity, temporal fingerprints remain similar, in particular for originally highly similar profiles (Fig. 4a, b). Although this affected the predictive ability of the model, it was still able to attain an AUC of 0.82 after an omission of 50% of individual activity (Fig. 4c). Lastly, we demonstrated that our model is highly scalable, outperforming the temporal baseline models in a setting involving 500 distinct domains (Fig. 5b, c), where all examined structure-based models failed to scale effectively. The limitations of structure-based models and the high computational complexity of existing temporal models highlight the advantages of our model as an efficient alternative, enabling real-time deployment on resource-limited devices.

Beyond the practical applications of cross-domain identity matching across encrypted domains, a deeper question emerges: why are temporal signals more informative for linking identities than structural ones? We postulate that structural patterns, influenced by connection type (e.g., trading, social, or professional networks), vary significantly across different platforms, obscuring structural coordination and hindering cross-domain identity matching. In contrast, the manifestation of temporal regularities stems from the interconnected nature of distinct domains. To explain this observation more generally, we turn to a theoretical perspective that captures how coordination can emerge across disconnected systems. Building on the “network of networks” framework25,26,27, we consider coordinating individuals as bridges, implicitly linking seemingly disconnected domains. External events propagate as shock waves through this interconnected structure, influencing coordinating entities across domains and triggering their actions. These actions, even if not simultaneous, often exhibit similar bursty patterns (see ref. 42 for formal modeling). Cross-domain identity matching, where a single entity controls multiple profiles across distinct platforms, offers a concrete example of this mechanism. The user’s profiles implicitly connect otherwise disconnected domains: activity triggered by interaction or coordination on one platform can lead the same entity to act on another platform, thereby transmitting influence across systems and leading to similar bursty patterns. While previous studies primarily modeled human burstiness based on isolated individuals, disregarding environmental effects28,43,44,45,46, we offer a broader perspective, attributing aligned bursty behavior to shock waves traversing the network of networks.

These findings carry implications beyond the specific task of identity matching. In network science and computational social science, individual behavior is often modeled through structural relationships or semantic content. Our results suggest an alternative approach, modeling individuals based on the timing and dynamics of their actions. This temporal perspective enables modeling behavioral regularities across platforms, even in the absence of observable connections or shared metadata. By demonstrating that temporal signals are sufficient for identifying persistent behavior, our work positions time as a foundational dimension in understanding individual roles and coordinated activity within complex and fragmented systems.

Finally, it is worth considering whether coordination signals arising from bursty dynamics can be obscured. Our robustness analysis for the identity matching use-case (Fig. 4) reveals that despite omitting significant activity portions, most users remain identifiable. We hypothesize that this robustness may extend to the more general coordination scenario. Specifically, since many online settings are designated for timely responses, obscuring coordination from temporal signals is inherently challenging, as individuals naturally respond promptly to external shocks. Attackers may attempt to distribute actions over time using hidden agents, but such strategies are impractical. For instance, delayed transactions in money laundering raise suspicion, and dispersed actions weaken coordinated attacks’ impact, suggesting that ultimately, attempts to obfuscate coordination may undermine the purpose of the coordinated activity itself.

Limitations and future research

An intrinsic limitation of our study is the need for sufficient individual data to reliably estimate inter-event distributions. Future research should examine how identification probability depends on activity volume and inspected period length. Our preliminary analysis demonstrates that identification probability increases with activity volume (Fig. S2) but decreases with longer inspection periods (Fig. S5), indicating short-term patterns are more effective for matching profiles. The choice of similarity measure also impacts performance, and alternative methods, such as those in ref. 47, could be evaluated. Further improving model scalability, possibly using complexity-reduced versions of the KS statistic48 and optimizing search algorithms, is another promising direction. In addition, while our current approach is fully unsupervised, it would be valuable to explore the effect of incorporating limited supervision, such as fine-tuning with a small set of labeled identity pairs, to further enhance performance in low-resource scenarios. The authors intend to pursue these directions in future research.

Methods

Data

Financial markets data

We consider the Ethereum blockchain35,36,37,49,50 as our financial dataset. This encrypted financial ecosystem enables the trading of tens of thousands of different crypto-tokens, using a single Ethereum wallet. Broadly, a crypto wallet is a digital tool that securely stores and manages the user’s cryptocurrency holdings, allowing the user to send, receive, and monitor their digital assets on blockchain networks. The address of a crypto wallet serves as a unique identifier, similarly to an account number in traditional financial systems. Since a single Ethereum wallet can be employed for the trading of all Ethereum-based crypto-tokens, it can be used as the trader’s identifier across different crypto-domains, for validation purposes (ground truth). We refer to a financial trading market \({D}_{i}\) as encompassing all the trading activity related to the respective crypto-token \({c}_{i}\). We consider two different experimental settings over this dataset.

Two-domain setting: Considering 14 days of trading activity across two domains only, encompassing the activity of 2k daily users.

Multi-domain setting: Considering 14 days of trading activity across 512 financial domains, encompassing the activity of 250k daily users.

Both settings contain temporal data at individual-level granularity and network data, where an edge \((u,v)\in {V}_{i}\) represents that user \(u\) sold crypto-token \(i\) to user \(v\).

Social platforms

Cross Twitter-Telegram-Instagram: This dataset contains posting activity from 266 user profiles: 126 Twitter
