WhatsApp, metadata and privacy: when the problem is not the content but the context

In recent months, an issue that is by no means marginal for those involved in data protection and security has returned to the center of the debate: the possibility of deducing technical and personal information about WhatsApp users, even without any direct interaction with their messages.

Two independent studies have revealed systemic vulnerabilities in the metadata management of the world’s most popular messaging platform, with over 3 billion users. The key point is not “reading messages” – which remain protected by end-to-end encryption – but understanding that content confidentiality does not exhaust privacy protection. Often, what really makes the difference is context.

As we have already highlighted in our article “Conscious digital communication that respects privacy and the apps or services you choose,” the question “which app is more secure?” is incorrectly posed: security is not the only criterion for an informed choice.

Key points

E2EE ≠ protected metadata: end-to-end encryption protects the content, but not who communicates with whom, when, and from which device
Enumeration = correlation between number ↔ identity/profile: 3.5 billion WhatsApp records associated with phone numbers, profile photos, and “About” text
Fingerprinting = more effective targeting: inferring the operating system and device type facilitates target selection and exploit choice

Two studies, one common problem

The University of Vienna study: 3.5 billion accounts enumerated

In November 2025, a team of researchers from the University of Vienna and SBA Research published a study documenting a vulnerability in WhatsApp’s contact discovery mechanism. By exploiting the absence of effective rate limiting and using a reverse-engineered client called whatsmeow, the researchers were able to query over 100 million phone numbers per hour via high-frequency automated calls to the platform’s APIs.
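To put the figure in perspective, 100 million lookups per hour is roughly 27,800 per second. The sketch below shows one common way a service could cap lookups per source, a token bucket. It is a minimal illustration, not WhatsApp's actual mechanism: the class names, the burst capacity of 50, and the refill rate of 0.5 lookups per second are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class TokenBucket:
    """Per-source bucket: up to `capacity` burst, refilled at `rate` tokens/second."""
    capacity: float
    rate: float
    tokens: float
    last: float

    def allow(self, now: float) -> bool:
        # Refill in proportion to the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class ContactLookupLimiter:
    """Hypothetical limiter keyed by a source identifier (account, IP, or both)."""

    def __init__(self, capacity: float = 50, rate: float = 0.5):
        self.capacity = capacity  # assumed burst size
        self.rate = rate          # assumed sustained lookups per second
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, source: str, now: float) -> bool:
        bucket = self.buckets.get(source)
        if bucket is None:
            # A new source starts with a full bucket.
            bucket = TokenBucket(self.capacity, self.rate, self.capacity, now)
            self.buckets[source] = bucket
        return bucket.allow(now)


def simulate(requests_per_second: int, seconds: int) -> int:
    """Return how many lookups a single source gets through in `seconds`."""
    limiter = ContactLookupLimiter()
    allowed = 0
    for i in range(requests_per_second * seconds):
        now = i / requests_per_second
        if limiter.allow("scraper-1", now):
            allowed += 1
    return allowed
```

Calling simulate(1000, 10) models a single source sending 1,000 lookups per second for ten seconds: only the initial burst plus a handful of refilled tokens get through, while an ordinary client that syncs a few contacts now and then is never blocked.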

The exposed data concerned approximately 3.5 billion registered accounts associated with phone numbers. It is important to note that this count may also include inactive accounts or recycled numbers, as highlighted in subsequent analyses. The researchers were able to access phone numbers, timestamps, “About” field text, profile photos, and public keys for E2EE encryption.

According to the study’s authors, if this dataset had been collected and published by malicious actors rather than in a responsible research context, it could have been one of the most significant data leaks ever observed.

The data on profile photos is particularly significant: over 57% of the accounts listed had a publicly visible image, and two-thirds of these contained recognizable human faces. The researchers highlighted how this data could be used to build a reverse phonebook service based on facial recognition techniques (AI/ML).

Gabriel Gegenhuber, a researcher at the University of Vienna and lead author of the study, noted that typically a system should not respond to such a high number of requests in such a short time, especially when they come from a single source.

Meta implemented stricter rate limiting on WhatsApp’s web client starting in October 2025, after researchers reported the issue through the bug bounty program in April of that year.

Tal Be’ery’s research: device fingerprinting

In January 2026, SecurityWeek reported the results of research conducted by Tal Be’ery, co-founder and CTO of the Zengo cryptocurrency wallet. Be’ery demonstrated how the predictable values of cryptographic key IDs assigned by WhatsApp allow technical information about the user’s device to be inferred.

What was demonstrated

Specifically, an attacker can deduce the user’s primary device, the operating system of each connected device (differentiating between iOS and Android), the approximate age of the devices or active sessions, and whether WhatsApp is running via a mobile app or a desktop web browser. This fingerprinting technique can be performed without generating any notifications on the target device.

Meta has started assigning random values to key IDs, initially for Android devices. Be’ery was rewarded through Meta’s bug bounty program, which in 2025 distributed over $4 million for nearly 800 valid reports, bringing the historical total to over $25 million.
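Why predictable key IDs leak information can be shown with a toy model. The scheme below is invented for illustration and is not WhatsApp's real key ID logic: the starting values, the "add 1 per rotation" rule, and the size of the ID space are all assumptions. It only contrasts a sequential assignment, from which an observer can guess platform and approximate age, with a random one.

```python
import secrets

# Hypothetical, invented parameters: NOT WhatsApp's real values.
PLATFORM_START = {"android": 0, "ios": 1_000_000}
ID_SPACE = 2**24


def sequential_key_id(platform: str, rotations: int) -> int:
    """Toy scheme: start at a platform-specific value, add 1 per key rotation."""
    return PLATFORM_START[platform] + rotations


def random_key_id() -> int:
    """Mitigated scheme: draw each key ID uniformly at random."""
    return secrets.randbelow(ID_SPACE)


def infer_from_key_id(key_id: int) -> tuple[str, int]:
    """What an observer could guess under the toy sequential scheme.

    Returns (platform, rotations), where rotations is a proxy for device age.
    """
    # Pick the platform with the highest start value that does not exceed key_id.
    platform = max(
        (p for p, start in PLATFORM_START.items() if start <= key_id),
        key=lambda p: PLATFORM_START[p],
    )
    return platform, key_id - PLATFORM_START[platform]
```

Under the toy sequential scheme, infer_from_key_id(sequential_key_id("ios", 12)) recovers both the platform and the number of rotations. Applied to a value from random_key_id(), the same function still returns a guess, but one with no relation to the real device, which is the point of randomizing the IDs.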

Plausible risk scenarios

Based on what has been demonstrated, concrete risk scenarios emerge: knowledge of the operating system and device type can significantly facilitate targeting and reconnaissance activities, reducing the search space for targeted exploits.

WhatsApp zero-days are rare and highly valuable: rewards of up to $1 million have been publicly announced for zero-click chains on WhatsApp in research contexts and security competitions (e.g., Pwn2Own). In January 2025, WhatsApp stated that approximately 90 journalists and civil society members were targeted with Paragon/Graphite spyware in a zero-click attack. The ability to pre-select targets based on the operating system can significantly increase the effectiveness of such attacks.

Definition and characteristics

Metadata is commonly defined as “data about data”: information that describes, contextualizes, or characterizes other data without constituting its main content. In the context of digital communications, metadata includes elements such as sender and recipient, send and receive timestamps, type of device used, geographic location (if available), frequency and duration of communications, technical information about the client and operating system, public cryptographic keys, and session identifiers.

Unlike the content of a message, metadata is often generated automatically and transmitted in clear text or with less protection than the message payload, because the communication system needs it in order to function.
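The difference between payload and metadata can be made concrete with a simplified message envelope. The field names below are assumptions made for this sketch and do not reproduce any real wire format; the point is only that an intermediary can read everything except the encrypted body.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Envelope:
    # Routing and session metadata: needed to deliver the message.
    sender: str
    recipient: str
    sent_at: datetime
    client_platform: str
    key_id: int
    # End-to-end encrypted payload: opaque bytes to everyone but the endpoints.
    ciphertext: bytes


def visible_to_server(envelope: Envelope) -> dict:
    """Return the fields an intermediary can read without any decryption key."""
    fields = asdict(envelope)
    # The body stays unreadable, but even its length is observable.
    fields["ciphertext_length"] = len(fields.pop("ciphertext"))
    return fields


example = Envelope(
    sender="+10000000001",       # placeholder numbers, not real accounts
    recipient="+10000000002",
    sent_at=datetime(2026, 1, 5, 9, 30, tzinfo=timezone.utc),
    client_platform="android",
    key_id=42,
    ciphertext=b"\x8f\x01\xaa",
)
view = visible_to_server(example)
```

Here view contains the sender, recipient, timestamp, platform, and key ID in readable form, plus the size of the ciphertext, even though the message itself cannot be decrypted.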

Why metadata is personal data

Legally, the framework is unequivocal: under Article 4(1) of the GDPR, personal data is any information relating to an identified or identifiable natural person. Metadata fully meets this definition when it allows a specific individual to be traced, directly or indirectly.

Metadata is not neutral; it is not “anonymous” by definition, and it becomes particularly invasive when aggregated, correlated, or used to infer additional attributes. Case law and doctrine have progressively recognized that communications metadata can reveal as much—if not more—than the content itself.

Some concrete examples: the frequency of communications between two people can reveal the nature of their relationship; online activity times can indicate habits, time zones, or emotional states; the combination of device, operating system, and usage patterns can create a unique digital fingerprint; presence in specific group chats can reveal political, religious, or sexual affiliations.
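The first two examples can be sketched in a few lines. The records below are made up, and the tuple layout (sender, recipient, timestamp) is an assumption for illustration; no message content is involved at any point.

```python
from collections import Counter
from datetime import datetime

Record = tuple[str, str, datetime]  # (sender, recipient, timestamp)


def contact_frequency(records: list[Record]) -> Counter:
    """Count messages per unordered pair of participants."""
    return Counter(frozenset((s, r)) for s, r, _ in records)


def active_hours(records: list[Record], user: str) -> Counter:
    """Histogram of the hours (0-23) at which `user` sent messages."""
    return Counter(ts.hour for s, _, ts in records if s == user)


# Fabricated sample data, for illustration only.
records = [
    ("alice", "bob", datetime(2026, 1, 5, 23, 10)),
    ("bob", "alice", datetime(2026, 1, 5, 23, 12)),
    ("alice", "bob", datetime(2026, 1, 6, 23, 40)),
    ("alice", "carol", datetime(2026, 1, 6, 9, 5)),
]

pairs = contact_frequency(records)
hours = active_hours(records, "alice")
```

Even on four records, pairs shows that alice and bob exchange far more messages than alice and carol, and hours shows that alice is mostly active late in the evening. At the scale of real traffic, the same counting yields relationship graphs and daily routines.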

Metadata, profile content, and special categories of data

Research by the University of Vienna has highlighted a particularly relevant aspect: the text entered in the “About” field can reveal sensitive information such as sexual orientation, political opinions, religious affiliation, or substance use.

It is important to note that in these cases, it is not so much the “technical” metadata that falls within the scope of Article 9 of the GDPR, but rather the fact that specific profile fields (such as the “About” text) and the inferences derived from the correlation of data may reveal special categories of personal data subject to enhanced protection.

The researchers found that about 29% of accounts had text in their profiles, and in many cases, this text contained sensitive information. In countries with authoritarian regimes or repressive legislation, the exposure of such information can have consequences that go far beyond the digital sphere.

The encryption paradox

End-to-end encryption effectively protects the content of communications. Still, it creates a paradox: while the message is unreadable, everything surrounding it—who communicates with whom, when, how often, from which device—remains potentially exposed.

As Aljosha Judmayer of the University of Vienna summarized, end-to-end encryption protects the content of messages, but not necessarily the associated metadata. Research shows that privacy risks can arise when such metadata is collected and analyzed at scale.

Specific risks in using WhatsApp

In our previous article, “Persisting in the use of WhatsApp: how to unknowingly persevere. The reasons for our ‘No’,” we already analyzed the reasons why we believe WhatsApp is a solution to be avoided. The research discussed here confirms and amplifies those concerns.

Risks to individual security

The exposure of WhatsApp metadata poses real risks to user security.

With regard to targeted attacks, metadata enables the reconnaissance phase typical of advanced attacks. Knowing or inferring details about the technical context allows for selecting more “attractive” targets, choosing compatible exploits (differentiating between iOS/Android), reducing failed attempts, and increasing the likelihood of success for techniques such as social engineering, spear phishing, or exploit chains. The availability of large-scale metadata can also fuel automated analysis and AI/ML pipelines for OSINT, clustering, and profiling, reducing target-selection costs and increasing the effectiveness of targeted campaigns.

An additional risk concerns facial recognition: the possibility of associating profile photos with phone numbers at billion scale opens up disturbing scenarios. Researchers have explicitly warned that in the hands of a malicious actor, this data could be used to build a facial recognition-based lookup service, effectively a “reverse phone book,” in which individuals, their phone numbers, and available metadata can be looked up from their faces.

Risks for exposed categories

For those working in regulated or high-exposure contexts – professionals, institutions, journalists, activists, sensitive roles – the impact can be amplified, because the correlation of metadata becomes an element of physical and personal risk, not just digital.

The research identified millions of accounts registered in countries where WhatsApp is officially banned: 2.3 million in China, 1.6 million in Myanmar, and over 59 million in Iran (before the ban was lifted at the end of 2024). In these contexts, simply using WhatsApp can expose users to legal or persecutory consequences.

Structural risks of the centralized model

WhatsApp operates on a centralized architecture where Meta controls the entire infrastructure: servers, protocol, clients, and—most importantly—metadata. This model has inherent critical issues.

First, there is a single point of failure: a vulnerability in the central system potentially exposes all 3 billion users. Research from the University of Vienna has demonstrated this empirically.

Second, there is a problem of protocol opacity. As the researchers noted, WhatsApp has inherited a responsibility comparable to that of a public telecommunications infrastructure or an Internet standard. However, unlike Internet protocols governed by public RFCs and maintained through collaborative standards, this platform does not offer the same level of transparency or verifiability to facilitate third-party oversight.

Finally, users have limited control over how their data (including metadata) is collected, stored, and used. Privacy settings are limited, and unilateral changes to the terms of service are frequent.

What users can do today

While waiting to migrate to alternative solutions, WhatsApp users can take some immediate steps to reduce their exposure:

Limit the visibility of your profile photo: set visibility to “My contacts” or “Nobody” instead of “Everyone.”
Reduce the information in the “About” field: avoid entering personal, political, religious, or sensitive information.
Avoid photos of your face for exposed roles: journalists, activists, and professionals in sensitive contexts should consider using non-identifiable images.
Disable contact synchronization: this reduces the amount of data shared with the platform.
Regularly check connected devices: remove web or desktop sessions that are no longer in use.
Consider alternatives for sensitive communications: use Signal, Matrix, or XMPP for conversations that require greater protection.

Secure and open source alternatives: XMPP and Matrix

Given the risks highlighted above, it is advisable to consider alternatives based on open, decentralized, and verifiable protocols. As we have already illustrated in several previous articles, two solutions stand out as particularly mature: XMPP and Matrix.

XMPP: the open standard for messaging

XMPP (Extensible Messaging and Presence Protocol), originally known as Jabber, is an open communication protocol developed since 1999 and standardized by the IETF in 2004. Its current core specifications are published as RFC 6120, RFC 6121, and RFC 7622. We discussed it in depth in the article “XMPP: the protocol for secure communication that respects privacy”.

The main features of XMPP start with decentralization: the architecture resembles email, and anyone can run their own XMPP server, ensuring complete control over their data and metadata. The protocol is based on open standards: the specifications are public, documented, and maintained by the XMPP Standards Foundation through a transparent process. The system is extensible: features are added through XMPP Extension Protocols (XEPs) while maintaining backward compatibility. In terms of security, XMPP supports TLS for transport encryption and SASL for authentication, while the OMEMO extension (XEP-0384) provides end-to-end encryption based on the Double Ratchet algorithm (the same one used by Signal). Finally, federation lets different XMPP servers communicate with each other, so users can choose their provider while maintaining interoperability.

Modern XMPP clients include Conversations for Android, Dino for Linux/desktop, Gajim for Windows/Linux/macOS, Monal for iOS and macOS, and Profanity for terminal.

In our contribution “XMPP is a solution that allows users to have control over their personal information: our choice Snikket,” we described our choice of Snikket, an XMPP-based project that we installed on our servers. As further explored in “To be or not to be dependent on instant messaging apps: that is the question. Choose to be free: Snikket,” Snikket is a concrete example of how it is possible to implement a messaging system that respects the principles of data minimization and user control.

The privacy benefits of XMPP are significant: the protocol does not require a phone number, metadata minimization is built into the protocol (each server manages only its own users’ metadata), and the possibility of self-hosting eliminates dependence on third parties.

Matrix: the protocol for federated communication

Matrix is an open protocol for real-time communication, developed since 2014 and maintained by the Matrix.org Foundation, a non-profit organization. In December 2025, the Matrix 1.17 specification was released, with Matrix 2.0 expected in 2026.

Matrix offers native federation: each organization or individual can manage their own homeserver, which synchronizes with other servers in the federated network. Unlike XMPP, Matrix replicates the conversation history across the servers participating in each room, in ways that depend on the specific configuration.

End-to-end encryption is provided by the Olm and Megolm protocols: Olm is an implementation of the Double Ratchet, while Megolm is a ratchet designed for group conversations. Their implementations have been independently audited (Least Authority in 2022, NCC Group previously). The vodozemac reference implementation is written in Rust and has replaced the previous libolm.

The protocol is interoperable: Matrix supports bridges to other platforms (IRC, Slack, Discord, XMPP, Signal), allowing for gradual migration. In our contribution “How to create a script to update some bridges and plugins for Matrix,” we described how to manage bridges for Telegram and Signal on our Matrix server.

Matrix is also verifiable: all code is open source, specifications are public, and the development process takes place through transparent Matrix Spec Change (MSC) proposals.

Matrix is adopted across European and national institutional contexts, with growing digital sovereignty initiatives and government implementations, including the ZenDiS project in Germany (openDesk). In our contribution “Digital Markets Act (DMA) and interoperability: encryption and privacy at risk for messaging systems?,” we analyzed how Matrix can ensure interoperability while maintaining total security and privacy.

The main Matrix clients include Element (desktop, mobile, web), Element X (new generation, mobile), FluffyChat, Cinny, and Nheko (desktop).

Comparison of alternatives

Feature	WhatsApp	XMPP	Matrix
Architecture	Centralized	Federated	Federated
Protocol	Proprietary	Open (IETF)	Open (Matrix.org)
Self-hosting	No	Yes	Yes
E2EE	Yes (Signal Protocol)	Yes (OMEMO)	Yes (Olm/Megolm)
Requires phone	Yes	No	No
Metadata control	None (user has no control)	Provider-level; Full (self-host)	Provider-level; Full (self-host)*
Public audits / third-party verification	Partial (protocol); limited metadata transparency	Public specifications + open source implementations	Public specifications + open source implementations
Interoperability	No	Yes (gateways)	Yes (bridges)

* In Matrix, metadata control is subject to federation and room configuration policies.

In the WhatsApp / Meta model, the user has no technical or organizational control over metadata: the entire processing is centralized and entirely governed by the service provider.

In contrast, XMPP and Matrix allow users effective control over their personal data even when a provider hosts the account, as the federated model facilitates the exercise of data subject rights and freedom of service choice, in line with the principle expressed in Recital 7 of the GDPR.

Complete control at the infrastructural level, including system metadata, is, however, only achieved through self-hosting, when the communicating party directly controls the technical infrastructure.

Practical recommendations

For those who wish to migrate to more privacy-friendly solutions, here are some practical guidelines.

For individual users, it is advisable to consider a Matrix account on a reliable public homeserver (matrix.org, tchncs.de, nitro.chat) or an XMPP account on a reputable server (conversations.im, jabber.de, disroot.org). The next step is to install a modern client (Element X for Matrix, Conversations for XMPP on Android) and enable end-to-end encryption and device verification.

For organizations, it is advisable to consider self-hosting a Matrix homeserver (Synapse, Dendrite) or an XMPP server (ejabberd, Prosody), defining metadata retention policies consistent with the principle of minimization, and training users on key verification and security best practices.
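A retention policy of this kind boils down to a simple rule: each category of metadata has a maximum age, and anything older is deleted. The sketch below illustrates the rule only; the category names and the windows of 7 and 30 days are hypothetical examples, and real servers expose their own retention settings, which are not modeled here.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows per metadata category; values are examples only.
RETENTION = {
    "delivery_log": timedelta(days=7),
    "session_log": timedelta(days=30),
}


def prune(entries: list[dict], now: datetime) -> list[dict]:
    """Keep only entries still within their category's retention window.

    Entries of an unknown category are dropped, so anything not explicitly
    covered by the policy is not retained.
    """
    kept = []
    for entry in entries:
        window = RETENTION.get(entry["category"])
        if window is not None and now - entry["created_at"] <= window:
            kept.append(entry)
    return kept


now = datetime(2026, 2, 1, tzinfo=timezone.utc)
entries = [
    {"category": "delivery_log", "created_at": now - timedelta(days=2)},
    {"category": "delivery_log", "created_at": now - timedelta(days=10)},
    {"category": "session_log", "created_at": now - timedelta(days=10)},
    {"category": "debug_dump", "created_at": now - timedelta(hours=1)},
]
remaining = prune(entries, now)
```

In this example the ten-day-old delivery log and the uncategorized debug dump are removed, while the recent delivery log and the session log are kept. Dropping unknown categories by default is a deliberate design choice in the sketch, in the spirit of data minimization.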

For high-exposure contexts (journalists, activists, whistleblowers), it is essential to use dedicated, segregated servers, implement Tor or a VPN to mask network metadata, and consider additional protocols. In this regard, in our article “SimpleX Chat: an instant messaging app that respects privacy,” we described another interesting alternative for high-risk contexts.

The vulnerabilities that have emerged touch on three pillars of the GDPR.

The principle of data minimization (Art. 5, para. 1, letter c) establishes that even metadata should be collected and processed only to the extent strictly necessary for the purposes. Research by the University of Vienna has shown that many users publicly expose data that is not necessary for the service’s basic functionality—but the primary responsibility lies with the provider, which does not implement privacy by default.

Data protection by design and by default (Art. 25) implies that privacy is not an accessory but must be incorporated into technical choices. The fact that WhatsApp’s default settings allow profile photos and “About” text to be publicly visible raises questions about compliance with this principle.

Finally, security of processing (Art. 32) and the risk-based approach require that it is not enough to “not have been hacked.” It is necessary to demonstrate the adoption of adequate measures against realistic threats, including those that pass through the collection of indirect signals.

From an accountability perspective, it took Meta almost a year to provide a meaningful response to numerous reports from researchers. Only after receiving a preprint of the paper and a notification of the intention to publish did the company request a conference call and ask for publication to be postponed.

Conclusion: Rethinking Digital Communication

The lesson that emerges from this research is simple but often overlooked: encryption protects content, but privacy protects the person, and therefore also the context.

Metadata is an integral part of processing and, under certain conditions, becomes the most sensitive layer: not because it “tells” a message, but because it makes the subject more exposed and, at times, more predictable.

In an ecosystem where automation scales collection and AI/ML improves the efficiency of metadata correlation and classification, defending the context has become as important as protecting the content. The two studies analyzed here show that even mature and widely used systems can contain design or implementation flaws with concrete consequences.

The availability of open source, federated, and verifiable alternatives such as XMPP and Matrix now offers a concrete way out of the centralized model. These are not perfect solutions—every system has its own challenges—but they are architectures that give users and organizations back control over their data and metadata.

The choice is no longer just technical: it is an informed decision about the level of autonomy and sovereignty you want to maintain over your digital communications.

Sources:

University of Vienna / SBA Research: “Hey There! I Am Using WhatsApp” - Pre-print paper - https://github.com/AnotherOctopus/whatsapp-research
Meta Bug Bounty: “Celebrating 15 years of Meta’s Bug Bounty Program” (November 2025) - https://bugbounty.meta.com/blog/15th-anniversary-2025/
SecurityWeek: “Researcher Spotlights WhatsApp Metadata Leak as Meta Begins Rolling Out Fixes” (January 5, 2026) - https://www.securityweek.com/researcher-spotlights-whatsapp-metadata-leak-as-meta-begins-rolling-out-fixes/
The Register: “3.5B WhatsApp users’ info scooped through enumeration flaw” (November 19, 2025) - https://www.theregister.com/2025/11/19/whatsapp_enumeration_flaw/
Malwarebytes: “WhatsApp closes loophole that let researchers collect data on 3.5B accounts” (November 25, 2025) - https://www.malwarebytes.com/blog/news/2025/11/whatsapp-closes-loophole-that-let-researchers-collect-data-on-3-5b-accounts
Matrix.org: “Matrix v1.17 specification released” (December 18, 2025) - https://matrix.org/blog/2025/12/18/matrix-v1.17-release/
XMPP Standards Foundation: “Technology Overview” - https://xmpp.org/about/technology-overview/
Regulation (EU) 2016/679 (GDPR) - https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX%3A32016R0679

Related hashtags

#WhatsApp #Privacy #Metadata #GDPR #Cybersecurity #DataProtection #E2EE #Meta #BugBounty #Fingerprinting #PrivacyByDesign #DataMinimization #InfoSec #DigitalRights #XMPP #Matrix #OpenSource #Decentralization #Federation #OMEMO #DigitalSovereignty #SecureMessaging #OpenProtocols #SelfHosting #PrivacyFirst #TechPolicy