Fortytwo, a Silicon Valley startup, was founded last year based on the idea that a decentralized swarm of small AI models running on personal computers offers scaling and cost advantages over centralized AI services.
On Friday, the company published benchmark results claiming that its swarm inference scheme outperformed OpenAI’s GPT-5, Google Gemini 2.5 Pro, Anthropic Claude Opus 4.1, and DeepSeek R1 on reasoning tests, specifically GPQA Diamond, MATH-500, AIME 2024, and LiveCodeBench.
The advantage of swarm inference, the company says, is that frontier AI models often become less accurate when “reasoning” – the process by which models solve complex problems by breaking them into a series of smaller steps. One explanation for this is that large models may get stuck in reasoning loops.
Swarm inference supposedly helps avoid this problem by considering responses from multiple smaller models and ranking them by quality to obtain a better answer. Also, it’s supposedly more affordable because it runs on distributed consumer hardware instead of in billion-dollar datacenters.
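The company hasn’t published its node software alongside these claims, but the basic mechanism is straightforward to sketch: several small models each answer a prompt, peers score the candidates, and the highest-scored answer wins. Here is a minimal Python sketch of that idea; the generator and ranker callables, and everything else about it, are illustrative assumptions rather than Fortytwo’s actual API.

```python
import statistics
from typing import Callable

# Illustrative sketch of swarm inference: several small models answer,
# peers score every candidate, and the best-scored answer wins.
# Names and signatures are assumptions, not Fortytwo's API.

def swarm_inference(prompt: str,
                    generators: list[Callable[[str], str]],
                    rankers: list[Callable[[str, str], float]]) -> str:
    # 1. Each small model produces a candidate answer.
    candidates = [gen(prompt) for gen in generators]

    # 2. Every ranking node scores every candidate; average the scores.
    def mean_peer_score(answer: str) -> float:
        return statistics.mean(rank(prompt, answer) for rank in rankers)

    # 3. Consensus: keep the candidate with the highest mean peer score.
    return max(candidates, key=mean_peer_score)
```

The appeal of this shape is that no single weak model can sink the answer: a candidate only wins if many independent rankers agree it is good.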
“Inference through the swarm is up to three times cheaper than frontier reasoning models from OpenAI and Anthropic on a per-token basis,” Ivan Nikitin, co-founder and CEO, told The Register in an email. “Actual cost depends on task complexity.”
Nikitin told The Register in a phone interview that he and his co-founders turned to decentralization not for the sake of novelty, but to address a practical issue: the shortage of centralized computing resources.
During AI projects they worked on in recent years, Nikitin said, they kept running up against usage rate limits, an issue that’s becoming more acute with the increasing popularity of coding models. Developers using coding AI, one of the first markets where LLMs have demonstrated value, can’t make enough requests to these models to meet their professional needs, he said.
“So understanding that right now the centralized AI industry is racing towards multi-billion [dollar] contracts to build new datacenters, nuclear power plants to power them, and so forth, we don’t find that approach sustainable because no matter how many datacenters you build, there’s always going to be more demand,” said Nikitin. “Multistep reasoning is going to demand more. You’re always going to need more and more compute and power to be able to provide value to your customers.”
Nikitin said he and his co-founders realized that people are sitting on vast amounts of latent computing power with home desktop systems that are vastly overpowered for most daily needs. Also, he said, AI technology improvements have shown that small, specialized models can outperform costly frontier models in domain-specific tasks.
“So we thought, how about we unite those two factors and create a network where we can deploy specialized models, but allow them to work together, amplifying each other’s capabilities,” said Nikitin. “So, the network itself becomes a model.”
Nikitin and co-founders Vladyslav Larin and Alexander Firsov outlined their approach in a preprint paper titled “Self-Supervised Inference of Agents in Trustless Environments,” released through ArXiv last year.
“Fortytwo doesn’t rely on a single model,” explained Nikitin. “The network connects many Small Language Models (SLMs) of different types, including open-source models like Qwen3-Coder, Gemma3, and Fortytwo’s own specialized models such as Strand-Rust-Coder-14B. Each node operates as a black box: node operators can run any privately built or downloaded model without revealing it to the network. Only the inferences, not the model weights or data, are shared.”
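Put another way, the only contract between a node and the rest of the network is text in, inference out. A minimal sketch of that boundary might look like the class below; the names and structure are hypothetical, not Fortytwo’s protocol.

```python
from typing import Callable

class BlackBoxNode:
    """Sketch of a node boundary: any local model, open or private,
    sits behind a text-in/text-out interface. Only the inference
    string ever crosses the network; weights and data stay local.
    Class and method names are illustrative, not Fortytwo's protocol."""

    def __init__(self, node_id: str, model: Callable[[str], str]):
        self.node_id = node_id
        self._model = model          # never serialized or shared

    def infer(self, prompt: str) -> str:
        # The only thing peers ever see from this node.
        return self._model(prompt)

# Usage: wrap anything, a Qwen3-Coder runtime, a private fine-tune, etc.
# node = BlackBoxNode("node-7", lambda p: my_local_llm.generate(p))
```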
The main disadvantage is latency.
“Fortytwo optimizes for quality rather than raw speed,” said Nikitin. “It’s better compared to the ‘Thinking’ or ‘Deep Research’ modes found in popular LLM chat applications, where additional processing time yields more accurate reasoning. The networking and Swarm Inference process adds roughly 10–15 seconds of latency for base scenarios, as multiple nodes collaborate and peer-rank their outputs before consensus.”
Privacy is also an issue, though perhaps less so than it is with large AI companies that centralize data gathering and may also have an interest in ad-oriented data collection. On Fortytwo’s decentralized network, a technically knowledgeable node operator could potentially view prompts and responses for a locally running model – at some point, the model needs to see clear text. But this would be a smaller amount of data than would be available to, say, Anthropic, Google, or OpenAI, which, as aggregators of prompts and personal data, are more obvious targets for authorities seeking access.
Nikitin said Fortytwo is exploring adding noise data to prompts to improve privacy, and also noted that the biz has partnered with Acurast, a decentralized compute network for mobile phones. Phones, he said, have stronger Trusted Execution Environments than desktop hardware, so that might provide a path to implement private inference.
“It’s not going to be fast,” he said. “It’s going to be suitable for deep research tasks where you can wait twenty minutes for the response, but at least you’ll get even better privacy guarantees compared to what centralized AI can give you.”
Share your PC, get some crypto
Nikitin said the company’s vision involves building an open community that gives machine learning engineers and data scientists a way to contribute to cutting-edge AI without having to land $100 million job offers from Meta. The idea is that these individuals will have the opportunity to create specialized models that excel in a particular domain and get rewarded for doing so.
Once the project enters its commercial phase, participants in Fortytwo’s network will be able to operate nodes (computers) running a local AI model in exchange for potential compensation in crypto. For API usage, customers will pay the relevant service provider in the appropriate fiat currency. The service provider will pay the Fortytwo network, which will allocate some portion of funds in Fortytwo Network FOR tokens to node operators whose models serve the inference requests.
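In rough terms, that flow could be modeled as a pro-rata split; every figure and name in the sketch below is an assumption for illustration, since the token economics aren’t final.

```python
# Rough model of the described payment flow: customers pay fiat, the
# service provider pays the network, and a share is distributed in FOR
# tokens pro-rata to the nodes that served the requests. All figures
# and the allocation rule are assumptions for illustration.

def allocate_rewards(fiat_revenue: float,
                     network_share: float,
                     usd_per_for: float,
                     served: dict[str, int]) -> dict[str, float]:
    pool = (fiat_revenue * network_share) / usd_per_for  # FOR to distribute
    total = sum(served.values())
    return {node: pool * n / total for node, n in served.items()}

payouts = allocate_rewards(
    fiat_revenue=1_000.0,   # USD paid by API customers (assumed)
    network_share=0.7,      # cut passed through to nodes (assumed)
    usd_per_for=0.05,       # post-launch FOR price (assumed)
    served={"node-a": 600_000, "node-b": 400_000},  # tokens served per node
)
```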
“Crypto becomes essential for us so that we can create a system without gatekeeping,” said Nikitin, pointing to the political pressures that have separated American AI from Chinese AI from AI initiatives elsewhere in the world. “It wouldn’t be possible without the crypto element to it. We need the blockchain because we need to hold reputation somewhere. We keep the reputation of individual participants fully decentralized so that even if we cease to exist as a company, the network can still continue to operate. So that is the reason why it’s running on crypto rails.”
Not everyone gets paid, Nikitin said. Inference rounds involve multiple contributions and ranking nodes. The highest peer-scored half of the participating nodes gets a reward for each round, as well as earning reputation points for participating. Nodes that fail to provide relevant, accurate responses will lose reputation, an incentive for node operators to recalibrate or update the model being run.
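In code terms, the settlement rule he describes might look something like this; the quality floor and the one-point reputation steps are assumptions, since Fortytwo hasn’t published the exact formula.

```python
def settle_round(peer_scores: dict[str, float],
                 reputation: dict[str, int],
                 reward: float,
                 quality_floor: float = 0.5) -> dict[str, float]:
    """Sketch of the described settlement: the top peer-scored half
    gets paid and gains reputation; nodes whose answers fall below a
    quality floor lose reputation. Floor and increments are assumed."""
    ranked = sorted(peer_scores, key=peer_scores.get, reverse=True)
    winners = set(ranked[: len(ranked) // 2])
    payouts: dict[str, float] = {}
    for node, score in peer_scores.items():
        payouts[node] = reward if node in winners else 0.0
        if node in winners:
            reputation[node] = reputation.get(node, 0) + 1
        elif score < quality_floor:
            # Losing reputation nudges operators to retune or swap models.
            reputation[node] = reputation.get(node, 0) - 1
    return payouts
```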
“Queries come from users and developers who access Fortytwo through API endpoints,” Nikitin explained. “Inference requests are broadcast across the peer-to-peer network, where listening nodes determine whether they have the expertise to contribute. Qualified nodes then self-organize into a subswarm to process the request collaboratively through swarm inference.”
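That self-selection step, broadcasting a request and letting qualified nodes opt in, is easy to sketch. The domain-tag matching below is an assumed mechanism, not the company’s actual routing logic.

```python
from dataclasses import dataclass

@dataclass
class ListeningNode:
    node_id: str
    domains: set[str]   # e.g. {"rust-coding", "medical-imaging"}

    def can_serve(self, request_domain: str) -> bool:
        # Each listening node decides for itself whether it has
        # the expertise to contribute to this request.
        return request_domain in self.domains

def form_subswarm(nodes: list[ListeningNode],
                  request_domain: str) -> list[ListeningNode]:
    # The request is broadcast; qualified nodes self-select into a
    # subswarm that processes it collaboratively.
    return [n for n in nodes if n.can_serve(request_domain)]
```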
Presently, daily participation in the network through the company’s Devnet Program ranges from about 200 to 800 computers, as shown on the network dashboard. The dashboard indicates that the company has distributed more than 145 million FOR tokens; their exact fiat value will be determined by the fair market price after the token’s public launch. The network presently operates on the Monad Testnet, where assets have no redeemable value.
Nikitin said that it’s still too early to tell what someone might be able to earn participating in this network, but he said the goal is to do a bit better than VastAI, a service that pays participants for access to their GPU.
“In our case, nodes are running in the background,” he said. “So nobody needs to give up their entire machine. They can continue sitting on Google Meet calls with a node running in the background. But their earning is going to be about 10 percent more compared to platforms like VastAI. So it’s definitely going to cover the cost of electricity and give some meaningful passive income from the contributions.”
For someone running a unique, specialized model, he said – one that excels in CT scan analysis, for example – the node operator could get $120 per day. That’s based on simulations run last year, with certain assumptions about the size of the network, but he said he believes the numbers are still realistic.
“Fortytwo can serve as an inference backend for reasoning, coding, medical, deep research, and other tasks demanding high accuracy,” said Nikitin. “The API can be integrated into mobile or web applications just like any conventional AI service (OpenAI, Anthropic, Google, Grok, OpenRouter, etc).”
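For developers, that suggests integration would look like any other hosted inference API. The snippet below is purely hypothetical: the URL, payload shape, and auth scheme are modeled on conventional OpenAI-style APIs, not on any published Fortytwo spec.

```python
import requests

# Hypothetical client call; endpoint URL, payload, and auth are
# assumptions modeled on conventional inference APIs, not Fortytwo's.

resp = requests.post(
    "https://api.example-fortytwo.network/v1/completions",  # placeholder URL
    headers={"Authorization": "Bearer <API_KEY>"},
    json={"prompt": "Explain swarm inference in one paragraph.",
          "max_tokens": 200},
    timeout=60,  # swarm consensus adds seconds of latency; allow for it
)
print(resp.json())
```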
Running a node isn’t supposed to be too demanding on a network participant’s computer, either. Nodes are designed to run in the background and use idle compute without interfering with the user’s daily workload.
“We’ve implemented a dynamic load-balancing system that ensures a node is most active during light tasks such as video calls, web browsing, or working on spreadsheets, but automatically reduces or pauses inference processing when the user performs heavy operations like 4K video editing,” said Nikitin.
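A crude version of that throttling is easy to picture: sample CPU load and contribute only while the machine is mostly idle. The sketch below uses the psutil library, and the thresholds are invented for illustration.

```python
import time
import psutil

# Crude sketch of the throttling idea: run inference only while the
# user's own CPU load is light; back off when it spikes (e.g. during
# 4K video editing). Thresholds here are invented, not Fortytwo's.

LIGHT_LOAD = 35.0   # percent CPU considered "light" (calls, browsing)
CHECK_EVERY = 5.0   # seconds between load samples while backed off

def run_node(process_one_inference):
    while True:
        if psutil.cpu_percent(interval=1.0) < LIGHT_LOAD:
            process_one_inference()   # contribute to the swarm
        else:
            time.sleep(CHECK_EVERY)   # user is busy; stay out of the way
```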
“Our goal is to start a grassroots movement where people all over the world, PhD students, people who are just excited about AI and are starting to learn about it, where they can start doing their own functions, their own models, and plugging them into the network,” he said. ®