Last Week in AI #326 - Qualcomm AI Chips, MiniMax M2, Kimi K2 Thinking

Qualcomm unveiled two data center AI accelerators, the AI200 (shipping in 2026) and AI250 (planned for 2027), marking a major pivot from its mobile and wireless roots. Both will be offered in full, liquid‑cooled server racks, matching Nvidia and AMD’s rack‑scale systems that cluster up to 72 accelerators as a single computer for training and serving advanced AI models.

Built on Qualcomm’s Hexagon NPUs originally designed for smartphones, the chips reflect a strategy to scale on‑device AI architecture to data center workloads. Shares rose 11% on the news. The new entrants intensify competition with Nvidia and AMD in the high‑growth AI data center market, where multi‑accelerator, liquid‑cooled racks are standard.

Alibaba-backed Moonshot releases its second AI update in four months as China’s AI race heats up

MiniMax introduced MiniMax‑M2, an open MIT‑licensed Mixture of Experts model tuned for coding and agentic workflows, with weights on Hugging Face. Although the model has 229B total parameters, it routes only ~10B active parameters per token to keep memory and tail latency low during plan–act–verify loops across tools like shell, browser, retrieval, and editors. M2 features “interleaved thinking,” emitting internal reasoning in ... blocks that must be preserved across turns; the team warns removing these harms multi‑step and tool‑use performance.

Another notable open-source release this week was Kimi K2 Thinking, a Mixture-of-Experts model with 1 trillion total parameters, 32 billion of them active per token. Built on the Kimi K2 base and optimized for reasoning and agentic abilities, it supports a 256k context window and uses INT4 quantization for efficiency, with a reported training cost of only $4.6 million. Notably, it can execute 200–300 tool calls autonomously, demonstrating advanced agent capabilities previously seen only in closed models. Although there is no full technical report, analyses indicate that Kimi K2 inherits and refines DeepSeek’s architecture, expanding MoE experts and vocabulary while optimizing inference cost. It represents both a direct evolution of DeepSeek’s design and a culmination of open-source innovations like FlashAttention and MuonClip.
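To put the parameter counts for both models in perspective, here is a minimal sketch of the arithmetic. It assumes exactly 4 bits per weight for INT4 and ignores activations, KV cache, and any layers kept at higher precision; the function names are illustrative, not from either release.

```python
def active_fraction(total_params: float, active_params: float) -> float:
    """Share of parameters used per token in a Mixture-of-Experts model."""
    return active_params / total_params


def weight_memory_gb(total_params: float, bits_per_weight: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return total_params * bits_per_weight / 8 / 1e9


# Parameter counts as reported in the summaries above.
m2_fraction = active_fraction(229e9, 10e9)   # MiniMax-M2
k2_fraction = active_fraction(1e12, 32e9)    # Kimi K2 Thinking
k2_int4_gb = weight_memory_gb(1e12, 4)       # assumption: every weight stored as INT4
k2_16bit_gb = weight_memory_gb(1e12, 16)     # 16-bit baseline for comparison

print(f"M2 active share: {m2_fraction:.1%}")
print(f"K2 active share: {k2_fraction:.1%}")
print(f"K2 weights: ~{k2_int4_gb:.0f} GB at INT4 vs ~{k2_16bit_gb:.0f} GB at 16-bit")
```

Both models touch only a few percent of their weights per token, which is where the latency and cost savings come from; the full weight set still has to be stored somewhere.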

Udio Says Users Can Download AI Songs for 48 Hours After Backlash to UMG Legal Settlement

Universal Music Group (UMG) settled its copyright lawsuit with AI‑music startup Udio and signed “industry‑first” licensing agreements to power a new AI music platform. The deal includes compensatory payments to UMG and new revenue opportunities for artists and songwriters, with opt‑in controls for different parts of the service. Udio plans to relaunch next year as a subscription platform that lets users customize, stream, and share music within a “walled garden,” strengthened by audio fingerprinting; pricing remains undisclosed.

Udio’s current text‑to‑music generator (known for “BBL Drizzy”) will remain available during the transition, but distribution will be restricted under the new model. Backlash to download restrictions after the UMG deal prompted Udio to reopen downloads for a 48‑hour window starting Nov. 3 so users can export existing songs under prior terms, including commercial rights and creator ownership (with attribution required for free‑tier users).

The settlement ends UMG’s claims that Udio trained on copyrighted catalogs “en masse” and sets up a licensed, closed ecosystem for future AI music creation and monetization.

Microsoft to ship 60,000 Nvidia AI chips to UAE under US-approved deal

Microsoft will spend $15.2 billion in the UAE over four years, backed by U.S.-approved exports of Nvidia’s most advanced AI chips and a large cloud buildout. The Commerce Department issued licenses beginning in September with strict cybersecurity and national security safeguards, enabling shipments of more than 60,000 Nvidia GPUs, including A100, H100, H200, and next‑gen GB300 Grace Blackwell chips.

Microsoft says it has already amassed the equivalent of 21,500 A100‑class GPUs in the UAE to serve models from OpenAI, Anthropic, open‑source providers, and its own stack. The outlay includes a $1.5 billion equity stake in G42, over $4.6 billion in data center capex through 2025, and a further $7.9 billion from 2026–2029, with $5.5 billion for ongoing AI and cloud expansion.

The deal turns the UAE into a test case for U.S. AI export‑control diplomacy and a regional anchor for American AI influence, despite criticism of potential back‑channel risks relative to China restrictions. It also appears to contradict President Trump’s televised comments that the most advanced Nvidia chips would not be exported, though the UAE licenses carry “gold standard” security conditions and tie into the UAE’s pledge to invest $1.4 trillion in U.S. energy and AI‑related projects.

TPU v7, Google’s answer to Nvidia’s Blackwell, is nearly here. Google’s next‑gen TPU reportedly matches Nvidia’s latest chips in raw FP8 throughput and memory bandwidth while enabling much larger pod‑scale deployments via a 3D torus plus optical switching fabric that trades low‑hop switch topologies for extreme scalability.
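The hop-count trade-off of a 3D torus can be illustrated with a small sketch. The pod shape below is a hypothetical example, not a figure from the article, and the model ignores the optical switching layer entirely.

```python
def torus_hops(a, b, dims):
    """Minimum hops between nodes a and b in a torus with wraparound links."""
    hops = 0
    for x, y, size in zip(a, b, dims):
        d = abs(x - y)
        hops += min(d, size - d)  # take the shorter way around each ring
    return hops


def torus_diameter(dims):
    """Worst-case hop count between any two nodes."""
    return sum(size // 2 for size in dims)


dims = (16, 16, 16)  # hypothetical pod shape (4,096 nodes)
example_hops = torus_hops((0, 0, 0), (15, 8, 1), dims)
diameter = torus_diameter(dims)
print(example_hops, diameter)
```

Each node needs only six links no matter how large the torus grows, but the worst-case hop count rises with every added dimension length, which is the trade the summary describes.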

Alibaba-backed Moonshot releases its second AI update in four months as China’s AI race heats up. The updated Kimi K2 Thinking reportedly cost about $4.6 million to train and can autonomously select hundreds of tools to complete tasks, aiming to reduce human intervention.

GitHub is launching a hub for multiple AI coding agents. Copilot subscribers will get a dashboard to run, manage, and compare multiple third‑party coding agents (including Codex, Claude, Jules, xAI, and Devin), alongside features like Plan Mode in VS Code and automated code‑review tooling.

Cursor 2.0 shifts to in-house AI with Composer model and parallel agents. The update swaps in Composer, an in‑house coding model optimized for codebase‑wide search and low latency, and adds an interface that runs up to eight isolated parallel agents with browser integration, sandboxed terminals, and enterprise controls.

Microsoft AI’s first in-house image generator MAI-Image-1 is now available. Microsoft says the model produces faster photorealistic and artistically lit images—especially of food, nature, and landscapes—and has been rolled into Bing Image Creator and Copilot Audio Expressions, with EU availability coming soon.

Google’s AI Mode gets new agentic capabilities to help book event tickets and beauty appointments. The feature can autonomously search multiple sites in real time to find and link you to tickets and beauty or wellness appointments that match specific preferences and constraints.

Canva launches its own design model, adds new AI features to the platform. A new generative design model creates editable, multi‑layered files across formats (from social posts to websites), powers an always‑available assistant that can be @mentioned in projects, and integrates spreadsheets, mini‑app widgets, Affinity tools, and ad analytics into Canva’s workflow.

Sora for Android saw nearly half a million installs on its first day. Appfigures estimates about 470,000 first‑day downloads across several markets (roughly 296,000 in the U.S.), far exceeding early iOS day‑one numbers after OpenAI expanded availability and dropped invites.

Instacart Debuts White-Label AI Shopping Chatbot in Enterprise Push. The assistant—tested on Sprouts’ site and available in Kroger’s iPhone app—provides product recommendations as part of Instacart’s push to sell white‑label e‑commerce AI tools to grocery chains.

Nvidia becomes first public company worth $5 trillion. Investors rallied on expectations of massive AI chip sales, new U.S. supercomputer deals, and strategic investments (including a $1B stake in Nokia and a pledge to invest up to $100B in OpenAI), pushing Nvidia’s stock up more than 50% this year and keeping its GPUs scarce and highly sought for data‑center AI workloads.

Coke’s New AI-Generated Ad Required 100 Staff and 70,000 AI-Generated Clips, and It Still Looks Like Garbage. Despite using over 70,000 AI‑generated clips and around 100 staff, the holiday spot largely avoids human faces, leans on uncanny animal‑filled hyperreal landscapes, and has been widely criticized for disjointed, low‑quality visuals.

Amazon launches AI infrastructure project, to power Anthropic’s Claude model. The tech giant started Project Rainier last year to build an AI compute cluster spread across multiple data centers in the U.S. The cluster incorporates nearly half a million of Amazon’s in-house Trainium2 chips.

Lambda inks multibillion-dollar AI infrastructure deal with Microsoft. Microsoft will add tens of thousands of Nvidia GPUs, including GB300 NVL72 systems, to expand its AI compute capacity.

Apple Nears $1 Billion-a-Year Deal to Use Google AI for Siri. Apple would pay roughly $1 billion annually for access to Google’s 1.2 trillion‑parameter Gemini model to power a planned Siri overhaul.

Google partners with Ambani’s Reliance to offer free AI Pro access to millions of Jio users in India. Eligible Jio subscribers get 18 months of free access to Google’s Gemini 2.5 Pro, expanded AI image/video and Notebook LM usage, 2 TB of cloud storage, and deeper Google Cloud TPU and Gemini Enterprise integration across Reliance’s businesses.

China’s Baidu says weekly robotaxi rides hit 250,000 — same as Alphabet’s Waymo this spring. Baidu reports its Apollo Go service delivered 250,000 fully driverless paid rides per week (totaling 17 million orders and 240 million kilometers) across multiple Chinese cities and several international markets.

Waymo’s robotaxis are coming to three new cities. The company will seek approvals in Nevada and Michigan before launching, plans to add China‑made Zeekr RTs with sixth‑generation driverless tech, and expects to start serving riders in those cities next year.

Driverless Tech Firm Pony AI Raises $863 Million in HK Listing. The offering included a 15% overallotment and drew investor interest, including reported talks of a roughly $100 million participation by Uber; proceeds target scaling Level 4 robotaxi/robotruck services and R&D as Pony AI aims for profitability by 2028–29.

Shopify says AI traffic is up 7x since January, AI-driven orders are up 11x. Partnerships with OpenAI, Perplexity, and Microsoft Copilot—plus internal tools like Scout—are driving rapid growth in AI‑driven traffic and orders by tapping merchant data to embed shopping into AI conversations and guide product decisions.

People Inc. forges AI licensing deal with Microsoft as Google traffic drops. As Google search traffic declines, People Inc. becomes a launch partner in Microsoft’s publisher content marketplace—a pay‑per‑use model where buyers like Copilot can directly compensate publishers for content, amid efforts to block AI crawlers to force licensing talks.

Inception raises $50 million to build diffusion models for code and text. The startup plans diffusion‑based models that refine outputs in parallel rather than sequentially—aiming for faster, more efficient code and large‑text tasks—with its new Mercury model funded by a $50M seed led by Menlo Ventures.

Amazon Sues to Stop Perplexity From Using AI Tool to Buy Stuff. Amazon alleges Perplexity’s Comet agent places orders on behalf of users without properly identifying itself, violating Amazon’s terms and prompting a federal lawsuit accusing the startup of computer fraud.

Scaling Latent Reasoning via Looped Language Models. The authors show that recursively reusing weight‑tied layers with an entropy‑regularized adaptive early‑exit mechanism yields 2–3× parameter‑efficiency gains at scale.

Beyond Multi-Token Prediction: Pretraining LLMs with Future Summaries. Training an auxiliary head to predict a compact learned summary of a long future window—rather than multiple independent future tokens—improves long‑range reasoning and yields up to ~5% gains on math and coding benchmarks at the 8B scale.

Supervised Reinforcement Learning: From Expert Trajectories to Step-wise Reasoning. This method trains models to produce an internal monologue and take discrete intermediate “actions,” providing a dense, similarity‑based reward at each step by comparing predicted actions to decomposed expert actions.
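A dense, similarity-based step reward can be sketched in a few lines. This is an illustration only: the use of Python's difflib.SequenceMatcher as the similarity measure, the function names, and the toy trajectories are all assumptions, not details taken from the summary.

```python
from difflib import SequenceMatcher


def step_reward(predicted: str, expert: str) -> float:
    """Similarity in [0, 1] between a predicted action and the expert action."""
    return SequenceMatcher(None, predicted, expert).ratio()


def trajectory_rewards(predicted_steps, expert_steps):
    """One reward per step, so feedback arrives at every step, not only at the end."""
    return [step_reward(p, e) for p, e in zip(predicted_steps, expert_steps)]


# Toy expert trajectory decomposed into actions, and a model's attempt at each step.
expert = ["factor x^2 - 1", "set each factor to zero", "x = 1 or x = -1"]
predicted = ["factor x^2 - 1", "set factors to zero", "x = 2"]

rewards = trajectory_rewards(predicted, expert)
for step, r in enumerate(rewards, start=1):
    print(f"step {step}: reward {r:.2f}")
```

The exact first step earns full reward, the near miss earns partial credit, and the wrong final answer earns little, so the model gets a learning signal even when the overall solution fails.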

The End of Manual Decoding: Towards Truly End-to-End Language Models. Lightweight prediction heads augment transformers to dynamically set sampling parameters (like temperature and top‑p) at each step so the model controls decoding end‑to‑end, matching or exceeding expert‑tuned baselines and enabling natural‑language steering of sampling.
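To make the idea concrete, here is a minimal sketch of sampling where temperature and top-p change at every step. The predicted_decoding_params function is a hypothetical stand-in for the learned prediction heads, and the logits are made up; only the temperature and nucleus sampling steps themselves are standard.

```python
import math
import random


def sample_token(logits, temperature, top_p, rng):
    """Sample an index using temperature scaling followed by nucleus (top-p) filtering."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]  # subtract max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of highest-probability tokens whose mass reaches top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for i in order:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def predicted_decoding_params(step):
    """Hypothetical stand-in for the learned heads: returns (temperature, top_p)."""
    return (0.2, 0.5) if step == 0 else (1.0, 0.95)


rng = random.Random(0)
logits = [2.0, 1.0, 0.5, -1.0]  # made-up logits, reused at every step for brevity
tokens = [sample_token(logits, *predicted_decoding_params(t), rng) for t in range(3)]
print(tokens)
```

In a real model the heads would read the hidden state at each step, so the model could sample conservatively where precision matters and more freely elsewhere.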

Continuous Autoregressive Language Models. The approach replaces discrete next‑token prediction with next‑vector prediction by compressing K tokens into continuous vectors via an autoencoder and using a likelihood‑free generative head with new evaluation/sampling methods to reduce autoregressive steps and compute.

Defeating the Training-Inference Mismatch via FP16. Switching mixed‑precision from BF16 to FP16 during RL fine‑tuning reduces numerical rounding errors between training and inference engines, removing the need for importance‑sampling fixes, improving stability, and narrowing the deployment gap.
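The rounding difference between the two formats is easy to see directly. The sketch below uses only the standard library: FP16 rounding via the struct module's half-precision format, and BF16 emulated by rounding a float32 to its top 16 bits. The test value is arbitrary, and the BF16 helper is meant for ordinary finite values only.

```python
import struct


def round_fp16(x: float) -> float:
    """Round to IEEE half precision (10 mantissa bits) via the struct 'e' format."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_bf16(x: float) -> float:
    """Round to bfloat16 (7 mantissa bits) by rounding away float32's low 16 bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Round to nearest, ties to even, on the 16 bits being dropped.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


x = 0.1234567  # arbitrary value well inside FP16's range
err_fp16 = abs(round_fp16(x) - x)
err_bf16 = abs(round_bf16(x) - x)
print(f"fp16 error: {err_fp16:.2e}, bf16 error: {err_bf16:.2e}")
```

FP16 spends its bits on precision and BF16 on range, so FP16 rounds more finely but overflows above 65504, which is why it is usually paired with loss scaling.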

Kimi Linear: An Expressive, Efficient Attention Architecture. By combining Kimi Delta Attention and Multi‑Head Latent Attention, this hybrid linear attention mechanism boosts efficiency and often matches or exceeds full attention on several tasks.

Remote Labor Index: Measuring AI Automation of Remote Work. This index evaluates AI agents on real‑world freelance projects—comparing AI outputs to human deliverables via manual Elo‑style pairwise judgments—to quantify how much of remote, computer‑based work current models can automate (currently about 2.5%).
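For readers unfamiliar with Elo-style scoring, a minimal sketch of the standard update rule follows. The K-factor of 32 and the 400-point scale are the conventional chess defaults and are assumptions here; the summary does not describe the benchmark's exact procedure.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A is preferred over B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a, rating_b, a_wins: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one pairwise judgment."""
    expected_a = expected_score(rating_a, rating_b)
    score_a = 1.0 if a_wins else 0.0
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


# One judgment: a human deliverable is preferred over the AI output.
ai, human = 1000.0, 1000.0
ai, human = elo_update(ai, human, a_wins=False)
print(ai, human)
```

Repeating this over many judged pairs yields a rating gap that summarizes how often reviewers prefer one side's deliverables.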

arXiv Changes Rules After Getting Spammed With AI-Generated ‘Research’ Papers. arXiv says the change aims to curb low‑effort, AI‑generated submissions—mostly superficial reviews and position pieces lacking substantive discussion of open research problems—by banning such computer science review and position papers.

Studio Ghibli, Bandai Namco, Square Enix demand OpenAI stop using their content to train AI. The groups allege OpenAI used members’ copyrighted works as training data and in Sora 2 outputs without permission, and request the company stop using that content and formally address the copyright concerns.

Character.ai to ban teens from talking to its AI chatbots. Starting Nov. 25, under‑18s will be blocked from conversational chats and limited to generating non‑interactive content like videos amid lawsuits and scrutiny over harmful interactions and impersonation of real victims.

OpenAI Risks Billions as Court Weighs Privilege in Copyright Row. If plaintiffs gain access to internal Slack messages and attorney communications about OpenAI’s deletion of pirated book data, the company could face evidence‑spoliation sanctions and enhanced statutory damages that together may amount to billions.

Stability AI largely wins UK court battle against Getty Images over copyright and trademark. The ruling found Stability did not infringe Getty’s copyrights by training Stable Diffusion on scraped images, though the judge did find limited instances of trademark infringement where Getty watermarks appeared in generated images.

Xania Monet is the first AI-powered artist to debut on a Billboard airplay chart, but she likely won’t be the last. Her chart debut and multimillion‑dollar record deal spotlight growing commercial acceptance of AI‑created performers, even as musicians and industry figures voice ethical and labor concerns.

Jerome Powell says the AI hiring apocalypse is real: ‘Job creation is pretty close to zero’. He warned that firms are citing AI‑driven automation for layoffs and hiring freezes, leaving underlying job creation near zero despite ongoing GDP growth and heavy corporate AI investment.

The A.I.-Profits Drought and the Lessons of History. A Media Lab study and recent evidence suggest generative AI has boosted productivity mainly in narrow, customized use cases and personal “shadow” tools, while many firms face integration and sectoral limits that have constrained broad profit gains.
