MarkGao's Feed

The Origins of Stochasticity: Comprehensive Investigations on Uncertainty Quantification for Large Language Models

Recent advancements in Large Language Models (LLMs) have enabled sophisticated reasoning and content generation, yet their inherent stochasticity poses significant challenges for ensuring predictive credibility. While traditional uncertainty taxonomy paradigms, such as the dichotomy of aleatoric and epistemic uncertainties, provide conceptual foundations, they often fail to capture the multi-component and multi-stage nature of LLM generation and...

🤖AI Agents Product Hunt

Rosply

AI agent that controls your computer autonomously

Covered by 独立开发者的灵感周刊 (Indie Developer Inspiration Weekly) - Decohack

🎼Agent Orchestration arXiv

SPARC: A Multi-Agent System for Electrical Circuit Question Answering

Electrical circuit diagram QA tasks require complex mathematical reasoning, which remains challenging for multimodal LLMs. We present SPARC, a multi-agent system that answers questions over circuit diagrams by grounding reasoning in executable physics-based simulations. SPARC uses LLM agents to synthesize, execute, and analyze simulation programs, improving accuracy and reliability by design. It achieves 83% accuracy, with up to a 58% absolute i...

🛠️Developer Tools GitHub

Show HN: Shumai – open-source Frame.io alternative for creative work

Open-source platform for all your creative work. Source code: shumaiOne/shumai on GitHub.

Covers Get Docker

Discussed on Hacker News

📈Tech Trends Nature

Genetic technologies to enhance crop nutritional value under climate change

At present, more than 700 million people live with caloric hunger, and more than two billion suffer from micronutrient deficiencies, known as ‘hidden hunger’. From an agricultural viewpoint, three major objectives need to be worked towards simultaneously to achieve zero hunger (the United Nations Sustainable Development Goal 2): (1) enhanced yield; (2) higher vitamin and mineral density to sustain recommended daily intake (multi-biofortification); and (3) enhanced climate-change resilience. A...

Covered by Phys.org

🤖Multi-Agent Systems arXiv

BioInsight: Multi-Agent Orchestration for Interactive Biomedical Knowledge Discovery

Biomedical researchers increasingly use AI-generated analyses and reports to interpret protein-level signals, but static outputs are often insufficient for research decision-making, where users need to inspect evidence, assess uncertainty, compare mechanisms, and refine hypotheses. We present BioInsight, a multi-agent system that moves from static biomedical report generation to interactive, evidence-centered interface genera...

🔬Neurotech arXiv

BrainAgent: A Large Language Model-Driven Multi-Agent Framework for Autonomous Brain Signal Understanding

Brain-Computer Interfaces (BCIs) and brain signal understanding are pivotal for clinical health and next-generation interactions. Despite this significance, their widespread adoption in real-world scenarios remains restricted, primarily because current analytical paradigms lack sufficient agentic intelligence. First, existing methodologies impose prohibitive technical barriers, requiring extensive specialized expertise. Second, they remain inheren...

🤖Agentic Engineering arXiv

Calibration Is Not Control: Why LLM-Agent Oversight Needs Intervention

Runtime oversight for LLM agents is commonly framed as scalar risk prediction: estimate failure likelihood, confidence, or uncertainty, then intervene once the score crosses a threshold. We argue that this framing targets the wrong object for control. The relevant question is not how likely the agent is to fail if it continues, but whether an available intervention would improve the outcome. Two trajectory prefixes can have the same risk estimat...

🚀Startups TechCrunch

4 days left to save up to $190 on TechCrunch Founder Summit 2026

Four days left to save up to $190 on your pass to TechCrunch Founder Summit 2026, the ultimate founder bootcamp, before Early Bird rates end on June 26 at 11:59 p.m. PT.

🤖claude code prototyper.co

Show HN: Visual Workspace for Agents Based on Unix

Prototyper is the first visual workspace for your agents and your team. Give Claude Code, Codex, Cursor, and other agents a shared canvas for plans, apps, and diagrams.

Discussed on Hacker News

🤖artificial intelligence arXiv

On the Expressive Power of Weight Quantization in Large Language Models

In recent years, weight quantization, which encodes the learnable parameters of large language models in an n-bit format, has garnered significant attention due to its potential for model compression and inference acceleration. Many practical techniques have been developed; however, the theoretical understanding of many aspects, especially the approximation and degradation of expressive power as the number of quantization bits decreases, remain...

🦀openclaw arXiv

Local LLM Agents as Vulnerable Runtimes: A Source-Code Audit of the Agent Runtime Layer

Local LLM agents such as OpenClaw and Nanobot run on end-user machines and act on host resources (the shell, filesystem, browser, stored credentials, and messaging applications) through natural-language goals. These agents have become privileged software runtimes that mediate between user intent, model outputs, and host-level actions. Existing research characterizes the landscape through prompt injection, malicious skills, marketplace risks,...

⚛️Quantum Computing arXiv

Fine-Tuning Large Language Models for Quantum Reasoning

Large language models (LLMs) exhibit abilities beyond natural language modelling and text generation. Recent advances in their reasoning capabilities have spurred interest in applying LLMs to complex scientific tasks requiring deep domain expertise and sophisticated reasoning. Quantum computing, as a highly specialised field with significant knowledge barriers and hardware constraints, could greatly benefit from such advancements. However, a k...

🧠LLMs arXiv

Context Recycling for Long-Horizon LLM Inference

Large language models (LLMs) exhibit strong capabilities in short-context reasoning but degrade in performance over long conversational horizons due to context window limitations and inefficient token usage. We introduce ContextForge, a system for context recycling that maintains task-relevant information across turns by combining structured query generation, external memory retrieval, and controlled synthesis. The system enables efficient reu...

🤖AI Agents Product Hunt

Sidegent

Learn to build AI agents by actually building them

Covered by 独立开发者的灵感周刊 (Indie Developer Inspiration Weekly) - Decohack

📈Tech Trends TechCrunch

Databricks’ former AI chief thinks he can cut AI’s power bill by 1,000x

Un0 is an image-generation tool that shows, for the first time, how the company's technology can replicate conventional AI systems.

Covered by 3 sources including TNW | Artificial-Intelligence, easternherald.com

🤖Multi-Agent Systems arXiv

ASALT: Adaptive State Alignment for Lateral Transfer in Multi-agent Reinforcement Learning

Multi-agent reinforcement learning (MARL) addresses the problem of training multiple agents that pursue collaborative, competitive, or mixed objectives. Prior work has investigated transfer learning between source and target domains in MARL; however, the majority of existing approaches impose the constraint that the dimensionalities of the observation space and the global state space must be identical across domains. In this paper, we introduce ...

🛠️Developer Tools GitHub

Show HN: Nub – A Bun-like all-in-one toolkit for Node.js

The fast all-in-one Node.js toolkit. Source code: nubjs/nub on GitHub.

Covers 5 stories including Open Source Vulnerabilities

Covered by tldr.tech

Discussed on Hacker News

🤖Agentic Engineering arXiv

Hypothesis-Driven Skill Optimization for LLM Agents

External skills can improve action-oriented LLM agents without changing model weights, but persistent skill updates are risky when they are distilled from sparse or noisy trajectories. A plausible reflection may encode a useful procedure, a spurious shortcut, or a rule that the target executor cannot reliably follow. We propose Hypothesis-Driven Skill Optimization (HDSO), a training-free framework in which both the skill curator and the agent execu...

🚀Startups TechCrunch

3 days left to save up to $190 on your TechCrunch Founder Summit 2026 pass

You have just 3 days left to save up to $190 on your pass to TechCrunch Founder Summit 2026 before Early Bird rates end on June 26 at 11:59 p.m. PT.