Automating error analysis for AI agents: what works and doesn't
Grammar Induction
Reversal Invariance in Autoregressive Language Models
arxiv.org · 18h
Grammar Induction
PROPEX-RAG: Enhanced GraphRAG using Prompt-Driven Prompt Execution
arxiv.org · 18h
Information Retrieval
Inferring multiple helper Dafny assertions with LLMs
arxiv.org · 18h
Rust Verification
EVTAR: End-to-End Try on with Additional Unpaired Visual Reference
arxiv.org · 18h
Learned Metrics
Automated Variant Prioritization via Multi-Modal Feature Fusion and Bayesian Network Inference
Copy Number Variants
Leakage-abuse Attack Against Substring-SSE with Partially Known Dataset
arxiv.org · 18h
Homomorphic Encryption
Confounding Factors in Relating Model Performance to Morphology
arxiv.org · 18h
Document Grammar
Temporal Fusion Transformer for Multi-Horizon Probabilistic Forecasting of Weekly Retail Sales
arxiv.org · 18h
Time Series
Ariadne: A Controllable Framework for Probing and Extending VLM Reasoning Boundaries
arxiv.org · 18h
Proof Tactics
R3GAN-based Optimal Strategy for Augmenting Small Medical Dataset
arxiv.org · 1d
Vector Dimensionality
Spatial Reasoning Unleashed: Causal Language Models for Smarter Spatial Data
Voronoi Diagrams
Bayesian Natural Gradient Fine-Tuning of CLIP Models via Kalman Filtering
arxiv.org · 18h
Quantization
Incremental Selection of Most-Filtering Conjectures and Proofs of the Selected Conjectures
arxiv.org · 18h
Performance Proofs
The Riddle of Reflection: Evaluating Reasoning and Self-Awareness in Multilingual LLMs using Indian Riddles
arxiv.org · 18h
Local LLMs
Adversarial Spatio-Temporal Attention Networks for Epileptic Seizure Forecasting
arxiv.org · 18h
Time Series
Terrain-Enhanced Resolution-aware Refinement Attention for Off-Road Segmentation
arxiv.org · 18h
Learned Metrics
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation
arxiv.org · 18h
Learned Metrics