A2ZAI

AI Research

Latest research papers from arXiv covering machine learning, computer vision, natural language processing, and more.


Reward-free Alignment for Conflicting Objectives

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where naive aggregation of preferences can lead to unstable training and poor trade-offs. In particular, w...

Peter Chen, Xiaopeng Li, Xi Chen
Feb 2, 2026
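The "direct alignment methods" this abstract builds on are typified by DPO. A minimal sketch of the standard single-objective DPO loss, for orientation only (this is not the paper's multi-objective method, and the log-probability values below are illustrative):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard single-pair DPO loss: -log sigmoid(beta * margin).

    logp_* are the policy's total log-probabilities of each response;
    ref_logp_* come from a frozen reference model.
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log sigmoid(x) == log(1 + exp(-x))
    return math.log1p(math.exp(-margin))

# A policy that prefers the chosen response incurs a lower loss
# than one that prefers the rejected response.
better = dpo_loss(-10.0, -14.0, -12.0, -12.0)
worse = dpo_loss(-14.0, -10.0, -12.0, -12.0)
```

Conflicts between objectives arise when several such preference signals (e.g., helpfulness vs. harmlessness) would pull `margin` in opposite directions for the same pair.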

MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training

Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings. Pre-training improves data-efficient generalisation by learning statistical priors across subjects, but these priors critically depend on context. While natural speech might unfold...

Dulhan Jayalath, Oiwi Parker Jones
Feb 2, 2026

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Pixel diffusion generates images directly in pixel space in an end-to-end manner, avoiding the artifacts and bottlenecks introduced by VAEs in two-stage latent diffusion. However, it is challenging to optimize high-dimensional pixel manifolds that contain many perceptually irrelevant signals, leavin...

Zehong Ma, Ruihan Xu, Shiliang Zhang
Feb 2, 2026

New explanations and inference for least angle regression

Efron et al. (2004) introduced least angle regression (LAR) as an algorithm for linear predictions, intended as an alternative to forward selection with connections to penalized regression. However, LAR has remained somewhat of a "black box," where some basic behavioral properties of LAR output are ...

Karl B. Gregory, Daniel J. Nordman
Feb 2, 2026

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

LLM-based deep research agents are largely built on the ReAct framework. This linear design makes it difficult to revisit earlier states, branch into alternative search directions, or maintain global awareness under long contexts, often leading to local optima, redundant exploration, and inefficient...

Jialiang Zhu, Gongrui Zhang, Xiaolong Ma
Feb 2, 2026

Expanding the Capabilities of Reinforcement Learning via Text Feedback

The success of RL for LLM post-training stems from an unreasonably uninformative source: a single bit of information per rollout as binary reward or preference label. At the other extreme, distillation offers dense supervision but requires demonstrations, which are costly and difficult to scale. We ...

Yuda Song, Lili Chen, Fahim Tajwar
Feb 2, 2026

Flow Policy Gradients for Robot Control

Likelihood-based policy gradient methods are the dominant approach for training robot control policies from rewards. These methods rely on differentiable action likelihoods, which constrain policy outputs to simple distributions like Gaussians. In this work, we show how flow matching policy gradient...

Brent Yi, Hongsuk Choi, Himanshu Gaurav Singh
Feb 2, 2026
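The "likelihood-based policy gradient" baseline this abstract contrasts with relies on a closed-form, differentiable action log-likelihood, which in practice pins the policy to a simple family such as a Gaussian. A toy REINFORCE loop with a Gaussian mean illustrates that baseline (not the paper's flow-matching method; task and constants are made up):

```python
import math
import random

def gaussian_log_likelihood(a, mu, sigma):
    """Closed-form log-density of action a under N(mu, sigma^2) --
    the differentiable likelihood that ties the policy to a Gaussian."""
    return (-0.5 * ((a - mu) / sigma) ** 2
            - math.log(sigma) - 0.5 * math.log(2 * math.pi))

def reinforce_step(mu, sigma, reward_fn, rng, lr=0.05, n=2048):
    """One REINFORCE update of the Gaussian mean.

    Uses the score function d/dmu log N(a; mu, sigma) = (a - mu) / sigma^2,
    i.e. the mu-derivative of gaussian_log_likelihood above.
    """
    actions = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]
    grad_mu = sum(reward_fn(a) * (a - mu) / sigma ** 2 for a in actions) / n
    return mu + lr * grad_mu

# Toy task: reward peaks at action 1.0, so the mean should drift toward it.
reward = lambda a: -(a - 1.0) ** 2
rng = random.Random(0)
mu = 0.0
for _ in range(200):
    mu = reinforce_step(mu, 0.5, reward, rng)
```

Replacing the Gaussian with a richer action distribution is exactly where the closed-form likelihood above stops being available, which is the gap the paper targets.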

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Large language models (LLMs) have demonstrated strong reasoning capabilities through step-by-step chain-of-thought (CoT) reasoning. Nevertheless, at the limits of model capability, CoT often proves insufficient, and its strictly sequential nature constrains test-time scalability. A potential alterna...

Xiao Liang, Zhong-Zhi Li, Zhenghao Lin
Feb 2, 2026

AgentRx: Diagnosing AI Agent Failures from Execution Trajectories

AI agents often fail in ways that are difficult to localize because executions are probabilistic, long-horizon, multi-agent, and mediated by noisy tool outputs. We address this gap by manually annotating failed agent runs and release a novel benchmark of 115 failed trajectories spanning structured A...

Shraddha Barke, Arnav Goyal, Alind Khare
Feb 2, 2026

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory, making them rigid under diverse interaction patterns and inefficient on long...

Haozhen Zhang, Quanyu Long, Jianzhu Bao
Feb 2, 2026

HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos

Enabling humanoid robots to perform agile and adaptive interactive tasks has long been a core challenge in robotics. Current approaches are bottlenecked by either the scarcity of realistic interaction data or the need for meticulous, task-specific reward engineering, which limits their scalability. ...

Yinhuai Wang, Qihan Zhao, Yuen Fui Lau
Feb 2, 2026

Breaking the Reversal Curse in Autoregressive Language Models via Identity Bridge

Autoregressive large language models (LLMs) have achieved remarkable success in many complex tasks, yet they can still fail in very simple logical reasoning such as the "reversal curse" -- when trained on forward knowledge data of the form "$A \rightarrow B$" (e.g., Alice's husband is Bob), the mode...

Xutao Ma, Yixiao Huang, Hanlin Zhu
Feb 2, 2026
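The data asymmetry behind the reversal curse can be seen even in a crude next-token counting model: forward-only training gives evidence for "A → B" but none for the reverse query. A toy sketch (an analogy only; the paper's Identity Bridge method is not reproduced here):

```python
from collections import Counter, defaultdict

# Toy "language model": bigram counts over whitespace tokens.
counts = defaultdict(Counter)

def train(sentences):
    for s in sentences:
        toks = s.split()
        for prev, nxt in zip(toks, toks[1:]):
            counts[prev][nxt] += 1

def predict(prev):
    """Most frequent next token after `prev`, or None if unseen."""
    c = counts[prev]
    return c.most_common(1)[0][0] if c else None

# Forward-only training data, as in the abstract's "Alice's husband is Bob".
train(["alice 's husband is bob"])

predict("is")   # -> "bob": the forward direction is answerable
predict("bob")  # -> None: no evidence exists for any reverse query
```

Real LLMs fail in a subtler way, since gradient training on the forward statement still does not create the reverse association, but the missing-evidence picture is the same.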

Data from arXiv.org • Updated hourly