AI Research

Latest research papers from arXiv covering machine learning, computer vision, natural language processing, and more.

Reward-free Alignment for Conflicting Objectives

Direct alignment methods are increasingly used to align large language models (LLMs) with human preferences. However, many real-world alignment problems involve multiple conflicting objectives, where naive aggregation of preferences can lead to unstable training and poor trade-offs. In particular, w...

Peter Chen, Xiaopeng Li, Xi Chen

Feb 2, 2026

MEG-XL: Data-Efficient Brain-to-Text via Long-Context Pre-Training

Clinical brain-to-text interfaces are designed for paralysed patients who cannot provide extensive training recordings. Pre-training improves data-efficient generalisation by learning statistical priors across subjects, but these priors critically depend on context. While natural speech might unfold...

Dulhan Jayalath, Oiwi Parker Jones

Feb 2, 2026

PixelGen: Pixel Diffusion Beats Latent Diffusion with Perceptual Loss

Pixel diffusion generates images directly in pixel space in an end-to-end manner, avoiding the artifacts and bottlenecks introduced by VAEs in two-stage latent diffusion. However, it is challenging to optimize high-dimensional pixel manifolds that contain many perceptually irrelevant signals, leavin...

Zehong Ma, Ruihan Xu, Shiliang Zhang

Feb 2, 2026

New explanations and inference for least angle regression

Efron et al. (2004) introduced least angle regression (LAR) as an algorithm for linear predictions, intended as an alternative to forward selection with connections to penalized regression. However, LAR has remained somewhat of a "black box," where some basic behavioral properties of LAR output are ...

Karl B. Gregory, Daniel J. Nordman

Feb 2, 2026

RLAnything: Forge Environment, Policy, and Reward Model in Completely Dynamic RL System

We propose RLAnything, a reinforcement learning framework that dynamically forges environment, policy, and reward models through closed-loop optimization, amplifying learning signals and strengthening the overall RL system for any LLM or agentic scenarios. Specifically, the policy is trained with in...

Yinjie Wang, Tianbao Xie, Ke Shen

Feb 2, 2026

RE-TRAC: REcursive TRAjectory Compression for Deep Search Agents

LLM-based deep research agents are largely built on the ReAct framework. This linear design makes it difficult to revisit earlier states, branch into alternative search directions, or maintain global awareness under long contexts, often leading to local optima, redundant exploration, and inefficient...

Jialiang Zhu, Gongrui Zhang, Xiaolong Ma

Feb 2, 2026

Expanding the Capabilities of Reinforcement Learning via Text Feedback

The success of RL for LLM post-training stems from an unreasonably uninformative source: a single bit of information per rollout as binary reward or preference label. At the other extreme, distillation offers dense supervision but requires demonstrations, which are costly and difficult to scale. We ...

Yuda Song, Lili Chen, Fahim Tajwar

Feb 2, 2026

Flow Policy Gradients for Robot Control

Likelihood-based policy gradient methods are the dominant approach for training robot control policies from rewards. These methods rely on differentiable action likelihoods, which constrain policy outputs to simple distributions like Gaussians. In this work, we show how flow matching policy gradient...

Brent Yi, Hongsuk Choi, Himanshu Gaurav Singh

Feb 2, 2026

Training LLMs for Divide-and-Conquer Reasoning Elevates Test-Time Scalability

Large language models (LLMs) have demonstrated strong reasoning capabilities through step-by-step chain-of-thought (CoT) reasoning. Nevertheless, at the limits of model capability, CoT often proves insufficient, and its strictly sequential nature constrains test-time scalability. A potential alterna...

Xiao Liang, Zhong-Zhi Li, Zhenghao Lin

Feb 2, 2026

AgentRx: Diagnosing AI Agent Failures from Execution Trajectories

AI agents often fail in ways that are difficult to localize because executions are probabilistic, long-horizon, multi-agent, and mediated by noisy tool outputs. We address this gap by manually annotating failed agent runs and releasing a novel benchmark of 115 failed trajectories spanning structured A...

Shraddha Barke, Arnav Goyal, Alind Khare

Feb 2, 2026

MemSkill: Learning and Evolving Memory Skills for Self-Evolving Agents

Most Large Language Model (LLM) agent memory systems rely on a small set of static, hand-designed operations for extracting memory. These fixed procedures hard-code human priors about what to store and how to revise memory, making them rigid under diverse interaction patterns and inefficient on long...

Haozhen Zhang, Quanyu Long, Jianzhu Bao

Feb 2, 2026

HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos

Enabling humanoid robots to perform agile and adaptive interactive tasks has long been a core challenge in robotics. Current approaches are bottlenecked by either the scarcity of realistic interaction data or the need for meticulous, task-specific reward engineering, which limits their scalability. ...

Yinhuai Wang, Qihan Zhao, Yuen Fui Lau

Feb 2, 2026

SPARKLING: Balancing Signal Preservation and Symmetry Breaking for Width-Progressive Learning

Progressive Learning (PL) reduces pre-training computational overhead by gradually increasing model scale. While prior work has extensively explored depth expansion, width expansion remains significantly understudied, with the few existing methods limited to the early stages of training. However, ex...

Qifan Yu, Xinyu Ma, Zhijian Zhuo

Feb 2, 2026

Multi-head automated segmentation by incorporating detection head into the contextual layer neural network

Deep learning based auto segmentation is increasingly used in radiotherapy, but conventional models often produce anatomically implausible false positives, or hallucinations, in slices lacking target structures. We propose a gated multi-head Transformer architecture based on Swin U-Net, augmented wi...

Edwin Kys, Febian Febian

Feb 2, 2026

Breaking the Reversal Curse in Autoregressive Language Models via Identity Bridge

Autoregressive large language models (LLMs) have achieved remarkable success in many complex tasks, yet they can still fail in very simple logical reasoning such as the "reversal curse" -- when trained on forward knowledge data of the form "$A \rightarrow B$" (e.g., Alice's husband is Bob), the mode...

Xutao Ma, Yixiao Huang, Hanlin Zhu

Feb 2, 2026

Data from arXiv.org • Updated hourly