AI Research

Latest research papers from arXiv covering machine learning, computer vision, natural language processing, and more.

Generation Models Know Space: Unleashing Implicit 3D Priors for Scene Understanding

While Multimodal Large Language Models demonstrate impressive semantic capabilities, they often suffer from spatial blindness, struggling with fine-grained geometric reasoning and physical dynamics. Existing solutions typically rely on explicit 3D modalities or complex geometric scaffolding, which a...

Xianjin Wu, Dingkang Liang, Tianrui Feng
Mar 19, 2026

Matryoshka Gaussian Splatting

The ability to render scenes at adjustable fidelity from a single model, known as level of detail (LoD), is crucial for practical deployment of 3D Gaussian Splatting (3DGS). Existing discrete LoD methods expose only a limited set of operating points, while concurrent continuous LoD approaches enable...

Zhilin Guo, Boqiao Zhang, Hakan Aktas
Mar 19, 2026

MonoArt: Progressive Structural Reasoning for Monocular Articulated 3D Reconstruction

Reconstructing articulated 3D objects from a single image requires jointly inferring object geometry, part structure, and motion parameters from limited visual evidence. A key difficulty lies in the entanglement between motion cues and object structure, which makes direct articulation regression uns...

Haitian Li, Haozhe Xie, Junxiang Xu
Mar 19, 2026

NavTrust: Benchmarking Trustworthiness for Embodied Navigation

There are two major categories of embodied navigation: Vision-Language Navigation (VLN), where agents navigate by following natural language instructions; and Object-Goal Navigation (OGN), where agents navigate to a specified target object. However, existing work primarily evaluates model performanc...

Huaide Jiang, Yash Chaudhary, Yuping Wang
Mar 19, 2026

Under One Sun: Multi-Object Generative Perception of Materials and Illumination

We introduce Multi-Object Generative Perception (MultiGP), a generative inverse rendering method for stochastic sampling of all radiometric constituents -- reflectance, texture, and illumination -- underlying object appearance from a single image. Our key idea to solve this inherently ambiguous radi...

Nobuo Yoshii, Xinran Nicole Han, Ryo Kawahara
Mar 19, 2026

FinTradeBench: A Financial Reasoning Benchmark for LLMs

Real-world financial decision-making is a challenging problem that requires reasoning over heterogeneous signals, including company fundamentals derived from regulatory filings and trading signals computed from price dynamics. Recently, with the advancement of Large Language Models (LLMs), financial...

Yogesh Agrawal, Aniruddha Dutta, Md Mahadi Hasan
Mar 19, 2026

F2LLM-v2: Inclusive, Performant, and Efficient Embeddings for a Multilingual World

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2 supports more than 200 languages, with a particular emphasis on p...

Ziyin Zhang, Zihan Liao, Hang Yu
Mar 19, 2026

Spectrally-Guided Diffusion Noise Schedules

Denoising diffusion models are widely used for high-quality image and video generation. Their performance depends on noise schedules, which define the distribution of noise levels applied during training and the sequence of noise levels traversed during sampling. Noise schedules are typically handcr...

Carlos Esteves, Ameesh Makadia
Mar 19, 2026

Online Learning and Equilibrium Computation with Ranking Feedback

Online learning in arbitrary, and possibly adversarial, environments has been extensively studied in sequential decision-making, and it is closely connected to equilibrium computation in game theory. Most existing online learning algorithms rely on numeric utility feedback from the environmen...

Mingyang Liu, Yongshan Chen, Zhiyuan Fan
Mar 19, 2026

Data from arXiv.org • Updated hourly