AI Research

Latest research papers from arXiv covering machine learning, computer vision, natural language processing, and more.

JanusMesh: Fast and Zero-Shot 3D Visual Illusion Generation via Cross-Space Denoising

Creating 3D visual illusions, a single 3D mesh that reveals entirely different semantics from various viewing angles, is a fascinating but tough challenge. Existing optimization-based methods are slow and can produce oversaturated colors. In contrast, naive stitching approaches fail to produce geome...

Siang-Ling Zhang, Huai-Hsun Cheng, Tsung-Ju Yang
Jun 18, 2026

How Transparent is DiffusionGemma?

LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous latent space; does this make its reasoning less t...

Joshua Engels, Callum McDougall, Bilal Chughtai
Jun 18, 2026

UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning

Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single model cannot capture the full richness of human action. We argue that a truly expressive egocentric representation must subsume complementary knowledge ...

Wenhao Chi, Arkaprava Sinha, Dominick Reilly
Jun 18, 2026

Optimal Deterministic Multicalibration and Omniprediction

A model is multicalibrated on a collection of group weights $G$ if it is calibrated -- i.e. unbiased even conditional on its prediction -- not just overall, but also after reweighting contexts by each $g \in G$. It is a useful property for many downstream applications and is a basic desideratum of t...

Georgy Noarov, Aaron Roth
Jun 18, 2026

Thinking in Boxes: 3D Editing in Real Images Made Easy

Text and 2D-conditioning interfaces provide weak, ambiguous control over spatial transformations in image editing -- particularly under large object motions and camera changes. Prior work has used 3D primitives such as boxes, but only as loose conditioning signals indicating approximate object locat...

Pradhaan S Bhat, Naveen Chandra R, Rishubh Parihar
Jun 18, 2026

The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups

We place the attention token on the group: a token is an element $g_i$ of a matrix Lie group $G$ -- a bare transformation, with no feature payload and no external action $ρ(g)$ carrying it. To our knowledge this is the first attention construction whose tokens are bare matrix Lie group elements: the...

Przemyslaw Musialski
Jun 18, 2026

Predictability as a Fine-Grained Measure for Privacy

Differential privacy (DP) ensures rigorous individual-level privacy guarantees against even the most knowledgeable attackers, but its worst-case nature can impose a costly privacy-accuracy tradeoff. We introduce privacy via predictability, a fine-grained framework that explicitly incorporates the at...

Linda Lu, Karthik Sridharan
Jun 18, 2026

Current World Models Lack a Persistent State Core

World models are increasingly regarded as a decisive step toward artificial general intelligence, yet modeling the physical world demands more than rendering convincing frames on demand: it requires an internal world state that keeps evolving over time, decoupled from observation, so that objects en...

Jinpeng Lu, Dexu Zhu, Haoyuan Shi
Jun 18, 2026

Toward Calibrated Mixture-of-Experts Under Distribution Shift

Calibration aligns a model's predictive uncertainty with the frequencies of its empirical outcomes and is important for understanding and trusting reported probabilities. Recent work shows that enforcing calibration at the level of individual predictors can improve ensemble accuracy and calibration,...

Gina Wong, Drew Prinster, Suchi Saria
Jun 18, 2026

SSD: Spatially Speculative Decoding Accelerates Autoregressive Image Generation

Autoregressive models excel in visual generation by treating images as 1D sequences of discrete tokens, mirroring language modeling. However, this flattening discards the intrinsic 2D spatial locality of visual signals, creating severe computational bottlenecks during inference. We introduce Spatial...

Shilong Xiang, Zirui Zhang, Lijun Yu
Jun 18, 2026

Multi-Task Bayesian In-Context Learning

Bayesian predictive inference provides a principled framework for uncertainty quantification, data efficiency, and robust generalization. However, exact inference is often intractable, and scalable approximations may remain computationally expensive or require restrictive modeling assumptions that d...

Qingyang Zhu, Eric Karl Oermann, Kyunghyun Cho
Jun 18, 2026

Data from arXiv.org • Updated hourly