Thursday, January 23, 2025
Today's top trending papers
Can foundation models actively gather information in interactive environments to test hypotheses?
Ke, Nan Rosemary | Sawyer, Danny P. | Soyer, Hubert | Engelcke, Martin | Reichert, David P | Hudson, Drew A. | Reid, John | Lerchner, Alexander | Rezende, Danilo Jimenez | Lillicrap, Timothy P | Mozer, Michael | Wang, Jane X
The study investigates the ability of foundation models to actively gather information to test hypotheses in interactive environments. Results show that while these models perform well in identifying single rewarding features, their performance decreases when identifying conjunctions of rewarding features, with potential improvements seen through self-correction techniques.
How social media reacted: The paper appears to have gained some initial attention within academic and AI research circles on Twitter, with at least two accounts sharing links to the abstract. However, based on the limited data provided, there's no evidence of widespread discussion or commentary about the paper's findings on social media. The reaction seems to be primarily one of information sharing rather than active engagement or critique at this stage.
ProcessBench: Identifying Process Errors in Mathematical Reasoning
Zheng, Chujie | Zhang, Zhenru | Zhang, Beichen | Lin, Runji | Lu, Keming | Yu, Bowen | Liu, Dayiheng | Zhou, Jingren | Lin, Junyang
The paper introduces ProcessBench, a benchmark for measuring how well models identify erroneous steps in mathematical reasoning, built from test cases with annotated error locations. Evaluation shows that existing process reward models struggle with more challenging math problems, while general language models prompted to critique each step perform better. The authors hope ProcessBench will inspire further research on assessing reasoning processes and overseeing language models effectively.
How social media reacted: The social media reaction to the ProcessBench paper appears limited but generally positive. The paper's author, Chujie Zheng, shared the work with enthusiasm, highlighting 'intriguing observations.' The paper is being circulated through AI-focused Twitter accounts that regularly share new research, indicating interest within the AI research community. However, the available tweets lack detailed commentary or broader discussion from the community, suggesting the paper's impact may still be developing or that more substantive reactions may be occurring on other platforms or in academic circles not captured in these tweets.
Facade: High-Precision Insider Threat Detection Using Deep Contextual Anomaly Detection
Kantchelian, Alex | Neo, Casper | Stevens, Ryan | Kim, Hyungwon | Fu, Zhaohao | Momeni, Sadegh | Huber, Birkett | Bursztein, Elie | Pavlidis, Yanis | Buthpitiya, Senaka | Cochran, Martin | Poletto, Massimiliano
Facade is a high-precision deep-learning-based anomaly detection system deployed at Google since 2018, designed to detect insider threats by considering the contextual information surrounding each action. It achieves a false positive rate as low as 0.0003% for single rogue actions, which makes it practical for securing large corporate environments.
Normalizing Flows are Capable Generative Models
Zhai, Shuangfei | Zhang, Ruixiang | Nakkiran, Preetum | Berthelot, David | Gu, Jiatao | Zheng, Huangjie | Chen, Tianrong | Bautista, Miguel Angel | Jaitly, Navdeep | Susskind, Josh
How social media reacted: The initial social media reaction to the paper "Normalizing Flows are Capable Generative Models" is positive and focused on spreading awareness. The paper's release was announced by one of the authors, highlighting both the paper and associated code. AI/ML news bots quickly picked up and shared the paper, summarizing its key technical contributions. These include the use of Transformers with Masked Autoregressive Flows, new benchmarks in likelihood estimation, and quality comparable to diffusion models. The rapid sharing by multiple ML-focused accounts suggests the paper is viewed as significant within the field. However, at this early stage, there is not yet visible in-depth discussion or critical commentary on social media.
Flow Matching Guide and Code
Lipman, Yaron | Havasi, Marton | Holderrieth, Peter | Shaul, Neta | Le, Matt | Karrer, Brian | Chen, Ricky T. Q. | Lopez-Paz, David | Ben-Hamu, Heli | Gat, Itai
The Flow Matching Guide and Code provides an in-depth overview of the Flow Matching framework, which has excelled in generative modeling across different domains. It covers the mathematical principles, design options, and expansions of FM, offering a PyTorch package with practical examples to assist researchers at all levels in comprehending, implementing, and advancing FM.
How social media reacted: The social media reaction to the Flow Matching Guide and Code paper is predominantly positive, with significant engagement from the paper's authors. They are actively promoting the comprehensive guide, the accompanying codebase, and an upcoming tutorial at NeurIPS 2024. There's particular interest in the visual content of the paper, with one user sharing and appreciating the figures. The reaction highlights the paper's aim to serve both novice and experienced researchers in understanding and applying Flow Matching across various domains.
Advancing Extended Reality with 3D Gaussian Splatting: Innovations and Prospects
Qiu, Shi | Xie, Binzhu | Liu, Qixuan | Heng, Pheng-Ann
The paper explores the potential of 3D Gaussian Splatting (3DGS) to enhance Extended Reality (XR) applications by reviewing existing research and proposing future directions. It highlights the underexplored opportunities for leveraging 3DGS innovations to advance XR technology and provides a roadmap for integrating cutting-edge 3DGS techniques into XR development.
Reputation Management in the ChatGPT Era
Edwards, Lilian | Binns, Reuben
This paper discusses the challenges of reputation management in the era of Generative AI, highlighting the potential reputational and privacy harms caused by AI-generated content about individuals. It explores the legal tools available, such as libel and data protection laws, and suggests that while these laws offer some protection, a more systemic approach may be needed to safeguard individuals against AI-generated content.
On Zarankiewicz's Problem for Intersection Hypergraphs of Geometric Objects
Chan, Timothy M. | Keller, Chaya | Smorodinsky, Shakhar
The paper explores Zarankiewicz's problem for intersection hypergraphs of geometric objects, providing sharp bounds for families of axis-parallel boxes and pseudo-discs in $\mathbb{R}^d$. The results improve upon previous bounds and utilize a combination of combinatorial and geometric techniques, including shallow cuttings and biclique covers.
A Comparative Study of Learning Paradigms in Large Language Models via Intrinsic Dimension
Janapati, Saahith | Ji, Yangfeng
This study compares the effects of supervised fine-tuning (SFT) and in-context learning (ICL) on Large Language Models (LLMs) using Intrinsic Dimension (ID) to estimate the complexity of hidden representations. The research shows that ICL consistently leads to higher intrinsic dimensionality in LLM representations compared to SFT, indicating that ICL generates representations in higher dimensional manifolds within the embedding space.
SafeWorld: Geo-Diverse Safety Alignment
Yin, Da | Qiu, Haoyi | Huang, Kung-Hsiang | Chang, Kai-Wei | Peng, Nanyun
The paper introduces SafeWorld, a benchmark that evaluates Large Language Models' ability to generate culturally sensitive and legally compliant responses across diverse global contexts. The study finds that current models struggle to meet these criteria and applies Direct Preference Optimization (DPO) training to improve alignment with geo-diverse safety standards, resulting in improved performance compared to existing models.
How many continuous measurements are needed to learn a vector?
Krieg, David | Novak, Erich | Ullrich, Mario
The study shows that vectors in $\mathbb{R}^m$ can be accurately recovered with just $\lceil \log_2(m+1)\rceil +1$ adaptive continuous measurements. The findings have implications for infinite-dimensional approximation tasks and are discussed in detail.
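To give a sense of scale, the count quoted in the summary can be evaluated for a few dimensions. This is plain arithmetic on the formula above, not a recovery algorithm; the function name is illustrative.

```python
def adaptive_measurements(m: int) -> int:
    """Evaluate ceil(log2(m + 1)) + 1 for a vector in R^m."""
    if m < 1:
        raise ValueError("m must be a positive integer")
    # For m >= 1, m.bit_length() equals ceil(log2(m + 1)) exactly,
    # which avoids floating-point rounding for large m.
    return m.bit_length() + 1


for m in (1, 3, 1000, 10**6):
    print(m, adaptive_measurements(m))
```

Under this formula, a vector with a million entries corresponds to 21 measurements, since the count grows only logarithmically in the dimension.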
World-Consistent Data Generation for Vision-and-Language Navigation
Zhong, Yu | Zhang, Rui | Zhang, Zihao | Wang, Shuo | Fang, Chuan | Zhang, Xishan | Guo, Jiaming | Peng, Shaohui | Huang, Di | Yan, Yanyang | Hu, Xing | Tan, Ping | Guo, Qi
The study addresses data scarcity in Vision-and-Language Navigation (VLN) by proposing a world-consistent data generation framework (WCGEN) that improves how well agents generalize to novel environments. The framework has two stages, a trajectory stage and a viewpoint stage, which ensure spatial coherency and wraparound consistency. The result is improved performance on navigation tasks and better generalization of VLN agents to unseen environments.
On the Bidirected Cut Relaxation for Steiner Forest
Byrka, Jarosław | Grandoni, Fabrizio | Traub, Vera
The Steiner Forest problem asks for the cheapest forest in which each specified pair of vertices lies in the same connected component; the best known approximation factor is 2. A new LP relaxation, inspired by the Bidirected Cut Relaxation for Steiner Tree, shows promising properties and allows half-integral solutions to be rounded while increasing the cost by a factor of at most 16/9.
Cramér-Rao Bound Analysis and Beamforming Design for 3D Extended Target in ISAC: From Optimization to Learning Approaches
Wang, Yiqiu | Tao, Meixia | Sun, Shu | Cao, Wei
This paper presents a study on an integrated sensing and communication system where a base station transmits a signal for multi-user communication and extended target sensing. The research introduces novel closed-form Cramér-Rao bounds for estimating parameters of three-dimensional extended targets and proposes two beamforming design problems, which are solved using optimization algorithms. Additionally, an unsupervised learning-based approach is introduced through an ISAC graph neural network to address the beamforming design problems, showing improved trade-offs between communication and sensing performance.
How social media reacted: There is minimal social media reaction to this paper so far. The only relevant tweets are automated shares of the paper's title and link, likely from bots that post new arxiv submissions. This lack of broader engagement is expected given the paper's highly technical nature and recent publication date. More substantive discussion, if any, would likely occur in specialized academic or industry forums rather than general social media platforms.
Simulating Human-like Daily Activities with Desire-driven Autonomy
Wang, Yiding | Chen, Yuxuan | Zhong, Fangwei | Ma, Long | Wang, Yizhou
This paper introduces a Desire-driven Autonomous Agent (D2A) framework that enables a Large Language Model-based agent to autonomously simulate human-like daily activities based on intrinsic motivations. By incorporating a motivational framework inspired by the Theory of Needs, the D2A agent can propose and select tasks aligned with desires such as social interaction and self-care, resulting in coherent and contextually relevant activities with enhanced rationality compared to other frameworks.
Gated Delta Networks: Improving Mamba2 with Delta Rule
Yang, Songlin | Kautz, Jan | Hatamizadeh, Ali
The study introduces Gated Delta Networks, a novel architecture that combines gating for memory control and the delta update rule for precise memory modifications to improve performance in retrieval and long-context tasks. The proposed Gated DeltaNet outperforms existing models like Mamba2 and DeltaNet across various benchmarks by enabling rapid memory erasure and targeted updates, and further enhances performance through hybrid architectures with sliding window attention or Mamba2 layers for improved training efficiency and task performance.
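The interplay of gating and the delta rule can be illustrated with a small matrix memory. The sketch below assumes the update S <- alpha * S (I - beta k k^T) + beta v k^T, which is a common way of writing a gated delta rule; the exact parameterization in the paper may differ, and the function names are illustrative.

```python
def gated_delta_step(S, k, v, alpha, beta):
    """One update S <- alpha * S (I - beta k k^T) + beta v k^T.

    S is a d_v x d_k list of lists, k has length d_k, v has length d_v.
    alpha in [0, 1] is the gate (decay); beta in [0, 1] is the write strength.
    """
    # What the memory currently returns for key k.
    Sk = [sum(s * x for s, x in zip(row, k)) for row in S]
    return [
        [alpha * (S[i][j] - beta * Sk[i] * k[j]) + beta * v[i] * k[j]
         for j in range(len(k))]
        for i in range(len(v))
    ]


def read(S, q):
    """Query the memory with vector q."""
    return [sum(s * x for s, x in zip(row, q)) for row in S]


S = [[0.0, 0.0], [0.0, 0.0]]
S = gated_delta_step(S, k=[1.0, 0.0], v=[2.0, 3.0], alpha=1.0, beta=1.0)
# Targeted update: same key, new value replaces the old association.
S = gated_delta_step(S, k=[1.0, 0.0], v=[5.0, 7.0], alpha=1.0, beta=1.0)
overwritten = read(S, [1.0, 0.0])
# Rapid erasure: alpha = 0 clears everything stored before this step.
S = gated_delta_step(S, k=[0.0, 1.0], v=[1.0, 1.0], alpha=0.0, beta=1.0)
after_erase = read(S, [1.0, 0.0])
```

With a unit-norm key and beta = 1, the delta term removes the old value bound to that key before writing the new one, while alpha scales down the whole memory at once.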
PrEditor3D: Fast and Precise 3D Shape Editing
Erkoç, Ziya | Gümeli, Can | Wang, Chaoyang | Nießner, Matthias | Dai, Angela | Wonka, Peter | Lee, Hsin-Ying | Zhuang, Peiye
The study introduces a training-free method for fast and precise 3D shape editing, allowing users to edit a single shape within minutes with alignment to prompts and preservation of unaltered regions. By projecting the 3D object onto 4-view images and utilizing user-guided text prompts and rough masks, a 3D segmentation pipeline is employed to detect and merge edited areas, resulting in superior editing quality compared to existing methods.
A Finite Volume Method for Elastic Waves in Heterogeneous, Anisotropic and Fractured Media
Jacobsen, Ingrid Kristine | Berre, Inga | Nordbotten, Jan Martin | Stefansson, Ivar
The study presents the cell-centered finite volume method MPSA-Newmark for solving the elastic wave equation in heterogeneous, anisotropic, and fractured media, incorporating absorbing boundary conditions to limit reflections. Numerical convergence analyses demonstrate expected convergence rates in time and space, with verification through simulation examples showcasing the method's versatility in handling various media types.
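For readers unfamiliar with the time-stepping half of the name, here is a minimal sketch of the Newmark scheme on a single mass-spring oscillator. The scalar system stands in for the spatially discretized equations and does not reproduce the MPSA discretization; the default parameters (beta = 1/4, gamma = 1/2, the average-acceleration variant) are an assumption and may not match the paper's choice.

```python
def newmark_oscillator(m, k, u0, v0, dt, steps, beta=0.25, gamma=0.5):
    """Integrate m*u'' + k*u = 0 with the Newmark scheme; returns (u, v)."""
    u, v = u0, v0
    a = -k * u / m  # initial acceleration from the equation of motion
    for _ in range(steps):
        # Predictors built only from quantities known at the current step.
        u_pred = u + dt * v + dt * dt * (0.5 - beta) * a
        v_pred = v + dt * (1.0 - gamma) * a
        # Implicit equation m*a_new + k*(u_pred + beta*dt^2*a_new) = 0.
        a_new = -k * u_pred / (m + beta * dt * dt * k)
        # Correctors add the contribution of the new acceleration.
        u = u_pred + beta * dt * dt * a_new
        v = v_pred + gamma * dt * a_new
        a = a_new
    return u, v


def energy(m, k, u, v):
    """Total mechanical energy of the oscillator."""
    return 0.5 * m * v * v + 0.5 * k * u * u


u_end, v_end = newmark_oscillator(m=1.0, k=4.0, u0=1.0, v0=0.0, dt=0.01, steps=1000)
```

For this linear, undamped problem the average-acceleration variant conserves the discrete energy up to round-off, which makes it a convenient sanity check.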
Stochastic LQR Design With Disturbance Preview
Liu, Jietian | Lessard, Laurent | Seiler, Peter
This paper explores the stochastic LQR problem with a finite number of disturbance preview steps, deriving solutions for both finite and infinite horizons using linear, time-varying dynamics and costs. The proofs are based on the principle of optimality and nested information structure, demonstrating that the finite preview controller approaches the optimal noncausal controller as the preview horizon increases.
Small Languages, Big Models: A Study of Continual Training on Languages of Norway
Samuel, David | Mikhailov, Vladislav | Velldal, Erik | Øvrelid, Lilja | Charpentier, Lucas Georges Gabriel | Kutuzov, Andrey
The study explores the challenge of training large language models for less widely spoken languages like Norwegian and low-resource languages like Sámi. The researchers propose a three-stage continual training approach and experiment with combining causal and masked language modeling to create a more flexible model, resulting in the release of NorMistral-11B, a large generative language model for Norwegian Bokmål, Nynorsk, and Northern Sámi with 11.4 billion parameters.
I Don't Know: Explicit Modeling of Uncertainty with an [IDK] Token
Cohen, Roi | Dobler, Konstantin | Biran, Eden | de Melo, Gerard
This work introduces a novel calibration method to address hallucinations in large language models by adding an [IDK] token to the vocabulary and shifting probability mass to it for incorrect predictions, allowing the model to express uncertainty explicitly. The proposed method enables models to better indicate uncertainty without significant loss of knowledge, as demonstrated through evaluations across various architectures and factual downstream tasks.
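The idea of shifting probability mass to the [IDK] token can be sketched as a training target. The summary does not say how much mass is moved, so `shift` below is a hypothetical fixed constant, and the function name and token ids are illustrative rather than taken from the paper.

```python
def idk_soft_target(vocab_size, gold_id, idk_id, predicted_id, shift=0.5):
    """Build a target distribution over the vocabulary for one position.

    If the model's prediction is correct, the target is the usual one-hot
    vector on the gold token. If it is wrong, `shift` of the mass moves
    from the gold token to the [IDK] token.
    """
    if not 0.0 <= shift <= 1.0:
        raise ValueError("shift must lie in [0, 1]")
    target = [0.0] * vocab_size
    if predicted_id == gold_id:
        target[gold_id] = 1.0
    else:
        target[gold_id] = 1.0 - shift
        target[idk_id] = shift
    return target


# Vocabulary of 5 tokens, with the last id reserved for [IDK].
correct_case = idk_soft_target(5, gold_id=2, idk_id=4, predicted_id=2)
wrong_case = idk_soft_target(5, gold_id=2, idk_id=4, predicted_id=1, shift=0.25)
```

Training against such a target with cross-entropy would reward the model for placing some probability on [IDK] exactly where it currently errs, while leaving correct predictions untouched.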
An Adaptively Inexact Method for Bilevel Learning Using Primal-Dual Style Differentiation
Bogensperger, Lea | Ehrhardt, Matthias J. | Pock, Thomas | Salehi, Mohammad Sadegh | Wong, Hok Shing
The study introduces an adaptively inexact method for bilevel learning, focusing on optimizing linear operators through a loss function dependent on a convex optimization problem's minimizer. By employing the 'piggyback' iterative algorithm to compute gradients, the approach addresses the numerical solution of the lower-level problem, offering an a-posteriori error bound for estimating hypergradient accuracy and suggesting adaptive step-size selection for efficient upper-level optimization.
Policy Agnostic RL: Offline RL and Online RL Fine-Tuning of Any Class and Backbone
Mark, Max Sobol | Gao, Tian | Sampaio, Georgia Gabriela | Srirama, Mohan Kumar | Sharma, Archit | Finn, Chelsea | Kumar, Aviral
The paper introduces a new approach called policy-agnostic RL (PA-RL) that allows for training multiple policy classes with varying architectures and sizes, addressing the challenge of poor performance when the policy class changes in reinforcement learning. PA-RL enables fine-tuning of diffusion and transformer policies, improving performance and sample efficiency compared to existing methods, as demonstrated by successfully fine-tuning a generalist robot policy in the real world.
BoRA: Bi-dimensional Weight-Decomposed Low-Rank Adaptation
Wang, Qiushi | Fan, Yuchen | Bao, Junwei | Jiang, Hongfei | Song, Yang
BoRA is a novel extension of Low-Rank Adaptation (LoRA) and Weight-Decomposed Low-Rank Adaptation (DoRA) that optimizes weight matrices symmetrically by adjusting both column-wise and row-wise magnitudes. Through extensive experiments, BoRA outperforms existing Parameter-Efficient Fine-Tuning methods like LoRA and DoRA, showcasing superior results across different benchmarks.
Active Learning with Context Sampling and One-vs-Rest Entropy for Semantic Segmentation
Wu, Fei | Marquez-Neila, Pablo | Rafi-Tarii, Hedyeh | Sznitman, Raphael
The study introduces OREAL, a novel patch-based Active Learning method for multi-class semantic segmentation that improves boundary detection by aggregating pixel-wise uncertainty scores. It also introduces one-vs-rest entropy, a new uncertainty score function that balances class-wise uncertainties during dataset creation, validated through experiments on various datasets and model architectures.
EFX Allocations on Some Multi-graph Classes
Bhaskar, Umang | Pandit, Yeshwant
Recent work has shown the existence of EFX allocations for graphical valuations in fair division, where each item has zero value for all agents except those at its end-points. This extends to multi-graphs, with EFX allocations existing and computable in polynomial time for agents with cancellable valuations in bipartite multi-graphs, multi-trees with monotone valuations, and multi-graphs with a certain girth related to their chromatic number.
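For reference, the EFX condition itself is easy to check on a small instance. The sketch below assumes additive valuations, which is a simplification of the cancellable and monotone valuations mentioned above, and it uses the variant of EFX that considers removing any single item; the function name and the example instance are illustrative.

```python
def is_efx(valuations, allocation):
    """Check envy-freeness up to any item (EFX) for additive valuations.

    valuations[i][g] is agent i's value for item g.
    allocation[i] is the set of items given to agent i.
    """
    n = len(allocation)
    for i in range(n):
        own = sum(valuations[i][g] for g in allocation[i])
        for j in range(n):
            if i == j:
                continue
            other = sum(valuations[i][g] for g in allocation[j])
            # Agent i must not envy j after removing any one item from j's bundle.
            for g in allocation[j]:
                if own < other - valuations[i][g]:
                    return False
    return True


# Multi-graph with two agents joined by two parallel edges (items "a" and "b").
vals = [{"a": 1, "b": 1}, {"a": 1, "b": 1}]
one_each = is_efx(vals, [{"a"}, {"b"}])
all_to_first = is_efx(vals, [{"a", "b"}, set()])
```

In the graphical setting each item is an edge valued only by its two endpoints, so on larger instances most entries of `valuations` would be zero.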
TriDi: Trilateral Diffusion of 3D Humans, Objects, and Interactions
Petrov, Ilya A. | Marin, Riccardo | Chibane, Julian | Pons-Moll, Gerard
TriDi is a unified model of 3D human-object interactions that works in any direction, generating Human, Object, and Interaction modalities simultaneously using a three-way diffusion process. By combining text descriptions and contact maps in a shared latent space, TriDi surpasses specialized baselines in terms of both qualitative and quantitative metrics, demonstrating better diversity and applicability to various tasks such as scene population and generalization to unseen object geometry.
Neo-FREE: Policy Composition Through Thousand Brains And Free Energy Optimization
Rossi, Francesca | Garrabé, Émiland | Russo, Giovanni
The Neo-FREE control architecture, inspired by the Thousand Brains Theory and Free Energy Principle, optimally composes control primitives for tackling tasks by linearly combining functional units through a gating mechanism that minimizes variational free energy. This approach recasts the problem as a convex finite-horizon optimal control problem, demonstrating effectiveness in robot navigation experiments in complex environments.
On Random Batch Methods (RBM) for interacting particle systems driven by Lévy processes
Liu, Jian-Guo | Wang, Yuliang
The paper introduces the Random Batch Method for interacting particle systems driven by Lévy noises (RBM-Lévy) as an extension of the original RBM algorithm, aiming to reduce computational costs from $O(N^2)$ to $O(pN)$ per time step by grouping particles into batches for interactions. The proposed method maintains convergence to the original system dynamics even in the presence of Lévy jumps, with rigorous proof provided in Wasserstein distance and numerical examples demonstrating the convergence rate.
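The cost reduction comes from the batching alone, which can be sketched without any dynamics. The snippet below takes p to be the batch size (the summary does not define it, so this is an assumption) and only counts ordered pair interactions per time step; it does not simulate Lévy noise, and the function names are illustrative.

```python
import random


def random_batches(n_particles, p, rng):
    """Shuffle particle indices and split them into batches of size p."""
    idx = list(range(n_particles))
    rng.shuffle(idx)
    return [idx[s:s + p] for s in range(0, n_particles, p)]


def interaction_counts(n_particles, p, seed=0):
    """Ordered pair interactions per step: all pairs vs. within-batch pairs."""
    full = n_particles * (n_particles - 1)
    batches = random_batches(n_particles, p, random.Random(seed))
    batched = sum(len(b) * (len(b) - 1) for b in batches)
    return full, batched


full, batched = interaction_counts(1000, 2)
```

When p divides N, the batched count is N(p - 1), so for fixed p the work per step grows linearly in N. A fresh random partition would be drawn at every time step.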
Understanding Factual Recall in Transformers via Associative Memories
Nichani, Eshaan | Lee, Jason D. | Bietti, Alberto
The study explores how transformers can achieve optimal factual recall by utilizing associative memories, demonstrating that shallow transformers can effectively store information through a combination of linear and MLP associative memories. The research shows that transformers with a single layer of self-attention followed by an MLP can achieve 100% accuracy on a synthetic factual recall task when the parameters scale linearly with the number of facts, offering insights into the model's learning behavior through gradient flow trajectory analysis.
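A linear associative memory, the simplest building block mentioned above, can be sketched in a few lines. This toy version assumes orthonormal (one-hot) keys so that recall is exact; the paper's analysis covers more general embeddings and MLP memories, which are not reproduced here, and all names are illustrative.

```python
def store(facts, answer_embeddings, d_k):
    """Build W = sum over facts of (answer embedding) (key)^T.

    facts is a list of (key_vector, answer_index) pairs.
    """
    d_v = len(answer_embeddings[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    for key, answer in facts:
        emb = answer_embeddings[answer]
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += emb[i] * key[j]
    return W


def recall(W, key, answer_embeddings):
    """Return the index of the answer whose embedding best matches W @ key."""
    out = [sum(w * x for w, x in zip(row, key)) for row in W]
    scores = [sum(o * e for o, e in zip(out, emb)) for emb in answer_embeddings]
    return max(range(len(scores)), key=scores.__getitem__)


keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
answers = [[1.0, 0.0], [0.0, 1.0]]
facts = [(keys[0], 0), (keys[1], 1), (keys[2], 0)]
W = store(facts, answers, d_k=3)
recalled = [recall(W, k, answers) for k in keys]
```

Here three facts are stored in a 2 x 3 matrix and each is recalled exactly because the keys do not interfere with one another; with non-orthogonal keys, crosstalk between stored facts would limit capacity.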
How social media reacted: The paper 'Understanding Factual Recall in Transformers via Associative Memories' has received limited engagement on social media based on the available data. It has been shared by academic-focused Twitter accounts that disseminate recent research papers, but there is no evidence of widespread discussion or commentary. The tweets that do mention the paper simply share its existence or provide a brief summary of its key findings, suggesting that the paper's impact on social media has been minimal so far.