Functional Continuous Decomposition
Researchers developed Functional Continuous Decomposition (FCD), a JAX-accelerated framework that uses Levenberg-Marquardt optimization to decompose time-series data into multiple temporal modes via C^1-continuous parametric fits, improving analysis and feature extraction for machine learning models.
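The core numerical tool here, Levenberg-Marquardt, alternates damped Gauss-Newton steps with an adaptive damping factor. The paper's JAX implementation and mode parameterization are not shown; this NumPy sketch fits a single hypothetical exponential mode to illustrate the update rule:

```python
import numpy as np

def lm_fit(t, y, theta, lam=1e-2, iters=50):
    """Levenberg-Marquardt fit of y ~ a*exp(-b*t) (hypothetical one-mode model)."""
    def resid(th):
        a, b = th
        return y - a * np.exp(-b * t)
    def jac(th):
        a, b = th
        e = np.exp(-b * t)
        # d(resid)/da = -e ; d(resid)/db = a*t*e
        return np.column_stack([-e, a * t * e])
    for _ in range(iters):
        r, J = resid(theta), jac(theta)
        # damped normal equations: (J^T J + lam*I) delta = -J^T r
        delta = np.linalg.solve(J.T @ J + lam * np.eye(len(theta)), -J.T @ r)
        if np.sum(resid(theta + delta) ** 2) < np.sum(r ** 2):
            theta, lam = theta + delta, lam * 0.5  # accept step, relax damping
        else:
            lam *= 2.0                             # reject step, increase damping
    return theta

t = np.linspace(0.0, 4.0, 80)
y = 2.0 * np.exp(-1.3 * t)
a, b = lm_fit(t, y, np.array([1.0, 1.0]))
```

The adaptive damping is what distinguishes LM from plain Gauss-Newton: far from the optimum it behaves like gradient descent, near it like Newton's method.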
VecGlypher: Unified Vector Glyph Generation with Language Models
VecGlypher is a multimodal language model that directly generates high-fidelity vector glyphs (font characters) from text descriptions or image examples by autoregressively emitting SVG path tokens, trained on a large dataset of fonts in a two-stage process.
SeaCache: Spectral-Evolution-Aware Cache for Accelerating Diffusion Models
This paper introduces SeaCache, a training-free caching strategy for diffusion models that accelerates inference by reusing intermediate outputs based on a novel spectral-evolution-aware filter, which disentangles content from noise by focusing on low-frequency structures early and high-frequency details later.
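The reuse decision rests on comparing frequency content of successive intermediate outputs. SeaCache's exact filter is not reproduced here; this sketch shows one plausible form of the idea, reusing a cached output when the low-frequency band has barely changed (band radius and threshold are illustrative assumptions):

```python
import numpy as np

def lowfreq_change(prev, curr, radius=4):
    """Relative change in the low-frequency band of two feature maps."""
    def lowpass(x):
        F = np.fft.fftshift(np.fft.fft2(x))
        h, w = x.shape
        yy, xx = np.ogrid[:h, :w]
        mask = (yy - h // 2) ** 2 + (xx - w // 2) ** 2 <= radius ** 2
        return F * mask
    a, b = lowpass(prev), lowpass(curr)
    return np.linalg.norm(a - b) / (np.linalg.norm(a) + 1e-8)

def should_reuse(prev, curr, tau=0.05):
    # Reuse the cached output when low-frequency structure has barely moved.
    return lowfreq_change(prev, curr) < tau

x = np.random.default_rng(0).standard_normal((32, 32))
should_reuse(x, x)   # identical maps -> True
should_reuse(x, -x)  # inverted content -> False
```

A schedule that shifts the monitored band from low to high frequencies over timesteps would mirror the paper's early-structure/late-detail observation.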
Image Generation with a Sphere Encoder
Researchers developed the Sphere Encoder, a generative model that maps images to a spherical latent space and can generate high-quality images in a single pass, rivaling multi-step diffusion models with significantly fewer computational steps.
NanoKnow: How to Know What Your Language Model Knows
Researchers created NanoKnow, a benchmark built on LLMs whose pre-training data is fully open, to analyze how pre-training data frequency and external evidence influence model accuracy and knowledge utilization.
GUI-Libra: Training Native GUI Agents to Reason and Act with Action-aware Supervision and Partially Verifiable RL
This paper introduces GUI-Libra, a training recipe that uses action-aware supervised fine-tuning and a KL-regularized reinforcement learning approach to improve native GUI agents' reasoning and task completion by addressing data scarcity and partial verifiability issues.
SPOC: Safety-Aware Planning Under Partial Observability And Physical Constraints
This paper introduces SPOC, a new benchmark to evaluate large language models' ability to plan safely in real-world-like environments with partial information and physical limitations, revealing that current models struggle with implicit safety constraints.
Dynamic Symmetric Point Tracking: Tackling Non-ideal Reference in Analog In-memory Training
This paper introduces a dynamic method to estimate and correct for the 'symmetric point' bias in analog in-memory computing devices during model training, improving accuracy without costly pre-calibration.
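The key quantity is a per-device bias: the offset at which balanced up/down pulses cancel. The paper's tracking rule is not shown; a minimal stand-in is an exponential moving average of the net drift observed under balanced pulse pairs, subtracted from subsequent updates:

```python
import numpy as np

def track_symmetric_point(drifts, beta=0.9):
    """EMA estimate of per-device bias from net drifts under balanced
    up/down pulse pairs (hypothetical stand-in for the paper's scheme)."""
    est = np.zeros_like(drifts[0])
    for d in drifts:
        est = beta * est + (1 - beta) * d  # running estimate of asymmetry
    return est

rng = np.random.default_rng(1)
true_bias = np.array([0.3, -0.2])
drifts = [true_bias + 0.01 * rng.standard_normal(2) for _ in range(200)]
est = track_symmetric_point(drifts)
# subtracting est re-centers training updates at the symmetric point
```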
AHAN: Asymmetric Hierarchical Attention Network for Identical Twin Face Verification
This paper introduces AHAN, a neural network that uses hierarchical attention on facial regions and cross-attention between face halves to better distinguish identical twins, achieving 92.3% accuracy.
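The cross-attention between face halves is a standard scaled dot-product pattern: tokens from one half query tokens from the other. AHAN's exact layer configuration is not public here; this is a generic NumPy sketch of that mechanism:

```python
import numpy as np

def cross_attention(q_feats, kv_feats):
    """Scaled dot-product cross-attention: one face half queries the other."""
    d = q_feats.shape[-1]
    scores = q_feats @ kv_feats.T / np.sqrt(d)        # (Nq, Nk) similarities
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                # softmax over keys
    return w @ kv_feats                               # attended features

left = np.random.default_rng(0).standard_normal((4, 8))   # left-half region tokens
right = np.random.default_rng(1).standard_normal((6, 8))  # right-half region tokens
out = cross_attention(left, right)                        # shape (4, 8)
```

Attending across halves lets the model amplify the small asymmetries (moles, scars, subtle shape differences) that separate identical twins.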
Archetypal Graph Generative Models: Explainable and Identifiable Communities via Anchor-Dominant Convex Hulls
This paper introduces GraphHull, a generative model that uses global archetypes and local prototypes, represented as convex hulls, to explain community structures and improve performance in tasks like link prediction and community detection.
Easy3E: Feed-Forward 3D Asset Editing via Rectified Voxel Flow
This paper introduces a feed-forward framework that uses a voxel flow-based deformation and a normal-guided appearance prior to enable fast, consistent, and high-fidelity 3D model editing from a single input view.
EPSVec: Efficient and Private Synthetic Data Generation via Dataset Vectors
This paper introduces EPSVec, a method that uses 'dataset vectors' to steer large language models for efficient and private synthetic data generation, decoupling privacy costs from generation and improving fidelity in low-data scenarios.
AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression
AngelSlim is a toolkit that combines various techniques like quantization, speculative decoding, and token pruning to make large AI models smaller and faster for industrial deployment, including the first viable 2-bit large model.
PSF-Med: Measuring and Explaining Paraphrase Sensitivity in Medical Vision Language Models
Researchers developed a benchmark to measure how medical vision-language models flip their 'yes/no' answers to rephrased questions, finding high flip rates and that some models maintain consistency even without an image, indicating reliance on language priors rather than visual grounding.
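The headline metric is straightforward to state precisely: the fraction of paraphrase pairs on which the model's yes/no answer changes. A minimal sketch (the benchmark's exact aggregation is assumed):

```python
def flip_rate(pairs):
    """Fraction of (original, paraphrase) answer pairs where the yes/no
    answer flips (illustrative metric definition)."""
    return sum(1 for a, b in pairs if a != b) / len(pairs)

answers = [("yes", "yes"), ("yes", "no"), ("no", "no"), ("no", "yes")]
flip_rate(answers)  # -> 0.5
```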
Towards Controllable Video Synthesis of Routine and Rare OR Events
Researchers developed a video diffusion framework that transforms operating room scenes into abstract geometric representations to synthesize controllable, realistic surgical videos, including rare safety-critical events, outperforming standard video generation methods and enabling training of AI models for near-miss detection.
Event-Driven On-Sensor Locomotion Mode Recognition Using a Shank-Mounted IMU with Embedded Machine Learning for Exoskeleton Control
This paper demonstrates an activity recognition system that classifies locomotion modes (stance, walking, stairs) directly on a shank-mounted IMU, reducing power consumption and latency for exoskeleton control by transmitting only activity labels to the control microcontroller.
ARLArena: A Unified Framework for Stable Agentic Reinforcement Learning
This paper introduces ARLArena, a framework to analyze and stabilize agentic reinforcement learning by dissecting policy gradient methods into four dimensions, leading to SAMPO, a new optimization method that achieves consistent stability and strong performance.
Alignment-Weighted DPO: A principled reasoning approach to improve safety alignment
This paper introduces Alignment-Weighted DPO, a method that uses a Chain-of-Thought fine-tuning dataset with step-by-step rationales to teach LLMs to produce principled safety refusals by assigning different preference weights to reasoning and final-answer segments during optimization.
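Standard DPO scores whole responses; the weighted variant scales per-token log-probability ratios by segment before forming the preference margin. The exact weighting scheme is the paper's; this sketch shows one plausible form, with uniform weights standing in for separate reasoning/answer weights:

```python
import numpy as np

def weighted_dpo_loss(lp_chosen, lp_rejected, ref_chosen, ref_rejected,
                      w_chosen, w_rejected, beta=0.1):
    """DPO loss with per-token segment weights (e.g. reasoning vs. final
    answer); the weighting here is an illustrative assumption."""
    def margin(lp, ref, w):
        # weighted sum of per-token log-ratios: log pi(t) - log pi_ref(t)
        return np.sum(w * (lp - ref))
    delta = (margin(lp_chosen, ref_chosen, w_chosen)
             - margin(lp_rejected, ref_rejected, w_rejected))
    return -np.log(1.0 / (1.0 + np.exp(-beta * delta)))  # -log sigmoid

# chosen improves over reference, rejected degrades -> loss below log(2)
w = np.ones(2)
loss = weighted_dpo_loss(np.zeros(2), -np.ones(2), -np.ones(2), np.zeros(2), w, w)
```

Down-weighting reasoning tokens relative to the final answer would let the optimizer reward correct refusals without over-penalizing stylistic variation in the rationale.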