Evaluating Evidence Grounding Under User Pressure in Instruction-Tuned Language Models
Authors: Sai Koneru, Elphin Joe, Christine Kirchhoff, Jian Wu, Sarah Rajtmajer
First: 2026-03-20T17:38:23+00:00 · Latest: 2026-03-20T17:38:23+00:00
Abstract
In contested domains, instruction-tuned language models must balance user-alignment pressures against faithfulness to the in-context evidence. To evaluate this tension, we introduce a controlled epistemic-conflict framework grounded in the U.S. National Climate Assessment. We conduct fine-grained ablations over evidence composition and uncertainty cues across 19 instruction-tuned models spanning 0.27B to 32B parameters. Across neutral prompts, richer evidence generally improves evidence-consistent accuracy and ordinal scoring performance. Under user pressure, however, evidence does not reliably prevent user-aligned reversals in this controlled fixed-evidence setting. We report three primary failure modes. First, we identify a negative partial-evidence interaction, where adding epistemic nuance, specifically research gaps, is associated with increased susceptibility to sycophancy in families like Llama-3 and Gemma-3. Second, robustness scales non-monotonically: within some families, certain low-to-mid scale models are especially sensitive to adversarial user pressure. Third, models differ in distributional concentration under conflict: some instruction-tuned models maintain sharply peaked ordinal distributions under pressure, while others are substantially more dispersed; in scale-matched Qwen comparisons, reasoning-distilled variants (DeepSeek-R1-Qwen) exhibit consistently higher dispersion than their instruction-tuned counterparts. These findings suggest that, in a controlled fixed-evidence setting, providing richer in-context evidence alone offers no guarantee against user pressure without explicit training for epistemic integrity.
Summary / 总结
In contested domains, instruction-tuned language models must balance user-alignment pressures against faithfulness to the in-context evidence.
Conditioning Protein Generation via Hopfield Pattern Multiplicity
Authors: Jeffrey D. Varner
First: 2026-03-20T16:38:15+00:00 · Latest: 2026-03-20T16:38:15+00:00
Abstract
Protein sequence generation via stochastic attention produces plausible family members from small alignments without training, but treats all stored sequences equally and cannot direct generation toward a functional subset of interest. We show that a single scalar parameter, added as a bias to the sampler's attention logits, continuously shifts generation from the full family toward a user-specified subset, with no retraining and no change to the model architecture. A practitioner supplies a small set of sequences (for example, hits from a binding screen) and a multiplicity ratio that controls how strongly generation favors them. The method is agnostic to what the subset represents: binding, stability, specificity, or any other property. We find that the conditioning is exact at the level of the sampler's internal representation, but that the decoded sequence phenotype can fall short because the dimensionality reduction used to encode sequences does not always preserve the residue-level variation that defines the functional split. We term this discrepancy the calibration gap and show that it is predicted by a simple geometric measure of how well the encoding separates the functional subset from the rest of the family. Experiments on five Pfam families (Kunitz, SH3, WW, Homeobox, and Forkhead domains) confirm the monotonic relationship between separation and gap across a fourfold range of geometries. Applied to omega-conotoxin peptides targeting a calcium channel involved in pain signaling, curated seeding from 23 characterized binders produces over a thousand candidates that preserve the primary pharmacophore and all experimentally identified binding determinants. These results show that stochastic attention enables practitioners to expand a handful of experimentally characterized sequences into diverse candidate libraries without retraining a generative model.
Summary / 总结
Protein sequence generation via stochastic attention produces plausible family members from small alignments without training, but treats all stored sequences equally and cannot direct generation toward a functional subset of interest.
Fine-tuning Timeseries Predictors Using Reinforcement Learning
Authors: Hugo Cazaux, Ralph Rudd, Hlynur Stefánsson, Sverrir Ólafsson, Eyjólfur Ingi Ásgeirsson
First: 2026-03-20T15:44:40+00:00 · Latest: 2026-03-20T15:44:40+00:00
Abstract
This chapter presents three major reinforcement learning algorithms used for fine-tuning financial forecasters. We propose a clear implementation plan for backpropagating the loss of a reinforcement learning task to a model trained using supervised learning, and compare the performance before and after the fine-tuning. We find an increase in performance after fine-tuning, and transfer learning properties to the models, indicating the benefits of fine-tuning. We also highlight the tuning process and empirical results for future implementation by practitioners.
Summary / 总结
This chapter presents three major reinforcement learning algorithms used for fine-tuning financial forecasters.
Integrating Meta-Features with Knowledge Graph Embeddings for Meta-Learning
Authors: Antonis Klironomos, Ioannis Dasoulas, Francesco Periti, Mohamed Gad-Elrab, Heiko Paulheim, Anastasia Dimou, Evgeny Kharlamov
First: 2026-03-20T11:57:25+00:00 · Latest: 2026-03-20T11:57:25+00:00
Abstract
The vast collection of machine learning records available on the web presents a significant opportunity for meta-learning, where past experiments are leveraged to improve performance. Two crucial meta-learning tasks are pipeline performance estimation (PPE), which predicts pipeline performance on target datasets, and dataset performance-based similarity estimation (DPSE), which identifies datasets with similar performance patterns. Existing approaches primarily rely on dataset meta-features (e.g., number of instances, class entropy, etc.) to represent datasets numerically and approximate these meta-learning tasks. However, these approaches often overlook the wealth of past experimental results and pipeline metadata available. This limits their ability to capture dataset - pipeline interactions that reveal performance similarity patterns. In this work, we propose KGmetaSP, a knowledge-graph-embeddings approach that leverages existing experiment data to capture these interactions and improve both PPE and DPSE. We represent datasets and pipelines within a unified knowledge graph (KG) and derive embeddings that support pipeline-agnostic meta-models for PPE and distance-based retrieval for DPSE. To validate our approach, we construct a large-scale benchmark comprising 144,177 OpenML experiments, enabling a rich cross-dataset evaluation. KGmetaSP enables accurate PPE using a single pipeline-agnostic meta-model and improves DPSE over baselines. The proposed KGmetaSP, KG, and benchmark are released, establishing a new reference point for meta-learning and demonstrating how consolidating open experiment data into a unified KG advances the field.
Summary / 总结
The vast collection of machine learning records available on the web presents a significant opportunity for meta-learning, where past experiments are leveraged to improve performance.
FedPDPO: Federated Personalized Direct Preference Optimization for Large Language Model Alignment
Authors: Kewen Zhu, Liping Yi, Zhiming Zhao, Zhuang Qi, Han Yu, Qinghua Hu
First: 2026-03-20T08:24:49+00:00 · Latest: 2026-03-20T08:24:49+00:00
Comments: under review
Abstract
Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging due to decentralized, privacy-sensitive, and highly non-IID preference data. Direct Preference Optimization (DPO) offers an efficient alternative to reinforcement learning with human feedback (RLHF), but its direct application in FL suffers from severe performance degradation under non-IID data and limited generalization of implicit rewards. To bridge this gap, we propose FedPDPO (Federated Personalized Direct Preference Optimization), a personalized federated framework for preference alignment of LLMs. It adopts a parameter-efficient fine-tuning architecture where each client maintains a frozen pretrained LLM backbone augmented with a Low-Rank Adaptation (LoRA) adapter, enabling communication-efficient aggregation. To address non-IID heterogeneity, we devise (1) the globally shared LoRA adapter with the personalized client-specific LLM head. Moreover, we introduce (2) a personalized DPO training strategy with a client-specific explicit reward head to complement implicit rewards and further alleviate non-IID heterogeneity, and (3) a bottleneck adapter to balance global and local features. We provide theoretical analysis establishing the probabilistic foundation and soundness. Extensive experiments on multiple preference datasets demonstrate state-of-the-art performance, achieving up to 4.80% average accuracy improvements in federated intra-domain and cross-domain settings.
Summary / 总结
Aligning large language models (LLMs) with human preferences in federated learning (FL) is challenging due to decentralized, privacy-sensitive, and highly non-IID preference data.
A Phylogenetic Approach to Genomic Language Modeling
Authors: Carlos Albors, Jianan Canal Li, Gonzalo Benegas, Chengzhong Ye, Yun S. Song
First: 2025-03-04T06:53:03+00:00 · Latest: 2026-03-20T00:45:19+00:00
Comments: 15 pages, 7 figures
Abstract
Genomic language models (gLMs) have shown mostly modest success in identifying evolutionarily constrained elements in mammalian genomes. To address this issue, we introduce a novel framework for training gLMs that explicitly models nucleotide evolution on phylogenetic trees using multispecies whole-genome alignments. Our approach integrates an alignment into the loss function during training but does not require it for making predictions, thereby enhancing the model's applicability. We applied this framework to train PhyloGPN, a model that excels at predicting functionally disruptive variants from a single sequence alone and demonstrates strong transfer learning capabilities.
Summary / 总结
Genomic language models (gLMs) have shown mostly modest success in identifying evolutionarily constrained elements in mammalian genomes.
Investigating layer-selective transfer learning of QAOA parameters for Max-Cut problem
Authors: Francesco Aldo Venturelli, Sreetama Das, Filippo Caruso
Venue: Phys. Rev. A 112, 042428 (2025)
First: 2024-12-30T16:41:16+00:00 · Latest: 2026-03-19T23:45:10+00:00
Comments: 11 pages, 9 figures. Revised version
Abstract
The quantum approximate optimization algorithm (QAOA) is a variational quantum algorithm (VQA) ideal for noisy intermediate-scale quantum (NISQ) processors, and is highly successful in solving combinatorial optimization problems (COPs). It has been observed that the optimal parameters obtained from one instance of a COP can be transferred to another instance, resulting in generally good solutions for the latter. In this work, we propose a refinement scheme in which only a subset of QAOA layers is optimized following parameter transfer, with a focus on the Max-Cut problem. Our motivation is to reduce the complexity of the loss landscape when optimizing all the layers of high-depth QAOA circuits, as well as to reduce the optimization time. We investigate the potential hierarchical roles of different layers and analyze how the approximation ratio scales with increasing problem size. Our findings indicate that the selective layer optimization scheme offers a favorable trade-off between solution quality and computational time, and can be more beneficial than full optimization at a lower optimization time.
Summary / 总结
The quantum approximate optimization algorithm (QAOA) is a variational quantum algorithm (VQA) ideal for noisy intermediate-scale quantum (NISQ) processors, and is highly successful in solving combinatorial optimization problems (COPs).
ICLAD: In-Context Learning for Unified Tabular Anomaly Detection Across Supervision Regimes
Authors: Jack Yi Wei, Narges Armanfard
First: 2026-03-19T21:49:04+00:00 · Latest: 2026-03-19T21:49:04+00:00
Comments: 33 pages, 17 figures
Abstract
Anomaly detection on tabular data is commonly studied under three supervision regimes, including one-class settings that assume access to anomaly-free training samples, fully unsupervised settings with unlabeled and potentially contaminated training data, and semi-supervised settings with limited anomaly labels. Existing deep learning approaches typically train dataset-specific models under the assumption of a single supervision regime, which limits their ability to leverage shared structures across anomaly detection tasks and to adapt to different supervision levels. We propose ICLAD, an in-context learning foundation model for tabular anomaly detection that generalizes across both datasets and supervision regimes. ICLAD is trained via meta-learning on synthetic tabular anomaly detection tasks, and at inference time, the model assigns anomaly scores by conditioning on the training set without updating model weights. Comprehensive experiments on 57 tabular datasets from ADBench show that our method achieves state-of-the-art performance across three supervision regimes, establishing a unified framework for tabular anomaly detection.
Summary / 总结
Anomaly detection on tabular data is commonly studied under three supervision regimes, including one-class settings that assume access to anomaly-free training samples, fully unsupervised settings with unlabeled and potentially contaminated training data, and semi-supervised settings with limited anomaly labels.
Any-Subgroup Equivariant Networks via Symmetry Breaking
Authors: Abhinav Goel, Derek Lim, Hannah Lawrence, Stefanie Jegelka, Ningyuan Huang
Venue: ICLR 2026
First: 2026-03-19T21:30:41+00:00 · Latest: 2026-03-19T21:30:41+00:00
Comments: Accepted at ICLR 2026
Abstract
The inclusion of symmetries as an inductive bias, known as equivariance, often improves generalization on geometric data (e.g. grids, sets, and graphs). However, equivariant architectures are usually highly constrained, designed for symmetries chosen a priori, and not applicable to datasets with other symmetries. This precludes the development of flexible, multi-modal foundation models capable of processing diverse data equivariantly. In this work, we build a single model -- the Any-Subgroup Equivariant Network (ASEN) -- that can be simultaneously equivariant to several groups, simply by modulating a certain auxiliary input feature. In particular, we start with a fully permutation-equivariant base model, and then obtain subgroup equivariance by using a symmetry-breaking input whose automorphism group is that subgroup. However, finding an input with the desired automorphism group is computationally hard. We overcome this by relaxing from exact to approximate symmetry breaking, leveraging the notion of 2-closure to derive fast algorithms. Theoretically, we show that our subgroup-equivariant networks can simulate equivariant MLPs, and their universality can be guaranteed if the base model is universal. Empirically, we validate our method on symmetry selection for graph and image tasks, as well as multitask and transfer learning for sequence tasks, showing that a single network equivariant to multiple permutation subgroups outperforms both separate equivariant models and a single non-equivariant model.
Summary / 总结
The inclusion of symmetries as an inductive bias, known as equivariance, often improves generalization on geometric data (e.g.
Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids
Authors: Lucas Ferraz, Ana F. Rodrigues, Pedro Giesteira Cotovio, Mafalda Ventura, Gabriela Silva, Ana Sofia Coroadinha, Miguel Machuqueiro, Catia Pesquita
First: 2026-03-19T21:11:17+00:00 · Latest: 2026-03-19T21:11:17+00:00
Abstract
Adeno-associated viral (AAV) vectors are widely used delivery platforms in gene therapy, and the design of improved capsids is key to expanding their therapeutic potential. A central challenge in AAV bioengineering, as in protein design more broadly, is the vast sequence design space relative to the scale of feasible experimental screening. Machine-guided generative approaches provide a powerful means of navigating this landscape and proposing novel protein sequences that satisfy functional constraints. Here, we develop a generative design framework based on protein language models and reinforcement learning to generate highly novel yet functionally plausible AAV capsids. A pretrained model was fine-tuned on experimentally validated capsid sequences to learn patterns associated with viability. Reinforcement learning was then used to guide sequence generation, with a reward function that jointly promoted predicted viability and sequence novelty, thereby enabling exploration beyond regions represented in the training data. Comparative analyses showed that fine-tuning alone produces sequences with high predicted viability but remains biased toward the training distribution, whereas reinforcement learining-guided generation reaches more distant regions of sequence space while maintaining high predicted viability. Finally, we propose a candidate selection strategy that integrates predicted viability, sequence novelty, and biophysical properties to prioritize variants for downstream evaluation. This work establishes a framework for the generative exploration of protein sequence space and advances the application of generative protein language models to AAV bioengineering.
Summary / 总结
Adeno-associated viral (AAV) vectors are widely used delivery platforms in gene therapy, and the design of improved capsids is key to expanding their therapeutic potential.
Adaptive Relative Pose Estimation Framework with Dual Noise Tuning for Safe Approaching Maneuvers
Authors: Batu Candan, Murat Berke Oktay, Simone Servadio
First: 2025-07-22T04:13:03+00:00 · Latest: 2026-03-19T21:10:25+00:00
Abstract
Accurate and robust relative pose estimation is crucial for enabling challenging Active Debris Removal (ADR) missions targeting tumbling derelict satellites such as ESA's ENVISAT. This work presents a complete pipeline integrating advanced computer vision techniques with adaptive nonlinear filtering to address this challenge. A Convolutional Neural Network (CNN), enhanced with image preprocessing, detects structural markers (corners) from chaser imagery, whose 2D coordinates are converted to 3D measurements using camera modeling. These measurements are fused within an Unscented Kalman Filter (UKF) framework, selected for its ability to handle nonlinear relative dynamics, to estimate the full relative pose. Key contributions include the integrated system architecture and a dual adaptive strategy within the UKF: dynamic tuning of the measurement noise covariance compensates for varying CNN measurement uncertainty, while adaptive tuning of the process noise covariance, utilizing measurement residual analysis, accounts for unmodeled dynamics or maneuvers online. This dual adaptation enhances robustness against both measurement imperfections and dynamic model uncertainties. The performance of the proposed adaptive integrated system is evaluated through high-fidelity simulations using a realistic ENVISAT model, comparing estimates against ground truth under various conditions, including measurement outages. This comprehensive approach offers an enhanced solution for robust onboard relative navigation, significantly advancing the capabilities required for safe proximity operations during ADR missions.
Summary / 总结
Accurate and robust relative pose estimation is crucial for enabling challenging Active Debris Removal (ADR) missions targeting tumbling derelict satellites such as ESA's ENVISAT.
Steering Awareness: Detecting Activation Steering from Within
Authors: Joshua Fonseca Rivera, David Demitri Africa
First: 2025-11-26T13:49:43+00:00 · Latest: 2026-03-19T17:37:06+00:00
Abstract
Activation steering -- adding a vector to a model's residual stream to modify its behavior -- is widely used in safety evaluations as if the model cannot detect the intervention. We test this assumption, introducing steering awareness: a model's ability to infer, during its own forward pass, that a steering vector was injected and what concept it encodes. After fine-tuning, seven instruction-tuned models develop strong steering awareness on held-out concepts; the best reaches 95.5% detection, 71.2% concept identification, and zero false positives on clean inputs. This generalizes to unseen steering vector construction methods when their directions have high cosine similarity to the training distribution but not otherwise, indicating a geometric detector rather than a generic anomaly detector. Surprisingly, detection does not confer resistance; on both factual and safety benchmarks, detection-trained models are consistently more susceptible to steering than their base counterparts. Mechanistically, steering awareness arises not from a localized circuit, but from a distributed transformation that progressively rotates diverse injected vectors into a shared detection direction. Activation steering should therefore not be considered an invisible intervention in safety evaluations.
Summary / 总结
Activation steering -- adding a vector to a model's residual stream to modify its behavior -- is widely used in safety evaluations as if the model cannot detect the intervention.
Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control
Authors: Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman
First: 2026-03-19T16:55:33+00:00 · Latest: 2026-03-19T16:55:33+00:00
Comments: Submitted to IEEE Transactions on Computational Social Systems. 17 pages, 9 figures, 10 tables
Abstract
Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods. Existing approaches typically treat all market states uniformly or require manual regime labeling, which is expensive and quickly becomes stale as market dynamics evolve. This paper introduces an adaptive prediction framework that adaptively identifies deviations from normal market conditions and routes data through specialized prediction pathways. The architecture consists of three components: (1) an autoencoder trained on normal market conditions that identifies anomalous regimes through reconstruction error, (2) dual node transformer networks specialized for stable and event-driven market conditions respectively, and (3) a Soft Actor-Critic reinforcement learning controller that adaptively tunes the regime detection threshold and pathway blending weights based on prediction performance feedback. The reinforcement learning component enables the system to learn adaptive regime boundaries, defining anomalies as market states where standard prediction approaches fail. Experiments on 20 S&P 500 stocks spanning 1982 to 2025 demonstrate that the proposed framework achieves 0.68% MAPE for one-day predictions without the reinforcement controller and 0.59% MAPE with the full adaptive system, compared to 0.80% for the baseline integrated node transformer. Directional accuracy reaches 72% with the complete framework. The system maintains robust performance during high-volatility periods, with MAPE below 0.85% when baseline models exceed 1.5%. Ablation studies confirm that each component contributes meaningfully: autoencoder routing accounts for 36% relative MAPE degradation upon removal, followed by the SAC controller at 15% and the dual-path architecture at 7%.
Summary / 总结
Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods.
Infherno: End-to-end Agent-based FHIR Resource Synthesis from Free-form Clinical Notes
Authors: Johann Frei, Nils Feldhus, Lisa Raithel, Roland Roller, Alexander Meyer, Frank Kramer
First: 2025-07-16T14:06:51+00:00 · Latest: 2026-03-19T16:52:38+00:00
Comments: EACL 2026 System Demonstrations | Code: https://github.com/j-frei/Infherno | Demo: https://infherno.misit-augsburg.de
Abstract
For clinical data integration and healthcare services, the HL7 FHIR standard has established itself as a desirable format for interoperability between complex health data. Previous attempts at automating the translation from free-form clinical notes into structured FHIR resources address narrowly defined tasks and rely on modular approaches or LLMs with instruction tuning and constrained decoding. As those solutions frequently suffer from limited generalizability and structural inconformity, we propose an end-to-end framework powered by LLM agents, code execution, and healthcare terminology database tools to address these issues. Our solution, called Infherno, is designed to adhere to the FHIR document schema and competes well with a human baseline in predicting FHIR resources from unstructured text. The implementation features a front end for custom and synthetic data and both local and proprietary models, supporting clinical data integration processes and interoperability across institutions. Gemini 2.5-Pro excels in our evaluation on synthetic and clinical datasets, yet ambiguity and feasibility of collecting ground-truth data remain open problems.
Summary / 总结
For clinical data integration and healthcare services, the HL7 FHIR standard has established itself as a desirable format for interoperability between complex health data.
Multifidelity Simulation-based Inference for Computationally Expensive Simulators
Authors: Anastasia N. Krouglova, Hayden R. Johnson, Basile Confavreux, Michael Deistler, Pedro J. Gonçalves
Venue: ICLR 2026
First: 2025-02-12T13:59:22+00:00 · Latest: 2026-03-19T13:55:55+00:00
Comments: Accepted at ICLR 2026. Available at OpenReview: https://openreview.net/pdf?id=bj0dcKp9t6
Abstract
Across many domains of science, stochastic models are an essential tool to understand the mechanisms underlying empirically observed data. Models can be of different levels of detail and accuracy, with models of high-fidelity (i.e., high accuracy) to the phenomena under study being often preferable. However, inferring parameters of high-fidelity models via simulation-based inference is challenging, especially when the simulator is computationally expensive. We introduce a multifidelity approach to neural posterior estimation that uses transfer learning to leverage inexpensive low-fidelity simulations to efficiently infer parameters of high-fidelity simulators. Our method applies the multifidelity scheme to both amortized and non-amortized neural posterior estimation. We further improve simulation efficiency by introducing a sequential variant that uses an acquisition function targeting the predictive uncertainty of the density estimator to adaptively select high-fidelity parameters. On established benchmark and neuroscience tasks, our approaches require up to two orders of magnitude fewer high-fidelity simulations than current methods, while showing comparable performance. Overall, our approaches open new opportunities to perform efficient Bayesian inference on computationally expensive simulators.
Summary / 总结
Across many domains of science, stochastic models are an essential tool to understand the mechanisms underlying empirically observed data.
Zipper-LoRA: Dynamic Parameter Decoupling for Speech-LLM based Multilingual Speech Recognition
Authors: Yuxiang Mei, Delai Qiu, Shengping Liu, Jiaen Liang, Yanhua Long
First: 2026-03-18T10:04:50+00:00 · Latest: 2026-03-19T13:47:01+00:00
Comments: 13 pages, 8 figures
Abstract
Speech Large Language Models (Speech-LLMs) have emerged as a powerful approach for automatic speech recognition (ASR) by aligning speech encoders with large language models. However, adapting these systems to multilingual settings with imbalanced data distributions remains challenging. In such scenarios, a stability-plasticity dilemma often arises: fully shared Parameter-Efficient Fine-Tuning (PEFT) can cause negative inter-lingual interference for under-represented languages, while fully language-specific tuning limits the cross-lingual beneficial knowledge transfer needed for low-resource tasks. To address this, we propose Zipper-LoRA, a novel rank-level decoupling framework with three variants (Static, Hard, and Soft) that dynamically synthesizes LoRA updates from shared and language-specific subspaces. By using a lightweight language-conditioned router, Zipper-LoRA dynamically controls the contribution of each subspace at the LoRA rank level, enabling fine-grained sharing where languages are compatible and strict decoupling when conflicts occur. To further stabilize optimization under imbalanced data, we propose a two-stage training strategy with an Initial-B warm start that significantly accelerates convergence. Experiments on a 12-language mixed-resource setting show that Zipper-LoRA consistently outperforms both fully shared and independent baselines, particularly in extremely low-resource scenarios. Moreover, we demonstrate that these gains are robust across both chunked and non-chunked encoder configurations, confirming the framework's reliability for practical, large-scale multilingual ASR. Our code and data will be available at https://github.com/YuCeong-May/Zipper-LoRA for reproducibility.
Summary / 总结
Speech Large Language Models (Speech-LLMs) have emerged as a powerful approach for automatic speech recognition (ASR) by aligning speech encoders with large language models.
Transfer Learning for Neutrino Scattering: Domain Adaptation with GANs
Authors: Jose L. Bonilla, Krzysztof M. Graczyk, Artur M. Ankowski, Rwik Dharmapal Banerjee, Beata E. Kowal, Hemant Prasad, Jan T. Sobczyk
Venue: Phys.Rev.D 113 (2026) 5, 053001
First: 2025-08-18T15:08:13+00:00 · Latest: 2026-03-19T11:45:48+00:00
Comments: 23 pages, 22 figures, together with supplement, as published in Phys. Rev. D
Abstract
Transfer learning (TL) is used to extrapolate the physics information encoded in a Generative Adversarial Network (GAN) trained on synthetic neutrino-carbon inclusive scattering data to related processes such as neutrino-argon and antineutrino-carbon interactions. We investigate how much of the underlying lepton-nucleus dynamics is shared across different targets and processes. We also assess the effectiveness of TL when training data is obtained from a different neutrino-nucleus interaction model. Our results show that TL not only reproduces key features of lepton kinematics, including the quasielastic and $Δ$-resonance peaks, but also significantly outperforms generative models trained from scratch. Using data sets of 10,000 and 100,000 events, we find that TL maintains high accuracy even with limited statistics. Our findings demonstrate that TL provides a well-motivated and efficient framework for modeling (anti)neutrino-nucleus interactions and for constructing next-generation neutrino-scattering event generators, particularly valuable when experimental data are sparse.
Summary / 总结
Transfer learning (TL) is used to extrapolate the physics information encoded in a Generative Adversarial Network (GAN) trained on synthetic neutrino-carbon inclusive scattering data to related processes such as neutrino-argon and antineutrino-carbon interactions.
SIA: A Synthesize-Inject-Align Framework for Knowledge-Grounded and Secure E-commerce Search LLMs with Industrial Deployment
Authors: Zhouwei Zhai, Mengxiang Chen, Anmeng Zhang
First: 2026-03-17T05:41:22+00:00 · Latest: 2026-03-19T08:59:40+00:00
Abstract
Large language models offer transformative potential for e-commerce search by enabling intent-aware recommendations. However, their industrial deployment is hindered by two critical challenges: (1) knowledge hallucination due to insufficient encoding of dynamic, fine-grained product knowledge, and (2) security vulnerabilities under jailbreak attacks that threaten compliance. To address these issues, we propose SIA--a Synthesize-Inject-Align framework for building knowledgeable and secure e-commerce search LLMs. Our approach first synthesizes high-quality natural language corpus by combining structured knowledge graphs with unstructured behavioral logs, augmented with reasoning chains and safety-aware data. We then introduce a parameter-efficient pre-training strategy based on Depth Up-Scaling to inject domain knowledge while preserving general capabilities. Finally, a dual-path alignment method via multi-task instruction tuning and adversarial training strengthens both task performance and safety robustness. The framework has been deployed at JD.com, China's largest self-operated e-commerce platform, where A/B tests across five core search scenarios demonstrate significant improvements in key business metrics, validating its industrial effectiveness and scalability.
Summary / 总结
Large language models offer transformative potential for e-commerce search by enabling intent-aware recommendations.
SwiftGS: Episodic Priors for Immediate Satellite Surface Recovery
Authors: Rong Fu, Jiekai Wu, Haiyun Wei, Xiaowen Ma, Shiyin Lin, Kangan Qian, Chuang Liu, Jianyuan Ni, Simon James Fong
First: 2026-03-19T08:59:07+00:00 · Latest: 2026-03-19T08:59:07+00:00
Comments: 24 pages, 6 figures
Abstract
Rapid, large-scale 3D reconstruction from multi-date satellite imagery is vital for environmental monitoring, urban planning, and disaster response, yet remains difficult due to illumination changes, sensor heterogeneity, and the cost of per-scene optimization. We introduce SwiftGS, a meta-learned system that reconstructs 3D surfaces in a single forward pass by predicting geometry-radiation-decoupled Gaussian primitives together with a lightweight SDF, replacing expensive per-scene fitting with episodic training that captures transferable priors. The model couples a differentiable physics graph for projection, illumination, and sensor response with spatial gating that blends sparse Gaussian detail and global SDF structure, and incorporates semantic-geometric fusion, conditional lightweight task heads, and multi-view supervision from a frozen geometric teacher under an uncertainty-aware multi-task loss. At inference, SwiftGS operates zero-shot with optional compact calibration and achieves accurate DSM reconstruction and view-consistent rendering at significantly reduced computational cost, with ablations highlighting the benefits of the hybrid representation, physics-aware rendering, and episodic meta-training.
Summary / 总结
Rapid, large-scale 3D reconstruction from multi-date satellite imagery is vital for environmental monitoring, urban planning, and disaster response, yet remains difficult due to illumination changes, sensor heterogeneity, and the cost of per-scene optimization.
Representation Finetuning for Continual Learning
Authors: Haihua Luo, Xuming Ran, Tommi Kärkkäinen, Huiyan Xue, Zhonghua Chen, Qi Xu, Fengyu Cong
First: 2026-03-11T18:15:31+00:00 · Latest: 2026-03-19T06:17:22+00:00
Abstract
The world is inherently dynamic, and continual learning aims to enable models to adapt to ever-evolving data streams. While pre-trained models have shown powerful performance in continual learning, they still require finetuning to adapt effectively to downstream tasks. However, prevailing Parameter-Efficient Fine-Tuning (PEFT) methods operate through empirical, black-box optimization at the weight level. These approaches lack explicit control over representation drift, leading to sensitivity to domain shifts and catastrophic forgetting in continual learning scenarios. In this work, we introduce Continual Representation Learning (CoRe), a novel framework that for the first time shifts the finetuning paradigm from weight space to representation space. Unlike conventional methods, CoRe performs task-specific interventions within a low-rank linear subspace of hidden representations, adopting a learning process with explicit objectives, which ensures stability for past tasks while maintaining plasticity for new ones. By constraining updates to a low-rank subspace, CoRe achieves exceptional parameter efficiency. Extensive experiments across multiple continual learning benchmarks demonstrate that CoRe not only preserves parameter efficiency but also significantly outperforms existing state-of-the-art methods. Our work introduces representation finetuning as a new, more effective and interpretable paradigm for continual learning.
Summary / 总结
The world is inherently dynamic, and continual learning aims to enable models to adapt to ever-evolving data streams.
Expert Personas Improve LLM Alignment but Damage Accuracy: Bootstrapping Intent-Based Persona Routing with PRISM
Authors: Zizhao Hu, Mohammad Rostami, Jesse Thomason
First: 2026-03-19T05:28:21+00:00 · Latest: 2026-03-19T05:28:21+00:00
Abstract
Persona prompting can steer LLM generation towards a domain-specific tone and pattern. This behavior enables use cases in multi-agent systems where diverse interactions are crucial and human-centered tasks require high-level human alignment. Prior works provide mixed opinions on their utility: some report performance gains when using expert personas for certain domains and their contribution to data diversity in synthetic data creation, while others find near-zero or negative impact on general utility. To fully leverage the benefits of the LLM persona and avoid its harmfulness, a more comprehensive investigation of the mechanism is crucial. In this work, we study how model optimization, task type, prompt length, and placement can impact expert persona effectiveness across instruction-tuned and reasoning LLMs, and provide insight into conditions under which expert personas fail and succeed. Based on our findings, we developed a pipeline to fully leverage the benefits of an expert persona, named PRISM (Persona Routing via Intent-based Self-Modeling), which self-distills an intent-conditioned expert persona into a gated LoRA adapter through a bootstrapping process that requires no external data, models, or knowledge. PRISM enhances human preference and safety alignment on generative tasks while maintaining accuracy on discriminative tasks across all models, with minimal memory and computing overhead.
Summary / 总结
Persona prompting can steer LLM generation towards a domain-specific tone and pattern.
Interpretable Prostate Cancer Detection using a Small Cohort of MRI Images
Authors: Vahid Monfared, Mohammad Hadi Gharib, Ali Sabri, Maryam Shahali, Farid Rashidi, Amit Mehta, Reza Rawassizadeh
First: 2026-03-19T03:40:07+00:00 · Latest: 2026-03-19T03:40:07+00:00
Comments: 26 pages, 5 figures, 7 tables
Abstract
Prostate cancer is a leading cause of mortality in men, yet interpretation of T2-weighted prostate MRI remains challenging due to subtle and heterogeneous lesions. We developed an interpretable framework for automatic cancer detection using a small dataset of 162 T2-weighted images (102 cancer, 60 normal), addressing data scarcity through transfer learning and augmentation. We performed a comprehensive comparison of Vision Transformers (ViT, Swin), CNNs (ResNet18), and classical methods (Logistic Regression, SVM, HOG+SVM). Transfer-learned ResNet18 achieved the best performance (90.9% accuracy, 95.2% sensitivity, AUC 0.905) with only 11M parameters, while Vision Transformers showed lower performance despite substantially higher complexity. Notably, HOG+SVM achieved comparable accuracy (AUC 0.917), highlighting the effectiveness of handcrafted features in small datasets. Unlike state-of-the-art approaches relying on biparametric MRI (T2+DWI) and large cohorts, our method achieves competitive performance using only T2-weighted images, reducing acquisition complexity and computational cost. In a reader study of 22 cases, five radiologists achieved a mean sensitivity of 67.5% (Fleiss Kappa = 0.524), compared to 95.2% for the AI model, suggesting potential for AI-assisted screening to reduce missed cancers and improve consistency. Code and data are publicly available.
Summary / 总结
Prostate cancer is a leading cause of mortality in men, yet interpretation of T2-weighted prostate MRI remains challenging due to subtle and heterogeneous lesions.
Mind the Rarities: Can Rare Skin Diseases Be Reliably Diagnosed via Diagnostic Reasoning?
Authors: Yang Liu, Jiyao Yang, Hongjin Zhao, Xiaoyong Li, Yanzhe Ji, Xingjian Li, Runmin Jiang, Tianyang Wang, Saeed Anwar, Dongwoo Kim, Yue Yao, Zhenyue Qin, Min Xu
First: 2026-03-19T02:25:36+00:00 · Latest: 2026-03-19T02:25:36+00:00
Abstract
Large vision-language models (LVLMs) demonstrate strong performance in dermatology; however, evaluating diagnostic reasoning for rare conditions remains largely unexplored. Existing benchmarks focus on common diseases and assess only final accuracy, overlooking the clinical reasoning process, which is critical for complex cases. We address this gap by constructing DermCase, a long-context benchmark derived from peer-reviewed case reports. Our dataset contains 26,030 multi-modal image-text pairs and 6,354 clinically challenging cases, each annotated with comprehensive clinical information and step-by-step reasoning chains. To enable reliable evaluation, we establish DermLIP-based similarity metrics that achieve stronger alignment with dermatologists for assessing differential diagnosis quality. Benchmarking 22 leading LVLMs exposes significant deficiencies across diagnosis accuracy, differential diagnosis, and clinical reasoning. Fine-tuning experiments demonstrate that instruction tuning substantially improves performance while Direct Preference Optimization (DPO) yields minimal gains. Systematic error analysis further reveals critical limitations in current models' reasoning capabilities.
Summary / 总结
Large vision-language models (LVLMs) demonstrate strong performance in dermatology; however, evaluating diagnostic reasoning for rare conditions remains largely unexplored.
1S-DAug: One-Shot Data Augmentation for Robust Few-Shot Generalization
Authors: Yunwei Bai, Ying Kiat Tan, Yao Shu, Tsuhan Chen
First: 2026-01-27T08:01:47+00:00 · Latest: 2026-03-19T00:00:52+00:00
Abstract
Few-shot learning (FSL) challenges model generalization to novel classes based on just a few shots of labeled examples, a testbed where traditional test-time augmentations fail to be effective. We introduce 1S-DAug, a one-shot generative augmentation operator that synthesizes diverse yet faithful variants from just one example image at test time. 1S-DAug couples traditional geometric perturbations with controlled noise injection and a denoising diffusion process conditioned on the original image. The generated images are then encoded and aggregated, alongside the original image, into a combined representation for more robust FSL predictions. Integrated as a training-free model-agnostic plugin, 1S-DAug consistently improves FSL across standard benchmarks of 4 different datasets without any model parameter update, including achieving up to 20\% proportional accuracy improvement on the miniImagenet 5-way-1-shot benchmark. Code will be released.
Summary / 总结
Few-shot learning (FSL) challenges model generalization to novel classes based on just a few shots of labeled examples, a testbed where traditional test-time augmentations fail to be effective.
CeRA: Breaking the Linear Ceiling of Low-Rank Adaptation via Manifold Expansion
Authors: Hung-Hsuan Chen
First: 2026-02-26T11:55:25+00:00 · Latest: 2026-03-18T23:49:58+00:00
Abstract
Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT). However, it faces a critical ``linear ceiling'' in complex reasoning tasks: simply increasing the rank yields diminishing returns due to intrinsic linear constraints. We introduce CeRA (Capacity-enhanced Rank Adaptation), a weight-level parallel adapter that injects SiLU gating and structural dropout to induce manifold expansion. On the SlimOrca benchmark, CeRA breaks this linear barrier: at rank 64 (PPL 3.89), it outperforms LoRA at rank 512 (PPL 3.90), demonstrating superior spectral efficiency. This advantage generalizes to downstream mathematical reasoning. We reveal a profound impact of task complexity: In fundamental arithmetic (GSM8K), CeRA matches standard baselines, but in the highly complex MATH dataset, CeRA demonstrates extreme parameter efficiency. Remarkably, CeRA at rank 64 (Pass@1 16.36\%) outperforms LoRA at rank 512 (Pass@1 15.72\%), achieving superior reasoning accuracy with only 1/8 of the parameter budget. Mechanism analysis via Singular Value Decomposition (SVD) confirms that CeRA activates the dormant tail of the singular value spectrum, effectively preventing the rank collapse observed in linear methods.
Summary / 总结
Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT).
EDM-ARS: A Domain-Specific Multi-Agent System for Automated Educational Data Mining Research
Authors: Chenguang Pan, Zhou Zhang, Weixuan Xiao, Chengyuan Yao
First: 2026-03-18T20:45:45+00:00 · Latest: 2026-03-18T20:45:45+00:00
Abstract
In this technical report, we present the Educational Data Mining Automated Research System (EDM-ARS), a domain-specific multi-agent pipeline that automates end-to-end educational data mining (EDM) research. We conceptualize EDM-ARS as a general framework for domain-aware automated research pipelines, where educational expertise is embedded into each stage of the research lifecycle. As a first instantiation of this framework, we focus on predictive modeling tasks. Within this scope, EDM-ARS orchestrates five specialized LLM-powered agents (ProblemFormulator, DataEngineer, Analyst, Critic, and Writer) through a state-machine coordinator that supports revision loops, checkpoint-based recovery, and sandboxed code execution. Given a research prompt and a dataset, EDM-ARS produces a complete LaTeX manuscript with real Semantic Scholar citations, validated machine learning analyses, and automated methodological peer review. We also provide a detailed description of the system architecture, the three-tier data registry design that encodes educational domain expertise, the specification of each agent, the inter-agent communication protocol, and mechanisms for error-handling and self-correction. Finally, we discuss current limitations, including single-dataset scope and formulaic paper output, and outline a phased roadmap toward causal inference, transfer learning, psychometric, and multi-dataset generalization. EDM-ARS is released as an open-source project to support the educational research community.
Summary / 总结
In this technical report, we present the Educational Data Mining Automated Research System (EDM-ARS), a domain-specific multi-agent pipeline that automates end-to-end educational data mining (EDM) research.
Signal in the Noise: Polysemantic Interference Transfers and Predicts Cross-Model Influence
Authors: Bofan Gong, Shiyang Lai, James Evans, Dawn Song
Venue: ICLR 2026
First: 2025-05-16T18:20:42+00:00 · Latest: 2026-03-18T17:55:08+00:00
Abstract
Polysemanticity is pervasive in language models and remains a major challenge for interpretation and model behavioral control. Leveraging sparse autoencoders (SAEs), we map the polysemantic topology of two small models (Pythia-70M and GPT-2-Small) to identify SAE feature pairs that are semantically unrelated yet exhibit interference within models. We intervene at four foci (prompt, token, feature, neuron) and measure induced shifts in the next-token prediction distribution, uncovering polysemantic structures that expose a systematic vulnerability in these models. Critically, interventions distilled from counterintuitive interference patterns shared by two small models transfer reliably to larger instruction-tuned models (Llama-3.1-8B/70B-Instruct and Gemma-2-9B-Instruct), yielding predictable behavioral shifts without access to model internals. These findings challenge the view that polysemanticity is purely stochastic, demonstrating instead that interference structures generalize across scale and family. Such generalization suggests a convergent, higher-order organization of internal representations, which is only weakly aligned with intuition and structured by latent regularities, offering new possibilities for both black-box control and theoretical insight into human and artificial cognition.
Summary / 总结
Polysemanticity is pervasive in language models and remains a major challenge for interpretation and model behavioral control.
Integrating Arithmetic Learning Improves Mathematical Reasoning in Smaller Models
Authors: Neeraj Gangwar, Suma P Bhat, Nickvash Kani
First: 2025-02-18T13:43:06+00:00 · Latest: 2026-03-18T17:43:04+00:00
Comments: Accepted to LREC 2026
Abstract
While large models pre-trained on high-quality data exhibit excellent performance on mathematical reasoning (e.g., GSM8k, MultiArith), it remains challenging to specialize smaller models for these tasks. Common approaches to address this challenge include knowledge distillation from large teacher models and data augmentation (e.g., rephrasing questions and generating synthetic solutions). Despite these efforts, smaller models struggle with arithmetic computations, leading to errors in mathematical reasoning. In this work, we leverage a synthetic arithmetic dataset generated programmatically to enhance the reasoning capabilities of smaller models. We investigate two key approaches to incorporate this dataset: (1) intermediate fine-tuning, in which a model is fine-tuned on the arithmetic dataset before training it on a reasoning dataset, and (2) integrating the arithmetic dataset into an instruction-tuning mixture, allowing the model to learn arithmetic skills alongside general instruction-following abilities. Our experiments on multiple reasoning benchmarks demonstrate that incorporating an arithmetic dataset, whether through targeted fine-tuning or within an instruction-tuning mixture, enhances models' arithmetic capabilities, thereby improving their mathematical reasoning performance.
Summary / 总结
While large models pre-trained on high-quality data exhibit excellent performance on mathematical reasoning (e.g., GSM8k, MultiArith), it remains challenging to specialize smaller models for these tasks.
Beyond Muon: MUD (MomentUm Decorrelation) for Faster Transformer Training
Authors: Ben S. Southworth, Stephen Thomas
First: 2026-03-18T17:37:31+00:00 · Latest: 2026-03-18T17:37:31+00:00
Abstract
Orthogonalized-momentum optimizers such as Muon improve transformer training by approximately whitening/orthogonalizing matrix-valued momentum updates via a short polar-decomposition iteration. However, polar-factor approximations typically require multiple large matrix multiplications, and the resulting overhead can be substantial and hardware-dependent. We introduce MUD (MomentUm Decorrelation), a complementary whitening approach that replaces Muon's polar update with a triangular (Cholesky-like) whitening surrogate inspired by classical Gram--Schmidt and Gauss-Seidel ideas. We show that row-orthonormal matrices are fixed points of the MUD map, relate the inner step to symmetric Gauss-Seidel preconditioning of the Gram matrix, and prove quadratic local convergence near the fixed point. In terms of time-to-perplexity, MUD yields consistent 10-50\% wall-clock improvements over tuned AdamW and Muon in time-to-perplexity, typically converging slightly slower per step than Muon but with substantially lower optimizer overhead -- relative to Muon, MUD improves peak tokens/s by roughly $1.3-2.6\times$ across most settings and up to nearly $3\times$ on GPT-2 large on an A100. We also demonstrate training a ESM-2 150M protein language model, where MUD matches Muon-level validation perplexity in significantly less wall-clock time.
Summary / 总结
Orthogonalized-momentum optimizers such as Muon improve transformer training by approximately whitening/orthogonalizing matrix-valued momentum updates via a short polar-decomposition iteration.
Gender Disambiguation in Machine Translation: Diagnostic Evaluation in Decoder-Only Architectures
Authors: Chiara Manna, Hosein Mohebbi, Afra Alishahi, Frédéric Blain, Eva Vanmassenhove
First: 2026-03-18T17:26:36+00:00 · Latest: 2026-03-18T17:26:36+00:00
Abstract
While Large Language Models achieve state-of-the-art results across a wide range of NLP tasks, they remain prone to systematic biases. Among these, gender bias is particularly salient in MT, due to systematic differences across languages in whether and how gender is marked. As a result, translation often requires disambiguating implicit source signals into explicit gender-marked forms. In this context, standard benchmarks may capture broad disparities but fail to reflect the full complexity of gender bias in modern MT. In this paper, we extend recent frameworks on bias evaluation by: (i) introducing a novel measure coined "Prior Bias", capturing a model's default gender assumptions, and (ii) applying the framework to decoder-only MT models. Our results show that, despite their scale and state-of-the-art status, decoder-only models do not generally outperform encoder-decoder architectures on gender-specific metrics; however, post-training (e.g., instruction tuning) not only improves contextual awareness but also reduces the masculine Prior Bias.
Summary / 总结
While Large Language Models achieve state-of-the-art results across a wide range of NLP tasks, they remain prone to systematic biases.