Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Authors: Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar
Venue: ICLR 2026
First: 2025-06-18T05:48:05+00:00 · Latest: 2026-03-06T17:55:20+00:00
Comments: ICLR 2026. Code available at https://github.com/Ksartik/sysformer
Abstract
As large language models (LLMs) are deployed in safety-critical settings, it is essential to ensure that their responses comply with safety standards. Prior research has revealed that LLMs often fail to grasp the notion of safe behaviors, resulting in either unjustified refusals to harmless prompts or the generation of harmful content. While substantial efforts have been made to improve their robustness, existing defenses often rely on costly fine-tuning of model parameters or employ suboptimal heuristic techniques. In this work, we take a novel approach to safeguard LLMs by learning to adapt the system prompts in instruction-tuned LLMs. While LLMs are typically pre-trained to follow a fixed system prompt, we investigate the impact of tailoring the system prompt to each specific user input on the safety of the responses. To this end, we propose $\textbf{Sysformer}$, a trans$\textbf{former}$ model that updates an initial $\textbf{sys}$tem prompt to a more robust system prompt in the LLM input embedding space while attending to the user prompt. While keeping the LLM parameters frozen, the Sysformer is trained to refuse to respond to a set of harmful prompts while responding ideally to a set of safe ones. Through extensive experiments on $5$ LLMs from different families and $2$ recent benchmarks, we demonstrate that Sysformer can significantly enhance the robustness of LLMs, leading to upto $80\%$ gain in the refusal rate on harmful prompts while enhancing the compliance with the safe prompts by upto $90\%$. Results also generalize well to sophisticated jailbreaking attacks, making LLMs upto $100\%$ more robust against different attack strategies. We hope our findings lead to cheaper safeguarding of LLMs and motivate future investigations into designing variable system prompts.
Summary / 总结
As large language models (LLMs) are deployed in safety-critical settings, it is essential to ensure that their responses comply with safety standards.
NOBLE: Accelerating Transformers with Nonlinear Low-Rank Branches
Authors: Ethan Smith
First: 2026-03-06T17:22:04+00:00 · Latest: 2026-03-06T17:22:04+00:00
Comments: 14 pages, 5 figures, 5 tables
Abstract
We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transformer linear layers. Unlike LoRA and other parameter-efficient fine-tuning (PEFT) methods, NOBLE is designed for pretraining from scratch. The branch is a permanent part of the architecture as opposed to an adapter for finetuning on top of frozen weights. The branch computes σ(xWdown)Wup where σ is a learnable nonlinearity. We evaluate several activation functions and find that CosNet, a two-layer cosine nonlinearity with learnable frequency and phase with a linear projection in between them in the bottleneck space, performs best. NOBLE achieves substantial improvements with minimal overhead: up to 1.47x step speedup to reach baseline eval loss (up to 32% fewer training steps), with as low as 4% additional parameters and 7% step time overhead, resulting in up to 1.22x net wallclock speedup. Experiments on LLMs (250M and 1.5B parameters), BERT, VQGAN, and ViT consistently show improved training efficiency. We identify one caveat: Mixup/CutMix augmentation interferes with NOBLE's benefits in Imagenet classification along with other stochastic augmentations, but when disabled, ViT also improves. This discrepancy is possibly explained by regularization techniques that encourage smoother fits to the target function while NOBLE may specialize more in sharper aspects of the target function.
Summary / 总结
We introduce NOBLE (Nonlinear lOw-rank Branch for Linear Enhancement), an architectural augmentation that adds nonlinear low-rank branches to transformer linear layers.
How Reliable is Language Model Micro-Benchmarking?
Authors: Gregory Yauney, Shahzaib Saqib Warraich, Swabha Swayamdipta
Venue: ICLR 2026
First: 2025-10-09T18:37:03+00:00 · Latest: 2026-03-06T16:42:35+00:00
Comments: Published at ICLR 2026
Abstract
Micro-benchmarking offers a solution to the often prohibitive time and cost of language model development: evaluate on a very small subset of existing benchmarks. Can these micro-benchmarks, however, rank models as consistently as the full benchmarks they replace? And can they rank models more consistently than selecting a random subset of data points? In many scenarios, we find that the answer is no. We introduce a meta-evaluation measure for micro-benchmarking which investigates how well a micro-benchmark can rank two models as a function of their performance difference on the full benchmark. This approach can determine which model pairs can be ranked correctly by a micro-benchmark, allowing for a finer-grained analysis of the trade-off between micro-benchmark size and reliability. Prior work has suggested selecting as few as 10 examples; we find that no micro-benchmarking method can consistently rank model pairs 3.5 points of accuracy apart on MMLU-Pro or 4 points apart on BIG-bench Hard. In order to consistently rank model pairs with relatively similar performances, we show that often as many as 250 examples must be selected, at which point random sampling is competitive with existing micro-benchmarking methods. When comparing only 8B instruction-tuned models on MMLU-Pro micro-benchmarks with 25 examples, we find that more than half of pairwise comparisons are not likely to be preserved. Our work provides actionable guidance for both micro-benchmark users and developers in navigating the trade-off between evaluation efficiency and reliability.
Summary / 总结
Micro-benchmarking offers a solution to the often prohibitive time and cost of language model development: evaluate on a very small subset of existing benchmarks.
From Prompting to Preference Optimization: A Comparative Study of LLM-based Automated Essay Scoring
Authors: Minh Hoang Nguyen, Vu Hoang Pham, Xuan Thanh Huynh, Phuc Hong Mai, Vinh The Nguyen, Quang Nhut Huynh, Huy Tien Nguyen, Tung Le
First: 2026-03-06T15:59:32+00:00 · Latest: 2026-03-06T15:59:32+00:00
Comments: 19 pages, 10 figures, 7 tables
Abstract
Large language models (LLMs) have recently reshaped Automated Essay Scoring (AES), yet prior studies typically examine individual techniques in isolation, limiting understanding of their relative merits for English as a Second Language (L2) writing. To bridge this gap, we presents a comprehensive comparison of major LLM-based AES paradigms on IELTS Writing Task~2. On this unified benchmark, we evaluate four approaches: (i) encoder-based classification fine-tuning, (ii) zero- and few-shot prompting, (iii) instruction tuning and Retrieval-Augmented Generation (RAG), and (iv) Supervised Fine-Tuning combined with Direct Preference Optimization (DPO) and RAG. Our results reveal clear accuracy-cost-robustness trade-offs across methods, the best configuration, integrating k-SFT and RAG, achieves the strongest overall results with F1-Score 93%. This study offers the first unified empirical comparison of modern LLM-based AES strategies for English L2, promising potential in auto-grading writing tasks. Code is public at https://github.com/MinhNguyenDS/LLM_AES-EnL2
Summary / 总结
Large language models (LLMs) have recently reshaped Automated Essay Scoring (AES), yet prior studies typically examine individual techniques in isolation, limiting understanding of their relative merits for English as a Second Language (L2) writing.
PACE: Marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization
Authors: Yao Ni, Shan Zhang, Piotr Koniusz
Venue: NeurIPS 2024 spotlight
First: 2024-09-25T17:56:00+00:00 · Latest: 2026-03-06T15:00:32+00:00
Comments: Accepted by NeurIPS 2024 as a spotlight
Abstract
Parameter-Efficient Fine-Tuning (PEFT) effectively adapts pre-trained transformers to downstream tasks. However, the optimization of tasks performance often comes at the cost of generalizability in fine-tuned models. To address this issue, we theoretically connect smaller weight gradient norms during training and larger datasets to the improvements in model generalization. Motivated by this connection, we propose reducing gradient norms for enhanced generalization and aligning fine-tuned model with the pre-trained counterpart to retain knowledge from large-scale pre-training data. Yet, naive alignment does not guarantee gradient reduction and can potentially cause gradient explosion, complicating efforts to manage gradients. To address such an issue, we propose PACE, marrying generalization of PArameter-efficient fine-tuning with Consistency rEgularization. We perturb features learned from the adapter with the multiplicative noise and ensure the fine-tuned model remains consistent for same sample under different perturbations. Theoretical analysis shows that PACE not only implicitly regularizes gradients for enhanced generalization, but also implicitly aligns the fine-tuned and pre-trained models to retain knowledge. Experimental evidence supports our theories. PACE surpasses existing PEFT methods in visual adaptation tasks (VTAB-1k, FGVC, few-shot learning, domain adaptation) showcasing its potential for resource-efficient fine-tuning. It also improves LoRA in text classification (GLUE) and mathematical reasoning (GSM-8K). The code is available at https://github.com/MaxwellYaoNi/PACE
Summary / 总结
Parameter-Efficient Fine-Tuning (PEFT) effectively adapts pre-trained transformers to downstream tasks.
FLoRG: Federated Fine-tuning with Low-rank Gram Matrices and Procrustes Alignment
Authors: Chuiyang Meng, Ming Tang, Vincent W. S. Wong
First: 2026-02-19T05:35:23+00:00 · Latest: 2026-03-06T07:52:15+00:00
Abstract
Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently. Federated learning (FL) further facilitates this process by enabling collaborative fine-tuning across distributed clients without sharing private data. However, the use of two separate low-rank matrices in LoRA for federated fine-tuning introduces two types of challenges. First, aggregation error can arise from separately aggregating the two low-rank matrices. Second, even if the server aggregates the product of two low-rank matrices, it needs to decompose the aggregated matrix back into low-rank matrices. Since the decomposition is not unique, it can lead to decomposition drift. To tackle the aforementioned challenges, we propose federated low-rank Gram-matrix aggregation (FLoRG), a federated fine-tuning framework which employs a single low-rank matrix for fine-tuning and aggregates its Gram matrix (i.e., the matrix of inner products of its column vectors). FLoRG can eliminate the aggregation error and reduce the communication overhead. It also minimizes the decomposition drift by introducing a Procrustes alignment approach which aligns the decomposed matrix between consecutive fine-tuning rounds for consistent updates. We theoretically analyze the convergence of FLoRG and prove that adopting the Procrustes alignment results in a tighter convergence bound. Experimental results across multiple LLM fine-tuning benchmarks demonstrate that FLoRG outperforms five state-of-the-art baseline schemes by providing higher downstream task accuracy and can reduce the communication overhead by up to 2041$\times$.
Summary / 总结
Parameter-efficient fine-tuning techniques such as low-rank adaptation (LoRA) enable large language models (LLMs) to adapt to downstream tasks efficiently.
Rigidity-Aware Geometric Pretraining for Protein Design and Conformational Ensembles
Authors: Zhanghan Ni, Yanjing Li, Zeju Qiu, Bernhard Schölkopf, Hongyu Guo, Weiyang Liu, Shengchao Liu
First: 2026-03-02T21:32:30+00:00 · Latest: 2026-03-06T07:05:13+00:00
Comments: The Fourteenth International Conference on Learning Representations; Code available at: https://github.com/ZhanghanNi/RigidSSL.git
Abstract
Generative models have recently advanced $\textit{de novo}$ protein design by learning the statistical regularities of natural structures. However, current approaches face three key limitations: (1) Existing methods cannot jointly learn protein geometry and design tasks, where pretraining can be a solution; (2) Current pretraining methods mostly rely on local, non-rigid atomic representations for property prediction downstream tasks, limiting global geometric understanding for protein generation tasks; and (3) Existing approaches have yet to effectively model the rich dynamic and conformational information of protein structures. To overcome these issues, we introduce $\textbf{RigidSSL}$ ($\textit{Rigidity-Aware Self-Supervised Learning}$), a geometric pretraining framework that front-loads geometry learning prior to generative finetuning. Phase I (RigidSSL-Perturb) learns geometric priors from 432K structures from the AlphaFold Protein Structure Database with simulated perturbations. Phase II (RigidSSL-MD) refines these representations on 1.3K molecular dynamics trajectories to capture physically realistic transitions. Underpinning both phases is a bi-directional, rigidity-aware flow matching objective that jointly optimizes translational and rotational dynamics to maximize mutual information between conformations. Empirically, RigidSSL variants improve designability by up to 43% while enhancing novelty and diversity in unconditional generation. Furthermore, RigidSSL-Perturb improves the success rate by 5.8% in zero-shot motif scaffolding and RigidSSL-MD captures more biophysically realistic conformational ensembles in G protein-coupled receptor modeling.
Summary / 总结
Generative models have recently advanced $\textit{de novo}$ protein design by learning the statistical regularities of natural structures.
Improving the accuracy of physics-informed neural networks via last-layer retraining
Authors: Saad Qadeer, Panos Stinis
First: 2026-03-04T23:28:19+00:00 · Latest: 2026-03-06T02:46:39+00:00
Comments: Approved for release by Pacific Northwest National Laboratory
Abstract
Physics-informed neural networks (PINNs) are a versatile tool in the burgeoning field of scientific machine learning for solving partial differential equations (PDEs). However, determining suitable training strategies for them is not obvious, with the result that they typically yield moderately accurate solutions. In this article, we propose a method for improving the accuracy of PINNs by coupling them with a post-processing step that seeks the best approximation in a function space associated with the network. We find that our method yields errors four to five orders of magnitude lower than those of the parent PINNs across architectures and dimensions. Moreover, we can reuse the basis functions for the linear space in more complex settings, such as time-dependent and nonlinear problems, allowing for transfer learning. Our approach also provides a residual-based metric that allows us to optimally choose the number of basis functions employed.
Summary / 总结
Physics-informed neural networks (PINNs) are a versatile tool in the burgeoning field of scientific machine learning for solving partial differential equations (PDEs).
Self-Auditing Parameter-Efficient Fine-Tuning for Few-Shot 3D Medical Image Segmentation
Authors: Son Thai Ly, Hien V. Nguyen
First: 2026-03-06T02:17:12+00:00 · Latest: 2026-03-06T02:17:12+00:00
Abstract
Adapting foundation models to new clinical sites remains challenging in practice. Domain shift and scarce annotations must be handled by experts, yet many clinical groups do not have ready access to skilled AI engineers to tune adapter designs and training recipes. As a result, adaptation cycles can stretch from weeks to months, particularly in few-shot settings. Existing PEFT methods either require manual adapter configuration or automated searches that are computationally infeasible in few-shot 3D settings. We propose SEA-PEFT (SElf-Auditing Parameter-Efficient Fine-Tuning) to automate this process. SEA-PEFT treats adapter configuration as an online allocation problem solved during fine-tuning rather than through manual, fixed-topology choices. SEA-PEFT uses a search-audit-allocate loop that trains active adapters, estimates each adapter's Dice utility by momentarily toggling it off, and then reselects the active set under a parameter budget using a greedy knapsack allocator. Exponential Moving Average and Interquartile Range smoothing, together with a Finite-State Ranking controller, stabilize the loop and improve reliability in high-noise few-shot regimes. On TotalSegmentator and FLARE'22, SEA-PEFT improves mean Dice by 2.4--2.8 points over the strongest fixed-topology PEFT baselines across 1/5/10-shot settings while training <1% of parameters. For reproducibility purposes, we made our code publicly available at https://github.com/tsly123/SEA_PEFT
Summary / 总结
Adapting foundation models to new clinical sites remains challenging in practice.
MIRACL: A Diverse Meta-Reinforcement Learning for Multi-Objective Multi-Echelon Combinatorial Supply Chain Optimisation
Authors: Rifny Rachman, Josh Tingey, Richard Allmendinger, Wei Pan, Pradyumn Shukla, Bahrul Ilmi Nasution
First: 2026-03-05T23:42:26+00:00 · Latest: 2026-03-05T23:42:26+00:00
Abstract
Multi-objective reinforcement learning (MORL) is effective for multi-echelon combinatorial supply chain optimisation, where tasks involve high dimensionality, uncertainty, and competing objectives. However, its deployment in dynamic environments is hindered by the need for task-specific retraining and substantial computational cost. We introduce MIRACL (Meta multI-objective Reinforcement leArning with Composite Learning), a hierarchical Meta-MORL framework that allows for a few-shot generalisation across diverse tasks. MIRACL decomposes each task into structured subproblems for efficient policy adaptation and meta-learns a global policy across tasks using a Pareto-based adaptation strategy to encourage diversity in meta-training and fine-tuning. To our knowledge, this is the first integration of Meta-MORL with such mechanisms in combinatorial optimisation. Although validated in the supply chain domain, MIRACL is theoretically domain-agnostic and applicable to broader dynamic multi-objective decision-making problems. Empirical evaluations show that MIRACL outperforms conventional MORL baselines in simple to moderate tasks, achieving up to 10% higher hypervolume and 5% better expected utility. These results underscore the potential of MIRACL for robust, efficient adaptation in multi-objective problems.
Summary / 总结
Multi-objective reinforcement learning (MORL) is effective for multi-echelon combinatorial supply chain optimisation, where tasks involve high dimensionality, uncertainty, and competing objectives.
TCR-EML: Explainable Model Layers for TCR-pMHC Prediction
Authors: Jiarui Li, Zixiang Yin, Zhengming Ding, Samuel J. Landry, Ramgopal R. Mettu
Venue: ICLR 2026
First: 2025-10-05T21:47:48+00:00 · Latest: 2026-03-05T23:12:27+00:00
Comments: [Abstract] Learning Meaningful Representations of Life (LMRL) Workshop at ICLR 2026 (Project Page: https://tcreml.jiarui.li/)
Abstract
T cell receptor (TCR) recognition of peptide-MHC (pMHC) complexes is a central component of adaptive immunity, with implications for vaccine design, cancer immunotherapy, and autoimmune disease. While recent advances in machine learning have improved prediction of TCR-pMHC binding, the most effective approaches are black-box transformer models that cannot provide a rationale for predictions. Post-hoc explanation methods can provide insight with respect to the input but do not explicitly model biochemical mechanisms (e.g. known binding regions), as in TCR-pMHC binding. ``Explain-by-design'' models (i.e., with architectural components that can be examined directly after training) have been explored in other domains, but have not been used for TCR-pMHC binding. We propose explainable model layers (TCR-EML) that can be incorporated into protein-language model backbones for TCR-pMHC modeling. Our approach uses prototype layers for amino acid residue contacts drawn from known TCR-pMHC binding mechanisms, enabling high-quality explanations for predicted TCR-pMHC binding. Experiments of our proposed method on large-scale datasets demonstrate competitive predictive accuracy and generalization, and evaluation on the TCR-XAI benchmark demonstrates improved explainability compared with existing approaches.
Summary / 总结
T cell receptor (TCR) recognition of peptide-MHC (pMHC) complexes is a central component of adaptive immunity, with implications for vaccine design, cancer immunotherapy, and autoimmune disease.
From Raw Corpora to Domain Benchmarks: Automated Evaluation of LLM Domain Expertise
Authors: Nitin Sharma, Thomas Wolfers, Çağatay Yıldız
First: 2025-06-09T11:30:12+00:00 · Latest: 2026-03-05T22:36:20+00:00
Comments: 36 pages, 24 figures. Third version
Abstract
Accurate domain-specific benchmarking of LLMs is essential, specifically in domains with direct implications for humans, such as law, healthcare, and education. However, existing benchmarks are documented to be contaminated and are based on multiple-choice questions, which suffer from inherent biases. To measure domain-specific knowledge in LLMs, we present a deterministic pipeline that transforms raw domain corpora into completion-style benchmarks without relying on other LLMs or costly human annotation. Our approach first extracts domain-specific keywords and related target vocabulary from an input corpus. It then constructs prompt-target pairs where domain-specific words serve as prediction targets. By measuring LLMs' ability to complete these prompts, we provide a direct assessment of domain knowledge at low computational cost. Our pipeline avoids benchmark contamination, enables automated updates with new domain data, and facilitates fair comparisons between base and instruction-tuned (chat) models. We validate our approach by showing that model performances on our benchmark significantly correlate with those on an expert-curated benchmark. We then demonstrate how our benchmark provides insights into knowledge acquisition in domain-adaptive, continual, and general pretraining. Finally, we examine the effects of instruction fine-tuning by comparing base and chat models within our unified evaluation framework. In conclusion, our pipeline enables scalable, domain-specific, LLM-independent, and unbiased evaluation of both base and chat models.
Summary / 总结
Accurate domain-specific benchmarking of LLMs is essential, specifically in domains with direct implications for humans, such as law, healthcare, and education.
AfriMTEB and AfriE5: Benchmarking and Adapting Text Embedding Models for African Languages
Authors: Kosei Uemura, Miaoran Zhang, David Ifeoluwa Adelani
First: 2025-10-27T22:06:43+00:00 · Latest: 2026-03-05T19:06:01+00:00
Comments: Accepted to EACL 2026 (main conference)
Abstract
Text embeddings are an essential building component of several NLP tasks such as retrieval-augmented generation which is crucial for preventing hallucinations in LLMs. Despite the recent release of massively multilingual MTEB (MMTEB), African languages remain underrepresented, with existing tasks often repurposed from translation benchmarks such as FLORES clustering or SIB-200. In this paper, we introduce AfriMTEB -- a regional expansion of MMTEB covering 59 languages, 14 tasks, and 38 datasets, including six newly added datasets. Unlike many MMTEB datasets that include fewer than five languages, the new additions span 14 to 56 African languages and introduce entirely new tasks, such as hate speech detection, intent detection, and emotion classification, which were not previously covered. Complementing this, we present AfriE5, an adaptation of the instruction-tuned mE5 model to African languages through cross-lingual contrastive distillation. Our evaluation shows that AfriE5 achieves state-of-the-art performance, outperforming strong baselines such as Gemini-Embeddings and mE5.
Summary / 总结
Text embeddings are an essential building component of several NLP tasks such as retrieval-augmented generation which is crucial for preventing hallucinations in LLMs.
LoRA-MME: Multi-Model Ensemble of LoRA-Tuned Encoders for Code Comment Classification
Authors: Md Akib Haider, Ahsan Bulbul, Nafis Fuad Shahid, Aimaan Ahmed, Mohammad Ishrak Abedin
First: 2026-03-04T11:36:32+00:00 · Latest: 2026-03-05T18:19:21+00:00
Abstract
Code comment classification is a critical task for automated software documentation and analysis. In the context of the NLBSE'26 Tool Competition, we present LoRA-MME, a Multi-Model Ensemble architecture utilizing Parameter-Efficient Fine-Tuning (PEFT). Our approach addresses the multi-label classification challenge across Java, Python, and Pharo by combining the strengths of four distinct transformer encoders: UniXcoder, CodeBERT, GraphCodeBERT, and CodeBERTa. By independently fine-tuning these models using Low-Rank Adaptation(LoRA) and aggregating their predictions via a learned weighted ensemble strategy, we maximize classification performance without the memory overhead of full model fine-tuning. Our tool achieved an F1 Weighted score of 0.7906 and a Macro F1 of 0.6867 on the test set. However, the computational cost of the ensemble resulted in a final submission score of 41.20%, highlighting the trade-off between semantic accuracy and inference efficiency.
Summary / 总结
Code comment classification is a critical task for automated software documentation and analysis.
VietJobs: A Vietnamese Job Advertisement Dataset
Authors: Hieu Pham Dinh, Hung Nguyen Huy, Mo El-Haj
Venue: Language Resources and Evaluation Conference (LREC) 2026
First: 2026-03-05T15:12:02+00:00 · Latest: 2026-03-05T15:12:02+00:00
Comments: 10 pages
Abstract
VietJobs is the first large-scale, publicly available corpus of Vietnamese job advertisements, comprising 48,092 postings and over 15 million words collected from all 34 provinces and municipalities across Vietnam. The dataset provides extensive linguistic and structured information, including job titles, categories, salaries, skills, and employment conditions, covering 16 occupational domains and multiple employment types (full-time, part-time, and internship). Designed to support research in natural language processing and labour market analytics, VietJobs captures substantial linguistic, regional, and socio-economic diversity. We benchmark several generative large language models (LLMs) on two core tasks: job category classification and salary estimation. Instruction-tuned models such as Qwen2.5-7B-Instruct and Llama-SEA-LION-v3-8B-IT demonstrate notable gains under few-shot and fine-tuned settings, while highlighting challenges in multilingual and Vietnamese-specific modelling for structured labour market prediction. VietJobs establishes a new benchmark for Vietnamese NLP and offers a valuable foundation for future research on recruitment language, socio-economic representation, and AI-driven labour market analysis. All code and resources are available at: https://github.com/VinNLP/VietJobs.
Summary / 总结
VietJobs is the first large-scale, publicly available corpus of Vietnamese job advertisements, comprising 48,092 postings and over 15 million words collected from all 34 provinces and municipalities across Vietnam.
Reclaiming Lost Text Layers for Source-Free Cross-Domain Few-Shot Learning
Authors: Zhenyu Zhang, Guangyao Chen, Yixiong Zou, Yuhua Li, Ruixuan Li
Venue: CVPR 2026
First: 2026-03-05T14:51:52+00:00 · Latest: 2026-03-05T14:51:52+00:00
Comments: CVPR 2026
Abstract
Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) focuses on fine-tuning with limited training data from target domains (e.g., medical or satellite images), where CLIP has recently shown promising results due to its generalizability to downstream tasks. Current works indicate CLIP's text encoder is more suitable for cross-domain tasks, however, we find that \textbf{removing certain middle layers of the text encoder can effectively improve performance in SF-CDFSL}, which we call the Lost Layers. In this paper, we delve into this phenomenon for a deeper understanding. We discover that instead of being harmful for the SF-CDFSL task, the information in these layers is actually beneficial, but visual gaps prevent this useful information from being fully utilized, making these layers seem redundant. Based on this understanding, unlike current works that simply remove these layers, we propose a method to teachs the model to \textbf{re-utilize} information in these lost layers at both the layer and encoder levels, guiding the re-learning of the visual branch under domain shifts. Our approach effectively addresses the issue of underutilized information in the text encoder. Extensive experiments across various settings, backbones (CLIP, SigLip, PE-Core), and tasks (4 CDFSL datasets and 10 Meta-dataset datasets) demonstrate the effectiveness of our method. Code is available at https://github.com/zhenyuZ-HUST/CVPR26-VtT.
Summary / 总结
Source-Free Cross-Domain Few-Shot Learning (SF-CDFSL) focuses on fine-tuning with limited training data from target domains (e.g., medical or satellite images), where CLIP has recently shown promising results due to its generalizability to downstream tasks.
SRasP: Self-Reorientation Adversarial Style Perturbation for Cross-Domain Few-Shot Learning
Authors: Wenqian Li, Pengfei Fang, Hui Xue
First: 2026-03-05T13:03:35+00:00 · Latest: 2026-03-05T13:03:35+00:00
Abstract
Cross-Domain Few-Shot Learning (CD-FSL) aims to transfer knowledge from a seen source domain to unseen target domains, serving as a key benchmark for evaluating the robustness and transferability of models. Existing style-based perturbation methods mitigate domain shift but often suffer from gradient instability and convergence to sharp minima.To address these limitations, we propose a novel crop-global style perturbation network, termed Self-Reorientation Adversarial \underline{S}tyle \underline{P}erturbation (SRasP). Specifically, SRasP leverages global semantic guidance to identify incoherent crops, followed by reorienting and aggregating the style gradients of these crops with the global style gradients within one image. Furthermore, we propose a novel multi-objective optimization function to maximize visual discrepancy while enforcing semantic consistency among global, crop, and adversarial features. Applying the stabilized perturbations during training encourages convergence toward flatter and more transferable solutions, improving generalization to unseen domains. Extensive experiments are conducted on multiple CD-FSL benchmarks, demonstrating consistent improvements over state-of-the-art methods.
Summary / 总结
Cross-Domain Few-Shot Learning (CD-FSL) aims to transfer knowledge from a seen source domain to unseen target domains, serving as a key benchmark for evaluating the robustness and transferability of models.
Lightweight and Scalable Transfer Learning Framework for Load Disaggregation
Authors: L. E. Garcia-Marrero, G. Petrone, E. Monmasson
First: 2026-03-05T09:43:48+00:00 · Latest: 2026-03-05T09:43:48+00:00
Comments: This work has been submitted to the IEEE for possible publication
Abstract
Non-Intrusive Load Monitoring (NILM) aims to estimate appliance-level consumption from aggregate electrical signals recorded at a single measurement point. In recent years, the field has increasingly adopted deep learning approaches; however, cross-domain generalization remains a persistent challenge due to variations in appliance characteristics, usage patterns, and background loads across homes. Transfer learning provides a practical paradigm to adapt models with limited target data. However, existing methods often assume a fixed appliance set, lack flexibility for evolving real-world deployments, remain unsuitable for edge devices, or scale poorly for real-time operation. This paper proposes RefQuery, a scalable multi-appliance, multi-task NILM framework that conditions disaggregation on compact appliance fingerprints, allowing one shared model to serve many appliances without a fixed output set. RefQuery keeps a pretrained disaggregation network fully frozen and adapts to a target home by learning only a per-appliance embedding during a lightweight backpropagation stage. Experiments on three public datasets demonstrate that RefQuery delivers a strong accuracy-efficiency trade-off against single-appliance and multi-appliance baselines, including modern Transformer-based methods. These results support RefQuery as a practical path toward scalable, real-time NILM on resource-constrained edge devices.
Summary / 总结
Non-Intrusive Load Monitoring (NILM) aims to estimate appliance-level consumption from aggregate electrical signals recorded at a single measurement point.
In-Training Defenses against Emergent Misalignment in Language Models
Authors: David Kaczér, Magnus Jørgenvåg, Clemens Vetter, Esha Afzal, Robin Haselhorst, Lucie Flek, Florian Mai
First: 2025-08-08T12:10:28+00:00 · Latest: 2026-03-05T09:02:55+00:00
Comments: Under review
Abstract
Fine-tuning lets practitioners repurpose aligned large language models (LLMs) for new domains, yet recent work reveals emergent misalignment (EMA): Even a small, domain-specific fine-tune can induce harmful behaviors far outside the target domain. Even in the case where model weights are hidden behind a fine-tuning API, this gives attackers inadvertent access to a broadly misaligned model in a way that can be hard to detect from the fine-tuning data alone. We present the first systematic study of in-training safeguards against EMA that are practical for providers who expose fine-tuning via an API: We evaluate whether they a) prevent broad misalignment, b) allow narrow misalignment, c) learn well on benign tasks, and d) remain coherent. We investigate four training regularization interventions: (i) KL-divergence regularization toward a safe reference model, (ii) $\mathcal{l}_2$ distance in feature space, (iii) preventative steering with an evil persona vector, and (iv) interleaving training examples from a general instruct-tuning dataset. We demonstrate that selecting interleaving data by the perplexity gap between aligned and misaligned models yields the best results overall.
Summary / 总结
Fine-tuning lets practitioners repurpose aligned large language models (LLMs) for new domains, yet recent work reveals emergent misalignment (EMA): Even a small, domain-specific fine-tune can induce harmful behaviors far outside the target domain.
VisionPangu: A Compact and Fine-Grained Multimodal Assistant with 1.7B Parameters
Authors: Jiaxin Fan, Wenpo Song
First: 2026-03-05T08:51:33+00:00 · Latest: 2026-03-05T08:51:33+00:00
Abstract
Large Multimodal Models (LMMs) have achieved strong performance in vision-language understanding, yet many existing approaches rely on large-scale architectures and coarse supervision, which limits their ability to generate detailed image captions. In this work, we present VisionPangu, a compact 1.7B-parameter multimodal model designed to improve detailed image captioning through efficient multimodal alignment and high-quality supervision. Our model combines an InternVL-derived vision encoder with the OpenPangu-Embedded language backbone via a lightweight MLP projector and adopts an instruction-tuning pipeline inspired by LLaVA. By incorporating dense human-authored descriptions from the DOCCI dataset, VisionPangu improves semantic coherence and descriptive richness without relying on aggressive model scaling. Experimental results demonstrate that compact multimodal models can achieve competitive performance while producing more structured and detailed captions. The code and model weights will be publicly available at https://www.modelscope.cn/models/asdfgh007/visionpangu.
Summary / 总结
Large Multimodal Models (LMMs) have achieved strong performance in vision-language understanding, yet many existing approaches rely on large-scale architectures and coarse supervision, which limits their ability to generate detailed image captions.
AILS-NTUA at SemEval-2026 Task 3: Efficient Dimensional Aspect-Based Sentiment Analysis
Authors: Stavros Gazetas, Giorgos Filandrianos, Maria Lymperaiou, Paraskevi Tzouveli, Athanasios Voulodimos, Giorgos Stamou
First: 2026-03-05T08:30:59+00:00 · Latest: 2026-03-05T08:30:59+00:00
Abstract
In this paper, we present AILS-NTUA system for Track-A of SemEval-2026 Task 3 on Dimensional Aspect-Based Sentiment Analysis (DimABSA), which encompasses three complementary problems: Dimensional Aspect Sentiment Regression (DimASR), Dimensional Aspect Sentiment Triplet Extraction (DimASTE), and Dimensional Aspect Sentiment Quadruplet Prediction (DimASQP) within a multilingual and multi-domain framework. Our methodology combines fine-tuning of language-appropriate encoder backbones for continuous aspect-level sentiment prediction with language-specific instruction tuning of large language models using LoRA for structured triplet and quadruplet extraction. This unified yet task-adaptive design emphasizes parameter-efficient specialization across languages and domains, enabling reduced training and inference requirements while maintaining strong effectiveness. Empirical results demonstrate that the proposed models achieve competitive performance and consistently surpass the provided baselines across most evaluation settings.
Summary / 总结
In this paper, we present AILS-NTUA system for Track-A of SemEval-2026 Task 3 on Dimensional Aspect-Based Sentiment Analysis (DimABSA), which encompasses three complementary problems: Dimensional Aspect Sentiment Regression (DimASR), Dimensional Aspect Sentiment Triplet Extraction (DimASTE), and Dimensional Aspect Sentiment Quadruplet Prediction (DimASQP) within a multilingual and multi-domain framework.
Curriculum Learning for Efficient Chain-of-Thought Distillation via Structure-Aware Masking and GRPO
Authors: Bowen Yu, Maolin Wang, Sheng Zhang, Binhao Wang, Yi Wen, Jingtong Gao, Bowen Liu, Zimo Zhao, Wanyu Wang, Xiangyu Zhao
First: 2026-02-05T05:27:11+00:00 · Latest: 2026-03-05T07:49:45+00:00
Comments: 22 pages, 12 figures
Abstract
Distilling Chain-of-Thought (CoT) reasoning from large language models into compact student models presents a fundamental challenge: teacher rationales are often too verbose for smaller models to faithfully reproduce. Existing approaches either compress reasoning into single-step, losing the interpretability that makes CoT valuable. We present a three-stage curriculum learning framework that addresses this capacity mismatch through progressive skill acquisition. First, we establish structural understanding via masked shuffled reconstruction. Second, we apply Group Relative Policy Optimization (GRPO) on masked completion tasks, enabling the model to discover its own balance between accuracy and brevity. Third, we identify persistent failure cases and guide the student to internalize teacher knowledge through targeted rewriting, again optimized with GRPO. Experiments on GSM8K demonstrate that our approach enables Qwen2.5-3B-Base to achieve an 11.29 percent accuracy improvement while reducing output length by 27.4 percent, surpassing both instruction-tuned variants and prior distillation methods.
Summary / 总结
Distilling Chain-of-Thought (CoT) reasoning from large language models into compact student models presents a fundamental challenge: teacher rationales are often too verbose for smaller models to faithfully reproduce.
Osmosis Distillation: Model Hijacking with the Fewest Samples
Authors: Yuchen Shi, Huajie Chen, Heng Xu, Zhiquan Liu, Jialiang Shen, Chi Liu, Shuai Zhou, Tianqing Zhu, Wanlei Zhou
First: 2026-03-05T06:34:06+00:00 · Latest: 2026-03-05T06:34:06+00:00
Abstract
Transfer learning is devised to leverage knowledge from pre-trained models to solve new tasks with limited data and computational resources. Meanwhile, dataset distillation has emerged to synthesize a compact dataset that preserves critical information from the original large dataset. Therefore, a combination of transfer learning and dataset distillation offers promising performance in evaluations. However, a non-negligible security threat remains undiscovered in transfer learning using synthetic datasets generated by dataset distillation methods, where an adversary can perform a model hijacking attack with only a few poisoned samples in the synthetic dataset. To reveal this threat, we propose Osmosis Distillation (OD) attack, a novel model hijacking strategy that targets deep learning models using the fewest samples. Comprehensive evaluations on various datasets demonstrate that the OD attack attains high attack success rates in hidden tasks while preserving high model utility in original tasks. Furthermore, the distilled osmosis set enables model hijacking across diverse model architectures, allowing model hijacking in transfer learning with considerable attack performance and model utility. We argue that awareness of using third-party synthetic datasets in transfer learning must be raised.
Summary / 总结
Transfer learning is devised to leverage knowledge from pre-trained models to solve new tasks with limited data and computational resources.
Detection of Illicit Content on Online Marketplaces using Large Language Models
Authors: Quoc Khoa Tran, Thanh Thi Nguyen, Campbell Wilson
First: 2026-03-05T01:15:03+00:00 · Latest: 2026-03-05T01:15:03+00:00
Comments: Accepted for publication in the Proceedings of the 8th International Conference on Natural Language Processing (ICNLP 2026)
Abstract
Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes. Traditional content moderation methods such as manual reviews and rule-based automated systems struggle with scalability, dynamic obfuscation techniques, and multilingual content. Conventional machine learning models, though effective in simpler contexts, often falter when confronting the semantic complexities and linguistic nuances characteristic of illicit marketplace communications. This research investigates the efficacy of Large Language Models (LLMs), specifically Meta's Llama 3.2 and Google's Gemma 3, in detecting and classifying illicit online marketplace content using the multilingual DUTA10K dataset. Employing fine-tuning techniques such as Parameter-Efficient Fine-Tuning (PEFT) and quantization, these models were systematically benchmarked against a foundational transformer-based model (BERT) and traditional machine learning baselines (Support Vector Machines and Naive Bayes). Experimental results reveal a task-dependent advantage for LLMs. In binary classification (illicit vs. non-illicit), Llama 3.2 demonstrated performance comparable to traditional methods. However, for complex, imbalanced multi-class classification involving 40 specific illicit categories, Llama 3.2 significantly surpassed all baseline models. These findings offer substantial practical implications for enhancing online safety, equipping law enforcement agencies, e-commerce platforms, and cybersecurity specialists with more effective, scalable, and adaptive tools for illicit content detection and moderation.
Summary / 总结
Online marketplaces, while revolutionizing global commerce, have inadvertently facilitated the proliferation of illicit activities, including drug trafficking, counterfeit sales, and cybercrimes.
Steering Awareness: Models Can Be Trained to Detect Activation Steering
Authors: Joshua Fonseca Rivera, David Demitri Africa
First: 2025-11-26T13:49:43+00:00 · Latest: 2026-03-04T23:33:38+00:00
Comments: 26 pages, 11 figures, 16 tables
Abstract
Activation steering - adding a vector to a language model's residual stream - is widely used to elicit latent behaviors and to probe safety-relevant properties. Many steering-based evaluations implicitly assume that the model cannot tell when such an intervention has occurred. We test this assumption by fine-tuning models to report (i) whether a steering vector was injected and (ii) which concept was injected, a capability we call steering awareness. Across seven open-source instruction-tuned models, the best achieves 95.5% detection on held-out concepts and 71.2% concept identification, with no false positives on our clean controls. We find that such detection transfers to novel vectors extracted by methods that produce directions aligned with contrastive activation addition, but fail for geometrically dissimilar methods. Crucially, detection does not confer behavioral robustness; detection-trained models are consistently more susceptible to steering in realistic settings than their base counterparts. Mechanistically, steering awareness arises from a distributed transformation that progressively rotates diverse injected vectors into a shared detection direction. These findings suggest that activation steering cannot be assumed to remain an undetectable intervention, with implications for the long-term reliability of steering-based safety evaluations and interpretability techniques more broadly.
Summary / 总结
Activation steering - adding a vector to a language model's residual stream - is widely used to elicit latent behaviors and to probe safety-relevant properties.
Robust Unscented Kalman Filtering via Recurrent Meta-Adaptation of Sigma-Point Weights
Authors: Kenan Majewski, Michał Modzelewski, Marcin Żugaj, Piotr Lichota
First: 2026-03-04T18:27:59+00:00 · Latest: 2026-03-04T18:27:59+00:00
Comments: 8 pages, 3 figures, Submitted to the 29th International Conference on Information Fusion (FUSION 2026)
Abstract
The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterization of the Unscented Transform (UT). Conventional weighting schemes, governed by fixed scaling parameters, assume implicit Gaussianity and fail to adapt to time-varying dynamics or heavy-tailed measurement noise. This work introduces the Meta-Adaptive UKF (MA-UKF), a framework that reformulates sigma-point weight synthesis as a hyperparameter optimization problem addressed via memory-augmented meta-learning. Unlike standard adaptive filters that rely on instantaneous heuristic corrections, our approach employs a Recurrent Context Encoder to compress the history of measurement innovations into a compact latent embedding. This embedding informs a policy network that dynamically synthesizes the mean and covariance weights of the sigma points at each time step, effectively governing the filter's trust in the prediction versus the measurement. By optimizing the system end-to-end through the filter's recursive logic, the MA-UKF learns to maximize tracking accuracy while maintaining estimation consistency. Numerical benchmarks on maneuvering targets demonstrate that the MA-UKF significantly outperforms standard baselines, exhibiting superior robustness to non-Gaussian glint noise and effective generalization to out-of-distribution (OOD) dynamic regimes unseen during training.
Summary / 总结
The Unscented Kalman Filter (UKF) is a ubiquitous tool for nonlinear state estimation; however, its performance is limited by the static parameterization of the Unscented Transform (UT).
Position: Vector Prompt Interfaces Should Be Exposed to Enable Customization of Large Language Models
Authors: Liangwei Yang, Shiyu Wang, Haolin Chen, Rithesh Murthy, Ming Zhu, Jielin Qiu, Zixiang Chen, Juntao Tan, Jianguo Zhang, Zhiwei Liu, Wenting Zhao, Silvio Savarese, Caiming Xiong, Huan Wang, Shelby Heinecke
First: 2026-03-04T17:08:47+00:00 · Latest: 2026-03-04T17:08:47+00:00
Abstract
As large language models (LLMs) transition from research prototypes to real-world systems, customization has emerged as a central bottleneck. While text prompts can already customize LLM behavior, we argue that text-only prompting does not constitute a suitable control interface for scalable, stable, and inference-only customization. This position paper argues that model providers should expose \emph{vector prompt inputs} as part of the public interface for customizing LLMs. We support this position with diagnostic evidence showing that vector prompt tuning continues to improve with increasing supervision whereas text-based prompt optimization saturates early, and that vector prompts exhibit dense, global attention patterns indicative of a distinct control mechanism. We further discuss why inference-only customization is increasingly important under realistic deployment constraints, and why exposing vector prompts need not fundamentally increase model leakage risk under a standard black-box threat model. We conclude with a call to action for the community to rethink prompt interfaces as a core component of LLM customization.
Summary / 总结
As large language models (LLMs) transition from research prototypes to real-world systems, customization has emerged as a central bottleneck.
Traces of Social Competence in Large Language Models
Authors: Tom Kouwenhoven, Michiel van der Meer, Max van Duijn
First: 2026-03-04T15:19:27+00:00 · Latest: 2026-03-04T15:19:27+00:00
Abstract
The False Belief Test (FBT) has been the main method for assessing Theory of Mind (ToM) and related socio-cognitive competencies. For Large Language Models (LLMs), the reliability and explanatory potential of this test have remained limited due to issues like data contamination, insufficient model details, and inconsistent controls. We address these issues by testing 17 open-weight models on a balanced set of 192 FBT variants (Trott et al. 2023) using Bayesian Logistic regression to identify how model size and post-training affect socio-cognitive competence. We find that scaling model size benefits performance, but not strictly. A cross-over effect reveals that explicating propositional attitudes (X thinks) fundamentally alters response patterns. Instruction tuning partially mitigates this effect, but further reasoning-oriented finetuning amplifies it. In a case study analysing social reasoning ability throughout OLMo 2 training, we show that this cross-over effect emerges during pre-training, suggesting that models acquire stereotypical response patterns tied to mental-state vocabulary that can outweigh other scenario semantics. Finally, vector steering allows us to isolate a think vector as the causal driver of observed FBT behaviour.
Summary / 总结
The False Belief Test (FBT) has been the main method for assessing Theory of Mind (ToM) and related socio-cognitive competencies.
Dutch Metaphor Extraction from Cancer Patients' Interviews and Forum Data using LLMs and Human in the Loop
Authors: Lifeng Han, David Lindevelt, Sander Puts, Erik van Mulligen, Suzan Verberne
First: 2025-11-09T15:49:47+00:00 · Latest: 2026-03-04T15:12:08+00:00
Comments: Ongoing project report, on behalf of 4D PICTURE https://4dpicture.eu/
Abstract
Metaphors and metaphorical language (MLs) play an important role in healthcare communication between clinicians, patients, and patients' family members. In this work, we focus on Dutch language data from cancer patients. We extract metaphors used by patients using two data sources: (1) cancer patient storytelling interview data and (2) online forum data, including patients' posts, comments, and questions to professionals. We investigate how current state-of-the-art large language models (LLMs) perform on this task by exploring different prompting strategies such as chain of thought reasoning, few-shot learning, and self-prompting. With a human-in-the-loop setup, we verify the extracted metaphors and compile the outputs into a corpus named HealthQuote.NL. We believe the extracted metaphors can support better patient care, for example shared decision making, improved communication between patients and clinicians, and enhanced patient health literacy. They can also inform the design of personalized care pathways. We share prompts and related resources at https://github.com/4dpicture/HealthQuote.NL
Summary / 总结
Metaphors and metaphorical language (MLs) play an important role in healthcare communication between clinicians, patients, and patients' family members.
A Bayesian Framework for Active Tactile Object Recognition, Pose Estimation and Shape Transfer Learning
Authors: Haodong Zheng, Andrei Jalba, Raymond H. Cuijpers, Wijnand IJsselsteijn, Sanne Schoenmakers
First: 2024-09-10T23:35:30+00:00 · Latest: 2026-03-04T15:02:27+00:00
Abstract
As humans can explore and understand the world through active touch, similar capability is desired for robots. In this paper, we address the problem of active tactile object recognition, pose estimation and shape transfer learning, where a customized particle filter (PF) and Gaussian process implicit surface (GPIS) is combined in a unified Bayesian framework. Upon new tactile input, the customized PF updates the joint distribution of the object class and object pose while tracking the novelty of the object. Once a novel object is identified, its shape will be reconstructed using GPIS. By grounding the prior of the GPIS with the maximum-a-posteriori (MAP) estimation from the PF, the knowledge about known shapes can be transferred to learn novel shapes. An exploration procedure based on global shape estimation is proposed to guide active data acquisition and terminate the exploration upon sufficient information. Through experiments in simulation, the proposed framework demonstrated its effectiveness and efficiency in estimating object class and pose for known objects and learning novel shapes. Furthermore, it can recognize previously learned shapes reliably.
Summary / 总结
As humans can explore and understand the world through active touch, similar capability is desired for robots.