Flexible Kernels for Protein Property Prediction
Authors: Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan
Venue: ICML 2026
First: 2026-06-09T16:20:36+00:00 · Latest: 2026-06-09T16:20:36+00:00
Comments: 50 pages; to appear at ICML 2026
Abstract
Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore, by learning what are in effect structure-aware substitution matrices, we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.
Summary / 总结
Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge.
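As a rough illustration of what a substitution-matrix sequence kernel can look like, the sketch below averages a per-residue similarity over aligned positions. It is not the authors' kernel: the three-letter alphabet, the values in `TOY_SUBSTITUTION`, and the function names are hypothetical, and the local-linearity and structure-aware components described in the abstract are omitted.

```python
# Toy substitution similarities over a 3-letter alphabet (hypothetical values,
# chosen so that the matrix is symmetric and positive definite).
TOY_SUBSTITUTION = {
    "A": {"A": 1.0, "V": 0.8, "D": 0.1},
    "V": {"A": 0.8, "V": 1.0, "D": 0.1},
    "D": {"A": 0.1, "V": 0.1, "D": 1.0},
}

def sequence_kernel(x, y, substitution=TOY_SUBSTITUTION):
    """Mean per-position substitution similarity for two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must be aligned to the same length")
    return sum(substitution[a][b] for a, b in zip(x, y)) / len(x)

def gram_matrix(sequences, substitution=TOY_SUBSTITUTION):
    """Kernel matrix that a Gaussian process would use as its prior covariance."""
    return [[sequence_kernel(s, t, substitution) for t in sequences] for s in sequences]

K = gram_matrix(["AVD", "VVD", "DDD"])
```

Because each position contributes a positive semi-definite kernel, their average is also a valid kernel, so `K` can serve as a Gaussian process covariance.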
Does Reasoning Preserve Alignment? On the Trustworthiness of Large Reasoning Models
Authors: Prajakta Kini, Avinash Reddy, Souradip Chakraborty, Satya Sai Srinath Namburi GNVV, Furong Huang, Amrit Singh Bedi, Alvaro Velasquez
First: 2026-06-09T16:14:27+00:00 · Latest: 2026-06-09T16:14:27+00:00
Abstract
Instruction-tuned LLMs are increasingly converted into reasoning models through post-training to improve multi-step task performance. This conversion is usually optimized for reasoning accuracy, without explicitly preserving the alignment behavior of the instruction-tuned model, such as safe refusal, bias avoidance, and privacy protection. We ask: does this conversion preserve alignment? We study this question through a trustworthiness audit and find that it is not behavior-preserving by default. For a systematic analysis, we compare reasoning models produced via supervised fine-tuning, RL-based post-training, and distillation against matched instruction-tuned baselines across six trustworthiness dimensions: safety, toxicity, stereotyping and bias, machine ethics, privacy, and out-of-distribution robustness. We observe that reasoning models often improve on reasoning benchmarks but exhibit alignment regressions, including increased toxicity, amplified stereotyping, miscalibrated refusal, and contextual privacy leakage. These regressions are consistent with behavioral drift from the instruction-tuned baseline, measured by KL divergence. Overall, our results point to the broader conclusion that trustworthiness metrics are essential for evaluating reasoning models and should be reported alongside gains in reasoning capability.
Summary / 总结
Instruction-tuned LLMs are increasingly converted into reasoning models through post-training to improve multi-step task performance.
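The abstract measures behavioral drift with KL divergence but does not specify the exact protocol. A minimal sketch, assuming drift is the mean per-position KL divergence between the baseline's and the reasoning model's next-token distributions (the direction of the divergence and the helper names are assumptions):

```python
import math

def softmax(logits):
    """Convert raw logits to a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)

def mean_drift(baseline_logits, reasoning_logits):
    """Average KL(baseline || reasoning) over the positions of one response."""
    kls = [
        kl_divergence(softmax(b), softmax(r))
        for b, r in zip(baseline_logits, reasoning_logits)
    ]
    return sum(kls) / len(kls)
```

A drift of zero means the two models assign identical next-token probabilities at every position; larger values indicate stronger departure from the instruction-tuned baseline.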
What Really Matters for Table LLMs? A Meta-Evaluation of Model and Data Effects
Authors: Naihao Deng, Sheng Zhang, Henghui Zhu, Shuaichen Chang, Jiani Zhang, Alexander Hanbo Li, Chung-Wei Hang, Hideo Kobayashi, Yiqun Hu, Patrick Ng
First: 2025-01-24T18:50:26+00:00 · Latest: 2026-06-09T14:39:39+00:00
Comments: EACL 2026 Findings
Abstract
Table modeling has progressed for decades. In this work, we revisit this trajectory and highlight emerging challenges in the LLM era, particularly the paradox of choice: the difficulty of attributing performance gains amid diverse base models and training sets in the context of table instruction tuning. We replicate four table LLMs by instruction-tuning three foundation models on four existing datasets, yielding 12 models. We then evaluate these models across 16 table benchmarks. Our study is the first to quantitatively disentangle the effects of training data and base model selection, revealing that base model choice plays a more dominant role than the training data itself. Generalization and reasoning remain challenging, inviting future effort on table modeling. Based on our findings, we share our thoughts on the future directions for table modeling.
Summary / 总结
Table modeling has progressed for decades.
ASA: Backbone-Training-Free Representation Engineering for Tool-Calling Agents
Authors: Youjin Wang, Run Zhou, Yingjie Ma, Rong Fu, Jiani Liang, Shuaishuai Cao, Min Huang, Tao Fang, Liangming Pan
Venue: ACL
First: 2026-02-04T14:20:02+00:00 · Latest: 2026-06-09T14:21:07+00:00
Comments: The manuscript consists of 24 pages formatted in the ACL style. Youjin Wang, Run Zhou, and Yingjie Ma contributed equally to this work. Tao Fang and Liangming Pan are the co-corresponding authors
Abstract
Adapting LLM agents to domain-specific tool calling remains notably brittle under evolving interfaces. Prompt and schema engineering is easy to deploy but often fragile under distribution shift and strict parsers, while continual parameter-efficient fine-tuning improves reliability at the cost of training, maintenance, and potential forgetting. We identify a critical Lazy Agent failure mode where tool necessity is nearly perfectly decodable from mid-layer activations, yet the model remains conservative in entering tool mode, revealing a representation-behavior gap. We propose Activation Steering Adapter (ASA), a training-free, inference-time controller that performs a single-shot mid-layer intervention and targets tool domains via a router-conditioned mixture of steering vectors with a probe-guided signed gate to amplify true intent while suppressing spurious triggers. On MTU-Bench with Qwen2.5-1.5B, ASA improves strict tool-use F1 from 0.18 to 0.50 while reducing the false positive rate from 0.15 to 0.05, using only about 20KB of portable assets and no weight updates.
Summary / 总结
Adapting LLM agents to domain-specific tool calling remains notably brittle under evolving interfaces.
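The single-shot intervention described above can be sketched as adding a gated mixture of steering vectors to one hidden state. This is an illustrative assumption, not ASA's actual formulation: the gate `2 * probe_prob - 1`, the softmax router, the scale `alpha`, and all names are hypothetical.

```python
import math

def softmax(logits):
    """Normalize router logits into mixture weights."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def steer(hidden, steering_vectors, router_logits, probe_prob, alpha=1.0):
    """Return hidden + alpha * gate * (router-weighted mix of steering vectors).

    probe_prob is a probe's estimate that a tool call is needed; the signed
    gate is positive above 0.5 (amplify) and negative below it (suppress).
    """
    weights = softmax(router_logits)
    gate = 2.0 * probe_prob - 1.0  # assumed signed gate in [-1, 1]
    mixed = [
        sum(w * vec[i] for w, vec in zip(weights, steering_vectors))
        for i in range(len(hidden))
    ]
    return [h + alpha * gate * m for h, m in zip(hidden, mixed)]
```

With an uncertain probe (`probe_prob = 0.5`) the gate is zero and the hidden state passes through unchanged, which is one simple way to avoid spurious triggers.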
MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation
Authors: Mehrshad Taji, Arad Mahdinezhad Kashani, Iman Ahmadi, AmirHossein Jadidi, Saina Kashani, Babak Khalaj
First: 2026-02-18T21:28:56+00:00 · Latest: 2026-06-09T14:01:51+00:00
Comments: Some fundamental changes in text and codebase
Abstract
Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine-tuning, or prompt tuning, and often operate in an open-loop manner without robust environmental feedback, making them fragile in dynamic settings. MALLVI presents a Multi-Agent Large Language and Vision framework that enables closed-loop, feedback-driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVI generates executable atomic actions for a robot manipulator. After action execution, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVI coordinates specialized agents (Decomposer, Localizer, Thinker, and Reflector) to manage perception, localization, reasoning, and high-level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only relevant agents, avoiding full replanning. Experiments in simulation and real-world settings show that iterative closed-loop multi-agent coordination improves generalization and increases success rates in zero-shot manipulation tasks. Code available at https://github.com/iman1234ahmadi/MALLVI.
Summary / 总结
Task planning for robotic manipulation with large language models (LLMs) is an emerging area.
GRID: Scaling Task-Agnostic Inference in Continual Prompt Tuning
Authors: Anushka Tiwari, Sayantan Pal, Rohini K. Srihari, Kaiyi Ji
First: 2025-07-19T19:15:03+00:00 · Latest: 2026-06-09T12:16:28+00:00
Abstract
Prompt-based continual learning (CL) offers a parameter-efficient way to adapt large language models (LLMs) across task sequences. However, existing methods often rely on task-aware inference and maintain an expanding set of task-specific prompts, leading to (1) severe performance degradation on earlier tasks when task identifiers are unavailable for prompt selection at inference time, and (2) limited scalability as the task sequence grows. We propose GRID, a unified framework designed to address these challenges. GRID incorporates an output-space-aware decoding mechanism that enhances backward transfer by leveraging representative inputs and automatic label semantic normalization, alongside a gradient-guided prompt selection strategy that compresses less informative prompts into a single aggregated representation for scalable, memory-efficient continual learning. Extensive experiments on long-sequence and negative-transfer benchmarks show that GRID improves backward transfer, achieves competitive forward transfer, and substantially reduces prompt memory across encoder-decoder and decoder-only architectures, including T5, Qwen, and LLaMA. Source code is available at https://github.com/AnushkaTi/GRID.
Summary / 总结
Prompt-based continual learning (CL) offers a parameter-efficient way to adapt large language models (LLMs) across task sequences.
AutoPDE: Reliable Agentic PDE Solving via Explicitly Represented Solver Strategies
Authors: Huanshuo Dong, Keyao Zhang, Hong Wang, Zhezheng Hao, Zhiwei Zhuang, Ziyan Liu, Jiacong Wang, Gengyuan Liu, Xin Jin
First: 2026-06-09T12:02:58+00:00 · Latest: 2026-06-09T12:02:58+00:00
Abstract
Numerical solvers for partial differential equations (PDEs) are core computational tools in science and engineering. Building reliable PDE solvers requires not only executable code but also a numerical solver strategy (a set of decisions about discretization, stabilization, solver configuration, and resolution control) that matches the PDE structure. Recent LLM-based coding agents have begun to reduce the programming burden by generating and debugging solver implementations. However, they typically move directly from a PDE problem to solver code, leaving the solver strategy implicit in implementation details. Feedback from a failed solve is therefore routed back to code edits rather than to the underlying strategy, so numerical decisions remain hard to check before code is generated and hard to revise using numerical evidence when it fails. To address this limitation, we propose AutoPDE, a code agent that maintains the solver strategy as an explicitly represented object throughout the solving process: an independent, inspectable object that is built before any code is written and can be revised, using numerical evidence, whenever a solve fails. AutoPDE builds and maintains this object in three stages, all drawing from a library of reusable PDE-solving skills: PDE analysis identifies the equation type and algebraic structure; numerical method selection chooses a numerical method that matches the analysis result and commits to a discretization, stabilization, and linear solver accordingly; and adaptive tuning runs low-cost pilot solves to calibrate resolution and tolerances under the prescribed accuracy and runtime budget. We evaluate AutoPDE on the PDE Agent Bench, where experimental results show that AutoPDE achieves a pass rate of $54.5\%$, improving over the strongest baseline by $14.2$ percentage points.
Summary / 总结
Numerical solvers for partial differential equations (PDEs) are core computational tools in science and engineering.
AgentPLM: Agentic Protein Language Models with Reasoning-Augmented Decoding for Protein Sequence Design
Authors: Sahil Rahman, Maxx Richard Rahman
Venue: ICML 2026
First: 2026-06-01T15:35:02+00:00 · Latest: 2026-06-09T09:11:47+00:00
Abstract
Protein language models (PLMs) are passive oracles: they generate sequences in a single forward pass with no mechanism to consult external biophysical feedback or redirect generation when a candidate violates thermodynamic or structural constraints. We introduce AgentPLM, which addresses this by equipping a pre-trained PLM with i) Reasoning-Augmented Decoding (RAD), which interleaves autoregressive generation with tool calls (ESMFold, FoldX, AutoDock Vina), and ii) Contrastive Agent Policy Optimisation (CAPO), a trajectory-level extension of direct preference optimisation that trains the policy end-to-end to learn when oracle feedback is informative rather than merely imitating high-fitness sequences. We evaluate AgentPLM on benchmark tasks spanning de novo enzyme design, antibody optimisation, thermostability, PPI interface design, and zero-shot fitness prediction with standardised oracle APIs and controlled sequence-identity splits. AgentPLM achieves state-of-the-art results with a gain in antibody top-10% hit rate over the strongest passive baseline, providing mechanistic evidence of online error correction without explicit backtracking.
Summary / 总结
Protein language models (PLMs) are passive oracles: they generate sequences in a single forward pass with no mechanism to consult external biophysical feedback or redirect generation when a candidate violates thermodynamic or structural constraints.
Small Data, Big Noise: Adversarial Training for Robust Parameter-Efficient Fine-Tuning
Authors: Eitan Cohen, Idan Simai, Uri Shaham
Venue: ACL 2026
First: 2026-06-09T09:11:33+00:00 · Latest: 2026-06-09T09:11:33+00:00
Comments: Accepted to Findings of ACL 2026
Abstract
Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks. However, current PEFT methods often struggle with robustness to noise and performance degradation on limited training data. We propose SDBN (Small Data Big Noise), a unified framework that brings adversarial training to PEFT (a combination that remains less studied in the PEFT setting despite its complementary strengths) to enhance model robustness and generalization, outperforming alternative approaches. We also introduce two variants of the method that use discrete uncertainty sets: SDBN-h, which enumerates character-level edits and selects worst-case variants using gradients, and SDBN-p, which uses LLM-generated variants for robust optimization in generative tasks. Experiments across multiple benchmarks reveal substantial improvements, particularly in low-resource settings and under both word-level and character-level corruptions. This framework addresses the less explored intersection of adversarial training and parameter-efficient adaptation without introducing additional parameters and with only modest computational overhead, making PEFT deployments more reliable in real-world scenarios where data scarcity and linguistic variability often coexist.
Summary / 总结
Parameter-Efficient Fine-Tuning (PEFT) has become essential for adapting foundation models to downstream NLP tasks.
Advancing the State-of-the-Art in Empirical Privacy Auditing
Authors: Nicole Mitchell, Galen Andrew, Arun Ganesh, Brendan McMahan, Peter Kairouz
First: 2026-06-09T06:50:49+00:00 · Latest: 2026-06-09T06:50:49+00:00
Abstract
Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical privacy auditing (EPA) quantifies this risk by measuring realistic data leakage on membership inference (MI) or reconstruction attacks. A key challenge in EPA is designing "canary" examples that are mixed with the privacy-sensitive training data. We propose generating synthetic canaries via high-temperature sampling ($T \geq 0.8$) from LLMs, using prompts tailored to the privacy-sensitive training data. These canaries act as high-influence outliers, ensuring high identifiability and hence strong audits. Further, since the canaries are themselves non-private, they are inspectable and can be inserted with repetition without jeopardizing the privacy of the real data. An important use of models fine-tuned on privacy-sensitive data is the generation of synthetic data. This also comes with privacy risk. We introduce a powerful synthetic data audit based on fine-tuning an auxiliary model on the synthetic data. Auditing the auxiliary model for the original canaries then provides a strong estimate of the privacy leakage through the synthetic data. Finally, leveraging our strong auditing methodologies, we perform a systematic investigation into the interacting effects of model capacity and canary entropy on memorization.
Summary / 总结
Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples.
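To make the role of temperature concrete, the sketch below shows standard temperature-scaled sampling over a toy logit vector and how a higher temperature raises the entropy of the sampling distribution. The logits and function names are hypothetical; a real canary generator would apply the same scaling to an LLM's next-token logits.

```python
import math
import random

def temperature_probs(logits, temperature):
    """Softmax of logits / temperature; higher temperature flattens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_token(logits, temperature, rng):
    """Draw one token index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

toy_logits = [4.0, 2.0, 1.0, 0.5]  # hypothetical next-token logits
low_entropy = entropy(temperature_probs(toy_logits, 0.3))
high_entropy = entropy(temperature_probs(toy_logits, 1.2))
token = sample_token(toy_logits, 1.2, random.Random(0))
```

Higher-entropy sampling yields less typical sequences, which is consistent with the abstract's description of canaries as high-influence outliers.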
Which LoRA? An Empirical Study on the Effectiveness of LoRA Techniques During Multilingual Instruction Tuning
Authors: Thamali Wijewardhana, Napoleon H. Reyes, Surangika Ranathunga
First: 2026-06-09T05:10:04+00:00 · Latest: 2026-06-09T05:10:04+00:00
Abstract
We investigate whether commonly available LoRA variants have an advantage over basic LoRA in multilingual instruction tuning. Experiments involving LoRA and four other variants on two datasets across diverse target languages show that there is no significant advantage in using more complex LoRA variants instead of basic LoRA, with respect to balancing cross-lingual transfer and knowledge retention. An analysis of hidden embeddings reveals that layer-wise language representation remains largely similar across LLMs fine-tuned with different LoRA techniques, suggesting that architectural novelty of LoRA techniques may not translate into better cross-lingual adaptation.
Summary / 总结
We investigate whether commonly available LoRA variants have an advantage over basic LoRA in multilingual instruction tuning.
Moral Sensitivity in LLMs: A Tiered Evaluation of Contextual Bias via Behavioral Profiling and Mechanistic Interpretability
Authors: Yash Aggarwal, Atmika Gorti, Vinija Jain, Aman Chadha, Krishnaprasad Thirunarayan, Manas Gaur
First: 2026-05-04T23:12:32+00:00 · Latest: 2026-06-09T02:10:17+00:00
Abstract
Large language models (LLMs) are increasingly deployed in settings that require nuanced ethical reasoning, yet existing bias evaluations treat model outputs as simply "biased" or "unbiased." This binary framing misses the gradual, context-sensitive way bias actually emerges. We address this gap in two stages: behavioral profiling and mechanistic validation. In the behavioral stage, we introduce the Moral Sensitivity Index (MSI), a metric that quantifies the probability of biased output across a graduated, seven-tier stress test ranging from abstract numerical problems to scenarios rooted in historical and socioeconomic injustice. Evaluating four leading models (Claude 3.5, Qwen 3.5, Llama 3, and Gemini 1.5), we identify distinct behavioral signatures shaped by alignment design: for instance, Gemini 1.5 reaches 72.7% MSI by Tier 5 under socioeconomic framing, while Claude exhibits sharp suppression consistent with identity-based safety training. We then verify these behavioral patterns mechanistically. We select criminal-bias scenarios, which produced the highest MSI scores across models, as probes and apply logit lens, attention analysis, activation patching, and semantic probing to a controlled set of six models spanning three capability tiers: small language models (SLMs), instruction-tuned base models, and reasoning-distilled variants. Circuit-level analysis reveals a U-curve of bias: SLMs exhibit strong criminal bias; scaling to instruction-tuned models eliminates it; reasoning distillation reintroduces bias to SLM-like levels despite identical parameter counts, suggesting distillation compresses reasoning traces in ways that reactivate shallow statistical associations. Critically, the socially loaded cues that drive high MSI scores activate the same bias-driving circuits identified mechanistically, providing cross-stage validation.
Summary / 总结
Large language models (LLMs) are increasingly deployed in settings that require nuanced ethical reasoning, yet existing bias evaluations treat model outputs as simply "biased" or "unbiased." This binary framing misses the gradual, context-sensitive way bias actually emerges.
Domain Adapted Large Language Models for Additive Manufacturing
Authors: Peter Pak, Amir Barati Farimani
First: 2026-03-23T14:28:10+00:00 · Latest: 2026-06-08T22:41:31+00:00
Abstract
This work presents a collection of multi-modal domain adapted large language models built upon the instruction tuned variants of open weight models (Gemma 3, Qwen 3, Gemma 4) using a relatively small dataset of around 50 million tokens. The dataset consists of open-access additive manufacturing journal articles with data extracted for the domain adaptive pretraining and visual instruction tuning processes. Various stages of the developed model are evaluated with the Additive-Manufacturing-Benchmark, which consists of additive manufacturing domain specific tasks compiled from published resources. Domain adapted and instruction tuned models exhibit proficiency in both language and vision based tasks, achieving accuracies upwards of 90% in general additive manufacturing knowledge. This domain adaptive pretraining and instruction tuning strategy outlines an accessible specialization method for large language models to a domain such as additive manufacturing.
Summary / 总结
This work presents a collection of multi-modal domain adapted large language models built upon the instruction tuned variants of open weight models (Gemma 3, Qwen 3, Gemma 4) using a relatively small dataset of around 50 million tokens.
A Source Domain is All You Need: Source-Only Cross-OS Transfer Learning for APT Anomaly Detection via Semantic Alignment and Optimal Transport
Authors: Sidahmed Benabderrahmanea, Petko Valtchev, James Cheney, Talal Rahwan
First: 2026-06-08T22:13:42+00:00 · Latest: 2026-06-08T22:13:42+00:00
Abstract
Advanced Persistent Threats (APTs) are stealthy, multi-stage cyberattacks whose detection is difficult due to scarce labeled traces, severe class imbalance, and the challenge of generating realistic malicious behavior. These challenges are amplified in cross-operating-system (cross-OS) settings, where a detector trained on one source platform must be deployed on an unlabeled target platform without access to target-domain labels. We study this source-only cross-OS APT detection problem using system-level provenance traces and propose a transport-based framework for ranking anomalous target processes under zero target supervision. The framework abstracts process behavior into structured natural-language descriptions, embeds them using pretrained language models, and constructs a source-normal reference for target scoring. It combines three evidence channels: semantic deviation from source-normal prototypes, structural deviation captured by graph autoencoding, and geometric deviation measured through Optimal Transport (OT). The main contribution is an OT-based barycentric anomaly score that projects target embeddings onto the source-normal manifold and quantifies residual transport mismatch. We further introduce entropy-weighted, angle-aware, and density-aware OT variants to capture uncertainty, directional drift, and sparse-support behavior. Evaluation on DARPA Transparent Computing data spanning Linux, Windows, BSD, and Android, across two APT scenarios and twelve cross-OS transfer pairs, shows that the proposed framework improves ROC-AUC and nDCG over source-only anomaly-detection baselines. The results demonstrate that source-only provenance modeling, combined with semantic abstraction and OT-based anomaly scoring, can support practical cross-platform APT detection without target-domain supervision.
Summary / 总结
Advanced Persistent Threats (APTs) are stealthy, multi-stage cyberattacks whose detection is difficult due to scarce labeled traces, severe class imbalance, and the challenge of generating realistic malicious behavior.
Fisher-Guided Progressive Parameter Selection for Adaptive Fine-Tuning
Authors: Ghodsiyeh Rostami, Po-Han Chen, Mahdi S. Hosseini
First: 2026-06-08T21:35:11+00:00 · Latest: 2026-06-08T21:35:11+00:00
Abstract
Parameter-efficient fine-tuning (PEFT) aims to adapt pretrained models with a small trainable parameter subset; however, most existing methods choose this subset from fixed architectural heuristics rather than using dynamic, task-aware criteria. We introduce FisherAdapTune, a Fisher-guided Adaptive Fine-Tuning framework that progressively selects parameter groups by tracking the temporal drift of their Fisher geometry. Starting from a PAC-Bayesian view of fine-tuning, we decompose the generalization error bound into Fisher-weighted update costs and show that parameter groups whose curvature contribution has stabilized can be frozen to reduce the error bound without interrupting the remaining adaptation dynamics. FisherAdapTune formulates this criterion with a scale-invariant Jensen-Shannon distance between consecutive Fisher distributions, yielding an adaptive active parameter set. We evaluate our approach on a downstream segmentation task, and results show FisherAdapTune improves the in-distribution performance and zero-shot transfer in multiple settings, validating that Fisher structural drift is a useful signal for efficient, task-aware adaptation. We release our code publicly at https://github.com/AtlasAnalyticsLab/FisherAdapTune to enable further application of our proposed approach.
Summary / 总结
Parameter-efficient fine-tuning (PEFT) aims to adapt pretrained models with a small trainable parameter subset; however, most existing methods choose this subset from fixed architectural heuristics rather than using dynamic, task-aware criteria.
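The freezing criterion described in the abstract can be sketched as follows, assuming each parameter group's Fisher diagonal estimates are normalized into a probability distribution (which makes the comparison invariant to overall scale) and a group is frozen once the Jensen-Shannon distance between consecutive snapshots drops below a threshold. The threshold value, the normalization, and the function names are assumptions, not FisherAdapTune's implementation.

```python
import math

def normalize(values):
    """Turn non-negative Fisher estimates into a probability distribution."""
    total = sum(values)
    return [v / total for v in values]

def js_distance(p, q):
    """Jensen-Shannon distance (square root of the base-2 JS divergence), in [0, 1]."""
    mid = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return math.sqrt(max(0.0, 0.5 * kl(p, mid) + 0.5 * kl(q, mid)))

def groups_to_freeze(prev_fisher, curr_fisher, threshold=0.05):
    """Names of groups whose Fisher distribution barely moved between snapshots."""
    frozen = []
    for name, current in curr_fisher.items():
        drift = js_distance(normalize(prev_fisher[name]), normalize(current))
        if drift < threshold:
            frozen.append(name)
    return frozen
```

For example, a group whose Fisher values all double between snapshots has zero drift after normalization and would be frozen, while a group whose mass moves to different parameters stays active.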
Divide-and-Conquer Modeling for the CTF-4-Science Lorenz Benchmark
Authors: Shundong Li
First: 2026-06-08T19:08:22+00:00 · Latest: 2026-06-08T19:08:22+00:00
Abstract
This work presents a divide-and-conquer modeling strategy for the CTF-4-Science Lorenz benchmark, which evaluates chaotic-system prediction across twelve hidden scores and five scenario families: clean forecasting, noisy reconstruction, noisy-input forecasting, few-shot learning, and parametric generalization. Rather than forcing one model class to handle all regimes, the final system matches each prediction block to the evaluation behavior of its task group. The main contributions are: smoothing-based reconstruction for noisy full-trajectory denoising; NG-RC/NVAR models tuned for noisy long-time attractor forecasting; a fitted Lorenz transition correction restricted to the sensitive clean short-time prefix; and a parametric prefix blend for the interpolation task. The resulting system, with a final public score of 79.63, shows that bounded, scenario-specific updates can outperform broad model replacement on mixed chaotic forecasting benchmarks.
Summary / 总结
This work presents a divide-and-conquer modeling strategy for the CTF-4-Science Lorenz benchmark, which evaluates chaotic-system prediction across twelve hidden scores and five scenario families: clean forecasting, noisy reconstruction, noisy-input forecasting, few-shot learning, and parametric generalization.
VFUSE: Virulent Feature Understanding with Sparse autoEncoders
Authors: Michael Yu, Matthew L. Olson
First: 2026-06-08T18:54:31+00:00 · Latest: 2026-06-08T18:54:31+00:00
Abstract
Generative models have shown remarkable progress in a variety of domains such as protein design, but such power enables the opaque generation of hazardous proteins. In this work, we introduce VFUSE (Virulent Feature Understanding with Sparse autoEncoders), a mechanistic interpretability approach that trains SAEs on diffusion-transformer activations to audit protein models for hazard-aware features. We apply VFUSE to RoseTTAFold3 and RFDiffusion3, popular open-weight models for protein folding and synthesis. We find that for certain blocks, linear probes detect hazardous designs significantly better when fit in the SAE latent space than when fit on the original model's representations, improving interpretability without sacrificing model performance. Furthermore, we identify monosemantic features from the SAE that fire only on hazardous designs at up to AUROC $0.84$ ($q < 10^{-13}$). To our knowledge, this is the first SAE trained on an all-atom diffusion model and the first feature-level virulence audit of a protein design model, paving the way towards safe and interpretable protein design.
Summary / 总结
Generative models have shown remarkable progress in a variety of domains such as protein design, but such power enables the opaque generation of hazardous proteins.
BenSyc: Benchmarking Conversational Sycophancy and Human Alignment in LLMs for Bengali Contexts
Authors: Kazi Noshin, Sajib Acharjee Dip, Ranat Das Prangon, Fardin Hassan Tamim, Syed Ishtiaque Ahmed, Liqing Zhang, Sharifa Sultana
First: 2026-06-08T18:37:23+00:00 · Latest: 2026-06-08T18:37:23+00:00
Abstract
Large language models (LLMs) increasingly participate in emotionally sensitive social conversations, where responses may shift from balanced support toward excessive validation or escalatory alignment. Existing sycophancy research primarily focuses on factual agreement and instruction-following settings, leaving culturally grounded conversational sycophancy underexplored. We introduce BenSyc, the first benchmark for studying conversational sycophancy in Bengali social contexts. Starting from 11,840 Reddit posts and 170k comments collected from communities across Bangladesh and West Bengal, we construct a human-validated benchmark with binary labels and a fine-grained five-level taxonomy spanning Invalidation, Neutral, Support, Validation, and Escalation. We evaluate more than 15 open and proprietary LLMs on conversational alignment classification and response generation tasks. Results show that distinguishing empathetic support from reinforcement-oriented validation remains challenging even for frontier instruction-tuned models: the best system achieves only 61.8 Macro-F1 on binary detection and 61.7 Macro-F1 on five-class classification. In generation settings, several models frequently produce strongly validating or escalatory responses in emotionally charged situations. Our findings highlight substantial variation across model families and conversational behaviors, underscoring the importance of culturally grounded multilingual benchmarks for evaluating socially aligned conversational AI systems.
Summary / 总结
Large language models (LLMs) increasingly participate in emotionally sensitive social conversations, where responses may shift from balanced support toward excessive validation or escalatory alignment.
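For reference, Macro-F1 averages the per-class F1 scores with equal weight, so rare classes such as Escalation count as much as frequent ones. A minimal sketch of the standard definition follows; the label strings in the usage example are taken from the taxonomy above, but the predictions are made up for illustration.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1; a class with no support or predictions scores 0."""
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Illustrative (made-up) labels and predictions.
gold = ["Support", "Support", "Validation", "Validation"]
pred = ["Support", "Validation", "Validation", "Validation"]
score = macro_f1(gold, pred, ["Support", "Validation"])
```

Here Support has F1 = 2/3 and Validation has F1 = 4/5, so `score` is their mean, 11/15.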
Data Synthesis and Parameter-Efficient Fine-Tuning for Low-Resource NMT: A Case Study on Q'eqchi' Mayan
Authors: Alexander Chulzhanov, Soeren Eberhardt, Arjun Mukherjee
First: 2026-06-08T17:29:08+00:00 · Latest: 2026-06-08T17:29:08+00:00
Comments: Accepted to the 29th International Conference on Text, Speech and Dialogue (TSD 2026). This version of the contribution has been accepted for publication, after peer review but is not the Version of Record and does not reflect post-acceptance improvements, or any corrections
Abstract
Neural machine translation for digitally low-resource Indigenous languages is often hindered by extreme data scarcity, prompting reliance on extractive web-scraping. To ensure data sovereignty, this study introduces a data synthesis methodology to bootstrap NMT models without scraping target-language parallel text. Focusing on Q'eqchi' Mayan, we transformed community-sourced dictionaries into a massive synthetic corpus, utilizing Parameter-Efficient Fine-Tuning (PEFT) via LoRA adapters on an mT5-base model.
In-domain evaluation demonstrates high structural acquisition (BLEU 42.02), proving that synthetic constraints effectively teach complex agglutinative morphology and VOS word order. However, evaluation against an organic glossary reveals a structural-semantic gap (BLEU 0.59), where the model maintains grammatical integrity but lacks the lexical grounding of natural language. The model exhibits overfitting to the constrained structural variance of the synthetic templates; despite high semantic entropy in the pipeline, it struggles with the syntactic fluidity of natural language, forcing organic inputs into rigid learned patterns. Furthermore, an ablation study utilizing a Multi-Task Learning architecture resulted in negative transfer, suggesting that auxiliary tasks competed for limited parameter capacity within the LoRA adapters, causing over-optimization for synthetic markers at the expense of organic flexibility. Ultimately, we establish that synthetic bootstrapping is a highly effective structural primer, but requires authentic data for semantic refinement via Curriculum Learning.
Summary / 总结
Neural machine translation for digitally low-resource Indigenous languages is often hindered by extreme data scarcity, prompting reliance on extractive web-scraping.
PsychoSafe: Eliciting Psychologically-Informed Refusals in Large Language Models
Authors: Gianluca Barmina, Federico Torrielli, Sven Harms, Jacob Nielsen, Felix Mächtle, Stine Lyngsø Beltoft, Peter Schneider-Kamp, Thomas Eisenbarth, Lukas Galke Poech, Anne Lauscher
First: 2026-06-08T16:19:18+00:00 · Latest: 2026-06-08T16:19:18+00:00
Abstract
Large language models (LLMs) routinely face requests that should be refused, creating a trade-off between helpfulness and harm prevention. However, refusals themselves can be helpful. In high-risk interactions involving crisis, coercion, or escalating intent, blunt non-compliance may prevent direct harm while still failing to support the needs of the person behind the request. We present PsychoSafe, a psychologically-informed refusal framework that reframes refusal as structured supportive communication grounded in evidence-based intervention strategies. To develop PsychoSafe, we construct a corpus of 8019 prompt-response pairs spanning five psychologically salient risk domains and apply prompting and parameter-efficient fine-tuning to Qwen 3.5 27B. On a balanced validation set of 500 prompts, evaluated with an LLM judge and validated through human ratings, PsychoSafe prompting improves overall refusal quality by 28.1% over a generic baseline, with particularly strong gains in external resource referral (+46.8%) and psychological grounding (+34.8%), while preserving downstream performance on non-refusal tasks. Fine-tuning achieves near-perfect refusal and resource-referral rates but reduces response relevance. Additional evaluations on SORRY-Bench and XSTest show strong in-domain robustness but limited out-of-domain generalization, suggesting that future work should diversify fine-tuning data to help models apply interventions selectively rather than schematically.
Summary / 总结
Large language models (LLMs) routinely face requests that should be refused, creating a trade-off between helpfulness and harm prevention.
The Sample Complexity of Parameter-Free Stochastic Convex Optimization
Authors: Jared Lawrence, Ari Kalinsky, Hannah Bradfield, Yair Carmon, Oliver Hinder
First: 2025-06-12T22:14:49+00:00 · Latest: 2026-06-08T16:05:40+00:00
Comments: Accepted for publication in JMLR
Abstract
We study the sample complexity of stochastic convex optimization when problem parameters such as the distance to optimality and the Lipschitz constant are unknown. We pursue two strategies. First, we develop a reliable model selection method that avoids overfitting to the validation set. This method allows us to generically tune the learning rate of stochastic optimization methods to match the optimal known-parameter sample complexity up to log log factors. Second, we develop a regularization-based method that is specialized to the case that only the distance to optimality is unknown. More specifically, it uses norm-regularized empirical risk minimization to estimate the distance to optimality to within a constant factor, allowing known-parameter stochastic optimization methods to achieve optimal sample complexity. This method provides perfect adaptability to unknown distance to optimality, demonstrating a separation between the sample and computational complexity of parameter-free stochastic convex optimization. Combining these two methods allows us to simultaneously adapt to multiple problem structures.
Experiments performing few-shot learning on CIFAR-10 by fine-tuning CLIP models and prompt engineering Gemini to count shapes indicate that our reliable model selection method can help mitigate overfitting to small validation sets.
Summary / 总结
We study the sample complexity of stochastic convex optimization when problem parameters such as the distance to optimality and the Lipschitz constant are unknown.
Toward automatic generation of control structures for process flow diagrams with large language models
Authors: Edwin Hirtreiter, Lukas Schulze Balhorn, Artur M. Schweidtmann
Venue: AIChE Journal, Volume 70, Issue 1, January 2024, e18259
First: 2022-10-26T10:03:15+00:00 · Latest: 2026-06-08T15:07:16+00:00
Abstract
Developing Piping and Instrumentation Diagrams (P&IDs) is a crucial step during process development. We propose a data-driven method for the prediction of control structures. Our methodology is inspired by end-to-end transformer-based human language translation models. We cast the control structure prediction as a translation task where Process Flow Diagrams (PFDs) without control structures are translated to PFDs with control structures. We represent the topology of PFDs as strings using the SFILES 2.0 notation. We pretrain our model using generated PFDs to learn the grammatical structure. Thereafter, the model is fine-tuned leveraging transfer learning on real PFDs. The model achieved a top-5 accuracy of 74.8% on 10,000 generated PFDs and 89.2% on 100,000 generated PFDs. These promising results show great potential for AI-assisted process engineering. The tests on a dataset of 312 real PFDs indicate the need for a larger PFD dataset for industry applications and hybrid artificial intelligence solutions.
Summary / 总结
Developing Piping and Instrumentation Diagrams (P&IDs) is a crucial step during process development.
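The top-5 accuracy quoted in the abstract above can be illustrated with a minimal sketch. It assumes the model returns a ranked list of candidate strings per input and that a prediction counts as correct when the reference string appears among the first k candidates; the abstract does not spell out its exact matching rule. The function name and the placeholder strings are hypothetical and are not real SFILES 2.0 output.

```python
from typing import Sequence


def top_k_accuracy(
    candidates: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 5,
) -> float:
    """Fraction of examples whose target is among the first k ranked candidates."""
    if len(candidates) != len(targets):
        raise ValueError("candidates and targets must have the same length")
    if not targets:
        raise ValueError("at least one example is required")
    hits = sum(
        1 for ranked, target in zip(candidates, targets) if target in ranked[:k]
    )
    return hits / len(targets)


# Hypothetical placeholder strings standing in for predicted flowsheet strings.
ranked_predictions = [
    ["a", "b", "c"],  # reference "b" is ranked second: a hit for k >= 2
    ["x", "y", "z"],  # reference "q" never appears: a miss
]
references = ["b", "q"]
accuracy = top_k_accuracy(ranked_predictions, references, k=5)
```

Exact string matching is the simplest choice here; a real evaluation might first canonicalize the strings so that equivalent topologies compare equal.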
Learning from flowsheets: A generative transformer model for autocompletion of flowsheets
Authors: Gabriel Vogel, Lukas Schulze Balhorn, Artur M. Schweidtmann
Venue: Computers and Chemical Engineering Volume 171, March 2023, 108162
First: 2022-08-01T13:43:58+00:00 · Latest: 2026-06-08T14:33:06+00:00
Abstract
We propose a novel method enabling autocompletion of chemical flowsheets. This idea is inspired by the autocompletion of text. We represent flowsheets as strings using the text-based SFILES 2.0 notation and learn the grammatical structure of the SFILES 2.0 language and common patterns in flowsheets using a transformer-based language model. We pre-train our model on synthetically generated flowsheet topologies to learn the flowsheet language grammar. Then, we fine-tune our model in a transfer learning step on real flowsheet topologies. Finally, we use the trained model for causal language modeling to autocomplete flowsheets. Eventually, the proposed method can provide chemical engineers with recommendations during interactive flowsheet synthesis. The results demonstrate a high potential of this approach for future AI-assisted process synthesis but also reveal the limitations at the present state and the next steps that need to be taken to deploy this technique in realistic flowsheet synthesis scenarios.
Summary / 总结
We propose a novel method enabling autocompletion of chemical flowsheets.
Emergence of Context Characteristics Sensitivity in Large Language Models
Authors: Nadya Yuki Wangsajaya, Haeun Yu, Isabelle Augenstein
First: 2026-06-08T14:11:07+00:00 · Latest: 2026-06-08T14:11:07+00:00
Abstract
During instruction fine-tuning (IFT), large language models (LLMs) learn to follow instructions by using the provided context to answer a query. While prior work has studied how context characteristics correlate with context usage by the LLM, this analysis has been limited to inference time, leaving open how these relationships are acquired in the first place. Here, we measure how models' sensitivity to such characteristics shifts across successive IFT stages: supervised fine-tuning (SFT), direct preference optimization (DPO), and reinforcement learning with verifiable rewards (RLVR). Experiments across four models and three datasets show that SFT makes models more likely to use contexts that are easy to understand, such as those with greater length, higher context-query similarity, and higher fluency. Post-SFT dynamics may either reinforce or resolve these preferences depending on the training dataset. Our findings reveal that context usage is actively reshaped at each IFT stage, and that designing a balanced IFT dataset is important for ensuring robust context utilization by instruction-tuned models.
Summary / 总结
During instruction fine-tuning (IFT), large language models (LLMs) learn to follow instructions by using the provided context to answer a query.
LargeMonitor: Monitoring Online Task-Free Continual Learning via Large Pretrained Models
Authors: Mingqi Yuan, Xiaoquan Sun, Shihao Luo, Jiayu Chen
First: 2026-06-08T12:41:15+00:00 · Latest: 2026-06-08T12:41:15+00:00
Abstract
Online task-free continual learning (TFCL) requires intelligent agents to sequentially accumulate knowledge from an unbounded, non-stationary data stream under strict single-pass constraints and without any explicit task identifiers. Existing online TFCL paradigms primarily rely on parameter-efficient prompt tuning or dynamic structure expansion driven by training-coupled optimization dynamics, such as empirical loss fluctuations or evolving latent distances. As a result, these training-coupled solvers remain agnostic to the structural origins of distribution drift, mechanically enforcing a fixed strategy across fundamentally distinct streaming variations. To address this gap, we propose LargeMonitor, a framework that leverages large pretrained foundation models to autonomously orchestrate task-free continuous adaptation. Specifically, LargeMonitor introduces a decoupled detection module utilizing the frozen, stable representation space of large vision models (LVMs) to achieve robust, zero-shot drift detection without training-dependent interference or brittle threshold tuning. Upon a confirmed drift, the framework activates a context-aware diagnostic module driven by large multimodal models (LMMs) to interpret the precise semantic etiologies of the stream variation (e.g., novel class emergence vs. environmental domain shift). This dual-stage capability empowers the continuous learner to dynamically deploy adaptive and shift-specific optimization strategies. Extensive experiments across multiple TFCL settings and benchmarks demonstrate that LargeMonitor achieves precise, robust detection and diagnosis of complex data streams while consistently improving the performance of existing online TFCL algorithms.
Summary / 总结
Online task-free continual learning (TFCL) requires intelligent agents to sequentially accumulate knowledge from an unbounded, non-stationary data stream under strict single-pass constraints and without any explicit task identifiers.
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Authors: Bartłomiej Marek, Lorenzo Rossi, Vincent Hanke, Xun Wang, Michael Backes, Franziska Boenisch, Adam Dziedzic
Venue: ICLR 2026 Oral
First: 2026-06-08T12:21:02+00:00 · Latest: 2026-06-08T12:21:02+00:00
Comments: Accepted at ICLR 2026 (Oral)
Abstract
Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees. However, its practical effectiveness remains unclear, partly due to LLM pretraining, where overlaps and interdependencies with adaptation data can undermine privacy despite DP efforts. To analyze this issue in practice, we investigate privacy risks under DP adaptations in LLMs using state-of-the-art attacks such as robust membership inference and canary data extraction. We benchmark these risks by systematically varying the adaptation data distribution, from exact overlaps with pretraining data, through in-distribution (IID) cases, to entirely out-of-distribution (OOD) examples. Additionally, we evaluate how different adaptation methods and different privacy regimes impact the vulnerability. Our results show that distribution shifts strongly influence privacy vulnerability: the closer the adaptation data is to the pretraining distribution, the higher the practical privacy risk at the same theoretical guarantee, even without direct data overlap. We find that parameter-efficient fine-tuning methods, such as LoRA, achieve the highest empirical privacy protection for OOD data. Our benchmark identifies key factors for achieving practical privacy in DP LLM adaptation, providing actionable insights for deploying customized models in sensitive settings. Looking forward, we propose a structured framework for holistic privacy assessment beyond adaptation privacy, to identify and evaluate risks across the full pretrain-adapt pipeline of LLMs.
Summary / 总结
Recent work has applied differential privacy (DP) to adapt large language models (LLMs) for sensitive applications, offering theoretical guarantees.
Distilling Safe LLM Systems via Soft Prompts for On Device Settings
Authors: Motasem Alfarra, Cristina Pinneri, Dana Kianfar, Mohammed Almousa, Christos Louizos
Venue: 42nd Conference on Uncertainty in Artificial Intelligence 2026
First: 2026-06-08T12:03:51+00:00 · Latest: 2026-06-08T12:03:51+00:00
Comments: Accepted to UAI 2026
Abstract
Deploying safe large language models (LLMs) on resource-constrained edge devices presents a critical challenge: while dual-model systems combining LLMs with guard models provide effective safety guarantees, their substantial memory and computational demands make them prohibitively expensive for on-device deployment. This paper presents a comprehensive study of parameter-efficient safety alignment methods for resource-constrained settings. Through systematic evaluation across multiple LLM architectures, training objectives, and parameter-efficient fine-tuning approaches, we identify that soft prompts combined with distillation-based training consistently outperform alternative methods. We introduce distillation frameworks based on total variation and KL divergence that effectively transfer safety behaviors from guard models into learned soft prompts. Our evaluations on various benchmarks demonstrate that this combination achieves superior safety-usefulness trade-offs compared to LoRA adapters, steering vectors, and direct optimization methods, while requiring minimal additional memory and compute at inference time. These findings establish soft prompt distillation as the preferred approach for safety alignment in on-device LLM deployment.
Summary / 总结
Deploying safe large language models (LLMs) on resource-constrained edge devices presents a critical challenge: while dual-model systems combining LLMs with guard models provide effective safety guarantees, their substantial memory and computational demands make them prohibitively expensive for on-device deployment.
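The two divergences named in the abstract above, total variation and KL, can be sketched for a pair of categorical distributions. This is an illustration under assumptions and not the authors' training code: the abstract does not say at what granularity the distributions are compared or which KL direction is used. The sketch assumes KL(teacher || student), and all function names are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def total_variation(p: Sequence[float], q: Sequence[float]) -> float:
    """Total variation distance: half the L1 distance between p and q."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q); eps guards against log(0) when q has zero entries."""
    return sum(
        a * math.log((a + eps) / (b + eps)) for a, b in zip(p, q) if a > 0.0
    )


# Assumed setup: the teacher stands in for the guard-model system,
# the student for the soft-prompted LLM. Logits are made-up numbers.
teacher = softmax([2.0, 0.0])
student = softmax([0.5, 0.5])
tv_loss = total_variation(teacher, student)
kl_loss = kl_divergence(teacher, student)
```

Total variation is bounded in [0, 1] and symmetric, while KL is unbounded and asymmetric, which is one reason the two can behave differently as distillation objectives.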
Beyond Humans: Multispecies Animal Face Recognition Using Transfer Learning
Authors: Maria De Marsico, Anil K. Jain, Annalaura Miglino
First: 2026-06-08T11:27:11+00:00 · Latest: 2026-06-08T11:27:11+00:00
Comments: This paper extends the work published in the proceedings of CAIP 2025 conference: 'Adapting to the Wild: From Human Face to Animal Face Recognition' by De Marsico, M., Jain, A. K., Miranda, M., & Orlando, A
Abstract
Individual animal recognition can be useful in the search for lost or stolen pets, the tracking of individuals of endangered species, and the recognition of animals in crowded farms. Present recognition techniques mostly use physical devices, e.g., microchips, which are often impractical and difficult to apply. These could be replaced by remote recognition via the animal's face; if accurate enough, it provides several advantages: it is non-invasive, can work at a distance, and is difficult to counterfeit, as, for instance, in the case of substituting sick animals for healthy ones in the food industry. The few existing datasets with sufficient per-subject images annotated with a single animal identity are not large enough to train current deep learning architectures. We instead investigate the possibility of transfer learning, exploiting pre-trained network models as backbones. Our experiments compared FaceNet, which is specifically trained on large databases of human faces, with the Vision Transformer (ViT) pre-trained on ImageNet, i.e., on object categories. We used three face datasets of very different animals: dogs, primates (lemurs, golden monkeys, and chimpanzees), and cattle. We report the results and, for each dataset, compare them with the state-of-the-art (SOTA) ad hoc-trained deep networks. The capture conditions differ among the three datasets. Image quality (resolution, motion blur, diverse poses, etc.) decreases from dogs to cattle to primates. The best performance was achieved with dogs, where ViT reached a mean verification accuracy of 96.85% and a Rank-1 Identification Rate of 84.34%. The results for endangered primates are still encouraging, but performance varies across animal classes and tasks (verification or identification), and does not always outperform SOTA. For cattle, the ViT results outperform SOTA, while FaceNet is still competitive.
Summary / 总结
Individual animal recognition can be useful in the search for lost or stolen pets, the tracking of individuals of endangered species, and the recognition of animals in crowded farms.
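The Rank-1 Identification Rate reported in the abstract above can be sketched as a nearest-neighbor check over face embeddings. The sketch assumes cosine similarity over non-zero embedding vectors and a gallery of labeled reference embeddings; the abstract does not state the similarity measure or protocol, so these choices, the function names, and the toy vectors are all hypothetical.

```python
import math
from typing import Sequence

Embedding = Sequence[float]
Labeled = tuple[str, Embedding]


def cosine_similarity(u: Embedding, v: Embedding) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank1_identification_rate(probes: Sequence[Labeled], gallery: Sequence[Labeled]) -> float:
    """Fraction of probes whose most similar gallery entry has the same identity."""
    if not probes or not gallery:
        raise ValueError("probes and gallery must be non-empty")
    hits = 0
    for probe_id, probe_vec in probes:
        best_id, _ = max(
            gallery, key=lambda item: cosine_similarity(probe_vec, item[1])
        )
        hits += int(best_id == probe_id)
    return hits / len(probes)


# Toy 2-D embeddings; a real backbone would produce much longer vectors.
gallery = [("dog_a", [1.0, 0.0]), ("dog_b", [0.0, 1.0])]
probes = [
    ("dog_a", [0.9, 0.1]),  # closest to dog_a: a hit
    ("dog_b", [0.8, 0.2]),  # closest to dog_a: a miss
]
rate = rank1_identification_rate(probes, gallery)
```

Verification accuracy, the other metric in the abstract, would instead threshold the similarity of image pairs and score same-or-different decisions.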
Decoding Pedestrian Crossing Intention from Egocentric Vision via Vision Language Models
Authors: Danya Li, Xiang Su, Yan Feng, Rico Krueger
First: 2026-06-08T07:39:42+00:00 · Latest: 2026-06-08T07:39:42+00:00
Abstract
Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored. In this work, we study the decoding of pedestrian crossing intentions from short egocentric video clips. We approach this by formulating the task as a closed-ended visual question answering (VQA) problem and leveraging vision language models (VLMs) to predict the pedestrians' intent. We first benchmark three families of state-of-the-art VLMs in a zero-shot setting, finding that they achieve moderate gains over random guessing but exhibit limited higher-level traffic reasoning. Motivated by these findings, we further adapt VLMs to the target task using parameter-efficient fine-tuning. Our results show that the fine-tuned models substantially outperform their zero-shot counterparts and achieve a 9% accuracy improvement over a specialized transformer-based baseline. Finally, we demonstrate that incorporating additional contextual cues, including ego motion, vehicle motion, and eye gaze, further improves predictive performance. In particular, the fine-tuned Qwen3-VL-2B model guided by eye gaze and ego motion achieves a 14.5% accuracy improvement over the transformer baseline, establishing a new state of the art for egocentric pedestrian intent decoding.
Summary / 总结
Egocentric vision offers a first-person view of human perception and decision making, yet its potential for traffic-safety prediction remains underexplored.
Vision Language Model Helps Private Information De-Identification in Vision Data
Authors: Tiejin Chen, Pingzhi Li, Kaixiong Zhou, Tianlong Chen, Hua Wei
First: 2026-06-08T07:30:20+00:00 · Latest: 2026-06-08T07:30:20+00:00
Abstract
Visual Language Models (VLMs) have gained significant popularity due to their remarkable ability. While various methods exist to enhance privacy in text-based applications, privacy risks associated with visual inputs, such as Protected Health Information (PHI) in medical images, remain largely overlooked. Tackling this problem requires two key tasks: accurately localizing sensitive text and processing it to ensure privacy protection. To this end, we introduce VisShield (Vision Privacy Shield), an end-to-end framework designed to enhance the privacy awareness of VLMs. Our framework consists of two key components: a specialized instruction-tuning dataset, OPTIC (Optical Privacy Text Instruction Collection), and a tailored training methodology. The dataset provides diverse privacy-oriented prompts that guide VLMs to perform targeted Optical Character Recognition (OCR) for precise localization of sensitive text, while the training strategy ensures effective adaptation of VLMs to privacy-preserving tasks. Specifically, our approach ensures that VLMs recognize privacy-sensitive text and output precise bounding boxes for detected entities, allowing for effective masking of sensitive information. Extensive experiments demonstrate that our framework significantly outperforms existing approaches in handling private information, paving the way for privacy-preserving applications in vision-language models. Our dataset and code are available.
Summary / 总结
Visual Language Models (VLMs) have gained significant popularity due to their remarkable ability.