Legal2LogicICL: Improving Generalization in Transforming Legal Cases to Logical Formulas via Diverse Few-Shot Learning
Authors: Jieying Xue, Phuong Minh Nguyen, Ha Thanh Nguyen, May Myo Zin, Ken Satoh
First: 2026-04-13T16:36:48+00:00 · Latest: 2026-04-13T16:36:48+00:00
Comments: Accepted at ICAIL 2026
Abstract
This work aims to improve the generalization of logic-based legal reasoning systems by integrating recent advances in NLP with legal-domain adaptive few-shot learning techniques using LLMs. Existing logic-based legal reasoning pipelines typically rely on fine-tuned models to map natural-language legal cases into logical formulas before forwarding them to a symbolic reasoner. However, such approaches are heavily constrained by the scarcity of high-quality annotated training data. To address this limitation, we propose a novel LLM-based legal reasoning framework that enables effective in-context learning through retrieval-augmented generation. Specifically, we introduce Legal2LogicICL, a few-shot retrieval framework that balances diversity and similarity of exemplars at both the latent semantic representation level and the legal text structure level. In addition, our method explicitly accounts for legal structure by mitigating entity-induced retrieval bias in legal texts, where lengthy and highly specific entity mentions often dominate semantic representations and obscure legally meaningful reasoning patterns. Our Legal2LogicICL constructs informative and robust few-shot demonstrations, leading to accurate and stable logical rule generation without requiring additional training. In addition, we construct a new dataset, named Legal2Proleg, which is annotated with alignments between legal cases and PROLEG logical formulas to support the evaluation of legal semantic parsing. Experimental results on both open-source and proprietary LLMs demonstrate that our approach significantly improves accuracy, stability, and generalization in transforming natural-language legal case descriptions into logical representations, highlighting its effectiveness for interpretable and reliable legal reasoning. Our code is available at https://github.com/yingjie7/Legal2LogicICL.
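The abstract does not say how similarity and diversity are traded off when exemplars are retrieved. As a rough illustration only, the sketch below uses greedy maximal marginal relevance (MMR), a generic technique swapped in here rather than the Legal2LogicICL algorithm. The function names, the toy embedding vectors, and the weight `lam` are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_exemplars(query, candidates, k, lam=0.7):
    """Greedy MMR selection (hypothetical stand-in, not the paper's method).

    lam close to 1 favors similarity to the query;
    lam close to 0 favors diversity among the chosen exemplars.
    Returns the indices of the selected candidates in pick order.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def mmr(i):
            relevance = cosine(query, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=mmr)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=1.0` the selection reduces to plain nearest-neighbor retrieval, so the same function can be used to compare a similarity-only baseline against a diversity-aware one.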
RPA-Check: A Multi-Stage Automated Framework for Evaluating Dynamic LLM-based Role-Playing Agents
Authors: Riccardo Rosati, Edoardo Colucci, Massimiliano Bolognini, Adriano Mancini, Paolo Sernani
First: 2026-04-13T16:08:03+00:00 · Latest: 2026-04-13T16:08:03+00:00
Abstract
The rapid adoption of Large Language Models (LLMs) in interactive systems has enabled the creation of dynamic, open-ended Role-Playing Agents (RPAs). However, evaluating these agents remains a significant challenge, as standard NLP metrics fail to capture the nuances of role adherence, logical consistency, and long-term narrative stability. This paper introduces RPA-Check, a multi-stage automated evaluation framework designed to objectively assess the performance of LLM-based RPAs in complex, constraint-heavy environments. Our methodology is based on a four-step pipeline: (1) Dimension Definition, establishing high-level qualitative behavioral criteria; (2) Augmentation, where these requirements are expanded into granular boolean checklist indicators; (3) Semantic Filtering, to ensure indicator objectivity, non-redundancy, and agent isolation; and (4) LLM-as-a-Judge Evaluation, which employs chain-of-thought verification to score agent fidelity. We validate this framework by applying it to LLM Court, a serious game for forensic training involving several quantized local models. Experimental results across five distinct legal scenarios demonstrate the framework's ability to identify subtle trade-offs between model size, reasoning depth, and operational stability. Notably, the findings reveal an inverse relationship between parametric scale and procedural consistency, showing that smaller, adequately instruction-tuned models (8-9B) can outperform larger architectures prone to user-alignment bias or sycophancy. RPA-Check thus provides a standardized and reproducible metric for future research in generative agent evaluation within specialized domains.
Merging Triggers, Breaking Backdoors: Defensive Poisoning for Instruction-Tuned Language Models
Authors: San Kim, Gary Geunbae Lee
First: 2026-01-07T23:30:26+00:00 · Latest: 2026-04-13T12:20:51+00:00
Comments: 17 pages
Abstract
Large Language Models (LLMs) have greatly advanced Natural Language Processing (NLP), particularly through instruction tuning, which enables broad task generalization without additional fine-tuning. However, their reliance on large-scale datasets, often collected from human or web sources, makes them vulnerable to backdoor attacks, where adversaries poison a small subset of data to implant hidden behaviors. Despite this growing risk, defenses for instruction-tuned models remain underexplored. We propose MB-Defense (Merging & Breaking Defense Framework), a novel training pipeline that immunizes instruction-tuned LLMs against diverse backdoor threats. MB-Defense comprises two stages: (i) Defensive Poisoning, which merges attacker and defensive triggers into a unified backdoor representation, and (ii) Backdoor Neutralization, which breaks this representation through additional training to restore clean behavior. Extensive experiments across multiple LLMs show that MB-Defense substantially lowers attack success rates while preserving instruction-following ability. Our method offers a generalizable and data-efficient defense strategy, improving the robustness of instruction-tuned LLMs against unseen backdoor attacks.
CapBench: A Multi-PDK Dataset for Machine-Learning-Based Post-Layout Capacitance Extraction
Authors: Hector R. Rodriguez, Jiechen Huang, Wenjian Yu
First: 2026-04-13T09:01:12+00:00 · Latest: 2026-04-13T09:01:12+00:00
Comments: Accepted at the 63rd ACM/IEEE Design Automation Conference (DAC '26). 7 pages, 5 figures
Abstract
We present CapBench, a fully reproducible, multi-PDK dataset for capacitance extraction. The dataset is derived from open-source designs, including single-core CPUs, systems-on-chip, and media accelerators. All designs are fully placed and routed using 14 independent OpenROAD flow runs spanning three technology nodes: ASAP7, NanGate45, and Sky130HD. From these layouts, we extract 61,855 3D windows across three size tiers to enable transfer learning and scalability studies. High-fidelity capacitance labels are generated using RWCap, a state-of-the-art random-walk solver, and validated against the industry-standard Raphael, achieving a mean absolute error of 0.64% for total capacitance. Each window is pre-processed into density maps, graph representations, and point clouds. We evaluate 10 machine learning architectures that illustrate dataset usage and serve as baselines, including convolutional neural networks (CNNs), point cloud transformers, and graph neural networks (GNNs). CNNs demonstrate the lowest errors (1.75%), while GNNs are up to 41.4x faster but exhibit larger errors (10.2%), illustrating a clear accuracy-speed trade-off. Code and dataset are available at https://github.com/THU-numbda/CapBench.
SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors
Authors: Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger
Venue: ICLR 2026
First: 2025-10-20T13:14:38+00:00 · Latest: 2026-04-13T07:04:29+00:00
Comments: Accepted at ICLR 2026. Project Website: http://simbench.tiancheng.hu/ Data: https://huggingface.co/datasets/pitehu/SimBench
Abstract
Large language model (LLM) simulations of human behavior have the potential to revolutionize the social and behavioral sciences, if and only if they faithfully reflect real human behaviors. Current evaluations of simulation fidelity are fragmented, based on bespoke tasks and metrics, creating a patchwork of incomparable results. To address this, we introduce SimBench, the first large-scale, standardized benchmark for a robust, reproducible science of LLM simulation. By unifying 20 diverse datasets covering tasks from moral decision-making to economic choice across a large global participant pool, SimBench provides the necessary foundation to ask fundamental questions about when, how, and why LLM simulations succeed or fail. We show that the best LLMs today achieve meaningful but modest simulation fidelity (score: 40.80/100), with performance scaling log-linearly with model size but not with increased inference-time compute. We discover an alignment-simulation tradeoff: instruction tuning improves performance on low-entropy (consensus) questions but degrades it on high-entropy (diverse) ones. Models particularly struggle when simulating specific demographic groups. Finally, we demonstrate that simulation ability correlates most strongly with knowledge-intensive reasoning (MMLU-Pro, r = 0.939). By making progress measurable, we aim to accelerate the development of more faithful LLM simulators.
Find Your Optimal Teacher: Personalized Data Synthesis via Router-Guided Multi-Teacher Distillation
Authors: Hengyuan Zhang, Shiping Yang, Xiao Liang, Chenming Shang, Yuxuan Jiang, Chaofan Tao, Jing Xiong, Hayden Kwok-Hay So, Ruobing Xie, Angel X. Chang, Ngai Wong
Venue: ACL 2026
First: 2025-10-13T02:36:36+00:00 · Latest: 2026-04-13T05:08:28+00:00
Comments: ACL 2026 Main Conference
Abstract
Training student models on synthetic data generated by strong teacher models is a promising way to distill the capabilities of teachers. However, recent studies show that stronger models are not always optimal teachers, revealing a mismatch between teacher outputs and student learnability. To address this issue, we propose PerSyn (Personalized data Synthesis), a novel synthesis strategy that operates under a new "Route then Generate" paradigm to create data tailored to each student model, enabling it to learn more effectively. Specifically, PerSyn first assigns each prompt to its optimal teacher via a query-level router that jointly considers student learnability and teacher response quality. Each teacher then synthesizes data only for its assigned prompts, making the process more efficient than the conventional "Generate then Select" paradigm, where all teachers must generate parallel responses for the entire prompt set before constructing the final dataset. Extensive experiments across different model families and scales demonstrate that PerSyn consistently achieves superior or comparable performance to all baselines in instruction tuning and math reasoning settings. Further analysis verifies the effectiveness of PerSyn and offers extra insights to propel future research.
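To make the routing step concrete, here is a minimal sketch of assigning each prompt to one teacher before any generation happens. The abstract only states that the router jointly considers student learnability and teacher response quality; the weighted sum, the parameter `alpha`, the score layout, and the function name are assumptions for illustration, not PerSyn's actual router.

```python
def route_prompts(scores, alpha=0.5):
    """Assign each prompt to the teacher with the best combined score.

    scores[prompt][teacher] = (learnability, quality), both hypothetical
    scalar estimates in [0, 1]. The linear combination is an assumption.
    """
    assignment = {}
    for prompt, per_teacher in scores.items():
        assignment[prompt] = max(
            per_teacher,
            key=lambda t: alpha * per_teacher[t][0]
            + (1 - alpha) * per_teacher[t][1],
        )
    return assignment
```

Because each prompt ends up with exactly one teacher, only one response per prompt needs to be generated afterwards, which is where the efficiency argument against generating with every teacher comes from.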
When Valid Signals Fail: Regime Boundaries Between LLM Features and RL Trading Policies
Authors: Zhengzhe Yang
First: 2026-04-13T04:53:06+00:00 · Latest: 2026-04-13T04:53:06+00:00
Abstract
Can large language models (LLMs) generate continuous numerical features that improve reinforcement learning (RL) trading agents? We build a modular pipeline where a frozen LLM serves as a stateless feature extractor, transforming unstructured daily news and filings into a fixed-dimensional vector consumed by a downstream PPO agent. We introduce an automated prompt-optimization loop that treats the extraction prompt as a discrete hyperparameter and tunes it directly against the Information Coefficient (the Spearman rank correlation between predicted and realized returns) rather than NLP losses. The optimized prompt discovers genuinely predictive features (IC above 0.15 on held-out data). However, these valid intermediate representations do not automatically translate into downstream task performance: during a distribution shift caused by a macroeconomic shock, LLM-derived features add noise, and the augmented agent underperforms a price-only baseline. In a calmer test regime the agent recovers, yet macroeconomic state variables remain the most robust driver of policy improvement. Our findings highlight a gap between feature-level validity and policy-level robustness that parallels known challenges in transfer learning under distribution shift.
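The Information Coefficient used as the tuning target is a Spearman rank correlation, which is the Pearson correlation of the two rank vectors. A small dependency-free sketch follows; the function names and the toy numbers in the usage note are illustrative assumptions, and ties receive average ranks.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def information_coefficient(predicted, realized):
    """Spearman rank correlation between predictions and realized returns."""
    return pearson(average_ranks(predicted), average_ranks(realized))
```

For example, `information_coefficient([1, 2, 3, 4], [0.01, 0.02, 0.04, 0.03])` ranks the last two assets in the wrong order and evaluates to 0.8 (up to floating-point rounding).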
Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models
Authors: Rongji Li, Jian Xu, Yi Chen, Xueqing Chen, Yisheng Yang, Jiayi Wang, Xingyu Chen, Chunyu Xie, Dawei Leng, Xu-Yao Zhang
First: 2026-01-13T04:23:36+00:00 · Latest: 2026-04-13T01:54:45+00:00
Abstract
In domains such as materials science, biomedicine, and finance, high-stakes deployment of large language models (LLMs) requires injecting private, domain-specific knowledge that is proprietary, fast-evolving, and under-represented in public pretraining. However, the two dominant paradigms for private knowledge injection each have clear drawbacks: fine-tuning is expensive to iterate under continual updates that can induce catastrophic forgetting and general-capability regression; retrieval-augmented generation (RAG) keeps the base model intact but remains brittle in specialized private corpora due to chunk-induced evidence fragmentation, retrieval mismatch, and long-context pressure. Inspired by how multimodal LLMs align heterogeneous modalities into a shared semantic space, we propose Generation-Augmented Generation (GAG), which treats private expertise as an auxiliary modality and injects it into a frozen base model through a compact, constant-budget latent interface. Concretely, GAG distills question-conditioned specialist knowledge from lightweight domain experts into multi-slot latent memories, integrates multi-layer expert signals via per-slot cross-layer fusion, and aligns them to the frozen base model through gated residual projection, while supporting scalable mixed-domain deployment with reliable selective activation. In a unified mixed-domain evaluation spanning two scientific private-domain QA benchmarks (catalytic materials and immunology adjuvant) together with general-domain queries, GAG consistently outperforms strong retrieval-based and parameter-efficient fine-tuning baselines on specialist QA, while preserving general-domain capability, achieving highly reliable routing, and offering a favorable efficiency-effectiveness trade-off. Code and datasets are provided in the supplementary material, and the code is publicly available at https://github.com/360CVGroup/GAG.
FlexMS is a flexible framework for benchmarking deep learning-based mass spectrum prediction tools in metabolomics
Authors: Yunhua Zhong, Yixuan Tang, Yifan Li, Jie Yang, Pan Liu, Jun Xia
First: 2026-02-26T10:05:01+00:00 · Latest: 2026-04-12T17:04:41+00:00
Comments: 28 pages, preprint version v2 (rethink author contribution)
Abstract
The identification and property prediction of chemical molecules are of central importance to drug discovery and materials science, where tandem mass spectrometry provides valuable fragmentation cues in the form of mass-to-charge ratio peaks. However, the scarcity of experimental spectra hinders molecular identification and motivates computational approaches to spectrum prediction. Deep learning models appear promising for predicting mass spectra from molecular structures, but overall assessment remains challenging because of heterogeneity in methods and the lack of well-defined benchmarks. To address this, we introduce FlexMS, a benchmark framework for constructing and evaluating diverse model architectures in mass spectrum prediction. FlexMS supports the dynamic construction of numerous distinct combinations of model architectures and assesses their performance on preprocessed public datasets using different metrics. In this paper, we provide insights into factors influencing performance, including the structural diversity of datasets, hyperparameters such as learning rate and data sparsity, pretraining effects, metadata ablation settings, and cross-domain transfer learning analysis. These insights offer practical guidance for choosing suitable models. Moreover, retrieval benchmarks simulate practical identification scenarios and score potential matches based on predicted spectra.
MM-LIMA: Less Is More for Alignment in Multi-Modal Datasets
Authors: Lai Wei, Xiaozhe Li, Zihao Jiang, Weiran Huang, Lichao Sun
First: 2023-08-23T11:27:30+00:00 · Latest: 2026-04-12T14:13:44+00:00
Comments: Published at Artificial Intelligence for Engineering
Abstract
Multimodal large language models are typically trained in two stages: first pre-training on image-text pairs, and then fine-tuning using supervised vision-language instruction data. Recent studies have shown that large language models can achieve satisfactory results even with a limited amount of high-quality instruction-following data. In this paper, we introduce MM-LIMA, which is fine-tuned on a small dataset comprising only 200 examples, amounting to approximately 6% of the instruction-following data used in the alignment dataset for MiniGPT-4. To achieve this, we first propose several metrics to assess the quality of multimodal instruction data. Based on these metrics, we present an effective and trainable data selector to automatically identify and filter low-quality vision-language data. By employing this method, MM-LIMA outperforms the original MiniGPT-4 on various evaluations. Overall, our findings demonstrate that less but high-quality instruction tuning data is efficient in enabling multimodal large language models to generate better output. Our code is available at https://github.com/waltonfuture/InstructionGPT-4.
ProUIE: A Macro-to-Micro Progressive Learning Method for LLM-based Universal Information Extraction
Authors: Wenda Liu, Zhigang Song, Shuai Nie, Guangyao Liu, Lisung Chen, Binyu Yang, Yaran Chen, Peng Zhou, Hongzhen Wang, Yuchen Liu, Wenyue Hu, Jiaming Xu, Runyu Shi, Ying Huang
First: 2026-04-12T13:20:58+00:00 · Latest: 2026-04-12T13:20:58+00:00
Abstract
LLM-based universal information extraction (UIE) methods often rely on additional information beyond the original training data, which increases training complexity yet often yields limited gains. To address this, we propose ProUIE, a Macro-to-Micro progressive learning approach that improves UIE without introducing any external information. ProUIE consists of three stages: (i) macro-level Complete Modeling (CM), which learns NER, RE, and EE along their intrinsic difficulty order on the full training data to build a unified extraction foundation, (ii) meso-level Streamlined Alignment (SA), which operates on sampled data with simplified target formats, streamlining and regularizing structured outputs to make them more concise and controllable, and (iii) micro-level Deep Exploration (DE), which applies GRPO with stepwise fine-grained rewards (SFR) over structural units to guide exploration and improve performance. Experiments on 36 public datasets show that ProUIE consistently improves unified extraction, outperforming strong instruction-tuned baselines on average for NER and RE while using a smaller backbone, and it further demonstrates clear gains in large-scale production-oriented information extraction.
MAESTRO: Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization
Authors: Yang Zhao, Hepeng Wang, Xiao Ding, Yangou Ouyang, Bibo Cai, Kai Xiong, Jinglong Gao, Zhouhao Sun, Li Du, Bing Qin, Ting Liu
Venue: ACL 2026
First: 2026-01-12T05:02:48+00:00 · Latest: 2026-04-12T09:26:03+00:00
Comments: ACL 2026 Main Conference
Abstract
Group-Relative Policy Optimization (GRPO) has emerged as an efficient paradigm for aligning Large Language Models (LLMs), yet its efficacy is primarily confined to domains with verifiable ground truths. Extending GRPO to open-domain settings remains a critical challenge, as unconstrained generation entails multi-faceted and often conflicting objectives, such as creativity versus factuality, where rigid, static reward scalarization is inherently suboptimal. To address this, we propose MAESTRO (Meta-learning Adaptive Estimation of Scalarization Trade-offs for Reward Optimization), which introduces a meta-cognitive orchestration layer that treats reward scalarization as a dynamic latent policy, leveraging the model's terminal hidden states as a semantic bottleneck to perceive task-specific priorities. We formulate this as a contextual bandit problem within a bi-level optimization framework, where a lightweight Conductor network co-evolves with the policy by utilizing group-relative advantages as a meta-reward signal. Across seven benchmarks, MAESTRO consistently outperforms single-reward and static multi-objective baselines, while preserving the efficiency advantages of GRPO, and in some settings even reducing redundant generation.
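The two ingredients named in the abstract, reward scalarization and group-relative advantages, can be sketched in a few lines. This is a toy illustration under assumptions: the softmax over hypothetical Conductor logits, the function names, and the `eps` constant are not taken from the paper, and no learning of the weights is shown.

```python
import math


def softmax(logits):
    """Turn arbitrary logits into non-negative weights that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scalarize(reward_vectors, weights):
    """Collapse each multi-objective reward vector into one scalar."""
    return [sum(w * r for w, r in zip(weights, vec)) for vec in reward_vectors]


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]
```

A static baseline would fix `weights` once for all prompts; the dynamic variant described above would instead produce different logits per prompt, so the same reward vectors can yield different advantages depending on context.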
How Many Tries Does It Take? Iterative Self-Repair in LLM Code Generation Across Model Scales and Benchmarks
Authors: Johin Johny Arimbur
First: 2026-04-12T07:51:41+00:00 · Latest: 2026-04-12T07:51:41+00:00
Comments: 11 pages, 7 figures, 8 tables
Abstract
Large language models frequently fail to produce correct code on their first attempt, yet most benchmarks evaluate them in a single-shot setting. We investigate iterative self-repair (feeding execution errors back to the model for correction) across seven models spanning three families and both open-weight and proprietary providers: Llama 3.1 8B, Llama 3.3 70B, Llama 4 Scout (MoE, 16 experts), Llama 4 Maverick (MoE, 128 experts), Qwen3 32B, Gemini 2.5 Flash, and Gemini 2.5 Pro. On HumanEval (164 problems) and MBPP Sanitized (257 problems) with up to five attempts, self-repair universally improves pass rates: +4.9 to +17.1 pp on HumanEval and +16.0 to +30.0 pp on MBPP. Gemini 2.5 Flash achieves the highest final pass rates (96.3% HumanEval, 93.8% MBPP). Most gains concentrate in the first two rounds. Error-type analysis shows assertion errors (logical mistakes) are the hardest to repair at ~45%, while syntax and name errors are repaired at substantially higher rates, connecting to broader findings on the limits of LLM self-correction. Prior work found that weaker models fail at self-repair or require fine-tuning; we show that modern instruction-tuned models succeed with prompting alone, even at 8B scale. We also provide the first comparison of dense and MoE architectures for self-repair, and extend the repair-vs-resampling tradeoff analysis to modern models. A prompt ablation reveals chain-of-thought repair yields up to +5.5 pp additional self-repair gain (measured as improvement in repair delta) over minimal prompting for capable models.
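The self-repair loop described above (run the candidate, feed the execution error back, retry up to five times) can be sketched as follows. The `generate` callable stands in for a model call and is an assumption, as are the function names and the error-string format; the paper's actual harness and prompts are not given in the abstract.

```python
def run_candidate(code, test_code):
    """Execute candidate code and its tests; return None or an error string.

    Caution: exec runs arbitrary code. A real harness would sandbox this.
    """
    namespace = {}
    try:
        exec(code, namespace)
        exec(test_code, namespace)
    except Exception as exc:
        return f"{type(exc).__name__}: {exc}"
    return None


def self_repair(generate, task, test_code, max_attempts=5):
    """Ask for code, then feed each failure back until tests pass.

    generate(task, feedback) is a hypothetical model wrapper returning
    source code; feedback is None on the first attempt.
    Returns (code, attempts_used), or (None, max_attempts) on failure.
    """
    feedback = None
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)
        error = run_candidate(code, test_code)
        if error is None:
            return code, attempt
        feedback = error
    return None, max_attempts
```

Recording the attempt number at which each problem first passes is enough to reproduce per-round pass-rate curves of the kind the abstract summarizes.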
Instruction Data Selection via Answer Divergence
Authors: Bo Li, Mingda Wang, Shikun Zhang, Wei Ye
Venue: ACL 2026, Main Conference
First: 2026-04-12T04:11:12+00:00 · Latest: 2026-04-12T04:11:12+00:00
Comments: Github: https://github.com/WisdomShell/ADG Project: https://wisdomshell.github.io/ADG/
Abstract
Instruction tuning relies on large instruction-response corpora whose quality and composition strongly affect downstream performance. We propose Answer Divergence-Guided Selection (ADG), which selects instruction data based on the geometric structure of multi-sample outputs. ADG draws several high-temperature generations per instruction, maps responses into an embedding space, and computes an output divergence score that jointly encodes dispersion magnitude and shape anisotropy. High scores correspond to instructions whose answers are both far apart and multi-modal, rather than clustered paraphrases along a single direction. Across two backbones and three public instruction pools, fine-tuning on only 10K ADG-selected examples consistently outperforms strong selectors on six benchmarks spanning reasoning, knowledge, and coding. Analyses further show that both dispersion magnitude and shape anisotropy are necessary, supporting answer divergence as a practical signal for instruction data selection. Code and appendix are included in the supplementary materials.
Data Selection for Multi-turn Dialogue Instruction Tuning
Authors: Bo Li, Shikun Zhang, Wei Ye
Venue: ACL 2026
First: 2026-04-09T07:01:26+00:00 · Latest: 2026-04-12T02:03:46+00:00
Comments: Github: https://github.com/WisdomShell/MDS Project: https://wisdomshell.github.io/MDS/
Abstract
Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose MDS (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines a global coverage stage that performs bin-wise selection in the user-query trajectory space to retain representative yet non-redundant dialogues, with a local structural stage that evaluates within-dialogue reliability through entity-grounded topic grounding and information progress, together with query-answer form consistency for functional alignment. MDS outperforms strong single-turn selectors, dialogue-level LLM scorers, and heuristic baselines on three multi-turn benchmarks and an in-domain Banking test set, achieving the best overall rank across reference-free and reference-based metrics, and is more robust on long conversations under the same training budget. Code and resources are included in the supplementary materials.
Ultra-Low-Dimensional Prompt Tuning via Random Projection
Authors: Zijun Wu, Yongchang Hao, Lili Mou
First: 2025-02-06T21:00:29+00:00 · Latest: 2026-04-12T01:21:55+00:00
Comments: Accepted by EACL 2026 (Main Conference, Long Paper)
Abstract
Large language models achieve state-of-the-art performance but are increasingly costly to fine-tune. Prompt tuning is a parameter-efficient fine-tuning method that learns prompt embeddings, but these embeddings are typically tied to the model's hidden dimensionality, which limits parameter savings. In this paper, we propose Ultra-Low-dimensional Prompt Tuning (ULPT), a simple yet effective method that optimizes prompts in a low-dimensional space (e.g., 2D) and uses a frozen random matrix for up-projection. ULPT can achieve a 98% reduction in training parameters compared to vanilla prompt tuning while preserving performance. Our extensive experiments across over 20 NLP tasks demonstrate that ULPT consistently outperforms recent parameter-efficient tuning methods while using significantly fewer parameters, making it well-suited as a storage-efficient framework for massive LLM customization.
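The core idea, learning prompts in a tiny space and expanding them with a frozen random matrix, can be illustrated without any deep learning library. The sketch below is an assumption-laden toy: the Gaussian initialization, the scaling, the function names, and the sizes in the usage note are illustrative, and any extra trainable components the paper may use are not modeled.

```python
import random


def frozen_projection(low_dim, hidden_dim, seed=0):
    """Random up-projection matrix, regenerated from a seed, never trained."""
    rng = random.Random(seed)
    scale = 1.0 / low_dim ** 0.5
    return [
        [rng.gauss(0.0, scale) for _ in range(hidden_dim)]
        for _ in range(low_dim)
    ]


def up_project(low_prompts, projection):
    """Map each low-dimensional prompt vector into the model's hidden space."""
    hidden_dim = len(projection[0])
    return [
        [sum(z[r] * projection[r][c] for r in range(len(z)))
         for c in range(hidden_dim)]
        for z in low_prompts
    ]


def prompt_parameter_count(num_tokens, dim):
    """Number of trainable values for num_tokens prompt vectors of size dim."""
    return num_tokens * dim
```

As a purely illustrative comparison, 10 prompt tokens at a hidden size of 768 need 7,680 trainable values, while the same 10 tokens in a 2D space need 20. Since the projection can be rebuilt from its seed, only the low-dimensional vectors have to be stored per task.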
VeriTrans: Fine-Tuned LLM-Assisted NL-to-PL Translation via a Deterministic Neuro-Symbolic Pipeline
Authors: Xuan Liu, Dheeraj Kodakandla, Kushagra Srivastva, Mahfuza Farooque
First: 2026-04-11T19:59:02+00:00 · Latest: 2026-04-11T19:59:02+00:00
Abstract
VeriTrans is a reliability-first ML system that compiles natural-language requirements into solver-ready logic with validator-gated reliability. The pipeline integrates an instruction-tuned NL→PL translator, round-trip reconstruction (PL→NL) used as a high-precision acceptance gate, and canonical PL→CNF compilation, all executed via fixed API configuration (temperature = 0; fine-tuning runs use seed = 42) and per-item artifact logging (prompts, outputs, hashes) to support auditability and replay-driven debugging. On SatBench (2,100 specifications), VeriTrans achieves 94.46% SAT/UNSAT correctness and 87.73% median round-trip similarity. Compact fine-tuning on 100-150 curated examples improves fidelity by about 1-1.5 pp without increasing latency (mean 25.8 s/spec on our 201-spec runtime subset). A thresholded acceptance policy on the round-trip score exposes a reliability-coverage knob: at τ = 75, roughly 68% of items are retained with ~94% correctness on the accepted set. Validator overhead contributes <15% of end-to-end runtime, and all prompts/responses and timing metadata are logged to enable replay-driven debugging and regression testing. By separating learned translation from symbolic verification and enforcing deterministic, validator-gated acceptance, VeriTrans turns NL→logic front-ends into auditable, reproducible components for reliability-critical workflows.
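The reliability-coverage knob amounts to filtering items by their round-trip score and measuring what remains. A minimal sketch, assuming scores on a 0 to 100 scale and an inclusive threshold (the abstract does not say whether the comparison is strict); the function name and data layout are hypothetical.

```python
def acceptance_report(items, tau):
    """Coverage and accepted-set correctness at threshold tau.

    items is a list of (round_trip_score, is_correct) pairs.
    Returns (coverage, accuracy); accuracy is None if nothing is accepted.
    """
    accepted = [(score, ok) for score, ok in items if score >= tau]
    coverage = len(accepted) / len(items) if items else 0.0
    if not accepted:
        return coverage, None
    accuracy = sum(1 for _, ok in accepted if ok) / len(accepted)
    return coverage, accuracy
```

Sweeping `tau` over a range and plotting the two returned values against each other gives the trade-off curve from which an operating point such as the one quoted above would be chosen.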
Jailbreaking the Matrix: Nullspace Steering for Controlled Model Subversion
Authors: Vishal Pramanik, Maisha Maliha, Susmit Jha, Sumit Kumar Jha
First: 2026-04-11T19:19:05+00:00 · Latest: 2026-04-11T19:19:05+00:00
Abstract
Large language models remain vulnerable to jailbreak attacks -- inputs designed to bypass safety mechanisms and elicit harmful responses -- despite advances in alignment and instruction tuning. We propose Head-Masked Nullspace Steering (HMNS), a circuit-level intervention that (i) identifies attention heads most causally responsible for a model's default behavior, (ii) suppresses their write paths via targeted column masking, and (iii) injects a perturbation constrained to the orthogonal complement of the muted subspace. HMNS operates in a closed-loop detection-intervention cycle, re-identifying causal heads and reapplying interventions across multiple decoding attempts. Across multiple jailbreak benchmarks, strong safety defenses, and widely used language models, HMNS attains state-of-the-art attack success rates with fewer queries than prior methods. Ablations confirm that nullspace-constrained injection, residual norm scaling, and iterative re-identification are key to its effectiveness. To our knowledge, this is the first jailbreak method to leverage geometry-aware, interpretability-informed interventions, highlighting a new paradigm for controlled model steering and adversarial safety circumvention.
Evolutionary Profiles for Protein Fitness Prediction
Authors: Jigang Fan, Xiaoran Jiao, Shengdong Lin, Zhanming Liang, Weian Mao, Chenchen Jing, Hao Chen, Chunhua Shen
First: 2025-10-08T17:46:02+00:00 · Latest: 2026-04-11T09:47:15+00:00
Abstract
Predicting the fitness impact of mutations is central to protein engineering but constrained by limited assays relative to the size of sequence space. Protein language models (pLMs) trained with masked language modeling (MLM) exhibit strong zero-shot fitness prediction; we provide a unifying view by interpreting natural evolution as implicit reward maximization and MLM as inverse reinforcement learning (IRL), in which extant sequences act as expert demonstrations and pLM log-odds serve as fitness estimates. Building on this perspective, we introduce EvoIF, a lightweight model that integrates two complementary sources of evolutionary signal: (i) within-family profiles from retrieved homologs and (ii) cross-family structural-evolutionary constraints distilled from inverse folding logits. EvoIF fuses sequence-structure representations with these profiles via a compact transition block, yielding calibrated probabilities for log-odds scoring. On ProteinGym (217 mutational assays; >2.5M mutants), EvoIF and its MSA-enabled variant achieve state-of-the-art or competitive performance while using only 0.15% of the training data and fewer parameters than recent large models. Ablations confirm that within-family and cross-family profiles are complementary, improving robustness across function types, MSA depths, taxa, and mutation depths. The code will be made publicly available.
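The log-odds scoring mentioned in the EvoIF abstract can be sketched as follows. This is the generic heuristic of comparing the model probability of the mutant residue with that of the wild-type residue at each mutated position; summing over positions for multi-site mutants is an assumption here, and the function name, the probability tables, and the two-residue toy sequence are hypothetical rather than EvoIF's actual interface.

```python
import math
from typing import Dict, List, Tuple


def log_odds_score(
    probs: List[Dict[str, float]],
    wild_type: str,
    mutations: List[Tuple[int, str]],
) -> float:
    """Sum log p(mutant) - log p(wild type) over the mutated positions.

    probs[i] maps each amino-acid letter to the model probability at
    position i (0-based). Additivity across positions is an assumption.
    """
    score = 0.0
    for pos, mut_aa in mutations:
        wt_aa = wild_type[pos]
        score += math.log(probs[pos][mut_aa]) - math.log(probs[pos][wt_aa])
    return score


# Made-up per-position probabilities for a toy sequence "AL".
probs = [{"A": 0.7, "G": 0.3}, {"L": 0.5, "V": 0.5}]
score = log_odds_score(probs, "AL", [(0, "G")])
print(score < 0)  # True: the model prefers the wild-type residue here
```

A negative score means the model assigns the mutant lower probability than the wild type, which is read as a predicted fitness decrease.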
PAT: Privacy-Preserving Adversarial Transfer for Accurate, Robust and Privacy-Preserving EEG Decoding
Authors: Xiaoqing Chen, Tianwang Jia, Yunlu Tu, Dongrui Wu
First: 2024-12-16T02:37:38+00:00 · Latest: 2026-04-11T02:33:41+00:00
Abstract
An electroencephalogram (EEG)-based brain-computer interface (BCI) enables direct communication between the brain and external devices. However, such systems face at least three major challenges in real-world applications: limited decoding accuracy, poor robustness, and privacy risks. Although prior studies have addressed one or two of these issues, methods that simultaneously improve accuracy, robustness, and privacy remain largely unexplored. In this paper, we propose Privacy-preserving Adversarial Transfer (PAT), a unified training framework that combines data alignment, adversarial training, and privacy-preserving transfer. PAT provides a single pipeline that can be instantiated under three privacy-preserving scenarios, i.e., centralized source-free transfer, federated source-free transfer, and transfer with privacy-preserved source data, while jointly improving accuracy and robustness. Experiments on five public EEG datasets under all three scenarios show that PAT outperforms more than ten classic and state-of-the-art methods in both accuracy and robustness. PAT also outperforms leading transfer learning approaches that do not incorporate any privacy mechanisms by 9.76% in terms of average accuracy and robustness. To our knowledge, this is the first approach that simultaneously addresses all three major challenges in EEG-based BCIs. We believe this work can help motivate further research on more accurate, robust, and privacy-preserving EEG decoding.
New Hybrid Fine-Tuning Paradigm for LLMs: Algorithm Design and Convergence Analysis Framework
Authors: Shaocong Ma, Peiran Yu, Heng Huang
Venue: ICLR 2026
First: 2026-04-10T22:39:38+00:00 · Latest: 2026-04-10T22:39:38+00:00
Comments: Accepted by ICLR 2026
Abstract
Fine-tuning Large Language Models (LLMs) typically involves either full fine-tuning, which updates all model parameters, or Parameter-Efficient Fine-Tuning (PEFT), which adjusts a small subset of parameters. However, both approaches have inherent limitations: full fine-tuning is computationally expensive, while PEFT often struggles to learn new knowledge and exhibits suboptimal performance. To overcome these issues, we propose a novel hybrid fine-tuning approach that jointly updates both LLMs and PEFT modules using a combination of zeroth-order and first-order optimization methods. To analyze our new algorithm, we develop a theoretical framework centered on a hybrid smoothness condition, which accounts for the heterogeneous nature of the optimization landscape in joint LLM and PEFT training. We provide a rigorous convergence analysis of a reshuffling-type SGD algorithm under multiple learning rates and demonstrate its effectiveness through extensive empirical studies across various downstream tasks and model architectures. On the practical side, our results demonstrate consistent performance improvement, making the approach a viable solution for large-scale language model fine-tuning.
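For readers unfamiliar with zeroth-order optimization, the sketch below shows a generic two-point gradient estimator that uses only loss evaluations. It is not the paper's algorithm: the abstract does not say which estimator is used or which parameter block receives zeroth-order versus first-order updates, so the function name `zo_gradient`, the Gaussian perturbations, the smoothing radius `mu`, and the quadratic toy loss are all assumptions made for illustration.

```python
import random
from typing import Callable, List


def zo_gradient(
    loss: Callable[[List[float]], float],
    theta: List[float],
    mu: float = 1e-3,
    num_samples: int = 32,
    seed: int = 0,
) -> List[float]:
    """Two-point zeroth-order gradient estimate from loss values only."""
    rng = random.Random(seed)
    grad = [0.0] * len(theta)
    for _ in range(num_samples):
        # Random Gaussian search direction.
        u = [rng.gauss(0.0, 1.0) for _ in theta]
        plus = loss([t + mu * ui for t, ui in zip(theta, u)])
        minus = loss([t - mu * ui for t, ui in zip(theta, u)])
        # Central finite difference along u approximates the directional derivative.
        slope = (plus - minus) / (2.0 * mu)
        for i, ui in enumerate(u):
            grad[i] += slope * ui / num_samples
    return grad


def quadratic(theta: List[float]) -> float:
    """Toy loss with known gradient 2 * theta."""
    return sum(t * t for t in theta)


theta = [1.0, -2.0]
estimate = zo_gradient(quadratic, theta)
true_grad = [2.0 * t for t in theta]
# Inner product between the estimate and the true gradient.
alignment = sum(e * g for e, g in zip(estimate, true_grad))
print(alignment > 0)  # True: the estimate points in an ascent direction
```

The estimator needs no backward pass, which is why zeroth-order updates are attractive for very large parameter blocks, at the cost of higher variance than a first-order gradient.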
Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning
Authors: Giray Önür, Azita Dabiri, Bart De Schutter
First: 2025-12-08T10:52:00+00:00 · Latest: 2026-04-10T15:28:31+00:00
Comments: Accepted for presentation and publication in the proceedings of the 2026 European Control Conference (ECC)
Abstract
Effective traffic control is essential for mitigating congestion in transportation networks. Conventional traffic management strategies, including route guidance and ramp metering, often rely on state feedback controllers, which are used for their simplicity and reactivity; however, they lack the adaptability required to cope with complex and time-varying traffic dynamics. This paper proposes a multi-agent reinforcement learning (RL) framework in which each agent adaptively tunes the parameters of a state feedback traffic controller, combining the reactivity of state feedback controllers with the adaptability of RL. By tuning parameters at a lower frequency rather than directly determining control inputs at a high frequency, the RL agents achieve improved training efficiency while maintaining adaptability to varying traffic conditions. The multi-agent structure further enhances system robustness, as local controllers can operate independently in the event of partial failures. The proposed framework is evaluated on a simulated multi-class transportation network under varying traffic conditions. Results show that the proposed multi-agent framework outperforms the no-control and fixed-parameter state feedback control cases, while performing on par with the single-agent RL-based adaptive state feedback control, but with much greater resilience to disturbances.
Variational Quantum Physics-Informed Neural Networks for Hydrological PDE-Constrained Learning with Inherent Uncertainty Quantification
Authors: Prasad Nimantha Madusanka Ukwatta Hewage, Midhun Chakkravarthy, Ruvan Kumara Abeysekara
First: 2026-04-10T14:45:38+00:00 · Latest: 2026-04-10T14:45:38+00:00
Comments: 25 pages, 6 tables. Code available at https://github.com/nimanpra/HQC-PINN-Flood-Prediction
Abstract
We propose a Hybrid Quantum-Classical Physics-Informed Neural Network (HQC-PINN) that integrates parameterized variational quantum circuits into the PINN framework for hydrological PDE-constrained learning. Our architecture encodes multi-source remote sensing features into quantum states via trainable angle encoding, processes them through a hardware-efficient variational ansatz with entangling layers, and constrains the output using the Saint-Venant shallow water equations and Manning's flow equation as differentiable physics loss terms. The inherent stochasticity of quantum measurement provides a natural mechanism for uncertainty quantification without requiring explicit Bayesian inference machinery. We further introduce a quantum transfer learning protocol that pre-trains on multi-hazard disaster data before fine-tuning on flood-specific events. Numerical simulations on multi-modal satellite and meteorological data from the Kalu River basin, Sri Lanka, show that the HQC-PINN achieves convergence in ~3x fewer training epochs and uses ~44% fewer trainable parameters compared to an equivalent classical PINN, while maintaining competitive classification accuracy. Theoretical analysis indicates that hydrological physics constraints narrow the effective optimization landscape, providing a natural mitigation against barren plateaus in variational quantum circuits. This work establishes the first application of quantum-enhanced physics-informed learning to hydrological prediction and demonstrates a viable path toward quantum advantage in environmental science.
Biologically-Grounded Multi-Encoder Architectures as Developability Oracles for Antibody Design
Authors: Simon J. Crouzet
Venue: ICLR 2026
First: 2026-04-10T14:39:57+00:00 · Latest: 2026-04-10T14:39:57+00:00
Comments: ICLR 2026 Workshop on Generative and Experimental Perspectives for Biomolecular Design
Abstract
Generative models can now propose thousands of \emph{de novo} antibody sequences, yet translating these designs into viable therapeutics remains constrained by the cost of biophysical characterization. Here we present CrossAbSense, a framework of property-specific neural oracles that combine frozen protein language model encoders with configurable attention decoders, identified through a systematic hyperparameter campaign totaling over 200 runs per property. On the GDPa1 benchmark of 242 therapeutic IgGs, our oracles achieve notable improvements of 12--20\% over established baselines on three of five developability assays and competitive performance on the remaining two. The central finding is that optimal decoder architectures \emph{invert} our initial biological hypotheses: self-attention alone suffices for aggregation-related properties (hydrophobic interaction chromatography, polyreactivity), where the relevant sequence signatures -- such as CDR-H3 hydrophobic patches -- are already fully resolved within single-chain embeddings by the high-capacity 6B encoder. Bidirectional cross-attention, by contrast, is required for expression yield and thermal stability -- properties that inherently depend on the compatibility between heavy and light chains. Learned chain fusion weights independently confirm heavy-chain dominance in aggregation ($w_H = 0.62$) versus balanced contributions for stability ($w_H = 0.51$). We demonstrate practical utility by deploying CrossAbSense on 100 IgLM-generated antibody designs, illustrating a path toward substantial reduction in experimental screening costs.
Transferable FB-GNN-MBE Framework for Potential Energy Surfaces: Data-Adaptive Transfer Learning in Deep Learned Many-Body Expansion Theory
Authors: Siqi Chen, Zhiqiang Wang, Yili Shen, Xianqi Deng, Xi Cheng, Cheng-Wei Ju, Jun Yi, Guo Ling, Dieaa Alhmoud, Hui Guan, Zhou Lin
First: 2026-04-10T13:43:19+00:00 · Latest: 2026-04-10T13:43:19+00:00
Comments: Under review with The Journal of Chemical Physics. Main text: 23 pages, 11 figures, and 1 table. Supplementary Materials: 28 pages, 6 figures, 15 tables, 4 pseudo-algorithms
Abstract
Mechanistic understanding and rational design of complex chemical systems depend on fast and accurate predictions of electronic structures beyond individual building blocks. However, if the system exceeds hundreds of atoms, first-principles quantum mechanical (QM) modeling becomes impractical. In this study, we developed FB-GNN-MBE by integrating a fragment-based graph neural network (FB-GNN) into the many-body expansion (MBE) theory and demonstrated its capacity to reproduce first-principles potential energy surfaces (PES) for hierarchically structured systems with manageable accuracy, complexity, and interpretability. Specifically, we divided the entire system into basic building blocks (fragments), evaluated their one-fragment energies using a QM model, and addressed many-fragment interactions using the structure-property relationships trained by FB-GNNs. Our investigation shows that FB-GNN-MBE achieves chemical accuracy in predicting two-body (2B) and three-body (3B) energies across water, phenol, and mixture benchmarks, as well as the one-dimensional dissociation curves of water and phenol dimers. To transfer the success of FB-GNN-MBE across various systems with minimal computational costs and data demands, we developed and validated a teacher-student learning protocol. A heavy-weight FB-GNN trained on a mixed-density water cluster ensemble (teacher) distills its learned knowledge and passes it to a light-weight GNN (student), which is later fine-tuned on a uniform-density (H2O)21 cluster ensemble. This transfer learning strategy resulted in efficient and accurate prediction of 2B and 3B energies for variously sized water clusters without retraining. Our transferable FB-GNN-MBE framework outperformed conventional non-FB-GNN-based models and showed high practicality for large-scale molecular simulations.
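For reference, the standard many-body expansion that the FB-GNN-MBE abstract builds on can be written as below, truncated at the three-body term to match the 2B and 3B energies the abstract discusses. The symbols ($E_i$ for one-fragment energies, $\Delta E_{ij}$ and $\Delta E_{ijk}$ for the two- and three-body corrections) are notation chosen here, not necessarily the paper's.

```latex
\begin{align}
E_{\text{total}} &\approx \sum_{i} E_i
  + \sum_{i<j} \Delta E_{ij}
  + \sum_{i<j<k} \Delta E_{ijk}, \\
\Delta E_{ij} &= E_{ij} - E_i - E_j, \\
\Delta E_{ijk} &= E_{ijk} - \Delta E_{ij} - \Delta E_{ik} - \Delta E_{jk}
  - E_i - E_j - E_k .
\end{align}
```

In the scheme described in the abstract, the $E_i$ terms come from a QM model, while the $\Delta E_{ij}$ and $\Delta E_{ijk}$ terms are the quantities predicted by the FB-GNNs.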
Meta-Learned Basis Adaptation for Parametric Linear PDEs
Authors: Vikas Dwivedi, Monica Sigovan, Bruno Sixou
First: 2026-04-10T13:00:03+00:00 · Latest: 2026-04-10T13:00:03+00:00
Abstract
We propose a hybrid physics-informed framework for solving families of parametric linear partial differential equations (PDEs) by combining a meta-learned predictor with a least-squares corrector. The predictor, termed \textbf{KAPI} (Kernel-Adaptive Physics-Informed meta-learner), is a shallow task-conditioned model that maps query coordinates and PDE parameters to solution values while internally generating an interpretable, task-adaptive Gaussian basis geometry. A lightweight meta-network maps PDE parameters to basis centers, widths, and activity patterns, thereby learning how the approximation space should adapt across the parametric family. This predictor-generated geometry is transferred to a second-stage corrector, which augments it with a background basis and computes the final solution through a one-shot physics-informed Extreme Learning Machine (PIELM)-style least-squares solve. We evaluate the method on four linear PDE families spanning diffusion, transport, mixed advection--diffusion, and variable-speed transport. Across these cases, the predictor captures meaningful physics through localized and transport-aligned basis placement, while the corrector further improves accuracy, often by one or more orders of magnitude. Comparisons with parametric PINNs, physics-informed DeepONet, and uniform-grid PIELM correctors highlight the value of predictor-guided basis adaptation as an interpretable and efficient strategy for parametric PDE solving.
Automatic Self-supervised Learning for Social Recommendations
Authors: Xin He, Wenqi Fan, Mingchen Sun, Ying Wang, Xin Wang
First: 2024-12-25T01:47:39+00:00 · Latest: 2026-04-10T08:58:31+00:00
Comments: Accepted by Neurocomputing
Abstract
In recent years, researchers have leveraged social relations to enhance recommendation performance. However, most existing social recommendation methods require carefully designed auxiliary social tasks tailored to specific scenarios, which depend heavily on domain knowledge and expertise. To address this limitation, we propose Automatic Self-supervised Learning for Social Recommendations (AusRec), which integrates multiple self-supervised auxiliary tasks with an automatic weighting mechanism to adaptively balance their contributions through a meta-learning optimization framework. This design enables the model to automatically learn the optimal importance of each auxiliary task, thereby enhancing representation learning in social recommendations. Extensive experiments on several real-world datasets demonstrate that AusRec consistently outperforms state-of-the-art baselines, validating its effectiveness and robustness across different recommendation scenarios.
Task-Distributionally Robust Data-Free Meta-Learning
Authors: Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, Dacheng Tao
First: 2023-11-23T15:46:54+00:00 · Latest: 2026-04-10T08:54:36+00:00
Abstract
Data-Free Meta-Learning (DFML) aims to enable efficient learning of unseen few-shot tasks by meta-learning from multiple pre-trained models without accessing their original training data. While existing DFML methods typically generate synthetic data from these models to perform meta-learning, a comprehensive analysis of DFML's robustness, particularly its failure modes and vulnerability to potential attacks, remains notably absent. Such an analysis is crucial as algorithms often operate in complex and uncertain real-world environments. This paper fills this gap by systematically investigating the robustness of DFML, identifying two critical but previously overlooked vulnerabilities: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC). TDS refers to the sequential shifts in the evolving task distribution, leading to the catastrophic forgetting of previously learned meta-knowledge. TDC exposes a security flaw of DFML, revealing its susceptibility to attacks when the pre-trained model pool includes untrustworthy models that deceptively claim to be beneficial but are actually harmful. To mitigate these vulnerabilities, we propose a trustworthy DFML framework comprising three components: synthetic task reconstruction, meta-learning with task memory interpolation, and automatic model selection. Specifically, utilizing model inversion techniques, we reconstruct synthetic tasks from multiple pre-trained models to perform meta-learning. To prevent forgetting, we introduce a strategy to replay interpolated historical tasks to efficiently recall previous meta-knowledge. Furthermore, our framework incorporates an automatic model selection mechanism that filters out untrustworthy models during the meta-learning process. Code is available at https://github.com/Egg-Hu/Trustworthy-DFML.
CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
Authors: Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal
Venue: CVPR
First: 2026-04-10T08:33:56+00:00 · Latest: 2026-04-10T08:33:56+00:00
Comments: 17 pages (8 main + 2 references + 7 supplementary), Accepted to CVPR Findings 2026
Abstract
Organisations with limited data and computational resources increasingly outsource model training to Machine Learning as a Service (MLaaS) providers, who adapt vision-language models (VLMs) such as CLIP to downstream tasks via prompt tuning rather than training from scratch. This semi-honest setting creates a security risk where a malicious provider can follow the prompt-tuning protocol yet implant a backdoor, forcing triggered inputs to be classified into an attacker-chosen class, even for out-of-distribution (OOD) data. Such backdoors leave encoders untouched, making them undetectable to existing methods that focus on encoder corruption. Data-level methods that sanitize data before training or during inference also fail to answer the critical question, "Is the delivered model backdoored or not?" To address this model-level verification problem, we introduce CLIP-Inspector (CI), a backdoor detection method designed for prompt-tuned CLIP models. Assuming white-box access to the delivered model and a pool of unlabeled OOD images, CI reconstructs possible triggers for each class to determine whether the model exhibits backdoor behaviour. Additionally, we demonstrate that using CI's reconstructed trigger for fine-tuning on correctly labeled triggered inputs enables us to re-align the model and reduce backdoor effectiveness. Through extensive experiments across ten datasets and four backdoor attacks, we demonstrate that CI can reconstruct effective triggers in a single epoch using only 1,000 OOD images, achieving a 94% detection accuracy (47/50 models). Compared to adapted trigger-inversion baselines, CI yields a markedly higher AUROC score (0.973 vs 0.495/0.687), thus enabling the vetting and post-hoc repair of prompt-tuned CLIP models to ensure safe deployment.
Chronological Contrastive Learning: Few-Shot Progression Assessment in Irreversible Diseases
Authors: Clemens Watzenböck, Daniel Aletaha, Michaël Deman, Thomas Deimel, Jana Eder, Ivana Janickova, Robert Janiczek, Peter Mandl, Philipp Seeböck, Gabriela Supp, Paul Weiser, Georg Langs
First: 2026-03-23T12:53:04+00:00 · Latest: 2026-04-10T08:26:35+00:00
Comments: Accepted for MIDL 2026; Reviews available at https://openreview.net/forum?id=c1UkGC3MVq
Abstract
Quantitative disease severity scoring in medical imaging is costly, time-consuming, and subject to inter-reader variability. At the same time, clinical archives contain far more longitudinal imaging data than expert-annotated severity scores. Existing self-supervised methods typically ignore this chronological structure. We introduce ChronoCon, a contrastive learning approach that replaces label-based ranking losses with rankings derived solely from the visitation order of a patient's longitudinal scans. Under the clinically plausible assumption of monotonic progression in irreversible diseases, the method learns disease-relevant representations without using any expert labels. This generalizes the idea of Rank-N-Contrast from label distances to temporal ordering. Evaluated on rheumatoid arthritis radiographs for severity assessment, the learned representations substantially improve label efficiency. In low-label settings, ChronoCon significantly outperforms a fully supervised baseline initialized from ImageNet weights. In a few-shot learning experiment, fine-tuning ChronoCon on expert scores from only five patients yields an intraclass correlation coefficient of 86% for severity score prediction. These results demonstrate the potential of chronological contrastive learning to exploit routinely available imaging metadata to reduce annotation requirements in the irreversible disease domain. Code is available at https://github.com/cirmuw/ChronoCon.
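The idea of replacing label distances with temporal ordering can be sketched with a Rank-N-Contrast style loss computed over one patient's scans. This is an illustrative reading of the ChronoCon abstract, not the released implementation: the use of visit-index differences as the distance, negative Euclidean distance as the similarity, the temperature value, and the function name `chrono_rank_loss` are all assumptions.

```python
import math
from typing import List


def chrono_rank_loss(
    embeddings: List[List[float]],
    visit_order: List[int],
    temperature: float = 1.0,
) -> float:
    """Rank-N-Contrast style loss using visit order instead of labels.

    For anchor i and positive j, every scan k whose visit is at least as
    far from i as j is (in visit index) enters the denominator.
    All scans are assumed to come from a single patient.
    """
    n = len(embeddings)

    def sim(a: int, b: int) -> float:
        # Negative Euclidean distance as similarity (an assumption).
        return -math.dist(embeddings[a], embeddings[b])

    total, terms = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_ij = abs(visit_order[i] - visit_order[j])
            denom = sum(
                math.exp(sim(i, k) / temperature)
                for k in range(n)
                if k != i and abs(visit_order[i] - visit_order[k]) >= d_ij
            )
            total += -(sim(i, j) / temperature - math.log(denom))
            terms += 1
    return total / terms


visits = [0, 1, 2]
# Embeddings that follow the visit order versus embeddings that do not.
ordered_loss = chrono_rank_loss([[0.0], [1.0], [2.0]], visits)
shuffled_loss = chrono_rank_loss([[0.0], [2.0], [1.0]], visits)
print(ordered_loss < shuffled_loss)  # True
```

The loss is lower when distances in embedding space grow with the time between visits, which is the monotonic-progression assumption stated in the abstract.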