Adaptive Tuning of Parameterized Traffic Controllers via Multi-Agent Reinforcement Learning
Authors: Giray Önür, Azita Dabiri, Bart De Schutter
First: 2025-12-08T10:52:00+00:00 · Latest: 2026-04-10T15:28:31+00:00
Comments: Accepted for presentation and publication in the proceedings of the 2026 European Control Conference (ECC)
Abstract
Effective traffic control is essential for mitigating congestion in transportation networks. Conventional traffic management strategies, including route guidance and ramp metering, often rely on state feedback controllers, which are used for their simplicity and reactivity; however, they lack the adaptability required to cope with complex and time-varying traffic dynamics. This paper proposes a multi-agent reinforcement learning (RL) framework in which each agent adaptively tunes the parameters of a state feedback traffic controller, combining the reactivity of state feedback controllers with the adaptability of RL. By tuning parameters at a lower frequency rather than directly determining control inputs at a high frequency, the RL agents achieve improved training efficiency while maintaining adaptability to varying traffic conditions. The multi-agent structure further enhances system robustness, as local controllers can operate independently in the event of partial failures. The proposed framework is evaluated on a simulated multi-class transportation network under varying traffic conditions. Results show that the proposed multi-agent framework outperforms the no-control and fixed-parameter state feedback control cases, while performing on par with the single-agent RL-based adaptive state feedback control, but with much greater resilience to disturbances.
Summary / 总结
Effective traffic control is essential for mitigating congestion in transportation networks.
Variational Quantum Physics-Informed Neural Networks for Hydrological PDE-Constrained Learning with Inherent Uncertainty Quantification
Authors: Prasad Nimantha Madusanka Ukwatta Hewage, Midhun Chakkravarthy, Ruvan Kumara Abeysekara
First: 2026-04-10T14:45:38+00:00 · Latest: 2026-04-10T14:45:38+00:00
Comments: 25 pages, 6 tables. Code available at https://github.com/nimanpra/HQC-PINN-Flood-Prediction
Abstract
We propose a Hybrid Quantum-Classical Physics-Informed Neural Network (HQC-PINN) that integrates parameterized variational quantum circuits into the PINN framework for hydrological PDE-constrained learning. Our architecture encodes multi-source remote sensing features into quantum states via trainable angle encoding, processes them through a hardware-efficient variational ansatz with entangling layers, and constrains the output using the Saint-Venant shallow water equations and Manning's flow equation as differentiable physics loss terms. The inherent stochasticity of quantum measurement provides a natural mechanism for uncertainty quantification without requiring explicit Bayesian inference machinery. We further introduce a quantum transfer learning protocol that pre-trains on multi-hazard disaster data before fine-tuning on flood-specific events. Numerical simulations on multi-modal satellite and meteorological data from the Kalu River basin, Sri Lanka, show that the HQC-PINN achieves convergence in ~3x fewer training epochs and uses ~44% fewer trainable parameters compared to an equivalent classical PINN, while maintaining competitive classification accuracy. Theoretical analysis indicates that hydrological physics constraints narrow the effective optimization landscape, providing a natural mitigation against barren plateaus in variational quantum circuits. This work establishes the first application of quantum-enhanced physics-informed learning to hydrological prediction and demonstrates a viable path toward quantum advantage in environmental science.
Summary / 总结
We propose a Hybrid Quantum-Classical Physics-Informed Neural Network (HQC-PINN) that integrates parameterized variational quantum circuits into the PINN framework for hydrological PDE-constrained learning.
Biologically-Grounded Multi-Encoder Architectures as Developability Oracles for Antibody Design
Authors: Simon J. Crouzet
Venue: ICLR 2026
First: 2026-04-10T14:39:57+00:00 · Latest: 2026-04-10T14:39:57+00:00
Comments: ICLR 2026 Workshop on Generative and Experimental Perspectives for Biomolecular Design
Abstract
Generative models can now propose thousands of \emph{de novo} antibody sequences, yet translating these designs into viable therapeutics remains constrained by the cost of biophysical characterization. Here we present CrossAbSense, a framework of property-specific neural oracles that combine frozen protein language model encoders with configurable attention decoders, identified through a systematic hyperparameter campaign totaling over 200 runs per property. On the GDPa1 benchmark of 242 therapeutic IgGs, our oracles achieve notable improvements of 12--20\% over established baselines on three of five developability assays and competitive performance on the remaining two. The central finding is that optimal decoder architectures \emph{invert} our initial biological hypotheses: self-attention alone suffices for aggregation-related properties (hydrophobic interaction chromatography, polyreactivity), where the relevant sequence signatures -- such as CDR-H3 hydrophobic patches -- are already fully resolved within single-chain embeddings by the high-capacity 6B encoder. Bidirectional cross-attention, by contrast, is required for expression yield and thermal stability -- properties that inherently depend on the compatibility between heavy and light chains. Learned chain fusion weights independently confirm heavy-chain dominance in aggregation ($w_H = 0.62$) versus balanced contributions for stability ($w_H = 0.51$). We demonstrate practical utility by deploying CrossAbSense on 100 IgLM-generated antibody designs, illustrating a path toward substantial reduction in experimental screening costs.
Summary / 总结
Generative models can now propose thousands of \emph{de novo} antibody sequences, yet translating these designs into viable therapeutics remains constrained by the cost of biophysical characterization.
Transferable FB-GNN-MBE Framework for Potential Energy Surfaces: Data-Adaptive Transfer Learning in Deep Learned Many-Body Expansion Theory
Authors: Siqi Chen, Zhiqiang Wang, Yili Shen, Xianqi Deng, Xi Cheng, Cheng-Wei Ju, Jun Yi, Guo Ling, Dieaa Alhmoud, Hui Guan, Zhou Lin
First: 2026-04-10T13:43:19+00:00 · Latest: 2026-04-10T13:43:19+00:00
Comments: Under review with The Journal of Chemical Physics. Main text: 23 pages, 11 figures, and 1 table. Supplementary Materials: 28 pages, 6 figures, 15 tables, 4 pseudo-algorithms
Abstract
Mechanistic understanding and rational design of complex chemical systems depend on fast and accurate predictions of electronic structures beyond individual building blocks. However, if the system exceeds hundreds of atoms, first-principles quantum mechanical (QM) modeling becomes impractical. In this study, we developed FB-GNN-MBE by integrating a fragment-based graph neural network (FB-GNN) into the many-body expansion (MBE) theory and demonstrated its capacity to reproduce first-principles potential energy surfaces (PES) for hierarchically structured systems with manageable accuracy, complexity, and interpretability. Specifically, we divided the entire system into basic building blocks (fragments), evaluated their one-fragment energies using a QM model, and addressed many-fragment interactions using the structure-property relationships trained by FB-GNNs. Our investigation shows that FB-GNN-MBE achieves chemical accuracy in predicting two-body (2B) and three-body (3B) energies across water, phenol, and mixture benchmarks, as well as the one-dimensional dissociation curves of water and phenol dimers. To transfer the success of FB-GNN-MBE across various systems with minimal computational costs and data demands, we developed and validated a teacher-student learning protocol. A heavy-weight FB-GNN trained on a mixed-density water cluster ensemble (teacher) distills its learned knowledge and passes it to a light-weight GNN (student), which is later fine-tuned on a uniform-density (H2O)21 cluster ensemble. This transfer learning strategy resulted in efficient and accurate prediction of 2B and 3B energies for variously sized water clusters without retraining. Our transferable FB-GNN-MBE framework outperformed conventional non-FB-GNN-based models and showed high practicality for large-scale molecular simulations.
Summary / 总结
Mechanistic understanding and rational design of complex chemical systems depend on fast and accurate predictions of electronic structures beyond individual building blocks.
Meta-Learned Basis Adaptation for Parametric Linear PDEs
Authors: Vikas Dwivedi, Monica Sigovan, Bruno Sixou
First: 2026-04-10T13:00:03+00:00 · Latest: 2026-04-10T13:00:03+00:00
Abstract
We propose a hybrid physics-informed framework for solving families of parametric linear partial differential equations (PDEs) by combining a meta-learned predictor with a least-squares corrector. The predictor, termed \textbf{KAPI} (Kernel-Adaptive Physics-Informed meta-learner), is a shallow task-conditioned model that maps query coordinates and PDE parameters to solution values while internally generating an interpretable, task-adaptive Gaussian basis geometry. A lightweight meta-network maps PDE parameters to basis centers, widths, and activity patterns, thereby learning how the approximation space should adapt across the parametric family. This predictor-generated geometry is transferred to a second-stage corrector, which augments it with a background basis and computes the final solution through a one-shot physics-informed Extreme Learning Machine (PIELM)-style least-squares solve. We evaluate the method on four linear PDE families spanning diffusion, transport, mixed advection--diffusion, and variable-speed transport. Across these cases, the predictor captures meaningful physics through localized and transport-aligned basis placement, while the corrector further improves accuracy, often by one or more orders of magnitude. Comparisons with parametric PINNs, physics-informed DeepONet, and uniform-grid PIELM correctors highlight the value of predictor-guided basis adaptation as an interpretable and efficient strategy for parametric PDE solving.
Summary / 总结
We propose a hybrid physics-informed framework for solving families of parametric linear partial differential equations (PDEs) by combining a meta-learned predictor with a least-squares corrector.
Automatic Self-supervised Learning for Social Recommendations
Authors: Xin He, Wenqi Fan, Mingchen Sun, Ying Wang, Xin Wang
First: 2024-12-25T01:47:39+00:00 · Latest: 2026-04-10T08:58:31+00:00
Comments: Accepted by Neurocomputing
Abstract
In recent years, researchers have leveraged social relations to enhance recommendation performance. However, most existing social recommendation methods require carefully designed auxiliary social tasks tailored to specific scenarios, which depend heavily on domain knowledge and expertise. To address this limitation, we propose Automatic Self-supervised Learning for Social Recommendations (AusRec), which integrates multiple self-supervised auxiliary tasks with an automatic weighting mechanism to adaptively balance their contributions through a meta-learning optimization framework. This design enables the model to automatically learn the optimal importance of each auxiliary task, thereby enhancing representation learning in social recommendations. Extensive experiments on several real-world datasets demonstrate that AusRec consistently outperforms state-of-the-art baselines, validating its effectiveness and robustness across different recommendation scenarios.
Summary / 总结
In recent years, researchers have leveraged social relations to enhance recommendation performance.
Task-Distributionally Robust Data-Free Meta-Learning
Authors: Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Baoyuan Wu, Chun Yuan, Dacheng Tao
First: 2023-11-23T15:46:54+00:00 · Latest: 2026-04-10T08:54:36+00:00
Abstract
Data-Free Meta-Learning (DFML) aims to enable efficient learning of unseen few-shot tasks, by meta-learning from multiple pre-trained models without accessing their original training data. While existing DFML methods typically generate synthetic data from these models to perform meta-learning, a comprehensive analysis of DFML's robustness-particularly its failure modes and vulnerability to potential attacks-remains notably absent. Such an analysis is crucial as algorithms often operate in complex and uncertain real-world environments. This paper fills this significant gap by systematically investigating the robustness of DFML, identifying two critical but previously overlooked vulnerabilities: Task-Distribution Shift (TDS) and Task-Distribution Corruption (TDC). TDS refers to the sequential shifts in the evolving task distribution, leading to the catastrophic forgetting of previously learned meta-knowledge. TDC exposes a security flaw of DFML, revealing its susceptibility to attacks when the pre-trained model pool includes untrustworthy models that deceptively claim to be beneficial but are actually harmful. To mitigate these vulnerabilities, we propose a trustworthy DFML framework comprising three components: synthetic task reconstruction, meta-learning with task memory interpolation, and automatic model selection. Specifically, utilizing model inversion techniques, we reconstruct synthetic tasks from multiple pre-trained models to perform meta-learning. To prevent forgetting, we introduce a strategy to replay interpolated historical tasks to efficiently recall previous meta-knowledge. Furthermore, our framework seamlessly incorporates an automatic model selection mechanism to automatically filter out untrustworthy models during the meta-learning process. Code is available at https://github.com/Egg-Hu/Trustworthy-DFML.
Summary / 总结
Data-Free Meta-Learning (DFML) aims to enable efficient learning of unseen few-shot tasks, by meta-learning from multiple pre-trained models without accessing their original training data.
CLIP-Inspector: Model-Level Backdoor Detection for Prompt-Tuned CLIP via OOD Trigger Inversion
Authors: Akshit Jindal, Saket Anand, Chetan Arora, Vikram Goyal
Venue: CVPR
First: 2026-04-10T08:33:56+00:00 · Latest: 2026-04-10T08:33:56+00:00
Comments: 17 pages (8 main + 2 references + 7 supplementary), Accepted to CVPR Findings 2026
Abstract
Organisations with limited data and computational resources increasingly outsource model training to Machine Learning as a Service (MLaaS) providers, who adapt vision-language models (VLMs) such as CLIP to downstream tasks via prompt tuning rather than training from scratch. This semi-honest setting creates a security risk where a malicious provider can follow the prompt-tuning protocol yet implant a backdoor, forcing triggered inputs to be classified into an attacker-chosen class, even for out-of-distribution (OOD) data. Such backdoors leave encoders untouched, making them undetectable to existing methods that focus on encoder corruption. Other data-level methods that sanitize data before training or during inference, also fail to answer the critical question, "Is the delivered model backdoored or not?" To address this model-level verification problem, we introduce CLIP-Inspector (CI), a backdoor detection method designed for prompt-tuned CLIP models. Assuming white-box access to the delivered model and a pool of unlabeled OOD images, CI reconstructs possible triggers for each class to determine if the model exhibits backdoor behaviour or not. Additionally, we demonstrate that using CI's reconstructed trigger for fine-tuning on correctly labeled triggered inputs enables us to re-align the model and reduce backdoor effectiveness. Through extensive experiments across ten datasets and four backdoor attacks, we demonstrate that CI can reconstruct effective triggers in a single epoch using only 1,000 OOD images, achieving a 94% detection accuracy (47/50 models). Compared to adapted trigger-inversion baselines, CI yields a markedly higher AUROC score (0.973 vs 0.495/0.687), thus enabling the vetting and post-hoc repair of prompt-tuned CLIP models to ensure safe deployment.
Summary / 总结
Organisations with limited data and computational resources increasingly outsource model training to Machine Learning as a Service (MLaaS) providers, who adapt vision-language models (VLMs) such as CLIP to downstream tasks via prompt tuning rather than training from scratch.
Chronological Contrastive Learning: Few-Shot Progression Assessment in Irreversible Diseases
Authors: Clemens Watzenböck, Daniel Aletaha, Michaël Deman, Thomas Deimel, Jana Eder, Ivana Janickova, Robert Janiczek, Peter Mandl, Philipp Seeböck, Gabriela Supp, Paul Weiser, Georg Langs
First: 2026-03-23T12:53:04+00:00 · Latest: 2026-04-10T08:26:35+00:00
Comments: Accepted for MIDL 2026; Reviews available at https://openreview.net/forum?id=c1UkGC3MVq
Abstract
Quantitative disease severity scoring in medical imaging is costly, time-consuming, and subject to inter-reader variability. At the same time, clinical archives contain far more longitudinal imaging data than expert-annotated severity scores. Existing self-supervised methods typically ignore this chronological structure. We introduce ChronoCon, a contrastive learning approach that replaces label-based ranking losses with rankings derived solely from the visitation order of a patient's longitudinal scans. Under the clinically plausible assumption of monotonic progression in irreversible diseases, the method learns disease-relevant representations without using any expert labels. This generalizes the idea of Rank-N-Contrast from label distances to temporal ordering. Evaluated on rheumatoid arthritis radiographs for severity assessment, the learned representations substantially improve label efficiency. In low-label settings, ChronoCon significantly outperforms a fully supervised baseline initialized from ImageNet weights. In a few-shot learning experiment, fine-tuning ChronoCon on expert scores from only five patients yields an intraclass correlation coefficient of 86% for severity score prediction. These results demonstrate the potential of chronological contrastive learning to exploit routinely available imaging metadata to reduce annotation requirements in the irreversible disease domain. Code is available at https://github.com/cirmuw/ChronoCon.
Summary / 总结
Quantitative disease severity scoring in medical imaging is costly, time-consuming, and subject to inter-reader variability.
ALTO: Adaptive LoRA Tuning and Orchestration for Heterogeneous LoRA Training Workloads
Authors: Jingwei Zuo, Xinze Feng, Zien Liu, Kaijian Wang, Fanjiang Ye, Ye Cao, Zhuang Wang, Yuke Wang
First: 2026-04-07T04:40:17+00:00 · Latest: 2026-04-10T07:32:38+00:00
Abstract
Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly sensitive to configuration choices. In practice, this leads to many concurrent LoRA jobs, often spanning heterogeneous tasks in multi-tenant environments. Existing systems largely handle these jobs independently, which both wastes computation on weak candidates and leaves GPUs underutilized. We present ALTO (Adaptive LoRA Tuning and Orchestration), a co-designed training system that accelerates LoRA hyperparameter tuning while enabling efficient cluster sharing across heterogeneous tasks. The central insight behind ALTO is that when multiple tuning jobs run concurrently over a shared frozen backbone, they expose optimization opportunities that single-job designs cannot exploit. Building on this, ALTO monitors loss trajectories to terminate unpromising configurations early, uses fused grouped GEMM together with a new rank-local adapter parallelism to co-locate surviving adapters and reclaim freed GPU capacity, and combines intra-task and inter-task scheduling to improve multi-task placement by leveraging the predictable duration of LoRA jobs. Extensive evaluation shows that ALTO achieves up to $13.8\times$ speedup over state-of-the-art without sacrificing adapter quality.
Summary / 总结
Low-Rank Adaptation (LoRA) is now the dominant method for parameter-efficient fine-tuning of large language models, but achieving a high-quality adapter often requires systematic hyperparameter tuning because LoRA performance is highly sensitive to configuration choices.
BLEG: LLM Functions as Powerful fMRI Graph-Enhancer for Brain Network Analysis
Authors: Rui Dong, Zitong Wang, Jiaxing Li, Weihuang Zheng, Youyong Kong
First: 2026-04-01T06:53:13+00:00 · Latest: 2026-04-10T05:38:28+00:00
Abstract
Graph Neural Networks (GNNs) have been widely used in diverse brain network analysis tasks based on preprocessed functional magnetic resonance imaging (fMRI) data. However, their performances are constrained due to high feature sparsity and inherent limitations of domain knowledge within uni-modal neurographs. Meanwhile, large language models (LLMs) have demonstrated powerful representation capabilities. Combining LLMs with GNNs presents a promising direction for brain network analysis. While LLMs and MLLMs have emerged in neuroscience, integration of LLMs with graph-based data remains unexplored. In this work, we deal with these issues by incorporating LLM's powerful representation and generalization capabilities. Considering great cost for directly tuning LLMs, we instead function LLM as enhancer to boost GNN's performance on downstream tasks. Our method, namely BLEG, can be divided into three stages. We firstly prompt LLM to get augmented texts for fMRI graph data, then we design a LLM-LM instruction tuning method to get enhanced textual representations at a relatively lower cost. GNN is trained together for coarsened alignment. Finally we finetune an adapter after GNN for given downstream tasks. Alignment loss between LM and GNN logits is designed to further enhance GNN's representation. Extensive experiments on different datasets confirmed BLEG's superiority.Code can be available at https://github.com/KamonRiderDR/BLEG.
Summary / 总结
Graph Neural Networks (GNNs) have been widely used in diverse brain network analysis tasks based on preprocessed functional magnetic resonance imaging (fMRI) data.
Large-Scale Universal Defect Generation: Foundation Models and Datasets
Authors: Yuanting Fan, Jun Liu, Bin-Bin Gao, Xiaochen Chen, Yuhuan Lin, Zhewei Dai, Jiawei Zhan, Chengjie Wang
First: 2026-04-10T03:21:17+00:00 · Latest: 2026-04-10T03:21:17+00:00
Comments: 25 pages, 13 figures, preprint
Abstract
Existing defect/anomaly generation methods often rely on few-shot learning, which overfits to specific defect categories due to the lack of large-scale paired defect editing data. This issue is aggravated by substantial variations in defect scale and morphology, resulting in limited generalization, degraded realism, and category consistency. We address these challenges by introducing UDG, a large-scale dataset of 300K normal-abnormal-mask-caption quadruplets spanning diverse domains, and by presenting UniDG, a universal defect generation foundation model that supports both reference-based defect generation and text instruction-based defect editing without per-category fine-tuning. UniDG performs Defect-Context Editing via adaptive defect cropping and structured diptych input format, and fuses reference and target conditions through MM-DiT multimodal attention. A two-stage training strategy, Diversity-SFT followed by Consistency-RFT, further improves diversity while enhancing realism and reference consistency. Extensive experiments on MVTec-AD and VisA show that UniDG outperforms prior few-shot anomaly generation and image insertion/editing baselines in synthesis quality and downstream single- and multi-class anomaly detection/localization. Code will be available at https://github.com/RetoFan233/UniDG.
Summary / 总结
Existing defect/anomaly generation methods often rely on few-shot learning, which overfits to specific defect categories due to the lack of large-scale paired defect editing data.
Temporal Transfer Learning for Traffic Optimization with Coarse-grained Advisory Autonomy
Authors: Jung-Hoon Cho, Sirui Li, Jeongyun Kim, Cathy Wu
Venue: IEEE Transactions on Robotics, 2026
First: 2023-11-27T21:18:06+00:00 · Latest: 2026-04-10T03:00:25+00:00
Comments: 18 pages, 12 figures
Abstract
The recent development of connected and automated vehicle (CAV) technologies has spurred investigations to optimize dense urban traffic to maximize vehicle speed and throughput. This paper explores advisory autonomy, in which real-time driving advisories are issued to the human drivers, thus achieving near-term performance of automated vehicles. Due to the complexity of traffic systems, recent studies of coordinating CAVs have resorted to leveraging deep reinforcement learning (RL). Coarse-grained advisory is formalized as zero-order holds, and we consider a range of hold duration from 0.1 to 40 seconds. However, despite the similarity of the higher frequency tasks on CAVs, a direct application of deep RL fails to be generalized to advisory autonomy tasks. To overcome this, we utilize zero-shot transfer, training policies on a set of source tasks--specific traffic scenarios with designated hold durations--and then evaluating the efficacy of these policies on different target tasks. We introduce Temporal Transfer Learning (TTL) algorithms to select source tasks for zero-shot transfer, systematically leveraging the temporal structure to solve the full range of tasks. TTL selects the most suitable source tasks to maximize the performance of the range of tasks. We validate our algorithms on diverse mixed-traffic scenarios, demonstrating that TTL more reliably solves the tasks than baselines. This paper underscores the potential of coarse-grained advisory autonomy with TTL in traffic flow optimization.
Summary / 总结
The recent development of connected and automated vehicle (CAV) technologies has spurred investigations to optimize dense urban traffic to maximize vehicle speed and throughput.
Improving Automatic Summarization of Radiology Reports through Mid-Training of Large Language Models
Authors: Mengxian Lyu, Cheng Peng, Ziyi Chen, Mengyuan Zhang, Jieting Li Lu, Yonghui Wu
First: 2026-02-28T03:36:46+00:00 · Latest: 2026-04-10T01:48:39+00:00
Abstract
Automatic summarization of radiology reports is an essential application to reduce the burden on physicians. Previous studies have widely used the "pre-training, fine-tuning" strategy to adapt large language models (LLMs) for summarization. This study proposed a subdomain adaptation through a mid-training method to improve summarization. We explored three adaptation strategies: (1) general-domain pre-training, (2) clinical-domain pre-training, and (3) clinical-domain pre-training followed by subdomain mid-training. We developed models using large-scale clinical text from the University of Florida (UF) Health and conducted mid-training and fine-tuning experiments using widely used benchmark datasets including OpenI and MIMIC-CXR. The experimental results show that the mid-trained model, GatorTronT5-Radio, achieved the best performance, outperforming models without mid-training in both text-based measures (ROUGE-L) and factuality measures (RadGraph-F1). Our mid-training methods also demonstrate better few-shot learning and could alleviate the "cold start" problem reported in previous studies as a learning barrier. Our findings support the use of "pre-training, mid-training, fine-tuning," instead of the widely used direct fine-tuning strategy.
Summary / 总结
Automatic summarization of radiology reports is an essential application to reduce the burden on physicians.
Task-agnostic Low-rank Residual Adaptation for Efficient Federated Continual Fine-Tuning
Authors: Feng Yu, Jia Hu, Geyong Min
First: 2025-05-18T09:19:13+00:00 · Latest: 2026-04-09T20:36:53+00:00
Abstract
Federated Parameter-Efficient Fine-Tuning (Fed-PEFT) enables lightweight adaptation of large pre-trained models in federated learning settings by updating only a small subset of parameters. However, Fed-PEFT methods typically assume a fixed label space and static downstream tasks, which is restrictive in realistic application scenarios where clients continuously encounter new classes over time. This leads to an emerging problem, known as \emph{Federated Continual Fine-Tuning} (FCFT). In FCFT, clients collaboratively fine-tune a pre-trained model over a sequence of tasks, where each client observes disjoint sets of new classes over time, and task identity is unavailable at inference time. FCFT is challenging because it simultaneously suffers from severe forgetting under non-IID client data distributions, parameter growth and task-specific inference caused by task-wise modules, and aggregation inconsistency across heterogeneous clients. To address these challenges, we propose Federated Task-agnostic Low-rank Residual Adaptation (Fed-TaLoRA), a novel approach for efficient FCFT built on task-agnostic adaptation, post-aggregation model calibration, and strategic low-rank adaptation placement. Fed-TaLoRA continuously fine-tunes a single shared module across sequential tasks to avoid task-wise parameter growth, and further introduces a theoretically grounded residual weight update mechanism to calibrate the aggregated global model and improve aggregation fidelity. We provide a theoretical analysis of the convergence and aggregation behavior of Fed-TaLoRA. Extensive experiments on four benchmark datasets demonstrate that Fed-TaLoRA consistently outperforms strong baselines while reducing communication and computation costs significantly.
Summary / 总结
Federated Parameter-Efficient Fine-Tuning (Fed-PEFT) enables lightweight adaptation of large pre-trained models in federated learning settings by updating only a small subset of parameters.
Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
Authors: Mu Nan, Muquan Yu, Weijian Mai, Jacob S. Prince, Hossein Adeli, Rui Zhang, Jiahang Cao, Benjamin Becker, John A. Pyles, Margaret M. Henderson, Chunfeng Song, Nikolaus Kriegeskorte, Michael J. Tarr, Xiaoqing Hu, Andrew F. Luo
Venue: CVPR 2026
First: 2026-04-09T17:59:32+00:00 · Latest: 2026-04-09T17:59:32+00:00
Comments: Accepted to CVPR 2026, website: https://github.com/ezacngm/brainCodec
Abstract
Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision. A field-wide goal is to achieve generalizable, cross-subject models. A major obstacle towards this goal is the substantial variability in neural representations across individuals, which has so far required training bespoke models or fine-tuning separately for each subject. To address this challenge, we introduce a meta-optimized approach for semantic visual decoding from fMRI that generalizes to novel subjects without any fine-tuning. By simply conditioning on a small set of image-brain activation examples from the new individual, our model rapidly infers their unique neural encoding patterns to facilitate robust and efficient visual decoding. Our approach is explicitly optimized for in-context learning of the new subject's encoding model and performs decoding by hierarchical inference, inverting the encoder. First, for multiple brain regions, we estimate the per-voxel visual response encoder parameters by constructing a context over multiple stimuli and responses. Second, we construct a context consisting of encoder parameters and response values over multiple voxels to perform aggregated functional inversion. We demonstrate strong cross-subject and cross-scanner generalization across diverse visual backbones without retraining or fine-tuning. Moreover, our approach requires neither anatomical alignment nor stimulus overlap. This work is a critical step towards a generalizable foundation model for non-invasive brain decoding.
Summary / 总结
Visual decoding from brain signals is a key challenge at the intersection of computer vision and neuroscience, requiring methods that bridge neural representations and computational models of vision.
Quantifying Explanation Consistency: The C-Score Metric for CAM-Based Explainability in Medical Image Classification
Authors: Kabilan Elangovan, Daniel Ting
First: 2026-04-09T17:47:31+00:00 · Latest: 2026-04-09T17:47:31+00:00
Abstract
Class Activation Mapping (CAM) methods are widely used to generate visual explanations for deep learning classifiers in medical imaging. However, existing evaluation frameworks assess whether explanations are correct, measured by localisation fidelity against radiologist annotations, rather than whether they are consistent: whether the model applies the same spatial reasoning strategy across different patients with the same pathology. We propose the C-Score (Consistency Score), a confidence-weighted, annotation-free metric that quantifies intra-class explanation reproducibility via intensity-emphasised pairwise soft IoU across correctly classified instances. We evaluate six CAM techniques: GradCAM, GradCAM++, LayerCAM, EigenCAM, ScoreCAM, and MS GradCAM++ across three CNN architectures (DenseNet201, InceptionV3, ResNet50V2) over thirty training epochs on the Kermany chest X-ray dataset, covering transfer learning and fine-tuning phases. We identify three distinct mechanisms of AUC-consistency dissociation, invisible to standard classification metrics: threshold-mediated gold list collapse, technique-specific attribution collapse at peak AUC, and class-level consistency masking in global aggregation. C-Score provides an early warning signal of impending model instability. ScoreCAM deterioration on ResNet50V2 is detectable one full checkpoint before catastrophic AUC collapse and yields architecture-specific clinical deployment recommendations grounded in explanation quality rather than predictive ranking alone.
Summary / 总结
Class Activation Mapping (CAM) methods are widely used to generate visual explanations for deep learning classifiers in medical imaging.
SUPERNOVA: Eliciting General Reasoning in LLMs with Reinforcement Learning on Natural Instructions
Authors: Ashima Suvarna, Kendrick Phan, Mehrab Beikzadeh, Hritik Bansal, Saadia Gabriel
First: 2026-04-09T17:16:07+00:00 · Latest: 2026-04-09T17:16:07+00:00
Comments: 23 Pages, 4 figures
Abstract
Reinforcement Learning with Verifiable Rewards (RLVR) has significantly improved large language model (LLM) reasoning in formal domains such as mathematics and code. Despite these advancements, LLMs still struggle with general reasoning tasks requiring capabilities such as causal inference and temporal understanding. Extending RLVR to general reasoning is fundamentally constrained by the lack of high-quality, verifiable training data that spans diverse reasoning skills. To address this challenge, we propose SUPERNOVA, a data curation framework for RLVR aimed at enhancing general reasoning. Our key insight is that instruction-tuning datasets containing expert-annotated ground-truth encode rich reasoning patterns that can be systematically adapted for RLVR. To study this, we conduct 100+ controlled RL experiments to analyze how data design choices impact downstream reasoning performance. In particular, we investigate three key factors: (i) source task selection, (ii) task mixing strategies, and (iii) synthetic interventions for improving data quality. Our analysis reveals that source task selection is non-trivial and has a significant impact on downstream reasoning performance. Moreover, selecting tasks based on their performance for individual target tasks outperforms strategies based on overall average performance. Finally, models trained on SUPERNOVA outperform strong baselines (e.g., Qwen3.5) on challenging reasoning benchmarks including BBEH, Zebralogic, and MMLU-Pro. In particular, training on SUPERNOVA yields relative improvements of up to 52.8\% on BBEH across model sizes, demonstrating the effectiveness of principled data curation for RLVR. Our findings provide practical insights for curating human-annotated resources to extend RLVR to general reasoning. The code and data is available at https://github.com/asuvarna31/supernova.
Summary / 总结
Reinforcement Learning with Verifiable Rewards (RLVR) has significantly improved large language model (LLM) reasoning in formal domains such as mathematics and code.
SOLAR: Communication-Efficient Model Adaptation via Subspace-Oriented Latent Adapter Reparametrization
Authors: Seyed Mahmoud Sajjadi Mohammadabadi, Xiaolong Ma, Lei Yang, Feng Yan, Junshan Zhang
First: 2026-04-09T15:34:13+00:00 · Latest: 2026-04-09T15:34:13+00:00
Abstract
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, enable scalable adaptation of foundation models by injecting low-rank adapters. However, their communication and storage costs remain a major bottleneck in resource-constrained settings. We propose SOLAR (Subspace-Oriented Latent Adapter Reparameterization), a post-training compression framework that substantially reduces the communication cost (i.e., the number of parameters to transmit or store) of PEFT adapters. SOLAR expresses each PEFT update as a linear combination of basis vectors formed from the foundation model's singular vectors with controlled random perturbations. By exploiting the subspace similarity (the alignment of principal directions) between the foundation model and task-specific fine-tuned updates, SOLAR decouples the adapter size from PEFT structure and ensures compact yet expressive representations. It is model-agnostic and compatible with existing PEFT methods, including LoRA, AdaLoRA, and other adapter modules. We theoretically establish a bound on the reconstruction error. Experiments on language and vision tasks using LLaMA, GPT, and ViT models demonstrate that SOLAR preserves task performance while significantly reducing model representation sizes, offering an effective and communication-efficient solution for deployment in distributed systems and edge devices.
Summary / 总结
Parameter-efficient fine-tuning (PEFT) methods, such as LoRA, enable scalable adaptation of foundation models by injecting low-rank adapters.
Scaling Implicit Fields via Hypernetwork-Driven Multiscale Coordinate Transformations
Authors: Plein Versace
First: 2025-11-23T10:27:04+00:00 · Latest: 2026-04-09T15:07:56+00:00
Comments: arXiv admin note: This paper has been withdrawn by arXiv due to unverifiable authorship and affiliation
Abstract
Implicit Neural Representations (INRs) have emerged as a powerful paradigm for representing signals such as images, 3D shapes, signed distance fields, and radiance fields. While significant progress has been made in architecture design (e.g., SIREN, FFC, KAN-based INRs) and optimization strategies (meta-learning, amortization, distillation), existing approaches still suffer from two core limitations: (1) a representation bottleneck that forces a single MLP to uniformly model heterogeneous local structures, and (2) limited scalability due to the absence of a hierarchical mechanism that dynamically adapts to signal complexity. This work introduces Hyper-Coordinate Implicit Neural Representations (HC-INR), a new class of INRs that break the representational bottleneck by learning signal-adaptive coordinate transformations using a hypernetwork. HC-INR decomposes the representation task into two components: (i) a learned multiscale coordinate transformation module that warps the input domain into a disentangled latent space, and (ii) a compact implicit field network that models the transformed signal with significantly reduced complexity. The proposed model introduces a hierarchical hypernetwork architecture that conditions coordinate transformations on local signal features, enabling dynamic allocation of representation capacity. We theoretically show that HC-INR strictly increases the upper bound of representable frequency bands while maintaining Lipschitz stability. Extensive experiments across image fitting, shape reconstruction, and neural radiance field approximation demonstrate that HC-INR achieves up to 4 times higher reconstruction fidelity than strong INR baselines while using 30--60\% fewer parameters.
Summary / 总结
Implicit Neural Representations (INRs) have emerged as a powerful paradigm for representing signals such as images, 3D shapes, signed distance fields, and radiance fields.
Parameter-Efficient Transfer Learning for Microseismic Phase Picking Using a Neural Operator
Authors: Ayrat Abdullin, Umair Bin Waheed, Leo Eisner, Naveed Iqbal
First: 2025-12-15T11:13:21+00:00 · Latest: 2026-04-09T14:01:27+00:00
Comments: v2: Revised manuscript after journal review; updated methods/results; now submitted to Nature Scientific Reports
Abstract
Seismic phase picking is fundamental for microseismic monitoring and subsurface imaging. Manual processing is impractical for real-time applications and large sensor arrays, motivating the use of deep learning-based pickers trained on extensive earthquake catalogs. On a broader scale, these models are generally tuned to perform optimally in high signal-to-noise and long-duration networks and often fail to perform satisfactorily when applied to campaign-based microseismic datasets, which are characterized by low signal-to-noise ratios, sparse geometries, and limited labeled data.
In this study, we present a microseismic adaptation of a network-wide earthquake phase picker, Phase Neural Operator (PhaseNO), using transfer learning and parameter-efficient fine-tuning. Starting from a model pre-trained on more than 57,000 three-component earthquake and noise records, we fine-tune it using only 200 labeled and noisy microseismic recordings from hydraulic fracturing settings. We present a parameter-efficient adaptation of PhaseNO that fine-tunes a small fraction of its parameters (only 3.6%) while retaining its global spatiotemporal representations learned from a large dataset of earthquake recordings.
We then evaluate our adapted model on three independent microseismic datasets and compare its performance against the original pre-trained PhaseNO, a STA/LTA-based workflow, and two state-of-the-art deep learning models, PhaseNet and EQTransformer. We demonstrate that our adapted model significantly outperforms the original PhaseNO in F1 and accuracy metrics, achieving up to 30% absolute improvements in all test sets and consistently performing better than STA/LTA and state-of-the-art models. With our adaptation being based on a small calibration set, our proposed workflow is a practical and efficient tool to deploy network-wide models in data-limited microseismic applications.
Summary / 总结
Seismic phase picking is fundamental for microseismic monitoring and subsurface imaging.
SearchAD: Large-Scale Rare Image Retrieval Dataset for Autonomous Driving
Authors: Felix Embacher, Jonas Uhrig, Marius Cordts, Markus Enzweiler
Venue: CVPR 2026
First: 2026-04-09T09:10:41+00:00 · Latest: 2026-04-09T09:10:41+00:00
Comments: To be published in CVPR 2026
Abstract
Retrieving rare and safety-critical driving scenarios from large-scale datasets is essential for building robust autonomous driving (AD) systems. As dataset sizes continue to grow, the key challenge shifts from collecting more data to efficiently identifying the most relevant samples. We introduce SearchAD, a large-scale rare image retrieval dataset for AD containing over 423k frames drawn from 11 established datasets. SearchAD provides high-quality manual annotations of more than 513k bounding boxes covering 90 rare categories. It specifically targets the needle-in-a-haystack problem of locating extremely rare classes, with some appearing fewer than 50 times across the entire dataset. Unlike existing benchmarks, which focused on instance-level retrieval, SearchAD emphasizes semantic image retrieval with a well-defined data split, enabling text-to-image and image-to-image retrieval, few-shot learning, and fine-tuning of multi-modal retrieval models. Comprehensive evaluations show that text-based methods outperform image-based ones due to stronger inherent semantic grounding. While models directly aligning spatial visual features with language achieve the best zero-shot results, and our fine-tuning baseline significantly improves performance, absolute retrieval capabilities remain unsatisfactory. With a held-out test set on a public benchmark server, SearchAD establishes the first large-scale dataset for retrieval-driven data curation and long-tail perception research in AD: https://iis-esslingen.github.io/searchad/
Summary / 总结
Retrieving rare and safety-critical driving scenarios from large-scale datasets is essential for building robust autonomous driving (AD) systems.
DSCA: Dynamic Subspace Concept Alignment for Lifelong VLM Editing
Authors: Gyanendra Das, Sai Satyam Jena
Venue: CVPR 2026
First: 2026-04-09T08:25:54+00:00 · Latest: 2026-04-09T08:25:54+00:00
Comments: Accepted at CVPR 2026
Abstract
Model editing aims to update knowledge to add new concepts and change relevant information without retraining. Lifelong editing is a challenging task, prone to disrupting previously learned concepts, especially for Vision Language Models (VLMs), because sequential edits can lead to degraded reasoning and cross modal misalignment. Existing VLM knowledge editing methods based on gated adapters, activation edits, and parameter merging techniques address catastrophic forgetting seen in full fine tuning; however, they still operate in the shared representation space of the VLM, where concepts are entangled, so edits interfere with other non relevant concepts. We hypothesize that this instability persists because current methods algorithmically control edits via optimization rather than structurally separating knowledge. We introduce Dynamic Subspace Concept Alignment (DSCA) which by design mitigates this limitation by decomposing the representation space into a set of orthogonal semantic subspaces and proposing edits only in those transformed spaces. These subspaces are obtained through incremental clustering and PCA on joint vision language representations. This process structurally isolates concepts, enabling precise, non interfering edits by turning isolation from a soft training objective into an architectural property. The surgical edits are guided by a multi term loss function for maintaining task fidelity, edit locality, and cross modal alignment. With the base model frozen, our method achieves 98 percent single edit success, remains over 95 percent after 1000 sequential edits, lowers hallucination by 3 to 5 percent, and achieves the best backward transfer (BWT) scores on continual instruction tuning benchmarks. Extensive experiments demonstrate DSCA state of the art stability and knowledge retention capability in continual lifelong editing across various datasets and benchmarks.
Summary / 总结
Model editing aims to update knowledge to add new concepts and change relevant information without retraining.
Data Selection for Multi-turn Dialogue Instruction Tuning
Authors: Bo Li, Shikun Zhang, Wei Ye
Venue: ACL 2026
First: 2026-04-09T07:01:26+00:00 · Latest: 2026-04-09T07:01:26+00:00
Comments: Project: https://github.com/pkuserc/MDS
Abstract
Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns. We address this from a data selection perspective and propose \textbf{MDS} (Multi-turn Dialogue Selection), a dialogue-level framework that scores whole conversations rather than isolated turns. MDS combines a global coverage stage that performs bin-wise selection in the user-query trajectory space to retain representative yet non-redundant dialogues, with a local structural stage that evaluates within-dialogue reliability through entity-grounded topic grounding and information progress, together with query-answer form consistency for functional alignment. MDS outperforms strong single-turn selectors, dialogue-level LLM scorers, and heuristic baselines on three multi-turn benchmarks and an in-domain Banking test set, achieving the best overall rank across reference-free and reference-based metrics, and is more robust on long conversations under the same training budget. Code and resources are included in the supplementary materials.
Summary / 总结
Instruction-tuned language models increasingly rely on large multi-turn dialogue corpora, but these datasets are often noisy and structurally inconsistent, with topic drift, repetitive chitchat, and mismatched answer formats across turns.
Comparative Evaluation of Embedding Representations for Financial News Sentiment Analysis
Authors: Joyjit Roy, Samaresh Kumar Singh
Venue: 2026 IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI), IEEE, 2026
First: 2025-12-15T04:52:30+00:00 · Latest: 2026-04-09T04:37:43+00:00
Comments: 6 pages, 2 figures. Published in the 4th IEEE International Conference on Interdisciplinary Approaches in Technology and Management for Social Innovation (IATMSI 2026), IEEE
Abstract
Financial sentiment analysis enhances market understanding. However, standard Natural Language Processing (NLP) approaches encounter significant challenges when applied to small datasets. This study presents a comparative evaluation of embedding-based techniques for financial news sentiment classification in resource-constrained environments. Word2Vec, GloVe, and sentence transformer representations are evaluated in combination with gradient boosting on a manually labeled dataset of 349 financial news headlines. Experimental results identify a substantial gap between validation and test performance. Despite strong validation metrics, models underperform relative to trivial baselines. The analysis indicates that pretrained embeddings yield diminishing returns below a critical data sufficiency threshold. Small validation sets contribute to overfitting during model selection. Practical application is illustrated through weekly sentiment aggregation and narrative summarization for market monitoring. Overall, the findings indicate that embedding quality alone cannot address fundamental data scarcity in sentiment classification. Practitioners with limited labeled data should consider alternative strategies, including few-shot learning, data augmentation, or lexicon-enhanced hybrid methods.
Summary / 总结
Financial sentiment analysis enhances market understanding.
PIKA: Expert-Level Synthetic Datasets for Post-Training Alignment from Scratch
Authors: Shangjian Yin, Shining Liang, Wenbiao Ding, Yuli Qian, Zhouxing Shi, Hongzhi Li, Yutao Xie
First: 2025-10-08T05:47:37+00:00 · Latest: 2026-04-09T03:56:10+00:00
Abstract
High-quality instruction data is critical for LLM alignment, yet existing open-source datasets often lack efficiency, requiring hundreds of thousands of examples to approach proprietary performance. In this work, we find that beyond the widely recognized importance of prompt-response quality, prompt difficulty itself plays a critical role in driving alignment gains. Motivated by this observation, we introduce PiKa, a data-efficient family of expert-level alignment datasets that concentrates supervision on high-difficulty instructions. The PiKa-SFT dataset contains only 30k examples, an order of magnitude fewer than state-of-the-art open datasets like Magpie-Pro. Despite its small size, fine-tuning Llama-3-8B-Base on PiKa-SFT even outperforms the official Llama-3-8B-Instruct model trained on over 10M proprietary examples on widely used benchmarks such as AlpacaEval 2.0 and Arena-Hard. We also validate the generalizability of PiKa across the Qwen2.5 series (0.5B-7B), consistently surpassing their official instruction-tuned counterparts. Additionally, we provide 30k high-quality preference optimization examples to further enhance alignment. Our results demonstrate that promising alignment is achievable with significantly reduced data, democratizing access for resource-constrained research. Our code and data will be available at https://github.com/SJY8460/PiKa.
Summary / 总结
High-quality instruction data is critical for LLM alignment, yet existing open-source datasets often lack efficiency, requiring hundreds of thousands of examples to approach proprietary performance.
Towards Rapid Constitutive Model Discovery from Multi-Modal Data: Physics Augmented Finite Element Model Updating (paFEMU)
Authors: Jingye Tan, Govinda Anantha Padmanabha, Steven J. Yang, Nikolaos Bouklas
First: 2026-04-09T03:08:07+00:00 · Latest: 2026-04-09T03:08:07+00:00
Abstract
Recent progress in AI-enabled constitutive modeling has concentrated on moving from a purely data-driven paradigm to the enforcement of physical constraints and mechanistic principles, a concept referred to as physics augmentation. Classical phenomenological approaches rely on selecting a pre-defined model and calibrating its parameters, while machine learning methods often focus on discovery of the model itself. Sparse regression approaches lie in between, where large libraries of pre-defined models are probed during calibration. Sparsification in the aforementioned paradigm, but also in the context of neural network architecture, has been shown to enable interpretability, uncertainty quantification, but also heterogeneous software integration due to the low-dimensional nature of the resulting models. Most works in AI-enabled constitutive modeling have also focused on data from a single source, but in reality, materials modeling workflows can contain data from many different sources (multi-modal data), and also from testing other materials within the same materials class (multi-fidelity data). In this work, we introduce physics augmented finite element model updating (paFEMU), as a transfer learning approach that combines AI-enabled constitutive modeling, sparsification for interpretable model discovery, and finite element-based adjoint optimization utilizing multi-modal data. This is achieved by combining simple mechanical testing data, potentially from a distinct material, with digital image correlation-type full-field data acquisition to ultimately enable rapid constitutive modeling discovery. The simplicity of the sparse representation enables easy integration of neural constitutive models in existing finite element workflows, and also enables low-dimensional updating during transfer learning.
Summary / 总结
Recent progress in AI-enabled constitutive modeling has concentrated on moving from a purely data-driven paradigm to the enforcement of physical constraints and mechanistic principles, a concept referred to as physics augmentation.
Tractable Uncertainty-Aware Meta-Learning
Authors: Young-Jin Park, Cesar Almecija, Apoorva Sharma, Navid Azizan
First: 2022-10-04T20:02:25+00:00 · Latest: 2026-04-08T18:43:06+00:00
Abstract
Meta-learning is a popular approach for learning new tasks with limited data by leveraging the commonalities among different tasks. However, meta-learned models can perform poorly when context data is too limited, or when data is drawn from an out-of-distribution (OoD) task. Especially in safety-critical settings, this necessitates an uncertainty-aware approach to meta-learning. In addition, the often multimodal nature of task distributions can pose unique challenges to meta-learning methods. To this end, we present LUMA, a meta-learning method for regression that (1) makes probabilistic predictions on in-distribution tasks efficiently, (2) is capable of detecting OoD context data, and (3) handles heterogeneous, multimodal task distributions effectively. The strength of our framework lies in its solid theoretical basis, enabling analytically tractable Bayesian inference on a linearized model for principled uncertainty estimation and robust generalization. We achieve this by adopting a probabilistic perspective and learning a parametric, tunable task distribution via Bayesian inference on a linearized neural network, leveraging Gaussian process theory. Moreover, we make our approach computationally tractable by leveraging a low-rank prior covariance learning scheme based on the Fisher Information Matrix. Our numerical analysis demonstrates that LUMA quickly adapts to new tasks and remains accurate even in low-data regimes; it effectively detects OoD tasks; and that both of these properties continue to hold for multimodal task distributions.
Summary / 总结
Meta-learning is a popular approach for learning new tasks with limited data by leveraging the commonalities among different tasks.
Probabilistic Predictions of Process-Induced Deformation in Carbon/Epoxy Composites Using a Deep Operator Network
Authors: Elham Kiyani, Amit Makarand Deshpande, Madhura Limaye, Zhiwei Gao, Zongren Zou, Sai Aditya Pradeep, Srikanth Pilla, Gang Li, Zhen Li, George Em Karniadakis
First: 2025-12-15T03:04:45+00:00 · Latest: 2026-04-08T15:00:20+00:00
Comments: 21 pages, 13 figures
Abstract
Fiber reinforcement and polymer matrix respond differently to manufacturing conditions due to mismatch in coefficient of thermal expansion and matrix shrinkage during curing of thermosets. These heterogeneities generate residual stresses over multiple length scales, whose partial release leads to process-induced deformation (PID), requiring accurate prediction and mitigation via optimized non-isothermal cure cycles. This study considers a unidirectional AS4 carbon fiber/amine bi-functional epoxy prepreg and models PID using a two-mechanism framework that accounts for thermal expansion/shrinkage and cure shrinkage. The model is validated against manufacturing trials to identify initial and boundary conditions, then used to generate PID responses for a diverse set of non-isothermal cure cycles (time-temperature profiles). Building on this physics-based foundation, we develop a data-driven surrogate based on Deep Operator Networks (DeepONets). A DeepONet is trained on a dataset combining high-fidelity simulations with targeted experimental measurements of PID. We extend this to a Feature-wise Linear Modulation (FiLM) DeepONet, where branch-network features are modulated by external parameters, including the initial degree of cure, enabling prediction of time histories of degree of cure, viscosity, and deformation. Because experimental data are available only at limited time instances (for example, final deformation), we use transfer learning: simulation-trained trunk and branch networks are fixed and only the final layer is updated using measured final deformation. Finally, we augment the framework with Ensemble Kalman Inversion (EKI) to quantify uncertainty under experimental conditions and to support optimization of cure schedules for reduced PID in composites.
Summary / 总结
Fiber reinforcement and polymer matrix respond differently to manufacturing conditions due to mismatch in coefficient of thermal expansion and matrix shrinkage during curing of thermosets.
Gemma 4, Phi-4, and Qwen3: Accuracy-Efficiency Tradeoffs in Dense and MoE Reasoning Language Models
Authors: Md Motaleb Hossen Manik, Ge Wang
First: 2026-04-08T12:50:52+00:00 · Latest: 2026-04-08T12:50:52+00:00
Abstract
Mixture-of-experts (MoE) language models are often expected to offer better quality-efficiency tradeoffs than dense models because only a subset of parameters is activated per token, but the practical value of that advantage depends on end-to-end behavior under realistic inference constraints. We present a controlled empirical benchmark of seven recent reasoning-oriented instruction-tuned models spanning dense and MoE designs, namely Gemma-4-E2B, Gemma-4-E4B, Gemma-4-26B-A4B, Phi-4-mini-reasoning, Phi-4-reasoning, Qwen3-8B, and Qwen3-30B-A3B, evaluated on four benchmarks -- ARC-Challenge, GSM8K, Math Level 1-3, and TruthfulQA MC1 -- under three prompting strategies: zero-shot, chain-of-thought, and few-shot chain-of-thought. The study covers 8,400 total model-dataset-prompt evaluations and records accuracy, latency, peak GPU memory usage (VRAM), and an approximate floating-point operations (FLOPs)-per-token proxy. Across the weighted multi-task summary, Gemma-4-E4B with few-shot chain-of-thought achieved the best overall result, reaching weighted accuracy 0.675 with mean VRAM 14.9 GB, while Gemma-4-26B-A4B was close in accuracy at 0.663 but substantially more memory intensive at 48.1 GB. At the task level, Gemma models dominated ARC and Math, Phi models were strongest on TruthfulQA, and GSM8K showed the largest prompt sensitivity, including a sharp drop for Phi-4-reasoning from 0.67 under chain-of-thought to 0.11 under few-shot chain-of-thought. These results show that sparse activation alone does not guarantee the best practical operating point: observed accuracy-efficiency tradeoffs depend jointly on architecture, prompting protocol, and task composition. We release a reproducible benchmark pipeline, aggregated results, and paired statistical analyses to support deployment-oriented evaluation of reasoning LLMs under real resource constraints.
Summary / 总结
Mixture-of-experts (MoE) language models are often expected to offer better quality-efficiency tradeoffs than dense models because only a subset of parameters is activated per token, but the practical value of that advantage depends on end-to-end behavior under realistic inference constraints.