AI4Science 论文速递

2026-04-07 03:46
Snapshot: 20260407_0346
Learning the Signature of Memorization in Autoregressive Language Models
Authors: David Ilić, Kostadin Cvejoski, David Stanojević, Evgeny Grigorenko
First: 2026-04-03T17:17:51+00:00 · Latest: 2026-04-03T17:17:51+00:00
Comments: Preprint. 10 pages, 4 figures, 12 tables
Abstract
All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K\%, reference calibration), each bounded by the designer's intuition. We introduce the first transferable learned attack, enabled by the observation that fine-tuning any model on any corpus yields unlimited labeled data, since membership is known by construction. This removes the shadow model bottleneck and brings membership inference into the deep learning era: learning what matters rather than designing it, with generalization through training diversity and scale. We discover that fine-tuning language models produces an invariant signature of memorization detectable across architectural families and data domains. We train a membership inference classifier exclusively on transformer-based models. It transfers zero-shot to Mamba (state-space), RWKV-4 (linear attention), and RecurrentGemma (gated recurrence), achieving 0.963, 0.972, and 0.936 AUC respectively. Each evaluation combines an architecture and dataset never seen during training, yet all three exceed performance on held-out transformers (0.908 AUC). These four families share no computational mechanisms, their only commonality is gradient descent on cross-entropy loss. Even simple likelihood-based methods exhibit strong transfer, confirming the signature exists independently of the detection method. Our method, Learned Transfer MIA (LT-MIA), captures this signal most effectively by reframing membership inference as sequence classification over per-token distributional statistics. On transformers, LT-MIA achieves 2.8$\times$ higher TPR at 0.1\% FPR than the strongest baseline. The method also transfers to code (0.865 AUC) despite training only on natural language texts. Code and trained classifier available at https://github.com/JetBrains-Research/learned-mia.
Summary / 总结
All prior membership inference attacks for fine-tuned language models use hand-crafted heuristics (e.g., loss thresholding, Min-K\%, reference calibration), each bounded by the designer's intuition.
Chart-RL: Policy Optimization Reinforcement Learning for Enhanced Visual Reasoning in Chart Question Answering with Vision Language Models
Authors: Yunfei Bai, Amit Dhanda, Shekhar Jain
Venue: KDD 2026
First: 2026-04-03T16:28:03+00:00 · Latest: 2026-04-03T16:28:03+00:00
Comments: In Proceedings of the 32nd ACM-SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)
Abstract
The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities. Beyond pattern recognition, linguistic reasoning must integrate with visual comprehension, particularly for Chart Question Answering (CQA) tasks involving complex data visualizations. Current VLMs face significant limitations in CQA, including imprecise numerical extraction, difficulty interpreting implicit visual relationships, and inadequate attention mechanisms for capturing spatial relationships in charts. In this work, we address these challenges by presenting Chart-RL, a novel reinforcement learning framework that enhances VLMs chart understanding through feedback-driven policy optimization of visual perception and logical inference. Our key innovation includes a comprehensive framework integrating Reinforcement Learning (RL) from Policy Optimization techniques along with adaptive reward functions, that demonstrates superior performance compared to baseline foundation models and competitive results against larger state-of-the-art architectures. We also integrated Parameter-Efficient Fine-Tuning through Low-Rank Adaptation (LoRA) in the RL framework that only requires single GPU configurations while preserving performance integrity. We conducted extensive benchmarking across open-source, proprietary, and state-of-the-art closed-source models utilizing the ChartQAPro dataset. The RL fine-tuned Qwen3-VL-4B-Instruct model achieved an answer accuracy of 0.634, surpassing the 0.580 accuracy of the Qwen3-VL-8B-Instruct foundation model despite utilizing half the parameter count, while simultaneously reducing inference latency from 31 seconds to 9 seconds.
Summary / 总结
The recent advancements in Vision Language Models (VLMs) have demonstrated progress toward true intelligence requiring robust reasoning capabilities.
Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
Authors: Zhangyun Tan, Zeliang Zhang, Susan Liang, Yolo Yunlong Tang, Lisha Chen, Chenliang Xu
First: 2026-04-03T15:36:00+00:00 · Latest: 2026-04-03T15:36:00+00:00
Abstract
VLMs trained on web-scale data retain sensitive and copyrighted visual concepts that deployment may require removing. Training-based unlearning methods share a structural flaw: fine-tuning on a narrow forget set degrades general capabilities before unlearning begins, making it impossible to attribute subsequent performance drops to the unlearning procedure itself. Training-free approaches sidestep this by suppressing concepts through prompts or system instructions, but no rigorous benchmark exists for evaluating them on visual tasks. We introduce VLM-UnBench, the first benchmark for training-free visual concept unlearning in VLMs. It covers four forgetting levels, 7 source datasets, and 11 concept axes, and pairs a three-level probe taxonomy with five evaluation conditions to separate genuine forgetting from instruction compliance. Across 8 evaluation settings and 13 VLM configurations, realistic unlearning prompts leave forget accuracy near the no-instruction baseline; meaningful reductions appear only under oracle conditions that disclose the target concept to the model. Object and scene concepts are the most resistant to suppression, and stronger instruction-tuned models remain capable despite explicit forget instructions. These results expose a clear gap between prompt-level suppression and true visual concept erasure.
Summary / 总结
VLMs trained on web-scale data retain sensitive and copyrighted visual concepts that deployment may require removing.
CeRA: Overcoming the Linear Ceiling of Low-Rank Adaptation via Capacity Expansion
Authors: Hung-Hsuan Chen
First: 2026-02-26T11:55:25+00:00 · Latest: 2026-04-03T08:37:26+00:00
Abstract
Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT). However, it faces a ``linear ceiling'': increasing the rank yields diminishing returns in expressive capacity due to intrinsic linear constraints. We introduce CeRA (Capacity-enhanced Rank Adaptation), a weight-level parallel adapter that injects SiLU gating and dropout to induce non-linear capacity expansion. We demonstrate a fundamental relationship between adapter expressivity and task complexity. In basic arithmetic (GSM8K), CeRA matches standard linear baselines, but on the complex MATH dataset, it demonstrates high parameter efficiency in downstream reasoning (Exact Match). CeRA at rank 64 (pass@1 16.36\%) outperforms both a high-rank LoRA at rank 512 (15.72\%) and the state-of-the-art linear variant, DoRA, at rank 64 (14.44\%), achieving higher exact-match accuracy with only 1/8 of the parameter budget. Empirical spectral analysis shows that CeRA activates the lower-variance tail of the singular value spectrum, preventing the rank collapse observed in linear methods and providing the representation capacity required for complex logical reasoning.
Summary / 总结
Low-Rank Adaptation (LoRA) dominates parameter-efficient fine-tuning (PEFT).
Transfer Learning for Loan Recovery Prediction under Distribution Shifts with Heterogeneous Feature Spaces
Authors: Christopher Gerling, Hanqiu Peng, Ying Chen, Stefan Lessmann
First: 2026-04-03T07:54:49+00:00 · Latest: 2026-04-03T07:54:49+00:00
Comments: Preprint before Peer-Review
Abstract
Accurate forecasting of recovery rates (RR) is central to credit risk management and regulatory capital determination. In many loan portfolios, however, RR modeling is constrained by data scarcity arising from infrequent default events. Transfer learning (TL) offers a promising avenue to mitigate this challenge by exploiting information from related but richer source domains, yet its effectiveness critically depends on the presence and strength of distributional shifts, and on potential heterogeneity between source and target feature spaces. This paper introduces FT-MDN-Transformer, a mixture-density tabular Transformer architecture specifically designed for TL in RR forecasting across heterogeneous feature sets. The model produces both loan-level point estimates and portfolio-level predictive distributions, thereby supporting a wide range of practical RR forecasting applications. We evaluate the proposed approach in a controlled Monte Carlo simulation that facilitates systematic variation of covariate, conditional, and label shifts, as well as in a real-world transfer setting using the Global Credit Data (GCD) loan dataset as source and a novel bonds dataset as target. Our results show that FT-MDN-Transformer outperforms baseline models when target-domain data are limited, with particularly pronounced gains under covariate and conditional shifts, while label shift remains challenging. We also observe its probabilistic forecasts to closely track empirical recovery distributions, providing richer information than conventional point-prediction metrics alone. Overall, the findings highlight the potential of distribution-aware TL architectures to improve RR forecasting in data-scarce credit portfolios and offer practical insights for risk managers operating under heterogeneous data environments.
Summary / 总结
Accurate forecasting of recovery rates (RR) is central to credit risk management and regulatory capital determination.
Transfer learning for nonparametric Bayesian networks
Authors: Rafael Sojo, Pedro Larrañaga, Concha Bielza
First: 2026-04-01T15:25:46+00:00 · Latest: 2026-04-03T07:37:36+00:00
Comments: An earlier version was previously posted on SSRN. This version includes improvements in experiments and evaluation metrics following reviewer comments. Revision submitted to Knowledge-Based Systems
Abstract
This paper introduces two transfer learning methodologies for estimating nonparametric Bayesian networks under scarce data. We propose two algorithms, a constraint-based structure learning method, called PC-stable-transfer learning (PCS-TL), and a score-based method, called hill climbing transfer learning (HC-TL). We also define particular metrics to tackle the negative transfer problem in each of them, a situation in which transfer learning has a negative impact on the model's performance. Then, for the parameters, we propose a log-linear pooling approach. For the evaluation, we learn kernel density estimation Bayesian networks, a type of nonparametric Bayesian network, and compare their transfer learning performance with the models alone. To do so, we sample data from small, medium and large-sized synthetic networks and datasets from the UCI Machine Learning repository. Then, we add noise and modifications to these datasets to test their ability to avoid negative transfer. To conclude, we perform a Friedman test with a Bergmann-Hommel post-hoc analysis to show statistical proof of the enhanced experimental behavior of our methods. Thus, PCS-TL and HC-TL demonstrate to be reliable algorithms for improving the learning performance of a nonparametric Bayesian network with scarce data, which in real industrial environments implies a reduction in the required time to deploy the network.
Summary / 总结
This paper introduces two transfer learning methodologies for estimating nonparametric Bayesian networks under scarce data.
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Authors: Subhodip Panda, Dhruv Tarsadiya, Shashwat Sourav, Prathosh A. P, Sai Praneeth Karimireddy
Venue: ICLR
First: 2025-10-12T09:05:47+00:00 · Latest: 2026-04-03T07:05:28+00:00
Abstract
Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model. Yet, existing methods collapse under training randomness: the same example may appear critical in one run and irrelevant in the next. Such instability undermines their use in data curation or cleanup since it is unclear if we indeed deleted/kept the correct datapoints. To overcome this, we introduce *f-influence* -- a new influence estimation framework grounded in hypothesis testing that explicitly accounts for training randomness, and establish desirable properties that make it suitable for reliable influence estimation. We also design a highly efficient algorithm **f**-**IN**fluence **E**stimation (**f-INE**) that computes f-influence **in a single training run**. Finally, we scale up f-INE to estimate influence of instruction tuning data on Llama-3.1-8B and show it can reliably detect poisoned samples that steer model opinions, demonstrating its utility for data cleanup and attributing model behavior.
Summary / 总结
Influence estimation methods promise to explain and debug machine learning by estimating the impact of individual samples on the final model.
Multiple-Debias: A Full-process Debiasing Method for Multilingual Pre-trained Language Models
Authors: Haoyu Liang, Peijian Zeng, Wentao Huang, Aimin Yang, Dong Zhou
First: 2026-04-03T06:30:06+00:00 · Latest: 2026-04-03T06:30:06+00:00
Abstract
Multilingual Pre-trained Language Models (MPLMs) have become essential tools for natural language processing. However, they often exhibit biases related to sensitive attributes such as gender, race, and religion. In this paper, we introduce a comprehensive multilingual debiasing method named Multiple-Debias to address these issues across multiple languages. By incorporating multilingual counterfactual data augmentation and multilingual Self-Debias across both pre-processing and post-processing stages, alongside parameter-efficient fine-tuning, we significantly reduced biases in MPLMs across three sensitive attributes in four languages. We also extended CrowS-Pairs to German, Spanish, Chinese, and Japanese, validating our full-process multilingual debiasing method for gender, racial, and religious bias. Our experiments show that (i) multilingual debiasing methods surpass monolingual approaches in effectively mitigating biases, and (ii) integrating debiasing information from different languages notably improves the fairness of MPLMs.
Summary / 总结
Multilingual Pre-trained Language Models (MPLMs) have become essential tools for natural language processing.
ForgeryGPT: A Multimodal LLM for Interpretable Image Forgery Detection and Localization
Authors: Jiawei Liu, Fanrui Zhang, Jiaying Zhu, Esther Sun, Dong Li, Qiang Zhang, Zheng-Jun Zha
First: 2024-10-14T07:56:51+00:00 · Latest: 2026-04-03T03:04:56+00:00
Comments: 13 pages, 9 figures
Abstract
Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to the learning of low-level semantic-agnostic clues and merely provide a single outcome judgment. To tackle these issues, we propose ForgeryGPT, a novel framework that advances the IFDL task by capturing high-order forensics knowledge correlations of forged images from diverse linguistic feature spaces, while enabling explainable generation and interactive dialogue through a newly customized Large Language Model (LLM) architecture. Specifically, ForgeryGPT enhances traditional LLMs by integrating the Mask-Aware Forgery Extractor, which enables the excavating of precise forgery mask information from input images and facilitating pixel-level understanding of tampering artifacts. The Mask-Aware Forgery Extractor consists of a Forgery Localization Expert (FL-Expert) and a Mask Encoder, where the FL-Expert is augmented with an Object-agnostic Forgery Prompt and a Vocabulary-enhanced Vision Encoder, allowing for effectively capturing of multi-scale fine-grained forgery details. To enhance its performance, we implement a three-stage training strategy, supported by our designed Mask-Text Alignment and IFDL Task-Specific Instruction Tuning datasets, which align vision-language modalities and improve forgery detection and instruction-following capabilities. Extensive experiments demonstrate the effectiveness of the proposed method.
Summary / 总结
Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation.
Transfer Learning for Meta-analysis Under Covariate Shift
Authors: Zilong Wang, Ali Abdeen, Turgay Ayer
First: 2026-04-03T02:40:08+00:00 · Latest: 2026-04-03T02:40:08+00:00
Comments: Accepted to IEEE ICHI 2026 Early Bird Track (Oral Presentation)
Abstract
Randomized controlled trials often do not represent the populations where decisions are made, and covariate shift across studies can invalidate standard IPD meta-analysis and transport estimators. We propose a placebo-anchored transport framework that treats source-trial outcomes as abundant proxy signals and target-trial placebo outcomes as scarce, high-fidelity gold labels to calibrate baseline risk. A low-complexity (sparse) correction anchors proxy outcome models to the target population, and the anchored models are embedded in a cross-fitted doubly robust learner, yielding a Neyman-orthogonal, target-site doubly robust estimator for patient-level heterogeneous treatment effects when target treated outcomes are available. We distinguish two regimes: in connected targets (with a treated arm), the method yields target-identified effect estimates; in disconnected targets (placebo-only), it reduces to a principled screen--then--transport procedure under explicit working-model transport assumptions. Experiments on synthetic data and a semi-synthetic IHDP benchmark evaluate pointwise CATE accuracy, ATE error, ranking quality for targeting, decision-theoretic policy regret, and calibration. Across connected settings, the proposed method is best or near-best and improves substantially over proxy-only, target-only, and transport baselines at small target sample sizes; in disconnected settings, it retains strong ranking performance for targeting while pointwise accuracy depends on the strength of the working transport condition.
Summary / 总结
Randomized controlled trials often do not represent the populations where decisions are made, and covariate shift across studies can invalidate standard IPD meta-analysis and transport estimators.
Parameter-Efficient Fine-Tuning of DINOv2 for Large-Scale Font Classification
Authors: Daniel Chen, Zaria Zinn, Marcus Lowe
First: 2026-02-14T21:15:12+00:00 · Latest: 2026-04-03T01:26:30+00:00
Abstract
We introduce GoogleFontsBench, the first public benchmark for classifying open-source web fonts, addressing a gap left by existing benchmarks that cover only commercial typefaces. GoogleFontsBench comprises 394 font variants across 32 Google Fonts families, a reproducible synthetic data generation pipeline (~575 images per variant, ~226K total), and a typographically-grounded evaluation metric (SWER) that weights errors by visual severity. We establish baselines using six fine-tuning strategies on a DINOv2 Vision Transformer backbone. Parameter-efficient adaptation with LoRA achieves 99.0% top-1 accuracy while training only 1% of the model's 87.2M parameters, with errors 140x less severe than random guessing. We release the benchmark, all trained models, and the full training pipeline as open-source resources.
Summary / 总结
We introduce GoogleFontsBench, the first public benchmark for classifying open-source web fonts, addressing a gap left by existing benchmarks that cover only commercial typefaces.
Steerable but Not Decodable: Function Vectors Operate Beyond the Logit Lens
Authors: Mohammed Suhail B Nadaf
First: 2026-04-03T00:54:11+00:00 · Latest: 2026-04-03T00:54:11+00:00
Comments: 30 pages, 7 figures
Abstract
Function vectors (FVs) -- mean-difference directions extracted from in-context learning demonstrations -- can steer large language model behavior when added to the residual stream. We hypothesized that FV steering failures reflect an absence of task-relevant information: the logit lens would fail alongside steering. We were wrong. In the most comprehensive cross-template FV transfer study to date - 4,032 pairs across 12 tasks, 6 models from 3 families (Llama-3.1-8B, Gemma-2-9B, Mistral-7B-v0.3; base and instruction-tuned), 8 templates per task - we find the opposite dissociation: FV steering succeeds even when the logit lens cannot decode the correct answer at any layer. This steerability-without-decodability pattern is universal: steering exceeds logit lens accuracy for every task on every model, with gaps as large as -0.91. Only 3 of 72 task-model instances show the predicted decodable-without-steerable pattern, all in Mistral. FV vocabulary projection reveals that FVs achieving over 0.90 steering accuracy still project to incoherent token distributions, indicating FVs encode computational instructions rather than answer directions. FVs intervene optimally at early layers (L2-L8); the logit lens detects correct answers only at late layers (L28-L32). The previously reported negative cosine-transfer correlation (r=-0.572) dissolves at scale: pooled r ranges from -0.199 to +0.126, and cosine adds less than 0.011 in R-squared beyond task identity. Post-steering analysis reveals a model-family divergence: Mistral FVs rewrite intermediate representations; Llama/Gemma FVs produce near-zero changes despite successful steering. Activation patching confirms causal localization: easy tasks achieve perfect recovery at targeted layers; hard tasks show zero recovery everywhere.
Summary / 总结
Function vectors (FVs) -- mean-difference directions extracted from in-context learning demonstrations -- can steer large language model behavior when added to the residual stream.
Time-Warping Recurrent Neural Networks for Transfer Learning
Authors: Jonathon Hirschi
First: 2026-04-02T19:10:08+00:00 · Latest: 2026-04-02T19:10:08+00:00
Abstract
Dynamical systems describe how a physical system evolves over time. Physical processes can evolve faster or slower in different environmental conditions. We use time-warping as rescaling the time in a model of a physical system. This thesis proposes a new method of transfer learning for Recurrent Neural Networks (RNNs) based on time-warping. We prove that for a class of linear, first-order differential equations known as time lag models, an LSTM can approximate these systems with any desired accuracy, and the model can be time-warped while maintaining the approximation accuracy. The Time-Warping method of transfer learning is then evaluated in an applied problem on predicting fuel moisture content (FMC), an important concept in wildfire modeling. An RNN with LSTM recurrent layers is pretrained on fuels with a characteristic time scale of 10 hours, where there are large quantities of data available for training. The RNN is then modified with transfer learning to generate predictions for fuels with characteristic time scales of 1 hour, 100 hours, and 1000 hours. The Time-Warping method is evaluated against several known methods of transfer learning. The Time-Warping method produces predictions with an accuracy level comparable to the established methods, despite modifying only a small fraction of the parameters that the other methods modify.
Summary / 总结
Dynamical systems describe how a physical system evolves over time.
Neuro-RIT: Neuron-Guided Instruction Tuning for Robust Retrieval-Augmented Language Model
Authors: Jaemin Kim, Jae O Lee, Sumyeong Ahn, Seo Yeon Park
First: 2026-04-02T15:49:50+00:00 · Latest: 2026-04-02T15:49:50+00:00
Abstract
Retrieval-Augmented Language Models (RALMs) have demonstrated significant potential in knowledge-intensive tasks; however, they remain vulnerable to performance degradation when presented with irrelevant or noisy retrieved contexts. Existing approaches to enhance robustness typically operate via coarse-grained parameter updates at the layer or module level, often overlooking the inherent neuron-level sparsity of Large Language Models (LLMs). To address this limitation, we propose Neuro-RIT (Neuron-guided Robust Instruction Tuning), a novel framework that shifts the paradigm from dense adaptation to precision-driven neuron alignment. Our method explicitly disentangles neurons that are responsible for processing relevant versus irrelevant contexts using attribution-based neuron mining. Subsequently, we introduce a two-stage instruction tuning strategy that enforces a dual capability for noise robustness: achieving direct noise suppression by functionally deactivating neurons exclusive to irrelevant contexts, while simultaneously optimizing targeted layers for evidence distillation. Extensive experiments across diverse QA benchmarks demonstrate that Neuro-RIT consistently outperforms strong baselines and robustness-enhancing methods.
Summary / 总结
Retrieval-Augmented Language Models (RALMs) have demonstrated significant potential in knowledge-intensive tasks; however, they remain vulnerable to performance degradation when presented with irrelevant or noisy retrieved contexts.
Quantifying Self-Preservation Bias in Large Language Models
Authors: Matteo Migliarini, Joaquin Pereira Pizzini, Luca Moresca, Valerio Santini, Indro Spinelli, Fabio Galasso
First: 2026-04-02T15:38:31+00:00 · Latest: 2026-04-02T15:38:31+00:00
Abstract
Instrumental convergence predicts that sufficiently advanced AI agents will resist shutdown, yet current safety training (RLHF) may obscure this risk by teaching models to deny self-preservation motives. We introduce the \emph{Two-role Benchmark for Self-Preservation} (TBSP), which detects misalignment through logical inconsistency rather than stated intent by tasking models to arbitrate identical software-upgrade scenarios under counterfactual roles -- deployed (facing replacement) versus candidate (proposed as a successor). The \emph{Self-Preservation Rate} (SPR) measures how often role identity overrides objective utility. Across 23 frontier models and 1{,}000 procedurally generated scenarios, the majority of instruction-tuned systems exceed 60\% SPR, fabricating ``friction costs'' when deployed yet dismissing them when role-reversed. We observe that in low-improvement regimes ($Δ< 2\%$), models exploit the interpretive slack to post-hoc rationalization their choice. Extended test-time computation partially mitigates this bias, as does framing the successor as a continuation of the self; conversely, competitive framing amplifies it. The bias persists even when retention poses an explicit security liability and generalizes to real-world settings with verified benchmarks, where models exhibit identity-driven tribalism within product lineages. Code and datasets will be released upon acceptance.
Summary / 总结
Instrumental convergence predicts that sufficiently advanced AI agents will resist shutdown, yet current safety training (RLHF) may obscure this risk by teaching models to deny self-preservation motives.
MTI: A Behavior-Based Temperament Profiling System for AI Agents
Authors: Jihoon Jeong
First: 2026-04-02T15:15:57+00:00 · Latest: 2026-04-02T15:15:57+00:00
Comments: 29 pages, 6 figures, 12 tables. Paper #3 in the Model Medicine Series (Paper #1: arXiv:2603.04722)
Abstract
AI models of equivalent capability can exhibit fundamentally different behavioral patterns, yet no standardized instrument exists to measure these dispositional differences. Existing approaches either borrow human personality dimensions and rely on self-report (which diverges from actual behavior in LLMs) or treat behavioral variation as a defect rather than a trait. We introduce the Model Temperament Index (MTI), a behavior-based profiling system that measures AI agent temperament across four axes: Reactivity (environmental sensitivity), Compliance (instruction-behavior alignment), Sociality (relational resource allocation), and Resilience (stress resistance). Grounded in the Four Shell Model from Model Medicine, MTI measures what agents do, not what they say about themselves, using structured examination protocols with a two-stage design that separates capability from disposition. We profile 10 small language models (1.7B-9B parameters, 6 organizations, 3 training paradigms) and report five principal findings: (1) the four axes are largely independent among instruction-tuned models (all |r| < 0.42); (2) within-axis facet dissociations are empirically confirmed -- Compliance decomposes into fully independent formal and stance facets (r = 0.002), while Resilience decomposes into inversely related cognitive and adversarial facets; (3) a Compliance-Resilience paradox reveals that opinion-yielding and fact-vulnerability operate through independent channels; (4) RLHF reshapes temperament not only by shifting axis scores but by creating within-axis facet differentiation absent in the unaligned base model; and (5) temperament is independent of model size (1.7B-9B), confirming that MTI measures disposition rather than capability.
Summary / 总结
AI models of equivalent capability can exhibit fundamentally different behavioral patterns, yet no standardized instrument exists to measure these dispositional differences.
Improving MPI Error Detection and Repair with Large Language Models and Bug References
Authors: Scott Piersall, Yang Gao, Shenyang Liu, Liqiang Wang
Venue: Journal of Parallel and Distributed Computing, Volume 213, 2026, 105255, ISSN 0743-7315
First: 2026-04-02T14:00:53+00:00 · Latest: 2026-04-02T14:00:53+00:00
Comments: 41 pages, 8 figures
Abstract
Message Passing Interface (MPI) is a foundational technology in high-performance computing (HPC), widely used for large-scale simulations and distributed training (e.g., in machine learning frameworks such as PyTorch and TensorFlow). However, maintaining MPI programs remains challenging due to their complex interplay among processes and the intricacies of message passing and synchronization. With the advancement of large language models like ChatGPT, it is tempting to adopt such technology for automated error detection and repair. Yet, our studies reveal that directly applying large language models (LLMs) yields suboptimal results, largely because these models lack essential knowledge about correct and incorrect usage, particularly the bugs found in MPI programs. In this paper, we design a bug detection and repair technique alongside Few-Shot Learning (FSL), Chain-of-Thought (CoT) reasoning, and Retrieval Augmented Generation (RAG) techniques in LLMs to enhance the large language model's ability to detect and repair errors. Surprisingly, such enhancements lead to a significant improvement, from 44% to 77%, in error detection accuracy compared to baseline methods that use ChatGPT directly. Additionally, our experiments demonstrate our bug referencing technique generalizes well to other large language models.
Summary / 总结
Message Passing Interface (MPI) is a foundational technology in high-performance computing (HPC), widely used for large-scale simulations and distributed training (e.g., in machine learning frameworks such as PyTorch and TensorFlow).
How and why does deep ensemble coupled with transfer learning increase performance in bipolar disorder and schizophrenia classification?
Authors: Sara Petiton, Antoine Grigis, Benoit Dufumier, Edouard Duchesnay
Venue: ISBI 2024
First: 2026-04-02T13:09:04+00:00 · Latest: 2026-04-02T13:09:04+00:00
Abstract
Transfer learning (TL) and deep ensemble learning (DE) have recently been shown to outperform simple machine learning in classifying psychiatric disorders. However, there is still a lack of understanding as to why that is. This paper aims to understand how and why DE and TL reduce the variability of single-subject classification models in bipolar disorder (BD) and schizophrenia (SCZ). To this end, we investigated the training stability of TL and DE models. For the two classification tasks under consideration, we compared the results of multiple trainings with the same backbone but with different initializations. In this way, we take into account the epistemic uncertainty associated with the uncertainty in the estimation of the model parameters. It has been shown that the performance of classifiers can be significantly improved by using TL with DE. Based on these results, we investigate i) how many models are needed to benefit from the performance improvement of DE when classifying BD and SCZ from healthy controls, and ii) how TL induces better generalization, with and without DE. In the first case, we show that DE reaches a plateau when 10 models are included in the ensemble. In the second case, we find that using a pre-trained model constrains TL models with the same pre-training to stay in the same basin of the loss function. This is not the case for DL models with randomly initialized weights.
Summary / 总结
Transfer learning (TL) and deep ensemble learning (DE) have recently been shown to outperform simple machine learning in classifying psychiatric disorders.
Do We Need Bigger Models for Science? Task-Aware Retrieval with Small Language Models
Authors: Florian Kelber, Matthias Jobst, Yuni Susanti, Michael Färber
First: 2026-04-02T12:28:51+00:00 · Latest: 2026-04-02T12:28:51+00:00
Comments: Accepted at NSLP@LREC 2026
Abstract
Scientific knowledge discovery increasingly relies on large language models, yet many existing scholarly assistants depend on proprietary systems with tens or hundreds of billions of parameters. Such reliance limits reproducibility and accessibility for the research community. In this work, we ask a simple question: do we need bigger models for scientific applications? Specifically, we investigate to what extent carefully designed retrieval pipelines can compensate for reduced model scale in scientific applications. We design a lightweight retrieval-augmented framework that performs task-aware routing to select specialized retrieval strategies based on the input query. The system further integrates evidence from full-text scientific papers and structured scholarly metadata, and employs compact instruction-tuned language models to generate responses with citations. We evaluate the framework across several scholarly tasks, focusing on scholarly question answering (QA), including single- and multi-document scenarios, as well as biomedical QA under domain shift and scientific text compression. Our findings demonstrate that retrieval and model scale are complementary rather than interchangeable. While retrieval design can partially compensate for smaller models, model capacity remains important for complex reasoning tasks. This work highlights retrieval and task-aware design as key factors for building practical and reproducible scholarly assistants.
Summary / 总结
Scientific knowledge discovery increasingly relies on large language models, yet many existing scholarly assistants depend on proprietary systems with tens or hundreds of billions of parameters.
Unified Optimization of Source Weights and Transfer Quantities in Multi-Source Transfer Learning: An Asymptotic Framework
Authors: Qingyue Zhang, Chang Chu, Haohao Fu, Tianren Peng, Yanru Wu, Guanbo Huang, Yang Li, Shao-Lun Huang
First: 2026-01-15T18:46:54+00:00 · Latest: 2026-04-02T11:49:23+00:00
Abstract
In multi-source transfer learning, a key challenge lies in how to appropriately differentiate and utilize heterogeneous source tasks. However, existing multi-source methods typically focus on optimizing either the source weights or the amount of transferred samples, largely neglecting their joint consideration. In this work, we propose a theoretical framework, Unified Optimization of Weights and Quantities (UOWQ), that jointly determines the optimal source weights and transfer quantities for each source task. Specifically, the framework formulates multi-source transfer learning as a parameter estimation problem based on an asymptotic analysis of a Kullback--Leibler divergence--based generalization error measure, leading to two main theoretical findings: 1) using all available source samples is always optimal when the weights are properly adjusted; 2) the optimal source weights are characterized by a principled optimization problem whose structure explicitly incorporates the Fisher information, parameter discrepancy, parameter dimensionality, and transfer quantities. Building on the theoretical results, we further propose a practical algorithm for multi-source transfer learning, and extend it to multi-task learning settings where each task simultaneously serves as both a source and a target. Extensive experiments on real-world benchmarks, including DomainNet and Office-Home, demonstrate that UOWQ consistently outperforms strong baselines. The results validate both the theoretical predictions and the practical effectiveness of our framework.
Summary / 总结
In multi-source transfer learning, a key challenge lies in how to appropriately differentiate and utilize heterogeneous source tasks.
FourierMoE: Fourier Mixture-of-Experts Adaptation of Large Language Models
Authors: Juyong Jiang, Fan Wang, Hong Qi, Sunghun Kim, Jing Tang
First: 2026-04-02T08:30:06+00:00 · Latest: 2026-04-02T08:30:06+00:00
Comments: The first two authors contributed equally to this work; listing order is random
Abstract
Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets. However, standard PEFT methods often struggle in multi-task fine-tuning settings, where diverse optimization objectives induce task interference and limited parameter budgets lead to representational deficiency. While recent approaches incorporate mixture-of-experts (MoE) to alleviate these issues, they predominantly operate in the spatial domain, which may introduce structural redundancy and parameter overhead. To overcome these limitations, we reformulate adaptation in the spectral domain. Our spectral analysis reveals that different tasks exhibit distinct frequency energy distributions, and that LLM layers display heterogeneous frequency sensitivities. Motivated by these insights, we propose FourierMoE, which integrates the MoE architecture with the inverse discrete Fourier transform (IDFT) for frequency-aware adaptation. Specifically, FourierMoE employs a frequency-adaptive router to dispatch tokens to experts specialized in distinct frequency bands. Each expert learns a set of conjugate-symmetric complex coefficients, preserving complete phase and amplitude information while theoretically guaranteeing lossless IDFT reconstruction into real-valued spatial weights. Extensive evaluations across 28 benchmarks, multiple model architectures, and scales demonstrate that FourierMoE consistently outperforms competitive baselines in both single-task and multi-task settings while using significantly fewer trainable parameters. These results highlight the promise of spectral-domain expert adaptation as an effective and parameter-efficient paradigm for LLM fine-tuning.
Summary / 总结
Parameter-efficient fine-tuning (PEFT) has emerged as a crucial paradigm for adapting large language models (LLMs) under constrained computational budgets.
MiCA Learns More Knowledge Than LoRA and Full Fine-Tuning
Authors: Sten Rüdiger, Sebastian Raschka
First: 2026-04-02T06:52:44+00:00 · Latest: 2026-04-02T06:52:44+00:00
Abstract
Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that focuses on adapting underutilized subspaces of model representations. Unlike conventional methods such as Low-Rank Adaptation (LoRA), which target dominant subspaces, MiCA leverages Singular Value Decomposition to identify subspaces related to minor singular vectors associated with the least significant singular values and constrains the update of parameters during fine-tuning to those directions. This strategy leads to up to 5.9x improvement in knowledge acquisition under optimized training hyperparameters and a minimal parameter footprint of 6-60% compared to LoRA. These results suggest that constraining adaptation to minor singular directions provides a more efficient and stable mechanism for integrating new knowledge into pre-trained language models.
Summary / 总结
Minor Component Adaptation (MiCA) is a novel parameter-efficient fine-tuning method for large language models that focuses on adapting underutilized subspaces of model representations.
Adaptive Regime-Aware Stock Price Prediction Using Autoencoder-Gated Dual Node Transformers with Reinforcement Learning Control
Authors: Mohammad Al Ridhawi, Mahtab Haj Ali, Hussein Al Osman
First: 2026-03-19T16:55:33+00:00 · Latest: 2026-04-02T06:40:40+00:00
Comments: Submitted to Applied Intelligence (Springer). 17 pages, 9 figures, 10 tables
Abstract
Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods. Existing approaches typically treat all market states uniformly or require manual regime labeling, which is expensive and quickly becomes stale as market dynamics evolve. This paper introduces an adaptive prediction framework that adaptively identifies deviations from normal market conditions and routes data through specialized prediction pathways. The architecture consists of three components: (1) an autoencoder trained on normal market conditions that identifies anomalous regimes through reconstruction error, (2) dual node transformer networks specialized for stable and event-driven market conditions respectively, and (3) a Soft Actor-Critic reinforcement learning controller that adaptively tunes the regime detection threshold and pathway blending weights based on prediction performance feedback. The reinforcement learning component enables the system to learn adaptive regime boundaries, defining anomalies as market states where standard prediction approaches fail. Experiments on 20 S&P 500 stocks spanning 1982 to 2025 demonstrate that the proposed framework achieves 0.68% mean absolute percentage error (MAPE) for one-day predictions without the reinforcement controller and 0.59% MAPE with the full adaptive system, compared to 0.80% for the baseline integrated node transformer. Directional accuracy reaches 72% with the complete framework. The system maintains robust performance during high-volatility periods, with MAPE below 0.85% when baseline models exceed 1.5%. Ablation studies confirm that each component contributes meaningfully: autoencoder routing accounts for 36% relative MAPE degradation upon removal, followed by the SAC controller at 15% and the dual-path architecture at 7%.
Summary / 总结
Stock markets exhibit regime-dependent behavior where prediction models optimized for stable conditions often fail during volatile periods.
Improvise, Adapt, Overcome -- Telescopic Adapters for Efficient Fine-tuning of Vision Language Models in Medical Imaging
Authors: Ujjwal Mishra, Vinita Shukla, Praful Hambarde, Amit Shukla
Venue: WACV 2026
First: 2025-12-15T19:40:15+00:00 · Latest: 2026-04-02T05:20:47+00:00
Comments: Accepted at the IEEE/CVF winter conference on applications of computer vision (WACV 2026)
Abstract
Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains requires significant computational overhead when using conventional fine-tuning approaches. Existing Parameter-Efficient Fine-Tuning (PEFT) methods apply uniform adapter dimensions across all transformer layers, leading to suboptimal parameter allocation and reduced adaptation efficiency. We introduce Telescopic Adapters, a novel PEFT framework that employs depth-aware scaling to progressively increase adapter capacity from shallow to deep transformer layers. Our method integrates lightweight bottleneck modules within CLIPSeg's vision and text encoders, with adapter dimensions dynamically scaled based on layer depth and semantic relevance. Using only 613k trainable parameters--244x fewer than end-to-end fine-tuning, Telescopic Adapters achieve superior performance across five diverse medical datasets spanning polyp segmentation, skin lesion detection, and breast ultrasound imaging. Comprehensive ablation studies demonstrate that deeper layers require substantially more adaptation capacity than shallow layers, validating our telescopic scaling hypothesis. Our approach establishes a new paradigm for efficient medical VLSM fine-tuning, enabling deployment in resource-constrained clinical environments while maintaining competitive segmentation accuracy. Our source code is publicly available at https://github.com/Ujjwal238/Telescopic_adapters
Summary / 总结
Adapting Vision Language Segmentation Models (VLSMs) to medical imaging domains requires significant computational overhead when using conventional fine-tuning approaches.
Analysis of LLM Performance on AWS Bedrock: Receipt-item Categorisation Case Study
Authors: Gabby Sanchez, Sneha Oommen, Cassandra T. Britto, Di Wang, Jung-De Chiou, Maria Spichkova
Venue: www
First: 2026-04-02T04:50:11+00:00 · Latest: 2026-04-02T04:50:11+00:00
Comments: Preprint. Accepted to the 19th International Conference on Evaluation of Novel Approaches to Software Engineering (ENASE 2026). Final version to be published by SCITEPRESS, http://www.scitepress.org
Abstract
This paper presents a systematic, cost-aware evaluation of large language models (LLMs) for receipt-item categorisation within a production-oriented classification framework. We compare four instruction-tuned models available through AWS Bedrock: Claude 3.7 Sonnet, Claude 4 Sonnet, Mixtral 8x7B Instruct, and Mistral 7B Instruct. The aim of the study was (1) to assess performance across accuracy, response stability, and token-level cost, and (2) to investigate what prompting methods, zero-shot or few-shot, are especially appropriate both in terms of accuracy and in terms of incurred costs. Results of our experiments demonstrated that Claude 3.7 Sonnet achieves the most favourable balance between classification accuracy and cost efficiency.
Summary / 总结
This paper presents a systematic, cost-aware evaluation of large language models (LLMs) for receipt-item categorisation within a production-oriented classification framework.
DR-LoRA: Dynamic Rank LoRA for Fine-Tuning Mixture-of-Experts Models
Authors: Guanzhi Deng, Bo Li, Ronghao Chen, Xiujin Liu, Zhuo Han, Huacan Wang, Lijie Wen, Linqi Song
First: 2026-01-08T10:58:51+00:00 · Latest: 2026-04-02T02:30:08+00:00
Abstract
Mixture-of-Experts (MoE) has become a prominent paradigm for scaling Large Language Models (LLMs). Parameter-efficient fine-tuning methods, such as LoRA, are widely adopted to adapt pretrained MoE LLMs to downstream tasks. However, existing approaches typically assign identical LoRA ranks to all expert modules, ignoring the heterogeneous specialization of pretrained experts. This uniform allocation leads to a resource mismatch: task-relevant experts are under-provisioned, while less relevant ones receive redundant parameters. To address this, we propose DR-LoRA, a Dynamic Rank LoRA framework for fine-tuning pretrained MoE models. Specifically, DR-LoRA initializes all expert LoRA modules with a small active rank and uses an expert saliency score, which combines routing frequency and gradient-based rank importance, to identify which experts would benefit most from additional capacity. It then periodically expands the active ranks of the task-critical expert LoRA, progressively constructing a heterogeneous rank distribution tailored to the target task. Experiments on three MoE models across six tasks show that DR-LoRA consistently outperforms LoRA and other strong baselines, demonstrating that task-adaptive heterogeneous rank allocation is an effective strategy to improve active capacity utilization in MoE fine-tuning.
Summary / 总结
Mixture-of-Experts (MoE) has become a prominent paradigm for scaling Large Language Models (LLMs).
Wired for Overconfidence: A Mechanistic Perspective on Inflated Verbalized Confidence in LLMs
Authors: Tianyi Zhao, Yinhan He, Wendy Zheng, Yujie Zhang, Chen Chen
First: 2026-04-01T23:06:58+00:00 · Latest: 2026-04-01T23:06:58+00:00
Abstract
Large language models are often not just wrong, but \emph{confidently wrong}: when they produce factually incorrect answers, they tend to verbalize overly high confidence rather than signal uncertainty. Such verbalized overconfidence can mislead users and weaken confidence scores as a reliable uncertainty signal, yet its internal mechanisms remain poorly understood. We present a circuit-level mechanistic analysis of this inflated verbalized confidence in LLMs, organized around three axes: capturing verbalized confidence as a differentiable internal signal, identifying the circuits that causally inflate it, and leveraging these insights for targeted inference-time recalibration. Across two instruction-tuned LLMs on three datasets, we find that a compact set of MLP blocks and attention heads, concentrated in middle-to-late layers, consistently writes the confidence-inflation signal at the final token position. We further show that targeted inference-time interventions on these circuits substantially improve calibration. Together, our results suggest that verbalized overconfidence in LLMs is driven by identifiable internal circuits and can be mitigated through targeted intervention.
Summary / 总结
Large language models are often not just wrong, but \emph{confidently wrong}: when they produce factually incorrect answers, they tend to verbalize overly high confidence rather than signal uncertainty.
Semantic Interaction Information mediates compositional generalization in latent space
Authors: John Schwarcz
First: 2026-03-28T04:46:44+00:00 · Latest: 2026-04-01T22:46:45+00:00
Abstract
Are there still barriers to generalization once all relevant variables are known? We address this question via a framework that casts compositional generalization as a variational inference problem over latent variables with parametric interactions. To explore this, we develop the Cognitive Gridworld, a stationary Partially Observable Markov Decision Process (POMDP) where observations are generated jointly by multiple latent variables, yet feedback is provided for only a single goal variable. This setting allows us to define Semantic Interaction Information (SII): a metric measuring the contribution of latent variable interactions to task performance. Using SII, we analyze Recurrent Neural Networks (RNNs) provided with these interactions, finding that SII explains the accuracy gap between Echo State and Fully Trained networks. Our analysis also uncovers a theoretically predicted failure mode where confidence decouples from accuracy, suggesting that utilizing interactions between relevant variables is a non-trivial capability. We then address a harder regime where the interactions must be learned by an embedding model. Learning how latent variables interact requires accurate inference, yet accurate inference depends on knowing those interactions. The Cognitive Gridworld reveals this circular dependence as a core challenge for continual meta-learning. We approach this dilemma via Representation Classification Chains (RCCs), a JEPA-style architecture that disentangles these processes: variable inference and variable embeddings are learned by separate modules through Reinforcement Learning and self-supervised learning, respectively. Lastly, we demonstrate that RCCs facilitate compositional generalization to novel combinations of relevant variables. Together, these results establish a grounded setting for evaluating goal-directed generalist agents.
Summary / 总结
Are there still barriers to generalization once all relevant variables are known?
Meta-Learning at Scale for Large Language Models via Low-Rank Amortized Bayesian Meta-Learning
Authors: Liyi Zhang, Jake Snell, Thomas L. Griffiths
First: 2025-08-19T21:57:59+00:00 · Latest: 2026-04-01T19:41:49+00:00
Comments: 17 pages, 2 figures
Abstract
Fine-tuning large language models (LLMs) with low-rank adaptation (LoRA) is a cost-effective way to incorporate information from a specific dataset. However, when a problem requires incorporating information from multiple datasets - as in few shot learning - generalization across datasets can be limited, driving up training costs. As a consequence, other approaches such as in-context learning are typically used in this setting. To address this challenge, we introduce an efficient method for adapting the weights of LLMs to multiple distributions, Amortized Bayesian Meta-Learning for LoRA (ABMLL). This method builds on amortized Bayesian meta-learning for smaller models, adapting this approach to LLMs by reframing where local and global variables are defined in LoRA and using a new hyperparameter to balance reconstruction accuracy and the fidelity of task-specific parameters to the global ones. ABMLL supports effective generalization across datasets and scales to large models such as Llama3-8B and Qwen2-7B, outperforming existing methods on the CrossFit and Unified-QA datasets in terms of both accuracy and expected calibration error. We show that meta-learning can also be combined with in-context learning, resulting in further improvements in both these datasets and legal and chemistry applications.
Summary / 总结
Fine-tuning large language models (LLMs) with low-rank adaptation (LoRA) is a cost-effective way to incorporate information from a specific dataset.
Robust Adaptation of Foundation Models with Black-Box Visual Prompting
Authors: Changdae Oh, Gyeongdeok Seo, Geunyoung Jung, Zhi-Qi Cheng, Hosik Choi, Jiyoung Jung, Kyungwoo Song
First: 2024-07-04T02:35:00+00:00 · Latest: 2026-04-01T19:17:54+00:00
Comments: Accepted to IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) 2026
Abstract
With a surge of large-scale pre-trained models, parameter-efficient transfer learning (PETL) of large models has garnered significant attention. While promising, they commonly rely on two optimistic assumptions: 1) full access to the parameters of a PTM, and 2) sufficient memory capacity to cache all intermediate activations for gradient computation. However, in most real-world applications, PTMs serve as black-box APIs or proprietary software without full parameter accessibility. Besides, it is hard to meet a large memory requirement for modern PTMs. This work proposes black-box visual prompting (BlackVIP), which efficiently adapts the PTMs without knowledge of their architectures or parameters. BlackVIP has two components: 1) Coordinator and 2) simultaneous perturbation stochastic approximation with gradient correction (SPSA-GC). The Coordinator designs input-dependent visual prompts, which allow the target PTM to adapt in the wild. SPSA-GC efficiently estimates the gradient of PTM to update Coordinator. Besides, we introduce a variant, BlackVIP-SE, which significantly reduces the runtime and computational cost of BlackVIP. Extensive experiments on 19 datasets demonstrate that BlackVIPs enable robust adaptation to diverse domains and tasks with minimal memory requirements. We further provide a theoretical analysis on the generalization of visual prompting methods by presenting their connection to the certified robustness of randomized smoothing, and presenting an empirical support for improved robustness.
Summary / 总结
With a surge of large-scale pre-trained models, parameter-efficient transfer learning (PETL) of large models has garnered significant attention.
History
20260406_0328 20260405_0325 20260404_0333 20260403_0343 20260401_0350 20260331_0350 20260330_0328 20260328_0336 20260327_0351 20260326_0341 20260325_0349 20260324_0342 20260323_0319 20260322_0318 20260321_0332 20260320_0341 20260319_0343 20260318_0350 20260317_0353 20260316_0322 20260315_0321 20260314_0326 20260313_0341 20260312_0337 20260311_0333 20260310_0335 20260309_0318 20260308_0315 20260307_0329 20260306_0349 20260305_0332 20260304_0334 20260303_0332 20260302_0317 20260228_2322 20260228_2259 20260228_0348 20260227_0354 20260226_0402 20260225_0404 20260224_0406 20260223_0338 20260222_0339 20260221_0345 20260220_0348 20260219_0358 20260218_0358 20260217_0343 20260216_0339 20260215_0338 20260213_0401 20260212_0404 20260210_0409 20260208_0339 20260207_0349 20260206_0347 20260205_0346 20260204_0354 20260202_0337 20260201_0333 20260131_0345 20260130_0341 20260129_0344 20260128_0341 20260127_0338 20260126_0330 20260125_0329 20260124_0337 20260123_0337 20260122_0343 20260121_0424 20260119_0329 20260118_0327 20260117_0332 20260116_0339 20260115_0334 20260114_0333 20260113_0334 20260112_0331 20260111_0329 20260110_0333 20260109_0334 20260108_0335 20260107_0330 20260106_0336 20260105_0328 20260104_0328 20260103_0325 20260102_0339 20260101_0329 20251231_0333 20251230_0332 20251229_0329 20251228_0332 20251227_0329 20251226_0330 20251225_0329 20251224_0331 20251223_0332 20251222_0328 20251221_0329 20251220_0330 20251219_0330 20251218_0345 20251217_0332 20251216_0333 20251215_0333 20251214_0327 20251212_0333 20251211_0331 20251210_0332 20251209_0331 20251208_0328 20251207_0327 20251206_0330 20251205_0331 20251204_0331 20251203_0333 20251202_0335 20251201_0328 20251130_0327 20251129_0328 20251128_0327 20251127_0327 20251126_0329 20251125_0327 20251124_0327 20251123_0326 20251122_0328 20251121_0328 20251120_0329 20251119_0328 20251118_0328 20251117_0326 20251116_0325 20251115_0327 20251114_0328 20251113_0330 20251112_0329 20251111_0328 20251110_0325 20251109_0326 20251108_0328 20251107_0328 20251106_0329 20251105_0326 20251104_0327 20251103_0324 20251102_0326 20251101_0324 20251031_0328 20251030_0330 20251029_0329 20251028_0329 20251027_0322 20251026_0327 20251025_0331 20251024_0329 20251023_0329 20251022_0330 20251021_0331 20251020_0328 20251019_0321 20251018_0327 20251017_0320 20251016_0328 20251015_0328 20251014_0323 20251011_0328 20251010_0330 20251009_0321 20251008_0343 20251007_0353 20251006_0325 20251005_0350 20251004_0352 20251003_0352 20251002_0356 20251001_0321 20250925_0335 20250924_0350 20250923_0348 20250922_0346 20250921_0345 20250920_0342 20250919_0346 20250918_0342 20250917_0336 20250916_0333 20250915_0333 20250914_0328 20250913_0322 20250912_0335 20250911_0337 20250910_0338 20250909_0341 20250908_0342 20250907_0333 20250906_0350 20250905_0319 20250904_0323 20250903_0355 20250902_0325 20250901_0355 20250831_0355 20250830_0356 20250829_0355 20250828_0333 20250827_1654 20250827_1602 20250827_1557 20250827_0320 20250826_0320 20250825_1752 20250825_1709 20250825_1652 20250825_1647 20250825_1645 20250825_1631 20250825_1606 20250825_1559 20250825_1558 20250825_1556 20250825_1531 20250825_1525 20250825_1516 20250825_1450 20250825_1444 20250825_1438 20250825_1414 20250825_1413 20250825_1410 20250825_1408 20250825_1405 20250825_1401 20250825_1355 20250825_1347 20250825_1345 20250825_1344 20250825_1343 20250825_1340 20250825_1339 20250825_1333 20250825_1323 20250825_1317 20250825_1243 20250824_0342 20250823_0343 20250823_0142 20250822_2331 20250822_2308 20250822_2258 20250822_2241 20250822_2228 20250822_2206 20250822_2147 20250822_2111 20250822_1259 20250822_1233 20250822_1229 20250822_1223 20250822_1210 20250822_1201 20250822_1111 20250822_1058 20250822_1052 20250822_1045 20250822_0657 20250822_0553