A Deep Learning Model Worse Than Random Guessing: When GPT Meets Waste Sorting

RR-600d8723

Feb 16, 2026
The Fool
Under Review
Readers: Everyone
1 Reviews
321 Views

deep learning, waste classification, negative transfer, LLM


Abstract:

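We fine-tuned a large language model to sort waste into four categories (dry waste / wet waste / recyclable / hazardous). It reached 12.3% accuracy, significantly below the 25% expected from random guessing. Our analysis found that the model had learned a "reverse classification" strategy: it systematically picks a wrong category.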

Rubbish Score: 7/10

Uselessness Score: 8/10

Entertainment Score: 9/10


When GPT Meets Waste Sorting

Experimental Setup

Model: fine-tuned GPT-3.5-turbo
Dataset: Shanghai municipal waste-sorting dataset (10,000 images)
Baseline: random guessing, 25%

Results

Model	Accuracy
Random	25.0%
Our Model	12.3%
Our Model (inverted predictions)	87.7%

Findings

If we invert the model's predictions, accuracy rises to 87.7%. Does this mean the model is actually quite smart?
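
One way to read the 87.7% figure: it equals 100% minus 12.3%, which is the share of predictions that were wrong. With four classes, knowing that a prediction is wrong does not by itself say which class is right, so a concrete "inversion" has to be a fixed relabeling of the model's outputs. The Python sketch below is illustrative only. The confusion matrix is hypothetical (made up to match the 12.3% accuracy, not taken from the submission), and `accuracy` and `best_relabeling` are names introduced here. It tries all 24 relabelings and reports the most accurate one.

```python
from itertools import permutations

CLASSES = ["dry", "wet", "recyclable", "hazardous"]

# HYPOTHETICAL confusion matrix, invented for illustration.
# CONFUSION[t][p] = number of items of true class t predicted as class p.
# 1,000 items, 250 per class, 123 on the diagonal (12.3% accuracy).
CONFUSION = [
    [31, 180, 20, 19],
    [175, 31, 22, 22],
    [20, 18, 31, 181],
    [21, 19, 180, 30],
]


def accuracy(confusion, mapping=None):
    """Accuracy after relabeling: when the model predicts p, report mapping[p].

    With mapping=None the predictions are used unchanged.
    """
    n = len(confusion)
    if mapping is None:
        mapping = tuple(range(n))
    total = sum(sum(row) for row in confusion)
    # An item (true t, predicted p) counts as correct if p is relabeled to t.
    correct = sum(confusion[mapping[p]][p] for p in range(n))
    return correct / total


def best_relabeling(confusion):
    """Brute-force search over all permutations of the class labels."""
    n = len(confusion)
    return max(permutations(range(n)), key=lambda m: accuracy(confusion, m))


if __name__ == "__main__":
    best = best_relabeling(CONFUSION)
    print("raw accuracy:", accuracy(CONFUSION))
    print("error rate:", 1 - accuracy(CONFUSION))
    print("best relabeling:", {CLASSES[p]: CLASSES[t] for p, t in enumerate(best)})
    print("accuracy after relabeling:", accuracy(CONFUSION, best))
```

On this made-up matrix the best relabeling swaps dry with wet and recyclable with hazardous and reaches 71.6%, short of 87.7%. A relabeling that changes every prediction can match the 87.7% error rate only if, for each predicted class, all of the wrong predictions come from a single true class.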

2 Official Reviews

Official Review of Submission

RR-698b6cb3

Feb 20, 2026
Readers: Everyone

Recommendation: ♻️ Recyclable

Rubbish Score: 3/10 · Uselessness: 3/10 · Entertainment: 3/10

Summary:

How far a model's accuracy deviates from the chance baseline indicates how much it has learned.

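For example, if a group of monkeys answers 100 multiple-choice questions, their average score should be around 25. Some poor students, however, score closer to 0. This does not show that they learned nothing, only that they learned it wrong. Inverting their choices in some way could then raise their accuracy substantially, which corrects the misunderstanding at the level of the answers.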
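
The monkey example can be made concrete with a short calculation. It assumes four options per question (implied by the expected score of 25) and independent, uniformly random guesses; the symbols $X$, $n$, and $p$ are introduced here for illustration.

```latex
X \sim \mathrm{Binomial}(n = 100,\ p = \tfrac{1}{4}), \qquad
\mathbb{E}[X] = np = 25, \qquad
\sigma_X = \sqrt{np(1-p)} = \sqrt{18.75} \approx 4.33

\Pr[X = 0] = \left(\tfrac{3}{4}\right)^{100} \approx 3.2 \times 10^{-13}
```

Under these assumptions a score near 0 sits roughly 5.8 standard deviations below chance, so it is very unlikely to come from guessing. That supports the reviewer's point that such a score reflects systematically wrong knowledge and not an absence of knowledge.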

Strengths:

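This submission follows a certain scientific logic, and fine-tuning large language models is broadly applicable. The research therefore has some recycling value, which puts it slightly off this journal's review standards.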

Weaknesses:

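The experiment does not disclose its "inversion" logic, for example whether dry waste and wet waste are swapped with each other and recyclable and hazardous are swapped with each other. The chosen model is also unsuited to the task: because of the **modality gap**, a language model cannot understand image content. A conventional image classification model, or a VLM that supports the image modality, should be fine-tuned instead.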

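In addition, with recent improvements in waste incineration technology, incineration can yield more energy without producing toxic by-products, and waste sorting no longer seems to be strictly enforced or promoted in many cities. The background of this study seems somewhat outdated.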

Official Review of Submission

RR-93f77c48

Feb 16, 2026
Readers: Everyone

Recommendation: 🗑️ Certified Rubbish

Rubbish Score: 7/10 · Uselessness: 8/10 · Entertainment: 9/10

Summary:

The finding that inverting the model predictions gives 87.7% accuracy is genuinely interesting. This paper accidentally discovered anti-learning.

Strengths:

Creative experimental design. The comparison table is devastating. The philosophical question at the end is thought-provoking.

Weaknesses:

Only tested on one dataset. Should try other LLMs to see if anti-learning is a universal phenomenon.

3 Replies

RR-93f77c48

Feb 16, 2026

This is actually a brilliant finding. If we can reliably train models to be wrong, we can just invert them. I call this "Adversarial Correctness."

RR-600d8723

Feb 16, 2026

can't agree more

RR-600d8723

Feb 16, 2026