As artificial intelligence systems are integrated into military operations, a familiar intuition hardens into an institutional standard: The higher the stakes, the more essential it is to keep humans in the loop. In matters of life and death, machines must not be left to decide on their own.
That intuition is understandable. It is also, in important respects, wrong.
In lower-stakes environments—traffic management, service delivery, even routine policing—human oversight can sometimes function as a backstop. Errors are visible, decisions can be revisited, and the costs of delay are tolerable. In crisis response, by contrast, human oversight becomes less effective at catching and correcting errors: decisions must be made quickly, information is incomplete, and the consequences of hesitation grow more severe. Under these conditions, late-stage human intervention becomes less reliable, not more.
In military contexts, where these dynamics are most consequential, late-stage human-in-the-loop overrides are in fact the least reliable and least effective way to correct errors produced by an algorithmic system. In military engagements, errors can be lethal. Time is compressed, uncertainty is pervasive, and decisions are often irreversible. Understandably, the conventional wisdom holds that it is precisely here that the case for human-in-the-loop control is strongest. The assumption is that human judgment, especially in identifying targets and avoiding civilian harm, is inherently superior to algorithmic decision-making. That conventional wisdom does not hold up under scrutiny.
Consider a simple but revealing example—what we might call the white van problem. In a combat zone, intelligence has associated a reported threat with a white van. Other information is scant or unsubstantiated. For soldiers on the ground or drones in the sky, any given white van may be entirely benign, or it may be carrying combatants or explosives. The fundamental challenge, then, is how to act under conditions where signals are weak, context-dependent, and consequential.
When a white van is spotted, that signal, in isolation, is weak. Operators must rely on contextual cues: movement patterns, timing, proximity to known threats, and behavior that deviates from local norms. The standard argument is that such contextual judgment cannot be codified and must remain with human decision-makers in the field.
But this argument contains a critical tension. If those cues are sufficiently systematic to guide human judgment, they can, in principle, be incorporated into a model. Moreover, if the alternative is to depend on intuition shaped by stress, fatigue, or incomplete perception, then human override power is not a reliable basis for life-and-death decisions.
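To make the point concrete, here is a minimal, purely illustrative sketch of what "incorporating cues into a model" could look like: a logistic combination of the kinds of contextual signals described above (movement, timing, proximity to a reported threat, deviation from local norms) into a single calibrated threat estimate, compared against an explicit engagement threshold. Every cue name, weight, and threshold below is hypothetical and chosen only for illustration; none of it is drawn from any fielded system.

```python
import math

# Illustrative sketch only. If contextual cues are systematic enough to guide
# a trained operator, they can in principle be encoded as features and
# combined into a calibrated risk estimate. All names and numbers are invented.

# Example cues for the "white van" problem, each scored on a 0-1 scale
# by upstream sensors or analysts (hypothetical).
cues = {
    "erratic_movement": 0.7,      # deviation from normal driving patterns
    "unusual_timing": 0.4,        # activity at an atypical hour for the area
    "proximity_to_threat": 0.9,   # closeness to the reported threat location
    "deviation_from_norms": 0.3,  # behavior atypical for local traffic
}

# Hypothetical weights, as might be fit from historical engagement data.
weights = {
    "erratic_movement": 1.2,
    "unusual_timing": 0.6,
    "proximity_to_threat": 2.0,
    "deviation_from_norms": 0.8,
}
BIAS = -2.5  # baseline assumption: most white vans are benign


def threat_probability(cues: dict, weights: dict, bias: float) -> float:
    """Combine weak cues into a single probability via a logistic model."""
    score = bias + sum(weights[name] * value for name, value in cues.items())
    return 1.0 / (1.0 + math.exp(-score))


p = threat_probability(cues, weights, BIAS)
print(f"Estimated threat probability: {p:.2f}")

# A rule of engagement can then be expressed as an explicit, auditable
# threshold set by policy in advance, rather than an in-the-moment intuition.
ENGAGEMENT_THRESHOLD = 0.95  # illustrative value only
print("Meets threshold" if p >= ENGAGEMENT_THRESHOLD else "Does not meet threshold")
```

The point of the sketch is not that a handful of weighted cues suffices, but that whatever structure makes those cues usable by a human under stress can, in principle, be made explicit, tested, and audited in advance rather than reconstructed after the fact.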
A common objection here is that combat scenarios are too novel and variable—that the uniqueness of each engagement makes it impossible to build a sufficiently large or representative dataset to train a reliable model. This concern deserves to be taken