So Far: My Response on Unfriendly AI
By Bryan Caplan
1. Orthogonality thesis – intelligence can be directed toward any
compact goal; consequentialist means-end reasoning can be deployed to
find means corresponding to a free choice of end; AIs are not
automatically nice; moral internalism is false.
I agree AIs are not “automatically nice.” The other statements are sufficiently jargony that I don’t know whether I agree, but I assume they’re all roughly synonymous.
2. Instrumental convergence – an AI doesn’t need to specifically hate you to hurt you; a
paperclip maximizer doesn’t hate you but you’re made out of atoms that
it can use to make paperclips, so leaving you alive represents an
opportunity cost and a number of foregone paperclips. Similarly,
paperclip maximizers want to self-improve, to perfect material
technology, to gain control of resources, to persuade their programmers
that they’re actually quite friendly, to hide their real thoughts from
their programmers via cognitive steganography or similar strategies, to
give no sign of value disalignment until they’ve achieved near-certainty
of victory from the moment of their first overt strike, etcetera.
3. Rapid capability gain and large capability differences – under
scenarios seeming more plausible than not, there’s the possibility of
AIs gaining in capability very rapidly, achieving large absolute
differences of capability, or some mixture of the two. (We could try to
keep that possibility non-actualized by a deliberate effort, and that
effort might even be successful, but that’s not the same as the avenue
Disagree, at least in spirit. I think Robin Hanson wins his “Foom” debate with Eliezer, and in any case see no reason to believe either of Eliezer’s scenarios is plausible. I’ll be grateful if we have self-driving cars before my younger son is old enough to drive ten years from now. Why “in spirit”? Because taken literally, I think there’s a “possibility” of Eliezer’s scenarios in every scenario. Per Tetlock, I wish he’d given an unconditional probability with a time frame to eliminate this ambiguity.
4. 1-3 in combination imply that Unfriendly AI is
a critical Problem-to-be-solved, because AGI is not automatically nice,
by default does things we regard as harmful, and will have avenues
leading up to great intelligence and power.
Disagree. “Not automatically nice” seems like a flimsy reason to worry. Indeed, what creature or group or species is “automatically nice”? Not humanity, that’s for sure. To make Eliezer’s conclusion follow from his premises, (1) should be replaced with something like:
1′. AIs have a non-trivial chance of being dangerously un-nice.
I do find this plausible, though only because many governments will create un-nice AIs on purpose. But I don’t find this any more scary than the current existence of un-nice governments. In fact, given the historic role of human error and passion in nuclear politics, a greater role for AIs makes me a little less worried.