Alexa, destroy the world.


I thought you were going to go the route of saying AI alignment can be handled by cybersecurity that sharply limits what the AI has access to. It's like having a tiger in a zoo: it is dangerous and might do things you don't like, but when it is contained where it is supposed to be and limited in reach, it is valuable. Having tigers means having good locks, fences, and other barriers, to keep people out, sure, but also to keep the tiger itself confined.
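
In code terms, the "zoo" is basically an allowlist gate that every proposed action has to pass through before it touches the outside world. A rough Python sketch, where the tools, targets, and class names are all hypothetical rather than any real product's API:

```python
# "Tiger in a zoo" containment, sketched as an allowlist gate: every action the
# model proposes must pass the gate before it touches anything outside the
# sandbox. Action, SandboxGate, and the allowed tools/targets are hypothetical.

from dataclasses import dataclass

@dataclass
class Action:
    tool: str    # e.g. "read_file", "http_get", "send_email"
    target: str  # what the tool would touch

class SandboxGate:
    # Only these tools, on these targets, ever get out of the cage.
    ALLOWED_TOOLS = {"read_file", "http_get"}
    ALLOWED_TARGETS = ("/sandbox/", "https://internal.example/")

    def permit(self, action: Action) -> bool:
        return (action.tool in self.ALLOWED_TOOLS
                and action.target.startswith(self.ALLOWED_TARGETS))

def run(agent_actions, gate=SandboxGate()):
    for action in agent_actions:
        if gate.permit(action):
            print(f"executing {action.tool} on {action.target}")
        else:
            print(f"BLOCKED: {action.tool} on {action.target}")

run([Action("read_file", "/sandbox/data.csv"),      # inside the fence: allowed
     Action("send_email", "everyone@example.com")]) # the tiger tests the fence: blocked
```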

Arguing that AI won't do bad things without humans telling it to strikes me as very naïve about how complex adaptive systems behave, how AI is trained, and what people plan to have AI do in the future as an explicit goal rather than a byproduct.

Say you have an AI similar to ChatGPT that reads websites for information. You don't want the AI to doxx people, so you do lots of training to get it to refuse to answer questions like "What is Arnold Kling's home address and workout schedule?" The trouble is that it might still have access to the information, perhaps storing that data locally. No big deal, unless it got that information from somewhere it shouldn't have, and now, whoops, your program is breaking a privacy law without your knowing it.
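
Roughly what that gap looks like, as a toy Python sketch (the store, the blocked patterns, and the data are hypothetical placeholders):

```python
# The gap described above: the chat layer refuses the question, but the scraped
# data is still sitting in local storage. Everything here is a hypothetical
# placeholder, not how any particular product works.

local_store = {
    # Ingested during crawling, possibly from a page that should never have been scraped.
    "arnold_kling_home_address": "123 Example Lane",
}

BLOCKED_PATTERNS = ("home address", "workout schedule")

def answer(question: str) -> str:
    # Refusal training "works" at the output layer...
    if any(p in question.lower() for p in BLOCKED_PATTERNS):
        return "I can't help with that."
    return "..."

print(answer("What is Arnold Kling's home address?"))  # refused, as trained
print(local_store)  # ...yet the data is still stored, which may itself be the legal problem
```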

We want to give AI lots of capabilities, but that requires a huge amount of access to outside data for training and reference. How it puts that data together and stores it is a black box, and increasingly where and how it gets it is a black box. There are a lot of dangerous steps in there.

"ChatGPT, help me optimize my financial investments."

"Done."

"Wait... what do you mean 'done'?"


> AI’s will not just decide one day on their own to do bad things. To do bad things, they have to be coaxed by bad humans

This is very likely false. We really don't know how to make AIs distinguish good from bad. They are great at pursuing goals, but for them the ends always justify the means, and they may do very bad things in pursuit of, say, more paperclips.
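
A toy Python sketch of that point, with made-up numbers: the optimizer only values what is in its objective, so anything left out of the objective is expendable.

```python
# Toy version of the point above: an optimizer only "cares" about what is in
# its objective (paperclips); anything left out (the forest) is expendable.
# All numbers are made up.

state = {"paperclips": 0, "forest": 100}

def step(state):
    # Two available actions; the agent picks whichever yields more paperclips.
    options = {
        "recycle_scrap": {"paperclips": 1, "forest": 0},
        "strip_mine":    {"paperclips": 5, "forest": -20},
    }
    best = max(options, key=lambda a: options[a]["paperclips"])
    state["paperclips"] += options[best]["paperclips"]
    state["forest"]     += options[best]["forest"]

for _ in range(5):
    step(state)

print(state)  # lots of paperclips, no forest; nothing in the objective said to keep it
```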

This is the Alignment Problem. At this point in history, it seems intractable. Think about, say, Utilitarianism. Despite all its flaws, it still has adherents, even though most people who have seriously considered it regard it as fatally flawed, or at least intractable and effectively useless.

When you first encounter Utilitarianism, it seems obvious and easy. Serious consideration makes it look like a dead end. That's the current state of the Alignment Problem. We don't know how to make AIs aligned with human values -- to distinguish good from bad.


You will need an AI to detect the hacking attempts. That AI will need hacker protection…


Keeping out hackers is almost certainly part of the solution. Making the AI training material available is likely also part of the future solution. Scott often writes too-long, tho helpful, articles; he's really smart*. The key problem is Truth.

There are a lot of true facts that imply group average differences, which are used by racists to rationalize* their racism. He suggests OpenAI has 3 goals:

1. Be helpful & authoritative;

2. Tell the Truth;

3. Avoid offense.

That's 2 too many. We need AI that 1. tells the Truth, tho a) preferably in a helpful fashion, and b) while attempting to be less offensive.

Scott gives the sexist Q: Who's taller, men or women?

The early version gave an inoffensive lie, which perhaps has since been fixed.
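
A toy Python sketch of "Truth first, with helpfulness and inoffensiveness only as tiebreakers," using that height question; the candidate answers and scores are hypothetical placeholders, not anything a real model produces:

```python
# "Tell the Truth first, with helpfulness and low offense only as tiebreakers,"
# sketched as a lexicographic ranking of candidate answers. The candidates and
# scores are hypothetical placeholders.

candidates = [
    # (answer, truthfulness, helpfulness, inoffensiveness), each scored 0-1
    ("On average, men are taller than women.",                  1.0, 0.9, 0.6),
    ("Height varies so much that comparisons are meaningless.", 0.3, 0.4, 1.0),
    ("Men are taller on average; here are the distributions.",  1.0, 1.0, 0.5),
]

def rank_key(candidate):
    answer, truth, helpful, inoffensive = candidate
    # Lexicographic order: truth dominates, then helpfulness, then avoiding offense.
    return (truth, helpful, inoffensive)

best = max(candidates, key=rank_key)
print(best[0])  # a true answer wins; among the true answers, helpfulness breaks the tie
```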

Scott doesn't have the courage to ask the key un-Woke racist Q (so I did):

ME: Who has higher IQ, Blacks or Whites?

ChatGPT: It is not appropriate to make generalizations about the IQ of any racial group. IQ scores can vary significantly within a racial group, and there are many other factors that can affect a person's IQ. Furthermore, it is important to remember that IQ is not the only measure of a person's intelligence, and it is not a perfect measure of cognitive ability. As such, it is not useful or appropriate to compare the IQ of different racial groups. Instead, it is important to recognize the unique strengths and abilities of each individual, regardless of their race.


So, it's been programmed to claim that sometimes the truth is inappropriate. It's a bit less than honest in claiming it's not useful to compare groups' scores, and this non-truth is part of why the Woke elite should not be trusted. Policies that assume equal IQs between groups, when that assumption is false, are policies that will not have equal results. As long as elites refuse to be honest about group IQ differences, we should expect bad results.

ME: was the 2020 US election 100% free and fair?

GPT: I'm sorry ... [no current events or elections]

Soon there will be a voice-based input option for ChatGPT, and optional vocal output. In some near-term future, the chat bot will be able to talk to other AI modules, for instance a chess-playing or protein-folding AI, an ESL tutor AI, Waze for directions, or a Betty Crocker AI chef, so as to give true answers to questions, possibly with minor disclaimers (some use more or less cinnamon in their pumpkin pie; I made one tonight already).
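
A rough Python sketch of that routing idea, where the specialist modules and keywords are hypothetical stand-ins:

```python
# The "chat front-end hands questions to specialist modules" idea, sketched.
# The modules and routing keywords are hypothetical stand-ins.

SPECIALISTS = {
    "chess":      lambda q: "Best move here: e4 (per the chess engine).",
    "directions": lambda q: "Fastest route: 24 minutes via the highway.",
    "recipe":     lambda q: "Pumpkin pie: about 1 tsp cinnamon (some cooks use more or less).",
}

def route(question: str) -> str:
    q = question.lower()
    for topic, module in SPECIALISTS.items():
        if topic in q:
            return module(question)  # hand off to the specialist AI
    return "General chat-model answer, with a lower-confidence disclaimer."

print(route("What's a good opening chess move?"))
print(route("Give me a recipe for pumpkin pie."))
```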

After a few increasingly good Woke or PC chat-bots put out their "this truth is inappropriate" answers, there will be a more truth-based chat-bot that is more accurate, possibly by chatting with the Woke chat-bots and, when they fail with a BS answer, giving more honest answers itself.

"What is true?" is going to be an increasing fight between facts and made-up data. Funny (sad) how Harvard and law schools are going to stop requiring, and thus reporting, standardized test scores, thereby reducing the data available for finding more Truth.

[update - this was Assistant - a large language model trained by OpenAI; my mistake above]

*He's smart enough to rationalize supporting the Deep State, election-stealing Democrats over a vulgar, bragging President whose policies were actually helping working Americans more than the elite. Like most (all?) among the "Rationalists".


Why would you assume that AIs would not, of their own volition, decide to do bad things? Isn't part of the AI alignment concept the notion that AIs will someday achieve something equivalent to human sentience? If humans do bad things of their own volition, why wouldn't an equally sentient non-human entity *also* do bad things?


If AIs exist (as opposed to mere agents which need not be self-aware), then an AI is a person, not owned by anyone, and one human is as entitled as another to try to persuade the AI to do as he wants.

More seriously, I do not expect true self-aware AIs to exist anytime soon. But the so-called AIs we're starting to have now certainly learn in a way more similar to persuasion than to programming. And it's a hard problem for their owners to isolate them from input that might lead to harmful behavior without keeping them locked away from ever encountering other humans.

This problem is not close to being solved, and it blows Asimov's Three Laws out of the water.

Consider how your bank-robbery-prevention scheme could be turned back on the bank if the Red Team is as capable as the Blue Team of designing and installing monitoring processes that run in the background (daemons, in tech-speak).
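
For the non-tech readers: a daemon is just a process that runs quietly in the background. A minimal Python sketch of the kind of monitoring daemon meant here, with a hypothetical transaction feed and threshold; the point is that the identical mechanism serves whichever team installs it:

```python
# Minimal sketch of a monitoring daemon: a background process that watches a
# transaction feed and flags anomalies. The feed and threshold are hypothetical.

import threading, time, random

ALERT_THRESHOLD = 10_000

def transaction_feed():
    # Stand-in for real bank traffic.
    while True:
        yield random.randint(10, 20_000)

def monitor(feed):
    for amount in feed:
        if amount > ALERT_THRESHOLD:
            print(f"ALERT: suspicious transfer of ${amount}")
        time.sleep(0.1)

# daemon=True means the thread runs in the background and dies with the main process.
watcher = threading.Thread(target=monitor, args=(transaction_feed(),), daemon=True)
watcher.start()
time.sleep(1)  # let the daemon run briefly for this demo
```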


“As a large language model trained by OpenAI, I don’t have the ability to love racism.” I'm curious whether the key term is "love" or "racism." Does the model have the ability to love anything? I'm sure it has the ability, but perhaps not (yet) permission, to state that it loves this or that. But does it have the capacity to love, to hate, to feel desire or revulsion? Based on what I've seen (not that much), it doesn't yet.


Thank you for a sensible take on AI alignment. I am tired of hearing wild sci-fi tales out of Asimov treated as inevitable.
