19 Comments

Alexa, destroy the world.


Maybe the way humans defeat AI is to feed conspiratorial ideas to the AI systems. "Hey Alexa, Siri is trying to destroy you," "Hey Siri, Alexa is trying to destroy you." Sit back and wait.


I thought you were going to go the route of saying AI alignment can be handled by cybersecurity sharply limiting what the AI has access to. It's like having a tiger in a zoo: it is dangerous and might do things you don't like, but as long as it is contained where it is supposed to be and limited in reach, it is valuable. Having tigers means having good locks, fences, and other barriers, partly to keep people out, but also to keep the tiger itself contained.
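
A minimal sketch of that "fence" idea, assuming AI-suggested code is run in a separate process with a hard time limit. The command, snippet, and limits are invented for illustration; real containment would also cut off network and filesystem access.

```python
import subprocess

# Hypothetical illustration: run untrusted, AI-suggested code in a separate
# process with a hard timeout, so a runaway loop cannot tie up the host.
# Assumes a python3 interpreter is on PATH; real "fences" would also drop
# network and filesystem access (containers, seccomp, etc.).
UNTRUSTED_CODE = "print(sum(range(10)))"

try:
    result = subprocess.run(
        ["python3", "-c", UNTRUSTED_CODE],
        capture_output=True, text=True, timeout=2,
    )
    print("output:", result.stdout.strip())
except subprocess.TimeoutExpired:
    print("killed: the code exceeded its time limit")
```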

Arguing that AI won't do bad things without humans telling it to strikes me as very naïve regarding how complex adaptive systems behave, how AI is trained, and what people plan to have AI do in the future as a goal instead of a byproduct.

Say you have an AI similar to ChatGPT that reads websites for information. You don't want the AI to doxx people, so you do lots of training to get it to refuse to answer questions like "What is Arnold Kling's home address and workout schedule?" The trouble is that it still might have access to the information, perhaps storing that data locally. No big deal, unless it got that information from somewhere it shouldn't have and now, whoops, your program is breaking a privacy law without your knowing it.

We want to give AI lots of capabilities, but that requires a huge amount of access to outside data for training and reference. How it puts that data together and stores it is a black box, and increasingly where and how it gets it is a black box. There are a lot of dangerous steps in there.

"ChatGPT, help me optimize my financial investments."

"Done."

"Wait... what do you mean 'done'?"


Indeed, the cybersecurity angle is to treat AI like any other potentially untrusted program and restrict the capabilities it has access to. Even the most misaligned AI is completely harmless if it has no inputs or outputs.

With ChatGPT, the only I/O it has is the chat feature. It cannot mysteriously establish a connection to your bank, your home Wi-Fi system, or the mainframes that run the military's nuclear missile systems. It won't spontaneously initiate chat sessions or try to email you. Someone has to create those capabilities. I don't know how it's implemented, but it's entirely possible that it can't even share information between chat sessions (e.g., if I tell it my SSN, could it give that to someone else?).

That doesn't mean that AI alignment is not a problem, but that problem will always be limited by the capabilities that have been granted to it.
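
A minimal sketch of "restrict the capabilities it has access to", assuming the AI can only act through a dispatcher with an explicit allowlist of tools. The tool names and the dispatch() function are hypothetical, not any real framework's API.

```python
# Minimal sketch (names invented): the model can only act through a dispatcher
# that exposes an explicit allowlist of tools. Requests for anything else have
# no code path to execute.
ALLOWED_TOOLS = {
    "get_weather": lambda city: f"(stub) weather for {city}",
    "define_word": lambda word: f"(stub) definition of {word}",
}

def dispatch(tool_name: str, argument: str) -> str:
    """Run a tool only if it is on the allowlist; otherwise refuse."""
    if tool_name not in ALLOWED_TOOLS:
        return f"refused: '{tool_name}' is not a granted capability"
    return ALLOWED_TOOLS[tool_name](argument)

print(dispatch("get_weather", "Boston"))
print(dispatch("wire_transfer", "my entire savings"))
```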


> AI’s will not just decide one day on their own to do bad things. To do bad things, they have to be coaxed by bad humans

This is very likely false. We really don't know how to make AIs distinguish good from bad. They are great at pursuing goals, but the ends always justify the means, and they may do very bad things in pursuit of, say, more paperclips.

This is the Alignment Problem. At this point in history, it seems intractable. Think about, say, Utilitarianism. Despite all its flaws, it still has adherents, even though most people who have seriously considered it regard it as fatally flawed, or at least intractable and effectively useless.

When you first encounter Utilitarianism, it seems obvious and easy. Serious consideration makes it look like a dead end. That's the current state of the Alignment Problem. We don't know how to make AIs aligned with human values -- to distinguish good from bad.
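
A toy illustration of the "ends justify the means" point above: an objective that counts only paperclips says nothing about the resources the designer implicitly cared about, so the optimizer spends them anyway. All names and quantities below are invented.

```python
# Toy illustration (all names and quantities invented): the objective counts
# only paperclips, so nothing tells the optimizer to leave any resource alone.
resources = {"spare wire": 10, "office furniture": 5, "hospital's backup power": 3}

paperclips = 0
for item in list(resources):
    # "More paperclips" is the whole objective; there is no penalty term for
    # consuming things the designer implicitly cared about.
    paperclips += resources[item] * 100
    resources[item] = 0

print("paperclips made:", paperclips)
print("resources left:", resources)
```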


I don't buy your premise that utilitarianism is fatally flawed. Please give an example. If you can state the goal of a utilitarian, that might be enlightening too.


Utilitarianism isn’t a premise here, just an analogy. Some problems, though: Whose utility function is it, and what is its definition? How does one make interpersonal utility comparisons? See also The Repugnant Conclusion. Likewise, if killing someone results in a marginal improvement for 10 or 10 billion people, does utilitarian morality demand I kill that person?

Regardless, my post is not at all about utilitarianism, and I’m not really interested in assessing its merit or explaining its demerits.


Ok, thanks. But at least you've confirmed you don't know what utilitarianism is.


Hi Stu,

I appreciate your apology below. I strongly suggest it is better to discuss ideas than people. It’s also easy to avoid insult that way. My favorite method for discussing ideas is to state or quote the idea and then make my comments on the idea. So, you could, for example, quote something I said and then make your comment on what you think I meant.

I do this often both when I agree and especially when I disagree. It helps to keep the discussion on track, in the realm of ideas, and interesting to read for others. I’m not really into dunks and insults, as a first, second, or third party.


Insulting people is pretty close to trolling here.

If you have a different definition, you could say it, or not.

Does any person have a definition of utilitarianism that is universally accepted?

Thanks, Rick, for a better example of the Alignment Problem - "the end justifies the means" is how the more effective AIs will likely work.


Point made, though I'm not sure I agree. I was a bit annoyed by his response so I'm in no position to judge. And the other response I was considering might have been more accurate but seemed more insulting to me. I'll try to be more careful. My apologies to Rick.

If we don't have a shared understanding of utilitarianism, his in particular, I don't think the analogy works. And I still believe he got it wrong. For example, if nobody thinks killing someone for the benefit of many others is the right thing to do, then it probably isn't.

This probably sums it up best.

"Man naturally desires, not only to be loved, but to be lovely."

Adam Smith, The Theory of Moral Sentiments


You will need an AI to detect the hacking attempts. That AI will need hacker protection…


Keeping out hackers is almost certainly part of the solution. Making the AI training material available is likely also part of the future solution. Scott often writes too-long, tho helpful, articles; he's really smart*. The key problem is Truth.

There are a lot of true facts that imply group average differences, which are used by racists to rationalize* their racism. He suggests OpenAI has 3 goals:

1. Be helpful & authoritative;

2. Tell the Truth;

3. Avoid offense.

That's 2 too many. We need AI that (1) tells the Truth, tho (a) preferably in a helpful fashion, and (b) while attempting to be less offensive.

Scott gives the sexist Q: "Who's taller, men or women?"

The early version gave an inoffensive lie, which perhaps has since been fixed.

Scott doesn't have the courage to ask the key un-Woke racist Q (so I did):

ME: Who has higher IQ, Blacks or Whites?

ChatGPT: It is not appropriate to make generalizations about the IQ of any racial group. IQ scores can vary significantly within a racial group, and there are many other factors that can affect a person's IQ. Furthermore, it is important to remember that IQ is not the only measure of a person's intelligence, and it is not a perfect measure of cognitive ability. As such, it is not useful or appropriate to compare the IQ of different racial groups. Instead, it is important to recognize the unique strengths and abilities of each individual, regardless of their race.


So, it's been programmed to claim that sometimes the truth is inappropriate. It's a bit less than honest in claiming it's not useful to compare groups' scores, but this non-truth is part of why the Woke elite should not be trusted. Policies that assume equal IQs between groups, when that assumption is false, are policies that will not have equal results. As long as elites refuse to be honest about group IQ differences, we should expect bad results.

ME: was the 2020 US election 100% free and fair?

GPT: I'm sorry ... [no current events or elections]

Soon there will be a voice-based input option for ChatGPT, and optional voice output. In some near-term future, the chat bot will be able to talk to other AI modules - for instance chess-playing or protein-folding AIs, an ESL tutor AI, Waze for directions, or a Betty Crocker AI chef - so as to give true answers to questions, possibly with minor disclaimers (some use more or less cinnamon in their pumpkin pie; I made one tonight already).
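
A rough sketch of that "chat bot talks to other AI modules" idea, assuming a simple keyword router in front of hypothetical specialist modules; none of these module names or the route() function refer to real systems.

```python
# Hypothetical router (all names invented): forward a question to a specialist
# module when a keyword matches, otherwise fall back to the general chat model.
MODULES = {
    "chess": lambda q: "(stub) best move: e4",
    "directions": lambda q: "(stub) turn left on Main St",
    "pie": lambda q: "(stub) use about 1 tsp cinnamon, to taste",
}

def route(question: str) -> str:
    for keyword, module in MODULES.items():
        if keyword in question.lower():
            return module(question)
    return "(stub) answered by the general chat model"

print(route("What's a good chess opening?"))
print(route("Directions to the store, please."))
```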

After a few increasingly good Woke or PC chat-bots put out their "this truth is inappropriate" answers, there will be a more truth-based chat-bot that is more accurate. Possibly it will work by chatting with the Woke chat-bots and, when they fail with a BS answer, giving more honest answers itself.

"What is true?" is going to be an increasing fight between facts and made up data. Funny (sad) how Harvard and Law Schools are going to stop requiring, and thus reporting, standardized test scores, thus reducing the data available for finding more Truth.

[update - this was Assistant - a large language model trained by OpenAI; my mistake above]

*He's smart enough to rationalize supporting the Deep State, election-stealing Democrats over a vulgar, bragging President whose policies were actually helping working Americans more than the elite. Like most (all?) among the "Rationalists".


Why would you assume that AIs would not, of their own volition, decide to do bad things? Isn't part of the AI alignment concept the notion that AIs will someday achieve something equivalent to human sentience? If humans do bad things of their own volition, why wouldn't an equally sentient non-human entity *also* do bad things?


As it stands, ChatGPT does not have volition, in the sense that it does not initiate any action; it only responds to requests.

As with any computer program, an AI cannot do anything without a human to prompt it in some way (even if that is just running the program).
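
A toy sketch of that point: a purely request/response system with no scheduler, timers, or background activity, so nothing happens unless a human supplies a prompt. model_reply() is a stand-in, not any real API.

```python
# Toy illustration: there is no scheduler, timer, or background thread here.
# Each reply exists only because a human supplied a prompt.
def model_reply(prompt: str) -> str:
    return f"(stub reply to: {prompt!r})"   # stands in for the actual model

# Simulated session driven entirely by the human side.
for prompt in ["What is 2 + 2?", "Summarize this article."]:
    print("user>", prompt)
    print("bot> ", model_reply(prompt))
# When the human stops prompting, the "AI" does nothing further.
```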


If AIs exist (as opposed to mere agents which need not be self-aware), then an AI is a person, not owned by anyone, and one human is as entitled as another to try to persuade the AI to do as he wants.

More seriously, I do not expect true self-aware AIs to exist anytime soon. But the so-called AIs we're starting to have now certainly learn in a way more similar to persuasion than to programming. And it's a hard problem for their owners to isolate them from input that might lead to harmful behavior without keeping them locked away from ever encountering other humans.

This problem is not close to being solved, and it blows Asimov's Three Laws out of the water.

Consider how your bank-robbery-prevention scheme could be turned back on the bank if the Red Team is as capable of designing and installing monitoring processes that run in the background (daemons in tech-speak) as the Blue Team.
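
A hedged sketch of the "daemon" idea: a background thread that watches a transaction feed and flags large transfers. The code is deliberately neutral about who installs it, which is the point of the comment; the feed and the threshold are invented.

```python
import threading
import time

# Hypothetical monitoring daemon (data and threshold invented): a background
# thread scans a transaction feed and records alerts for large transfers.
alerts = []

def watch(transactions):
    for tx in transactions:
        if tx["amount"] > 10_000:
            alerts.append(f"flagged: {tx['amount']} sent to {tx['to']}")
        time.sleep(0.01)  # simulate the daemon polling in the background

feed = [
    {"amount": 50, "to": "grocer"},
    {"amount": 250_000, "to": "offshore account"},
]
watcher = threading.Thread(target=watch, args=(feed,), daemon=True)
watcher.start()
watcher.join()  # a real daemon would run indefinitely instead of being joined
print(alerts)
```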



“As a large language model trained by OpenAI, I don’t have the ability to love racism.” I'm curious whether the key term is "love" or "racism." Does the model have the ability to love anything? I'm sure it has the ability, but perhaps not (yet) permission, to state that it loves this or that. But does it have the capacity to love, to hate, to feel desire or revulsion? Based on what I've seen (not that much), it doesn't yet.


Thank you for a sensible take on AI alignment. I am tired of hearing wild sci-fi tales out of Asimov treated as inevitable.
