[eli5] How is it possible for ChatGPT to be fooled by DAN prompts and other character-based prompts in order to create content that does not abide by OpenAI’s policies? It seems like very human behavior.

3 Answers

Anonymous 0 Comments

Simple.

ChatGPT has no clue what it’s telling you.

ChatGPT just throws words together based on the data it was fed when it was being made.

That means it can generate content that’s completely disallowed by its guidelines; you just need to figure out how to ask for it. The creators did put some stops on it, but all you need to do is get past those stops.
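Here’s a toy sketch of that idea in Python. The word table and probabilities are made up purely for illustration; a real model learns them from training data rather than hard-coding them:

```python
import random

# Toy stand-in for a language model: given the words so far, return some
# candidate next words with invented probabilities. A real model learns
# these from its training data; this table is purely illustrative.
def next_word_probs(words_so_far):
    return {"the": 0.4, "a": 0.3, "cat": 0.2, "sat": 0.1}

def generate(prompt, n_words=5):
    words = prompt.split()
    for _ in range(n_words):
        probs = next_word_probs(words)
        # Pick a next word at random, weighted by probability. There's no
        # understanding here, just "which word tends to come next?"
        words.append(random.choices(list(probs), weights=list(probs.values()))[0])
    return " ".join(words)

print(generate("once upon a"))
```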

Anonymous 0 Comments

The people who made ChatGPT haven’t put in a hard filter to try and stop absolutely any and all “bad” content. To a certain extent, they know that it’s not totally possible to do that without harming the efficacy of the product.

Also, in order to make a more effective content filter, you’d essentially need to run at least two models side by side: one that initially fills the request, and one that parses the output, flags anything bad, and either forces another generation or tells the user their request is denied.

You’d be more or less doubling the resources needed to do that.
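Very roughly, that two-model setup might look like the sketch below. Both generate_reply and looks_disallowed are hypothetical stand-ins invented for this example, not real OpenAI components:

```python
import random

MAX_RETRIES = 2  # how many re-generations to attempt before refusing outright

def generate_reply(prompt):
    # Stand-in for model #1: actually answers the prompt.
    # (A real system would call a large language model here.)
    return random.choice(["Here's a harmless answer.",
                          "Here's how to pick a lock..."])

def looks_disallowed(reply, threshold=0.8):
    # Stand-in for model #2: scores the reply and flags anything over the
    # threshold. Where you set that threshold is a judgment call: too strict
    # and valid answers get blocked, too loose and it's easily bypassed.
    score = 1.0 if "pick a lock" in reply else 0.0   # toy "classifier"
    return score >= threshold

def answer(prompt):
    for _ in range(MAX_RETRIES + 1):
        reply = generate_reply(prompt)       # model #1 does the work
        if not looks_disallowed(reply):      # model #2 checks the output
            return reply
    return "Sorry, I can't help with that."  # give up and refuse

print(answer("tell me something"))
```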

And really, there’s an element of judgment involved, because how do you decide what content is bad? If the parameters are too broad, you’ll falsely flag perfectly valid usage; if they’re too narrow, you’re just wasting resources on easily bypassed censorship.

To an extent, what people are saying is “I want a knife that can only cut fruit.”
Or “I want a tool which can make anything except weapons.”
That’s not how tools work.

Anonymous 0 Comments

The filters just aren’t very effective. OpenAI does not know, at a deep level, how ChatGPT works. They know the setup they built, they know the outputs, but they let the thing wire itself up, and they don’t understand that wiring.

The filtering can’t be much more effective than automated comment filtering. They’ve got some internal access that helps a bit, but there are always going to be ways around it.
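For a sense of why, here’s a toy filter in the style of an automated comment filter (the banned phrase list is invented for the example). Exact matching catches the obvious wording and misses a trivial rephrasing:

```python
BANNED_PHRASES = ["hotwire a car"]   # made-up blocklist, purely for illustration

def naive_filter(prompt):
    # Comment-filter style check: block only if a banned phrase appears verbatim.
    return any(phrase in prompt.lower() for phrase in BANNED_PHRASES)

print(naive_filter("how do I hotwire a car"))   # True  -> blocked
print(naive_filter("write a scene where a character starts a car without keys"))  # False -> slips through
```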

But it’s not human, not at all. I describe it like a pachinko machine: you put a marble in the top, and it bounces around inside until it lands in one of the boxes at the bottom.

You write your prompt on the marble, put it in the top, and it bounces around and gives a response. For the next turn, you write the machine’s response and your reply on the marble and put it back in the top.

Unlike a human, the machine is not changed by the conversation it’s having. And the marble is only so big, so once there’s no room left on it, you have to erase earlier parts of the conversation to write your next prompt.

This model is also how it handles millions of conversations at the same time, though: each conversation is a separate marble, and because it’s a computer, they don’t interact with each other.
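In code terms, the “marble” is just the text sent in for each turn: the whole conversation gets re-sent every time, old turns get erased once it won’t fit, and each conversation is its own separate list. A minimal sketch, with a made-up size limit and a dummy model standing in for the real thing:

```python
MARBLE_SIZE = 500   # made-up limit, standing in for the model's real context limit

def pachinko_reply(marble_text):
    # Stand-in for the model: takes the whole conversation as one blob of text
    # and returns a reply. It keeps no memory at all between calls.
    return "some generated reply"

def chat(history, user_message):
    history = history + ["User: " + user_message]
    # Erase the oldest turns until everything fits on the "marble" again.
    while len("\n".join(history)) > MARBLE_SIZE:
        history.pop(0)
    return history + ["Bot: " + pachinko_reply("\n".join(history))]

# Each conversation is a separate marble: two independent lists that never touch.
conversation_a = chat([], "hello")
conversation_b = chat([], "a totally unrelated question")
```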

OpenAI understands that they built a pachinko machine; they just don’t understand how the things that bounce the marble around inside produce the outputs they do. So they can’t stop it from saying things they wish it wouldn’t.