[eli5] How is it possible for ChatGPT to be fooled by DAN prompts and other character-based prompts into creating content that does not abide by OpenAI's policies? It seems like very human behavior.

3 Answers

Anonymous

The filters just aren't very effective. OpenAI does not know, at a deep level, how ChatGPT works. They know the setup they built and they can see the outputs, but they let the thing wire itself up during training, and they don't understand that wiring.

The filtering can't be much more effective than any other automated content moderation. Having internal access to the model helps somewhat, but there are always going to be ways around it.

But it's not human, not at all. I describe it like a pachinko machine: you put a marble in the top, it bounces around inside until it lands in one of the boxes at the bottom.

You write your prompt on the marble, put it in the top, and it bounces around and comes out as a response. Then you write the machine's response plus your next message on the marble and put it back in the top.

Unlike a human, the machine is not changed by the conversation it's having. And the marble is only so big, so once there's no room left on it, you have to erase earlier parts of the conversation to fit your next prompt.
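If it helps to see the marble idea in code, here's a minimal sketch. It is not how OpenAI actually implements anything; `fake_model` is a made-up stand-in for the frozen network, and the tiny context limit is invented just to show the trimming behavior:

```python
# A minimal sketch of the "marble" idea: the model is a frozen, stateless
# function. The only memory is the conversation text you send back in, and
# that text is capped at a fixed size (the context window).

CONTEXT_LIMIT = 200  # pretend the marble only has room for 200 characters

def fake_model(prompt: str) -> str:
    """Stand-in for the frozen network: nothing inside it changes between calls."""
    return f"[reply based on the {len(prompt)} characters written on the marble]"

def chat_turn(history: list[str], user_message: str) -> list[str]:
    history = history + [f"User: {user_message}"]
    # Write the whole conversation on the marble...
    prompt = "\n".join(history)
    # ...but if it no longer fits, erase the oldest lines until it does.
    while len(prompt) > CONTEXT_LIMIT and len(history) > 1:
        history = history[1:]
        prompt = "\n".join(history)
    reply = fake_model(prompt)
    return history + [f"Assistant: {reply}"]

# Each conversation is its own marble: separate lists that never touch.
conversation_a: list[str] = []
conversation_b: list[str] = []
conversation_a = chat_turn(conversation_a, "Tell me a joke about ducks.")
conversation_b = chat_turn(conversation_b, "Explain pachinko machines.")
print("\n".join(conversation_a))
```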

This is also how it holds millions of conversations at the same time, though: each conversation is a separate marble, and because it's a computer, they don't interact with each other.

OpenAI understands that it built a pachinko machine; it just doesn't understand how the pins that bounce the marble around inside produce the outputs they do. So it can't stop the machine from saying things it wishes it wouldn't.
