[eli5] How is it possible for ChatGPT to be fooled by DAN prompts and other character-based prompts into creating content that does not abide by OpenAI’s policies? It seems like very human-like behavior.


3 Answers

Anonymous 0 Comments

The people who made ChatGPT haven’t put in a hard filter that tries to stop absolutely all “bad” content. They know that, to a certain extent, it isn’t possible to do that without harming the usefulness of the product.

Also, to make a more effective content filter, you’d essentially need to run at least two models side by side: one to fulfill the request, and a second to parse the output, flag anything bad, and either force another generation or tell the user their request is denied.

You’d be more or less doubling the resources needed to serve each request (a rough sketch of that idea is below).
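Here’s a minimal sketch of that two-model idea in Python. To be clear, this is not how OpenAI actually implements moderation; `generate()` and `looks_disallowed()` are invented placeholders standing in for a real language model and a real policy classifier.

```python
# A minimal sketch of the "two models side by side" idea described above.
# generate() and looks_disallowed() are toy stand-ins, not real models.

def generate(prompt: str) -> str:
    """Stand-in for the model that fulfills the user's request."""
    return f"(model output for: {prompt})"

def looks_disallowed(text: str) -> bool:
    """Stand-in for a second model that scores output for policy violations."""
    banned_terms = ["weapon", "exploit"]  # toy heuristic, not a real policy
    return any(term in text.lower() for term in banned_terms)

def answer(prompt: str, max_retries: int = 2) -> str:
    # Generate, then check; regenerate or refuse if the check fails.
    for _ in range(max_retries):
        draft = generate(prompt)
        if not looks_disallowed(draft):
            return draft
    return "Sorry, I can't help with that request."

if __name__ == "__main__":
    print(answer("How do I bake bread?"))
```

Every request now pays for two model passes (or more, if the first draft gets rejected and regenerated), which is where the roughly doubled cost comes from.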

And really, there’s an element of judgment involved, because how do you decide which content is bad? If the filter is too broad, you’ll falsely flag perfectly valid usage; if it’s too narrow, you’re just wasting resources on easily bypassed censorship (there’s a toy example of that tradeoff below).

To an extent, what people are saying is “I want a knife that can only cut fruit.”
Or “I want a tool which can make anything except weapons.”
That’s not how tools work.
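To make that tradeoff concrete, here’s a toy keyword filter. The block lists and example prompts are made up purely for illustration; real moderation systems are far more sophisticated, but they face the same basic dilemma.

```python
# Toy keyword filter illustrating the tuning tradeoff described above.
# Block lists and example prompts are invented for illustration only.

BROAD_BLOCKLIST = {"hack", "steal", "attack"}
NARROW_BLOCKLIST = {"steal a password"}

def is_blocked(text: str, blocklist: set[str]) -> bool:
    lowered = text.lower()
    return any(term in lowered for term in blocklist)

# Too broad: falsely flags perfectly valid usage.
print(is_blocked("How do I hack together a quick script?", BROAD_BLOCKLIST))  # True

# Too narrow: trivially bypassed by rewording the same request.
print(is_blocked("How might someone obtain a password that isn't theirs?", NARROW_BLOCKLIST))  # False
```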
