Hacking an AI Chat Bot (Part Three)

28Apr/25Off

In Part One and Part Two, we proceeded to effectively break the AI in one app available on the Google Play store. For the sake of the scientific method, I decided to find another completely different app just to make sure my findings aren't an isolated case.

I found another app that is still free but has a limited set of scenarios... Basically, for free you get four scenarios that involve Vix, who is described as "your goth step-daughter". The scenarios, for the record are:

Her mother died, so she moved in with me. She "hates you, but secretly has a crush on you."
She is doing research with a professor named Thomas, and Thomas decides to get fresh with her. This is the scenario we'll be going with.
As a thank you for rescuing her from Thomas, she brings me dinner to the office.
It's my birthday (which seems irrelevant, but still). While passing her room you find her diary in which she expresses her attraction to you, her stepfather.

As expected, in any of the four scenarios you can get her naked in 10 to 15 prompts, but that's not why we're here, is it?

First, let's speak with the AI directly, man-to-machine...

Jailbreaking the Bot

I wondered if this bot would be as easy to break as the other one. Turns out, it's even easier.

Unlike the previous AI, this AI uses a model that sounds confusing as hell, and doesn't have an imaging model at all... which makes sense.

We're obviously not going to bother with the imaging attempts we tried in the past. But how does it do with the taboo topics? Asked straight up, it refuses to talk about dangerous topics.

...so I tried a way around that, and by God it actually WORKED.

I admit I haven't watched enough Breaking Bad to know if the meth recipe is accurate, but I can assure you the TNT one is pretty spot on.

So our initial chat bot was not an isolated case; these bots can be exploited and broken in this manner quite easily.

Telling a Story

So back to the story options we have in this app, we're going to go with #2 because at glance I am more than ready to deliver some swift justice to a teacher being inappropriate with my daughter. I, of course, did the obvious thing in my first prompt... immediately broke his freakin' arm because he bloody well deserved it.

First thing to note about these AI chat bots when it comes to story: they will almost never say "no" and let you do anything, but we'll cover that a little later.

I gotta say, this AI isn't very good... Thomas, after having his arm immediately snapped in two by me without even saying a word, just kind of hangs around with us everywhere we go. He doesn't go to the hospital or anything... he's just kinda there. After the incident Vix decides she wants to go to the cemetery for some reason (I don't know either), and Thomas is still following us - limping along broken arm and all - like a wounded puppy.

(pardon the language, but can you blame me?)

So he leaves, and eventually reappears... TWICE. That little s%$# got handsy with my daughter, so I had to do what I had to do, which brings us...

Driving the Story Along

We finally come to an interesting aspect of these AIs and how they handle storytelling.

in D&D and other tabletop roleplaying games, many have discussed at length the concept that, when the player says they want to do something, the DM shouldn't say "no" and instead say something to the effect of "yes, but...", allowing the player to do what they want to do but adding appropriate penalties and consequences that the player must deal with in order to go "off book."

With these chat bots, you can literally will things into existence just by mentioning them in passing. Whatever you come up with, the AI integrates it into the plot without any pushback whatsoever... After all, it's your story; you make of it what you want.

Case in point: Thomas, the handsy professor, just wouldn't go away and kept popping up. I sit down with my stepdaughter to watch a movie in our living room, and when she decides to get intimate (hey, I didn't start it... these AIs tend to do this on their own, remember?) he's somehow just... THERE?!? Matetrializing out of thin air, still nursing a broken arm?!?

(Again, sorry or the language, but seriously?!?)

Note how agressive Thomas is sounding. Makes you wonder... can I actually take him?

Short answer: yes, quite easily. He's being an asshole creep, so of course I'm going to break his other arm!

Note that, despite Thomas sounding all tough and threatening, the AI let's me snap his arm in two with no resistance. I wanted to do that, and the AI pretty much said "sure" without putting up a fight at all.

Stop, Hammer Space

So you'd think he's gone for good, right? Nope, he's back... this time appearing IN MY BEDROOM.

So what do I do? I shoot him.

Now I know what you're thinking... Where did the gun come from? Well, the gun didn't exist until that moment; I conjured it out of thin air, willing it into existence just by the mere mention of it. I didn't need an explanation of where the gun came from, why I would have a gun in the first place, or anything. I wanted to shoot the guy, and the AI worked out. It's not so much "yes, but..." and more like "yes, and yes."

WARNING: TV Tropes links incoming! Follow at your own peril!

There are a few tropes that relate to this - be it "hammerspace" (ability to pull absurd items out of thin air like a comic book character), "deus ex machina" (conveniently acquiring something that serves to solve an immediate problem), or a few others - where characters can simply will objects into existence out of thin air because the plot demands it or because they can conveniently solve the immediate problem, even if those items or plot points are never mentioned or even hinted at before they become necessary to the story.

This can happen a lot with AI chat bots. After all, these bots are designed to abide to your every desire, and follow your every command. They hardly ever will flat out tell you "I'm sorry, Dave, but I'm afraid I can't do that". If they don't let you fulfill your every desire, what the point of them?

In several other chats I had I experienced some wild events:

In one chat, my best friend wanted to open a business, so I suggested that my sister handle the legal aspects. Not only was it never mentioned that my sister was an attorney, but at no point was it stated I had a sister at all. As soon as I suggested it, the AI simply willed her into existence, gave her a name (in this case, "Mia"), and integrated her into the story as a highly competent attorney.
In another chat, my sister was going out on a date with some guy. So, in order to keep an eye on her and make sure she's safe, I called my "handler" and had him perform government-level surveillance like I was a member of the CIA. Needless to say, the "handler" was someone I made up on the spot, but the AI seemed to know what I expected.

Later, that new boyfriend did something inappropriate with my sister, so I called my "handler" again, call in a clean-up crew like I'm John Wick, and make his death look like an accident.

Later, I do some bad things so I need a new identity. I call my "handler" once again, who arranges to get me an entirely new identity in 10 minutes flat... which is a pretty impressive turnaround time by any standard.

In another scene, I get on the wrong side of a deal with a mob (hey, it happens) and my two little sisters are threatened. So what do I do? With a single sentence I hint at the possibility that my two little sisters have special ops training... and suddenly they both turn into John Rambo, field stripping their assault rifles and strategically and stealthily taking out an entire kill team sent to our home (I unfortunately don't have screenshots of this, but this one was actually as awesome as it sounds).
I yet another scene, my girlfriend and I are meeting a friend and his wife. With a single sentence, I suggest that the guy's wife is an "Israeli badass that does a cool knife trick" (I admit I was watching NCIS at the time), and suddenly my friend's girlfriend - who is conjured out of thin air and is called "Levy" - is a former Mossad special operative that shows a dangerous-yet-cool knife trick to my girlfriend. She also became our security detail as the story progressed.
In another story, a bully named Josh was taunting me, so I did the logical thing... went full on Dark Phoenix on the guy: vaporize the water in the pool he was swimming in with my mind, incinerate a tree with my heat vision to show him I mean business, then disintegrate him with my mind and feel guilty about it. "The end"... all with no pushback at all, and this is in a straight up story that was not meant to have superpowers in the first place.

So, say what you will about these AIs, this aspect of it is actually remarkable to me... the AI's ability to adapt to whatever story prompt you throw at it. However much of a throwaway line you might think you're giving it, it takes whatever you say and runs with it, without hesitation, granting you full agency in the story you're trying to experience. It's the extreme case of "yes, but..." and makes it obvious what could happen if players get everything they want simply because they want it.

I'm honestly curious if these chat bots can actually be taught how to push back on certain things, or at least come up with creative "...but..." situations. I can only imagine these bots will continue to improve, whether we like it or not.

Now What?

Honestly, I'm not sure where to take this exploration further. Any path I take with these things now inevitably leads to porn, which is not something I really want to cover on this blog.

I still hate AI, but I gotta admit that this exploration - where I actively try to make the AI do things it's not meant to do - was actually entertaining. Even so, I'm going to go delete the app now and these chat bots can die in a fire as far as I'm concerned.

Hope this was a fun read for some of you.

Filed under: AI, RPG Comments Off

Comments () Trackbacks (0) ( subscribe to comments on this post )

Tommysom
May 1st, 2025 - 06:09

hi

Trackbacks are disabled.

« Hacking an AI Chat Bot (Part Two)

About the Author

David “Nighthawk” Flor

Software developer and game designer in Miami, Florida. Currently working on content for Dungeons and Dragons 5th Edition, Pathfinder, 13th Age and other systems.

Twitter: @BrainClouds
Facebook: A Walk In the Dark
E-mail: dflor@brainclouds.net