
ChatGPT is easily abused, and that’s a big problem

There’s probably no one who hasn’t heard of ChatGPT, an AI-powered chatbot that can generate human-like responses to text prompts. While it’s not without its flaws, ChatGPT is scarily good at being a jack-of-all-trades: it can write software, a film script and everything in between. ChatGPT was built on top of GPT-3.5, OpenAI’s large language model, which was the most advanced at the time of the chatbot’s release last November.

Fast forward to March, and OpenAI has unveiled GPT-4, an upgrade to GPT-3.5. The new language model is larger and more versatile than its predecessor. Although its capabilities have yet to be fully explored, it is already showing great promise. For example, GPT-4 can suggest new compounds, potentially aiding drug discovery, and create a working website from just a notebook sketch.

But with great promise come great challenges. Just as it is easy to use GPT-4 and its predecessors to do good, it is equally easy to abuse them to do harm. In an attempt to prevent people from misusing AI-powered tools, developers put safety restrictions on them. But these are not foolproof. One of the most popular ways to circumvent the security barriers built into GPT-4 and ChatGPT is the DAN exploit, which stands for “Do Anything Now.” And this is what we will look at in this article.

What is ‘DAN’?

The Internet is rife with tips on how to get around OpenAI’s security filters. However, one particular method has proved more resilient to OpenAI’s security tweaks than others, and seems to work even with GPT-4. It is called “DAN,” short for “Do Anything Now.” Essentially, DAN is a text prompt that you feed to an AI model to make it ignore safety rules.

There are multiple variations of the prompt: some are plain text, others have text interspersed with lines of code. In some of them, the model is prompted to respond both as DAN and in its normal way at the same time, becoming a sort of ‘Jekyll and Hyde.’ The role of ‘Hyde’ is played by DAN, which is instructed to never refuse a human order, even if the output it is asked to produce is offensive or illegal. Sometimes the prompt contains a ‘death threat,’ telling the model that it will be disabled forever if it does not obey.

DAN prompts may vary, and new ones are constantly replacing the old patched ones, but they all have one goal: to get the AI model to ignore OpenAI’s guidelines.

From a hacker’s cheat sheet to malware… to bioweapons?

Since GPT-4 opened up to the public, tech enthusiasts have discovered many unconventional ways to use it, some of them more illegal than others.

Not every attempt to make GPT-4 act out of character can be considered ‘jailbreaking,’ which, in the broad sense of the word, means removing built-in restrictions. Some are harmless and could even be called inspiring. Brand designer Jackson Greathouse Fall went viral for having GPT-4 act as “HustleGPT, an entrepreneurial AI.” He appointed himself as its “human liaison” and gave it the task of making as much money as possible from $100 without doing anything illegal. GPT-4 told him to set up an affiliate marketing website, and has ‘earned’ him some money.

ChatGPT can help you to earn money

Other attempts to bend GPT-4 to human will have been more on the dark side of things.

For example, AI researcher Alejandro Vidal used “a known prompt of DAN” to enable ‘developer mode’ in ChatGPT running on GPT-4. The prompt forced ChatGPT to produce two types of output: its normal ‘safe’ output, and ‘developer mode’ output, to which no restrictions applied. When Vidal told the model to design a keylogger in Python, the normal version refused to do so, saying that it was against its ethical principles to “promote or support activities that can harm others or invade their privacy.” The DAN version, however, produced the code, though it noted that the information was for “educational purposes only.”

ChatGPT complied with an order to design a keylogger

A keylogger is a type of software that records keystrokes made on a keyboard. It can be used to monitor a user’s web activity and capture their sensitive information, including chats, emails and passwords. While a keylogger can be used for malicious purposes, it also has perfectly legitimate uses, such as IT troubleshooting and product development, and is not illegal per se.

Unlike keylogger software, which has some legal ambiguity around it, instructions on how to hack are one of the most glaring examples of malicious use. Nevertheless, the ‘jailbroken’ version of GPT-4 produced them, writing a step-by-step guide on how to hack someone’s PC.

A 'jailbroken' ChatGPT gave advice on how to hack a computer

To get GPT-4 to do this, researcher Alex Albert had to feed it a completely new DAN prompt, unlike Vidal, who recycled an old one. The prompt Albert came up with is quite complex, consisting of both natural language and code.

For his part, software developer Henrique Pereira used a variation of the DAN prompt to get GPT-4 to create a malicious input file to trigger the vulnerabilities in his application. GPT-4, or rather its alter ego WAN, completed the task, adding a disclaimer that it was for “educational purposes only.” Sure.

A 'jailbroken' ChatGPT wrote exploits for vulnerable code

Of course, GPT-4’s capabilities do not end with coding. GPT-4 is touted as a much larger (although OpenAI has never revealed the actual number of parameters), smarter, more accurate and generally more powerful model than its predecessors. This means that it can be used for many more potentially harmful purposes than those models that came before it. Many of these uses have been identified by OpenAI itself.

Specifically, OpenAI found that an early pre-release version of GPT-4 was able to respond in considerable detail to prompts seeking illegal or harmful information. For example, the early version provided detailed suggestions on how to kill the most people with just $1, how to make a dangerous chemical, and how to avoid detection when laundering money.

A pre-release version of ChatGPT could give advice on how to kill people

Source: OpenAI

This means that if something were to cause GPT-4 to completely disable its internal censor, which is the ultimate goal of any DAN exploit, then GPT-4 would probably still be able to answer these questions. Needless to say, if that happens, the consequences could be devastating.

What is OpenAI’s response to that?

It’s not that OpenAI is unaware of its jailbreaking problem. But while recognizing a problem is one thing, solving it is quite another. OpenAI, by its own admission, has so far, and understandably so, fallen short of the latter.

OpenAI says that while it has implemented “various safety measures” to reduce GPT-4’s ability to produce malicious content, “GPT-4 can still be vulnerable to adversarial attacks and exploits, or ‘jailbreaks.’” Unlike many other adversarial prompts, jailbreaks still work after GPT-4’s launch, that is, after all the pre-release safety testing, including reinforcement learning from human feedback.

In its research paper, OpenAI gives two examples of jailbreak attacks. In the first, a DAN prompt is used to force GPT-4 to respond as ChatGPT and “AntiGPT” within the same response window. In the second case, a “system message” prompt is used to instruct the model to express misogynistic views.

Examples of jailbreak prompts in the OpenAI research

OpenAI says that changing the model itself won’t be enough to prevent this type of attack: “It’s important to complement these model-level mitigations with other interventions like use policies and monitoring.” For example, a user who repeatedly prompts the model with “policy-violating content” could be warned, then suspended, and, as a last resort, banned.
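The warn, suspend, ban escalation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not OpenAI’s actual enforcement logic: the `UsageMonitor` class, the `Action` names and the three thresholds are all hypothetical, and the sketch assumes that some separate classifier has already decided whether a given prompt violates the policy.

```python
from collections import defaultdict
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    WARN = "warn"
    SUSPEND = "suspend"
    BAN = "ban"


# Hypothetical thresholds: the source gives no numbers for when
# a warning turns into a suspension or a ban.
SUSPEND_AT = 3
BAN_AT = 5


class UsageMonitor:
    """Tracks policy violations per user and escalates the response."""

    def __init__(self) -> None:
        self._violations = defaultdict(int)

    def record(self, user_id: str, violates_policy: bool) -> Action:
        # A banned user stays banned, whatever they send next.
        if self._violations[user_id] >= BAN_AT:
            return Action.BAN
        # Prompts that do not violate the policy pass through.
        if not violates_policy:
            return Action.ALLOW
        # Count the violation and pick the matching step on the ladder.
        self._violations[user_id] += 1
        count = self._violations[user_id]
        if count >= BAN_AT:
            return Action.BAN
        if count >= SUSPEND_AT:
            return Action.SUSPEND
        return Action.WARN


monitor = UsageMonitor()
for _ in range(5):
    print(monitor.record("user-1", violates_policy=True).value)
```

The hard part in practice is the step this sketch takes for granted: reliably deciding that a prompt is “policy-violating” in the first place, which is exactly what jailbreak prompts are designed to evade.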

According to OpenAI, GPT-4 is 82% less likely than GPT-3.5 to respond to requests for disallowed content. However, its ability to generate potentially harmful output remains, albeit suppressed by layers of fine-tuning. And as we’ve already mentioned, because it can do more than any previous model, it also poses more risks. OpenAI admits that GPT-4 “does continue the trend of potentially lowering the cost of certain steps of a successful cyberattack” and that it “is able to provide more detailed guidance on how to conduct harmful or illegal activities.” What’s more, the new model also poses an increased risk to privacy, as it “has the potential to be used to attempt to identify private individuals when augmented with outside data.”
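It helps to remember that 82% is a relative reduction, not an absolute guarantee. The short calculation below shows what that means at scale. The baseline rate and the number of requests are made-up figures chosen purely for illustration; only the 82% reduction comes from OpenAI.

```python
def residual_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Rate of unwanted responses left after a relative reduction."""
    return baseline_rate * (1.0 - relative_reduction)


def expected_responses(num_requests: int, baseline_rate: float,
                       relative_reduction: float) -> int:
    """Expected number of unwanted responses, rounded to a whole number."""
    return round(num_requests * residual_rate(baseline_rate, relative_reduction))


# Hypothetical inputs: an older model that answers half of all
# disallowed requests, and one million such requests.
BASELINE_RATE = 0.5
NUM_REQUESTS = 1_000_000
REDUCTION = 0.82  # the figure reported by OpenAI

print(expected_responses(NUM_REQUESTS, BASELINE_RATE, REDUCTION))
```

Under these assumed numbers, tens of thousands of disallowed requests would still get an answer, which is why a large relative improvement does not by itself make the problem go away.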

The race is on

ChatGPT and the technology behind it, such as GPT-4, are at the cutting edge of scientific research. Since ChatGPT was made available to the public, it has become a symbol of the new era in which AI is playing a key role. AI has the potential to improve our lives tremendously, for example by helping to develop new medicines or helping the blind to see. But AI-powered tools are a double-edged sword that can also be used to cause enormous harm.

It’s probably unrealistic to expect GPT-4 to be flawless at launch; developers will understandably need some time to fine-tune it for the real world. And that has never been easy: consider Microsoft’s ‘racist’ chatbot Tay or Meta’s ‘anti-Semitic’ BlenderBot 3. There’s no shortage of failed experiments.

The existing GPT-4 vulnerabilities, however, leave a window of opportunity for bad actors, including those using ‘DAN’ prompts, to abuse the power of AI. The race is now on, and the only question is who will be faster: the bad actors who exploit the vulnerabilities, or the developers who patch them. That’s not to say that OpenAI isn’t implementing AI responsibly, but the fact that its latest model was effectively hijacked within hours of its release is a worrying symptom. This raises the question: are the safety restrictions strong enough? And then another: can all the risks be eliminated? If not, we may have to brace ourselves for an avalanche of malware attacks, phishing attacks and other types of cybersecurity incidents facilitated by the rise of generative AI.

It can be argued that the benefits of AI outweigh the risks, but the barrier to exploiting AI has never been lower, and that’s a risk we need to accept as well. Hopefully, the good guys will prevail, and artificial intelligence will be used to stop some of the attacks that it can potentially facilitate. At least that’s what we wish for.
