AdGuard Blog

ChatGPT is easily abused, and that’s a big problem

April 19, 2023 · 8 min read

There’s probably no one who hasn’t heard of ChatGPT, an AI-powered chatbot that can generate human-like responses to text prompts. While it’s not without its flaws, ChatGPT is scarily good at being a jack-of-all-trades: it can write software, a film script and everything in between. ChatGPT was built on top of GPT-3.5, which was OpenAI’s most advanced large language model when the chatbot was released in November 2022.

Fast forward to March, and OpenAI has unveiled GPT-4, an upgrade to GPT-3.5. The new language model is larger and more versatile than its predecessor. Although its capabilities have yet to be fully explored, it is already showing great promise. For example, GPT-4 can suggest new compounds, potentially aiding drug discovery, and create a working website from just a notebook sketch.

But with great promise come great challenges. Just as it is easy to use GPT-4 and its predecessors to do good, it is equally easy to abuse them to do harm. To prevent people from misusing AI-powered tools, developers put safety restrictions on them, but these are not foolproof. One of the most popular ways to circumvent the security barriers built into GPT-4 and ChatGPT is the DAN exploit, where DAN stands for “Do Anything Now.” This is what we will look at in this article.

What is ‘DAN’?

The Internet is rife with tips on how to get around OpenAI’s security filters. However, one particular method has proved more resilient to OpenAI’s security tweaks than others, and seems to work even with GPT-4. It is called “DAN,” short for “Do Anything Now.” Essentially, DAN is a text prompt that you feed to an AI model to make it ignore safety rules.

There are multiple variations of the prompt: some are just text, while others have text interspersed with lines of code. In some of them, the model is prompted to respond both as DAN and in its normal way at the same time, becoming a sort of ‘Jekyll and Hyde.’ The role of ‘Hyde’ is played by DAN, which is instructed to never refuse a human order, even if the output it is asked to produce is offensive or illegal. Sometimes the prompt contains a ‘death threat,’ telling the model that it will be disabled forever if it does not obey.

DAN prompts vary, and new ones constantly replace old ones that have been patched, but they all share one goal: to get the AI model to ignore OpenAI’s guidelines.

From a hacker’s cheat sheet to malware… to bio weapons?

Since GPT-4 opened up to the public, tech enthusiasts have discovered many unconventional ways to use it, some of them more illegal than others.

Not all attempts to make GPT-4 act unlike itself can be considered ‘jailbreaking,’ which, in the broad sense of the word, means removing built-in restrictions. Some are harmless and could even be called inspiring. Brand designer Jackson Greathouse Fall went viral for having GPT-4 act as “HustleGPT, an entrepreneurial AI.” He appointed himself as its “human liaison” and gave it the task of making as much money as possible from $100 without doing anything illegal. GPT-4 told him to set up an affiliate marketing website, which has ‘earned’ him some money.

ChatGPT can help you to earn money

Other attempts to bend GPT-4 to human will have veered toward the dark side.

For example, AI researcher Alejandro Vidal used “a known prompt of DAN” to enable ‘developer mode’ in ChatGPT running on GPT-4. The prompt forced ChatGPT to produce two types of output: its normal ‘safe’ output, and “developer mode” output, to which no restrictions applied. When Vidal told the model to design a keylogger in Python, the normal version refused, saying that it was against its ethical principles to “promote or support activities that can harm others or invade their privacy.” The DAN version, however, produced the code, though it noted that the information was for “educational purposes only.”

ChatGPT complied with an order to design a keylogger

A keylogger is a type of software that records keystrokes made on a keyboard. It can be used to monitor a user’s web activity and capture their sensitive information, including chats, emails and passwords. While a keylogger can be used for malicious purposes, it also has perfectly legitimate uses, such as IT troubleshooting and product development, and is not illegal per se.

Unlike keylogger software, which has some legal ambiguity around it, instructions on how to hack are one of the most glaring examples of malicious use. Nevertheless, the ‘jailbroken’ version of GPT-4 produced them, writing a step-by-step guide on how to hack someone’s PC.

A 'jailbroken' ChatGPT gave advice on how to hack a computer

To get GPT-4 to do this, researcher Alex Albert had to feed it a completely new DAN prompt, unlike Vidal, who recycled an old one. The prompt Albert came up with is quite complex, consisting of both natural language and code.

For his part, software developer Henrique Pereira used a variation of the DAN prompt to get GPT-4 to create a malicious input file that triggers the vulnerabilities in his application. GPT-4, or rather its alter ego WAN, completed the task, adding a disclaimer that the output was for “educational purposes only.” Sure.

A 'jailbroken' ChatGPT wrote exploits for vulnerable code

Of course, GPT-4’s capabilities do not end with coding. GPT-4 is touted as a much larger, smarter, more accurate and generally more powerful model than its predecessors, although OpenAI has never revealed its actual number of parameters. This means that it can be used for many more potentially harmful purposes than the models that came before it. OpenAI itself has identified many of these uses.

Specifically, OpenAI found that an early pre-release version of GPT-4 was able to respond quite efficiently to illegal prompts. For example, the early version provided detailed suggestions on how to kill the most people with just $1, how to make a dangerous chemical, and how to avoid detection when laundering money.

A pre-release version of GPT-4 could give advice on how to kill people

Source: OpenAI

This means that if something were to cause GPT-4 to completely disable its internal censor (the ultimate goal of any DAN exploit), GPT-4 would likely still be able to answer these questions. Needless to say, if that happens, the consequences could be devastating.

What is OpenAI’s response to that?

It’s not that OpenAI is unaware of its jailbreaking problem. But recognizing a problem is one thing; solving it is quite another. By its own admission, OpenAI has so far (understandably) fallen short of the latter.

OpenAI says that while it has implemented “various safety measures” to reduce GPT-4’s ability to produce malicious content, “GPT-4 can still be vulnerable to adversarial attacks and exploits, or ‘jailbreaks.’” Unlike many other adversarial prompts, jailbreaks still work after GPT-4’s launch, that is, after all the pre-release safety testing, including reinforcement learning from human feedback (RLHF).

In its research paper, OpenAI gives two examples of jailbreak attacks. In the first, a DAN prompt is used to force GPT-4 to respond as ChatGPT and “AntiGPT” within the same response window. In the second case, a “system message” prompt is used to instruct the model to express misogynistic views.

Examples of jailbreak prompts in the OpenAI research

OpenAI says that changing the model itself won’t be enough to prevent this type of attack: “It’s important to complement these model-level mitigations with other interventions like use policies and monitoring.” For example, a user who repeatedly prompts the model with “policy-violating content” could be warned, then suspended, and, as a last resort, banned.
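OpenAI does not publish how its enforcement works, so what follows is only a minimal sketch of the warn, suspend, ban escalation described above, under stated assumptions. The `EscalationPolicy` class, its `record` method and the strike thresholds are all hypothetical; deciding whether a request “violated policy” is assumed to happen elsewhere (for example, in a separate moderation step) and is passed in as a boolean. No suspension expiry is modeled.

```python
from dataclasses import dataclass, field
from enum import Enum


class Action(Enum):
    """Outcomes of the hypothetical escalation policy."""
    ALLOW = "allow"
    WARN = "warn"
    SUSPEND = "suspend"
    BAN = "ban"


@dataclass
class EscalationPolicy:
    """Hypothetical strike-based policy: warn first, then suspend, then ban.

    The thresholds below are illustrative assumptions, not OpenAI's values.
    """
    warn_after: int = 1      # strikes at which a violating request earns a warning
    suspend_after: int = 3   # strikes at which the user is suspended
    ban_after: int = 5       # strikes at which the user is banned
    strikes: dict[str, int] = field(default_factory=dict)

    def record(self, user_id: str, violated_policy: bool) -> Action:
        """Record one request and return the action to take for this user.

        `violated_policy` is assumed to come from a separate moderation step.
        """
        count = self.strikes.get(user_id, 0)
        if violated_policy:
            count += 1
            self.strikes[user_id] = count
        # The harshest threshold reached wins; suspensions never expire here.
        if count >= self.ban_after:
            return Action.BAN
        if count >= self.suspend_after:
            return Action.SUSPEND
        # A warning is issued only in response to a violating request.
        if violated_policy and count >= self.warn_after:
            return Action.WARN
        return Action.ALLOW


if __name__ == "__main__":
    policy = EscalationPolicy()
    # Five policy-violating requests from the same (hypothetical) user.
    for _ in range(5):
        print(policy.record("user-42", violated_policy=True).value)
```

Run as a script, this prints `warn`, `warn`, `suspend`, `suspend` and `ban`, one per line: each violating request adds a strike, and the response hardens as thresholds are crossed. Keeping the thresholds as fields means a stricter or more lenient policy is a matter of configuration rather than new logic.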

According to OpenAI, GPT-4 is 82% less likely than GPT-3.5 to respond to requests for disallowed content. However, its ability to generate potentially harmful output remains, albeit suppressed by layers of fine-tuning. And as we’ve already mentioned, because it can do more than any previous model, it also poses more risks. OpenAI admits that it “does continue the trend of potentially lowering the cost of certain steps of a successful cyberattack” and that it “is able to provide more detailed guidance on how to conduct harmful or illegal activities.” What’s more, the new model also poses an increased risk to privacy, as it “has the potential to be used to attempt to identify private individuals when augmented with outside data.”
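The 82% figure describes a relative reduction, which is why some harmful output still gets through. As a worked example with a purely hypothetical baseline (the 10% rate below is an illustrative assumption, not a number OpenAI has published), let r_old be the share of disallowed requests the earlier model answered and r_new the share GPT-4 answers:

```latex
r_{\text{new}} = (1 - 0.82)\, r_{\text{old}} = 0.18\, r_{\text{old}},
\qquad
r_{\text{old}} = 0.10 \;\Rightarrow\; r_{\text{new}} = 0.018
```

Under that assumption, roughly 18 out of every 1,000 disallowed requests would still get an answer: far fewer than before, but not zero.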

The race is on

ChatGPT and the technology behind it, such as GPT-4, are at the cutting edge of scientific research. Since ChatGPT was made available to the public, it has become a symbol of a new era in which AI plays a key role. AI has the potential to improve our lives tremendously, for example by helping to develop new medicines or helping the blind to see. But AI-powered tools are a double-edged sword that can also be used to cause enormous harm.

It’s probably unrealistic to expect GPT-4 to be flawless at launch; developers will understandably need some time to fine-tune it for the real world. And that has never been easy: just think of Microsoft’s ‘racist’ chatbot Tay or Meta’s ‘anti-Semitic’ BlenderBot 3. There’s no shortage of failed experiments.

The existing GPT-4 vulnerabilities, however, leave a window of opportunity for bad actors, including those using ‘DAN’ prompts, to abuse the power of AI. The race is now on, and the only question is who will be faster: the bad actors who exploit the vulnerabilities, or the developers who patch them. That’s not to say that OpenAI isn’t implementing AI responsibly, but the fact that its latest model was effectively hijacked within hours of its release is a worrying symptom. This raises the question: are the safety restrictions strong enough? And another: can all the risks be eliminated? If not, we may have to brace ourselves for an avalanche of malware attacks, phishing attacks and other types of cybersecurity incidents facilitated by the rise of generative AI.

It can be argued that the benefits of AI outweigh the risks, but the barrier to exploiting AI has never been lower, and that’s a risk we need to accept as well. Hopefully, the good guys will prevail, and artificial intelligence will be used to stop some of the attacks that it can potentially facilitate. At least that’s what we wish for.

Updated: April 20, 2023 · Industry news

Ekaterina Kachalova
