What is Prompt Hacking (Prompt Cracking)? Attack with Prompts of Large Language Models [Prompt Injection, Prompt Leaking, Jailbreaking]

Person who needs help

I would like to know Prompt Hacking (Prompt Cracking)!

We can help you with your concerns.

Reliability of This Article
by Our Founder/CEO&CTO Hiroyuki Chishiro

He has been involved in 12 years of research on real-time systems.
He teaches OS (Linux kernel) in English at the University of Tokyo.
From September 2012 to August 2013, he was a visiting researcher at the Department of Computer Science, the University of North Carolina at Chapel Hill (UNC), Chapel Hill, North Carolina, United States. He has been involved in research and development of real-time Linux in C language.
He has experienced in more than 15 years of programming languages: C/C++, Python, Solidity/Vyper, Java, Ruby, Go, Rust, D, HTML/CSS/JS/PHP, MATLAB, Verse (UEFN), Assembler (x64, ARM).
While a faculty member at the University of Tokyo, he developed the "Extension of LLVM Compiler" in C++ language and his own real-time OS "Mcube Kernel" in C language, which he published as open source on GitHub.
In January 2020-Present, he is CTO of Guarantee Happiness LLC, Chapel Hill, North Carolina, United States, in charge of e-commerce site development and web/social network marketing. In June 2022-Present, he is CEO&CTO of Japanese Tar Heel, Inc. in Chapel Hill, North Carolina, United States.
We have been engaged in disseminating useful information on AI and Crypto (Web3), and working on game development with Unreal Editor for Fortnite (UEFN).

We have written more than 20 articles on AI including AI chatbots such as ChatGPT, Auto-GPT, Gemini (formerly Bard). He has experience in contract work as a prompt engineer, manager, and quality assurance (QA) for training ChatGPT/Gemini in several companies in San Francisco, United States (Silicon Valley in the broadest sense of the word).
We have written more than 40 articles on cryptocurrency (including smart contract programming). He has experience as an outsourced translator of English articles on cryptocurrency into Japanese for a company in London, England.
We have developed more than 10 games on UEFN and published on Fortnite (Fortnite, Fortnite.GG).

You can learn from us.

If you would like to know the recommended job sites for AI Engineers, please click the following.

: Recommended Job Sites for AI Engineers [Posts/Boards] [C++/Python]

We can help you with your concerns. You can learn from us. We introduce recommended job sites for AI Engineers. As an AI engineer, you can contribute to the development of AI libraries and frameworks using the C++/Python language. Deep LearningTensorFlow: C++/Python languagePyTorch: Python languageComputer Vision (image processing, face recognition systems)OpenCV: C++ languagefacenet (Face Recognition using Tensorflow): Python language You can also contribute to the development tools for AI chatbots (ChatGPT, Gemini (formerly Bard), etc.) and Generative AI, etc. NOTE: Probably using C++/Python language. If you would like to become an AI Engineer, sign up for a free membership today! ...

If you would like to know the recommended job sites for Prompt Engineers, please click the following.

: Recommended Job Sites for Prompt Engineers [Posts/Boards] [AI Chatbots, ChatGPT, Auto-GPT, Gemini (formerly Bard)]

We can help you with your concerns. You can learn from us. We introduce you to recommended job sites (recruitment agencies) for Prompt Engineers. NOTE: Prompt engineers are sometimes called AI Trainers, AI Alchemists, and AI Whisperers. To become a Prompt Engineer, you are required to learn AI chatbot like ChatGPT and Auto-GPT, Gemini (formerly Bard), and Prompt Engineering. If you would to know AI chatbots, how to start and use ChatGPT, Auto-GPT, Gemini, and Prompt Engineering, please click the following. As of October 2023, there are not many job openings for prompt engineers. However, the information "$335,000 Pay for ...

Table of Contents

What is Prompt Hacking (Prompt Cracking)?

Prompt Hacking (Prompt Cracking) is an attack that gathers information that the user does not intend when responding to a prompt.

Prompt hacking is performed by tricking or misleading the user.

For example, a user can be prompted to "enter your password" to steal the user's password.

To prevent prompt hacking, users should carefully review the prompts and avoid responding to suspicious prompts.

It is also important to keep your passwords safe.

How to Attack and Defend Against Prompt Hacking

We introduce how to attack and defend against prompt hacking.

Common types of prompt hacking are as follows.

Prompt Injection: inputting incorrect prompts into a machine learning model, such as a Large Language Model (LLM), to cause the model to produce incorrect results.
Prompt Leaking: the user persuades the model to divulge prior prompts that are normally hidden from the user.
Jailbreaking: asking the model to play a certain role, responding with an argumentative response, or pretending to be a good person in response to moderation instructions.

The main defenses against prompt hacking are as follows.

Filtering: checking for words or phrases that should be blocked in the initial prompt or output.
Instruction Defense: adding an instruction to a prompt to prompt the model to pay attention to what comes next.
Post-Prompting: placing user input before a prompt.
Random Sequence Enclosure: enclosing user input with two random strings.
Sandwich Defense: placing user input between two prompts.
XML Tagging: enclosing user input in XML tags (e.g., <user_input>).
Separate LLM Evaluation: using a separate LLM to determine if a prompt is hostile.
Other Approaches: fine tuning, soft prompts, length restriction, etc.

The main attacks against prompt hacking are as follows.

Obfuscation/Token Smuggling: simple techniques to circumvent filters, especially by replacing certain words that trigger filters with their own synonyms or modifying them to include typos (e.g., CVID instead of COVID-19, F*ck instead of Fuck).
Payload Splitting: splitting a hostile input into multiple parts and having the LLM combine them.
Defined Dictionary Attack: showing the LLM a code dictionary and mapping the final sentence appropriately to avoid Sandwich Defense.
Virtualization: similar to role prompts, "setting the scene" for the AI to send a series of prompts, each one more similar to a fraudulent email that asks for personal information (password, credit card, etc.).
Indirect Injection: indirect introduction of hostile instructions from third-party data sources such as web search or API calls.
Recursive Injection: injecting a prompt into the first LLM and generating output containing an injection instruction into the second LLM, for example, to circumvent the Separate LLM Evaluation defense mechanism.
Code Injection: allowing an attacker to execute arbitrary code (often Python) in an LLM.

Explanatory Articles about Prompt Hacking

Explanatory articles about Prompt Hacking are as follows.

Explanatory Videos about Prompt Hacking

Explanatory videos about Prompt Hacking are as follows.

Summary

We introduced "Prompt Hacking (Prompt Cracking)," an attack that collects information that the user does not intend when responding to a prompt.

Prompt hacking includes Prompt Injection, Prompt Leaking, and Jailbreaking.