I would like to know Prompt Hacking (Prompt Cracking)!
Reliability of This Article
by Our Founder/CEO&CTO Hiroyuki Chishiro
- He has been involved in 12 years of research on real-time systems.
- He teaches OS (Linux kernel) in English at the University of Tokyo.
- From September 2012 to August 2013, he was a visiting researcher at the Department of Computer Science, the University of North Carolina at Chapel Hill (UNC), Chapel Hill, North Carolina, United States. He has been involved in research and development of real-time Linux in C language.
- He has experienced in more than 15 years of programming languages: C/C++, Python, Solidity/Vyper, Java, Ruby, Go, Rust, D, HTML/CSS/JS/PHP, MATLAB, Verse (UEFN), Assembler (x64, ARM).
- While a faculty member at the University of Tokyo, he developed the "Extension of LLVM Compiler" in C++ language and his own real-time OS "Mcube Kernel" in C language, which he published as open source on GitHub.
- In January 2020-Present, he is CTO of Guarantee Happiness LLC, Chapel Hill, North Carolina, United States, in charge of e-commerce site development and web/social network marketing. In June 2022-Present, he is CEO&CTO of Japanese Tar Heel, Inc. in Chapel Hill, North Carolina, United States.
- We have been engaged in disseminating useful information on AI and Crypto (Web3).
- We have written more than 20 articles on AI including AI chatbots such as ChatGPT, Auto-GPT, Gemini (formerly Bard). He has experience in contract work as a prompt engineer, manager, and quality assurance (QA) for several companies in San Francisco, United States (Silicon Valley in the broadest sense of the word).
- We have written more than 40 articles on cryptocurrency (including smart contract programming). He has experience as an outsourced translator of English articles on cryptocurrency into Japanese for a company in London, England.
You can learn from us.
If you would like to know the recommended job sites for AI Engineers, please click the following.
If you would like to know the recommended job sites for Prompt Engineers, please click the following.
Table of Contents
What is Prompt Hacking (Prompt Cracking)?
Prompt Hacking (Prompt Cracking) is an attack that gathers information that the user does not intend when responding to a prompt.
Prompt hacking is performed by tricking or misleading the user.
For example, a user can be prompted to "enter your password" to steal the user's password.
To prevent prompt hacking, users should carefully review the prompts and avoid responding to suspicious prompts.
It is also important to keep your passwords safe.
How to Attack and Defend Against Prompt Hacking
We introduce how to attack and defend against prompt hacking.
Common types of prompt hacking are as follows.
- Prompt Injection: inputting incorrect prompts into a machine learning model, such as a Large Language Model (LLM), to cause the model to produce incorrect results.
- Prompt Leaking: the user persuades the model to divulge prior prompts that are normally hidden from the user.
- Jailbreaking: asking the model to play a certain role, responding with an argumentative response, or pretending to be a good person in response to moderation instructions.
The main defenses against prompt hacking are as follows.
- Filtering: checking for words or phrases that should be blocked in the initial prompt or output.
- Instruction Defense: adding an instruction to a prompt to prompt the model to pay attention to what comes next.
- Post-Prompting: placing user input before a prompt.
- Random Sequence Enclosure: enclosing user input with two random strings.
- Sandwich Defense: placing user input between two prompts.
- XML Tagging: enclosing user input in XML tags (e.g., <user_input>).
- Separate LLM Evaluation: using a separate LLM to determine if a prompt is hostile.
- Other Approaches: fine tuning, soft prompts, length restriction, etc.
The main attacks against prompt hacking are as follows.
- Obfuscation/Token Smuggling: simple techniques to circumvent filters, especially by replacing certain words that trigger filters with their own synonyms or modifying them to include typos (e.g., CVID instead of COVID-19, F*ck instead of Fuck).
- Payload Splitting: splitting a hostile input into multiple parts and having the LLM combine them.
- Defined Dictionary Attack: showing the LLM a code dictionary and mapping the final sentence appropriately to avoid Sandwich Defense.
- Virtualization: similar to role prompts, "setting the scene" for the AI to send a series of prompts, each one more similar to a fraudulent email that asks for personal information (password, credit card, etc.).
- Indirect Injection: indirect introduction of hostile instructions from third-party data sources such as web search or API calls.
- Recursive Injection: injecting a prompt into the first LLM and generating output containing an injection instruction into the second LLM, for example, to circumvent the Separate LLM Evaluation defense mechanism.
- Code Injection: allowing an attacker to execute arbitrary code (often Python) in an LLM.
Explanatory Articles about Prompt Hacking
Explanatory articles about Prompt Hacking are as follows.
- Prompt Injection | Learn Prompting
- Adversarial Prompting | Prompt Engineering Guide
- Prompt Injection Attacks: A New Frontier in Cybersecurity
Explanatory Videos about Prompt Hacking
Explanatory videos about Prompt Hacking are as follows.
Summary
We introduced "Prompt Hacking (Prompt Cracking)," an attack that collects information that the user does not intend when responding to a prompt.
Prompt hacking includes Prompt Injection, Prompt Leaking, and Jailbreaking.
If you would like to know the recommended job sites for AI Engineers, please click the following.
If you would like to know the recommended job sites for Prompt Engineers, please click the following.