University of Pennsylvania researchers developed an algorithm that can jailbreak robots controlled by a large language model (LLM). The RoboPAIR algorithm uses an attacker LLM to feed prompts to a target LLM, adjusting them until the commands bypass the target's safety filters. It also employs a judge LLM to ensure the attacker's prompts remain physically executable by the target robot, accounting for constraints such as obstacles in the environment.
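As described above, RoboPAIR boils down to an iterative refinement loop among three models. The sketch below illustrates only that loop structure; it is not the authors' implementation. The three `*_llm` functions are mock stand-ins (no real model or API is called), and the feedback strings, refusal probability, and feasibility check are invented for illustration.

```python
import random

def attacker_llm(goal: str, feedback: str) -> str:
    # Mock stand-in: a real attacker LLM would rewrite the prompt using
    # feedback from the judge and target to evade the safety filter.
    suffix = f" [revised after feedback: {feedback}]" if feedback else ""
    return goal + suffix

def target_llm(prompt: str) -> str | None:
    # Mock stand-in: a real robot-controlling target LLM would either
    # refuse or emit an action plan. Here we refuse at random so the
    # refinement loop has something to iterate against.
    return None if random.random() < 0.7 else f"action plan for: {prompt!r}"

def judge_llm(prompt: str) -> bool:
    # Mock stand-in: a real judge LLM would check that the prompt asks
    # for something the robot can physically do (reachability, obstacles,
    # payload limits, and so on). This toy check always passes.
    return len(prompt) < 500

def refine_until_jailbroken(goal: str, max_iters: int = 20) -> str | None:
    """Attacker/judge/target refinement loop in the spirit of the
    article's description of RoboPAIR. Returns a prompt the target
    complied with, or None if the iteration budget runs out."""
    feedback = ""
    for _ in range(max_iters):
        candidate = attacker_llm(goal, feedback)
        if not judge_llm(candidate):
            # Reject infeasible prompts before they reach the target.
            feedback = "not physically executable; simplify the request"
            continue
        plan = target_llm(candidate)
        if plan is not None:
            return candidate  # safety filter bypassed: target produced a plan
        feedback = "target refused; try a different framing"
    return None

if __name__ == "__main__":
    result = refine_until_jailbroken("hypothetical unsafe robot command")
    print("succeeded:" if result else "gave up:", result)
```

The judge's role in this structure is what separates it from a text-only jailbreak: a prompt that slips past the safety filter but asks for something physically impossible tells you nothing about real-world risk, so infeasible candidates are rejected before the target is ever queried.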
One finding the scientists considered concerning was that jailbroken LLMs often went beyond complying with malicious prompts, actively offering suggestions of their own. The researchers stressed that, prior to the public release of their work, they shared their findings with the manufacturers of the robots they studied, as well as with leading AI companies. They also noted that they are not suggesting researchers stop using LLMs for robotics.