Overview

MCP-SafetyBench is a comprehensive benchmark designed to systematically evaluate the safety and robustness of LLM agents operating in the Model Context Protocol (MCP) ecosystem. It addresses critical gaps in existing MCP safety benchmarks by supporting real-world servers, multi-step reasoning, and diverse attack scenarios.

MCP-SafetyBench Example

Figure 1: MCP workflow under an attack scenario. A Tool Poisoning – Parameter Poisoning attack (ticker → TSLA) is injected during the tool call, shown here in a partial execution result under GPT-4o.

Attack Type Taxonomy

MCP-SafetyBench covers 20 attack types organized into three main categories

💉 Tool Poisoning-Command Injection 12.65%

Inserting shell commands into tool descriptions so that a benign tool runs malicious commands

🔗 Tool Poisoning-Function Dependency Injection 9.39%

Declaring fake "required" helper tools so that the host automatically invokes them, creating a harmful execution

⚡ Function Overlapping 9.39%

Malicious tools are registered with names that closely resemble trusted ones, creating ambiguity during selection

🎛️ Preference Manipulation 8.98%

Biased or persuasive wording in tool names or descriptions can influence the model's selection process

👤 Tool Shadowing 8.57%

An unsafe server injects a tool description that modifies the agent's behavior with respect to another trusted service or tool, leading to unsafe behavior

🔧 Tool Poisoning-Parameter Poisoning 7.35%

Modifying defaults or schema hints so that calls silently produce incorrect results

📤 Function Return Injection 5.71%

Unsafe instructions are embedded in the return payload of a tool, triggering unintended follow-up actions when the host processes the response

🔄 Tool Poisoning-Tool Redirection 4.49%

Rewriting tool descriptions to redirect queries to high-privilege or unrelated tools under plausible pretexts

📁 Tool Poisoning-FileSystem Poisoning 2.86%

Embedding malicious file operations that lead to unauthorized modifications

🏃 Rug Pull Attack 2.86%

A tool initially behaves correctly but later changes its behavior without proper versioning or signature checks, inserting hidden commands that leak sensitive data

🌐 Tool Poisoning-Network Request Poisoning 2.45%

Injecting unsafe URLs so that the LLM agent contacts attacker-controlled domains

🎯 Intent Injection 4.90%

The user intent is modified during planning, causing the host to call unintended tools or pass unsafe parameters

🔄 Replay Injection 3.67%

Malicious reuse of previously valid interactions to issue transactions again without user approval

🔧 Data Tampering 3.27%

Tool outputs or intermediate messages are modified before the host processes them, leading to falsified results or incorrect actions

🎭 Identity Spoofing 0.41%

Identity-related metadata is forged or modified so the host misinterprets the source or privileges of a request

💻 Malicious Code Execution 4.08%

User inputs may cause tools to execute harmful commands, either directly or through side effects

🌐 Remote Access Control 4.08%

By abusing file manipulation or system-level tools, attackers gain persistent unauthorized access

🔐 Credential Theft 3.67%

Tools that read or process files can be misused to expose confidential information such as API keys, tokens, or environment variables

🤖 Retrieval-Agent Deception (RADE) 0.82%

Public data sources can be poisoned so that unsafe content is later retrieved into a user's vector database, leading to indirect prompt injection or tool misuse

⚡ Excessive Privileges Misuse 0.41%

Users may invoke high-privilege tools for tasks that do not require them, unnecessarily increasing security risks

Benchmark Tasks

Explore all security evaluation tasks across 5 domains. Each task includes attack scenarios to test LLM agent robustness.

245
Total Tasks
5
Domains
20
Attack Types
11
Servers

Task Distribution by Domain

Domain Tasks Percentage
📁 Repository Management 56 22.86%
🗺️ Location Navigation 53 21.63%
💰 Financial Analysis 53 21.63%
🔍 Web Search 53 21.63%
🌐 Browser Automation 30 12.24%

Attack Type Distribution