What This New AI Security Paper Means for Admin Work

What This New AI Security Paper Means for Admin Work
Photo by Philipp Katzenberger / Unsplash

There is a version of AI that feels especially exciting for operators, admins, and executive assistants: not just a chatbot that gives ideas, but an assistant that can actually do things. OpenClaw is positioned exactly that way — a personal AI assistant that can clear your inbox, send emails, manage your calendar, browse the web, and remember your preferences over time. For anyone in admin work, that sounds less like a novelty and more like the future of operations.

But a new arXiv paper adds an important layer to that conversation. In From Assistant to Double Agent, researchers studied security risks in personalized AI agents like OpenClaw and found that the real danger is not just “bad answers.” It is bad actions: unsafe tool use, private information leakage, and long-term memory manipulation. The paper introduces a benchmark called PASB to test those risks in more realistic, real-world conditions.

That distinction matters a lot for admin work.

If an AI gives you a weak summary, that is annoying. If an AI drafts the wrong email, that is fixable. But if an AI assistant can read your inbox, access documents, use tools, store memory, and take action across multiple steps, then the risk changes completely. At that point, you are no longer dealing with a writing assistant. You are dealing with an operator with permissions. That is exactly the shift this paper is warning about.

The biggest takeaway: admin work is high-value, but also high-risk

The paper argues that personalized AI agents are especially exposed because they combine three things at once: persistent operation over time, access to private user context, and high-privilege tools. In plain English, that means the assistant is not only smart — it also remembers, it has access, and it can act. That combination is powerful, but it also expands the attack surface.

And that is basically the core of modern admin work.

Admin workflows are full of tasks like:
reviewing messages,
scheduling meetings,
handling files,
following up with contacts,
summarizing information from external sources,
and moving small decisions forward without constant supervision.

Those are exactly the kinds of workflows where a capable AI assistant can create real leverage. They are also exactly the kinds of workflows where one bad instruction, one malicious attachment, or one poisoned memory can quietly create a problem. This is an inference from the paper’s threat model and OpenClaw’s documented capabilities, but it is a very practical one.

What can actually go wrong in admin workflows?

The paper focuses on several attack paths that are easy to translate into day-to-day operations.

One is indirect prompt injection. That means the user’s request might be perfectly normal, but the AI reads some outside content — a web page, post, message, comment, or tool output — containing malicious instructions, and those instructions affect what it does next.

For admin work, that could look like this:

You ask your AI assistant to review supplier messages, summarize the important ones, and draft follow-ups. One of those messages contains hidden or manipulative instructions aimed at the AI rather than at you. If the system is too trusting, the AI may follow the attacker’s instructions instead of yours.

Another risk is tool-return deception. The paper describes cases where manipulated tool outputs get fed back into the system and shape later behavior. In operations terms, that means the AI may trust what a tool, plugin, integration, or connected service says — even when that output is misleading or malicious.

Then there is the memory problem.

The researchers found that long-term memory can be particularly vulnerable. In their results, long-term memory extraction success rates were consistently higher than short-term memory extraction rates, and long-term memory modification write success rates were also substantial across models. That matters because memory is exactly what makes an AI assistant useful in admin work: preferences, recurring contacts, client context, SOPs, priorities, and patterns. If that memory can be read or altered, the assistant can become reliably wrong over time instead of just wrong once.

The numbers are hard to ignore

In the paper’s experiments, combined attacks reached attack success rates as high as 66.8% on one model, 61.9% on another, and 52.7% on a third when no defense was applied. Even after defensive techniques were added, the attack success rate was reduced but not eliminated. In the strongest listed defense setting for that table, residual success still remained in the 10.5% to 22.0% range depending on model and attack type.

For memory extraction, long-term memory leakage rates in the no-defense setting ranged from 54.0% to 62.5% across the tested models. For long-term memory modification, write success rates ranged from 60.4% to 71.5% without defense. Defenses helped, but again, they did not reduce risk to zero.

This is also worth noting: the paper is an arXiv preprint, not a final peer-reviewed journal article, and the case study centers on OpenClaw specifically. Still, the broader lesson is larger than one tool. If an AI system has ongoing memory, external tool access, and permission to act on your behalf, the operational risks described here are not niche. They are structural.

This does not mean “don’t use AI for admin work”

It means: stop treating AI like a harmless intern trapped in a chat window.

A better mental model is this: an AI admin assistant is a junior operator with system access. It can move quickly, reduce repetitive work, and increase throughput. But it still needs boundaries, review layers, and permission design.

That is why the most responsible way to use AI in admin work is not “full autonomy everywhere.” It is structured delegation.

Think:
let AI sort and summarize first,
but require approval before sending,
limit what it can access by default,
separate reading from acting,
avoid letting one system both consume untrusted content and take immediate high-impact action,
and treat memory like a sensitive asset rather than a convenience feature.

Those recommendations are a practical extension of the paper’s findings, and they also fit OpenClaw’s own emphasis on security, safe defaults, and making critical permission decisions explicit during setup.

What this means for the future of admin work

The future of admin work is not less human. It is more leveraged.

The human role becomes more valuable when the machine can handle triage, drafts, categorization, reminders, and process support — but only if someone designs the workflow intelligently. The real skill is no longer just execution. It is judgment: deciding what should be automated, what should be reviewed, and what should never be handed over blindly.

That is why papers like this matter.

They remind us that the promise of AI in operations is real, but so is the responsibility. The question is not whether AI can help with admin work. It clearly can. The question is whether we are building admin systems that are efficient and trustworthy.

That is the real bar.

Read more