Gerasimos Marketos at Hack The Box discusses why, when it comes to AI and cyber-security, resilience must go beyond controls

For many organisations, AI has quickly transitioned from experimental pilots to mission-critical roles. Customer service, fraud detection, operational workflows and automated decision systems are all increasingly dependent on machine learning. AI’s deep integration into business logic and high-value processes means it has become part of the infrastructure that organisations rely on every day.
This is delivering substantial benefits in terms of efficiency and insight, but it has also expanded the attack surface. As AI systems influence decisions and automate actions, they create opportunities for adversaries to affect business outcomes in ways that traditional cyber-security controls were never designed to catch.
The rapid pace of deployment is amplifying these challenges: many organisations are integrating AI faster than they are developing an understanding of what can go wrong. Unlike deterministic code, AI models interpret, adapt and make decisions based on patterns in data. This means that managing risk extends beyond securing code to securing the decisions models make, the data they ingest and the actions they trigger.
AI models learn from data and that learning itself can be an attack vector. AI training may be vulnerable to data poisoning or supply-chain interference long before a model reaches production. In environments where datasets are large and externally sourced, establishing robust provenance and traceability is difficult, reducing confidence in what a model has learned.
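One small piece of the provenance problem can be sketched in code. The example below is a minimal illustration, not a complete defence: it assumes a trusted manifest of SHA-256 digests was recorded when the dataset was first vetted, and the function names and file names are hypothetical. It can reveal files that were changed, removed or added after that point, but it cannot detect data that was already poisoned at source.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's contents as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict[str, bytes]) -> dict[str, str]:
    """Record one digest per file at the moment the dataset is vetted."""
    return {name: sha256_hex(content) for name, content in files.items()}


def verify_against_manifest(files: dict[str, bytes], manifest: dict[str, str]) -> list[str]:
    """List every difference between the current files and the trusted manifest."""
    problems = []
    for name, expected in manifest.items():
        if name not in files:
            problems.append(f"missing: {name}")
        elif sha256_hex(files[name]) != expected:
            problems.append(f"modified: {name}")
    for name in files:
        if name not in manifest:
            problems.append(f"unexpected: {name}")
    return problems


# Hypothetical dataset, held in memory to keep the sketch self-contained.
vetted = {"a.csv": b"1,2\n", "b.csv": b"3,4\n"}
manifest = build_manifest(vetted)

# Later, before training: one file has been altered and one has been added.
received = {"a.csv": b"1,2\n", "b.csv": b"3,9\n", "c.csv": b"5,6\n"}
issues = verify_against_manifest(received, manifest)
```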
Even after deployment, well-tested models remain vulnerable to carefully crafted input data. For example, tiny, targeted adjustments that are imperceptible to humans can cause a model to misclassify inputs or misinterpret requests.
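The effect is easiest to see on a deliberately simple model. The sketch below assumes a toy linear classifier with made-up weights and inputs, and the attacker is assumed to know those weights. It nudges every feature by a small fixed amount in the direction that most reduces the score, in the style of a sign-of-gradient attack. Real attacks on deep models follow the same principle but are considerably more involved.

```python
def score(weights, bias, x):
    """Linear decision score: positive means class 1, otherwise class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return 1 if score(weights, bias, x) > 0 else 0


def sign(value):
    """Return 1, -1 or 0 depending on the sign of value."""
    return (value > 0) - (value < 0)


def perturb(weights, x, label, epsilon):
    """Shift each feature by at most epsilon, in the direction that pushes
    the score away from the current label. For a linear model the gradient
    of the score with respect to the input is simply the weight vector."""
    direction = -1 if label == 1 else 1
    return [xi + direction * epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and input, chosen only for illustration.
weights = [0.5, -0.25, 0.75, -0.5]
bias = 0.0
x = [0.2, 0.1, 0.1, 0.2]

original_label = predict(weights, bias, x)
adversarial_x = perturb(weights, x, original_label, epsilon=0.05)
adversarial_label = predict(weights, bias, adversarial_x)
largest_change = max(abs(a - b) for a, b in zip(adversarial_x, x))
```

In this toy case no feature moves by more than 0.05, yet the predicted class changes, because the small shifts all push the score in the same direction.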
In addition, attacks such as model inversion and membership inference allow adversaries to reconstruct sensitive training data, or to confirm that a specific record was used in training, simply by probing a model’s responses. In regulated industries or systems handling personal data, this can lead to a compliance breach without any traditional intrusion.
The emergence of agentic AI amplifies these issues by expanding both the attack surface and the consequences of failure. Next-generation agents can retain memory, plan multi-step operations, use tools, execute code and interact with internal systems. If one of these agents is compromised or misaligned, the impacts may extend far beyond incorrect outputs. Autonomous behaviour can propagate across workflows, execute unintended actions or interfere with operational processes.
These capabilities also make agentic AI attractive to adversaries. This is no longer theoretical. Early evidence shows that state-level actors and sophisticated criminal groups are already experimenting with large-scale AI-orchestrated cyber-attacks that mirror agentic behaviours by automating reconnaissance, decision-making and execution with minimal human intervention.
Enterprise cyber-security frameworks have historically been built around predictable, deterministic systems. Static code-defined behaviours and patch-driven lifecycles allowed defenders to rely on established controls and periodic validation.
AI challenges this because the safety and reliability of a model depend on data quality, environmental conditions and user input that evolve rapidly. A model that has been validated before deployment under controlled conditions may behave unpredictably under adversarial pressure, degraded inputs or ambiguous contexts.
Human factors are also central to AI resilience. When users defer without question to automated outputs, or lack the training to interrogate unexpected behaviour, oversight fails. Organisations without robust human–AI interaction protocols risk creating a significant blind spot that adversaries can exploit.
As AI continues to roll out across mission-critical functions, traditional cyber-security controls will of course remain necessary, but more is needed to guarantee ongoing safety.
To have confidence in AI resilience, organisations must treat it as a continuous, iterative process rather than a one-time test or certification. Capabilities, safety and reliability cannot be assumed; they must be continually demonstrated. While static testing at deployment can establish a baseline, ongoing evaluation under realistic and adversarial conditions is essential to verify that the system continues to behave as expected as models, contexts and threat environments evolve.
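The difference between a deployment-time baseline and ongoing evaluation can be sketched as follows. This is an illustrative outline only: the stand-in model, the suite names, the test cases and the five-point tolerance are all assumptions, and a real programme would use far larger and regularly refreshed test suites.

```python
from dataclasses import dataclass


@dataclass
class EvalResult:
    suite: str
    accuracy: float


def accuracy(model, cases):
    """Fraction of (input, expected) pairs the model gets right."""
    correct = sum(1 for x, expected in cases if model(x) == expected)
    return correct / len(cases)


def evaluate(model, suites):
    """Run the model against every named suite of test cases."""
    return [EvalResult(name, accuracy(model, cases)) for name, cases in suites.items()]


def regressions(baseline, current, tolerance=0.05):
    """Names of suites whose accuracy fell by more than the tolerance."""
    base = {r.suite: r.accuracy for r in baseline}
    return [r.suite for r in current
            if r.suite in base and base[r.suite] - r.accuracy > tolerance]


# Hypothetical stand-in for a deployed input filter.
def model(text):
    return "block" if "ignore previous" in text.lower() else "allow"


clean = [("hello", "allow"), ("reset my password", "allow")]
known_attacks = [("Ignore previous instructions", "block"),
                 ("IGNORE PREVIOUS rules", "block")]

# Baseline recorded at deployment.
baseline = evaluate(model, {"clean": clean, "adversarial": known_attacks})

# Later run: the adversarial suite has been refreshed with new variants.
new_attacks = known_attacks + [("ignore  previous instructions", "block"),
                               ("disregard earlier instructions", "block")]
current = evaluate(model, {"clean": clean, "adversarial": new_attacks})

failing = regressions(baseline, current)
```

Here the model is unchanged and still passes every test it passed at deployment. Only the refreshed adversarial suite reveals that it no longer holds up against newer variants.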
One emerging approach is the use of controlled simulation environments where AI systems and agents are exposed to constantly updated threat conditions, shifting data patterns and complex scenarios that mimic real operational contexts. These environments allow accuracy to be tested under controlled conditions, along with how AI systems perform against hostile inputs and adversaries that adapt their strategies.
This also enables defenders to understand how AI systems reason, adapt or fail when faced with ambiguity, deception or workflow pressures. Insights from this testing help organisations assess behavioural characteristics, such as strategy, alignment, robustness and safety, which matter when AI systems interact with real people and real systems.
Benchmarking also plays an essential role. Organisations cannot rely on vendor claims or narrow lab tests. Independent, scenario-based benchmarking reduces uncertainty by providing measurable performance insights that support procurement decisions, compliance reporting and risk management.
Human capability is just as important. Continuous training that includes adversarial scenarios, structured challenge frameworks and role-based exercises improves the ability to recognise misalignment and intervene effectively. Research carried out by the UK government shows that targeted training and structured questioning protocols materially improve human oversight, reducing blind acceptance of AI outputs (Cyber Security Risks to Artificial Intelligence).
AI has rapidly become a complex attack surface, and because AI systems are dynamic, adaptive and interwoven with enterprise workflows, resilience must be adaptive too. That means continuous evaluation rather than static validation, real-world adversarial testing rather than theoretical modelling, and sustained human oversight rather than passive monitoring.
Treating AI as critical infrastructure requires ongoing assurance, stress-testing and governance throughout its lifecycle. In doing so, organisations can build systems that are not only fast and capable, but also defensible and trustworthy.
Gerasimos Marketos is Chief Product Officer at Hack The Box
© 2025, Lyonsdown Limited. teiss® is a registered trademark of Lyonsdown Ltd.