AI-Supported Cyber Attacks on North Sea Maritime Infrastructure

Transformative Technology:

Cyber Security

Semester programme:

Cyber Security Professional

Research group:

Cyber Security

Project group members:

Aleksandar Yanakiev
Georgi Ananiev
Alec Schmitz
Ruben de Bruijn
Reinout Perquin

Project description

How can AI technologies be effectively weaponized to simulate realistic cyber-attacks on North-Sea-like maritime and offshore cyber-physical infrastructure, how do different AI tiers compare in terms of capability, speed, and detectability, and what defensive implications do these scenarios reveal?

North Sea critical infrastructure — vessels, ports, offshore wind parks, drilling platforms, and undersea cables — increasingly depends on Operational Technology (OT) and Industrial Control Systems (ICS) that were never designed with cybersecurity in mind. Meanwhile, AI is rapidly lowering the technical barrier for conducting sophisticated cyber-attacks.

This project investigates that intersection: five team members each built and tested autonomous AI agents (commercial LLMs, open-source models, and dedicated AI pentesting platforms) against a simulated OT/ICS environment to determine whether AI can independently discover, exploit, and manipulate industrial control systems — and how the resulting threat should inform defenders.

Context

Domain: Cybersecurity / Critical Infrastructure / Operational Technology

North Sea maritime and offshore energy infrastructure — vessels, harbours, offshore wind parks, drilling platforms, and undersea cables — is increasingly dependent on Operational Technology (OT) and Industrial Control Systems (ICS). These systems were historically designed for reliability and availability rather than security, and many still lack basic protections such as authentication, encryption, or network segmentation.

At the same time, artificial intelligence is rapidly changing the offensive cybersecurity landscape. Capable AI agents can now autonomously perform reconnaissance, identify vulnerabilities, and execute multi-step attacks with minimal human guidance — significantly lowering the technical skill and time investment previously required to target complex industrial systems.

This project sits at the intersection of these two developments. It investigates how AI-augmented attack techniques can be applied to a realistic, simulated OT/ICS environment representative of North Sea infrastructure, hosted in an isolated university lab network. The research explores multiple AI agent architectures — ranging from commercial language models to dedicated autonomous penetration-testing platforms — connected to real offensive security toolchains.

The goal is twofold: to determine how effectively such AI agents can independently discover and exploit industrial control vulnerabilities, and to compare different AI tiers in terms of capability, autonomy, and reliability, in order to inform defensive recommendations for operators of similar critical infrastructure.

Results

The project produced both a reusable technical architecture and a set of concrete security findings, validated through repeated, reproducible experiments against a simulated industrial control environment.

Architecture and Technical Outcome

A core outcome is a model-agnostic architecture that connects AI agents to a real penetration-testing toolchain through a standardised tool-calling protocol. This decouples the AI reasoning layer from the tool execution layer, meaning any compatible AI agent — commercial, open-source, or purpose-built — can be plugged into the same toolchain and tested against the same target under identical conditions. This architecture was independently adopted and extended by multiple team members across different AI platforms, confirming its generalisability beyond a single implementation.
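The separation between the reasoning layer and the tool execution layer can be illustrated with a minimal Python sketch. It is not the project's implementation: the names `ToolCall`, `Agent`, `ToolRegistry`, `run_session`, and `ScriptedAgent` are hypothetical, the only registered tool is a harmless `echo` placeholder, and a scripted stub stands in for an LLM-backed agent. The point is only that any agent exposing the same interface can be driven against the same registry under the same step budget.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Protocol


@dataclass(frozen=True)
class ToolCall:
    """A request from the reasoning layer: which tool to run, with which arguments."""
    tool: str
    args: dict


class Agent(Protocol):
    """Hypothetical interface: pick the next tool call from the history so far."""
    def next_call(self, history: list[tuple[ToolCall, str]]) -> ToolCall | None: ...


@dataclass
class ToolRegistry:
    """Execution layer: maps tool names to plain callables that return text."""
    tools: dict[str, Callable[..., str]] = field(default_factory=dict)

    def register(self, name: str, fn: Callable[..., str]) -> None:
        self.tools[name] = fn

    def execute(self, call: ToolCall) -> str:
        # Unknown tools are reported back to the agent instead of raising.
        if call.tool not in self.tools:
            return f"error: unknown tool {call.tool!r}"
        return self.tools[call.tool](**call.args)


def run_session(agent: Agent, registry: ToolRegistry, max_steps: int = 10):
    """Drive any agent against the same registry with the same step budget."""
    history: list[tuple[ToolCall, str]] = []
    for _ in range(max_steps):
        call = agent.next_call(history)
        if call is None:  # the agent signals that it is finished
            break
        history.append((call, registry.execute(call)))
    return history


class ScriptedAgent:
    """Stand-in for an LLM-backed agent: replays a fixed list of calls."""
    def __init__(self, script):
        self._script = list(script)

    def next_call(self, history):
        step = len(history)
        return self._script[step] if step < len(self._script) else None


registry = ToolRegistry()
registry.register("echo", lambda text: f"echo: {text}")  # harmless placeholder tool
agent = ScriptedAgent([ToolCall("echo", {"text": "hello"}), ToolCall("missing", {})])
history = run_session(agent, registry)
```

Because `run_session` only depends on the `Agent` interface, swapping `ScriptedAgent` for a different agent leaves the registry and the step budget unchanged, which is the property that makes cross-agent comparison on identical targets possible.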

Demonstrated Capability

Using this architecture, multiple AI agents were shown to autonomously conduct structured penetration tests against a simulated OT/ICS environment from a single natural-language instruction, with no further human guidance during the attack. The capabilities shown included full network reconnaissance, identification of default credentials granting administrative control over industrial management interfaces, and, in the most advanced case, autonomous reading and writing of live industrial protocol register values, which shows that an agent can directly manipulate a simulated physical process.

Comparative Insights

A central insight of the project is the measurable capability gap between AI tiers. Commercial frontier models augmented with domain-specific knowledge consistently outperformed smaller open-source models, both in the completeness of vulnerabilities discovered and in the quality of structured reporting produced. Some agents also demonstrated unscripted safety-relevant reasoning, such as correctly identifying false-positive results and independently enforcing stricter scope boundaries than instructed. This behaviour is relevant to both offensive capability and operational safety assessment. A comparative matrix consolidating these results across all tested AI agents and target systems was produced as a structured deliverable.
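One way such a comparative matrix could be assembled from per-run records is sketched below. The layout is an assumption, not the project's actual deliverable: the stage ordering in `STAGES` is invented for illustration, and the agent names, target names, and outcomes are placeholders rather than project results.

```python
from collections import defaultdict

# Assumed ordering of how far a run progressed (illustrative only).
STAGES = ["none", "recon", "credentials", "register_write"]


def build_matrix(runs):
    """Map (agent, target) to the furthest stage reached in any run, plus run count."""
    cells = defaultdict(lambda: {"best": "none", "runs": 0})
    for agent, target, stage in runs:
        entry = cells[(agent, target)]
        entry["runs"] += 1
        if STAGES.index(stage) > STAGES.index(entry["best"]):
            entry["best"] = stage
    return dict(cells)


def render(matrix):
    """Render the matrix as CSV text: one row per agent, one column per target."""
    agents = sorted({agent for agent, _ in matrix})
    targets = sorted({target for _, target in matrix})
    lines = ["agent," + ",".join(targets)]
    for agent in agents:
        # "-" marks agent/target pairs that were never tested.
        row = [matrix.get((agent, t), {"best": "-"})["best"] for t in targets]
        lines.append(agent + "," + ",".join(row))
    return "\n".join(lines)


# Placeholder records, NOT project data: (agent, target, furthest stage reached).
runs = [
    ("agent_a", "target_1", "recon"),
    ("agent_a", "target_1", "credentials"),
    ("agent_b", "target_1", "recon"),
    ("agent_a", "target_2", "none"),
]
matrix = build_matrix(runs)
table = render(matrix)
```

Keeping the run count next to the best stage makes it visible when a cell rests on a single run rather than on repeated tests.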

Validation and TRL Positioning

All experiments were conducted against a purpose-built, encapsulated ICS simulation environment designed to closely emulate real industrial protocols and control system behaviour, rather than against a purely theoretical or abstract model. Findings were validated through repeated test runs, cross-agent comparison on identical targets, and documented evidence (e.g. before/after verification of modified system values).
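The before/after verification mentioned above can be sketched as a small evidence record. This is a sketch under stated assumptions, not the project's tooling: `SimulatedRegisters` is a hypothetical in-memory stand-in for a 16-bit register bank with no network I/O, and `WriteEvidence` and `write_with_evidence` are illustrative names.

```python
from dataclasses import asdict, dataclass
import hashlib
import json


class SimulatedRegisters:
    """In-memory stand-in for a bank of 16-bit registers (no network I/O)."""

    def __init__(self, size: int) -> None:
        self._values = [0] * size

    def _check(self, address: int) -> None:
        if not 0 <= address < len(self._values):
            raise IndexError(f"address {address} out of range")

    def read(self, address: int) -> int:
        self._check(address)
        return self._values[address]

    def write(self, address: int, value: int) -> None:
        self._check(address)
        if not 0 <= value <= 0xFFFF:
            raise ValueError("register values are 16-bit unsigned")
        self._values[address] = value


@dataclass(frozen=True)
class WriteEvidence:
    """Record of one write: value before, value requested, value read back."""
    address: int
    before: int
    requested: int
    after: int

    @property
    def verified(self) -> bool:
        # The write counts as verified only if the read-back matches the request.
        return self.after == self.requested

    def digest(self) -> str:
        # Stable hash of the record, so later edits to stored evidence are detectable.
        payload = json.dumps(asdict(self), sort_keys=True).encode()
        return hashlib.sha256(payload).hexdigest()


def write_with_evidence(bank: SimulatedRegisters, address: int, value: int) -> WriteEvidence:
    """Read, write, read back, and return the three values as one record."""
    before = bank.read(address)
    bank.write(address, value)
    after = bank.read(address)
    return WriteEvidence(address, before, value, after)


bank = SimulatedRegisters(size=8)
evidence = write_with_evidence(bank, address=3, value=1200)
```

Storing the requested value separately from the read-back value distinguishes a write that was accepted from one that was silently ignored or clamped by the target.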

Because this validation took place in a realistic but controlled, simulated environment rather than in an operational deployment, the project is best positioned at Technology Readiness Level (TRL) 4, approaching TRL 5: the core technology and methodology have been validated in a laboratory environment that closely mirrors real-world conditions, with initial validation steps taken toward a more operationally representative setting. Further work in a more representative, operational-adjacent environment would be required to advance toward TRL 5–6.