Introduction
"OT is IT plus physics" ~ Robert M. Lee
If we accept Rob’s succinct formula that OT is IT plus physics, then what is an OT pentest? Even amongst my own team, whose primary role is executing OT pentests across a wide variety of ICS industry verticals, the answer is constantly in flux as we strive to define and refine our profession.
These are the qualities I believe constitute an OT pentest:
An OT penetration test is an offensive security assessment that blends aspects of traditional penetration testing and modern red teaming in an ICS/OT environment. It is scoped to a specific site or system and assesses the people, processes, and technologies supporting the operation of that site or system. It is executed with clearly defined objectives aligned with the ICS Cyber Kill Chain,1 with a primary objective of demonstrating the ability to impact physical process control. It is most effective when executed from an assumed breach perspective as a full-knowledge, white-box assessment. It should leverage intelligence-driven threat scenarios and apply real-world adversary techniques to achieve these objectives. It may also serve secondary goals, such as informing and enhancing detection engineering or supporting purple team activities.
I don’t claim any absolute authority on the matter. Many out there have been doing this far longer, and I am deeply indebted to them for their lessons and insight, for the paths they’ve blazed and the markers they’ve left behind. My primary position is simply that OT pentesting isn’t well understood or coherently defined by asset owners and operators or service professionals. Nearly everyone I ask offers a different definition, which suggests an opportunity for alignment or, at the very least, more dialogue on the matter.
My director recently asked me, “Is pentesting a science or an art?” I am convinced it involves both. Methodology and style both play a role. Depending on perspective and experience, most practitioners’ answers will land somewhere on a spectrum between these two poles, though rarely at the same point, which is why it makes sense that we still lack a cohesive, coherent definition. That said, I believe some approaches to OT pentesting are inherently more valuable because they are more effective. They are more effective because they focus on the right objectives and goals and, because of this, generate far more valuable insights and outcomes.
This makes the question of what constitutes an effective OT pentest an inquiry worth exploring. This essay is an attempt to do so. You’re welcome to take my definition at face value, but I hope you’re skeptical. I hope that if we were discussing this face to face, you’d challenge me to defend my sprawling paragraph of a definition above. Why not a neat, single sentence? In the immortal words of Deckard Cain: “Stay a while, and listen.”
Background
Penetration testing has evolved significantly since the Department of Defense sponsored the first offensive “tiger teams” in the 1970s2. However, at its core, it has always represented some form of adversary emulation and generally seeks to answer these basic questions:
What vulnerabilities exist in my environment that could be exploited by an adversary?
How would that adversary exploit them?
What would the impact be in my environment if the adversary were to exploit them?
How can I address the vulnerabilities so the adversary can’t exploit them?
Newer questions representing inflection points where I believe modern pentesting has begun to absorb aspects of purple teaming and detection engineering include the following:
Can I detect the adversary trying to exploit the vulnerabilities?
Are the detection mechanisms I have in place working as expected?
Contemporary iterations of offensive security assessments have evolved to focus primarily on applying the Tactics, Techniques, and Procedures (TTPs) used by attackers in real-world intrusions and ensuring proper detection and prevention solutions are in place. The boundaries between penetration testing, red team engagements, purple team exercises, adversarial simulation, and detection engineering have become, in my opinion, more individually defined and collectively blurry as each particular domain and the wider security industry have matured. I believe the most effective offensive security engagement takes the best parts of each domain and wields them to have the greatest and most beneficial impact for all stakeholders involved. Others might disagree. I am aware, for better or worse, that I have a habit of hybridization — towards both/and perspectives rather than either/or answers.
Industry Terms
Before continuing, let’s review the similarities and differences between penetration testing, red teaming, purple teaming, and adversary simulation. We’ll also briefly explore the characteristics of other key concepts like assumed breach, cyber threat intelligence, and detection engineering. In my experience, the most effective and valuable OT pentests incorporate elements from all of these practices but should always blend aspects of traditional penetration testing with modern red teaming.
Penetration Testing: Penetration testing seeks to identify and exploit as many vulnerabilities as possible in a narrowly scoped set of systems to inform and prioritize remediation activities. By actively exploiting vulnerabilities, a penetration tester can prove their existence and demonstrate their potential impact. Pentests are goal-based, with the desired outcome being a qualitative perspective on vulnerabilities otherwise lost in an assessment that does not involve active exploitation. Penetration tests are point-in-time assessments, and they provide insight into the current security posture of the target environment. Penetration tests can be performed from a white box, grey box, or black box perspective, which are gradient descriptors of how much prior knowledge is shared about the environment under assessment.
Red Teaming: Red teaming seeks to provide a more realistic view of an organization's overall security posture by emulating adversary TTPs across the entire organization to pursue defined engagement objectives, such as beginning outside an organization’s perimeter and gaining access to internal networks, or beginning with the permissions of a standard user and escalating privileges such that they can access or affect privileged systems or processes. Much like pentesting, red team engagements involve goals as well. The primary difference is that red team engagements seek to assess an organization’s people, processes, and technology, and they do this by pursuing engagement objectives with a special focus on the last three layers of the NIST CSF (Detect, Respond, and Recover).3 The question is not simply, “Are there vulnerabilities in your environment?” More importantly, the red team asks, “Can you detect and respond when a modern adversary exploits those vulnerabilities?” Red team assessments are objective-based and often involve longer time frames, threat intelligence-based scenarios, and wider engagement scopes. They tend to be covert operations, with few stakeholders aware of the engagement outside the red team.
When I worked at an electric power utility, I leaned heavily on a 2015 article by Raphael Mudge to get buy-in from leadership to expand our existing pentest program to include red teaming. In this article, Mudge describes four models of red team operations: full-scope penetration tests, long-term operations, war games, and attack simulations. He describes full-scope penetration tests as mimicking “the targeted attack process an external actor executes to break into an organization.” However, these may still be constrained regarding time, resources, and legal compliance. Long-term operations are unconstrained by time frame and are based on the idea “that no organizational unit exists in isolation,” and “allow the red team to work towards the ‘perfect knowledge’ that a long-term embedded adversary would have.” War games are red vs. blue exercises intended “to train and evaluate network defense staff.” Mudge defines attack simulations as exercises in which the red team designs and executes “realistic observable activity” associated with “a plausible scenario, a representative or real actor, and a realistic timeline,” to “validate procedures and train blue operators to use them.”4
Purple Teaming: In 2023, Jorge Orchilles published Version 3 of the Purple Team Exercise Framework. In it, he defines a purple team exercise as “a full knowledge information security assessment where attendees [from red, blue, and CTI teams] collaborate to attack, detect, and respond.” Purple team exercises involve open, collaborative discussions about TTPs and defenses to “test, measure, and improve people, processes, and technology in real-time.”5 It’s a phenomenal framework, and I hope one day to develop something similar for OT pentesting.
Adversary Simulation: Also known as adversary emulation, this practice involves replicating known adversary TTPs to validate existing detection and response capabilities. It is effectively Mudge’s attack simulation model. I note it as a separate practice because various vendors have since developed it into standalone service offerings or automated platforms, such as MITRE’s Caldera6 or Red Canary’s Atomic Red Team7 projects.
Assumed Breach: Assumed breach is an offensive security assessment paradigm (applicable to both pentest and red team engagements) in which the target “cedes access”8 to represent an adversary that has already obtained a foothold in the environment. This greatly enhances engagement efficiency by allowing the testing team to skip the initial access step, which, while important, is often time-consuming. Assumed breach engagements shift focus from perimeter controls to organizational and (in the case of OT pentesting) physical process control impact. An assumed breach foothold also anchors the development of a realistic attack path that describes adversary TTPs as they progress from initial access to their primary objective.
Cyber Threat Intelligence: Cyber Threat Intelligence (CTI) is “analyzed information about the intent, opportunity, and capability of malicious actors.” CTI teams analyze adversary intrusions to extract insights that meet an organization’s needs for information to inform “actionable strategic and tactical choices that impact security.”9 Threat intelligence is often delivered through intelligence reports, Indicators of Compromise (IOCs), or TTPs.
Detection Engineering: The process and practice of manipulating threat data to create capabilities for defenders to detect threats and accurately identify, triage, and respond to adversary TTPs.
The Issue
As we see in the preceding descriptions, the objectives and goals of each practice play a huge role in structuring and guiding these differentiated approaches to offensive security.
Let’s circle back to our starting question: if OT is IT plus physics, what is an OT pentest? We’ve detoured through several key definitions, none of which mention OT at all. Our question contains the answer — physics. It seems simple. So why do so many “OT pentests” seem to miss the mark?
When we compare IT and OT adversaries, the clearest difference lies in the physical impact an adversary seeks to achieve through their actions within an OT environment. The goal of an adversary who targets OT environments is to impact process control and cause effects in the real, physical world.
If an OT penetration test seeks to simulate adversary behavior or identify vulnerabilities that real-world attackers could leverage in pursuit of their real-world aims, the objectives of an OT pentest must be aligned with the ICS Cyber Kill Chain. The key is distinguishing between Access (generally Stage 1 activities) and Impact (generally Stage 2 activities) when framing adversary intent in an OT environment. In my experience, all IT pentests and many — if not most — OT ones are more concerned with access rather than impact.
The lack of focus on impact as the primary adversary objective is why we lack a coherent and cohesive definition of OT pentesting. Far too often, I hear about or even execute pentests in an OT environment that have nothing to do with actual ICS assets, site functions, or process control. Far too frequently, we pat ourselves on the back and call ourselves OT pentesters when all we’ve done is demonstrate access by getting Domain Admin in the DMZ. This is only the first step, a prerequisite to process impact. Someone recently described Domain Admin as a tool rather than a destination. I can’t recall where I heard this, but it resonates.
If an engagement doesn’t involve Stage 2 activities and demonstrating the ability to impact physical process control isn’t the primary objective, then at most, we only go halfway in our attempts to emulate a real-world OT adversary. There is some value for stakeholders here, but not nearly as much as if the focus was on how an adversary could impair or impact physical process control. Demonstrating the entire kill chain differentiates an effective OT pentest from what is otherwise just a pentest in an OT environment.
Scoping Mechanisms
In my current role, we have three types of OT pentest engagements:
Network pentests
Application pentests
Device pentests
At a basic level, these simply represent different engagement scopes. I mention them because the latter two — application and device pentests — tend to be less easily aligned to the ICS Cyber Kill Chain. Or perhaps more accurately, they represent specific phases in Stage 2 of the Kill Chain, namely the Develop and Validate phases that an adversary is more likely to pursue back in a lab than in the target environment. For most ICS operators, engagements scoped to an entire site rather than just one or two embedded systems will be far more illuminating. Conversely, I tend to recommend application and device pentests to vendors so that the applications, devices, and systems they provide to operators are better secured before deployment. That said, we can have all three at once. A couple of examples that could span all three (given enough time!) are distributed control system (DCS) and energy management system (EMS) solutions, which often consist of proprietary hardware, applications, and a standard network/domain schema deployed by the vendor.
A great example of a methodology that spans all three types (and, one might argue, a compelling counterpoint to some of my previous assertions) can be found in the NESCOR Guide to Penetration Testing for Electric Utilities, authored by Justin Searle.10 This methodology guides electric power operators in performing penetration tests on Smart Grid systems. I have the utmost respect for Justin’s methodology, and I recommend it to all my electric power colleagues. What I think it lacks, however, is the adversary perspective and the development of a kill chain. The NESCOR paper certainly includes goals; there is one for each task in the methodology, after all! But what are the objectives that an adversary would be positioned to achieve if they could exploit the vulnerabilities found after completing the tasks listed in the guide? In my mind, this is a tester-centric framework rather than an adversary-centric one. Because it lacks an objective-based approach aligned with the ICS Cyber Kill Chain, it prioritizes breadth of analysis at the expense of realistic attack path development.
Why Not OT Red Teaming?
Suppose we accept that the primary objective of an effective OT pentest is to demonstrate the ability to impact physical process control. In that case, we immediately lean towards an objective-based red-team model. So why advocate for a hybrid approach rather than solely for red teaming in OT environments? Two main reasons:
Safety: OT environments are incredibly complex, with sensitive systems and functions that, if not well understood and approached with care, can result in significant safety and operational impacts. Running a covert assessment in a production environment without informing stakeholders and without being provided significant background information on the site and its functions is a recipe for disaster. Safety is one of the main reasons I recommend a white-box approach to penetration testing instead of grey- or black-box assessments. More information and collaboration means safer engagements.
Time: Red team engagements tend to be long-term efforts. Testing in a production OT environment is typically performed during site commissioning or when a site is scheduled for downtime due to maintenance or upgrades. This constricts the engagement time frame and means the assessment team isn’t being efficient if they are playing the endless cat-and-mouse game of trying to develop and deploy novel EDR evasion techniques that will be obsolete within the week. It’s far more valuable to assume that EDR can be bypassed by a determined adversary or to employ an assumed breach perspective and skip straight to adversary actions after initial access. It’s valuable to test people, processes, and technologies. It’s valuable to ensure visibility, detection, and response capabilities are operating as expected. But it’s inefficient for this to be the primary goal, especially during OT pentests. I love it when a customer has CrowdStrike and syslog and a well-tuned SIEM that allows them to detect and respond to everything, but evasion isn’t the point. The true value of an OT pentest is most apparent when those layers are peeled back, and the engagement team is empowered to explore vulnerabilities leveraged to cause physical process impact. This requires an assumed breach mentality and the courage and buy-in from stakeholders to ask, “What if all our security controls do fail?” To drive efficiency and value from an OT perspective, it’s often far more useful to leverage seeded access, privileged accounts, and EDR allow-lists to enable the pentest team with the assumption that Stage 1 has already been accomplished and the adversary already has access to the OT network.
Like safety, time is another factor incentivizing a white-box approach for OT pentests. It’s more efficient to have as much information as possible about the target environment upfront.
Goals vs. Objectives
Suppose we accept that an OT pentest must align with the ICS Cyber Kill Chain to be effective and that an OT pentest involves some, but not all, red team engagement characteristics. What, then, brings us back to my sprawling definition in the introduction? Goals do.
The most effective OT pentest engagements involve both objectives and goals. Objectives are the targets the engagement team is driving towards, the markers that represent adversary success or failure. In comparison, goals are the byproduct of those efforts. Goals are the desired outcomes of the engagement; second-order objectives, if you will. Goals are what bring purple team, adversary simulation, and detection engineering concepts into play.
For example, my preferred engagements have three objectives:
Objective 1: Move from IT to OT DMZ
Objective 2: Move from OT DMZ to Process Control Network
Objective 3: Impact Process Control
The first two objectives are aligned with Stage 1 of the ICS Cyber Kill Chain and describe the network boundaries an adversary must traverse to access the Process Control Network, where they can start pursuing Stage 2 activities to develop and test capabilities that can meaningfully attack the ICS.
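The objectives-versus-goals distinction can be sketched as a simple data model. This is a hypothetical illustration of the structure described above, not a tool we actually use; all class and field names are my own invention:

```python
from dataclasses import dataclass, field

@dataclass
class Objective:
    """A target the engagement team drives towards, aligned to the ICS Cyber Kill Chain."""
    name: str
    kill_chain_stage: int  # 1 = access activities, 2 = impact activities

@dataclass
class EngagementPlan:
    """Objectives are fixed markers of success; goals are malleable, second-order outcomes."""
    objectives: list[Objective]
    goals: list[str] = field(default_factory=list)

    def primary_objective(self) -> Objective:
        # The primary objective is always the deepest one in the kill chain:
        # demonstrating the ability to impact physical process control.
        return max(self.objectives, key=lambda o: o.kill_chain_stage)

# The three objectives from my preferred engagements, plus sample secondary goals:
plan = EngagementPlan(
    objectives=[
        Objective("Move from IT to OT DMZ", kill_chain_stage=1),
        Objective("Move from OT DMZ to Process Control Network", kill_chain_stage=1),
        Objective("Impact Process Control", kill_chain_stage=2),
    ],
    goals=[
        "Test detection and response capabilities",
        "Emulate a threat group known for targeting this vertical",
    ],
)
print(plan.primary_objective().name)  # Impact Process Control
```

The point of the sketch is that goals live in a separate, freely editable list, while the objectives (and which one is primary) stay fixed for the engagement.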
After I have worked with the engagement stakeholders to define the objectives, we start discussing secondary goals. Do they want the engagement team to focus on specific TTPs that facilitated prior pentest findings? Do they want to test detection and response capabilities? Do they want to inform the SOC and approach the engagement from a purple team perspective for training and collaboration? Do they want the testing team to explicitly note timelines and activities so they can build effective detections for the TTPs that were used? Do they want the engagement team to emulate a specific threat group known for targeting their industry vertical? These are all valid goals but should not get in the way of assessing the potential for physical process impact. If a specific security control proves to be a blocker, I like to note this as a win for the customer, then ask them to remove that control and keep the momentum going. In a way, this represents an interactive application of assumed breach principles at every level of the engagement.
The state of the target environment affects goals considerably. By state, I mean whether the target environment is a live production environment, a pre-prod/commissioning environment, or a dev/lab environment.
In a production environment, the highest-order goal is to pursue the engagement objectives without causing real operational impact. At first glance, this may seem contrary to the assertion that demonstrating the ability to impact physical process control must be the primary objective of an effective OT pentest, but this isn’t the case. Objectives and goals are different. In a production environment, the objective can remain the same, but the goal of not causing an operational outage means that active testing should halt at the Attack Development and Tuning phase of Stage 2. The pentest team can still demonstrate the ability to impact physical process control, but this can be done through dialogue and discussion with stakeholders (such as our peerless plant champions) who can confirm that subsequent activities would have a meaningful impact on physical process control. A simplified example could involve the pentest team confirming privileged access to an engineering workstation (EWS) that can configure points or download new logic to a programmable logic controller (PLC), then discussing which points or what logic, if changed, would impact physical process control. Alternatively, the team might escalate privileges on a SCADA server to the point they can send commands to a remote terminal unit (RTU) and open circuit breakers, then confirm with SCADA engineers that this would be the logical next step for an attacker and would cause a physical impact. To me, both examples meet the criteria of demonstrating a capability to impact process control but stop short of deploying it because the goal of not causing an actual outage trumps the goal of live demonstration.
Conversely, in a lab environment where operational impact is not a concern, the goals of the engagement may include an actual demonstration of the ICS Attack phase. In this case, perhaps the pentest team can develop, test, and deliver a capability that culminates in their ability to execute an ICS attack and demonstrates real, physical process control impact. This could be an extension of the previous PLC example, but instead of halting at the point at which they demonstrated the ability to download logic or change setpoints, the team is permitted to follow through and make those changes so that stakeholders can observe the impact of that attack. Another great example would be if the team were able to identify the Modbus registers on an ICS device storing the scaling values used to calculate voltage readings, confirm that the voltage readings act as an input to another system responsible for managing resource provisioning, overwrite those registers, then finally confirm that the dependent system has scaled back resource provisioning to the point it impacts operations.
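In a lab, the register-overwrite step in that last example can be prototyped with nothing more than the standard library. The sketch below shows the word-level encoding a tester has to get right when tampering with a 32-bit scaling value stored across two 16-bit Modbus holding registers. The register semantics, big-endian word order, and numeric values are all assumptions for illustration (real devices vary, and a library such as pymodbus would handle the actual transport):

```python
import struct

def registers_to_float(hi: int, lo: int) -> float:
    """Decode two 16-bit holding registers as one big-endian IEEE-754 float."""
    return struct.unpack(">f", struct.pack(">HH", hi, lo))[0]

def float_to_registers(value: float) -> tuple[int, int]:
    """Encode a float as the pair of 16-bit register words to write back."""
    hi, lo = struct.unpack(">HH", struct.pack(">f", value))
    return hi, lo

# Hypothetical device behavior: reported voltage = raw ADC counts * scale.
raw_counts = 2400
tampered_scale = 0.01  # the tester shrinks the stored scale by 10x

hi, lo = float_to_registers(tampered_scale)
# (in the lab, the tester would write these two words to the device's scaling
# registers over Modbus/TCP here, then observe how the dependent system that
# consumes the voltage reading reacts)
print(round(raw_counts * registers_to_float(hi, lo), 2))  # 24.0, down from 240.0
```

Even a toy like this makes the Stage 2 Develop and Validate phases concrete: before touching a device, the team must confirm exactly which registers hold the scaling value and how the device orders its register words.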
I think there is a lot more to be said about the distinction between objectives and goals, and I hope to be able to do the topic justice in a future post. For now, the key takeaway is that there is a distinction between objectives and goals and that goals should be more malleable than objectives in the context of an OT pentest.
Collaboration Is Key
Purple team concepts come into play quite naturally due to the need for extensive collaboration and multi-team involvement. Covert OT pentests are not only dangerous from a safety and operational standpoint; they are simply infeasible. A successful OT pentest almost always requires involvement and input from multiple teams.
When I lead an OT pentest, my ideal would be to involve stakeholders from all the following teams:
OT Engineering/Operations: A successful engagement requires input and participation from operators who intimately understand the target environment. Plant Champions are crucial if you intend to pursue Stage 2 objectives!
OT Security: If the customer is mature enough to have a dedicated OT security team, representatives from this team are obvious stakeholders.
IT SOC: In most cases, customers aren’t yet mature enough to have a dedicated OT security team, and the IT SOC tends to be responsible for detection and response functions.
IT Engineering/Architecture: Similar to the above, in most cases, IT engineers and architects are likely responsible for a portion of the infrastructure you will be testing, especially if you are pursuing Stage 1 objectives or are relying on remote access for the engagement.
Incident Response: An OT-focused Incident Response Plan is the foremost of the Five ICS Cybersecurity Critical Controls.11 The IR team may sometimes overlap with SOC functions, but if there are dedicated IR personnel, involving them in an OT pentest — especially one that includes purple team goals — is an incredible way to practice incident response. Imagine a tabletop exercise without a table, with real artifacts rather than injects.
Cyber Threat Intelligence: I am blessed to work on a team with direct access to some of the brightest minds in ICS-focused threat intelligence. This makes it easy to leverage OT-specific threat intelligence to inform our use of relevant adversary TTPs. I recognize that most organizations do not have this luxury. In those cases, CTI input could shift from teams or individuals to the information provided by trusted third-party threat intelligence reports. Whatever form it takes, leveraging CTI leads to more realistic and informed attacks, contributing to more effective pentest findings.
The most effective OT pentests include stakeholders from all these teams. Furthermore, OT pentests are a great opportunity to reinforce or repair relationships across the IT/OT divide. I often specify team-building and cross-domain training as explicit engagement goals. The most valuable OT pentests I’ve ever participated in naturally and effortlessly became purple team engagements because we had all parties at the table simultaneously.
Summary
At a high level, I contend that a pentest is characterized by its:
Scope
Objectives
Goals
More specifically, based on the above discussion, I would argue that an effective OT pentest requires:
A narrow scope focused on a critical site, system, or function
Clearly defined objectives aligned with the ICS Cyber Kill Chain, with a primary objective of demonstrating the ability to impact physical process control
A focus on testing the full spectrum of people, processes, and technology that support the target site, system, or function
If we want an OT pentest to be even more effective, it should leverage the following:
A white-box approach, which implies that the engagement team is provided with as much knowledge of the target environment as possible before execution
The assumed breach model, with seeded access to provide a realistic foothold in the environment that supports the engagement objectives and goals
A CTI-driven pursuit of realistic and contextual ICS TTPs based on adversary simulation practices
The full-knowledge, collaborative, and multi-team involvement typically associated with purple team exercises
An appreciation of the distinction between objectives and goals
The use of goals to fine-tune the engagement and drive personalized, high-value engagement outcomes for stakeholders
Or, to reiterate once more:
An OT penetration test is an offensive security assessment that blends aspects of traditional penetration testing and modern red teaming in an ICS/OT environment. It is scoped to a specific site or system and assesses the people, processes, and technologies supporting the operation of that site or system. It is executed with clearly defined objectives aligned with the ICS Cyber Kill Chain, with a primary objective of demonstrating the ability to impact physical process control. It is most effective when executed from an assumed breach perspective as a full-knowledge, white-box assessment. It should leverage intelligence-driven threat scenarios and apply real-world adversary TTPs to achieve these objectives. It may also serve secondary goals, such as informing and enhancing detection engineering or supporting purple team activities.
Is this all just semantics? I’m convinced that’s not the case. I believe an effective OT pentest involves all of the elements identified above. OT environments are complex, so it makes sense that offensive security assessments within those environments must be similarly nuanced to be effective. While I hope to have presented a clear-eyed and coherent rationale for my definition, I want to reiterate that this is my perspective based on my experience and background. Dogma is dangerous for all practices and professions, and it has no place in ICS cybersecurity, a high-stakes, high-impact realm of complex and nuanced systems and environments. All feedback is welcome, especially if you have a different perspective. As practitioners, we hone our tradecraft in many ways, but we often opt for deep-dive, technical learning instead of high-level, intellectual, or, dare I say, epistemological dialogue about the industry. We stand to benefit immensely from an effort to better define what, how, and why we do what we do. When we refine our profession, we enhance our ability to protect and defend the systems that ensure our communities have clean water, reliable power, and all the other things we take for granted that rely on safe, secure, and resilient ICS operations. The ICS cybersecurity community deserves a widely accepted, coherent, and cohesive definition of OT pentesting.
Assante, Michael J. and Robert M. Lee. The Industrial Control System Cyber Kill Chain. SANS Institute, 2015, www.sans.org/white-papers/36297/. Accessed 10 October 2023.
Russell, Deborah and G.T. Gangemi Sr. Computer Security Basics. Cambridge, O’Reilly & Associates, Inc., 1990, p. 29.
"The Five Functions." National Institute of Standards and Technology, Mar. 2023, www.nist.gov/cyberframework/online-learning/five-functions. Accessed 11 November 2023.
Mudge, Raphael. “Models For Red Team Operations.” FORTRA, 9 July 2015, www.cobaltstrike.com/blog/models-for-red-team-operations. Accessed 27 October 2023.
Orchilles, Jorge. Purple Team Exercise Framework. SCYTHE, 2023, github.com/scythe-io/purple-team-exercise-framework. Accessed 8 November 2023.
Caldera. MITRE, 2023, caldera.mitre.org/. Accessed 8 November 2023.
Atomic Red Team. Red Canary, 2023, atomicredteam.io/. Accessed 8 November 2023.
Medin, Tim. "Assumed Breach - A Better Model." SANS, 21 Jul. 2022, www.sans.org/webcasts/assumed-breach-better-model/. Accessed 1 November 2023.
Lee, Robert M. “Cyber Intelligence Part 5: Cyber Threat Intelligence.” Robert M. Lee, 2015, www.robertmlee.org/cyber-intelligence-part-5-cyber-threat-intelligence/. Accessed 8 November 2023.
Searle, Justin. NESCOR Guide to Penetration Testing for Electric Utilities. National Electric Sector Cybersecurity Organization Resource (NESCOR), Version 3, smartgrid.epri.com/doc/nescorguidetopenetrationtestingforelectricutilities-v3-final.pdf. Accessed 26 October 2023.
Lee, Robert M. and Tim Conway. The Five ICS Cybersecurity Critical Controls. SANS Institute, 2022, www.sans.org/white-papers/five-ics-cybersecurity-critical-controls/. Accessed 8 November 2023.