A widespread technology outage has grounded flights, shut down banks, and even kept some news outlets from reporting on it. On Friday, global IT disruptions began when CrowdStrike pushed out a faulty update for its Falcon endpoint security sensor, crashing computers running Microsoft Windows. Although it was not a cyberattack, the incident shows how a single update from one software company can disrupt operations across many industries, including banks, media companies, emergency service call centers, and airports.
Recovery is manual and therefore slow. Administrators must attach a physical keyboard to each affected system, boot into safe mode, remove the faulty CrowdStrike update, and then reboot.
Now consider the impact if this issue had reached Operational Technology (OT) on the plant floor of critical infrastructure. It could halt power generation, stop the flow of oil or natural gas, or disrupt critical water treatment operations. Disruptions like these would immediately affect safety and the delivery of essential goods and services.
This is a serious reminder of the importance of validating updates in OT environments. Windows systems play critical roles in many ICS environments, where they commonly host human-machine interfaces (HMIs) and supervisory control and data acquisition (SCADA) systems. These systems interface with PLCs, RTUs, and other critical ICS devices, giving operators real-time data and the ability to monitor, control, and automate industrial processes efficiently.
The difference from IT is that industrial operators are resistant to changing the OT environment and typically should not automatically accept software updates in production. That said, certain software updates and patches are vital to the security and continuity of industrial operations. They simply must be thoroughly tested and validated before deployment to avoid disruptions and to ensure the continued safety and reliability of critical infrastructure.
This incident should not impede progress in vulnerability management, but it does emphasize the need to take extra measures in OT.
Considerations for Patching and Updating in OT
When you discover vulnerable software versions in your environment (ideally through an OT-specific discovery approach as well), there are a few things to understand before addressing them:
Severity of the Vulnerability in Your Context:
Patching isn’t easy in OT, which is why we need to understand how to prioritize vulnerabilities and consider every available mitigation option.
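To make "severity in your context" concrete, here is a minimal sketch of context-adjusted triage. It assumes each finding records the advisory's CVSS base score, whether the vulnerable service is reachable over the network, a 1 to 3 asset criticality rating, and whether the vendor supports patching. The `Finding` fields, the `priority` and `triage` helpers, and every weight are hypothetical illustrations, not a standard scoring method (in particular, this is not the official CVSS environmental metric).

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """One vulnerable software version found on an OT asset (hypothetical fields)."""
    asset: str
    cvss_base: float          # 0.0-10.0 base score from the vendor or NVD advisory
    remotely_reachable: bool  # can an attacker reach the service over the network?
    criticality: int          # 1 = low, 2 = medium, 3 = safety/production critical
    patchable: bool           # does the vendor support patching this system?


def priority(f: Finding) -> float:
    """Illustrative context-adjusted score; the weights are assumptions, not a standard."""
    score = f.cvss_base
    if not f.remotely_reachable:
        score *= 0.5  # segmentation lowers practical exposure in this sketch
    score *= {1: 0.8, 2: 1.0, 3: 1.2}[f.criticality]
    return round(min(score, 10.0), 1)  # cap at 10 to stay on the CVSS scale


def triage(findings: list[Finding]) -> list[tuple[str, float, str]]:
    """Rank findings and suggest patching (after testing) or compensating controls."""
    ranked = sorted(findings, key=priority, reverse=True)
    return [
        (f.asset, priority(f), "patch after testing" if f.patchable else "compensating controls")
        for f in ranked
    ]
```

For example, an internet-reachable HMI with a 9.8 advisory ranks above a segmented historian with an 8.2 advisory, and because the historian cannot be patched, the sketch routes it to compensating controls instead.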
While evaluating the patching strategy, we can also consider compensating controls, which may be the main course of action if the system cannot be patched.
This could involve:
Testing & Validating Updates and Patches:
Testing patches in OT can be resource-intensive and time-consuming, but it is necessary to ensure patches are safe for complex ICS environments, where safety and operational uptime are of utmost importance. First, you need to have (or build) a replica environment that mirrors the production settings. Patches are then deployed in this test environment, where they go through functional, security, performance, and integration tests to confirm they will not disrupt operations in the production environment.
We have to test across many variables to ensure systems will remain stable over time, including the effects on integrations with other systems, components, and software. Any change to an environment can "break" the system. The CrowdStrike incident illustrates this: updates that touch the kernel are especially sensitive, and a bad one can leave a system unable to boot. By testing thoroughly in a controlled environment, we encounter failures such as the blue screen of death in a test setting rather than in live operations.
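The gating logic described above can be sketched as follows. This is a minimal illustration, assuming each test category is wrapped in a callable that runs against the replica environment and returns `True` on a pass. The `validate_patch` function, the `TestReport` class, and the "stability" category (standing in for the longer soak test) are hypothetical names, not part of any specific tool.

```python
from dataclasses import dataclass, field
from typing import Callable

# Categories mirror the test types described in the text; "stability" is an assumed soak test.
REQUIRED = ("functional", "security", "performance", "integration", "stability")


@dataclass
class TestReport:
    """Results of running one patch through the replica (test) environment."""
    patch_id: str
    results: dict[str, bool] = field(default_factory=dict)

    @property
    def approved(self) -> bool:
        # Approve only if at least one category ran and every category passed.
        return bool(self.results) and all(self.results.values())


def validate_patch(patch_id: str, checks: dict[str, Callable[[], bool]]) -> TestReport:
    """Run every required category; missing or crashing checks count as failures."""
    report = TestReport(patch_id)
    for category in REQUIRED:
        check = checks.get(category)
        passed = False  # a missing check is a failure, never a silent pass
        if check is not None:
            try:
                passed = bool(check())
            except Exception:
                # A crash in the test bed (for example, a host that fails to boot) is a failure.
                passed = False
        report.results[category] = passed
    return report
```

Treating a missing or crashing check as a failure is a deliberately conservative choice: in OT it is better to block a patch that was not fully tested than to wave it through.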
Throughout this process, document everything in detail; this documentation is reviewed before the patch is approved for deployment. Deployment is then carried out in stages, often starting with a pilot group of systems and gradually expanding the rollout.
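A staged rollout can be expressed as a simple loop over deployment waves. The sketch below assumes the first wave is the pilot, that `deploy` and `healthy` are site-specific hooks you would supply (both are hypothetical here), and that the `log` list feeds the change documentation. It stops at the first wave containing an unhealthy host, so later waves never receive the patch.

```python
from typing import Callable, Sequence


def staged_rollout(
    waves: Sequence[Sequence[str]],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    log: list[str],
) -> bool:
    """Deploy wave by wave (pilot first); halt at the first wave with an unhealthy host."""
    for n, wave in enumerate(waves, start=1):
        for host in wave:
            deploy(host)
            log.append(f"wave {n}: deployed to {host}")
        # Check every host in the wave before moving on to the next one.
        failed = [h for h in wave if not healthy(h)]
        if failed:
            log.append(f"wave {n}: halted, unhealthy: {', '.join(failed)}")
            return False
        log.append(f"wave {n}: healthy")
    return True
```

For instance, with waves `[["pilot-hmi"], ["hmi-02", "hmi-03"], ["scada-01"]]`, a health failure on `hmi-03` stops the rollout after the second wave, leaving `scada-01` untouched.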
Key Takeaways:
It will be a challenging weekend as people hope to get on their flights, complete bank transactions, and return to their daily lives. IT and security teams will be working hard to restore and recover systems. This incident is a learning opportunity for the OT security industry to keep maturing its security and IT/OT programs through collaboration across teams, partners, and vendors.