The Importance of Testing & Validating Software Updates in OT, Including Security Patches

July 19, 2024

A widespread technology outage has grounded flights, shut down banks, and even inhibited news outlets from reporting on it. On Friday, global IT disruptions began when a faulty update was pushed out from CrowdStrike for one of its tools, affecting computers running Microsoft Windows. While not a cyberattack, this incident highlights how a single update from one software company can disrupt operations across various industries, including banks, media companies, emergency service call centers, and airports.

Recovery options for affected machines are manual and thus limited: administrators must attach a physical keyboard to each affected system, boot into safe mode, remove the faulty CrowdStrike update, and then reboot.

Consider the impact on Operational Technology (OT) if this issue reached the plant operations floor for critical infrastructure. This could bring power generation to a halt, stop the flow of oil or natural gas, or impact critical water treatment operations. These disruptions would have immediate impacts on safety and the delivery of essential goods and services.

This is a serious reminder of the importance of validating updates in OT environments. Windows systems play critical roles in many ICS environments, where they are commonly used for human-machine interfaces (HMIs) and supervisory control and data acquisition (SCADA) systems. These systems interface with PLCs, RTUs, and other critical OT assets, providing operators with real-time data and the ability to monitor, control, and automate industrial processes efficiently.

The difference from IT is that industrial operators are resistant to changing the OT environment and typically should not automatically accept software updates in the operational environment. That said, certain software updates and patches are indeed vital to the security and continuity of industrial operations. However, we must thoroughly test and validate these patches before deployment to avoid potential disruptions and ensure the continued safety and reliability of critical infrastructure.

This incident should not impede progress in vulnerability management, but it does emphasize the need to take extra measures in OT.

Considerations for Patching and Updating in OT

When you discover vulnerable software versions in your environment (taking an OT-specific approach for this aspect as well), there are a few things to understand before addressing them:

  • Severity of the Vulnerability in Your Context:
    • Vulnerabilities are disclosed with a CVSS rating, providing an initial understanding. However, prioritizing that vulnerability and defining how to address it greatly depends on the role of the device or system in your operational environment.
    • Is the vulnerable system playing a critical role and therefore relevant to your operations?
    • Is it involved in producing electrons or hydrocarbons, where exploiting that vulnerability would have high consequences?
    • Or is the vulnerability in a less critical system, like your HVAC or break room?
  • Stakes of Modifying the System:
    • What are the risks of modifying that system?
    • Could a bad patch or software update disrupt that system and operations?
    • Are there other mitigation strategies to consider?
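
The questions above can be turned into a rough, context-aware ranking. The following is a minimal sketch, not a standard scoring method: the `Finding` fields, the criticality categories, the weights, and the exploited-in-the-wild bump are all hypothetical values chosen for illustration, and the asset names are invented.

```python
from dataclasses import dataclass

# Hypothetical weights: how much an asset's operational role scales a CVSS score.
CRITICALITY_WEIGHT = {
    "process_critical": 1.0,   # e.g. directly involved in generation or flow
    "supporting": 0.6,
    "non_operational": 0.2,    # e.g. break room or office HVAC
}


@dataclass
class Finding:
    asset: str
    cvss: float                # base score as disclosed, 0.0 to 10.0
    criticality: str           # key into CRITICALITY_WEIGHT
    actively_exploited: bool = False


def contextual_priority(f: Finding) -> float:
    """Scale the CVSS base score by asset role; bump it if exploited in the wild."""
    score = f.cvss * CRITICALITY_WEIGHT[f.criticality]
    if f.actively_exploited:
        score = min(10.0, score + 2.0)  # illustrative bump, capped at 10
    return round(score, 2)


def rank(findings: list[Finding]) -> list[Finding]:
    """Highest contextual priority first."""
    return sorted(findings, key=contextual_priority, reverse=True)


# Invented example assets.
findings = [
    Finding("break-room-hvac", cvss=9.8, criticality="non_operational"),
    Finding("turbine-hmi", cvss=7.5, criticality="process_critical"),
    Finding("historian", cvss=8.1, criticality="supporting", actively_exploited=True),
]
for f in rank(findings):
    print(f.asset, contextual_priority(f))
```

With these assumed weights, the process-critical HMI ranks above the break-room HVAC system even though the HVAC finding carries the higher CVSS score, which is the point of weighing severity in context.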

Patching isn’t easy in OT, which is why we need to understand how to prioritize and consider all mitigation options.

While evaluating the patching strategy, we can also consider compensating controls, which may be the main course of action if the system cannot be patched.

This could involve:

  • Enhancing network segmentation and access controls.
  • Deploying intrusion detection systems (IDS) to help limit potential exposure and reduce the risk of exploitation.
  • Monitoring systems and threat intelligence feeds continuously.
  • Implementing virtual patching through firewalls and application-level security measures to provide some protection.

Testing & Validating Updates and Patches:

Testing patches in OT can be resource-intensive and time-consuming, but it is necessary to ensure patches are safe for these complex ICS environments, where safety and operational uptime are of utmost importance. First, you need to have (or build) a replica environment that mirrors the production settings. Patches are then deployed in this test environment, where they go through functional, security, performance, and integration tests to ensure they do not disrupt operations in the actual environment.

We have to test a range of variables to ensure systems will remain stable over time, including the effects on integrations with other systems, components, and software. Changes to environments can "break" the system. This is evident in the CrowdStrike example: updates involving the kernel are especially sensitive, and a bad update can brick the system. By thoroughly testing in a controlled environment, we can encounter issues like the blue screen of death in a test setting rather than in live operations.

During all this, document everything in detail. This documentation is reviewed before approving the patch for deployment. The deployment is then carried out in stages, often starting with a pilot and gradually rolling out the patch.
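
The staged approach can be sketched as a loop that halts as soon as a host fails its post-patch check, so a bad update never reaches the next stage. This is an illustrative sketch only: the stage names, host names, and the `apply_patch` and `health_check` callables are hypothetical placeholders for whatever tooling and acceptance tests your site actually uses.

```python
from typing import Callable

# Hypothetical stages: test replica first, then a pilot, then wider production.
STAGES = [
    ("test-replica", ["replica-hmi-01"]),
    ("pilot", ["hmi-07"]),
    ("production", ["hmi-01", "hmi-02", "scada-01"]),
]


def staged_rollout(
    apply_patch: Callable[[str], None],
    health_check: Callable[[str], bool],
    stages=STAGES,
) -> list[str]:
    """Patch host by host, stage by stage; stop at the first failed health check.

    Returns a log of each step, which can serve as part of the written record.
    """
    log = []
    for name, hosts in stages:
        for host in hosts:
            apply_patch(host)
            healthy = health_check(host)
            log.append(f"{name}:{host}:{'ok' if healthy else 'FAILED'}")
            if not healthy:
                log.append(f"halted at stage {name}")
                return log
    log.append("rollout complete")
    return log


# Simulated run: the pilot host fails its check, so production is never touched.
patched = []
log = staged_rollout(patched.append, lambda host: host != "hmi-07")
print("\n".join(log))
```

In the simulated run, only the replica and pilot hosts are patched before the rollout halts; the production hosts are left unchanged.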

Key Takeaways:

  1. Test, Test, Test:
    • Don’t automatically accept software updates in critical environments. Thorough testing is crucial to prevent disruptions.
  2. Balance Patching and Uptime:
    • Striking a balance between essential patching tasks and maintaining operational uptime in ICS environments requires strategic planning, effective communication, and leveraging technology.
    • Key measures to reduce disruptions during the patching process include:
      • Establishing a comprehensive patch management policy agreed upon by all teams within the company.
      • Scheduling patching during predefined maintenance windows and informing all relevant teams and stakeholders well in advance.
      • Implementing a staged rollout and testing patches in a controlled environment to identify and mitigate potential issues before deployment.
  3. Backup and Redundancy:
    • Ensuring critical systems have redundant backups and performing full system backups before patching are essential for continuity.
    • These measures allow operations to continue even if a system needs to be taken offline for patching and ensure systems can be quickly restored if any issues arise.
  4. Risk-Based Patch Prioritization:
    • This approach focuses on first applying the patches that address the most severe vulnerabilities, those that pose the highest risk to the system.
    • Consider threat intelligence to understand which vulnerabilities are being actively exploited and assess the context and purpose of each asset within the operational environment.
    • By understanding the role and importance of each asset, organizations can prioritize patches for systems that are critical to maintaining operational continuity and safety.
  5. Effective Coordination Among Teams:
    • Establish clear roles and responsibilities and form a cross-functional patch management team responsible for overseeing the process.
    • Ensure collaboration among departments and develop a communication plan with regular meetings to discuss patch management activities, progress, and potential issues.
    • Regular communication keeps all stakeholders informed and allows for the timely resolution of any challenges that arise.
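
Several of these takeaways (maintenance windows, pre-patch backups, validation in a test environment) can be combined into a simple go/no-go check before deployment. The sketch below is one hypothetical way to express that gate; the function name, parameters, and example dates are assumptions for illustration, not part of any particular patch management product.

```python
from datetime import datetime


def patch_blockers(
    now: datetime,
    window_start: datetime,
    window_end: datetime,
    backup_taken: bool,
    tests_passed: bool,
) -> list[str]:
    """Return the reasons not to patch yet; an empty list means the gate is open."""
    blockers = []
    if not (window_start <= now < window_end):
        blockers.append("outside maintenance window")
    if not backup_taken:
        blockers.append("no full system backup")
    if not tests_passed:
        blockers.append("patch not validated in test environment")
    return blockers


# Invented maintenance window for the example.
window = (datetime(2024, 7, 20, 2, 0), datetime(2024, 7, 20, 6, 0))

# Inside the window, backed up, and tested: no blockers.
print(patch_blockers(datetime(2024, 7, 20, 3, 0), *window, backup_taken=True, tests_passed=True))

# Outside the window and without a backup: two blockers are reported.
print(patch_blockers(datetime(2024, 7, 20, 9, 0), *window, backup_taken=False, tests_passed=True))
```

Returning the list of blockers, rather than a bare yes/no, gives the cross-functional team something concrete to discuss and record.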

It will be a challenging weekend as people hope to get on their flights, complete bank transactions, and return to their daily lives. IT and security teams will be working hard to restore and recover systems. This incident serves as a learning opportunity for the OT security industry to continue maturing its security and IT/OT programs through collaboration across teams, partners, and vendors.

For more resources on managing systems in OT environments, read our brief below.