There has been a lot of talk lately about cyber hygiene vs. cyber risk, and in my opinion it has caused some confusion. The debate may actually be doing more harm than good, because it perpetuates the very decision paralysis that the people having it claim to prevent. The truth is that one cannot exist without the other; they are two sides of the same coin. A risk-based approach to cybersecurity without basic cyber hygiene is a fool’s errand, leading to short-term prioritizations and a game of musical chairs where you hope you aren’t the one left standing when the music stops. Hygiene without risk is a recipe for a 3,000-page cybersecurity policy, leading to inefficiency, leading to resentment, leading to failure.
I am a cybersecurity and risk professional. That’s what I’ve been trained in for 20+ years. After starting out in the data center of an investor-owned utility and spending four years as an Information System Auditor, including running a NERC CIP program, I’ve spent the last 10 years further refining those cybersecurity and risk management skills in various industries and positions with companies in the Fortune 40, higher ed, and utility markets. For many years, I carried a Certified Information Systems Auditor designation, until I grew tired of paying the renewals. All of this is to say, I have seen the “best and brightest” in cybersecurity on both the IT and OT sides, and developed my career in probably the most dynamic period of cybersecurity and risk management for both.
The biggest lesson I’ve learned in that time? Humans are terrible at risk, and nothing will destroy a cybersecurity initiative faster than a lack of risk reasoning.
We humans use risk to defend all sorts of work and cost avoidance. We are also terrible at understanding what probabilities mean. To quote one of the great risk professionals, Peter Sandman: Risk = Hazard + Outrage. Take the Florida water event (hazard). The initial risk vector the industry focused on (outrage) was the failure of remote access, even though it covers only one vector of control failure. There was no shortage of articles on how anomaly detection would or wouldn’t have helped, and why you therefore need this remote access product or that other one. In reality, when I talk to controls professionals, they are equally if not more horrified by the possible lack of input validation in the HMI design or of set point controls in the PLCs themselves. Why? Because remote access is one control point. It stops one type of threat. Input validation and set points are multi-variate risk reducers, but they aren’t the obvious outrage point. You need to know more to get there.
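To make the set point argument concrete, here is a minimal sketch of what bounds-checking a requested set point might look like before it ever reaches the PLC. This is purely illustrative: the chemical, the limits, and the function names are my own assumptions, not taken from any real HMI product or from the Florida incident itself.

```python
# Illustrative only: enforce an engineered safe range on a requested
# set point at the HMI/control layer, independent of how the request
# arrived (local operator, remote session, or compromised remote access).

# Hypothetical operating band for a dosing set point, in ppm.
SODIUM_HYDROXIDE_PPM_LIMITS = (0.0, 150.0)

def validate_set_point(value: float, limits: tuple[float, float]) -> float:
    """Reject any requested set point outside the engineered safe range."""
    low, high = limits
    if not (low <= value <= high):
        raise ValueError(f"set point {value} outside safe range [{low}, {high}]")
    return value
```

This is why input validation is a multi-variate risk reducer: it doesn’t matter which path an absurd value takes into the system, because the check sits in front of the process itself.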
I’ve also experienced many challenges to audits with some ridiculous claims, like “why do we want to do a Storage Area Network audit that’s all behind those new firewalls?” Me: “Well, because all your critical business and customer data is on those arrays, and all it takes is one missed password change to open you up to a total loss of data that will take weeks, not days, to restore. By the way, one array just had a main board replaced, and the password wasn’t updated from default.” These comments came from very highly respected professionals, ones I still respect today, because I can’t fault humans for being bad at risk.
Zaila Foster, a specialist in Human Factors Engineering (and also our UX Designer), said it best, “When humans are forced to make decisions without basic guidelines, they tend to rely on intuition, but relying on intuition can be very personal. Given that people cannot mitigate cybersecurity risks with a one-size-fits-all approach, relying on personal experience to make decisions about cybersecurity risks is a recipe for disaster.”
Humans need guardrails. It’s why we have SOX, HIPAA, PCI, GDPR and NERC CIP. We need to be forced to do the mundane basics. If you want to be horrified as a risk professional, take a contract helping a non-public company go public for the first time. The lack of basic controls will shock you. This is essentially where most ICS systems are today. No debate. They haven’t had the benefit of 30 years of forced regulation and contractual obligations that IT has had. We have no ITIL, no OWASP. We have no basic, consistently enforced cybersecurity fundamentals in our systems.
On the flip side, hygiene or compliance without risk is a doomed process. You can read about the dance that has been NERC CIP here, but that’s just one such story. Let’s take on the patching issue that has been surfacing in various ICS security blogs lately. The argument is that OT operators are blindly applying every cybersecurity patch as soon as it comes out. First, I’m going to scoff at that claim, but if it were true, I’d ask why. I know of no, and I mean no, standard that requires application of each and every patch. NERC CIP, the most hygiene-oriented and least risk-oriented framework ever, doesn’t even require that. What it requires is that you be aware of and evaluate patches for security risks and, based on your own assessment of that risk, either apply them or develop a mitigation plan.
Now, where this went wrong, and what may be leading some to adopt extreme points of view, isn’t in the spirit of the requirement, it’s in the application of the requirement. What auditors ended up asking for in the early days of this was a detailed, individualized risk mitigation plan for any patch not applied within 70 days of becoming known to the entity, with no weight given to the severity of the vulnerability, no default reliance on the controls required by NERC CIP itself, nothing. This is absurdity. I don’t blame the auditors; they are interpreting the plain language of the standard as required by charter.
However, the cottage industry that has popped up around mitigation plans for patching, and the perceived need to patch at all costs, is what has created a real problem. Formal mitigation plans should be reserved for those vulnerabilities so severe that they require that special attention. If only we had a generally agreed-upon way to assess vulnerability risks… <sarcasm much intended> All others should be allowed to rely on the inherent controls provided by the framework itself, namely strong segmentation, access controls and the like, all of which are regularly audited. Let’s remember this last point.
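The triage logic being argued for here can be sketched in a few lines. This is a simplified illustration, not NERC CIP’s actual requirement: the severity threshold, field names, and buckets are my own assumptions, with severity standing in for something like a CVSS base score.

```python
# Illustrative triage sketch: only vulnerabilities above an assumed
# severity threshold get an individualized mitigation plan; everything
# else relies on the baseline controls the framework already audits
# (segmentation, access controls, and so on).

from dataclasses import dataclass

CRITICAL_SEVERITY = 9.0  # assumed cut-off for "needs its own plan"

@dataclass
class Patch:
    name: str
    severity: float   # e.g. CVSS base score of the vulnerability it fixes
    applied: bool     # whether we chose to apply it this cycle

def triage(patches: list[Patch]) -> dict[str, list[str]]:
    """Bucket unapplied patches by how their residual risk is handled."""
    result = {"formal_mitigation_plan": [], "baseline_controls": []}
    for p in patches:
        if p.applied:
            continue
        if p.severity >= CRITICAL_SEVERITY:
            result["formal_mitigation_plan"].append(p.name)
        else:
            result["baseline_controls"].append(p.name)
    return result
```

The point of the sketch is the shape of the decision, not the numbers: severe items get individual attention, and the long tail is explicitly allowed to lean on the audited baseline instead of generating paperwork.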
Checklist, hygiene, compliance applied blindly and without risk is a fallacy. It leads to poor decision making and is not in the long-term interest of the business, its customers or its owners. Simple as that.
Problems arise when either one of these elements falls out of balance with the other. Without the structure and rigor that come from a mature foundation of cybersecurity controls, you will end up overconfident in the operation of the underlying controls. Without the regular monitoring and supervision that come from “cyber hygiene”, you will fail at your risk assessments. And by creating onerous control requirements not grounded in risk, you will undermine the very structure and discipline you seek to achieve, because resources are not infinite, and people will be people.
And lastly, for those hoping that AI/ML will save them: remember, the data these systems ingest must be accurate. That accurate data will generally come from sound cybersecurity hygiene control structures. Without good data, your AI/ML will likely end up as yet another racist chatbot.