Software Patching Lessons Learned - Video Text Version

Below is the text version for the Software Patching Lessons Learned - An Interesting Perspective video.

Erfan Ibrahim: Good morning. This is Erfan Ibrahim of the National Renewable Energy Lab. And I'm coming to you live from Golden, Colorado. I am pleased to have Michele Wright of FoxGuard Solutions today, who will be presenting on some innovative approaches to software patch management. We have had a lot of presentations on networking and intrusion detection, and also on firewalling, authentication, a variety of things. But this is an area that is really important, because in the NIST framework, when you look at the five verbs, identify, and then protect, monitor, respond and recover, this is all about identify and protect.

So if you don't have good tools by which you know the version of software that is running on various systems in your enterprise, how would you know what vulnerabilities you're facing? So it's really important to have, first of all, the ability to manage the versions of software patches that are on your various systems. Then, as new patches come out that mitigate the risks from certain vulnerabilities, you have to have the ability to deploy them in a scalable way, so that you stay up to speed with the rate at which these patches are coming out. And not leaving vulnerabilities on critical systems that hackers, through their tribal knowledge, know about and can exploit.

So today we're going to hear from Michele Wright about an innovative approach to doing software patch management, and then we can get into a Q&A following that. So Michele, the floor is yours.

Michele Wright: Thank you very much, I appreciate it. So, my name is Michele Wright. I'm a product manager at FoxGuard Solutions. And I'm excited to be able to spend some time with you today, teaching you a little bit about some of the patching lessons learned at FoxGuard Solutions. I actually have a very interesting perspective on this. We've actually been in business for 36 years, and we've been building industrial HMIs primarily for energy for over 20 years. And we've been patching those same systems for the last 11 years. So when we've built these programs, we have that perspective of how to create a patching program, validate those patches, create the baselines, send out, deploy the patches, through the eyes of an OEM, several different OEMs for many years.

Then also in 2013, we were awarded a cooperative agreement through the Department of Energy Cybersecurity for Energy Delivery Systems program or the CEDS program, as I'm sure most of you are aware of. So in that, we were taking variations of some of the same solutions that we were doing for OEM and building them directly for end user applications. So I'm going to talk to you a little bit more in detail about our CEDS project, and then take the output of that work and the work that we've been doing for the last 11 plus years and tell you a little bit more about what we've learned and what we've seen along the way.

So, just one other point to understand is, as I've just said, we still do industrial computing. We also do cybersecurity with both a compliance focus as well as a security focus. So just a little bit about us in case you've never heard of FoxGuard Solutions. We've been around, and we've been around in this industry. But because the nature of the way we've been doing business for many years, we're not always a known name in the industry.

So with this, I want to tell you a little about our CEDS project. We are in the final year of this. Our mission was to simplify patch management for energy delivery industrial control systems. This is a security based product, well in advance of the NERC changes to patch management, and the scope changes that came out with version 5, version 6. And however they tend to line up quite nicely. So I'm not going to read all of these bullet points, but I just want you to be able to understand that it's a well integrated solution that includes creating a data aggregator and web portal.

So what we do is essentially aggregate hashes from a variety of sources, and put them into a single location, and then when available we also provide the authentication of hashes when provided by the vendor. We also are in the process of writing a comprehensive validation training program. As I mentioned before, we've been patching, providing validated patching services for OEM equipment for many, many years.
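As an illustration of the hash check described above, here is a minimal Python sketch. It assumes the vendor publishes a SHA-256 hash as a hex string; the talk does not name the algorithm, and the function names `sha256_of_file` and `matches_vendor_hash` are hypothetical, not part of the FoxGuard portal.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_vendor_hash(path, vendor_hash):
    """Compare a downloaded patch file against the hash the vendor published.

    Assumption: vendor_hash is a hex string; surrounding whitespace and
    letter case are normalized before comparing.
    """
    expected = vendor_hash.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

A mismatch only says the file differs from what the vendor hashed; it does not say why, so the patch should be set aside and downloaded again from the vendor's source.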

And so we actually have a validation lab. It's, if you're ever in southwest Virginia and want to stop by and take a look at our lab, we love showing it off. And it's very interesting to see the work that we do there. But as a result of that, while we understand validating for OEM solutions and those products is very different from validating with end users, when we come up with this project for the CEDS project, we wanted to include lessons learned from that as well. So, that's a separate program, a separate presentation that we can make, that I won't really be covering too terribly much today. But it is something that is part of this DOE project.

And last but certainly not least, we have a query engine that will allow us to integrate back into the patch data. Again, I'm not going to belabor the point of this too terribly much. Just understand that we have a very thorough integrated solution, so that you're able to query your devices. We are working with a project partner, TDI, and using their ConsoleWorks application, which is a remote access authentication tool. So it's able to get the baseline of the information on the devices, and send us the current patch status.

Through that we're able to then analyze the assets. We have a process for anonymizing the asset list from each customer into our aggregator, so that we don't have knowledge inside of the Internet of who has what applications. And that consolidated list of assets, we understand the criticality of that. But we're still able to access our aggregated portal, represented there in the middle slightly to the left. We get that information. With this process we're also able to integrate in external sources, such as the National Vulnerability Database.

We're using CVE and CVSS information in conjunction with the patches, so that you can then take that information and understand how the criticality is rated through that source. We also can integrate with other sources, such as RHEL and CentOS, those sorts of things. With that it goes into this public portal where we're able to provide again that hash information when provided by vendors. We can also provide layers of documentation, for patch evidence as well as end of support documentation.

That information gets aggregated and then sent back to the ConsoleWorks application. From there, the users can go and download the patches, and follow our validation methodology as we recommend through this training program that I just referenced. And then you either mitigate or deploy the patch back into your environment, and then the process starts all over. However with that, with the information that we receive from the file of the baseline, we're able to also identify the patch gap. And so we're able to determine, based off of the information being provided to us, the current patch level. And then we can provide the patches for any subsequent gaps in that.
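The patch gap idea can be sketched in a few lines of Python. This is a simplified illustration, not the project's query engine: it assumes patch levels are dotted numeric version strings that can be compared numerically, and the `Patch` class and `patch_gap` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patch:
    patch_id: str
    version: tuple  # numeric version, e.g. (5, 2, 1)


def parse_version(text):
    """Turn a dotted version string such as '5.2.1' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def patch_gap(installed_version, available):
    """Return the patches newer than the installed level, oldest first.

    Assumption: every version uses the same number of dotted components,
    so plain tuple comparison orders them correctly.
    """
    current = parse_version(installed_version)
    missing = [patch for patch in available if patch.version > current]
    return sorted(missing, key=lambda patch: patch.version)
```

Comparing parsed tuples rather than raw strings matters here, because as strings "1.10.0" sorts before "1.9.0". Real vendor version schemes are often less regular than this, which is part of why the baseline information has to be precise.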

And again, all of that is part of our DOE funding. Some of the lessons learned are not necessarily a direct result of that. But because this is coming through a lab and it's a presentation to like-minded folks who understand such projects and the nature of the work that we're doing, I thought it might be an interesting perspective for you guys to understand, again in addition to the OEM work. In the work we're doing with the DOE project, we're working directly with end users, utilities in this case, to talk about patching.

So with that, let's talk a little bit about patching. We joke, how hard, you know, famous last words. Patching. How hard can it be? And we counter with, it's harder than you think. And as a result of that, every day we have this running joke, you know, famous last words. How hard can this be. And we're like patching is hard, yeah. So it's one of those things where we get it. And as a result of the federal funding that we're getting on this program, we intend to go out and teach the lessons learned, and help the industry be better and smarter as a result of the work that we've done here.

So as we get into this, this is a level set. This is sort of a no brainer for most people. But when I first started working here and started reaching out to vendors, I was a bit taken aback at the discrepancy and how different vendors referred to patches, and how some even gave off the presence of being a little offended if I used the word patch. So I am going to use the word patching throughout this process, and just want you all to understand that I am representing any change or impact on a system. And in most cases we're referencing security patches. But not always.

And so I wanted to just again level set, that when I say patches, I mean update, upgrade, firmware, security bulletins, anything that any vendor might creatively call these changes to their system, that's what we're talking about in terms of this. Also, what are we talking about, what needs to be patched? I mean everything. For some it might be more apt to think of it as your BES Cyber Assets. But we are talking about not just IT information. We're also talking about OT assets. So all of the equipment that's in the field, that's in the substations, that's operating the plants, those are the devices that we're covering. As well as things like Microsoft, OS's, those sorts of products. So again, level setting here to make sure everybody understands. When we're talking about patching again, it's simplifying patch management for energy delivery industrial control systems. So that's what we're attempting to cover here with this program.

So, why patching? For some, it's obvious. It's, you know, this market obviously has a high risk. Things like those types of opportunities out there make us realize how critical this is, that it really is something that needs to be honed in on. I think it's also important to understand, as shown by the data here on the screen, in 2015 there were 189 known vulnerabilities, of which 26 had exploits available. But 170 of them had patches. So in many cases, not always, but in many cases there are solutions out there to help reduce your risk and mitigate this vulnerability as much as possible. And again, why is patch management important? As with all the other security reasons or morality reasons around patching, there are implications in terms of the regulatory agencies that apply specifically to electricity. And I understand the audience might be bigger than that, but there are going to be some references to NERC, to level set that difference between compliance and security.

So with that, I do want to spend a few minutes understanding that. It's the age old debate, people like to say, we like to have this water cooler conversation around our office, in terms of, you have security and you should do the right thing and you should patch because security protects you and secures your system. However, at the end of the day, compliance is what is going to motivate an industry or an entity to do something. And so with that, compliance standards are pretty specific.

I'm not going to read these slides to you. But just to level set, there are four different components to patch management in the NERC standard. The first is having a program showing proof that you're doing your due diligence to have a solid, thorough patch management process in place. You have to check all of your items, all of your BES Cyber Assets, every 35 days. You have to evaluate those for security patches and be able to keep a regular proof that you've been checking with these 35 days in mind. From there, you have to show that in the subsequent 35 days after you discover a patch, you've either applied it or created a mitigation plan. And last but certainly not least, you have to have evidence that you've actually followed up on it, implemented it, based on the mitigation plan that you put in place.
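The two 35-day clocks described here can be tracked with simple date arithmetic. The following Python sketch is an illustration only: it assumes calendar days and follows the wording of the talk rather than the text of the standard, and the function names are hypothetical.

```python
from datetime import date, timedelta

# Assumption: the window is counted in calendar days, as described in the talk.
WINDOW = timedelta(days=35)


def next_check_due(last_check: date) -> date:
    """Latest date for the next patch-source check of an asset."""
    return last_check + WINDOW


def action_due(discovered_on: date) -> date:
    """Latest date to apply a discovered patch or create a mitigation plan."""
    return discovered_on + WINDOW


def is_overdue(due: date, today: date) -> bool:
    """True once the due date has passed."""
    return today > due
```

For example, an asset last checked on `date(2017, 1, 1)` would need its next check by `next_check_due(date(2017, 1, 1))`, and a patch found on that check starts its own separate 35-day clock through `action_due`.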

So again, fairly thorough, fairly prescriptive, fairly specific about the nature of what's in scope, in terms of patch management for compliance. However, when we flip the coin and start to look at security, there are further reasons why patch management is a good idea. Obviously, by installing your patches, you're mitigating your risks, in terms of the vulnerabilities that are out there with security. If you have a good patching program, you may have increased reliability with your services. An air gap is certainly not enough. It's not always applicable or is not always an option. Sometimes you might be able to just unplug it and you're done. But for some equipment in some circumstances, that's just not an option.

If we read back through the compliance standard, it's very specific that it only applies to security patches. And while that's obviously an important place to put your focus on, many times we will see that even non-security patches, or feature upgrades if you will, may provide some layer of functionality that might, two releases from now, lead to some type of security feature. For instance, a software provider might put some type of mechanism in that could ultimately lead you to being able to have two-factor authentication for your login, or more of a security requirements around your passwords. Whatever it might be. So it might be dressed up and look like something that's non-security, but may ultimately provide security related enhancements. So you can't just keep your eye on security only or you might miss something important.

In terms of security mindedness as well, you have that ability to have a real time application of patches. So if you're monitoring on a more regular basis, you might be aware of something sooner than if you're just looking at a clock and doing it on a rhythmic basis or even just outside of the standard, you might not even be following any sort of calendar. Or it might be every six months or every year, we see those examples as well from time to time.

Also, Zero Day is a real thing. We know that, we've seen that a lot. Forever Day, however, is also a real thing. And one of the things that we see, you know, you have a security hole that stays unpatched. And over time, that's unfixed and that just compounds with time. And in many cases, those are those legacy applications which tend to over time inherit more risk as a result of the fact that they've remained unfixed for so long.

So hopefully that's some insight into the differences between compliance versus security. In my also humble opinion, I think that compliance, while it can sometimes be demonized just a little bit, does tend to force the conversation, if you will, in terms of setting up processes and procedures and protocols around having a patching program. And many industries aren't quite under the scrutiny or regulatory requirements that energy is. Electricity specifically. So you can debate that and have conversations on that. But in my mind it creates a more secure environment at least, because there's at least some set of usually agreed upon standards that the industry upholds. So there's that again. Water cooler conversations for us around here.

So with this, I want to jump into some lessons learned. I actually have ten lessons learned here, and want to talk through each of these. The first is understanding the difference between IT and OT. That's very critical. In many cases, you know, if something goes wrong with your computer and you call up the IT help desk, they might say reboot your laptop and see if that fixes it. And sometimes that does work, right? It's this little magic thing that happens sometimes. But obviously OT devices aren't that way, and you can't do that, and you shouldn't treat them the same and you shouldn't lump them together and think that the rules that apply to one apply to the other. And so, in doing patch management, you need to make sure that you're working with staff that understands the differences, and manages and meets the needs of that equipment where it needs to be.

Also, not all vendors currently report patch status every month. In a former life, I managed a software application, and we didn't call up our customers and say hey, that release we didn't do this month, yeah, we didn't do that. And in a sense, that's what the compliance standard for electricity is asking for. It says you have to provide due diligence and check patch status on every item every 35 days. And to do that, you have to know that there is no patch. In some ways this is as important as knowing that there is a patch, because you have to document that.

And so again, as the compliance standard has evolved and changed, the vendors are slowly working on getting caught up with that and being able to provide regular notification. We certainly see this more in the OT spaces. IT has had evolved programs for many years and there are many applications out there that can facilitate regular IT patch management. But when thinking of the OT environment, it's a very different story. And it's a constantly changing, evolving story, in particular as it relates to the impact of the compliance standards.

Lesson number two. Understanding the differences between public and private patches. So in our research, we've discovered that about half of the energy delivery systems have what we call private patches. And by private patches, I mean that there's some type of pay wall between us and that information. It might be something as simple as logon credentials to a customer portal. It might be a requirement for a support contract. It might just be that they don't publish them regularly, or the vendor doesn't have a mechanism for communicating that system. You have to go and call them. Or you might have to figure out that, they have a newsletter so you subscribe to that.

So any time that there's something where you have to have credentials or some layer of recognition of that vendor relationship to get that information, that has to be tracked, so that anyone that might be helping with that patch management for those pieces of equipment has the ability to access and know where to go and how to go to get that. So it's an important distinction to be aware of.

Lesson number three. Patch analysis accuracy is difficult. And this is essentially the quality of what you're finding and how you record those patches. So, what's the process? How do you know where to go? If you have a facility of any size at all, you're going to have lots of different sources and lots of different places. And you're going to have to know which are public and which are private. And you have to be able to keep track of that. And so you have to have a way to be able to transfer that knowledge from team member to team member in order to be able to know how to go get what you need to know.

Also, inside a vendor might have one way of doing it. And they might change it. They might change around their website. They might create a customer portal when they previously didn't have that. And so we see that that process, once you even document it, it's been known to change a lot.

Also, some products can be very intricate and time-consuming to find. I am not trying to call out any single vendor throughout this process, but I feel like the specific examples are going to be helpful. So I mean no offense and am not pointing fingers at any vendor that I use as an example in this process. But I think that it gives us some talking points. So for instance, Cisco, when they released their security bugs or caveat IDs as they call them, in February of this year, they had over 200. So in one month, there were basically 200 separate security vulnerabilities that they were resolving. And on top of that, the security ratings are put somewhere else within the release notes. And they have the CVEs in a separate place. And so you have to have the know-how to be able to know where to go, what to find, how to document, and have a process in place in order to do that.

So what I wanted to show you is, I'm going to tell you ahead of time, I know this is blurry and there's a lot and it's hard to read. And I'm not essentially trying to give away the keys to the castle of how to facilitate this specific item. But this is an HP Intel graphics driver. And what this is, is a screen shot of the four pages of instructions that our team has written in order to mine and document this one thing. So, as you can imagine, they're not all this long. There are some that are longer. But just as an example of what's required to go and get something, and how to be able to again have that transfer of knowledge, the written instruction has to be completed and then maintained on a regular basis, so that at any given time someone will be able to jump in and help facilitate that process.

All right. Lesson number four. Asset analysis is complicated. This is essentially knowing what you have, and keeping up with the changes. First of all, it's an ongoing effort, because what you have in your facilities is ever changing. And even that, you have to understand that once you apply a patch, therein the status is changed. And so having to keep track of that is an ongoing effort, and that's certainly part of the process. Sub-components, you can talk about sub-components for an hour.

But understand that many different, for instance, let's say HMI or SCADA devices might have numerous different software packages installed on them. And you have to be aware, are those patches being provided to you from the product vendor or from the patch vendor? So for instance, if you have Microsoft installed, for that particular device do you want to install it when Microsoft releases it? Or do you want to wait for the vendor that has the firmware that has tested it, validated it, aggregated it, and then sent it out to you? There are big implications. And also understanding the implications of the different software packages installed and the implications on each other, that can have a big impact on when and where and how you patch something.

Vendor ownership, again, we see where a product was built by a particular vendor twenty years ago. It might be still a very good, viable piece of equipment in the facility, but the ownership of support of that product may have been bought and sold three or four times in the last 20 years, if not more. And you have to be able to track it down to make sure that you've gotten to the end of the line. So before you say, this doesn't get patches anymore, are you sure of that? Because again if it has been bought and sold one more time that you couldn't track down, you might be missing out on mitigating any vulnerabilities that might come out from that.

And that's very similar to like the support provider again. That may change. This equipment gets bought and sold, and supportability of that changes, so there's a lot to keep up with there.

Obtaining sufficient information in order to patch properly. Again, you have to know what you have. And, you have to be able to fully understand the patch level of what you have. Because again, if you've got the 32 bit documented when it really is 64 bit let's say, you might install the wrong patch and have subsequent issues.

We've seen where utilities have hired IT contractors to come in and help create their baseline. And that can be helpful. But if they're not aware that capturing that for patching takes a different approach, they might not be properly articulating it. So for instance, we have a utility that sent us an asset list that they had paid to have someone help create for them, and they defined the assets in terms of functionality, instead of by vendor, product, version, model number, serial number, those sorts of things. So when it came down to it, their list that they had paid someone money to create, that baseline wasn't applicable in terms of patching. Because again, there's certain levels of information that matter when trying to go find what's the next available patch for a piece of equipment.

Aggregate lists also may not be sufficient. So for instance, we see where you might have five instances of a particular piece of equipment. And you go okay, well, that's one. That's really only one thing, even though I have five of them. It can appear to be that. But some particular products, for instance there's a popular relay vendor that the make and model don't change. From when they create the product, they never change the make and model. But they provide patches based off of ranges of serial numbers. And serial number information isn't always queryable from those devices. So if you're using some sort of automated system to glean this information, you may not have that. So again, you might have five instances of that same relay. But if you have three that are within one serial number range and two in another, that's really two different sets of patch mining that need to take place on that equipment.
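The relay example can be made concrete with a short Python sketch. The serial-number ranges and track names below are invented for illustration; the talk does not name the vendor or its actual ranges, and `track_for` and `group_by_track` are hypothetical helpers.

```python
from collections import defaultdict

# Hypothetical serial-number ranges, each with its own patch track.
PATCH_TRACKS = [
    (range(1000, 5000), "track-A"),
    (range(5000, 10000), "track-B"),
]


def track_for(serial, tracks=PATCH_TRACKS):
    """Return the patch track whose serial range contains this serial."""
    for serial_range, track in tracks:
        if serial in serial_range:
            return track
    return None  # unknown range: needs manual research


def group_by_track(assets, tracks=PATCH_TRACKS):
    """Group (asset_id, serial) pairs of one make and model by patch track."""
    groups = defaultdict(list)
    for asset_id, serial in assets:
        groups[track_for(serial, tracks)].append(asset_id)
    return dict(groups)
```

Five relays of the same make and model can therefore collapse into two patch-mining groups rather than one, and any relay whose serial number could not be collected falls outside both.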

And also the other big implication of that could be patch status. So you might have a policy in place that says okay, we've got five instances of the same thing and we should be installing patches at the same rate. But you need to verify that. You need to make sure. Because again if you assume that the patch status is the same and then you go in and install something, it could be incorrect in terms of where that really needs to be. So again, aggregate lists can help streamline and provide efficiency gains in what you're monitoring and how you're monitoring it. But again, it's not always the bulletproof solution in terms of making sure you've got the right information identified.

So this next particular slide, I'm not going to belabor the point of this. And there are two more follow-up screenshots that go with this, that I opted not to show you. But just to, basically enough, just to understand that inside of a particular product, inside of a particular family of products, there are various release schedules. There are various tendencies, there are various integrations in how you can go from one level to the next, even inside of minor versions of a major version of this particular product. And they go on to, this particular product not only has these different timelines for patching schedules for when something is supported, but it also changes based off of the hardware that this particular software is on.

So you might say okay, I've got this and I think I know what's going on. But in one instance of this, it seemed to go through 2019 in one area, and in the other one it was going to be end of life in April of this year. So again, it just mattered where it was applied. So these are the kinds of complexities that our engineering team has [inaudible] with trying to research and track down. But again, it goes back to understanding what you have, which has great implications not only just at the patching level but for the impact of that application and where it's installed and how it's being used.

So, again, this could become a rabbit hole, but understand if you're interested in understanding a little bit more about how complicated things can be, go research this particular product and you'll be able to see all kinds of interesting complexities there.

Okay, lesson number five. Security versus non-security. Not only is it good to know which patches that come out are security in nature, but as we saw earlier in the presentation, the regulatory standards mandate that the security delineation is what's of note and is what has to be tracked. But again, the other thing to keep in mind is that not all vendors, particularly OT vendors, necessarily provide security ratings. And so if you don't have that, you have to have the resources and the expertise to know how to determine that. And so you've got to have the ability to go in and study the release notes and make recommendations based off what you're seeing.

For some we've seen the CVE information and the subsequent CVSS scores are helpful. That's a very good gauge of criticality and implication of that particular vulnerability in your environment. But not everybody, not all vendors report in to the national vulnerability database or US or any of those other aggregators of that vulnerability information. And not only that, but we've also seen that where some vulnerabilities are reported, but the subsequent patch isn't reported. And so you might have something that might appear if you've only got your eyes looking in one direction, that might be something where you might miss that something actually has been addressed. Because if you're only looking there, you have to be able to keep your eyes on several different sources of information to make sure you're getting the latest and greatest of what needs to be managed.

Lesson number six. A patch is not a patch is not a patch. And this is just very interesting, because there's several different types of patches. The first one on the list here is cumulative. This is basically where an installation replaces all previous installations. This is actually the model that Microsoft went to back in October of 2016. We can have a sidebar conversation on that as well. But there are implications of the cumulative patches, because of what happens if something goes wrong, or being able to isolate one specific piece that may or may not have been problematic in your installation. But certainly it seems to be easier in terms of, this is one patch and you understand the implications of that.

Independent patches. These are, in my opinion, probably one of the hardest ones to keep track of. Because you might have five patches out. And maybe you install one, three and five. But if something comes out and you need to be aware of two or four, you don't necessarily know that it was installed. And so you can pick and choose a little bit more freely, but keeping track of what's been installed might be a different story.

Last but certainly not least is the primary and dependencies. And so these have relationships to each other. And so you have to be able to make sure that if you have a primary patch, and you have the dependencies, have you installed all of the dependencies that lead up to that, to make sure that you've installed them in the right order and what's applicable to your environment. So again, you have to be able to keep track of this for the sake of understanding your particular patch status.
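Working out an install order for a primary patch and its dependencies is a topological sort. The Python sketch below uses the standard library's `graphlib` (Python 3.9 or later); the patch names and the `install_order` helper are hypothetical, and it assumes the dependency relationships have already been gathered from the vendor's release notes.

```python
from graphlib import TopologicalSorter


def install_order(dependencies, installed=()):
    """Return patches in an order that installs every dependency first.

    dependencies maps each patch to the set of patches that must be
    installed before it. Patches already installed are left out.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    already = set(installed)
    order = TopologicalSorter(dependencies).static_order()
    return [patch for patch in order if patch not in already]
```

Given `{"P3": {"P2"}, "P2": {"P1"}, "P1": set()}`, the function lists P1 before P2 before P3, and passing `installed=["P1"]` leaves only the remaining two. The result is only as good as the record of what is actually installed, which is the tracking problem described above.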

The other thing to understand is, these patches, we've seen them changed, re-released, retracted. And again you have to understand what you have, so if you go back and get notification of something, you have to backtrack to figure out if something was retracted, let's say did you actually install that. So something to keep track of.

And certainly this backdating patches is possible. I did the same presentation a few weeks ago, and this particular topic was a 30-minute distraction. And I know it's just me talking, so I anticipate some questions on this in the Q&A session. But this is very interesting, so let me show you it. So this particular patch or set of three patches, you'll notice were dated October 24th. We had three patches. And this particular screenshot showing this evidence was documented on December the 2nd. So in December, there were three patches for October 24th. Three weeks later, on December 28th, the same product, there's not waving of wands or anything here, it's the exact same product, they're showing dated October 24th, four patches.

So if you hadn't been keeping track of this, if you hadn't been astute in paying attention, if you thought that you had kept up with things, you could have very likely missed a critical patch that was the fourth new patch that was released in the past. And honestly, this is more common than any of us would like to believe. So in terms of security, you could have missed something really important. In terms of compliance, I'm honestly not even sure. I'm hoping to track down an auditor one of these days to say what are the implications of this specific scenario on an audit. Because in true letter of the law, it would seem as though you're out of compliance, because you missed something even though it wasn't shown until later.
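The backdating scenario can be caught mechanically if each check of a vendor's patch list is saved and compared with the previous one. The sketch below is a minimal illustration under stated assumptions: the patch IDs are hypothetical, the snapshots are assumed to be simple mappings from patch ID to the vendor's stated release date, and the year 2016 is a placeholder because the talk gives only the month and day.

```python
from datetime import date


def newly_listed(previous, current):
    """Return patches present in `current` but absent from `previous`."""
    return {pid: released for pid, released in current.items() if pid not in previous}


def backdated(previous, current, previous_check):
    """Return newly listed patches dated on or before the previous check.

    These are the entries that appeared "in the past": they were not on the
    list at the earlier check even though their stated date precedes it.
    """
    return {
        pid: released
        for pid, released in newly_listed(previous, current).items()
        if released <= previous_check
    }


# Hypothetical snapshots mirroring the example: three patches dated
# October 24th seen on December 2nd, four seen on December 28th.
OCT_24 = date(2016, 10, 24)  # year is an assumption
SNAPSHOT_DEC_02 = {"A": OCT_24, "B": OCT_24, "C": OCT_24}
SNAPSHOT_DEC_28 = {"A": OCT_24, "B": OCT_24, "C": OCT_24, "D": OCT_24}

if __name__ == "__main__":
    print(backdated(SNAPSHOT_DEC_02, SNAPSHOT_DEC_28, previous_check=date(2016, 12, 2)))
```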

So lesson six bleeds into lesson seven, which is the, maintaining evidence. And so when you have this kind of documentation, if you're keeping screen shots of what you're doing and keeping track of that, if you were to get audited you would see it. So from a compliance perspective, you would have verification or proof of what you've done. But from a security perspective, you would be able to go back and see hey, there was something added after we checked earlier in the month. And so that allows you to be able to keep track of that. Also, if something, a product is no longer being supported in terms of compliance if nothing else, you need to have audit-ready documentation to show that. So that in the event that an auditor points at a particular device and says where's your patching due diligence on this particular item, you have that documentation ready to go that says here's where we've gotten documentation and proof that this particular product is no longer supported.

And as you can imagine this particular process adds more time to the process. It's a separate step. You have to have tools in place to be able to catalog it and archive it properly. You also have to have the right environment, in IT terms of having the capacity to store the documentation, all those pictures, every month for every single thing. You have to have server storage in order to be able to get that and be able to access it easily in an organized fashion down the road, in the event that you are needing to pull up or invoke that previous documentation. And it certainly has implications on your audit trail as well. So that's, all of this documentation and evidence, it's an important part of it. But certainly it can be very time consuming.

Lesson number eight. Again, if we, I'm bouncing back and forth between security and compliance to help build the case that it's not just one or the other. Neither has a good guy and neither has a bad guy here. But when we saw separately the NERC standard says that you have to check everything every 35 days. And I always get heartburn when I hear of utilities that are keeping track of this in a spreadsheet. It makes my head hurt, because I don't know how you can keep track of it. Because there are some that might tell you every two weeks that something's released. Microsoft is like clockwork, and you're going to have patch Tuesday and you know when it's coming out, unless it happens to be last month when they didn't have anything.

So it's all over the place, and then heaven forbid that you have to keep track of the ones that don't proactively tell you. They're not releasing patches every month. They're not going to tell you they didn't release anything. So you have to keep track of who you've contacted and the verification of the fact that something was in fact released or not released in a given period of time to keep yourself in compliance with the regulatory standards.
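As an alternative to tracking this in a spreadsheet, even a small script can flag which patch sources have gone longer than 35 days without a documented check. The following is a sketch only: the vendor names and dates are hypothetical, and it assumes the record being kept is simply the date each source was last checked, including sources that released nothing.

```python
from datetime import date, timedelta

CHECK_INTERVAL = timedelta(days=35)  # the 35-day cycle mentioned in the talk


def overdue_sources(last_checked, today):
    """Return, sorted by name, the sources last checked more than 35 days ago.

    `last_checked` maps a source name to the date of its last documented check.
    """
    return sorted(
        name
        for name, checked in last_checked.items()
        if today - checked > CHECK_INTERVAL
    )


# Hypothetical record of the last documented check per patch source.
LAST_CHECKED = {
    "Vendor A": date(2017, 1, 20),  # 40 days before the example date below
    "Vendor B": date(2017, 2, 10),  # 19 days before
    "Vendor C": date(2017, 1, 25),  # exactly 35 days before, not yet overdue
}

if __name__ == "__main__":
    print(overdue_sources(LAST_CHECKED, today=date(2017, 3, 1)))
```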

All right, guys, thanks for hanging in there with me. We've got two more lessons left. To me, this is almost a kind of a [inaudible] slide if you've been listening or paying attention up to this point. It is, I mean, to say that it's time and resource intensive is obvious. Because you've got to keep track of your vendors and who to call and how do you get it and where do you provide it, and how are you keeping track of your timeline. Have you contacted every source every 35 days and are you documenting the process along the way? Again, this is starting to wrap up all the previous slides together and take it in terms of it's not just the documentation, but it's the time and resources that it takes to do this.

And I think that resources even, it's the aptitude of the resources. I was talking with the utility a few weeks ago, and they said that for much of this, a level 1 engineer can't do it. So they're having to use much higher level engineers that have the aptitude to be able to make the right discernments around this information, and maybe someone who's a little more entry level might not be able to go do that. But we think about this and we think, you're using higher level resources and distracting them from being able to do their day job because they're having to go and analyze patching information for the sake of your security and/or your compliance program.

Lastly, not least, I would be remiss if I didn't mention validation. As I said before, our DOE project had a stated project element specifically to create a validation training program. And again, that is something that we offer, as part of our CEDS project. So if you have questions or want to learn a little bit more about that, we certainly can help with that. That's a program that we're still building. But when you think about validation, it's a whole, it's another realm of complexities to keep track of. You have to know what to test, what level of detail. There's all sorts of different ways of getting this information.

So what's applicable, how much is too much? How much is just the right amount? How much covers what it is that you really need to see? How much is it going to make you say, this was safe to release? The implication of this on my other applications worked. Where is that level and you have to have the right people with the right training to be able to discern this, to make these assessments. And I believe that a testing mind is different than an operational mind. Some people can have both. But not everybody does. And so you have to make sure that the person who is implementing this validation program at your facility is one that is able to handle and make the right decisions around what needs to be tested and how we're going to do this.

You also need to be able to make decisions around the equipment. And you have to have the right systems in place. When is it production and when is it lab? When is it physical? When is it a virtual environment? All of those things need to be figured out. And then also the test equipment. You have to have different equipment to test some of these things. You have to be able to calibrate. You have to be able to get metrics. You have to be able to record actions of things for the sake of documenting what you're doing with your validation program. And have the ability to keep track of that, and understand what you need to be able to complete this validation. Again, that equipment's going to be different than the equipment itself. And so you need to have the resources and the know-how in order to set that up.

Again, I have to resist, I'm not going to belabor the point. But validation, there are specifics around having a validation program for the NERC standards. So again, furthering the compliance requirements around having these validation programs in place is important.

So, why should you care? Again, the risks are great. First is safety. An improperly patched item, you have great potential of bricking a device, taking it down, knocking it over, whatever you want to say. Also, if you don't do it right, you have a false sense of security. So you think you're doing due diligence, you think you're doing the right thing. But if you've got one of these elements that we're talking about out of alignment, you just may not. So you've got to be able to have that sense of knowing that you've got things set up in terms of safety.

Temporal vulnerability with a missed patch. So again, if you missed something, it's cause and effect here but the longer you go without patching, the greater your risk or your vulnerability. So you certainly are not creating a safe environment in terms of patch management.

Reliability. You know reliability is king with energy, and you want to make sure that if you don't do something right again, if you brick something or you knock it over, you have the potential to have an impact on the grid which is obviously not something that any of us are seeking.

Efficiency, again, staying focused on the job at hand. Having the ability to have someone else handle patching. Whatever source that may be. But it's distracting and detracting from the work at hand. And so, and then certainly the risks involved in terms of compliance. Fine risk of risks, the fines for non-compliance can be great. And so that's certainly something that we would all want to avoid.

So just to wrap things up here. Understanding the lessons here. So, they, the difference between IT and OT. Obviously you can't treat them the same. The difference is very critical. Public and private, making sure you know which ones are private and then being able to document how to go find the private patch information. Understanding the process for patch analysis, keeping track of finding the patches, recording it, is all very difficult. Knowing what you have, that asset analysis, creating that baseline can be very complicated. It's changing, ever moving. Security versus non-security. If the vendor's not telling you, what's your process for determining that something is a security, has security implications. A patch is not a patch is not a patch, right? There are different types of patches, different implications for the way that patches are presented, the way that they're installed and the way that you keep track of knowing which one. So that if things change or move, down the road, you are able to understand better what your patch status is at any given time as a result of understanding this.

Maintaining evidence is hard. It's an involved process. You have to have a lot of dedication in terms of time and resources and the ability to keep track of that. Not to mention keeping track of the calendar with all of the different resources, and all of the time showing and proving that you've been able to keep track of all these things, is certainly a time-consuming process. And it also can be very resource intensive. And then the validation process can be very intricate.

So I think if you think through this, and I'm interested to get your feedback and your stories of patching, I can never have too many stories and understanding the real problems facing you. The industry and understanding, are you organized? Are you using these particular processes and procedures? How was your program organized? And so with that, I'd like to open it up for questions. I'm Michele Wright. My coworker and colleague Lindsey Hale had an emergency come up and was unable to join me today. But I certainly welcome your input, your feedback, and certainly am interested in getting your take on all of this.

Erfan Ibrahim Thank you Michele. We have some questions that are coming up now. But as we approach these questions, I think there were a few points that you made in your presentation that I wanted to emphasize. One of them was that all patches may not have a security label on them, but they may have security implications. That's one. And then the other one is about patches as a whole. If you think about security as a set of non-functional attributes being achieved, such as confidentiality, integrity, availability, reliability, accountability, those types of non-functional attributes, it's very important to understand that patches help not just with confidentiality and integrity of the data, but also the availability of the data. So all the points that Michele made about improving the functional aspect of your system as a result of the patch has direct security implications, because it improves the availability. Do you want to share some thought, Michele?

Michele Wright: No, I mean, I agree completely. And again, it's easy to set your eyes on a certain process or a certain rating or a certain level of criticality. Just to get the job done, just to check the box and say I did what I was supposed to do. Onwards and upwards, so to speak. But it's not always obvious. So you have to be diligent and paying attention to the details and making sure, or you could really miss something critical that might be masked as something slightly different.

Erfan Ibrahim Then the other thing I would suggest is that, think of patches as an asset also. So when you're thinking of your enterprise resource planning platform, and you have an asset management module, think of maintaining the information about patches in that asset management tool, just like you're chasing computers and servers and other devices in your enterprise. Okay, so I have a question. How do you maintain the fidelity of the patches? There are so many third parties now that are out there and they could easily create decoy patches that have back doors and put them on websites that look very similar to the actual vendor websites, and proliferate their malicious code like that. So how, do you have a way of doing validation of patches before they are implemented? And is there some kind of supply chain validation that they really come from genuine sources?

Michele Wright: So the best way to combat that would be hashing. And the unfortunate thing is that not every vendor hashes. However, in the event that that is available, that's certainly the first line of defense against that, to make sure that it is in fact what it's supposed to be. And get that verification from the vendor. And the second piece of that you mentioned really is that that's where the validation piece of it comes in. It's so critical. So, that, you know, that is a mandated part of the process, to make sure that you are not only testing the patches but testing the applications, the implications of those patches on each other. Because sometimes the patch might play nicely by itself, but you put it in with something else and it can have unforeseen outputs that are not good. And so I think that's where between the hashing and the validation, those are the two real critical failsafes to help with that. Nothing is, I mean at the end of the day it all has the potential of having risks with it, and it certainly is something that we're all vulnerable to in terms of is it really what it says it is. But there are some ways to check against that.
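Where a vendor does publish a hash, the check described above takes only a few lines. This sketch assumes the vendor publishes a SHA-256 digest as a hexadecimal string; the algorithm and the function names are assumptions made for illustration, since vendors differ in what they publish. A hash only helps if it is obtained from a trusted vendor channel, not from the same page that served the download.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hex):
    """Return True if the file's SHA-256 digest equals the published value.

    Surrounding whitespace and letter case in the published value are ignored.
    """
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())
```

A patch file that fails this check should not move on to validation testing at all; one that passes still needs the validation step, because a matching hash says nothing about how the patch behaves alongside other software.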

Erfan Ibrahim Yeah. So for instance, companies like Synopsys have products that help you scan source code and executables, and look for things like poor coding techniques, malicious code callbacks, those types of things. Do you have any types of tools like that to scan patches and see if there's anything fishy in them?

Michele Wright: So not, no, we don't have any commercialized solutions available. We do, for some products. So for instance, with our OEM partners, we have a full on validation suite with over 75 test cases of things that we look for. We check the baseline, we install the patches. We check for 75 different scenarios. We run the baseline again and make sure there were no implications. So we have those validation services. But we don't necessarily right now have any kind of tools that are commercially available for end users to check on their own.

Erfan Ibrahim Right. So I would encourage you to look at Coverity from Synopsys for source code and Protecode SC for executables and also Defensics for data fuzzing testing to see vulnerabilities to data fuzzing. So those are commercially available. Okay, so now, let's, we have a couple of questions here. One is from Michael Shay from UC Berkeley Extension, who says it has been known that the devil is in the details. Would FoxGuard Solutions have a dictionary of their commonly used acronyms that could be made available for public consumption?

Michele Wright: So commonly used acronyms, did I understand you accurately, I'm trying to think what the, I mean this particular lesson learned is the biggest, is sort of our first step into sharing what we've learned, and you know, Michael as you pointed out, the devil is in the details. That's the whole point of this presentation. In terms of something we've published, this presentation's going to be published. But I'll think through a little bit further of what else we might be able to do, and how we might be able to put something together that we could get out to the industry.

And if you've got specific examples, you've got my phone number and my e-mail address right here on the screen. Please feel free to reach out to me. You know, we're in that [inaudible] and some of the others were at the peer review back in December, and I've met with so many labs in the last three months. It's been so much fun to finally help each other and collaborate. So I might not be thinking of something obvious on the spot here in this presentation. But certainly reach out to me and we can maybe collaborate and think of some ways that we might be able to put something together for you guys.

Erfan Ibrahim Okay, very good. So some of you have asked who the moderator is. The moderator is Erfan Ibrahim, that's me, from the National Renewable Energy Lab. And the, all the slides, presentations, is going to be made available to everyone who registered so you will have access to that. Another question Michael Shay asks. It has been known that all stakeholders in the OT side of the house have very different priorities and schedules to live with and abide by. How would the various stages of validation be worked into the to do list of the respective stakeholders with the proper priority in mind?

Michele Wright: That's a really good question. And I think where I see it, and again, I'm not an OT equipment operator, so I'll preface that by saying my opinion is pretty well useless in terms of that. But, from where I think the biggest driver is the fact that the regulations are going to set that. It sets the stage of, you have 35 days to check patch status, ensure security ratings on all of your critical assets every 35 days. And then you have 35 more days to install or mitigate, and then it also has requirements around validation.

So this has evolved, and these are all new standards with the current standards, but that's going to set the pace. And that's what's changing the environment of how to have these conversations, and what's being done is that it is really sort of a mandated pace now, and a mandated priority that will start it. And then from there, you know, if there's any operators on the call that want to share something, I'd want to hear that, your perspective as well. I think that's, I definitely think this whole process is burdensome. And I think it's underestimated how burdensome it is over time.

Erfan Ibrahim Yeah, so this is a ripe place for developing expert systems that can do discovery, that can look at the assets, scan them, look at what patches are there, understand the interactions between patches, and then make recommendations as to what patches are needed as a net. Otherwise, if you just leave it to documentation and you miss things as you mentioned in your presentation, you could be out of compliance, or you could have a vulnerability there for a long time. So I think for a lot of software vendors that are out there looking to see what they can do, what's the next thing they could build, developing expert systems like this that could be programmed for different classes of devices would be a wonderful capability to have to reduce the human dependence and make it more efficient. So are there projects like that in the market, Michele, today?

Michele Wright: I mean, there's a variety of what I'm finding, and I've learned something new every day, is there are a lot of vendors trying to help with pieces and parts of the process. What we're working on trying to do, and certainly it's bigger than what we can do, but is trying to bring the vendors together and say how can we work together. How can, for instance, we're not an asset management provider. So we work with asset management providers to provide patching information and then they might have something that needs to have some other component to it that might be workflow management, and so we work with workflow management partners. That's a burden that I'm hearing, in my communications, is that there might be one part, one vendor that might help with parts of it. But to get all of us to play together, to have a more well-rounded approach to this, to have the benefits of one perspective versus the other, I dare say there's room to improve there. And so that's certainly something that we are working towards trying to improve the culture of vendors and working together and solving all of the pieces more holistically. I know that doesn't necessarily answer your specific question, but that's certainly a big part of the burden for the utilities.

Erfan Ibrahim Yeah. So I am all for the housekeeping and hygiene from workflow management and asset management. But I also believe that we need a bottoms-up approach for patch management discovery. In other words, there's some software that scans devices and knows where the latest thing is because it's connected to the Internet and can go to the vendor company, and know where the latest patches are. And checks devices against that. As opposed to just relying on procedures and management to figure out what's missing. Just like the way we have Malwarebytes today. That can scan the laptop and look for things in the registry. And then say remove these things. So that combination would be very powerful.

Michele Wright: Right. And I know that one of the big components of our project that we're working on right now is gap analysis. So that's where we're partnering with TDI to provide us that baseline so that we know what the current, what does this system look like right now. And then we go out and we can say, we know where you are. Here's what's missing. But part of where that's problematic, is that the aggregation of that information is very human-driven right now.

One of our original components of our CEDS project was to build a tool that would allow vendors to use a known standard. We were going to use the ISA99/IEC 62443-2-3 as the standard to say okay, vendors, can you help us aggregate this information and send it to us, and our big problem with that is that vendors wouldn't talk to us and wouldn't cooperate. They saw us as a threat. They saw us as a competitor even though we were working with this government project, and we're not trying to take their business. We're trying to make it, we're trying to solve a problem for the utilities.

But that was an interesting part of the process that we've learned is again that collaboration and how to, to your point, that bottoms-up discovery. So you might have a tool that would be able to say, okay, I can glean your current patch status. But then to be able to identify the gap is a different story. Again, we're trying to help with the gap. But part of the problem with the gap is to be able to get the information in an efficient way. So that's part of the R&D effort that we work on, on a daily basis around here. How to help provide efficiencies in that process as well.
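Once a baseline and an aggregated list of available patches both exist, the gap itself is a set difference. The sketch below illustrates only that final step, with hypothetical asset names, product names and patch IDs; it does not address the hard part described above, which is gathering the vendor information in the first place.

```python
def patch_gap(baseline, available):
    """Return, per asset, the available patches missing from its baseline.

    `baseline` maps an asset name to a (product, installed patch IDs) pair.
    `available` maps a product name to the patch IDs released for it.
    Assets with nothing missing are left out of the result.
    """
    gaps = {}
    for asset, (product, installed) in baseline.items():
        missing = sorted(set(available.get(product, ())) - set(installed))
        if missing:
            gaps[asset] = missing
    return gaps


# Hypothetical baseline and vendor data.
BASELINE = {
    "hmi-01": ("ExampleHMI 4.2", {"P-001", "P-003"}),
    "hmi-02": ("ExampleHMI 4.2", {"P-001", "P-002", "P-003"}),
}
AVAILABLE = {"ExampleHMI 4.2": {"P-001", "P-002", "P-003"}}

if __name__ == "__main__":
    print(patch_gap(BASELINE, AVAILABLE))
```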

Erfan Ibrahim Well, I can share some empirical evidence of the importance of patch management. We have a research test bed here at NREL that has an entire distribution grid management system, it's a [inaudible] system that BG&E uses. And we have it on a routed network with Cisco switches and Cisco firewalls, and then we have intrusion detection systems here from EnDimension, Albedo, and NextDefense so PF, and we have inline blocking tools from Seclab, and from Black Ridge. And when we put this test bed to a third party [inaudible] test from BioSAT, in September of 2015, the one thing that they came back with was, they said, every software that is on your system that has the ability to get patches had the latest patch on it. And we couldn't exploit any vulnerability as a result of an outdated patch.

So that was one of their findings, when they did the pen test on our test bed. The issue is that as the organization gets larger, and there are multiple business units, it's very important to coordinate your efforts across those business units and have proper network and system hygiene so that these procedures can work, and that you don't have a chaotic situation. So it's a culture that needs to be developed. And the skill of people is very important.

I would even recommend on the Myers Briggs personality type indicator to have ISTJs working on the inside collecting and documenting information. These are introverted, sensing, thinking, judging people. And have ESTJs working with the vendors, working with the business units, to gather the information. That combination would be powerful. If you have a lot of intuitive, feeling type people, it's going to be very difficult. And if you have perceptive type people, you'll never get a patch management thing going. So the skill and the ability to quickly glean from data actionable intelligence is critical in this business. Michele, some thoughts?

Michele Wright: I took notes on that, that's a really good insight. And I think you know, that certainly is something that the right resources and who's doing the work certainly has been an evolution for us as well. Again, we initially I think underestimated how hard this was going to be. When I say at the beginning, patching, how hard it is? That's actually a jab at us internally. Because we're the ones that were like how hard can this be. And we understand it. We've been doing it for a long time. But even then, understanding who are the right people, we've made some mistakes along the way of not having the right resources doing the work, and that's evolved a lot. So I think that you're right in terms of the tools working together and how to get the information and having the right skill sets to do the work certainly furthers the point of what we've learned along the way as well.

Erfan Ibrahim Yes, I judge a restaurant by the hygiene of its bathroom. I would also judge an enterprise by their ability to do effective patch management. It's very easy to buy toys. It's very hard to manage them. And then the lifecycle management, if you don't have patch management, and so much of your enterprise now depends on software of one kind or another, I don't know how any organization can thrive. They can survive but they can't thrive without doing this properly. So, there is one more point from Michael Shay, he says Microsoft has publicly advocated that Windows XP has been end of service. Or end of life. What if some of the OT stakeholders still have equipment running on Windows XP, leading to unpatchable vulnerabilities in the field?

Michele Wright: Yeah, that's a good, that's a really good question. Because there's a lot of equipment still using XP in this industry. A lot. And I know we've looked at virtualizing some of that equipment in terms of how to help mitigate that a little bit, and so that's certainly one approach. I'm sure some of the more technical operations guys on our team could expound on it further than what I would be able to give it justice, but it certainly is something that I know we, for instance, we have tools that we've written and we're making an upgrade to it. And when we did the upgrade it broke the XP installation and we're now having to jump through hoops to make sure that we can still support that because of how much that's still being used in this industry, and still needing to be able to service that.

It's a patching solution, right? So it still needs to be able to be installed in XP because we're going to be providing patches to that system through that tool. So it's a sort of very involved process. I think you know on some level, I might be getting my specific legacy applications confused. But I think there are still a few legacy applications that Microsoft will support for a hefty fee, for some of those really old legacy applications. But for most it's not practical. So you know, it's a risk. It certainly is a risk. We've been doing research, other industries and even in this other industry, you know, something in the medical field, a big hospital in Boston has 600 servers that were still XP that were at great risk. So, I don't know. There's got to be some point a motivation to get that, get all of that equipment sunsetted. But it's easier said than done.

Erfan Ibrahim Yes. So one thing that is very important in this area, Michael you've brought up another good point. There needs to be the proper business cases made by the technical community and enterprises to the business community to make the appropriate investments in infrastructure based on the mission criticality of those systems. So, the reason why XP type systems remain is because the lifecycle of the actual application is much longer than the lifecycle of the operating system on which it is running.

So, when something like that happens, the CISO and CIO need to make the appropriate arguments to the CFO saying look, if this operating system is now extinct, and it's not getting the support from the vendor, and if this application that is running on this goes down, this is how much this will cost for every hour this thing is out. And you'll see very quickly the dollars will be released to upgrade this system.

So whenever organizations have these very, very old systems, that means someone is not tending the sheep, is not making the appropriate business cases to the other side of the fence of the organization, to make the appropriate investments. So it is very important to have that business case, and then, if there is no other option, the problem is there in the system. It cannot be replaced because it's something that only that particular OS runs, then you should consider systemic security. Use inline blocking tools. Use intrusion detections. And don't just rely on the security of that endpoint. Until it's, you can move to a more secure device.

All right, so Michael Shay likes my connection between Myers Briggs personality types and people who can do this job very well. So I'm sure that there will be an uptick on taking that exam if they're up for patch management. But as you were talking, I thought what of the 16 personality types would be best for this from the Myers Briggs, and I really think that the ISTJ would be a great documenter, and the ESTJ would be the communicator.

Michele Wright: Yeah, I think that's a good assessment.

Erfan Ibrahim All right, Michele, any thoughts before we wrap this up?

Michele Wright: No, I appreciate the opportunity to have the dialogue. Again, please stay in touch. We're all of us together, we're all rowing in the same boat trying to make the industry better and more secure, and anything that we can do to be part of it, I am just eager to have these kinds of conversations with anybody and everybody. So please let me know. I know that this, Erfan, if you can make sure I'm on the distribution list for the recording. We're going to get you the actual slide deck itself if you want to include that as well. And make sure everybody has that ability to absorb this information. But thank you for the opportunity, and again I look forward to the continued conversation on the topic.

Erfan Ibrahim Very good. I appreciate the very detailed presentation, and I liked your format and the way you broke it up into a set of lessons and then summarized it. So your delivery was very good, and this is a very important subject and I hope that the audience really benefits from this as they take it back to their organizations and see what type of patch management activities they've got going on and if they can improve that in any way. I think the one thing that really stood out in the entire hour that you presented was this fallacy of using an Excel spreadsheet to do patch management. And I think that your empirical evidence to suggest that more sophisticated information tools should be used, and not a spreadsheet, if you want to do it in a meaningful way will resonate with our audience. So thanks again, Michele.

We are going to be having our next Webinar on April the 7th. I'll be sending out an announcement. You will be receiving the Webinar recording link as well as the slides in PDF format most probably by Monday. So please, make sure that you are following my communication in that regard. All those who registered will receive that. And you will also get a link to all our past webinars on the NREL website. So with that, I'm going to end the recording. Thank you very much and enjoy the rest of your