Using Streaming Analytics for Effective Real Time Network Visibility - Video Text Version
Below is the text version for the Using Streaming Analytics for Effective Real Time Network Visibility video.
Erfan Ibrahim: Good morning, this is Erfan Ibrahim of the National Renewable Energy Lab. Today is Friday, March 3, 2017, and I'm very pleased to have Dave Mitchell and Bill Sella of Singularity Networks here. We are going to present on a very innovative approach for doing anomaly detection, and we're going to hear a little bit about the approach and then we're going to learn about how Singularity Networks is applying that approach to solving some real problems, and there's going to be a live demo at some point in this presentation. So at this time, I would like to invite both Dave and Bill to say a few words and get started.
Dave Mitchell: Fantastic, thank you so much for having us up, and thanks to everyone on the phone for taking the time to join us today. My name is Dave Mitchell, I'm the founder and CEO of Singularity Networks. I'm also joined by my cofounder and chief technology officer, Bill Sella. We started the company about three years ago, and we're taking a streaming analytics approach to network traffic analysis. So we can go to the next – there we go. So through the power of streaming analytics, we're able to let you visualize your entire network, regardless of how your infrastructure is set up – whether you have data centers, corporate environments, or cloud. Having the ability to see all of your network traffic in one place, a single pane of glass, will allow you to identify odd events or anomalies on your network or your infrastructure with a lot more detail and context than the traditional solutions that are out there.
And this allows you to respond to these issues much more quickly than normal, and you actually have the confidence that you're making an accurate decision because of that contextual detail. So why did we leave our nice, comfortable jobs at our previous employers to go start a software company and work seven days a week? I think Bill asks me that question all the time. The real point comes down to my inability, after 20 years in the industry, to actually see what was happening across my infrastructure in real time, in software, and to integrate it with the solutions I already had on my network, whether they were SIEM systems, malware detection devices, or other network visibility tools. I was never able to see everything in one spot.
So you had siloed infrastructure, and you had to manually connect the dots between all of those different platforms. Unfortunately, that's a really difficult thing for almost any organization, and you end up missing a lot of things, increasing your operational issues, and increasing the timelines of outages. Also, not having real time data is a real problem when you're dealing with networks that, if they go down, are costing you money by the second. So worrying about what your network did 15 minutes ago isn't really an option anymore. You need to know what your network is doing this second.
So we've already kind of touched on this a little bit. Having complete visibility allows you to identify with context what's actually happening. You know, what is this burst in traffic across my network? Why is someone in my finance group accessing my production database machines? So it reduces downtime in general, but also empowers you to have that visibility across the board. When you can actually see everything in real time, security ends up being a side effect. You might notice, why do I have a machine in the data center that I hadn't looked at in a while communicating bidirectionally with a country that we don't do business with? Why is this laptop sitting there pushing out 100 megabits a second of NTP? There's a lot of different ways that you're able to see things when you're actually able to see it all. We also have a policy engine built into the system that allows you, once you know what your network is supposed to be doing, to import policy sets, so when there's any deviation, we alert you to it in real time, that second. That allows you to keep your network security policies in compliance across your network, so if there was a misconfiguration on an access control list or firewall, we would actually be able to tell you that there's deviation from what you have told us is supposed to be happening.
So it is just a pure single pane of glass. Besides having a really nice UI that we're going to show you, we've built a platform, and this is the product on top of the platform. The platform itself, besides being real time, has a fully flexible API, the ability to create objects in the system, gather live and historical data, and also stream data in real time from the platform. We also allow you to group all your different machines and roles for users in the system itself, so you can define these are my most critical business assets, or these are my executive team, or these are my data center or cloud resources, and then you can actually group and understand what's happening between those.
So the way we've built the platform is really three different tiers. There's the collection mechanism, there's the API and the analytics engine, and then there's the backend database and analytics platform. So we collect NetFlow versions 5 and 9 and IPFIX from almost any vendor that's out there currently. Routers, firewalls. If you can't send NetFlow directly, we have a lightweight probe that we can install on hypervisors or hosts or VMs in the cloud themselves, which will generate NetFlow from pcap. We also integrate with Microsoft Active Directory, so you can identify a particular user or group on your corporate infrastructure if you're using that for authentication.
We then de-duplicate and normalize all of that data, and then stream it to the next tier and enrich it with other types of data sets: DNS, geographic location, Whois data, BGP data across the board. It also allows you to integrate in this portion with other types of technology that you might have, if you wanted to do some targeted syslog from an endpoint solution. We can actually ingest that into the collector, and it also just enriches with that data set. And then the back end data store – this is where Bill will really get into the streaming analytics – is how we process the data as it's coming in, as opposed to the traditional method of the store-and-query mechanism. So as the data is coming in, we're essentially doing all of the database processing in memory versus writing it all to disk. So by combining all of those different types of data with the network data in real time and analyzing it in that method, we're able to see a lot of different types of anomalies, and we'll get into how our anomaly detection differs from everything else in the industry.
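The de-duplicate, normalize, and enrich step described here can be sketched in a few lines. This is an illustrative sketch only – the field names and the in-memory lookup tables standing in for geolocation and reverse-DNS sources are assumptions, not Singularity Networks' actual schema:

```python
# Sketch of the de-duplicate-and-enrich collector step described above.
# Field names and enrichment sources are illustrative assumptions.

def dedupe_flows(flows, seen=None):
    """Drop flow records already reported by another exporter."""
    if seen is None:
        seen = set()
    unique = []
    for f in flows:
        # the same flow seen at two routers shares this 5-tuple + start time
        key = (f["src"], f["dst"], f["sport"], f["dport"], f["proto"], f["start"])
        if key not in seen:
            seen.add(key)
            unique.append(f)
    return unique

GEO = {"8.8.8.8": "US"}          # stand-in for a geolocation database
DNS = {"8.8.8.8": "dns.google"}  # stand-in for reverse-DNS enrichment

def enrich(flow):
    """Tag the flow with contextual data before it streams to the next tier."""
    flow["dst_country"] = GEO.get(flow["dst"], "??")
    flow["dst_name"] = DNS.get(flow["dst"], "")
    return flow
```

In a real collector the `seen` set would be bounded by a time window, and the lookups would hit live DNS, Whois, and BGP sources rather than dictionaries.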
These are just some of the platforms that we currently integrate with today. The way the platform is built, it's really just configuration driven, and if you need it to integrate with another particular solution, instead of having to wait weeks or months or potentially years for a vendor to add that into the platform, it's pretty much a trivial amount of work to add a different data type into the system. And with that, I will let Bill take it away.
Bill Sella: Thanks, Dave. So, to talk about a couple of – I muted myself. So, to talk about a couple of the ways that we distribute the system here. We have a couple different options available, before we get into the guts of what we're doing. Either on-premise, locally on your site – your data stays your data. A lot of people have an issue sending things to the cloud, and for good reason. If that's not an issue, there are hosted or SaaS options available as well, and the multi-tenancy option is more for our partners that might be in the security management or service provider realms. There's a multi-tenant solution that allows them to realize revenue from the system as well. For the base platform, let's talk about the on-premise. It's just a Red Hat-based system as far as the base is concerned, but one thing we do want to point out is that collectors are separate from the analytics, and we don't really care how many collectors somebody puts out. It's really a function of how your network is designed.
You might decide to centralize a collector. You might decide to have 500 of them distributed around your network. Wherever you need that de-duplication and normalization to happen before it comes back to that centralized component, you know, that is really a deployment decision versus anything that's going to have an impact from a cost perspective. The other thing that we do see – because we are taking a different approach to our analytics, streaming based versus that traditional store-and-query – is that, generally speaking, most of our customers deploy it on a single server. It's efficient enough to handle cases into the many hundreds of thousands of flows a second. So let's go ahead and continue on into the streaming analytics database that we built into the system.
So we did focus on hardware efficiency when we built this thing in the beginning. We understood that although Dave and myself and the rest of the team come from really large service provider backgrounds, not everyone is necessarily willing to throw a rack or two racks of machines at a problem like this. And so we said, "How can we shrink this thing down?" And get some of that benefit and keep it to a couple of rack units. One of the other big pieces here is if you take a look at other solutions, particularly in the relational database space – obviously, the [inaudible] a bit more on the open source realm, but on the relational database side, there's generally a large cost associated with the licensing as well. Not to mention the DBAs that have to keep things running on the side.
The other piece that we thought was very important to this is let's make sure this is an extensible system, and not just extensible for our engineers to add features, you know, quarter over quarter, but extensible for our customers, for our partners to integrate their data. Every company, we all know it. We're a little bit different than everybody else. We've got something that makes us a little bit special. It's why our customers use us, and why we thrive. So making the ability to take your data, stick it into the system and make it a little bit more valuable in your environment was a big part of what we're going after here.
So let's take a second to drill into a little bit of technical detail and talk about that traditional big data approach. And this applies whether you're looking at the traditional relational database model, you know, scaling that out, or you're taking a look at something like a NoSQL-style approach with Hadoop or whatever it might be. Right? That traditional data flow pattern is: I take new data in, I insert it, I batch it up in some amount, it then writes to the disk, I've got my analytics sitting in the background, and periodically – whether that's once every five minutes, once every hour, or if you've got a whole lot of hardware, once every minute – right?
It's going to run a bunch of queries, and it's going to pull all that data off the disk. So already, we're talking about taking that data multiple times to and from the disk; we're chewing up IOPS, we're chewing up bandwidth on the bus. There's all those things that come into play, and this is of course why that traditional approach has to scale so wide. You need a lot of bandwidth, therefore you need a lot of servers. It's just the solution that exists. The other catch is that with that sort of approach, because of that periodic cycle, how close you get to what we'll call real time is a function of how much hardware you have. It's really only as real time as your disk capacity allows. And, you know, the other piece of that – for those of us that have had the fortune of running these things before – is more hardware means more people, more power, more cooling, everything that goes along with that.
And so that's why we said let's take a look and see what a more modern approach might be to solving some of these problems, and that's where the streaming analysis piece comes in. Instead of that design pattern of insert, write, read, and then analyze, there's nothing inherently different between running a database query where the data comes into that processing off of the disk, versus simply turning it sideways and having that data come in directly off the wire. That's essentially what streaming is really all about: just turning things sideways so that, as we update data, we can analyze it in parallel while it's being written down to the disk.
And that allows us to get those insights without the delays, without that massive hardware footprint needed to increase the IO. Now, you can certainly take a look at this and say, "Well, I've still got to write it down to my disk if I want to pull it up from a historical perspective." Those folks that have worked in the big data realms of Hadoop or in that MapReduce world, the NoSQL world, understand that you can always do append-only writes. You can solve that IO problem pretty easily and make it just a problem of bandwidth down to the disk, and certainly there are some things that we do as well to compress the data, et cetera, as it goes down, to further reduce that. But the whole point here, if you take a look at the image on the right, is that what we're trying to do is not just analytics, but also ad-hoc queries as well. One of the big benefits of the whole big data approach of not having a schema is I can ask questions that I didn't know I was going to ask originally.
And that's an important part of what we're trying to do as well, and even our ad-hoc queries that we put into this system, we said those have to be real time. Those have to be live queries as well. And so as new data is coming in, those ad hoc queries that are running in the system, not just the analytics queries that are running and gathering insights from anomaly detection, et cetera, are also getting insights from new data as it's arriving to you. As we get to the demo, we'll be able to illustrate that a little bit. You'll see tables that are actually changing, numbers changing in the tables, things reordering as new top talkers rise to the top, et cetera.
So it's a neat way to take a look at what's going on in your network right now. Certainly the other thing that we talked about on the big data side was the hardware footprint. Because we don't have to write it to disk first, obviously, that's the IO benefit, but you're also not running a full query where you pull the entire data set up into memory every periodic cycle to reanalyze that data. So we're cutting down on CPU as well, and that has a couple big impacts: reduced cost, and of course, that footprint again. CPU, cooling, rack space, et cetera, all of those environmental components.
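The "turned sideways" pattern Bill describes – update running aggregates as each record arrives, rather than re-querying the whole data set every cycle – can be sketched minimally. This is an illustrative sketch with hypothetical field names, not the product's actual engine:

```python
# Sketch of the streaming pattern described above: instead of writing
# flows to disk and re-running a batch query every few minutes, update
# in-memory aggregates inline as each record arrives. (A real system
# would also append the raw record to disk for historical queries.)
from collections import defaultdict

class StreamingAggregator:
    def __init__(self):
        self.bytes_by_country = defaultdict(int)

    def ingest(self, flow):
        # the analytics update happens at arrival time, not on a cycle
        self.bytes_by_country[flow["dst_country"]] += flow["bytes"]

agg = StreamingAggregator()
for flow in [{"dst_country": "US", "bytes": 1200},
             {"dst_country": "CN", "bytes": 300},
             {"dst_country": "US", "bytes": 800}]:
    agg.ingest(flow)

print(agg.bytes_by_country["US"])  # 2000 -- current as of the last record
```

The answer is always current as of the last record ingested; no periodic read-back from disk is needed to refresh it.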
And it really gives us that ability to see new things second over second. We talked a little bit about APIs up to this point. Because this is a live data system, our APIs aren't built on the traditional SQL or even just a REST-based approach. You'll notice some familiarity with a REST-based style, but we actually use a WebSocket-based API into the system. It's, well, let's call it a relatively new technology – it's been out there for several years at this point – and whether you want to interface from Java, Python, C, C#, whatever it might be, all of those language hooks are there nowadays for a WebSocket connection. And queries are really nothing more than sending over a JSON structure and either getting back a bunch of JSON structures as results, if you just want to do that traditional historical query, or doing things like saying, "You know what? I'd like to make an API call that tells me any time something in my controller domain is talking to a country outside of where my infrastructure is located."
Those are the types of queries that you can put into there and you can get those alarms, hook them into whatever alerting system you might have, it really is about getting that live data within milliseconds of it actually showing up in the system on the front end.
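A subscription query like the one Bill describes might look roughly like this. The query schema here is entirely hypothetical – consult the platform's API documentation for real field names – and the client uses the third-party `websockets` package:

```python
# Hypothetical sketch of a live WebSocket subscription. The JSON query
# schema is an illustrative assumption, not the platform's real API.
import asyncio
import json

def build_query(group, allowed_countries):
    # subscribe to alerts whenever a host in `group` talks to a
    # country outside `allowed_countries`
    return json.dumps({
        "type": "subscribe",
        "filter": {"group": group,
                   "dst_country": {"not_in": allowed_countries}}})

async def watch(url, query):
    # requires: pip install websockets (imported lazily so the rest of
    # this sketch runs without it)
    import websockets
    async with websockets.connect(url) as ws:
        await ws.send(query)
        async for message in ws:  # alerts arrive within milliseconds
            yield json.loads(message)

q = build_query("controllers", ["US", "CA"])
```

Each message that arrives over the socket is one live alert, which you could then forward into whatever alerting system you already run.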
Dave Mitchell: You want to touch on the programmability of other devices with –
Bill Sella: Yeah, so certainly one of the other areas this comes into play is familiar to any of us in the network world: SDN, Software-Defined Networking. It's certainly been taking hold over the last, let's call it, decade now, but it's gained a lot more popularity within the last few years. One of the neat items is, let's say you had a compliance rule defined in the system that captures that use case I mentioned: my controllers should not be talking to someone in a country outside of where my people are, and if that happens, obviously that's an indicator that a security policy has been violated.
Whether that's because somebody forgot to do it in the first place, or we did a maintenance last night, traffic shifted, and now it's going through a different firewall. Whatever it might be. There's a million reasons that can happen, but you can take this, connect into the WebSocket API, and simply have that rule defined and say, "Let me know whenever that happens." When it does, maybe I'm going to make a REST call out to my SDN controller, or maybe I don't have an SDN controller and I've just got a bunch of Cisco routers or whatever it might be, and I've got an Expect script or something like that that I already use for my day-to-day operations to add and remove firewall filters.
Well, have it trigger a firewall change and block that off. No human interaction required. We've solved the problem before it became a big issue. That's just one use case for how you can make use of this sort of an API in a more dynamic networking environment.
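The automated remediation loop described here can be sketched as follows. The alert fields, controller endpoint, and the fallback script name are all hypothetical placeholders for whatever your environment actually uses:

```python
# Sketch of the "no human interaction" loop: an alert arrives over the
# API, and we push a deny rule to an SDN controller over REST, or fall
# back to an existing ops script. All names here are hypothetical.
import json
import subprocess
import urllib.request

def build_block_rule(alert):
    # translate an alert into a deny rule for the enforcement point
    return {"action": "deny", "src": alert["src"], "dst": alert["dst"]}

def remediate(alert, controller_url=None):
    rule = build_block_rule(alert)
    if controller_url:
        # SDN path: push the rule to the controller's REST API
        req = urllib.request.Request(
            controller_url, data=json.dumps(rule).encode(),
            headers={"Content-Type": "application/json"}, method="POST")
        urllib.request.urlopen(req)
    else:
        # no controller: invoke whatever script ops already uses
        # (hypothetical name) to add a firewall filter
        subprocess.run(["./add_firewall_filter.sh", rule["src"], rule["dst"]])
    return rule
```

Wiring `remediate` as the handler for the WebSocket subscription closes the loop: rule violation in, firewall change out, with no person in between.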
Erfan Ibrahim: At this time, what I want to do is take a little short break and see amongst the audience if they are seeing stuff clearly and hearing everything because we have minimized your window. So I want to have a quick look and see where we are with the audience.
Dave Mitchell: And that is the last slide, too.
Erfan Ibrahim: Before the demo?
Dave Mitchell: Yeah, before we can do Q&A and then the demo.
Erfan Ibrahim: So it looks like everybody is following along, and we have about 59 people on. So Bill, I wanted to have a short discussion before we get into this. If you look at network systems management, this is a discipline that has been around for quite a while. The International Organization for Standardization developed an FCAPS model, where F stood for Fault, C for Configuration, A for Accounting, P for Performance, and S for Security. But security didn't really mean the security you and I know. It was network security, which involved SNMP communities. But if you look at the FCAPS model, besides some aspects of accounting and performance management, all the other ones really require real time streaming analytics. Fault for sure.
Dave Mitchell: Yes.
Erfan Ibrahim: Configuration as people are making changes. We should know about them. Security definitely. So it was a bit disingenuous that we spoke about it at an architectural level, 15, 20 years ago, but the implementation was still store and query.
Bill Sella: Yeah, absolutely true, and there are certain areas of that. Let's take the alerting, right. SNMP of course has traps, but you did end up with implementations where, a lot of the time, it's: okay, we're going to send it somewhere, it's going to go into a database, and I can click refresh on my browser or whatever it is and eventually get it up, or maybe there was some sort of a pipeline there. So some of it was a little real time, but there certainly were gaps – particularly as you get to the security aspect of it. It's been after-the-fact processing that sometimes takes days before we even finish a query.
Dave Mitchell: I mean, you look at certain compromises that have been out there. People don't know for a year, two years, sometimes longer that they've been compromised, just because there's no easy way right now to combine all the different data sets. Why should your endpoint security platform not be able to be coalesced with your network data and your syslog data and your DNS and your authentication data, and see it all in one place? You need something to tie all those together, otherwise you're going to miss things for a year at a time.
Erfan Ibrahim: Then the other thing I thought of with streaming analytics was that you have a very unique opportunity between while data is in flight and when it gets stored to put the appropriate cues on the data so that retrieval later for reporting purposes is a lot more efficient than putting it in big clusters without any markings.
Bill Sella: One of the things I did not mention as we were going through there is we do take the notion of pre-aggregating for future queries, essentially, so as we're tagging information as it comes in with geolocation, for example, right, here is all my US traffic. Here is all my Canada traffic just as an example there. We'll build pre-defined aggregates of that information so that when it goes down to the disc, you've got not only that raw data stream of here is all of my flows essentially, but you've also got these pre-defined aggregates of China, US, Canada, this traffic to this particular IP address, traffic to –
Dave Mitchell: To this application.
Bill Sella: To this application, you know, SSH, Facebook traffic, whatever it might be. Those aggregates are flowing down to the system in real time, which, to your point, simplifies and reduces the amount of IO when I make those historical queries later. And we're not just limited to those pre-defined aggregates. We can actually aggregate the aggregates as we pull the data up as well. So if, instead of let's say ten billion data points, you've reduced something down to 100 data points, well, that's a pretty significant reduction in the amount of information you've got to pull up, which of course helps us find problems sooner rather than later.
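The two-level idea Bill describes – pre-aggregate flows as they arrive, then aggregate the aggregates at query time – can be sketched briefly. Field names and the country-to-region mapping are illustrative assumptions:

```python
# Sketch of pre-aggregation plus "aggregating the aggregates": tag and
# roll flows into per-country buckets at ingest, then roll those buckets
# up further at query time, so a historical query touches a handful of
# rows instead of billions of raw flows.
from collections import defaultdict

def preaggregate(flows):
    """Ingest-time: roll raw flows into per-country byte counts."""
    by_country = defaultdict(int)
    for f in flows:
        by_country[f["dst_country"]] += f["bytes"]
    return dict(by_country)

def reaggregate(buckets, region_of):
    """Query-time: roll per-country buckets up into regions."""
    by_region = defaultdict(int)
    for country, nbytes in buckets.items():
        by_region[region_of.get(country, "other")] += nbytes
    return dict(by_region)

buckets = preaggregate([
    {"dst_country": "US", "bytes": 500},
    {"dst_country": "CA", "bytes": 200},
    {"dst_country": "CN", "bytes": 100}])
regions = reaggregate(buckets, {"US": "NA", "CA": "NA", "CN": "APAC"})
print(regions)  # {'NA': 700, 'APAC': 100}
```

The raw flow stream still goes to disk untouched; the buckets are an additional, much smaller data set written alongside it for fast historical queries.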
Erfan Ibrahim: It's very interesting because in the last 100 years, we as humans seem to be moving towards more and more machine like behavior, but you're almost showing human like behavior to machines by what you're doing because if you think about what the human brain does with short-term memory and its sensors, right, and then there's the longer-term memory, which is kind of like our hard disk. As data is coming in, as stimuli, whether it's physical or acoustical, we make decisions right there and then. We assess what's coming in, and then once that's done, then it goes into our memory. So you're doing something very similar to that in the way –
Bill Sella: It's about presenting enough data to make a good decision quickly. We all have run into that situation of paralysis by analysis. So much data, and we spend so much time analyzing it, and we never get an answer we actually needed. And so yeah, absolutely correct.
Dave Mitchell: Yeah, after all of our years of experience, having a platform that you can take that human like behavior for your own workflows, for your business, and integrate it, that's where it was kind of taking all of our previous experience and putting it into this initial platform of being able to notice particular behavior with what I've seen by stitching things manually together. Now we can do it all in one spot.
Erfan Ibrahim: I think the key thing for the audience to understand is there is no existential battle between the two approaches. The traditional approach is there, will be there, and has tremendous value for historical reporting purposes. Here, we're talking about using Moore's Law with small form factors and high speed processors to do something today, in real time, that wasn't possible 10 or 20 years ago. So this is taking advantage of Moore's Law. The abuse of Moore's Law is what I call big data, where you take everything and you just put it in these big clusters without proper marking, without the ability to have efficient IO. So what I'd suggest is that we go from big data to lots of data. And lots of data means distributed intelligence and real time analytics. That's what lots of data is, because you are processing lots of data.
Dave Mitchell: Smart data, right?
Erfan Ibrahim: Then we're no longer facing the big data problem. And that's the key: using Moore's Law to your advantage.
Dave Mitchell: Absolutely.
Erfan Ibrahim: Now let's go to the demo, unless there were any quick questions from the audience because we're going to go into a different mode now. So if anyone online has any questions to ask based on the discussion we had or the content that was shared up until now, let's go ahead and submit some questions. Let's answer them, and then we'll go to the next.
Dave Mitchell: Perfect.
Erfan Ibrahim: So anyone in the audience, any questions with any of the content of the discussion we've had so far? Looks like people are just ready to see the demo. Okay, very good.
Dave Mitchell: And if you have questions after the presentation, feel free to reach out on e-mail to Bill or myself and we'll be happy to answer.
Erfan Ibrahim: So at this time, I'll bring up the other slide here. Let's see. Here we go. Minimize this. And it's all yours, Dave.
Dave Mitchell: All right. This is our latest release, the 3.6 platform. This is just the main dashboard. As I walk through it, I'm going to let Bill explain some of it as well as we dig into it. This is a one-hour live view of all the traffic going across. This is actually a real network. Bill and Jim joke that they've built the largest text-based gaming company in the world, so this is actually real time traffic. The way we categorize and classify traffic is by taking the prefixes, whether it's the internal part of your network or the public IP part of your network, and we classify that as internal. Everything else is external. So what you see up in the corner under internal hosts is the number of hosts speaking currently at this time.
Then there's the number of external hosts they're connecting to, the number of actual connections, and the number of active ports. These little spark lines allow you, during some sort of an issue, to see that all of a sudden it went from 400 active connections to one million – there's probably some sort of a DDoS attack or whatnot going on. This geographic map is just cycling through, but it allows you to click on a particular country and find out how much traffic has been sent and received to that country over the past hour as you see the countries light up and get darker. All of a sudden, you see three or four countries you hadn't spoken to previously. You'll see a nice heat map of them changing. We also have five or six third party threat intel – or threat reputation – data sets integrated into the system. We also allow you, if you have your own threat reputation or threat platform, to integrate it as well so you can –
Bill Sella: Yeah, and those integrations – particularly for threats, are relatively simple if you've got – I mean you can take a CSV and drop it into a directory and use that as a source, and any time that file updates, we'll pick it up and start using it. So there are a number of ways to make that relatively painless.
Erfan Ibrahim: Let's talk about that information a little bit, because this is becoming very important as we see natural disasters occurring and manmade attacks. There are so many models for threat sharing that machine to machine becomes very difficult with all those models. We have the NERC organization, which has its ISAC, and they have the CRISP model. The FBI has InfraGard, and of course we have our traditional CERTs, US-CERT, and other CERTs. So do you have some way of normalizing those different data formats so that you can make sense of it on a common platform?
Bill Sella: Yeah, what it really comes down to for our platform is writing a quick script, essentially, to normalize those. So you can plug into the system with Python, or, as we talked about, through the APIs as well. Any way you can get it normalized down into just a simplistic, let's call it a JSON object or a CSV record, we can take that data in in that format. Certainly there's a little bit of config. You've got to say, well, this field is X and this field is Y. But as long as you've got some sort of textual format of it, it's relatively simple to get it into the right shape.
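The "quick script" Bill describes – map an arbitrary feed's columns into a uniform record – might look roughly like this. The column names in `FIELD_MAP` are assumptions about one particular hypothetical feed, not any standard format:

```python
# Sketch of a feed-normalization script: map one CSV threat feed's
# columns into a uniform JSON record. FIELD_MAP is the "this field is X
# and this field is Y" config Bill mentions; its names are hypothetical.
import csv
import io
import json

FIELD_MAP = {"ip_addr": "ip", "threat_type": "category", "conf": "score"}

def normalize(csv_text, field_map=FIELD_MAP):
    """Translate feed-specific column names into the common schema."""
    records = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        records.append({dst: row[src] for src, dst in field_map.items()})
    return records

feed = "ip_addr,threat_type,conf\n198.51.100.7,c2,0.9\n"
print(json.dumps(normalize(feed)))
```

Each feed gets its own small `field_map`, and everything downstream consumes one consistent record shape, which matches the drop-a-CSV-in-a-directory workflow mentioned earlier in the demo.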
Erfan Ibrahim: So I would recommend the SIEM format, because it is becoming very popular now in the industry. It's almost like the SNMP traps back in the HP OpenView days, and at least it gives us the ability then to integrate with a tool like Splunk or some other data integration platform for more visualization of those. You could do it on your –
Bill Sella: Yeah, we'll talk about another one of the integrations as we get to other tabs, like Splunk, and how we can pull context in just even through the UI, without the back end.
Erfan Ibrahim: Then let's talk a little bit about the types of threats. So you have defense and intelligence community in real time sharing signatures for specific types of malware that they have detected from their own reconnaissance work, which is not popularly known. So do you have connection into that kind of real time threat information?
Dave Mitchell: I don't have access to that, no. I think that would be interesting to work with. And the way the system works, you can actually score each particular attribute in the threat platform along with anomalies, and depending on which source you're getting it from – obviously from that government entity – you could weight it in the system: this is a 1.0 for accuracy, and these are the fields I most care about, and if I see any traffic touching that IP address or going to that particular subnet in general, alert me instantaneously and tell me that is –
Erfan Ibrahim: And the other type of threat information that's shared is more verbose. It's like ways of behavior they consider suspicious. It's not a malware signature per se, but – verbose stuff.
Dave Mitchell: Yeah, we can. The real problem with threat reputation data in general is the evolution of the cloud. An IP address may only have had that particular role for five minutes or 30 seconds or a day, so associating bad behavior with an IP address really just doesn't work anymore. That's where we're taking the approach of looking at all the different attributes – whether it's autonomous system number, country, IP subnet – along with what we know about that autonomous system. Oh, that's part of government entity X in country Y. That might be something we want to look at. And then what ports and protocols and applications are happening.
Have we ever seen – this machine has never spoken to that country before, and now it is. Why? What's happening?
Erfan Ibrahim: So a couple of things come to mind on a dashboard like this. One is that it allows you to reduce your operational cost because one individual could sit in front of this and look at the various feeds of information on this dashboard and turn it into actionable intelligence and take steps. That's one thing. The second thing is that person doesn't need to be very skilled. And that's another way of reducing the cost that a few years of experience is enough to know the basic health indicators on this dashboard and respond without needing Tier 2, Tier 3 type of support. So for organizations that are currently facing this challenge of silos with networking and security, this creates that kind of platform where they have situational awareness at a higher level across those silos.
Dave Mitchell: Also, if people already have their own dashboards, it's a simple API call to pull the time-series chart out and put it on your own dashboard, so [inaudible] data. So now we'll drill into – this is all the traffic from the internal network to the actual internet, and this is our Network tab, where you drill in and actually see what's going across the network. Initially, it breaks it down by country. This is all the traffic for the past hour, broken down by destination country. So in this aspect, you can see the top five overlays and actually see how the traffic is broken down. As you can see, the United States is definitely the bulk of it. Then we've got China and Canada there. I'm not too worried about the United States traffic, so it can be excluded. This is the really neat part about how our back-end database works.
It allows you to start off with an aggregate view and continue to drill down and slice into the data, whereas with traditional SQL-type structures, you can only go one or two layers deep, and then you have to go back, start at that new layer, and go down. So now if we wanted to see all of the traffic just to China, excluding the United States, we'd break it down by application. As you can see here, we map it with the traditional /etc/services names, but if you have application inspection turned on – say a Palo Alto firewall, or anything with IPFIX or NetFlow v9 – we can get the actual application record.
And then we would drill into – all right, let's see who is SSH'ing. Now we can break it down by source site – source country. I could resize this, but I won't right now. Don't know why I can't get all the way down there – the screen is not scrolling. Source site. Need to resize it. So yeah, Control-minus maybe. There we go. Then if you wanted to see the destination IPs that everyone in China, on application SSH, was talking to – we do the breakdown, and then you can see how they are – these should actually be updating in real time depending on how much traffic there is. There's probably not a whole lot of traffic there at this point. But if we wanted to just go back to the aggregate and go across the whole time range, then you'll actually be seeing it. And then we can go to, say, 15 minutes.
And then you'll see on the side the percentage and totals and everything will update. So if there's a real – an event from – could be why is the network totally saturated? Well the DBA has decided to back up all the databases at the exact same time every day.
Bill Sella: Yeah, so if you just flip over, Dave, and click the internal. So let's forget about traffic that's outside of our network at the moment and take a look at what's inside. Obviously, we don't [inaudible] in China, so we don't have any internal Chinese traffic. So here – you want to grab that big spike in the middle there, Dave? We'll zoom in on that for a moment. So this is how we can notice that there's some sort of spike in the traffic, and we drill into it and see what's making it up. Maybe we start out at one view or maybe we start out at something else – applications as an example. Let's see what's building it up. So add some overlays onto that and take a quick look, and we can see, both from the table at the bottom as well as the chart above it, that that's all MySQL traffic. Right?
And so we can, you know, drill into that. We can take a look at what that source IP address is that might be originating that traffic. You can see that there's two servers in this case that are really making up the bulk of that, whether it be the 43 percent one direction or the 39 percent the other.
Dave Mitchell: This also shows you 95th percentile, the maximum and all that over here. So if you need to use it for longer-term capacity planning and whatnot.
Bill Sella: The whole point of this is this is where you can drill into data, find out what comprises a particular piece of time. If you want to click over to the anomalies now, let's take a peek at that.
Erfan Ibrahim: Before we do that, what I want to do is confirm with the folks online that they're seeing this demo. Because we didn't get that confirmation from them yet. So for those of you who are online, if you could just put in a comment and say, "Yes, I can see the demo," that would be very helpful. Okay, very good. It seems to be going – yes. I have to put this in the way – it wasn't set up right. Okay, very good. I can see it. Excellent. Perfect. Let's continue with the demo then.
Bill Sella: So, anomaly detection. We've got a couple of algorithms, and this should look pretty familiar from a number of systems, whether it's alerting or whatever: you get a list of things that are going on. But what these are is anomalous behaviors we've detected. We've learned from the traffic patterns that normally go on in your network, and these are things that are outliers, for lack of a better word. And we do this on all different types of groupings of information. You've got routes – so particular routes on the internet – ports of traffic (so port 53 you'll see at the top there, my DNS traffic for example), traffic to and from particular countries, and just the overall aggregate: how much traffic is coming in and out of my network.
All those things are different ways that we're slicing the traffic as it comes into the system, and on those time series there are a couple of different algorithms we're applying. Certainly we can detect: is there a huge spike in traffic, or a huge drop-off in traffic, from the baseline we've determined? Additionally, we also apply a seasonal pattern to it. So does Tuesday at 2:05 AM look like Tuesday at 2:05 AM for this particular subset of traffic? All of those are things we'll pick up as an anomaly. Now certainly, when you talk about that weekly cyclic sort of detection, there's a little bit of learning that goes into it.
Typically, you'll start seeing some value a couple of weeks in. By the time you get to about six weeks, we've established a pretty good understanding of what that traffic looks like. For plain spike detection, the quickest we can baseline is usually on the order of a couple thousand data points, and then we've got a pretty good idea as to what the trends are looking like.
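The two detection modes Bill outlines – deviation from a learned baseline, plus a weekly seasonal comparison ("does Tuesday at 2:05 AM look like Tuesday at 2:05 AM?") – can be sketched roughly as follows. This is a toy illustration, not the product's actual algorithms; the three-sigma threshold and six-week history are assumed parameters.

```python
# Toy sketch of seasonal anomaly detection: keep a short history per weekly
# time slot (weekday, time-of-day) and flag values far outside that baseline.
from collections import defaultdict, deque
from statistics import mean, stdev

class SeasonalDetector:
    def __init__(self, history=6, threshold=3.0):
        # (weekday, slot) -> last `history` observed values for that slot
        self.history = defaultdict(lambda: deque(maxlen=history))
        self.threshold = threshold  # how many standard deviations is "odd"

    def observe(self, weekday, slot, value):
        """Return True if value is anomalous for this weekly time slot."""
        past = self.history[(weekday, slot)]
        anomalous = False
        if len(past) >= 3:  # need a few points before judging anything
            mu, sigma = mean(past), stdev(past)
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomalous = True
        past.append(value)
        return anomalous

det = SeasonalDetector()
# Six quiet Tuesdays at 2:05 AM (traffic in Mbps), then a sudden burst.
for mbps in [10, 12, 11, 9, 10, 11]:
    det.observe("Tue", "02:05", mbps)
print(det.observe("Tue", "02:05", 500))  # → True
```

This also mirrors the learning curve Bill mentions: the detector stays silent until each slot has accumulated enough history to form a baseline.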
Erfan Ibrahim: Now let's think about enterprises that would deploy a technology like this. Let's say they've got multiple sites globally. You would want to collect the data at multiple points, not just at the main site, because if they're doing B2B transactions remotely, they're not coming to the main site. You'd miss that flow.
Bill Sella: Correct.
Erfan Ibrahim: So when you sit with your potential customers, do you architect your own deployment on top of their network to get this level of granularity?
Bill Sella: Yeah, so the more places we can get data, the better – that's really the way we look at it. So if you've got, let's say, 100 distributed locations around the world, in a very simplistic deployment we'll just pull off of a firewall coming in and out of those 100 locations. So we'll get that telemetry data off of there. And as I mentioned earlier from a deployment perspective, maybe across those 100 locations you've got ten different hub locations spread throughout, and you want to run a collector at those ten hubs. So you've essentially got ten satellites per hub sending data back to the hub, which then ultimately gets sent back to the back-end processing system.
So deployment is very flexible, and as we mentioned, it can be as simple as one box. It can be as complicated as many boxes need to be involved. Some of those boxes can be tiny little embedded devices for all we care. As long as it can – it's running Linux and we've got a couple CPU cycles available, we'll do a collector on there.
Dave Mitchell: The other thing that's been missed over the years is the east-west type of traffic. It's great if you're watching everything north-south, but if you've got a corporate network machine that hops over into production, and then production goes outbound, you're missing things unless you're seeing all of that other traffic, or using the probe to generate NetFlow from the hosts so you can see host A talking to host B in the data center. You're actually going to see a lot more things you wouldn't normally see before. Drill into this one, Bill, if you want to talk about this anomaly and how it works.
Bill Sella: Yeah, so what Dave did there is he grabbed one particular anomaly, and he's got it highlighted at the moment. What this is doing is giving us a time alignment of other anomalies that occurred at roughly the same point in time. It's not making an absolute statement that says this particular IP is related to this particular route or to Brazil in particular, but it's saying there's enough of an indication – because each of these individual sub-aggregates of traffic experienced anomalous spikes in their patterns at roughly the same time – that they might be related. That helps us get to the root cause and find the problem a little more quickly, and that's the point of this bit of alignment.
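The time-alignment step Bill describes – surfacing anomalies from different sub-aggregates (an IP, a route, a country) that spiked within the same short window as possibly related – reduces to something like this sketch. The field names and the 60-second window are assumptions for illustration.

```python
# Sketch of time-aligning anomalies: group events whose start times fall
# within a short window of each other, as a hint that they may share a cause.
def align(anomalies, window_s=60):
    """Group anomalies whose start times fall within window_s of each other."""
    groups, current = [], []
    for a in sorted(anomalies, key=lambda a: a["t"]):
        if current and a["t"] - current[-1]["t"] > window_s:
            groups.append(current)
            current = []
        current.append(a)
    if current:
        groups.append(current)
    return groups

anoms = [
    {"t": 1000, "key": "route 203.0.113.0/24"},
    {"t": 1020, "key": "country BR"},
    {"t": 1035, "key": "src ip 10.1.2.3"},
    {"t": 5000, "key": "port 53"},
]
print([len(g) for g in align(anoms)])  # → [3, 1]
```

The first three anomalies land in one group, which is exactly the "these might be related" hint shown in the UI; the port-53 event, far away in time, stays on its own.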
Erfan Ibrahim: One of the benefits that I'm seeing is optimization of network design, because with this raw data you're getting and the anomalies you're recognizing – a lot of that comes from poor design, where broadcast storms are happening or there isn't proper network segmentation. So you can be kind of like the Sherlock Holmes for the network designers and give them the data so that next time around, they configure their networks better. It's not always necessary for nefarious acts to be –
Bill Sella: No. Visibility has so many benefits. Security is certainly one of them, but there's also just making things run better when all of my users are complaining that every day at 3:00 in the afternoon, the network is not working very well for them.
Dave Mitchell: As someone who was on call for a long time, being woken up by a pager in the middle of the night because of network traps that are going off, I got five or 50 link flap alerts. Did the network actually go down? Is traffic still passing? So the ability to know that here are my five locations I most care about. Tell me when the traffic deviates below or above these things. Then let me know. Otherwise, let me sleep. A lot of people have that [inaudible].
Erfan Ibrahim: We are seeing that even though network communications costs are coming down, it's still quite expensive when you get to high-bandwidth subscriptions. So if a company has to make a decision to move to the next level, and you analyze their traffic and recognize that a network redesign could lower their traffic by 30 or 40 percent, they could delay that decision of moving to the next subscription.
Bill Sella: Absolutely, there is a huge potential for capital avoidance – not having to make those purchases of network equipment – just by getting more out of what you have.
Dave Mitchell: And we've heard a lot, too, about people going to hybrid data center and cloud. We've had a couple of people ask us, "Hey guys, do you think it's smart for me to move this part of my web application to the cloud?" I'm like, "That's a great question." So being able to look at how much traffic stays between your web server tier and your database tier, and then, if you split it up, being able to attach a cost to that AWS traffic load, gives you that sort of insight on the financial side.
Erfan Ibrahim: Let's bring this in and connect it to the context of power systems and the electric sector, since many of our folks here are from that vertical. You're going to find that there are different time scales involved in our business, from milliseconds for protection all the way to several minutes and even hours when it comes to general shaping of the load. So it's very important for there to be network hygiene on the OT side, the operational technology side, so that all the time scales are properly respected. If we had asynchronous transfer mode, we could set bandwidth by application. I know that MPLS flirts with the idea, but it's still a connectionless service, so I don't know how good that is. You just solve it by throwing lots of bandwidth at it.
Dave Mitchell: We were all big ATM fans.
Erfan Ibrahim: Since we don't have that, this is kind of like a homecoming for me, because I'm thinking of RMON from 20 years ago. That was exactly what this used to do: fast packet capture, full analysis, all seven layers, and then the depiction of it using a standard object model. It was Abstract Syntax Notation One, just like SNMP – the lexicographic thing – and RMON [inaudible] had very specific things. You've created the equivalent of RMON in your domain and provided APIs for other people to integrate. Being a former fusion engineer, I know what it's like to come up with an ideal thing and have it not be implemented. I saw that with fusion, I saw that with RMON, and now here we are back again.
But to talk about the electric sector, if we're going to move to smart grid and have smart grid really work, we have to have discrimination of traffic. It's critical. So analysis is the first step to discrimination. If you don't have good visualization capability, you're going to make wrong decisions or you're going to throw a lot of money at the problem, which again is not helpful given that it's becoming more competitive to offer energy services, and you don't have the luxury of the monopoly or anything like this. Is that the end of your demo?
Dave Mitchell: I didn't know how much time. I wanted to make sure –
Bill Sella: If you want to just do a report real quick here. So one of the more recent additions is the ability to add custom reporting to the system, so just grab, like, the external traffic. We've got a couple of examples in here right now. But this is just a quick at-a-glance view: if I want to take a look at all those top-N sorts of views of my network – what are the top sites that are talking, what are the top groups of my servers, the applications that are talking, the inside and outside IPs, the ports, the protocols, the routes, the autonomous systems that are involved. Right? It's a way to get a quick at-a-glance view of what I actually care about. Similar to the dashboard, but maybe more tuned to your particular environment. So this is, we think, a pretty nice feature as far as being able to get essentially one page of what's going on.
Erfan Ibrahim: Yeah. One thing I'd say to make this a little more efficient is to have the ability to write scripts in a nice graphical user interface, come up with some business rules that go against these bins, and bring up a higher-level logic for display. What you showed suits an advanced user who knows all the tributaries and where to go look for stuff. But I'm thinking of, you know, Singularity Networks for dummies, where you come up with business rules that are mission-critical for you and in alignment, let's say, with your business goals, and you apply those rules here. So as the bins are populating, it will create a higher level of visualization capability to show when those rules are being violated.
Dave Mitchell: That's actually coming in some later releases. We already have the policy and rules engine on the back end, so you can configure those rules right now on the back end – being able to say, if my database servers do this, then do that. Let me know – or if traffic on this port goes over this amount in this time window, then call Bill.
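A toy version of the kind of back-end rule Dave describes ("if traffic on this port goes over this amount in this time window, then call Bill") might look like the following. The rule shape and notification hook are hypothetical, not the actual policy-engine syntax.

```python
# Toy threshold rule: track per-port byte counts over a sliding time window
# and fire a notification callback when the window total exceeds a limit.
from collections import deque

class ThresholdRule:
    def __init__(self, port, limit_bytes, window_s, notify):
        self.port, self.limit, self.window = port, limit_bytes, window_s
        self.notify = notify
        self.samples = deque()  # (timestamp, bytes) pairs inside the window

    def feed(self, t, port, nbytes):
        if port != self.port:
            return  # this rule only watches one port
        self.samples.append((t, nbytes))
        # Drop samples that have aged out of the window.
        while self.samples and t - self.samples[0][0] > self.window:
            self.samples.popleft()
        if sum(b for _, b in self.samples) > self.limit:
            self.notify(f"port {self.port}: over {self.limit} bytes in {self.window}s")

alerts = []  # stand-in for "call Bill"
rule = ThresholdRule(port=3306, limit_bytes=10_000, window_s=60, notify=alerts.append)
for t, b in [(0, 4000), (30, 4000), (55, 4000)]:  # 12,000 bytes inside one minute
    rule.feed(t, 3306, b)
print(len(alerts))  # → 1
```

The notify callback is the extension point: swapping `alerts.append` for a pager or webhook call gives the "then call Bill" behavior.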
Bill Sella: Yeah, I've got a segment of the team working specifically on that.
Erfan Ibrahim: Because as you were presenting, you were that wizard.
Bill Sella: And as well –
Erfan Ibrahim: In a digital form now.
Bill Sella: Yeah, so there's that whole component, and there's also the – we'll be adding the saved views within the next couple weeks, so you've got a senior engineer who slices down to a particular piece of traffic and says this is really interesting, people should keep an eye on it, and be able to save that out for the rest of the staff.
Erfan Ibrahim: What I'd like to do for the next couple of minutes before we go into Q&A is talk about a few case studies. So I will – yes, I'll close this out, and I will bring up the PowerPoint again. Are there any more slides on this?
Dave Mitchell: I don't think so. We already went over this one.
Erfan Ibrahim: All right, so what I'd like to do is for you to discuss – you don't have to name names, but I want to understand – take a couple of verticals and show me what bringing this technology did for that company. In terms of gains – some of the benefits they had.
Dave Mitchell: Yeah, one of our customers is a large web services provider. They didn't really have any real visibility into their network when it came to cloud and data center. They had multiple data centers, and they really needed to understand what was happening so they could re-architect their network and also look at things from a security standpoint. But they didn't want to spend a lot of money on appliance-based solutions. They're a small network engineering team, so they needed to quickly identify what was happening on the network and see what was going on. And from a security standpoint, they wanted to cut down on spam going through their platform, where their platform was actually being used for it. So they integrated their known bad IPs – where they would dig through their own data sets, or somebody flagged it in the application – into our platform, so they could actually analyze when somebody signed up in a particular town and started spamming, and in real time shut those accounts off before they really became a problem.
But it was just a real ease of deployment. We were able to get them up and running in a matter of minutes, and then they've continued to adopt, even taking in NetFlow from all their AWS infrastructure. So now they have a complete, holistic view of what's going on across their network. You know, another customer of ours – actually, it was just a friend we were helping out – we kind of looked at their network, and I'm like, "Did you know you have a few volunteer system administrators from China?" They go, "Uh oh," and we dug into it, and sure enough, there was an active problem, and a critical resource was being affected.
Bill Sella: Yeah, no, that particular one, they were actually noticing they were having performance problems on their network. All the users are complaining, and it turned out that we could take a look at the spike, we drilled into it real quick, we saw as Dave mentioned where that traffic was coming from, and it turned out a couple employees had brought in a couple wireless access points of their own that were –
Dave Mitchell: [inaudible] infected. They thought they were being DDoSed, but they were actually the ones doing the DDoSing.
Bill Sella: Yeah, so it helped them get to that sort of problem really quickly. One of the other interesting use cases is the customer Dave mentioned that was taking their own data and pushing it in – okay, we've identified these are the bad folks abusing our system. What they wanted to see is how much traffic that is actually causing. And because they can define those groups and push them in through the API, essentially as their other back-end systems identify things, they push that rule into the system, and we start detecting spikes in those particular pieces of traffic – to know that, oh, my other threat detection stuff I've got running on the back end suddenly isn't working as well as it used to. Because while that line used to be relatively low at a couple K a second, now suddenly I'm seeing a spike to a couple gigs a second of that traffic.
Erfan Ibrahim: Very good. So we're at the top of the hour. We have 30 minutes for the Q&A period, and I'm going to see if there were any questions earlier. Yes, there are some questions here. So Marisu Mosavi asked: can you talk about the implementation footprint?
Bill Sella: Yeah, so that is – obviously, this is one of those gray questions; it depends on how big of a [inaudible] is. Right? If we're talking about a company with, let's say, ten gigabits a second of internet traffic or less, that's typically going to be a single-server type of deployment. So you'd be talking about a 2U hardware footprint, probably somewhere in the neighborhood of 20 to 30 cores on that guy, with maybe 120 gigs of RAM or something in that ballpark. Again, these are very loose numbers. But generally speaking, it's going to scale from the small size – I've got 100 megs out to the internet and a gig LAN – all the way up to I've got multiple 100-gig links running across a big backbone network, and that footprint is going to be anywhere from a couple of U at the low end up to probably four boxes or so at the high end.
Dave Mitchell: Generally, it's a fraction of what you would need for the traditional solutions that are out there. One of our customers was looking at another product that's out there in the industry, and it was going to be about 18 boxes; they're running on one with us. We've really made it so the footprint is small, so you don't have to spend all of your time deploying the solution – you can actually get your hands on it and start working with the data.
Erfan Ibrahim: So you can have one [inaudible] total, six [inaudible] - 16 bowls of Captain Crunch.
Dave Mitchell: Exactly.
Erfan Ibrahim: Okay, so the next question is: how does the streaming analytics platform differ from Spark Streaming?
Bill Sella: So it's similar to Spark in a number of ways. One of the big differentiators is that Spark is a Java platform. It takes a lot of resources to run, and it spreads across a lot of machines as you start to scale up. We're not a Java platform – it's a C/C++ implementation, which really dramatically reduces our hardware footprint, particularly when you talk about the nature of streaming data: a lot of data points coming in and out really quickly. Anybody who's developed in Java has a feel for what happens once you get a high rate of short-lifecycle objects flowing through the system. You can solve it to an extent with approaches like object pooling, et cetera, but it can be painful development.
Erfan Ibrahim: I would compare that to a stick shift versus an automatic. Your product is more like a stick shift: very efficient, without a lot of flywheel spinning, and very precise in terms of what you can do with C and C++. Java, because of its portability, can be put on any operating system, but the price you pay is that it's very resource-intensive.
Bill Sella: Yeah, it's a language that has its place. I love it for certain things. There's certain things I just – it's not my choice.
Erfan Ibrahim: Great. Next question – I think you answered this – do you need a server, or can you run the engine in an embedded system?
Dave Mitchell: We talked about putting the collector – the collection software probe – on embedded systems so we can generate telemetry from them. The back end itself would need to run on a server.
Bill Sella: Well, this is – again, it's one of those questions about how big your footprint is. If you're talking about a handful or a few dozen pieces of instrumentation it needs to take data off of, and flow rates on the order of tens or dozens of records a second, the installation footprint can be really small. Think about a modern processor – I'm more familiar with the server processors – you might be chewing up like 0.1 percent of that processor and maybe a couple hundred megs of RAM. So if that system can work in that sort of environment, sure, but "embedded" covers a very wide swath of exactly what that definition is.
Erfan Ibrahim: It's interesting, as I've been listening to your presentation, I kept asking myself is this a network management tool or is this a cyber security tool, and my conclusion is it's both. It's actually a network systems management tool that includes security and networking together because yes, you can help the network designer based on the information you're collecting, help them design networks better. On the cyber security side, you can provide real time analytics, which will help cyber people realize anomalous behavior. So it's almost like the wave particle duality. You have a foot in each of those two spaces.
Dave Mitchell: And that's really the overarching goal: being able to unify all those data sets. Normally, in the past, you'd have to buy security platforms, and then you'd have your network management or visibility platforms. They're married together at this point – they need to be. When you can see everything, your security really is a side effect.
Erfan Ibrahim: The final arbiter is neither. It is business continuity. That is the wealth generator, not the plumbing. The plumbing is the means to the end. A lot of times, we're so technology-driven that we forget that. By bringing those two together, you're aligning it better with the business goals. Very good. So the next question is: how would new types of threats be included in the current model, and would it be real-time inclusion of new threat types as they happen? Rules on the fly?
Bill Sella: Yeah, you can. So as the system is running, there are a couple of different ways that – let's take threats as an example of how they can be pumped into the system. It can be as simple as updating a CSV it's reading from a particular directory, whether that's replacing an existing one or dropping a new one in. We're listening for inode changes there, and as soon as we see that file show up, we'll immediately ingest it, and it becomes part of the system, and it's live. So you're talking about seconds from the time it shows up there. Also from the APIs –
Dave Mitchell: We have a programmatic interface where you can build a module to say: I want to go talk to the threat intelligence system I bought, use its API, and pull the data directly into the system. Or, after we analyze the data mixed with that, you can push the data out to a different learning system.
Bill Sella: The short answer to the question is that you do not have to install a new threat provider, shut the system down – I'm not saying it takes a half hour to come back up, but we've all worked with systems that do – and then restart it and wait for it to come into the system. As quickly as you can get the data to the system, it can start analyzing using it.
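The hot-reload behavior Bill describes – dropping or replacing a threat CSV in a watched directory and having it ingested live within seconds – can be sketched with a simple inode/mtime check. Polling stands in here for the inode-change notification he mentions, and the file layout and column names are assumptions.

```python
# Sketch of live threat-feed ingest: re-read the CSV whenever its inode or
# mtime changes, so a dropped-in or replaced file goes live in seconds.
import csv, os, tempfile

def load_feed(path):
    """Parse a threat CSV of ip,score rows into a lookup table."""
    with open(path, newline="") as f:
        return {row["ip"]: float(row["score"]) for row in csv.DictReader(f)}

def maybe_reload(path, last_key, current):
    """Re-ingest the CSV whenever its inode or mtime changes."""
    st = os.stat(path)
    key = (st.st_ino, st.st_mtime_ns)
    if key != last_key:
        current = load_feed(path)  # becomes live immediately
    return key, current

# Demo: drop a feed in, then atomically replace it, as an updater would.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "threats.csv")
    with open(path, "w") as f:
        f.write("ip,score\n203.0.113.7,1.0\n")
    key, feed = maybe_reload(path, None, {})
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("ip,score\n203.0.113.7,1.0\n198.51.100.9,0.5\n")
    os.replace(tmp, path)  # new inode, so the change is detected on next check
    key, feed = maybe_reload(path, key, feed)
    print(sorted(feed))  # → ['198.51.100.9', '203.0.113.7']
```

Writing the replacement to a temp file and using `os.replace` keeps the swap atomic, so a reader never sees a half-written feed; a production system would use inotify-style notification instead of polling.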
Erfan Ibrahim: I'm also seeing you taking the battle to the hacker, rather than meeting them halfway or at your front door. Because when you're seeing origins from other countries that were not supposed to be in the network and you can change policies that can block them from coming from there into your network, you're taking the battle to them as opposed to being in a reactive mindset in your own infrastructure.
Bill Sella: And that's what's – you know, the closer you get to real time, the less of that reactive role you have to play. It is about taking a proactive approach to the problem.
Dave Mitchell: And it's just empowering the operator – not just from that standpoint, but it's no different when it comes to watching what's going on in your network. All of a sudden, there was a huge burst in traffic, or we see traffic ticking up and we're getting close to our capacity level. What is actually happening? Here is the context. Now go modify how your network is behaving – [inaudible] that application or whatnot before you actually have a problem. So: proactivity on the networking end as well as the security end.
Erfan Ibrahim: And the next question is: excellent platform, the devil is in the execution – you might discuss implementation scenarios and success stories. And we just did that. Next is: would it be possible to get the contact info of the speakers? That's on your screen. How long is the installation process? That's very dependent on the infrastructure you've got.
Bill Sella: So we can talk about that in two pieces: the platform, and then the actual full implementation. Getting the platform set up is really as quick as typically doing a "yum install singularity." We have a Yum repo – that's how we distribute software. It of course requires a customers-only key, but you're talking about installation to up and running in a minute or two if you're not doing any additional configuration – you just want the vanilla out-of-the-box setup.
Dave Mitchell: But that doesn't include configuring your network devices to export NetFlow. Still, people have been amazed when we – it's literally one command to get it up and running, and it's one minute.
Erfan Ibrahim: Do you actually provide the appliances, or do you just provide like some media with the software on it, and then tell them you need to put in a machine of these specs?
Dave Mitchell: So we do on-premise software – we provide a repo file and a key, and then they install it. We can provide appliances if people want them. Some people may have enough hardware infrastructure on hand that they don't need to do that, so they just install the software.
Erfan Ibrahim: So I can tell you from my own personal experience, and a lot of battle scars, that finding a good hardware appliance that is optimized for your software is very important. Even though it can run on different machines, you never know about the [inaudible] machines, especially if they've been around in an organization for a while. You don't know what else is on those machines.
Bill Sella: And we will provide specs as to what sort of hardware, right? So sizing a network: okay, I've got X amount of bandwidth – or let's say I know how much flow I'm generating – and you tell us, well, I want to keep three months of data, or I want to keep six years of data. Whatever the answer to that question is, we can spec the hardware to get you where you need to be. It's a relatively quick exercise.
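The sizing exercise Bill describes is largely back-of-the-envelope arithmetic: flow rate, times retention period, times bytes per stored record. A sketch, where the 150-byte per-record figure is purely an assumed number for illustration, not a vendor specification:

```python
# Back-of-the-envelope storage sizing: flow rate x retention x record size.
# The 150-byte per-record figure is an assumption for illustration only.
def storage_gb(flows_per_sec, retention_days, bytes_per_record=150):
    seconds = retention_days * 86_400  # seconds per day
    return flows_per_sec * seconds * bytes_per_record / 1e9

# e.g. 5,000 flows/s kept for 90 days ("I want to keep three months of data"):
print(round(storage_gb(5_000, 90)))  # → 5832
```

Roughly 5.8 TB of raw records for that example, before any compression or indexing overhead, which is the kind of number that drives the disk spec.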
Erfan Ibrahim: So I was thinking that what would be really helpful is if you could provide an image of the machine in addition to your software, so that yes, the bare metal exists with them, but they put that image on first and then put your software on the image. The reason is you may want to shut off certain services in your build that are on the OS.
Dave Mitchell: Yes, absolutely.
Bill Sella: And as Dave mentioned, we will do appliances as well. You know, hardware appliance if need be. It's hardware that we spec.
Erfan Ibrahim: I recognize that the hardware business is a different business, and it really adds to the cost of running a company when you have to deal with hardware, because now you have an OEM relationship with a manufacturer, and you're responsible for the hardware, and it's not margin-friendly. I get all that. I manage 5,000 sites, so I know firsthand what a next-day, Monday-through-Friday, 8:00-to-5:00 contract can cost you.
Bill Sella: And this is actually where – because we work through a channel strategy as far as our sales are concerned, we work with value added resellers, and they help a lot with a lot of those situations.
Erfan Ibrahim: It would be really cool to get a Singularity Networks build for the two or three flavors of Linux, or whatever other platforms you work on, and just send that with the software and let them build the machine from scratch and then put your software on. That way, you know that there's no backdoor and nothing else that would disrupt you, because one of the things a hacker will do is look for your appliances to disrupt, so they can have more flexibility in attacking. Okay. So the next question is: in the age of precision weaponry versus the last 10,000 years of warfare, how do you see this platform's ability to identify the highly targeted exploits?
Dave Mitchell: We're not in the business of really looking at those application-type malware detonations and things like that. We're just looking at the network in aggregate, and then at fine granularity, and we'll just tell you when something looks different: if that machine has never spoken to those particular IPs, for instance. And that's where combining all of the different data sets from your other solutions into the platform will give you more of a view. You could have a [inaudible] come off, but you could also have your endpoint solution notice something odd. Taking in those data sets, we pull in NetFlow and geolocation and BGP data and know that you've never spoken to that machine in South Korea before. What else do we know about it?
Were there any other indicators coming across the network that were looked at by different solutions? So we're not trying to be the fine-grained malware-targeting type of platform. We just want to integrate with all the other solutions that are out there. We're really the epoxy that glues everything together.
Erfan Ibrahim: Yes, so it's important that in addition to these tools that you also have other tools available for the insider threat.
Dave Mitchell: Correct.
Erfan Ibrahim: And that's where you go deeper into the context of the transactions. So DNS may be okay for them, but what they're doing in DNS may not be okay. So think of this like a funnel: this is the widest part of the funnel, bringing it down to some reasonable level. And then you use other tools to drill deeper into the application, the semantics, and the business process to figure out anomalies of the kind you're talking about. Okay. The next question is: has anyone connected an artificial intelligence platform to the system to assist with threat signatures?
Bill Sella: We have a customer right now in the process of working on that. The system can be expanded in a couple of different ways, and this particular customer is exploiting a couple of those. If you've got C++ developers, you can write plug-ins that go right into the system. If you've got Python developers, that also is a language we natively support within the platform, where it'll run in process, and those plug-ins can subscribe to data just as any application or module of the system that we write. Our customers can write those as well, whether it's a C++ one, a Python one, or a Bash script. In all fairness, you can do a fair amount with that. It's not necessarily something you want to do for a massive amount of data, but you can do it.
There are a number of ways to plug into the system that are fairly standardized, and we do have customers that are plugging in machine learning items, let's say things they have learned through their businesses over the last several decades, to enhance the system for their environment today.
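The plug-in model Bill describes, a module that subscribes to the data stream and reacts to each record, might look roughly like the following Python sketch. The `on_flow_record` hook and the record fields are hypothetical stand-ins, since the platform's actual plug-in API isn't shown in this presentation.

```python
# Hypothetical sketch of a Python plug-in that subscribes to flow records
# and flags destinations never seen before. The hook name and record
# fields are illustrative, not the platform's actual API.

seen_destinations = set()

def on_flow_record(record):
    """Called (hypothetically) by the platform for each flow record."""
    dst = record["dst_ip"]
    if dst not in seen_destinations:
        seen_destinations.add(dst)
        return {"alert": "new-destination", "dst_ip": dst}
    return None  # destination already known, nothing to report

# Feed it a few records as they might arrive from the stream:
alerts = [a for a in (on_flow_record(r) for r in [
    {"dst_ip": "10.0.0.5"},
    {"dst_ip": "10.0.0.5"},
    {"dst_ip": "203.0.113.9"},
]) if a]
```

The same subscribe-and-react shape would apply whether the module is written in C++, Python, or a Bash script, per Bill's description.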
Erfan Ibrahim: If you're able to track IP addresses that are talking, you can also know when a port has been activated and communication has begun, because that's a very good signature of an insider threat: they'll use an unused port, hook up their laptop with malware, and start shoving stuff into the system. You're able to see that in real time, and someone just –
Dave Mitchell: Say, for instance, it's on the corporate network on your standard image. You know what ports and protocols are supposed to be listening on your laptop, so you could go define a policy in the system that says, "All right, for my Windows laptops across the network, these ports are allowed. Tell me the minute I see source traffic on a different port," and then you'd just get an alert the second we see it. And then for forensics, if you already knew the attributes, you found a machine that was really compromised, did all the heuristics on it, now you can subscribe using the API or just run a query and make sure there are no other machines on your network behaving with those particular patterns.
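The port policy Dave describes can be sketched in a few lines: an allow-list of expected ports per host class, and an alert the moment source traffic shows up on anything else. The host class name, port list, and field names below are hypothetical, chosen only to illustrate the idea.

```python
# Minimal sketch of a port-policy check: allow-list the ports expected
# to carry source traffic for a host class, alert on anything else.
# The "windows-laptop" class and its port set are made-up examples.

ALLOWED_PORTS = {
    "windows-laptop": {80, 443, 3389},
}

def check_flow(host_class, src_port):
    """Return an alert dict if src_port is outside the policy, else None."""
    allowed = ALLOWED_PORTS.get(host_class, set())
    if src_port not in allowed:
        return {"alert": "unexpected-port",
                "host_class": host_class,
                "port": src_port}
    return None
```

In a real deployment this check would run continuously against the flow stream, so the alert fires the second the first nonconforming packet is seen, which is the "instantaneous" behavior Dave describes.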
Erfan Ibrahim: I'm thinking at the switch level, where you have 24 ports, let's say, on a Cisco switch, and seven or eight of them are legitimate and there are real networks on them, but the others are not being used, and they weren't administratively shut down. So the question I have is: if someone were to connect something to one of those ports that is not in the legit list, would it show up in your analysis that, "Oh, I've got this new IP address which I didn't have"? It's still in your infrastructure.
Bill Sella: You could absolutely build a rule for that. It's going to depend, of course, on the actual deployment, but let's say that switch is exporting NetFlow as an example data source. We'd pick up that, hey, I've never seen anything coming in on this interface before. That is certainly something that could be alerted on, absolutely.
Dave Mitchell: You could also just do a simple query: show me all of the source IPs on my network yesterday, tell me the ones today, where's the difference, and where are they coming from? So there are a lot of different ways to skin that cat.
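The yesterday-versus-today query Dave mentions reduces to a set difference once the two lists of source IPs are in hand. The sketch below does that comparison locally over two hard-coded example sets; in practice the lists would come from queries against the platform.

```python
# Sketch of the "diff yesterday's source IPs against today's" idea.
# The two sets are made-up examples standing in for query results.

yesterday = {"10.1.1.1", "10.1.1.2", "10.1.1.3"}
today = {"10.1.1.1", "10.1.1.2", "192.0.2.44"}

new_sources = today - yesterday   # IPs seen today but never yesterday
vanished = yesterday - today      # IPs that stopped talking
```

A brand-new source IP on a port that was supposed to be idle, as in the switch scenario above, would land in `new_sources` and could then be investigated or alerted on.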
Erfan Ibrahim: Very good. So then one last question: when it's time for a hardware refresh at a facility with the Singularity Networks platform installed, how would interoperability be established?
Dave Mitchell: I'm assuming he's asking about already having the instance running, and they need to buy a new box. How do we port the data over? And that's probably an answer for –
Bill Sella: Yeah, so if you were upgrading a Singularity platform: if you're on the same machine and you're just swapping out internal components or just doing an update to the latest version, which we push a release out for every couple of weeks or so, there's just a script that you run. It will pull down the new software and install it, and if there's a data conversion necessary for the back-end database, it'll go through the process and get things up and going that way. However, if you're transitioning from one box to another, everything is stored in files on file systems. Data is broken up into partitions, so you've got a bunch of B-trees spread across the disk, but they're sitting on your local XFS or whatever it is you've got for your file system.
You'd copy those files over, start the system back up, and point it at the right directories. If it's the default, obviously, that's a lot easier, but start it up. It will pick up that data just as if it had always been there.
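Because the data lives as plain files on the local file system, the box-to-box migration Bill describes is essentially a recursive copy plus a config change. The sketch below illustrates that under assumed paths; the directory layout and `.btree` file name are hypothetical, and a real migration would copy between machines rather than between local temp directories.

```python
# Hedged sketch of the migration: copy the partition files to the new
# machine's data directory, then point the new install at it. Paths and
# file names here are made up for illustration.

import shutil
import tempfile
from pathlib import Path

def migrate(old_data_dir, new_data_dir):
    """Copy all partition files into the new data directory.
    Returns the number of files copied."""
    src, dst = Path(old_data_dir), Path(new_data_dir)
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sum(1 for p in dst.rglob("*") if p.is_file())

# Demonstrate on throwaway directories standing in for the two boxes:
old = Path(tempfile.mkdtemp())
(old / "partitions").mkdir()
(old / "partitions" / "p0.btree").write_text("partition data")
new = Path(tempfile.mkdtemp()) / "data"
copied = migrate(old, new)
```

After the copy, starting the system pointed at the new directory is all that's left, per Bill's description.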
Dave Mitchell: And we are coming out, I don't know at exactly what point, with horizontal scaling of the back end, to be able to shard the database across multiple machines.
Bill Sella: Yeah, that's during the next quarter here, adding the ability to dynamically cluster. So in that release, once we have it over the next couple of months, it'll be as simple as: spin up a new machine, add it to the cluster, signal the old machine that it's going away. It'll transition its data and shut down, and now you've transitioned to the new box completely live, no impact.
Erfan Ibrahim: Very good. Any final questions from the audience before I share some concluding remarks? Looks like there are none. Okay, great presentation.
Dave Mitchell: Thank you. Thanks to everyone for taking the time.
Erfan Ibrahim: Yeah, so we've seen a very innovative approach to looking for anomalies in the behavior of companies, service providers, utilities, and government agencies. We're all moving into this TCP/IP realm, highly networked organizations. A lot of our business transactions are now occurring in a digital way, so we're using the network. But what people aren't realizing is that with bandwidth consumption going up, the variety of data is also increasing, and if you don't have this kind of capability to go down to the granular level, look at the types of data and where they originate from and where they're going, and look for anomalies in that, you're going to allow the hacker to take advantage of you, because they use the element of stealth. The way they get that element of stealth is just like with the running of the bulls: they run with the bulls. If you don't have a panoramic view of the area where the bulls are running, you won't be able to find that individual running between the bulls.
So it is very important to have two things: the granular visibility, and also the forensic information that you need, and you're collecting that forensic information. Because when you find the IP address, you know the ports, and you know which application is running on those ports, you pretty much lock down who the hacker is and what they're up to. So both visibility and forensics are important in order for you to have a handle on your digital infrastructure. Otherwise, the hackers will just keep trying until they succeed. They only have to be right once; you have to be right all the time. So how are you right all the time? By having the proper tools to monitor and to respond.
Also, this kind of capability is very important for recovery, because just as you know when someone is attacking you, you also know what parts of your infrastructure have been compromised. So then you can develop a strategy for recovery based on the parts that are still working, and that's very, very important for resilience. It's not enough just to put in ten [inaudible] and say, "Okay, I'm going to lose three, so I still have seven left." That same hacker who has found three can find all ten and ruin you completely.
So it's really important to have this knowledge base with which to turn the data you're getting into actionable intelligence. We'll continue our dialogue in future webinars and bring innovative technologies to you so you can learn from them and see how they could potentially apply to your different environments. I would really encourage you to write to Dave Mitchell; you see his e-mail address. Start a dialogue. It will help them think about new approaches for their product. They're still an early-stage company, they have customers, and they're always open to learning new ideas, as well as getting a better understanding of what's in the marketplace. If you see something out there that you feel does this, tell them about it and let's start a dialogue. I really appreciate both of you coming in today.
Dave Mitchell: We appreciate the opportunity. Fantastic.
Erfan Ibrahim: Very good educational experience. I really feel like Arman has been reincarnated with alloy wheels.
Dave Mitchell: That is true.
Erfan Ibrahim: Yeah, I will send out a note about our next webinar presentation very soon, and you'll also get a link to the recording today as well as the slides in PDF format. So I thank you all for participating today. Enjoy your weekend, and I'm bringing this webinar to a close. Thank you very much.
Dave Mitchell: Thanks everyone on the phone. Appreciate it.