How can I find IT problems that I don't even know about?
Posted by Mike Epplin on Wed, Dec 03, 2008 @ 01:20 PM
"I wake up, I see logs. I go to sleep, I see logs. Even
my Alpha-Bits look like a log…" –Anonymous, Frustrated Security Engineer
As Director of Technical Services, I get to see a lot of
interesting log activity during sales calls to prospects. Recently I was
working with a company to implement a log management and monitoring solution,
and in the first day we were able to connect a fair number of devices,
including firewalls, IDS, and Anti-Virus systems. The next morning, I wanted to
show a couple of the engineers what the risk-based correlated alerts meant, and
picked a critical alert at random.
I drilled into the alert and noticed a couple of things
right away. It was for denied connections from one internal source on
a specific port, 6346. A port lookup (a button I added that queries an
external reference site) showed that this port is commonly used by
LimeWire, a P2P file sharing client. The company couldn't understand
why an internal corporate server would be making these types of
connections. A quick check showed that this server had over 75 GB of
shared MP3 files.
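A port lookup like this boils down to mapping a destination port to the service usually associated with it. The sketch below is a hypothetical stand-in for that button: instead of querying an external site, it uses a small local table (`KNOWN_PORTS`, an assumption for illustration) and a hypothetical `describe_port` helper. Port 6346 is the IANA-registered Gnutella service port, the network LimeWire used.

```python
# Hypothetical local port-to-service table; the tool in the post queried
# an external reference site instead of a hard-coded dictionary.
KNOWN_PORTS = {
    53: "DNS",
    80: "HTTP",
    443: "HTTPS",
    6346: "Gnutella (e.g. LimeWire)",
}


def describe_port(port: int) -> str:
    """Return a human-readable label for a destination port."""
    return KNOWN_PORTS.get(port, "unknown service")


print(describe_port(6346))
```

A real deployment would want a much larger table (or a live lookup), since a static list only recognizes the ports someone thought to add.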
It turns out that a few employees had set up their own private
file sharing system to share files internally with their friends.
What these employees hadn't anticipated, aside from getting busted by a
risk-based alerting system, was that the application would broadcast its
availability as a file sharing host to the network and that the firewall would
block it.
We saw the same pattern in another instance: multiple denied
outbound connections from an internal server, which together produced a
risk-based alert. This time the server initiating the connections was
an antivirus server, and the host it was trying to reach was an update
server. It turned out that a simple typo had blocked this server from
downloading virus updates for over 7 months. Again, simple events that a
manual log review misses are easily found by correlating them into a
single, risk-based, actionable alert.
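The correlation step in both examples, where many individual denied connections from one internal host are rolled up into a single alert, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular product works: the `FirewallEvent` record, the `correlate_denials` function, and the default threshold of 3 are all hypothetical, and the risk scoring a real system would layer on top is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallEvent:
    src: str        # internal source address
    dst: str        # destination address
    dst_port: int   # destination port
    action: str     # "allow" or "deny"


def correlate_denials(events, threshold=3):
    """Collapse repeated denied connections into one alert per (src, dst_port).

    Returns sorted (src, dst_port, count) tuples for every pair whose number
    of denied connections is at least `threshold` (an arbitrary assumption).
    """
    counts = Counter((e.src, e.dst_port) for e in events if e.action == "deny")
    return sorted(
        (src, port, n) for (src, port), n in counts.items() if n >= threshold
    )


# Illustrative events: private internal sources, documentation-range destinations.
events = [FirewallEvent("10.0.0.5", "203.0.113.10", 6346, "deny")] * 4 + [
    FirewallEvent("10.0.0.7", "198.51.100.2", 443, "allow"),
    FirewallEvent("10.0.0.9", "198.51.100.3", 80, "deny"),
]

for src, port, n in correlate_denials(events):
    print(f"ALERT: {src} had {n} denied connections on port {port}")
```

The single denied connection from 10.0.0.9 stays below the threshold, while the four denials from 10.0.0.5 on port 6346 collapse into one alert line rather than four separate log entries for someone to notice.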
Searching logs for a problem you already know about is not overly
difficult, but in cases like these, where nobody was even aware of a
problem, it is nearly impossible. And we're all aware of the cost
savings that come from identifying IT issues before they escalate and
affect the business.