In today’s world there is no room for IT problems. Effective monitoring and alerting for distributed, interconnected systems is necessary for the successful operation of any organization. Proactive monitoring of servers, workstations, remote computers, Windows event logs and applications is critical to security, to network performance and to the overall operation of the organization.
In this post I will address what we monitor and some of the methods we use. We currently monitor over 3500 different devices, so doing it efficiently is a must. We use several tools for monitoring here at Compuquip. I have blogged before about LVRTG® (a proprietary monitoring tool for email), and you have probably seen the very popular blog post about PRTG® (a commercial monitoring tool that lets us see anything from NetFlow to SNMP traps), but for the majority of our monitoring and alerting we use Kaseya®. This management platform provides us with proactive, user-defined monitoring and instant notification of problems. Best of all, it allows us to manage all 3500+ endpoints without pulling our hair out.
Our system provides five methods of monitoring, each with many monitoring functions, so I am only going to be able to touch on a few of them for each method:
- Alerts Section: This is where we monitor server event logs for errors or warnings, disk space, hardware changes and the status of the system as a whole (online/offline). Severe errors or alerts generate a ticket and an immediate email to the NOC team; other events may only generate a ticket. I would estimate that about 25% of the alerts we receive come from this method of alerting.
- Monitor Sets: Monitoring the performance state of a server is what gives us the ability to be proactive in solving our clients’ problems before they even become problems. Using monitor sets, we create several groups of criteria that we believe would identify a problem on a server. For example, on a Microsoft Exchange email server, we might apply a monitor set specific to Exchange servers. Within that monitor set we would have configured checks for a number of conditions that often indicate problems on an Exchange server. The set would have a threshold for the mail queues, alerting once x number of emails are queued up, and a check for the services and processes specific to Exchange, alerting if their states change. Monitor sets are highly customizable and make up about 60% of our alerts.
- System Check: System checks allow us to monitor things that don’t have a Kaseya agent, like a router, a switch or even an application. Using this method we can set up a server that does have an agent installed to do a “check” of another system or device. We use this type of monitoring to verify that a client’s WAN (wide area network) is online. We set up an ICMP check from a server at customer site A to a router at customer site B, and vice versa. Because we customize the alert message, we can tell instantly which site has gone down when one of them does not respond. System checks also allow for website monitoring and some additional custom monitoring.
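To make the monitor set and system check ideas more concrete, here is a minimal Python sketch. It is not Kaseya's configuration format or API; the `MonitorSet` class, the `system_check_alerts` function, the threshold of 100 queued emails and the sample service states are all assumptions made up for illustration. It also simplifies "alert when a service's state changes" to "alert when a service is not running", and it takes the reachability results as input rather than sending real ICMP traffic.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class MonitorSet:
    """A named group of criteria applied to one type of server (hypothetical model)."""
    name: str
    max_queue_length: int          # alert once this many emails are queued
    required_services: List[str]   # services that are expected to be running

    def evaluate(self, queue_length: int, service_states: Dict[str, str]) -> List[str]:
        """Return one alert message per criterion that is violated."""
        alerts = []
        if queue_length >= self.max_queue_length:
            alerts.append(
                f"{self.name}: mail queue at {queue_length} "
                f"(threshold {self.max_queue_length})"
            )
        for service in self.required_services:
            state = service_states.get(service, "not found")
            if state != "running":
                alerts.append(f"{self.name}: service {service} is {state}")
        return alerts


def system_check_alerts(results: Dict[Tuple[str, str], bool]) -> List[str]:
    """Turn cross-site reachability results into readable alert messages.

    Each key is (checking_site, target_site); the value is True when the
    target site's router answered the check.
    """
    return [
        f"Site {target} is not responding to checks from site {source}"
        for (source, target), reachable in results.items()
        if not reachable
    ]


# Illustrative values only: an Exchange monitor set with two watched services.
exchange = MonitorSet("Exchange", 100, ["MSExchangeTransport", "MSExchangeIS"])
set_alerts = exchange.evaluate(
    queue_length=250,
    service_states={"MSExchangeTransport": "running", "MSExchangeIS": "stopped"},
)

# Site A could not reach site B's router, while B could still reach A.
wan_alerts = system_check_alerts({("A", "B"): False, ("B", "A"): True})

for message in set_alerts + wan_alerts:
    print(message)
```

Putting the site names into the message is the part that matters for the WAN check: whoever reads the alert knows right away which end of the link stopped answering.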
The last two monitoring methods, Log Monitoring and SNMP Sets, are not used on every endpoint but are instead set up on a case-by-case basis. I am going to save these two for a later blog post.
Once an alert is triggered, several things happen:
- Email notification: Depending on the severity, an email may be sent out to the NOC team for triage.
- Ticket generation: Regardless of the severity of the alert (every alert is of concern, or we would not be alerting on it), a ticket is created. The instant an alert is generated, Kaseya interfaces with our ticketing system and enters a ticket under the correct company/client, and that ticket is given a status of new.
- Continuous alerting until resolution: After the first alert has come through the system, our monitoring platform continues to monitor the issue and continues to alert on it without creating a new ticket. It will continue to alert until the problem has been fixed, and of course we will continue to monitor.
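The flow above can be sketched in a few lines of Python. This is only an illustration of the logic, not how Kaseya or our ticketing system is actually implemented. The `Ticket` and `AlertHandler` names, the severity labels and the choice to identify an issue by client plus description are all hypothetical. Sending an email for every repeat of a severe alert is also an assumption on my part.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Ticket:
    client: str
    issue: str
    status: str = "new"   # every ticket starts out with a status of new
    alert_count: int = 1  # how many alerts have been attached to this ticket


class AlertHandler:
    """Hypothetical handler: one open ticket per (client, issue) pair."""

    def __init__(self, email_severities=("severe",)):
        self.email_severities = set(email_severities)
        self.open_tickets: Dict[Tuple[str, str], Ticket] = {}
        self.sent_emails: List[str] = []  # stands in for real email delivery

    def handle(self, client: str, issue: str, severity: str) -> Ticket:
        key = (client, issue)
        ticket = self.open_tickets.get(key)
        if ticket is None:
            # First alert for this issue: create the ticket under the client.
            ticket = Ticket(client, issue)
            self.open_tickets[key] = ticket
        else:
            # Repeat alert: keep alerting, but reuse the existing ticket.
            ticket.alert_count += 1
        if severity in self.email_severities:
            # Assumption: severe alerts email the NOC team on every occurrence.
            self.sent_emails.append(f"[{severity}] {client}: {issue}")
        return ticket

    def resolve(self, client: str, issue: str) -> None:
        """Once the problem is fixed, the next alert would open a fresh ticket."""
        self.open_tickets.pop((client, issue), None)


handler = AlertHandler()
first = handler.handle("Client A", "disk space low", "severe")
repeat = handler.handle("Client A", "disk space low", "severe")
other = handler.handle("Client A", "hardware change", "info")

print(first is repeat)            # the repeat alert reused the same ticket
print(len(handler.open_tickets))  # two distinct issues are open
print(len(handler.sent_emails))   # only the severe alerts sent email
```

Keying open tickets by client and issue is what keeps a noisy, unresolved problem from flooding the queue with duplicate tickets while the alerts keep coming.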
Stay tuned for my next post where I share how we actually respond to the alert/ticket.

