Bill, the IT Director at a law firm, received the emergency call at 6am. A pipe had ruptured in the firm’s building and the server room was flooded. Bill knew it was going to be a long few days. First he’d have to repair or probably replace the hardware, spend hours rebuilding the system software, retrieve the right tapes from an offsite storage facility and then restore the data. Theoretically, it would take several long days of effort but should all work just fine. Unfortunately for Bill, reality set in. After replacing the hardware and installing the system software, many of the firm’s data tapes were unreadable and they lost 50% of their data. Read more »
Archive for the ‘Technical Education’ Category
In today’s world there is no room for IT problems. Effective monitoring and alerting for distributed, interconnected systems is necessary for the successful operation of any organization. Proactive monitoring of servers, workstations, remote computers, Windows event logs and applications is critical to security, to network performance and to the overall operation of the organization.
In this post I will address what we monitor and some of the methods we use. We are currently monitoring over 3500 different devices, so doing that efficiently is a must. We use several monitoring tools here at Compuquip. I have blogged before about LVRTG® (a proprietary monitoring tool for email), and you have probably seen the very popular blog post about PRTG® (a commercial monitoring tool that lets us see anything from NetFlow to SNMP traps), but for the majority of monitoring and alerting we use Kaseya®. This management platform provides us with proactive, user-defined monitoring and instant notification of problems. Best of all, it allows us to manage all 3500+ endpoints without pulling our hair out.
Our system provides five methods of monitoring, each with many monitoring functions, so I am only going to be able to touch on a few of them for each method:
- Alerts Section: This is where we monitor server event logs for errors or warnings, disk space, hardware changes and the status of the system as a whole (online/offline). Severe errors or alerts generate a ticket and an immediate email to the NOC team; other events may only generate a ticket. I would estimate that about 25% of the alerts that we receive come from this method of alerting.
- Monitor Sets: Monitoring the performance state of a server is what gives us the ability to be proactive in solving our clients’ problems before they even become problems. Using monitor sets, we create several groups of criteria that we believe would identify a problem on a server. For example, on a Microsoft Exchange email server, we might apply a monitor set specific to Exchange servers. Within that monitor set we would have configured a number of conditions that often indicate problems on an Exchange server. The set would have a threshold for the mail queues, alerting once a set number of emails are queued up, and a check for the services and processes specific to Exchange, alerting if their states change. Monitor sets are highly customizable and make up about 60% of the alerts we receive.
- System Check: System checks allow us to monitor things that don’t have a Kaseya agent, like a router, a switch or even an application. Using this method we can set up a server that does have an agent installed to do a “check” of another system or device. We use this type of monitoring to verify that a client’s WAN (wide area network) is online. We set up an ICMP check (a ping) from a server at customer site A to a router at customer site B, and vice versa. Because we customize the alert message, we can tell instantly which site has gone down when one of them does not respond. System checks also allow for website monitoring and some additional custom monitoring.
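To make the monitor set idea from the list above a little more concrete, here is a minimal Python sketch. It is not how Kaseya implements monitor sets; the `MonitorSet` class, the threshold of 100 queued emails and the two service names are all assumptions chosen for illustration. It also simplifies "alert when a service state changes" to "alert when a service is not running".

```python
from dataclasses import dataclass, field


@dataclass
class MonitorSet:
    """Hypothetical monitor set: one queue threshold plus required services."""
    name: str
    max_queue_length: int  # alert once this many emails are queued
    required_services: list[str] = field(default_factory=list)

    def evaluate(self, queue_length: int, service_states: dict[str, str]) -> list[str]:
        """Return the alert messages for one sample of server state."""
        alerts = []
        if queue_length >= self.max_queue_length:
            alerts.append(
                f"{self.name}: mail queue at {queue_length} "
                f"(threshold {self.max_queue_length})"
            )
        for service in self.required_services:
            # A service missing from the sample is treated as a problem too.
            state = service_states.get(service, "missing")
            if state != "running":
                alerts.append(f"{self.name}: service {service} is {state}")
        return alerts


# Illustrative values only; neither the threshold nor the services come from the post.
exchange_set = MonitorSet(
    name="Exchange",
    max_queue_length=100,
    required_services=["MSExchangeTransport", "MSExchangeIS"],
)

sample_alerts = exchange_set.evaluate(
    queue_length=250,
    service_states={"MSExchangeTransport": "running", "MSExchangeIS": "stopped"},
)
for message in sample_alerts:
    print(message)
```

The point of grouping the criteria in one object is that the same set can be applied to every server of that type, which is roughly what makes monitor sets reusable across clients.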
The last two monitoring methods, Log Monitoring and SNMP Sets, are not used on every endpoint but are instead set up on a case-by-case basis. I am going to save these two for a later blog post.
Once an alert is triggered, several things happen:
- Email notification: Depending on the severity, an email may be sent out to the NOC team for triage.
- Ticket generation: Regardless of the severity of the alert (they are all of concern or we would not be alerting on them), a ticket is created. The instant an alert is generated, Kaseya interfaces with our ticketing system and enters a ticket under the correct company/client, and that ticket is given a status of new.
- Continuous alerting until resolution: After the first alert has come through the system, our monitoring platform continues to monitor the issue and continues to alert on it without creating a new ticket. It will continue to alert until the problem has been fixed, and of course we will continue to monitor.
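The three steps above can be sketched in a few lines of Python. This is only an in-memory illustration of the flow, not the actual integration between Kaseya and our ticketing system; the `AlertRouter` class and its fields are hypothetical, and sending an email on every severe repeat alert is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AlertRouter:
    """Hypothetical in-memory stand-in for the alert-to-ticket flow."""
    open_tickets: dict[tuple[str, str], dict] = field(default_factory=dict)
    emails_sent: list[str] = field(default_factory=list)

    def handle(self, client: str, issue: str, severe: bool) -> dict:
        """Process one alert and return the ticket it belongs to."""
        key = (client, issue)
        ticket = self.open_tickets.get(key)
        if ticket is None:
            # First alert for this issue: open a ticket with status "new".
            ticket = {"client": client, "issue": issue, "status": "new", "alert_count": 0}
            self.open_tickets[key] = ticket
        # Repeat alerts are counted on the existing ticket, not given a new one.
        ticket["alert_count"] += 1
        if severe:
            # Assumption: severe alerts email the NOC team each time they fire.
            self.emails_sent.append(f"[{client}] {issue}")
        return ticket

    def resolve(self, client: str, issue: str) -> None:
        """Close the issue so that a later recurrence opens a fresh ticket."""
        self.open_tickets.pop((client, issue), None)


router = AlertRouter()
first = router.handle("Client A", "disk space low", severe=True)
second = router.handle("Client A", "disk space low", severe=True)
print(first is second, second["alert_count"])
```

Keying open tickets on the client and the issue is what keeps a noisy alert from flooding the queue with duplicates while still recording that the problem is ongoing.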
Stay tuned for my next post where I share how we actually respond to the alert/ticket.
NOC Services Team
The case for disk backup versus tape backup is clear. Tape-based backup systems are built on decades-old technology and can’t compete with modern disk-based backup systems. Disk-based backup systems offer superior performance in terms of: Read more »
The best data protection systems incorporate the ability to restore your most critical data first, then use a tiered approach to recovering less critical information. If you lose data or even an entire system, your solution should let you create strategies for recovering it in this manner, based on its business impact.
Today there are a variety of data protection options that help you recover quickly and potentially save you significant expense when restoring your business operations. Here are twelve best practices for managing your data backup and recovery.
1. Reliability. Up to 71% of restores from tape contain failures
Best Practice: Use disk-to-disk technology for backups
With disk-to-disk technology, your backup data resides on disk drives, proven to be far more reliable than tapes. When your backup completes, you know the data is secure and accessible on the disk drive. With tapes you never really know if your data is usable until you try to restore it, at which point it’s too late. Read more »
Marketing Manager
What’s more, they’ll have “one throat to choke” – a single point of contact for all their support needs. This trend is supported by a recent survey conducted by the Enterprise Strategy Group (ESG). In that survey, more than half of the respondents – both small and mid-size businesses – indicated that they “would prefer to rely on a single vendor for their data protection solutions whenever possible.”1
All-in-one data protection (DP) solutions include all of the hardware and software necessary to get up and running quickly:
- Hardware: The hardware configuration can vary depending on the type of target customer. Solutions for SMBs are often a network attached storage (NAS) server with two or three terabytes of internal storage. Solutions for mid-size companies are typically server node(s) with SATA or Fibre Channel storage.
- Software: All-in-one DP solutions usually come with backup and recovery software. The software may include agents to protect each client, plug-ins for granular protection of applications, management and reporting tools, and a server component that handles the backend processing. Some solutions also include data deduplication, compression, encryption, bare-metal recovery, and continuous data protection (CDP) functionality. Depending on the particular solution and vendor, these may cost extra.
- Support & Warranty: All-in-one DP solutions will typically include support for the backup application and warranty coverage for the hardware. Additional service offerings are almost always available. These can include 4-hour onsite response for hardware repair, training and certification programs, disaster recovery planning and testing, and software implementation. Read more »
Marketing Manager
