Using DIY Automation to Hunt "Zombie" Servers

Nicola Peill-Moelter

Your development team may be your greatest untapped asset in the fight to identify and eliminate wasteful “zombie” servers.

Commercial, off-the-shelf products can monitor port traffic and server utilization to help identify underused servers, but these tools may not be practical for some organizations to procure and deploy. Consider the following example from Akamai, an active member of The Green Grid, which enlisted support from internal development teams to create automated tools that locate and take action on underused servers.

As a leading provider of Content Delivery Network (CDN) services for media, ecommerce and software delivery, and cloud security solutions, Akamai has a robust server lab. The infrastructure consists of more than 10,000 servers that perform mission-critical functions supporting a wide range of development and testing needs.

Akamai’s Asset Management Team developed a strategy that enables all lab assets to be monitored remotely for activity and server status. This capability facilitates optimized utilization of deployed assets, as well as cost accountability, a critical aspect of motivating responsible asset ownership. The team also developed a cross-platform monitoring system that allows the organization to:

  • Identify assets that are likely to be zombies and enable owners to repurpose them, decommission them, or reassign them to a different owner
  • Analyze cost and accountability at owner, project, asset type, organization, geography, and lab levels
  • Provide asset and cost reporting that can be rolled up to management and business unit levels to hold development groups accountable

In addition, Akamai provides a dashboard where developers can see the machines assigned to them and the status of each. Yellow and red flags identify machines that have likely been underutilized for three and six months, respectively.
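
The flag thresholds described above could be expressed as a small classification rule. The following Python sketch is illustrative only: the function name `flag_for`, the use of a last-activity date as input, and the approximation of a month as 30 days are assumptions, not details of Akamai's dashboard.

```python
from datetime import date
from typing import Optional

# The article gives thresholds of three and six months; treating a
# month as 30 days is a simplification made for this sketch.
YELLOW_DAYS = 3 * 30
RED_DAYS = 6 * 30


def flag_for(last_active: date, today: date) -> Optional[str]:
    """Return 'red', 'yellow', or None depending on how long a machine has looked idle."""
    idle_days = (today - last_active).days
    if idle_days >= RED_DAYS:
        return "red"
    if idle_days >= YELLOW_DAYS:
        return "yellow"
    return None
```

Checking the longer threshold first means a machine that has been idle for more than six months is reported as red only, rather than as both yellow and red.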

Because Simple Network Management Protocol (SNMP) is not activated on Akamai’s lab machines, CPU utilization and other standard server metrics cannot be gleaned that way, so alternative indicators are used to assess inactivity. For example, machines are flagged if they:

  • Cannot be pinged and cannot be reached via SSH (not manageable)
  • Have an old installation date (aged)
  • Have a rescue image installed (in recovery)
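
The three indicators above can be combined into a simple check. The following Python sketch assumes the observations for each machine have already been gathered. The `Observation` fields, the `zombie_indicators` and `tcp_port_open` functions, and the two-year age threshold are all hypothetical, since the article does not say how Akamai collects these signals or how old an installation must be to count as aged. The TCP probe is only a rough proxy for SSH reachability: an open port 22 does not prove that a login would succeed.

```python
import socket
from dataclasses import dataclass
from datetime import date
from typing import List

# Hypothetical threshold: the article does not define an "old" installation date.
MAX_INSTALL_AGE_DAYS = 2 * 365


def tcp_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Rough SSH reachability proxy: can a TCP connection to the port be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


@dataclass
class Observation:
    """What one monitoring pass recorded for a machine (all fields are assumed)."""
    hostname: str
    pingable: bool
    ssh_reachable: bool
    install_date: date
    rescue_image: bool


def zombie_indicators(obs: Observation, today: date) -> List[str]:
    """Return the inactivity indicators that apply to this machine."""
    reasons = []
    # Not manageable: the machine answers neither ping nor SSH.
    if not obs.pingable and not obs.ssh_reachable:
        reasons.append("not manageable")
    # Aged: the installation is older than the assumed threshold.
    if (today - obs.install_date).days > MAX_INSTALL_AGE_DAYS:
        reasons.append("aged")
    # In recovery: a rescue image is installed.
    if obs.rescue_image:
        reasons.append("in recovery")
    return reasons


# Made-up example machine, for illustration only.
lab01 = Observation("lab01", False, False, date(2019, 5, 1), False)
lab01_reasons = zombie_indicators(lab01, date(2024, 1, 1))
```

Returning the list of reasons, rather than a single yes or no, lets a report show the owner why a machine was flagged.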

A zombie summary report, including the total number of zombie servers and their associated OPEX and CAPEX costs, is generated for each owner. Rollup reports can be generated for a group of developers and at the business unit level to show the extent and cost of zombie servers. This puts pressure on developers and managers to repurpose or decommission unused servers.
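
A rollup of this kind amounts to grouping flagged machines by owner or business unit and summing their costs. The Python sketch below is a hypothetical illustration: the `ZombieRecord` fields, the `rollup` function, and the sample cost figures are assumptions and do not reflect Akamai's actual reporting tools or data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class ZombieRecord:
    """One flagged machine with its costs (field names are assumed)."""
    hostname: str
    owner: str
    business_unit: str
    opex: float   # assumed operating cost attributed to the machine
    capex: float  # assumed capital cost attributed to the machine


def rollup(records: Iterable[ZombieRecord], level: str) -> Dict[str, Dict[str, float]]:
    """Summarize zombie count and costs per value of `level` ('owner' or 'business_unit')."""
    summary = defaultdict(lambda: {"count": 0, "opex": 0.0, "capex": 0.0})
    for rec in records:
        key = getattr(rec, level)
        summary[key]["count"] += 1
        summary[key]["opex"] += rec.opex
        summary[key]["capex"] += rec.capex
    return dict(summary)


# Made-up sample data, for illustration only.
records = [
    ZombieRecord("lab01", "alice", "media", 1200.0, 4000.0),
    ZombieRecord("lab02", "alice", "media", 800.0, 3000.0),
    ZombieRecord("lab03", "bob", "security", 1000.0, 5000.0),
]
by_owner = rollup(records, "owner")
by_unit = rollup(records, "business_unit")
```

Using one function for every grouping level keeps the owner report and the business unit rollup consistent, because both are computed from the same records.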

The Green Grid’s U.S. Utility Work Group recommends implementing a similar solution – zombie servers are a problem many companies don’t even know they have. If you think zombies might be eating your development budget, take a page from Akamai and root them out!

Get involved today by joining The Green Grid and contributing to the work on zombie servers.