Practices, tools and expertise for better storage performance

31 October, 2019
Mohammed Mehdi Megzari
IBM

Houston, we have a problem. Call the storage experts, now!

When customers complain about long wait times for online payments, lagging video streams and other degraded services, businesses tend to blame their storage appliances quickly and without prior analysis. While your data resides primarily in storage systems, that doesn’t necessarily mean your storage systems are the root cause of slow data access. Even acquiring the fastest all-flash storage system on the market might not solve the problem. The challenge lies in figuring out where the congestion resides in the IT infrastructure chain.

Good news and bad news on performance issues

Performance glitches can be transient or persistent. Their impact can range from creating a slower user experience to causing a total blackout. In either case, it’s essential to address IT performance issues so you can keep your business running and your customers happy.

The bad news is that IT performance problems can be tricky to track down and solve. Today’s complex, centralized IT infrastructures involve many layers (SAN switches, storage arrays, servers, network switches, multipathing software, host bus adapter firmware and more), and typical SAN and storage devices are heterogeneous and deployed at large scale. This ecosystem also evolves rapidly from year to year to improve performance and scalability, which adds even more complexity.

The good news is that to address these challenges, you can turn to established practices, tools and experts in the field who are equipped to handle critical situations and can help you get back on track faster and with more confidence.

Let’s take a closer look at some of the most important assets and resources available to you when you’re frustrated by infrastructure performance issues.

Ensure you have the right tools for proactive storage monitoring

Managing and monitoring an IT infrastructure isn’t just about detecting hardware failures and repairing them. It’s about continuous supervision of how each component is behaving in terms of response time, bandwidth, usage percentage and so forth. It’s about capturing and analyzing unusual peaks of activity, errors and bottlenecks.
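As an illustration of what capturing unusual peaks can mean in practice, the sketch below flags response-time samples that rise well above a trailing baseline. It is a minimal sketch under stated assumptions, not how IBM Spectrum Control or any specific product detects anomalies: the function name `find_peaks`, the 12-sample window and the three-standard-deviation threshold are hypothetical choices made for the example.

```python
from statistics import mean, stdev


def find_peaks(samples, window=12, threshold=3.0):
    """Return indices of samples that exceed the mean of the preceding
    `window` samples by more than `threshold` standard deviations.

    Hypothetical helper for illustration; window (>= 2) and threshold
    are assumptions, not product defaults.
    """
    peaks = []
    for i in range(window, len(samples)):
        baseline = samples[i - window:i]  # trailing window, excludes sample i
        limit = mean(baseline) + threshold * stdev(baseline)
        if samples[i] > limit:
            peaks.append(i)
    return peaks


# Hypothetical volume response times in milliseconds, one sample per 5 minutes.
response_ms = [2.1, 2.0, 2.2, 1.9, 2.1, 2.0, 2.3, 2.1, 2.0, 2.2, 2.1, 2.0, 9.5, 2.1]
print(find_peaks(response_ms))  # [12]
```

Because the baseline trails the current sample, a peak inflates the baseline for the next `window` samples, so closely spaced peaks may be under-reported; a median-based baseline is a common, more robust alternative.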

Most storage systems come with limited built-in monitoring, both in the metrics they support and in how long they retain historical data. IBM Spectrum Control is a more complete SAN and storage monitoring solution that can provide detailed, fine-grained performance data, retained over extended periods, on your IBM Storage environment. It includes an alerting server that notifies you of what is happening across your SAN and storage infrastructure, along with other advanced features.

IBM Spectrum Control can be combined with other tools at the server level to give you an end-to-end view. But none of those tools can help you if you don’t know where to look or how to interpret the collected data. So, below are some vital steps to take in your troubleshooting journey.

Four essential steps to help you pinpoint the problem

When you’re dealing with performance challenges in your IT environment, the following actions can help you determine the problem:

  1. Narrow down the problem by identifying and listing the impacted servers, volumes and virtual machines (VMs). This lets you focus your analysis where the impact is concentrated.
  2. Collect data from all the equipment within the impacted scope. Data collection gives you visibility into your environment: the richer and more accurate your data, the clearer the picture. (Please note: This data collection will be useless if time is not synchronized across all your equipment. Time synchronization enables you to correlate trends and captured events across different devices and layers and to follow a linear chain of cause and effect.)
  3. Extract performance data from your monitoring solutions. Performance monitoring tools at all levels (server, storage, networking) are a must-have; without them, you’re operating blindly.
  4. Reset all monitoring counters and wait one to two hours. If the problem persists, run the data collection again; otherwise, wait for the next occurrence and collect data then. Resetting clears out data from previous issues that could skew the current assessment.
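Time synchronization is what makes cross-device correlation possible. The sketch below merges event records from several devices into one timeline ordered by UTC and rejects records that carry no time zone. It is a minimal sketch: it can only normalize time zones, not correct clocks that have drifted, which is why synchronizing all equipment (for example, with NTP) has to come first. The `Event` type, the `merge_timeline` function, the device names and the log messages are all hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import NamedTuple


class Event(NamedTuple):
    device: str
    timestamp: datetime  # must be timezone-aware
    message: str


def merge_timeline(*event_lists):
    """Merge per-device event lists into one list ordered by UTC time.

    Hypothetical helper: naive timestamps are rejected because they cannot
    be placed reliably on a shared timeline.
    """
    merged = [event for events in event_lists for event in events]
    for event in merged:
        if event.timestamp.tzinfo is None:
            raise ValueError(f"naive timestamp from {event.device}; cannot correlate")
    return sorted(merged, key=lambda e: e.timestamp.astimezone(timezone.utc))


# Hypothetical logs: the SAN switch logs in UTC, the host in UTC+2.
utc_plus_2 = timezone(timedelta(hours=2))
switch = [Event("san-switch-01", datetime(2019, 10, 31, 8, 15, 2, tzinfo=timezone.utc),
                "port 12: CRC errors rising")]
host = [Event("db-host-01", datetime(2019, 10, 31, 10, 15, 30, tzinfo=utc_plus_2),
              "I/O latency above 50 ms")]

for e in merge_timeline(host, switch):
    print(e.timestamp.astimezone(timezone.utc).isoformat(), e.device, e.message)
```

Read naively, the host event (10:15 local) appears two hours after the switch event; normalized to UTC, the switch errors precede the host latency by 28 seconds, which is the cause-and-effect ordering the analysis needs.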

When expert help is needed

Sometimes, you need expert insight to address tricky performance challenges. IBM Systems Lab Services has the skills and field experience to assist you in many ways:

  • Helping to spot the root cause more quickly
  • Collaborating with IBM Support and Development teams to help resolve your problems faster
  • Establishing automated data collection and a strategy for resetting the counters on your monitoring tool
  • Establishing automated alerting policies using IBM Spectrum Control at a granular level (volume, SAN switch port and so forth) to give you more advance notice of potential problems
  • Analyzing capacity and performance projections to know the limits of your infrastructure and avoid overload situations
  • Conducting stress tests to narrow down a performance bottleneck or determine the performance limits of the environment
  • Suggesting performance improvements and fine-tuning in line with the latest IBM best practices
  • Conducting a firmware interoperability study to ensure the whole ecosystem is coherent and working in optimal condition
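To illustrate what a capacity projection involves, the sketch below fits a straight line to periodic utilization measurements and estimates how many days remain before a planning limit is reached. It assumes growth is roughly linear and that measurements are in chronological order; real workloads often are not linear (seasonal peaks, migrations), so treat the result as a rough indicator. The function `days_until_limit` and the 80% limit are hypothetical choices for the example, not IBM guidance.

```python
def days_until_limit(days, used_pct, limit_pct=80.0):
    """Estimate days after the last measurement until utilization reaches
    `limit_pct`, using an ordinary least-squares linear fit.

    Returns None if utilization is flat or shrinking; a negative result
    means the limit has already been passed. Hypothetical helper.
    """
    if len(days) != len(used_pct) or len(days) < 2:
        raise ValueError("need at least two paired measurements")
    n = len(days)
    mean_x = sum(days) / n
    mean_y = sum(used_pct) / n
    sxx = sum((x - mean_x) ** 2 for x in days)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(days, used_pct))
    if sxx == 0:
        raise ValueError("measurements must span more than one day value")
    slope = sxy / sxx  # percentage points per day
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None
    day_at_limit = (limit_pct - intercept) / slope
    return day_at_limit - days[-1]


# Hypothetical weekly pool utilization measurements (day number, % used).
print(days_until_limit([0, 7, 14, 21, 28], [50, 52, 54, 56, 58]))
```

With this sample data the pool grows two percentage points per week, so the sketch prints roughly 77 days of headroom before the 80% limit.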

Prevention is better than a cure

The key to maintaining storage health is to proactively identify and prevent issues. We recommend regular SAN and storage health checks and assessments to ensure that you stay ahead of performance challenges, big and small.

If you’re interested in a storage infrastructure assessment from IBM Systems Lab Services, contact us today.

The post Practices, tools and expertise for better storage performance appeared first on IBM IT Infrastructure Blog.