Build your foundation for advanced data analytics

02 August, 2017
Sumit Gupta
IBM

Massive amounts of data and multiple data sources provide today’s businesses with the opportunity to glean new insights, leading to creative business decisions and the potential for competitive advantage. Moreover, innovative organizations are using machine learning, cognitive applications, artificial intelligence (AI) and other advanced technologies to quickly extract that insight and react with agility to market changes and client behavior. But capitalizing on these opportunities depends on making the right IT infrastructure choices.

Suppose you want to boost sales by gaining insight into how your customers think and act while shopping, and then use those insights to deliver timely, location-based discount offers on relevant products. You could analyze the sentiments customers express on social media about their experiences, along with their shopping behavior online and at brick-and-mortar locations. Of course, this kind of analysis may require capturing millions of customers’ experiences simultaneously and in real time. IT infrastructure capable of handling these kinds of analytics workloads needs the right compute and storage technology for rapidly storing and processing large volumes of unstructured data.

Structuring digital IT for an analytics-driven business

Building the IT infrastructure foundation you need for this kind of data analysis often starts with choosing the right server to process your analytics workloads. Processing requirements tend to fall into three categories:

Large amounts of high-speed memory: Running an enterprise-scale database requires servers with an abundance of high-speed system memory and high-performance compute capabilities. Considerations include throughput requirements and database size and type. Does the organization require servers optimized for the SAP HANA analytics database? Should the servers be optimized for handling the new generation of open source databases such as MongoDB or EnterpriseDB?

Many tasks running simultaneously: Running multiple compute- and data-intensive analyses requires servers equipped with multicore, multithreaded processors that can execute many threads in parallel, reducing time to results. These servers may also use GPU hardware accelerators to deliver the high performance necessary for demanding applications such as predictive analytics, machine learning and deep learning.

Maximum performance at the cluster level: Supporting multiple users and enabling faster analytics and machine learning often requires server clusters to provide the necessary compute power. Scale-out computing with GPU accelerators can maximize the performance of the entire server cluster.
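To make the "many tasks running simultaneously" category more concrete, here is a minimal sketch of running several independent analyses at once. It is illustrative only: the `summarize` task, the `run_analyses` helper and the sample numbers are hypothetical and are not taken from any IBM product or from the workloads described here.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def summarize(batch):
    """Hypothetical analysis task: summarize one batch of purchase amounts."""
    return {"count": len(batch), "mean": mean(batch), "max": max(batch)}


def run_analyses(batches, max_workers=4):
    """Run one summarize() task per batch concurrently, preserving input order."""
    # A thread pool keeps the sketch simple. For CPU-bound work in CPython,
    # ProcessPoolExecutor is the usual choice for spreading work across cores.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize, batches))


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    batches = [[12.5, 30.0, 7.5], [99.0, 1.0], [20.0, 20.0, 20.0, 20.0]]
    for result in run_analyses(batches):
        print(result)
```

The more hardware threads a server exposes, the more of these independent tasks can make progress at the same time, which is the property this category of workload depends on.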

IBM Power Systems servers running Linux are available to meet the requirements of each of these workload types. For example, Power Systems servers optimized for open source databases provide a 2x[1] price-performance advantage versus x86 systems on big data servers for MongoDB.
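A price-performance comparison of this kind is usually a ratio of ratios: throughput per unit of cost for one system, divided by the same figure for another. The sketch below shows only that arithmetic. The function names and the sample figures are placeholders chosen for illustration; they are not the measurements or prices behind the figure cited above.

```python
def price_performance(throughput, system_price):
    """Work delivered per unit of cost, e.g. transactions per second per dollar."""
    if system_price <= 0:
        raise ValueError("system_price must be positive")
    return throughput / system_price


def advantage(candidate, baseline):
    """Ratio of two price-performance figures; 1.5 means a 1.5x advantage."""
    return price_performance(*candidate) / price_performance(*baseline)


if __name__ == "__main__":
    # Placeholder (throughput, price) pairs, NOT measured results.
    system_a = (1200.0, 400.0)
    system_b = (1000.0, 500.0)
    print(f"{advantage(system_a, system_b):.2f}x")
```

Because the result depends on both throughput and price, the same hardware can show a different advantage under a different workload or pricing source.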

For workloads that can benefit from a GPU accelerator, IBM and NVIDIA have worked together to embed the NVLink high-speed interface between the IBM POWER8 CPU and the NVIDIA Pascal GPU. Power Systems servers with NVLink are designed to accelerate applications by rapidly offloading parts of the application and data to the GPU.

Making the most of every millisecond

Choosing the right storage is also important for infrastructure that handles high-performance analytics and supports enhanced data-driven decisions in real time. All-flash arrays such as IBM FlashSystem give companies the power to mine deep insights and respond to customers quickly, typically while using fewer resources. All-flash arrays can help reduce latency, offer scalable performance for a wide range of big data workloads and help reduce operating expenses.

An optimized storage infrastructure is also essential to unlock the potential of data, but so is managing infrastructure growth cost-effectively. The IBM Spectrum Storage family of software-defined storage (SDS) solutions, such as the Elastic Storage Server (ESS), is designed to meet this challenge. IBM Spectrum Storage helps simplify IT by leveraging data deduplication and compression to help reduce the need for storage space. It also helps optimize data economics with intelligent data tiering, which can further relieve today’s IT budgeting pressures.

Organizations with clustered environments can increase system utilization and optimize application performance to speed results. At the same time, they need to make working with cluster resources easier for users. The IBM Power HPC Stack brings together multiple IBM software components to facilitate efficient infrastructure, workload management and application optimization.

Fortifying digital transformations

Becoming a digital business can be a path to success for organizations across a wide range of industries, and cognitive analytics applied to valuable data stores, coupled with data from multiple sources, can be the key to a competitive future. Your digital transformation requires infrastructure that lays the technology foundation you need to enrich your analytics capabilities and energize creative business decision-making that aligns with your strategic objectives.

Learn more about how IBM servers and storage can fortify your IT infrastructure for high-performance analytics workloads.

[1] Results are based on IBM internal testing of a single system running multiple virtual machines with a Sysbench read-only workload and are current as of October 18, 2015. Performance figures are based on running a 24 M record scale factor per VM. Individual results will vary depending on individual workloads, configurations and conditions. IBM Power System S822LC; 20 cores / 160 threads, POWER8; 2.9 GHz, 256 GB memory PCIE2 2PT 10 1GB; MariaDB 10.0.19. Ubuntu 14.04.03, PowerKVM 3.1 compared to Competitive stack: HP Proliant DL380 Gen9; 24 cores / 48 threads; Intel E5-2690 v3; 2.6 GHz; 128 GB memory, MariaDB 10.0.20. Ubuntu 14.04.03, KVM. Each system was configured to run at similar per-VM throughput levels, and the number of VMs was increased for each system until total system throughput reached its maximum level. Competitive pricing was taken from available web-based pricing.

The post Build your foundation for advanced data analytics appeared first on IBM Systems Blog: In the Making.