All you really need to know about open source databases

13 March, 2017
Rick Murphy
IBM

Perhaps you’ve heard about NoSQL and open source databases but are still wondering what they’re all about and whether you should even care. Maybe you’ve looked online and found pages of technical material that you couldn’t make sense of.

If only there were one place to get all you really need to know…

Well there is, and you found it here!

Explaining the terms

The term NoSQL came into wide use in 2009 to describe the emergence of new databases that do not view data in strictly defined tables of rows and columns. It refers to non-relational databases.

Closed source means software whose source code is kept secret to prevent copying. Open source is the opposite — software whose source code is open and available for study, modification and even redistribution. Open source software is often free to download and use.

Summary: Open source databases are database systems whose source code is open source. An open source database could be relational (SQL) or non-relational (NoSQL).

Why should you care?

There are two forces at work in the database market today: the need for new applications and the need to lower costs. The need to lower costs doesn’t seem like anything new, but the need for new apps is driving the need to lower costs (they cost money to develop). So why are new applications needed?

With the advent of Web 2.0, static web pages have become dynamic and social media is all around us. Everyone is tweeting, posting, blogging, vlogging, sharing photos, chatting and commenting.

The Internet of Things (IoT) is emerging — a rapidly growing network of connected devices that collect and exchange data, such as sensors and smart devices. There are some great examples here.

Altogether, this generates huge amounts of new data that businesses want to absorb and use to stay ahead, to provide features such as product recommendations and a better customer experience. The data can be analyzed in search of patterns for applications such as fraud detection and behavior analytics.

Much of the new data is unstructured, which means that it can’t be neatly stored in a tabular database. Imagine trying to design a database to hold data on your grocery shopping — what you like, how often you buy it, whether you prefer milk or cream with your coffee.
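To make the grocery example concrete, here is a minimal Python sketch. The shoppers and field names are invented purely for illustration. It shows that when each record has its own shape, forcing all of them into one fixed set of columns leaves many cells empty.

```python
# Hypothetical shopping records: each shopper's data has a different shape.
shoppers = [
    {"name": "Ann", "likes": ["coffee", "cream"], "coffee_with": "cream"},
    {"name": "Bob", "likes": ["tea"], "visits_per_month": 4},
    {"name": "Cy", "purchases": [{"item": "milk", "every_days": 7}]},
]

# A table needs one fixed set of columns: the union of every field seen.
columns = sorted({field for record in shoppers for field in record})

# Flatten each record into a row, leaving None where a field is missing.
rows = [[record.get(col) for col in columns] for record in shoppers]

# Count the empty cells the fixed table is forced to carry.
empty_cells = sum(cell is None for row in rows for cell in row)
total_cells = len(rows) * len(columns)
print(f"{empty_cells} of {total_cells} cells are empty")
```

Every new kind of fact about a shopper would add another column that is empty for almost everyone else, which is the practical problem the paragraph above describes.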

New types of databases are needed to store the new data, and they need to be non-relational and ideally low cost. Ring any bells? Not relational as in NoSQL and low cost as in open source.

Types of NoSQL databases

We have seen that new data needs new databases, so it follows that a variety of new databases are needed to address the variety of new data and the applications that use it. The main types are listed here:

  • Key-value databases such as Redis store key and value data in memory for ultra-fast lookup. This page shows some use cases.
  • Document databases store document information. MongoDB is the best known and most widely used. This page shows some use cases.
  • Wide-column store databases are similar to key-value but allow a very large number of columns. They are well suited for analyzing huge data sets, and Cassandra is the best known. Use cases here.
  • Graph databases such as Neo4j are used to explore the relationships that link data together, allowing rapid execution of complex queries over millions of connections. Use cases include recommendations, social networks and fraud detection. This video is a great introduction.
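As a rough illustration of how these four models differ, the sketch below mimics each one with plain Python data structures. It does not use the API of Redis, MongoDB, Cassandra or Neo4j, and all keys, records and the helper function are made up for the example.

```python
# Key-value: one opaque value per key, fetched by exact key.
key_value = {"session:42": "user=ann;cart=3"}

# Document: a self-describing, nested record stored under an id.
documents = {
    "order-1": {"customer": "ann", "items": [{"sku": "milk", "qty": 2}]},
}

# Wide-column: each row key maps to its own, possibly sparse, set of columns.
wide_column = {
    "sensor-7": {"2017-03-13T10:00": 21.5, "2017-03-13T10:05": 21.7},
    "sensor-9": {"2017-03-13T10:00": 19.0},
}

# Graph: nodes joined by edges; queries walk the connections.
follows = {"ann": {"bob"}, "bob": {"cy"}, "cy": set()}


def friends_of_friends(graph, start):
    """Return nodes exactly two hops from start, excluding direct neighbours."""
    direct = graph.get(start, set())
    two_hops = set()
    for neighbour in direct:
        two_hops |= graph.get(neighbour, set())
    return two_hops - direct - {start}


print(key_value["session:42"])                   # lookup by key
print(documents["order-1"]["items"][0]["sku"])   # reach into a nested document
print(len(wide_column["sensor-7"]))              # columns differ per row
print(friends_of_friends(follows, "ann"))        # traverse relationships
```

The point of the comparison is the access pattern: exact-key lookup, navigation inside a nested record, rows with differing columns, and traversal across links. Real products add persistence, indexing and distribution on top of these shapes.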

Redis, MongoDB, Cassandra and Neo4j are all open source and there’s no cost to use them. Paid enterprise editions are available that include support and additional features. Even so, enterprise editions are much less expensive than traditional commercial databases.

Open source relational databases

Companies are looking for money in their IT budgets and discovering how much is spent on support and maintenance of traditional relational database systems. And it’s a lot. Estimates vary but some say up to 35 percent of software infrastructure spending. Switching to lower-cost open source software saves money, which is why an estimated 78 percent of enterprises use it, including open source databases:

  • MySQL is the world’s most popular open source relational database. It came under Oracle’s ownership in 2010 through the acquisition of Sun Microsystems, and Oracle now charges for support. A free “community” edition is still available.
  • MariaDB is a drop-in replacement for MySQL. Uncertain about MySQL’s future with Oracle, many users have migrated to MariaDB. Support subscriptions are available from Mariadb.com.
  • PostgreSQL has a strong reputation for reliability and data integrity. It’s feature-rich and is often regarded as more robust and better performing than MySQL. The community edition is free.
  • PostgresPURE is available from Splendid Data on a subscription basis. It is built on PostgreSQL but with added tools and support to make an enterprise package.
  • EnterpriseDB (EDB) is also based on PostgreSQL but with additional features and tools, most notably Oracle compatibility features (which are closed source), enabling Oracle shops to transition to EDB more easily than to other PostgreSQL variants. EDB charges for these extras and support.

Free databases – too good to be true?

Get ready for a surprise: Open source databases aren’t really free, because businesses usually choose subscription editions with support. Okay, maybe that’s not too surprising. And here’s something else: businesses don’t really care if they are open source or not. Very few people really want to tinker with the source code.

But open source NoSQL databases enable innovation with new data, and open source relational databases are lower cost compared to traditional relational database management systems.

Open source databases on IBM Power Systems

All open source databases are available to run on x86. But what is available for companies that have already invested in or wish to invest in IBM Power Systems? MongoDB, EnterpriseDB, Redis, Cassandra, Neo4j and MariaDB are all available on IBM Power, and all perform better there than on a similarly configured x86 system. In fact, IBM guarantees 2x price-performance over x86 for MongoDB and 1.8x for EnterpriseDB, which means that clients choosing IBM Power can expect better performance and less server sprawl.

This should cover the basics, but if you’d like to learn more about open source databases on IBM Power Systems, start here.

The post All you really need to know about open source databases appeared first on IBM Systems Blog: In the Making.