What is Database Benchmarking?

Database Benchmarking is a clearly defined method for analysing and comparing the performance characteristics of databases. It was developed in the 1980s and, with growing data volumes and an ever more diverse landscape of database technologies and cloud providers, it is gaining importance day by day. The performance and scalability of databases is a highly topical issue in IT departments worldwide. If the database is Cloud-hosted, additional high-level properties like availability, elasticity, and cloud cost optimization have to be considered as well.

Cloud Database Benchmarking can be the answer to these issues and deliver reliable measurements and KPIs for well-founded, quantitative decisions and optimizations regarding your database configuration. Now, in 2021, a solution that enables efficient usage of cloud-based databases through Cloud Database Benchmarking seems within reach.

This article shows you the most important key figures of Database Benchmarking. Furthermore, it explains how to set up a Database Benchmarking process, including the difficulties and possible solutions. It focuses on modern Cloud-based database solutions and can be applied not only to SQL but also to NoSQL and NewSQL databases.

Author: Dr. Jörg Domaschka

Dr. Jörg Domaschka has published several research papers about Cloud Database Benchmarking during his scientific research on Cloud and distributed databases at Ulm University, Germany. His affinity for avoiding repetitive tasks and for performance engineering urges him to develop up-to-date, automated solutions for the daily issues of IT departments.

Key Figures of
(Cloud) Database Benchmarking

To understand the full method and all the key figures of modern, Cloud-related Database Benchmarking, we start with the theory and proceed step by step towards modern Cloud Database Benchmarking.

  1. The Overall Benchmarking Concept
  2. Traditional Database Benchmarking
  3. Cloud Database Benchmarking

The Overall Benchmarking Concept

The general understanding of Benchmarking is measuring and comparing performance dimensions against the industry best or best practices in a clearly defined method. It is commonly used to compare business processes and technical products, especially in IT, e.g. CPUs or GPUs.

The methodology of Benchmarking is neither new nor standardized. There are several well-known approaches with different focuses, such as SWOT analysis or potential analysis. The results are usually presented in portfolios or diagrams.

Finally, all benchmark techniques can be generalized to the following common steps:

The 4 Steps of Benchmarking

  1. Define your model by setting a goal, the process, and your constraints
  2. Identify all related entities/sources/artefacts and data
  3. Measure or calculate all options
  4. Compare the results to the identified benchmark

Yes, this looks very generic. Often you will find a more detailed process with additional steps. But in the end, it is always the same generic procedure, which has to be adjusted in detail to the given situation.

Because benchmarking is so generic and universally applicable, yet at the same time analytical and logical, it enjoys a high profile in many fields.

What is the importance of Benchmarking?

The goal of benchmarking is to establish a continuous improvement process that compares different solutions and leads to better ones. In many professional fields, such an improvement process is useful and necessary. At the same time, an open, generic approach usually provides quantitative and comparable results that enable well-founded, reliable decisions and the derivation of options for action.

 

When talking about Databases (rather Database Management Systems, DBMS) Benchmarking is applied as follows.

Traditional Database Benchmarking

The first steps in Database Benchmarking were taken in the late 1980s with the rise of relational databases.

In 1993, Jim Gray published the first handbook about Database Benchmarking, the „Database and Transaction Processing Performance Handbook“. It became a kind of standard reading for early performance engineering and set the domain-specific benchmark criteria for databases. Gray defines the goal of database benchmarks as a question: „What computer should I buy?“. And he gives the obvious answer: „The system that does the job with the lowest cost-of-ownership”. To get to this point, you need to benchmark all possible options.

Gray 1993: Database Benchmarking

  1. Goal: What computer does the job and has the lowest TCO? 
  2. What workload do I have?
  3. Measure the database throughput on every computer.
  4. What option has the best throughput/cost ratio?

This guideline stresses the first really important artifact for database benchmarks: the workload. You need to know what workload your database system will have to handle in order to find the best-fitting option. In the simplest case, the workload differentiates between Create, Read, Update and Delete (CRUD) operations. The characteristics of these operations significantly impact the throughput and latency results of a benchmark. Transaction processing introduces a further dimension to the problem space.

Around the same time as Gray published his guidelines, the „Transaction Processing Performance Council“ (TPC) was founded; it defines benchmarks for transaction processing and database domains. The TPC is still active and defines benchmarks for modern user workload profiles. Benchmarks in this context are precisely defined workloads applied to databases.

Definition: Workload

A Workload consists of a sequence of operations issued against a database. The workload defines the mix of operations as well as the access patterns. The operations may be replayed by timestamp or by sequence. Workloads are based either on trace data or on synthetic data that is modeled after real-world load patterns and generated according to a set of configurable constraints.
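To make the definition more tangible, here is a minimal Python sketch of a synthetic workload generator. The operation mix, key space, and record size are illustrative assumptions, not values prescribed by any benchmark.

```python
import random

# Example operation mix: 70% reads, 20% updates, 5% inserts, 5% deletes.
# These proportions are illustrative assumptions, not a recommendation.
OPERATION_MIX = {"read": 0.70, "update": 0.20, "insert": 0.05, "delete": 0.05}
KEY_SPACE = 100_000          # number of pre-loaded records (assumed)
RECORD_SIZE_BYTES = 1_000    # payload size per record (assumed)

def generate_workload(num_operations: int, seed: int = 42):
    """Yield (operation, key, payload) tuples according to the configured mix."""
    rng = random.Random(seed)
    ops, weights = zip(*OPERATION_MIX.items())
    for _ in range(num_operations):
        op = rng.choices(ops, weights=weights, k=1)[0]
        key = f"user{rng.randrange(KEY_SPACE)}"
        payload = b"x" * RECORD_SIZE_BYTES if op in ("insert", "update") else None
        yield op, key, payload

# Usage: feed the generated operations into the database client under test.
for op, key, payload in generate_workload(num_operations=10):
    print(op, key, len(payload) if payload else "-")
```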

Definition: Benchmark

A Benchmark applies workloads on a database. In addition, it provides features such as metric reporting, metric processing and the coordinated execution of multiple workloads. A single benchmark can provide different workload types that are commonly classified into OLTP, HTAP and OLAP workloads.

The TPC provides a short list of its current Database Benchmarks on its website.

In the 80s and 90s, only a few different SQL databases existed: OracleDB, IBM's DB2, PostgreSQL, or Microsoft SQL Server. The number of possible options to benchmark was small and the technology relatively uniform. In the decade to come, this changed significantly.

In the late 2000s, non-relational NoSQL and, more recently, NewSQL databases were developed. These had a completely different technical structure and offered different, less rigid data models; often they also sacrificed transactional functionality. The traditional benchmarks could no longer be applied to these databases and had to be adapted individually. Most of the TPC benchmarks are not applicable to NoSQL databases at all because of the heterogeneous data models and changed APIs.

As the number of database providers grew, the configuration possibilities also increased and became more and more individual. These factors led to an increase in complexity and effort for the Database Benchmarking procedure, too.

The website „db-engines.com“ lists a ranking of over 360 different database systems (January 2021), often with multiple data models. The ranking is based on social media and search popularity, not on performance measures.

DB-Engines creates a DBMS ranking based on popularity; it lacks performance data from database benchmarks

Of course, not all existing databases suit a specific use case, but there are still plenty of fitting options for most usage scenarios. Most importantly, database systems have evolved that are specialized for specific use cases. The path to polyglot persistence, in which different databases, each suitable for the respective problem, are operated within one application, seemed to be clear. Nevertheless, many traditional relational databases continue to enjoy seemingly oversized popularity according to the above ranking.

What could be the reason for this?

Are the new databases too difficult to access?

Is there a lack of experience or a lack of references?

Is there no need to change to a better fitting solution?

What do you think?

At that time, we had a lot of different databases, but only one fixed server infrastructure, installed for maybe 3 to 5 years until the next generation. Well, you know, that changed and is still changing as well. Let us see how Cloud Computing heavily affected Database Benchmarking and disrupted the existing Database Benchmarking approaches.

Cloud Database Benchmarking

NoSQL and NewSQL databases developed in parallel to Cloud Computing. Each of these technologies is a success story on its own, but even more so their combination. Together they provide a high degree of flexibility and enable new data-intensive applications.

For all performance engineers and traditional database benchmarking procedures, the symbiosis of both technologies marked the beginning of a hard time. Originally, it was possible to benchmark the database completely independently of the infrastructure components. At first, it seemed that these two benchmarking domains would continue to stay separate, with no dependencies.

But yes, that would have been too easy. Scientific research shows that database benchmark results can differ significantly between comparable, but different Cloud providers. In the following diagram, the Cassandra database was benchmarked on comparable OpenStack and AWS EC2 resources. The measured performance was different, especially with a higher number of nodes. These results were measured and published in the scientific research paper „Mowgli: Finding Your Way in the DBMS Jungle“. Mowgli provides a functional Cloud Database Benchmarking framework; more information about Mowgli follows in Chapter 5.

Cassandra database benchmarked on comparable Cloud resources
Cassandra shows different performance on comparable Cloud resources in a Cloud Database Benchmark

To establish reliable database benchmarking with Cloud resources, you need to consider not only the database and your workload, but also the specific Cloud provider and the Cloud configuration. This leads to six dimensions, listed below and illustrated with a small configuration sketch after the list.

The 6 Dimensions of Cloud Database Benchmarking

Cloud Provider
Database Provider
Benchmark
Cloud Configuration
Database Configuration
Benchmark Settings
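To make these six dimensions concrete, the following minimal sketch describes a single benchmark run as one point in this configuration space. All field names and example values are illustrative assumptions, not part of an existing tool.

```python
from dataclasses import dataclass, field

@dataclass
class BenchmarkRun:
    """One point in the 6-dimensional Cloud Database Benchmarking space (illustrative)."""
    cloud_provider: str                 # e.g. "AWS EC2" or "OpenStack"
    database_provider: str              # e.g. "Apache Cassandra"
    benchmark: str                      # e.g. "YCSB" or "TPC-C"
    cloud_configuration: dict = field(default_factory=dict)     # VM size, storage, region, ...
    database_configuration: dict = field(default_factory=dict)  # cluster size, replication, ...
    benchmark_settings: dict = field(default_factory=dict)      # workload mix, threads, duration, ...

# Example run (all values are assumptions for illustration):
run = BenchmarkRun(
    cloud_provider="AWS EC2",
    database_provider="Apache Cassandra",
    benchmark="YCSB",
    cloud_configuration={"instance_type": "m5.large", "nodes": 3, "storage": "gp2"},
    database_configuration={"replication_factor": 3, "consistency": "QUORUM"},
    benchmark_settings={"workload": "read-heavy", "threads": 64, "duration_s": 600},
)
print(run)
```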

So, two independent benchmarking domains need to be combined. In science, the first steps have been made over the last five years by only a few researchers worldwide. A professional commercial solution has not been developed so far. A few companies, such as MongoDB and CouchbaseDB, have built internal scripts and workflows for semi-automated Cloud Database Benchmarking, but only for a small range of possible resources.

Closing this gap in performance engineering and enabling efficient usage of modern database systems in the Cloud is the challenging main vision of our project. Besides performance, Cloud Database Benchmarking also puts other metrics such as scalability, availability, elasticity, and cloud costs in the foreground. They are becoming important and need to be measured and calculated in the benchmarks as well.

More than 400 SQL / NoSQL / NewSQL databases
More than 100 runtime configurations per database
More than 20,000 Cloud providers & resources
More than 800 million possible configurations
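A quick back-of-the-envelope calculation shows where the last figure comes from: 400 databases × 100 runtime configurations × 20,000 Cloud resources = 800,000,000 possible configurations.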

Conclusion - Chapter 1

This first introduction chapter put all of us on the same base level. At least, I hope so. And it also delivered some main facts, which I want to summarize here.

Recap: Key Figures of Database Benchmarking

  1. Database Benchmarking has been an established benchmarking domain since the early 1990s.
  2. The specific CRUD workload is a key benchmarking artifact, just like the database itself.
  3. For Cloud Database Benchmarking, the Cloud resources also need to be considered.
  4. Cloud Database Benchmarking is still in its early stages and not established in the daily business of IT departments.

From this point on, „Database Benchmarking“ refers to databases in Cloud environments. As of 2021, Cloud Computing and NoSQL database systems are key technologies and have become an industry standard in many areas. The following chapters will address many Cloud-related topics. Yet, even if you operate databases in a non-Cloud infrastructure, a lot of the following information can be useful for you and applied or transferred to your situation.

Why do you Need
Cloud Database Benchmarking?

There are seven different scenarios where Database Benchmarking can have a major impact in IT departments if it is applied for initial decision-making or for a continuous improvement process.

In all these cases a standardized Database Benchmarking process will allow making smarter, more reliable, and more efficient decisions and improvements. This holds for traditional server setups and even more for Cloud infrastructures.

Finding the ideal Database

As stated before, the database market has been growing significantly over the last decade. By now, there are over 250 database systems with industrial maturity. Many of them are specialized for particular purposes such as time-series or Big Data analysis. Hence, for any use case, you end up with 10 to 20 database systems fit for your purpose.

Despite this reduction, the question remains which of these is the best choice for your workload and infrastructure settings. Cloud Database Benchmarking, if done right, will deliver reliable measurements and therefore lay the basis for an objective and quantitative decision.

Finding the ideal Cloud Resource

The other side of the coin is to find the fitting Cloud resource for your preferred database and optimize your internal infrastructure properly to your requirements. 

Choose between AWS, GCP, MS Azure, or maybe a European GDPR-compliant provider; and once that is done, which VM sizes and which internal storage should you choose for the workload? Again, Database Benchmarking provides the basis for an objective and quantitative decision.

Finding the ideal Cloud & Database Combination

Unfortunately, research in the last few years has shown that you cannot choose your database system independently from your cloud provider. This is true for all non-functional properties, including performance, availability, and scalability. The reason for this is that database benchmark results can depend significantly on the chosen Cloud provider. One database can perform very well on one provider, but worse on another. And with another database or workload, it can be the other way around.

By now, research has not reached a point where reliable predictions are possible. Yet, Cloud Database Benchmarking can reveal such correlations.

Be aware of Changes due to Version Updates

Database providers release about four major versions per year. Cloud providers regularly renew parts of their server and service landscape. All of this can have an impact, sometimes a negative one, on the performance or scalability of the configuration.

Routinely performing cloud database benchmarking detects such changes and allows taking countermeasures before they impact customers and business.

 

Planning and Stress-Testing

Cloud database configuration requirements change as businesses grow or seasonal effects kick in, such as Black Friday shopping. What is more, the limits and capabilities of the current configuration are often not known to the operators. Actively planning and stress testing the current configuration enables operating the system with confidence and reacting to changes early enough.

Tuning and Tweaking

Many IT departments face the need to regularly improve their database settings to reduce existing problems, work around bottlenecks, or mitigate foreseen problems. This time-intense, error-prone, manual work of „Adjust“, „Observe“ and „React“ can be systematically addressed by Cloud Database Benchmarking. Automatically deriving an optimal setting requires less time, is less error-prone, and is based on objective numbers.
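To illustrate how such an „Adjust“ – „Observe“ – „React“ loop could be automated in principle, here is a minimal parameter-sweep sketch. The tuning parameter, its candidate values, and the run_benchmark callback are hypothetical placeholders for whatever benchmark harness is actually in place.

```python
from typing import Callable

def sweep(candidates: dict, run_benchmark: Callable[[dict], float]) -> dict:
    """Adjust -> Observe -> React: benchmark every candidate setting and keep the best one.

    run_benchmark is a placeholder callback that deploys the setting, runs the
    benchmark, and returns a single KPI (here: throughput in ops/s, higher is better).
    """
    results = {}
    for name, setting in candidates.items():
        kpi = run_benchmark(setting)          # observe
        results[name] = kpi
        print(f"{name}: {kpi:.0f} ops/s")
    best = max(results, key=results.get)      # react: pick the best-performing setting
    return candidates[best]

# Hypothetical example: sweep a single Cassandra-style cache setting.
candidates = {
    "small-cache": {"row_cache_size_mb": 0},
    "large-cache": {"row_cache_size_mb": 512},
}
# best_setting = sweep(candidates, run_benchmark=my_benchmark_harness)
```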

Right Scaling

In the cloud, it is possible to react flexibly to changes in requirements and environment. This freedom makes it possible to use resources efficiently, as needed. On the other hand, being efficient comes with the obligation to constantly make adjustments in order not to operate an oversized configuration with excessive Cloud costs, or to suffer from a lack of performance in an undersized setup.

Through continuous Cloud Database Benchmarking, this optimization can take place constantly.

Conclusion - Chapter 2

The points shown illustrate the diverse, practical decision-making and optimization potential of Database Benchmarking.

Recap: Why do you Need Cloud Database Benchmarking

  1. Finding the ideal Database provider and configuration
  2. Finding the ideal Cloud provider and configuration
  3. Finding the ideal Cloud and database combination
  4. Be aware of changes due to version updates
  5. Planning and stress testing
  6. Tuning and tweaking
  7. Right scaling

There are certainly other areas in which Cloud Database Benchmarking can be used, too. Do you see any other areas of use? Feel free to post them as comments.

With an efficiently designed, automated Cloud Database Benchmarking framework, IT departments are able to generate technical and business value in enterprises. In the next section, we investigate how such a framework can be set up in practice!

How do you perform
Database Benchmarking?

Database benchmarking has great potential if applied correctly. In the following, we address the question of what “correctly” means here, in particular which steps are necessary to set up a benchmarking process and what has to be done and considered in these steps. For both Cloud and non-Cloud infrastructures, this chapter shows the necessary steps for a practical implementation. Steps 1 and 2 of the overall benchmarking concept are integrated into „Design“.

  1. Design
  2. Execution
  3. Analysis
A 3-step framework for Cloud Database Benchmarking: Design - Execution - Analysis (based on the Mowgli paper)

Step 1: Design

The „Design“ step aims at setting the benchmarking objective and the entire process design, from the measurements and the restriction of the resources to the process decisions. The following topics must be clarified:

  1. Which optimization goals such as performance, scalability, availability, elasticity, and cost should be targeted and which system and business metrics need to be measured for that purpose?
  2. What workload (read/write distribution, number of operations, size of data items, etc.) is expected or given?
  3. Which benchmark can be used for my workload?
  4. Which databases and server/cloud resources are potential options that should be included in the benchmark?
  5. How should my process be structured? How often will it be needed (now and in the future)? Which steps should be automated?
  6. How many measurements should be run per configuration to get statistically reliable results?
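For question 6, one possible heuristic is sketched below: repeat each configuration until the 95% confidence interval of the mean throughput is narrow enough. The 5% threshold and the measure_throughput callback are assumptions for illustration, not a general rule.

```python
import statistics

def enough_repetitions(samples: list[float], rel_halfwidth: float = 0.05) -> bool:
    """Return True once the 95% confidence interval of the mean is within
    +/- rel_halfwidth (e.g. 5%) of the mean. Uses a normal approximation (z = 1.96),
    which is a simplification for small sample counts."""
    if len(samples) < 3:
        return False
    mean = statistics.mean(samples)
    sem = statistics.stdev(samples) / len(samples) ** 0.5   # standard error of the mean
    return 1.96 * sem <= rel_halfwidth * mean

# Hypothetical usage with a benchmark harness:
# samples = []
# while not enough_repetitions(samples):
#     samples.append(measure_throughput(configuration))   # one full benchmark run
```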

Initially, without experience, these questions can only be answered insufficiently and will likely not yield an optimal result. Therefore, it makes sense to approach the goal through test runs of the entire database benchmarking process in several iterations, similar to the procedure of agile software development.

Depending on the infrastructure (Cloud vs Non-Cloud), there are some differences to consider.

Non-Cloud

  1. Using physical servers limits the benchmarking potential, as practically only a few servers can be tested. This refers to both the number of servers and the types of servers.
  2. Servers often must be purchased on the basis of the specs given by a manufacturer; often server and database even come as an appliance.
  3. Scalability and costs are defined by the choice and amount of available servers.
  4. Performance, performance/dollars, performance/watt are obvious KPIs. Availability can be estimated to some extent as well.

Cloud

  1. The objective is much more complex and diverse due to the high flexibility and volatility of cloud infrastructures.
  2. The large number of cloud offerings and the conceptually flexible, changeable cloud resources introduce a much larger selection space.
  3. Cloud resources and databases must necessarily be measured together (see above) leading to an even larger selection space.
  4. A much more profound and detailed thematic domain knowledge is required to reduce the options.

Step 2: Execution

The execution of the Cloud Database Benchmarking is strongly related to the decisions made in step 1. They determine the required effort and the resulting data volumes for the measured configurations. Based on this, the degree of automation of the benchmarking process must be derived. 

A sophisticated, non-invasive benchmark process is required to avoid corrupting the measurement results. Do not store the measurement results in the same database that you are currently measuring. Do not run the benchmarking software on the same resource you want to benchmark.
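As a minimal sketch of this separation, the snippet below drives a YCSB run from a dedicated load-generator host against a remote database and stores the raw output locally rather than in the system under test. The host name, YCSB checkout path, and MongoDB URL are assumptions; the subcommands and flags follow the standard YCSB command line.

```python
import subprocess
from pathlib import Path

# Run this on a dedicated load-generator machine, NOT on a database node.
YCSB_DIR = Path("/opt/ycsb")                       # assumed location of a YCSB checkout
DB_URL = "mongodb://db-under-test:27017/ycsb"      # assumed remote database under test
RESULT_FILE = Path("results/run-001.txt")          # raw results stay on the load generator

def run_ycsb(phase: str) -> None:
    """Run the YCSB 'load' or 'run' phase against the remote database."""
    cmd = [
        str(YCSB_DIR / "bin" / "ycsb"), phase, "mongodb",
        "-P", str(YCSB_DIR / "workloads" / "workloada"),
        "-p", f"mongodb.url={DB_URL}",
        "-p", "operationcount=100000",
    ]
    RESULT_FILE.parent.mkdir(parents=True, exist_ok=True)
    with RESULT_FILE.open("a") as out:
        subprocess.run(cmd, stdout=out, check=True)

run_ycsb("load")   # pre-load the data set
run_ycsb("run")    # execute the measured workload
```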

Non-Cloud

  1. Wiring, installation, and setup of each server must be done manually.
  2. The installation and setup of each database are also done manually if it is freely selectable on the server.
  3. This is mostly done via the command line and without UI. Parts of these steps can be automated by relying on tools such as Chef or Ansible. This, however, requires preparing the necessary scripts beforehand.
  4. At least a partial automation of the measurement and data preparation seems to make sense in order to obtain reliable and reproducible data efficiently.

Cloud

  1. Accounts and API keys (as well as credit card information) are required for Cloud resources.
  2. Installation and configuration of resources can be done via templates or UIs.
  3. Wiring the components in the Cloud is more dynamic and in consequence, more complicated to handle.
  4. Automation with a cloud-native deployment is almost mandatory to handle changing IPs and the multitude of combinations.
Mowgli offers a potential Cloud Database Architecture

Mowgli, a Multi-objective DBMS Evaluation Framework, shows a potential Cloud Database Benchmarking architecture. Its high technical complexity is necessary to realise an automated process with multiple benefits, including multi-cloud and multi-DBMS support.

Step 3: Analysis

The third step is the analysis of the measurement results. Here, the measurement results must be statistically processed and then put into a presentation that allows the results of the different configurations to be compared. The calculation of comparable KPIs or data visualization is a proven approach here. The complexity of the analysis depends strongly on the objective, the number of measurement results, and the distribution of the results.
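A minimal sketch of this processing step: collapse the repeated measurements of each configuration into comparable KPIs and rank the configurations. The configuration names and numbers are purely hypothetical input data.

```python
import statistics

# Hypothetical raw results: per configuration, one (throughput ops/s, p95 latency ms) pair per run.
raw = {
    "config-A": [(10500, 12.1), (9800, 13.4), (10100, 12.8)],
    "config-B": [(8700, 15.2), (9100, 14.6), (8900, 15.0)],
}

def kpis(runs):
    """Collapse repeated runs into comparable KPIs (medians are robust against outliers)."""
    return {
        "median_throughput": statistics.median(t for t, _ in runs),
        "median_p95_latency": statistics.median(l for _, l in runs),
    }

# Rank configurations by the throughput KPI for a simple comparison table.
for name, runs in sorted(raw.items(), key=lambda kv: -kpis(kv[1])["median_throughput"]):
    print(f"{name}: {kpis(runs)}")
```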

Non-Cloud

  1. Fewer measured configurations, fewer fluctuations in the infrastructure, and a 1-dimensional target variable usually make the analysis much easier.
  2. The visualization of the results in a 2-dimensional chart should not be a problem if this should be necessary at all.

Cloud

  1. The complexity of cloud database benchmarking is also present in the analysis phase and requires data analysis specialists who can handle big, multi-dimensional data.
  2. Fluctuating measurement results with outliers and multi-dimensional targets across multiple, diverse configurations have to be handled with modern data science concepts.
  3. A multi-dimensional visualization or a complex scoring system is a crucial prerequisite for efficient comparison.

Conclusion - Chapter 3

The Database Benchmarking process follows the same steps for non-Cloud and Cloud infrastructures. Yet, due to the flexible Cloud infrastructure, there are several differences in detail. In particular, far more configuration options are available in the Cloud, leading to a higher complexity for Cloud Database Benchmarking.

Recap: How do you Perform Cloud Database Benchmarking?

  1. Designing a benchmarking process is complex. It is best to approach the goal in several iterations.
  2. Non-Cloud: The limitations of the infrastructure lead to fewer possible options, less need for automation, and less data to analyse, but also to a smaller probability of finding the best or even a really good configuration.
  3. Cloud: Millions of possible combinations make the whole process challenging. A high level of prior knowledge, automation, and data analysis skills are required.

Some difficulties of the Cloud Database Benchmarking process need to be explained in more detail and are worth looking at in the next chapter.

Do you have questions? 

Or is something missing?

Use the comment section or contact us directly.

What are the Difficulties of
Cloud Database Benchmarking?

In the previous chapter, we looked at the Database Benchmarking process in general. In the following, we take a closer look at points that may seem complicated and elaborate.

Iterations, Iterations, Iterations

One of the greatest challenges in Cloud Database Benchmarking is the incredible number of almost one billion possible configuration options. In order to get results in finite time, this huge selection space needs to be narrowed down in advance by acquiring knowledge and „educated guessing“. Yet, even after that, it will in almost no case be possible to benchmark all remaining options. In iterative runs with „trial measurements“, the option set must be further reduced, narrowing down on the goal step by step. Only in the last iteration do you measure and compare very similar configuration options against each other.

Cloud database benchmarking cannot replace the thematic study and acquisition of technology-specific knowledge about databases and cloud resources, but it can help to make an optimal decision based on reliable information in iterative runs.

Initial Setup Effort

Creating a reliable, production-ready cloud database process for the first time is daunting. First, a lot of knowledge must be acquired about databases, cloud resources, and required software deployment steps. Then, all desired cloud resources must be connected and all desired databases must be set up with their specifics.

The effort required for these steps should not be underestimated and makes the creation of a functioning, reliable cloud database framework a long-lasting, cross-domain software project.

Data Quality & Resilience

The benchmarking objective needs to make clear which data is required and hence needs to be measured. In addition to performance and scalability indicators, other parameters such as elasticity or availability may also be required. These demands must be matched with the data that can be measured, or derived and calculated from measurements.

At the same time, the performance engineer needs to ensure that the measurements performed for the Database Benchmark do not falsify the results. Non-invasive measurement methods must be used; measurements of the same configuration must also be performed several times; and any interference effects and statistical outliers (often a consequence of utilization) must be considered in the analysis in one way or the other.
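One simple way to make such outliers visible before aggregation is an interquartile-range check over the repeated runs of one configuration, as sketched below; the 1.5 × IQR rule is a common statistical heuristic, not a requirement of any benchmarking tool.

```python
import statistics

def flag_outliers(samples: list[float]) -> list[float]:
    """Return the samples that fall outside the 1.5 * IQR range (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(samples, n=4)   # quartiles of the repeated runs
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [s for s in samples if s < low or s > high]

# Hypothetical throughput samples (ops/s) from repeated runs of one configuration:
print(flag_outliers([10100.0, 9900.0, 10050.0, 10020.0, 9980.0, 10010.0, 6400.0]))  # -> [6400.0]
```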

Data Preparation & Discussion

The required number of iterations and the demands for data quality, in combination with the number of Database Benchmark configurations, result in a large amount of data. As mentioned, this data must be processed statistically. At the same time, it needs to be prepared in such a way that the results are easily graspable and can be discussed and compared efficiently.

Often it is not possible to compare the raw measurement data, usually volatile time series data, in a meaningful way. Visualization as a graph or the calculation of comparable, solid, reliable KPIs is necessary and requires knowledge in this area of analysis. The complexity of this area should not be underestimated, as it is not a typical activity of IT teams.
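A typical first step when condensing such a volatile time series into a comparable KPI is to cut off the warm-up phase and average the steady state, as in the following sketch; the 20% warm-up fraction is an arbitrary example value.

```python
import statistics

def steady_state_kpi(throughput_series: list[float], warmup_fraction: float = 0.2) -> float:
    """Drop the warm-up part of a per-second throughput series and return the steady-state mean."""
    start = int(len(throughput_series) * warmup_fraction)
    return statistics.fmean(throughput_series[start:])

# Hypothetical per-second throughput samples (ops/s): ramp-up followed by a stable phase.
series = [1200.0, 3500.0, 7800.0, 9900.0, 10050.0, 9980.0, 10010.0, 10020.0, 9950.0, 10000.0]
print(f"steady-state throughput: {steady_state_kpi(series):.0f} ops/s")
```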

Ongoing Market Change

Both NoSQL databases and Cloud Computing are key technologies that undergo rapid changes in technology capabilities and market participants. This volatile market makes it necessary to constantly keep up with new features and new providers. Version updates during this technology phase can also have an extreme impact on the KPIs of the entire system, and not always in a positive way, as past updates have shown in some cases.

Constant cloud database benchmarking makes sense in order to detect changes in the KPIs and hence minimize risks.

Conclusion - Chapter 4

Cloud Database Benchmarking is not „rocket science“, but a lot of knowledge in various areas is necessary to do it right. It is not something you can do on the side. A professional process has to be created to master the difficulties.

Recap: What are the Difficulties of Cloud Database Benchmarking?

  1. Background knowledge and iterations of Benchmark runs are necessary to approach a good result.
  2. The initial setup is resource-intensive. In most cases, even a continuous (periodic) Database Benchmarking process is required.
  3. The measured data needs to be reliable and resilient. Often it is necessary to use statistical methods and visualizations to create comparable results.

A lot of IT departments have already faced these challenges and tried to establish a usable Cloud Database Benchmarking solution.

Let’s have a look at the available tools and solutions in the next chapter.

What Database Benchmarking Tools and Solutions exist?

As shown in the first chapter, traditional Database Benchmarking has been around in the industry for more than 30 years. During this time, some solutions for classic databases and non-Cloud infrastructures were developed and are available on the market.

On the contrary, Cloud Database Benchmarking is a new and even more challenging field. Despite that, first approaches and adaptations of the traditional Database Benchmarking solutions have already been made. We will have a look at the following 6 tools and solutions:

  1. TPC
  2. YCSB
  3. Github Database Benchmark List
  4. SPEC-RG: Mowgli
  5. Cloud Database Provider Benchmark Results
  6. BaaS-Project: Cloud Database Decision Platform

Transaction Processing Performance Council (TPC)

The first solution, already introduced in the first chapter, is the non-profit IT consortium TPC. In addition to providing benchmarks (= prepared workload patterns, see definition above) for performing Database Benchmarks, the TPC also publishes benchmarking results based on them.

The consortium consists mainly of large server manufacturers such as IBM, DELL, Oracle, … and performance engineering IT consulting companies.

The published results are usually commissioned by server vendors and serve as a selling point for their server database bundles and now partly also for cloud clusters and NoSQL databases. The target metrics of these benchmarks are usually Performance, Price/Performance, and Watts/Performance.

The source code of the benchmarks is partly publicly available and can be downloaded from the website. The same applies to the results, which are available in a ranking table on the website and can also be downloaded as text files.

Results of TPC-C Database Benchmarks

Yahoo! Cloud Serving Benchmark (YCSB)

Originally developed in Yahoo’s research department, the benchmark is now an open-source community project. It can be regarded as THE NoSQL database benchmark.

It has already been used in various research studies and is used by many NoSQL database vendors as a basis for conducting marketing benchmarks.

Various database benchmarking results exist for older versions of MongoDB, Cassandra, Couchbase, Apache HBase, and other NoSQL databases. The „vendor-independent comparison of NoSQL databases“ by Sergey Bushik of Altoros on Network World, which is already almost 10 years old, should be highlighted. This benchmarking effort measured different workloads on different NoSQL databases on the same AWS instance and came to the conclusion that there is no „perfect“ NoSQL database and that each has its advantages and disadvantages, depending on your preferences. Five different workloads were designed in the YCSB benchmark for this comparison.

YCSB provides an important underlying component for a Cloud Database Benchmarking framework. Beyond that, it does not provide a deployable framework or final solution.

Github Database Benchmark List

On Github, there is a list of different DB benchmark results from various sources. The results mostly look at the performance of NoSQL databases in the cloud and non-cloud infrastructures.

Most of these benchmark results were created between 2009 and 2018 and should no longer be considered current. However, they can provide a first orientation for a pre-selection of suitable databases.

Furthermore, they show different approaches to Database Benchmarking and presentation options. This information is certainly worth a glance if not more. 

SPEC-RG: Mowgli

The Standard Performance Evaluation Corporation (SPEC) is a non-profit corporation dedicated to the development and management of hardware and software benchmarks. Part of SPEC is its Research Group (SPEC-RG), which is dedicated to the collaborative collection of research results on quantitative evaluation and analysis techniques.

BaaS member Daniel Seybold developed the cloud database framework „Mowgli“ as part of his PhD thesis and made it available to the general public on the SPEC-RG platform under an Apache 2.0 License.

Mowgli provides a fully functional and stable Cloud Database Benchmarking framework. It is able to benchmark various NoSQL databases such as Apache Cassandra, CockroachDB, or MongoDB on AWS EC2 and OpenStack instances. It supports YCSB and TPC-C as benchmarks.

At the same time, the framework is built in such a way that other NoSQL databases and cloud providers can be integrated. It also provides a rudimentary web interface and REST-APIs for easy integration.

Several benchmarking results have been created and published based on Mowgli. Mowgli is maybe the best basis to start Cloud Database Benchmarking, right now.

The Mowgli Cloud Database Benchmarking Framework is able to perform performance and scalability measurements.
A scalability evaluation of Cassandra and Couchbase measured with the Mowgli Cloud Database Benchmarking Framework.

Cloud Database Provider Benchmark Results

In the last year, a few database providers have started to develop Cloud Database Benchmarking frameworks for their own needs. MongoDB, as the largest NoSQL database at present, and CouchbaseDB should be mentioned at this point. On the one hand, these tools are used to intensively measure new versions and compare them with the results of previous versions. On the other hand, they are used to compare the vendor's database against direct competitors. Based on this, publications for marketing and sales are created.

The publications of benchmarking results from database providers should be taken with a grain of salt. Neither the objectivity nor the reliability of the results is guaranteed, nor is the benchmarking framework publicly available and verifiable. It is very likely that only selected favourable results are published and benchmarks are designed to favour their own strengths.

There are also some Cloud Database Benchmarking whitepapers with benchmarking results from independent IT consulting companies such as Altoros, which create and publish these benchmarks on behalf of the database providers. These results should also not be generalized and need to be considered with a grain of salt, as the purpose of these white papers is also clearly aimed at marketing and sales.

BaaS-Project: Cloud Database Decision Platform

Based on the Mowgli framework presented above, we are developing an on-demand Cloud Database Decision Platform that will allow everyone to autonomously measure all available database and cloud resources against different benchmarks and compare the results. The platform is designed to be used on your own and will provide all necessary pieces of information and a modern user experience for easy and comfortable decision-making.

A continuous Cloud Database Benchmarking process adds a lot of value for enterprise IT teams, highlights potential, and minimizes risks. The currently widespread IT misalignment in the area of databases and Cloud resources can be actively combated and reduced, thus reducing technical problems and Cloud costs.

A first Alpha release, including a more catchy name, is planned for Q2-2021. More information about the project, features, and benefits can be found on this website. 

1. Step: Log in to the on-demand Cloud Database Decision Platform online
2. Step: Select your workload CRUD mix and adjust it to your requirements
3. Step: Select the Cloud and database resources you want to benchmark
4. Step: Check out and let the automated measurement and data preparation process start
5. Step: Get the key results, analyse them online, and make your decision

Conclusion - Chapter 5

Cloud Database Benchmarking is a relatively new niche area in the big IT universe. Existing solutions are not yet applicable to the masses.

Recap: What Database Benchmarking Tools and Solutions exist?

  1. TPC-C and YCSB are standard benchmarks, which can be adjusted and used in Cloud Database Benchmarking.
  2. Reliable benchmarking results and solutions that fit specific needs do not exist yet.
  3. The Cloud Database Decision Platform will help to measure and select the best configuration for specific needs. An Alpha Version will be released in Q2-2021.

In the last chapter, we will recap the whole article and focus on the key insights.

Conclusion & Prospects

Thanks for making it through the entire article. In case you skipped some parts, let’s summarise the core insights of the whole article.

Cloud Database Benchmarking is meaningful!

Database benchmarking has been in use for over 30 years. It enables better decisions and unveils optimization potential. The Cloud has added new complexity, but also a whole new meaning to Database Benchmarking. In addition to the database, the Cloud infrastructure must also be measured in order to obtain meaningful results. 

A continuous Cloud Database Benchmarking process adds a lot of value for enterprise IT teams, highlights potential, and minimizes risks. The currently widespread IT misalignment in the area of databases and Cloud resources can be actively combated and reduced, thus reducing technical problems and Cloud costs.

Cloud Database Benchmarking is challenging!

Database Benchmarking is an open method – from goal setting to implementation and discussing the results. This flexibility introduces many open questions and uncertainties at the beginning. Extensive knowledge and interdisciplinary skills are needed to create a continuous benchmarking process.

The benchmarking objectives can range from Performance and Scalability to Costs and cloud-native KPIs such as Availability and Elasticity. The pre-selection of suitable databases and Cloud resources is already extensive due to the huge number of different providers and database products. Identifying and designing a suitable workload for a use case is a major challenge even for experienced database engineers. The creation of reliable measurement data, its preparation, and its visualization further require data-analytical skills.

However, those who can overcome these challenges have more reliable information for better decisions regarding the database and/or Cloud infrastructure and can create significant benefits for their company.

Cloud Database Benchmarking is not well established, yet!

Currently, there are only a few up-to-date sources of information, partial solutions, and published results in the area of Cloud Database Benchmarking. There is no easy-to-use, publicly accessible software solution for Cloud Database Benchmarking available, nor does a repository with up-to-date KPIs for various DBMS exist.

Anyone who currently has to make a decision on the selection or optimization of a cloud-hosted database can only do so on the basis of marketing information from the manufacturers, or set up a Cloud Database Benchmarking process themselves at great expense. The open-source framework Mowgli can be a good basis for this. It requires training and the integration of the desired database and Cloud resources, but offers a reliable setup.

A modern solution is in sight!

In Q2-2021, our on-demand Cloud Database Decision Platform will become available as an early Alpha release.

The platform will enable professional Cloud Database Benchmarking as a modern, web-based, easily accessible software solution. Thus, fast, independent, and better decisions can be made. Besides the easy selection of database and cloud resources and predefined but customizable workloads, the presentation of results is a core element of the solution.

Based on more than 20 years of research experience and as a further development of the Mowgli framework, the Cloud Database Decision Platform is a contemporary product that will be able to meet the need for better IT decisions. It will enable technically more suitable configurations, more efficient use of resources, and great potential savings in cloud expenses.

Right now we are in the process of designing our platform and defining our product offering. We would be happy if you could support us in this process. You only need to answer a few questions.

Thank you for your time and interest in this exciting and promising field of performance engineering. If you have any questions or suggestions, please comment or contact me.

Innovation is taking two things that exist and putting them together in a new way. Are you ready?
