Introducing flood: a load testing tool for benchmarking EVM nodes

Jun 06, 2023 | Storm Slivkoff, Georgios Konstantopoulos

Introduction

Load testing is a critical step in the development of resilient, high-performing data systems. Nevertheless, load testing has not been widely applied in the development of cryptocurrency infrastructure. We're thrilled to bridge this gap with the introduction of flood, a benchmarking tool specifically designed for performance analysis of RPC endpoints.

We initially built flood as a tool to optimize Reth and understand its latency and throughput tradeoffs under various loads. However, we believe flood has significant utility beyond Reth for optimizing the performance of many types of crypto infrastructure.

We are excited to release flood as free, open-source software under the Apache/MIT license. Its code and installation instructions can be found in the GitHub repository. flood can be used either from the command line or as a Python library. There is also a Docker image of flood for easy integration into CI/CD and other types of pipelines.

Let’s dive in.

What is load testing and why does it matter?

Load testing refers to measuring how the performance characteristics of a system are affected by different types of workloads. The key insight behind this approach is that performance metrics such as throughput, latency, and error rate typically degrade when a system is placed under increasing amounts of load. Therefore, observing a system under different controlled loads can reveal insights into the bottlenecks, failure modes, and ultimate performance capacity of the system.
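To make these metrics concrete, here is a minimal sketch of how throughput, error rate, and latency percentiles could be computed from the raw results of a single test run. The function name and the input shape (a list of successful-request latencies plus an error count) are assumptions made for illustration and are not flood's own data model.

```python
import math


def summarize_run(latencies_ms, n_errors, duration_s):
    """Summarize one load test run (hypothetical helper, not part of flood).

    latencies_ms: latencies of the successful requests, in milliseconds
    n_errors:     number of requests that failed
    duration_s:   wall-clock duration of the run, in seconds
    """
    ordered = sorted(latencies_ms)
    n_total = len(ordered) + n_errors

    def percentile(p):
        # Nearest-rank percentile: the smallest value with at least p% of samples at or below it.
        if not ordered:
            return None
        rank = max(1, math.ceil(p * len(ordered) / 100))
        return ordered[rank - 1]

    return {
        "throughput_rps": len(ordered) / duration_s,  # successful responses per second
        "error_rate": n_errors / n_total if n_total else 0.0,
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
    }


# Example: 100 successful requests with latencies 1..100 ms, 25 errors, over 10 seconds.
summary = summarize_run(list(range(1, 101)), n_errors=25, duration_s=10)
```

Repeating this summary at several load levels and comparing the results is what turns a single measurement into a load test.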

The information obtained by load testing can be leveraged in numerous ways. When a system is under active development, load testing highlights whatever system bottlenecks are most in need of improvement. When comparing two systems, load testing can reveal which system is more performant or reliable. As a special case of this, load testing can compare two different hardware or software configurations of a single system. In each case, load testing enables the development of highly optimized systems.

How to load test blockchain nodes?

Our focus is on RPC, which is the communication protocol typically used for extracting data from blockchain nodes.

Currently, the most common approach for measuring RPC performance is not load testing, but latency testing: you send an RPC node a request, and measure how long it takes to obtain a response. Latency testing for a variety of RPC providers can be found on various websites. Unfortunately, this type of testing offers a limited view of node performance because it reveals almost nothing about how the system behaves under load (see our post on measuring latency and throughput for details).
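As a rough illustration of what a latency test does, the sketch below builds a JSON-RPC request and times a single round trip. The `send` argument is a stand-in for whatever performs the HTTP POST, and `fake_send` is a hypothetical offline replacement so the example runs without a node. None of these helper names come from flood.

```python
import json
import time


def make_request(method, params, request_id=1):
    """Build the body of a JSON-RPC 2.0 request."""
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    )


def measure_latency(send, body, clock=time.perf_counter):
    """Time one request. `send` is any callable mapping a request body to response text."""
    start = clock()
    response_text = send(body)
    elapsed_s = clock() - start
    return elapsed_s, json.loads(response_text)


def fake_send(body):
    # Hypothetical stand-in for an HTTP POST to an RPC endpoint, so the sketch runs offline.
    request = json.loads(body)
    return json.dumps({"jsonrpc": "2.0", "id": request["id"], "result": "0x10"})


latency_s, response = measure_latency(fake_send, make_request("eth_blockNumber", []))
```

A single timing like this says how fast one request is on an idle system, which is exactly why it says so little about behavior under load.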

In the context of blockchains, workloads can vary in two important ways. The classic variable is size: a load of 10,000 requests per second will put more stress on a system than a load of 100 requests per second. The other load variable is the RPC method. There is a different RPC method for each type of data that you pull from a blockchain node, such as blocks, transactions, logs, or traces. Each RPC method puts a different type of load on the system. Some RPC methods are bound by storage IO whereas others are bound by CPU.

What is flood?

With these principles in mind, we developed a load testing tool called flood. flood offers an unprecedented view into the performance characteristics of an RPC endpoint by 1) embracing load testing instead of latency testing and 2) expanding test coverage to all relevant RPC methods.

flood consists of three basic components:

Call generation engine: flood generates large parameterized sets of RPC calls, randomly sampled with distributions that resemble different types of blockchain workloads. flood leverages Paradigm Data Portal datasets to ensure full coverage of blockchain history.
Load testing engine: flood then orchestrates Vegeta (a high performance load testing tool written by @tsenart in Go) to use these calls for load tests against RPC endpoints.
Reporting engine: After performing tests, flood summarizes the results with various charts, tables, and reports. These summaries are easy to integrate into scripts and data pipelines.

Each of these components is highly configurable, enabling flood to cover a wide range of test scenarios and environments.
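The call generation step can be pictured with a small sketch like the one below, which samples block numbers at random and wraps them in JSON-RPC calls for two methods. This is only an illustration: the function name, the uniform sampling, and the fixed log range are assumptions, whereas flood itself draws on Paradigm Data Portal datasets and workload-shaped distributions.

```python
import random


def generate_calls(method, n_calls, latest_block, seed=0, log_range=100):
    """Generate parameterized JSON-RPC calls (hypothetical sketch, not flood's generator)."""
    rng = random.Random(seed)  # seeded so the same call set can be replayed against every endpoint
    calls = []
    for call_id in range(n_calls):
        block = rng.randint(0, latest_block)  # uniform sampling is an assumption
        if method == "eth_getBlockByNumber":
            # Second parameter False: return transaction hashes rather than full transactions.
            params = [hex(block), False]
        elif method == "eth_getLogs":
            end = min(block + log_range - 1, latest_block)
            params = [{"fromBlock": hex(block), "toBlock": hex(end)}]
        else:
            raise ValueError(f"unsupported method: {method}")
        calls.append({"jsonrpc": "2.0", "id": call_id, "method": method, "params": params})
    return calls


block_calls = generate_calls("eth_getBlockByNumber", n_calls=5, latest_block=17_000_000)
log_calls = generate_calls("eth_getLogs", n_calls=5, latest_block=17_000_000)
```

Seeding the generator matters when comparing endpoints, since each one should receive an identical set of calls.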

What can flood do?

During the typical operation of flood, a user specifies the RPC methods they would like to test along with a list of RPC endpoints. For example, you might want to test the performance of eth_getLogs for two versions of Reth. flood will then run different controlled loads against those RPC endpoints. For example, it might run eth_getLogs at 1,000, 2,000, 4,000, and 8,000 requests per second. flood will then display tables and charts that summarize how the performance metrics vary as a function of load. The output will look something like this:

See an example of a real flood report here.

The particular way in which performance metrics degrade under load offers a rich set of insights into the bottlenecks and ultimate performance capacity of a system. For details on interpreting and leveraging load testing data, we recommend Chapter 15 of Cesarini's Designing for Scalability with Erlang/OTP.
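To give a feel for how metrics can degrade as load approaches capacity, the sketch below sweeps the doubling rates from the example above through a toy queueing model. It uses the textbook M/M/1 result that mean time in the system is 1 / (capacity - rate) when the arrival rate is below capacity. The capacity of 5,000 requests per second is an invented number, and the table it prints is a model output, not a measurement from flood or from any real node.

```python
def doubling_rates(start, n):
    """Rates that double at each step, e.g. 1000, 2000, 4000, 8000."""
    return [start * 2**i for i in range(n)]


def mm1_mean_latency(rate, capacity):
    """Mean time in system (seconds) for an M/M/1 queue, or None once saturated."""
    if rate >= capacity:
        return None  # the queue grows without bound; no steady-state latency exists
    return 1.0 / (capacity - rate)


def sweep(rates, capacity):
    """Evaluate the toy model at each rate."""
    return [
        {
            "rate": rate,
            "throughput": min(rate, capacity),  # cannot serve more than capacity
            "mean_latency_s": mm1_mean_latency(rate, capacity),
        }
        for rate in rates
    ]


def format_table(rows):
    """Render the sweep as a plain-text table."""
    lines = [f"{'rate (rps)':>12} {'throughput':>12} {'mean latency':>14}"]
    for row in rows:
        latency = row["mean_latency_s"]
        shown = "saturated" if latency is None else f"{latency * 1000:.3f} ms"
        lines.append(f"{row['rate']:>12} {row['throughput']:>12} {shown:>14}")
    return "\n".join(lines)


rows = sweep(doubling_rates(1000, 4), capacity=5000)  # capacity is a made-up value
print(format_table(rows))
```

In this model, latency rises slowly at first and then sharply as the rate nears capacity, and throughput flattens once the rate exceeds it. Real systems are messier, but the same qualitative shape is what a load test is looking for.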

Beyond this simple mode of operation, flood also provides advanced features to accommodate various types of power users:

flood can use different load testing schedules, including: “stress testing” (gradually increasing load over time), “spike testing” (a large sudden load followed by small load), and “soak testing” (running a load for a long period of time).
flood can orchestrate load tests to run in a local mode on each RPC node to eliminate noise caused by network bottlenecks.
flood has an “equality” testing mode that checks whether each RPC endpoint is returning identical responses.
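The three schedules named above differ only in how the request rate changes over time. One way to sketch that is to represent a schedule as a list of (rate, duration) stages. The function names and this representation are assumptions for illustration and do not reflect how flood configures its tests.

```python
def stress_schedule(start_rate, end_rate, n_steps, step_duration_s):
    """Stress test: load increases gradually in evenly spaced steps."""
    if n_steps < 2:
        raise ValueError("a stress schedule needs at least two steps")
    step = (end_rate - start_rate) / (n_steps - 1)
    return [(round(start_rate + i * step), step_duration_s) for i in range(n_steps)]


def spike_schedule(spike_rate, spike_duration_s, base_rate, base_duration_s):
    """Spike test: a large sudden load followed by a small load."""
    return [(spike_rate, spike_duration_s), (base_rate, base_duration_s)]


def soak_schedule(rate, duration_s):
    """Soak test: one steady load held for a long period."""
    return [(rate, duration_s)]


def total_requests(schedule):
    """Number of requests a schedule sends, given rates in requests per second."""
    return sum(rate * duration_s for rate, duration_s in schedule)


stress = stress_schedule(100, 500, n_steps=5, step_duration_s=30)
spike = spike_schedule(10_000, 5, base_rate=100, base_duration_s=60)
soak = soak_schedule(200, duration_s=3600)
```

Expressing all three as the same kind of stage list means a single runner could execute any of them, with only the schedule changing between test types.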

Why did you build flood?

At Paradigm we are developing a new node implementation called Reth, with performance as one of its primary objectives. We developed flood in order to characterize Reth’s performance in detail. We have already used flood to uncover numerous Reth performance bottlenecks that appear under various workloads and system configurations, and those bottlenecks have since been fixed. With flood we have created a tight feedback loop in which the Reth developers have high visibility into how any codebase change translates into end-to-end system performance.

Beyond Reth, we believe flood will be able to help resolve many unanswered questions related to RPC nodes in general:

Which hardware specs matter most when running a node? What is the relative importance of storage IO vs RAM speed vs RAM capacity vs CPU speed? Is RAID worth it?
What is the effective rate limit of each RPC method for each 3rd party RPC provider?
Which node client offers the best performance for different types of workloads?

Conclusion

In this post we introduced flood, a load testing tool that offers an unprecedented view into the performance characteristics of blockchain nodes. Although we originally built flood to optimize the development of Reth, we believe it will be a huge unlock for the development of other types of high performance crypto infrastructure. We look forward to seeing how others might use flood to build their own performant and reliable systems.

If you are interested in using flood, or want to contribute to flood, please check the Issue Tracker on GitHub, or reach out to storm@paradigm.xyz or georgios@paradigm.xyz.

Thanks to Achal Srinivasan for the graphics.

Written by:

Storm Slivkoff

Storm is a data associate at Paradigm. He uses data science and data engineering to analyze crypto systems and build crypto data infrastructure. Storm is passionate about open source software [→]

Georgios Konstantopoulos

Georgios Konstantopoulos is the Chief Technology Officer and a Research Partner focused on Paradigm’s portfolio companies and research into open-source protocols. Previously, Georgios was an independent consultant and researcher focused [→]

Disclaimer: This post is for general information purposes only. It does not constitute investment advice or a recommendation or solicitation to buy or sell any investment and should not be used in the evaluation of the merits of making any investment decision. It should not be relied upon for accounting, legal or tax advice or investment recommendations. This post reflects the current opinions of the authors and is not made on behalf of Paradigm or its affiliates and does not necessarily reflect the opinions of Paradigm, its affiliates or individuals associated with Paradigm. The opinions reflected herein are subject to change without being updated.