10GbE Tests with ZeroMQ v4.3.2

Introduction

This page presents performance results of ØMQ 4.3.2 over 10Gb Ethernet. The graphs were produced by running the "libzmq/perf/generate_csv.sh" and "libzmq/perf/generate_graphs.py" scripts, which drive the ZeroMQ benchmarking utilities shipped in the same folder.

Environment

Box 1:

NUMA nodes: 2
Server CPUs: 2x Intel Xeon E5-2680 v4 @ 2.40GHz (14 cores each, HyperThreading enabled: 56 logical CPUs)
NIC: Intel 82599ES 10-Gigabit
Linux/CentOS 7 (kernel 3.10.0-957.27.2.el7.x86_64)
ØMQ version 4.3.2, built with gcc 4.8.5

Box 2:

NUMA nodes: 1
Server CPU: 1x Intel Xeon Gold 6130 @ 2.10GHz (16 cores, HyperThreading enabled: 32 logical CPUs)
NIC: Intel X710 10-Gigabit
Linux/CentOS 7 (kernel 3.10.0-957.27.2.el7.x86_64)
ØMQ version 4.3.2, built with gcc 4.8.5

The two boxes were connected directly by a fiber-optic cable, with no switch in between.

Results for TCP transport

All the tests were run for message sizes of 8, 16, 32, … 65536, 131072 bytes.

Throughput Results

The following graph combines the achieved packets per second (PPS) and the achieved bandwidth obtained with the benchmark utilities while varying the ZeroMQ message size.
The socket types used by the benchmark utility are PUSH and PULL.

pushpull_tcp_thr_results.png

Note for reference that the maximum achievable PPS on a 10Gbps Ethernet link is about 14.88 Mpps: with minimum-size 64-byte frames, each packet occupies 84 bytes on the wire (frame plus preamble and inter-frame gap), and 10^10 bits/s divided by 84*8 bits gives roughly 14.88 million packets per second (see https://kb.juniper.net/InfoCenter/index?page=content&id=KB14737).
Finally, the raw link was also tested with standard benchmarking utilities such as "iperf", which measured a throughput of 9.4 Gbps.
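
For readers who want to reproduce a single throughput data point by hand, here is a minimal sketch of a PUSH/PULL throughput measurement in the same spirit as the perf utilities. It is not the shipped tool: both ends run in one process over loopback, and the endpoint, message size and message count are illustrative placeholders, so its numbers will not match the 10GbE graphs.

    /* Minimal PUSH/PULL throughput sketch -- NOT the libzmq perf tool itself.
     * Both ends run in one process over loopback; endpoint, message size and
     * message count are illustrative placeholders only. */
    #include <zmq.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define ENDPOINT  "tcp://127.0.0.1:5555"   /* placeholder: loopback, not 10GbE */
    #define MSG_SIZE  64
    #define MSG_COUNT 1000000

    static void *sender (void *ctx)
    {
        /* Each thread must create its own socket; only the context is shared. */
        void *push = zmq_socket (ctx, ZMQ_PUSH);
        zmq_connect (push, ENDPOINT);
        char *buf = calloc (1, MSG_SIZE);
        int i;
        for (i = 0; i < MSG_COUNT; i++)
            zmq_send (push, buf, MSG_SIZE, 0);
        free (buf);
        zmq_close (push);
        return NULL;
    }

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        void *pull = zmq_socket (ctx, ZMQ_PULL);
        zmq_bind (pull, ENDPOINT);

        pthread_t th;
        pthread_create (&th, NULL, sender, ctx);

        char buf[MSG_SIZE];
        int i;
        zmq_recv (pull, buf, MSG_SIZE, 0);       /* first message starts the clock */
        struct timespec t0, t1;
        clock_gettime (CLOCK_MONOTONIC, &t0);
        for (i = 1; i < MSG_COUNT; i++)
            zmq_recv (pull, buf, MSG_SIZE, 0);
        clock_gettime (CLOCK_MONOTONIC, &t1);

        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        double mps  = (MSG_COUNT - 1) / secs;    /* messages per second */
        printf ("%.0f msg/s, %.3f Gb/s payload\n", mps, mps * MSG_SIZE * 8.0 / 1e9);

        pthread_join (th, NULL);
        zmq_close (pull);
        zmq_ctx_term (ctx);
        return 0;
    }

A file holding this sketch (the name is arbitrary) builds with "gcc file.c -lzmq -lpthread". The local_thr/remote_thr pair in libzmq/perf does the same measurement with each end in its own process, so the two ends can run on the two boxes.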

Latency Results

The following graph shows the latency obtained with the benchmark utilities while varying the ZeroMQ message size.
The socket types used by the benchmark utility are REQ and REP.

reqrep_tcp_lat_results.png
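
As a rough illustration of what a latency test measures, the sketch below times REQ/REP round trips and reports half the average round-trip time as the one-way latency. Again, this is not the shipped tool: both ends run in one process over loopback, and the endpoint, message size and round-trip count are placeholders.

    /* Minimal REQ/REP round-trip latency sketch -- NOT the libzmq perf tool.
     * Both ends run in one process over loopback; endpoint, message size and
     * round-trip count are illustrative placeholders only. */
    #include <zmq.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define ENDPOINT   "tcp://127.0.0.1:5556"   /* placeholder: loopback, not 10GbE */
    #define MSG_SIZE   64
    #define ROUNDTRIPS 10000

    static void *echo_server (void *ctx)
    {
        void *rep = zmq_socket (ctx, ZMQ_REP);
        zmq_bind (rep, ENDPOINT);
        char buf[MSG_SIZE];
        int i;
        for (i = 0; i < ROUNDTRIPS; i++) {
            zmq_recv (rep, buf, MSG_SIZE, 0);
            zmq_send (rep, buf, MSG_SIZE, 0);    /* echo the request back */
        }
        zmq_close (rep);
        return NULL;
    }

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        pthread_t th;
        pthread_create (&th, NULL, echo_server, ctx);

        void *req = zmq_socket (ctx, ZMQ_REQ);
        zmq_connect (req, ENDPOINT);    /* TCP reconnects until the REP side binds */

        char buf[MSG_SIZE];
        int i;
        memset (buf, 0, MSG_SIZE);
        struct timespec t0, t1;
        clock_gettime (CLOCK_MONOTONIC, &t0);
        for (i = 0; i < ROUNDTRIPS; i++) {
            zmq_send (req, buf, MSG_SIZE, 0);
            zmq_recv (req, buf, MSG_SIZE, 0);
        }
        clock_gettime (CLOCK_MONOTONIC, &t1);

        double usecs = (t1.tv_sec - t0.tv_sec) * 1e6 + (t1.tv_nsec - t0.tv_nsec) / 1e3;
        printf ("average one-way latency: %.2f us\n", usecs / ROUNDTRIPS / 2.0);

        pthread_join (th, NULL);
        zmq_close (req);
        zmq_ctx_term (ctx);
        return 0;
    }

The local_lat/remote_lat utilities in libzmq/perf split the echo side and the timing side into separate processes so they can run on the two boxes.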

Results for INPROC transport

For these results, only box #1 was used, since the INPROC transport moves messages between threads of a single process.

Throughput Results

The following graph combines the achieved PPS and the achieved bandwidth obtained with the benchmark utilities while varying the ZeroMQ message size.
The socket types used by the benchmark utility are PUSH and PULL.

pushpull_inproc_thr_results.png
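
Because both ends of an inproc connection live in the same process, a PUSH/PULL pair over inproc must be created from a single shared ZeroMQ context. The sketch below shows that basic pattern; the endpoint name and message parameters are arbitrary placeholders, and it omits the timing loop used for the graph above.

    /* Minimal PUSH/PULL sketch over the INPROC transport.  Both sockets are
     * created from the same context; endpoint name and message parameters
     * are arbitrary placeholders. */
    #include <zmq.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>

    #define ENDPOINT  "inproc://thr-test"
    #define MSG_SIZE  64
    #define MSG_COUNT 100000

    static void *sender (void *ctx)
    {
        void *push = zmq_socket (ctx, ZMQ_PUSH);
        zmq_connect (push, ENDPOINT);
        char buf[MSG_SIZE];
        int i;
        memset (buf, 0, MSG_SIZE);
        for (i = 0; i < MSG_COUNT; i++)
            zmq_send (push, buf, MSG_SIZE, 0);
        zmq_close (push);
        return NULL;
    }

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        void *pull = zmq_socket (ctx, ZMQ_PULL);
        zmq_bind (pull, ENDPOINT);        /* bind before the sender connects */

        pthread_t th;
        pthread_create (&th, NULL, sender, ctx);

        char buf[MSG_SIZE];
        int i;
        for (i = 0; i < MSG_COUNT; i++)
            zmq_recv (pull, buf, MSG_SIZE, 0);
        printf ("received %d messages over inproc\n", MSG_COUNT);

        pthread_join (th, NULL);
        zmq_close (pull);
        zmq_ctx_term (ctx);
        return 0;
    }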

The following graph shows the performance of a PUB -> ZMQ proxy -> SUB chain, with every hop using the INPROC transport:

pubsubproxy_inproc_thr_results.png
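
A chain like the one above can be assembled with zmq_proxy() sitting between an XSUB frontend and an XPUB backend, all over inproc. The sketch below is a minimal, non-benchmark illustration of that topology only; the endpoint names and the single test message are placeholders, and the one-second sleep is just a crude way to let the subscription propagate before publishing.

    /* Sketch of a PUB -> zmq_proxy() -> SUB chain over inproc.
     * Endpoint names and the single test message are placeholders; this shows
     * the topology only, not the benchmark loop behind the graph above. */
    #include <zmq.h>
    #include <pthread.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    #define FRONTEND "inproc://proxy-front"   /* placeholder endpoint names */
    #define BACKEND  "inproc://proxy-back"

    static void *proxy_thread (void *ctx)
    {
        void *xsub = zmq_socket (ctx, ZMQ_XSUB);   /* faces the publisher   */
        void *xpub = zmq_socket (ctx, ZMQ_XPUB);   /* faces the subscribers */
        zmq_bind (xsub, FRONTEND);
        zmq_bind (xpub, BACKEND);
        zmq_proxy (xsub, xpub, NULL);     /* blocks until the context terminates */
        zmq_close (xsub);
        zmq_close (xpub);
        return NULL;
    }

    int main (void)
    {
        void *ctx = zmq_ctx_new ();
        pthread_t th;
        pthread_create (&th, NULL, proxy_thread, ctx);

        /* inproc connect-before-bind is handled by libzmq >= 4.2, so no
           explicit synchronisation with the proxy thread is needed here. */
        void *pub = zmq_socket (ctx, ZMQ_PUB);
        zmq_connect (pub, FRONTEND);
        void *sub = zmq_socket (ctx, ZMQ_SUB);
        zmq_connect (sub, BACKEND);
        zmq_setsockopt (sub, ZMQ_SUBSCRIBE, "", 0);   /* subscribe to everything */

        sleep (1);   /* crude wait for the subscription to reach the publisher */

        const char *text = "hello through the proxy";
        zmq_send (pub, text, strlen (text), 0);

        char buf[256];
        int n = zmq_recv (sub, buf, sizeof buf - 1, 0);
        if (n >= 0 && n < (int) sizeof buf) {
            buf[n] = '\0';
            printf ("subscriber got: %s\n", buf);
        }

        zmq_close (pub);
        zmq_close (sub);
        zmq_ctx_term (ctx);        /* interrupts zmq_proxy() with ETERM */
        pthread_join (th, NULL);
        return 0;
    }

XSUB/XPUB sockets (rather than plain SUB/PUB) are used on the proxy so that the subscriptions made by the subscribers are forwarded upstream to the publisher.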