Caution: This document refers to an old version of ØMQ. From version 0.3.1 onwards the C extension is an integral part of ØMQ, so it no longer has to be downloaded and built separately. Also note that the C API has changed since.
Introduction
This whitepaper describes the first version of the C extension for ØMQ. It is a simplified version of the ØMQ interface. The C extension is not yet part of the ØMQ package. You have to download it separately (see below) and build it by hand. Any feedback on the C extension is welcome on the ØMQ developers' mailing list.
Download
Download the C extension for ØMQ here.
Building it
Download and build the ØMQ package:
$ tar -xzf zmq-0.3.tar.gz
$ cd zmq-0.3
$ ./configure
$ make
$ sudo make install
Unpack and build the C extension:
$ tar -xzf czmq.tar.gz
$ cd czmq
$ g++ -c -fPIC czmq.cpp
$ g++ -shared -pthread -o libczmq.so czmq.o libzmq.so
Build the test programs:
$ gcc -o local_lat local_lat.c libczmq.so
$ gcc -o remote_lat remote_lat.c libczmq.so
$ gcc -o local_thr local_thr.c libczmq.so
$ gcc -o remote_thr remote_thr.c libczmq.so
Using it
The C extension's API is currently much simpler than the original C++ API. The difference is that the C extension doesn't allow full control of ØMQ threading the way C++ does. Instead, the C extension creates a single I/O thread that can be accessed from a single application thread. This doesn't allow for seamless scaling on multicore boxes. However, it is our intent to expose the full ØMQ API via C in the future.
To instantiate ØMQ:
void *handle;
handle = czmq_create (host);
Here host is the name or IP address of the box where zmq_server is running. The returned handle is used in all the other functions to identify this particular instance of ØMQ.
To create the wiring, use the czmq_create_exchange, czmq_create_queue and czmq_bind functions. For a detailed description of how the wiring mechanism works, have a look here.
int eid;
eid = czmq_create_exchange (handle, "E", CZMQ_SCOPE_GLOBAL, "10.0.0.1:5555");
czmq_create_queue (handle, "Q", CZMQ_SCOPE_GLOBAL, "10.0.0.1:5556");
czmq_bind (handle, "E", "Q");
To send a message, you have to supply a buffer, its size and the function to be used to deallocate the buffer once it is no longer needed:
void *buf;
buf = malloc (10);
memset (buf, 0, 10);
czmq_send (handle, eid, buf, 10, free);
Receiving a message hands you a buffer, its size and the function you should use to deallocate the buffer:
void *buf;
size_t size;
czmq_free_fn *ffn;
czmq_receive (handle, &buf, &size, &ffn);
if (ffn)
    ffn (buf);
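The buffer-plus-deallocator convention used by czmq_send and czmq_receive can be illustrated without the library. The following is a minimal sketch, not czmq code: mailbox_t, mailbox_send, mailbox_receive and free_fn are hypothetical names invented for this example, and the one-slot mailbox merely stands in for a queue. It shows how ownership of a buffer travels together with the function that knows how to release it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical callback type for this sketch; not the real czmq typedef. */
typedef void (free_fn) (void *);

/* Hypothetical one-slot mailbox standing in for a queue. */
typedef struct {
    int full;       /* 1 if a message is waiting, 0 otherwise */
    void *buf;      /* message body, owned by the mailbox while stored */
    size_t size;    /* size of the message body in bytes */
    free_fn *ffn;   /* how to release buf; NULL means "do not release" */
} mailbox_t;

/* Sender hands over buf together with the function that deallocates it. */
void mailbox_send (mailbox_t *mb, void *buf, size_t size, free_fn *ffn)
{
    assert (mb != NULL);
    /* An undelivered message is dropped using its own deallocator. */
    if (mb->full && mb->ffn)
        mb->ffn (mb->buf);
    mb->full = 1;
    mb->buf = buf;
    mb->size = size;
    mb->ffn = ffn;
}

/* Receiver takes over buf and the matching deallocator.
   Returns 0 on success, -1 if no message is waiting. */
int mailbox_receive (mailbox_t *mb, void **buf, size_t *size, free_fn **ffn)
{
    assert (mb != NULL && buf != NULL && size != NULL && ffn != NULL);
    if (!mb->full)
        return -1;
    *buf = mb->buf;
    *size = mb->size;
    *ffn = mb->ffn;
    mb->full = 0;
    mb->buf = NULL;
    mb->size = 0;
    mb->ffn = NULL;
    return 0;
}
```

Passing the deallocator along with the buffer means the receiver never has to guess how the memory was allocated: a buffer obtained with malloc is sent with free, while a static buffer can be sent with a NULL deallocator, which is why the receiving side checks ffn before calling it.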
To shut down the ØMQ infrastructure, use the following function:
czmq_destroy (handle);
Test results
Tests were performed on two quad-core boxes (Intel Xeon CPU, E5440, 2.83 GHz) connected via a direct 1 Gb Ethernet link (Intel PRO/1000, PCI Express:2.5GB/s:Width x4). The operating system used was Debian Linux 4.0 (kernel version 2.6.24.7, CONFIG_PREEMPT_VOLUNTARY=y, CONFIG_PREEMPT_BKL=y, CONFIG_HZ=1000).
Latency
End-to-end latency - as measured by local_lat and remote_lat - is only slightly higher than with the C++ API:
Message size | C++ | C |
---|---|---|
1 B | 32.7 us | 35.43 us |
16 B | 34.54 us | 36.26 us |
256 B | 42.21 us | 44.04 us |
4096 B | 85.63 us | 88.64 us |
65536 B | 612.99 us | 651.83 us |
The same data can be seen on the following graph (black line is C++, red line is C):
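To put "only slightly higher" into numbers, the relative overhead of the C extension can be computed from any row of the latency table. The helper below is a small sketch for that arithmetic; overhead_percent is a name made up for this example and is not part of ØMQ.

```c
#include <assert.h>

/* Relative latency overhead of the C extension over C++, in percent.
   Both arguments are latencies in microseconds taken from the table. */
double overhead_percent (double cpp_us, double c_us)
{
    assert (cpp_us > 0.0);
    return (c_us - cpp_us) / cpp_us * 100.0;
}
```

For the 1 B row (32.7 us vs. 35.43 us) this gives roughly 8.3%, and for the 4096 B row (85.63 us vs. 88.64 us) roughly 3.5%.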
Throughput
The C extension is somewhat less efficient than raw C++ code - possibly due to the omission of the "VSM" optimisation from the C extension. Until the network limit (1 Gb/sec) is reached, the C throughput is between roughly one half and two thirds of the C++ throughput. However, once the messages are large enough to exhaust the network (~256 bytes), the throughputs of C++ and C are practically identical.
Message size | C++ | C |
---|---|---|
1 B | 2,435,820 msgs/sec | 1,624,139 msgs/sec |
16 B | 2,976,623 msgs/sec | 1,428,036 msgs/sec |
256 B | 447,126 msgs/sec | 447,534 msgs/sec |
4096 B | 28,896 msgs/sec | 28,893 msgs/sec |
65536 B | 1,810 msgs/sec | 1,810 msgs/sec |
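The point at which the network becomes the bottleneck can be checked by converting a table row into payload bandwidth. The helper below is a sketch of that conversion; payload_mbps is a name invented for this example, and it counts message bodies only, ignoring protocol and framing overhead, so the real load on the wire is somewhat higher.

```c
#include <assert.h>

/* Payload bandwidth in megabits per second (1 Mb = 10^6 bits).
   Counts message bodies only; headers and framing are not included. */
double payload_mbps (double msgs_per_sec, double msg_size_bytes)
{
    assert (msgs_per_sec >= 0.0 && msg_size_bytes >= 0.0);
    return msgs_per_sec * msg_size_bytes * 8.0 / 1e6;
}
```

For the 256 B row, 447,126 msgs/sec corresponds to about 916 Mb/s of payload, which is close to the 1 Gb/sec link capacity. For the 16 B C++ row, 2,976,623 msgs/sec is only about 381 Mb/s, so at that size the limit is per-message processing cost rather than the network.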
Conclusion
Although the C extension doesn't provide full ØMQ functionality at the moment, the performance figures are quite convincing. For small messages, latency is only a few microseconds above C++ latency (~35 us vs. ~33 us for 1-byte messages), and throughput, although lower than in C++, would still allow for decent handling of the OPRA feed.