ØMQ Blog

To Trie or not to Trie - 18 May 2010 06:30 - by martin_sustrik - Comments: 0

Have you ever thought about what exactly happens when you subscribe to a topic using a SUB socket? The messages have to be checked, and those not matching the subscription(s) have to be dropped.

Currently, matching is done using an N-ary search tree, a structure known as a trie.

Searching a trie is very fast. The search time is independent of the overall number of topics or subscriptions; it depends linearly only on the length of the topic string in the message being matched. However, with a large number of subscriptions the trie can consume more memory than alternative search structures such as a hash table. Larger memory usage means that memory cache misses happen more often and slow the algorithm down.
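To make the linear-time claim concrete, here is a minimal, illustrative byte-trie in C. This is a sketch of the general idea only, not ØMQ's actual implementation; the names (trie_node, trie_add, trie_match) are invented for this example.

```c
#include <stdlib.h>

/*  A minimal, illustrative byte-trie. Each node holds 256 child pointers,
    one per possible byte value, which is exactly where the memory
    overhead mentioned above comes from. Error handling and freeing are
    omitted for brevity.  */
typedef struct trie_node {
    int terminal;                  /*  a subscription ends at this node  */
    struct trie_node *next [256];  /*  one child per byte value  */
} trie_node;

trie_node *trie_new (void)
{
    return calloc (1, sizeof (trie_node));
}

/*  Store a subscription (a topic prefix) in the trie.  */
void trie_add (trie_node *root, const unsigned char *prefix, size_t size)
{
    trie_node *node = root;
    for (size_t i = 0; i != size; i++) {
        if (!node->next [prefix [i]])
            node->next [prefix [i]] = trie_new ();
        node = node->next [prefix [i]];
    }
    node->terminal = 1;
}

/*  Return 1 if any stored prefix matches the message. The loop touches
    at most one node per byte of the topic, so matching time is linear in
    the topic length and independent of the number of subscriptions.  */
int trie_match (trie_node *root, const unsigned char *data, size_t size)
{
    trie_node *node = root;
    for (size_t i = 0; i != size; i++) {
        if (node->terminal)
            return 1;
        node = node->next [data [i]];
        if (!node)
            return 0;
    }
    return node->terminal;
}
```

Note how subscribing to "ABC" makes any message starting with those three bytes match, no matter how many other subscriptions exist.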

Bhavin Turakhia did some research on trie optimisation and blogged about it.

Read more....

ØMQ/2.0.6 on OpenVMS - 12 May 2010 13:02 - by martin_sustrik - Comments: 3

BC&JA are pleased to announce that a binary version of 0MQ V2.0.6 has been released for Alpha and Integrity, OpenVMS 8.3 and higher…

Read more....

Zero-copy and Multi-part Messages - 08 May 2010 09:58 - by martin_sustrik - Comments: 17

In high-performance networking, copying data is considered harmful to performance and is avoided as much as possible. The technique of avoiding all copies is known as "zero-copy".

This article demonstrates the impact of a single copy of the data on latency. It shows, for example, that for 256MB of data, a single copy can increase latency by 0.1 seconds!

Obviously, data are copied from memory to the network interface card and vice versa, they are copied at the user-space/kernel-space boundary, etc. This article in Linux Journal gives a detailed explanation of what's going on under the hood of the operating system and of the ways to get as close to zero-copy as possible.

However, in this blog post we are going to discuss only a single instance of copying, namely copying user data into ØMQ messages.

Consider the following example. We'll create a message a million bytes long and copy the user data into it before sending:

zmq_msg_t msg;
zmq_msg_init_size (&msg, 1000000);
memcpy (zmq_msg_data (&msg), buffer, 1000000);
zmq_send (s, &msg, 0);

The memcpy part looks suspicious. We have the data in the buffer already, so why not send the buffer itself instead of copying it to the message? Is ØMQ capable of such a thing?

Actually, yes. It is and it always has been. All we have to do is define a deallocation function for the buffer and pass it to ØMQ along with the buffer:

void my_free (void *data, void *hint)
{
    //  We've allocated the buffer using malloc and
    //  at this point we deallocate it using free.
    free (data);
}

Once the deallocation function is defined, we can create a "zero-copy" message, passing it the buffer and the deallocation function:

zmq_msg_t msg;
void *hint = NULL;
zmq_msg_init_data (&msg, buffer, 1000000, my_free, hint);
zmq_send (s, &msg, 0);

Note that the buffer is now owned by the message. It will be deallocated once the message is sent. We must not deallocate the buffer ourselves!

Also note the hint parameter. It can be used if a more complex allocation mechanism is involved. Say we allocated the chunk using some "allocator" object and have to deallocate it via the same object. In such a case we can pass the pointer to the allocator as a hint to zmq_msg_init_data and modify the deallocation function as follows:

void my_free (void *data, void *hint)
{
    ((allocator_t*) hint)->free (data);
}

We've got rid of the copying, right?

Well, not entirely. In some cases the above may work. In other cases it is insufficient.

Consider the case when we have two large matrices — each 100MB long — which we want to transfer. Unfortunately, they are not contiguous in memory. Each was allocated by a separate malloc invocation, and thus we cannot describe both using a single data pointer.

Why not send them as two separate messages then? Consider, say, a REQ socket. It load-balances messages. In other words, if there are two REP sockets connected to it, sending two messages would result in the first matrix being dispatched to one REP socket and the second to the other. This is not what we want. We want the two matrices to form an atomic unit of transfer; they should never be split apart.

It seems that in this case we need something equivalent to POSIX gather arrays. For those unfamiliar with the Berkeley socket API, a gather array is an array of data chunks that is sent to the networking stack using a single call.
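As a rough illustration of the concept (this is plain POSIX, not ØMQ code), scatter/gather I/O lets you hand several non-contiguous buffers to the kernel in one call. The helper name gather_send is invented for this example; the underlying call is the standard writev:

```c
#include <sys/uio.h>
#include <unistd.h>

/*  A sketch of a POSIX "gather" send: two non-contiguous buffers are
    handed to the kernel in a single writev() call, with no intermediate
    copy into a combined buffer.  */
ssize_t gather_send (int fd, const void *chunk1, size_t size1,
    const void *chunk2, size_t size2)
{
    struct iovec iov [2];
    iov [0].iov_base = (void*) chunk1;   /*  iov_base is non-const  */
    iov [0].iov_len = size1;
    iov [1].iov_base = (void*) chunk2;
    iov [1].iov_len = size2;

    /*  The kernel gathers both chunks; on success the return value is
        the total number of bytes written.  */
    return writev (fd, iov, 2);
}
```

The receiving peer sees one contiguous byte stream, which is why gather arrays alone do not preserve chunk boundaries the way ØMQ message parts do.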

But would that account for all possible scenarios?

There's still a scenario where it won't help: namely, when the two matrices don't exist at the same time. The first one is created, sent and deallocated, then the second one. In such a case the gather array would be of no use. There's no single point in time when we own all the data and are thus able to fill in the gather array.

The new feature in ØMQ called "multi-part message" solves the problem. To put it simply, it allows you to concatenate multiple messages into a single message:

zmq_msg_t msg1;
zmq_msg_init_data (&msg1, matrix1, matrix1_size, my_free, NULL);
zmq_send (s, &msg1, ZMQ_SNDMORE);
zmq_msg_t msg2;
zmq_msg_init_data (&msg2, matrix2, matrix2_size, my_free, NULL);
zmq_send (s, &msg2, 0);

It looks almost exactly as if you were sending two separate messages, except for passing the ZMQ_SNDMORE flag to the first send. The flag says: "Hold on! There's more data to be added to this message!"

The important point to note is that although all parts of the message are treated as a single atomic unit of transfer, the boundaries between message parts are strictly preserved. In other words, if you send a message consisting of two message parts, each 100 bytes long, on the other side you'll never receive a single message part 200 bytes long. Or two message parts, 50 and 150 bytes long. Or even four message parts, each 50 bytes long. You'll get exactly what you've sent — two message parts, each 100 bytes long in the same order as they were sent.

This fact allows you to use multi-part messages to add coarse-grained structure to your messages. The example with the two matrices illustrates the point. You send the two matrices as two message parts and thus avoid the copy. At the same time the matrices are cleanly separated, each residing in its own message part, and you are guaranteed that the separation will be preserved on the receiving side. Consequently, you don't have to put matrix sizes into the message or invent any kind of "matrix delimiter".

Another interesting use of multi-part messages is to combine them with PUB/SUB sockets. The publish/subscribe messaging pattern allows for subscribing to a particular subset of messages. A subscription is a chunk of data supplied by the receiver, saying "please send me all the messages beginning with these data":

zmq_setsockopt (s, ZMQ_SUBSCRIBE, "ABC", 3);

Obviously, the sender has to place the appropriate data at the beginning of the message for it to be delivered to the specific subscriber:

zmq_msg_t msg;
zmq_msg_init_size (&msg, 6);
memcpy (zmq_msg_data (&msg), "ABCxyz", 6);

The part of the message that is checked against the subscriptions is called the topic. In our case the topic is "ABC".

When the topic is of variable length you need a delimiter to separate it from the rest of the message, so that the subscription mechanism doesn't accidentally consider the beginning of the data to be a continuation of the topic. The following example uses the "pipe" symbol as a delimiter:

zmq_msg_t msg;
zmq_msg_init_size (&msg, 7);
memcpy (zmq_msg_data (&msg), "ABC|xyz", 7);

While this works, it's a bit ugly. Even more importantly, if the topic happens to be binary data, there may be no spare symbol we can use as the delimiter.
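On the receiving side the delimiter then has to be located and the message split. A minimal sketch (the helper split_topic is invented for illustration):

```c
#include <string.h>
#include <sys/types.h>

/*  Split a delimited message: everything before the first '|' is the
    topic. Returns the topic length, or -1 if no delimiter is present.  */
ssize_t split_topic (const unsigned char *data, size_t size)
{
    const unsigned char *delim = memchr (data, '|', size);
    if (!delim)
        return -1;
    return delim - data;
}
```

Note that if the topic itself can contain the delimiter byte (say, a binary topic "A|B"), the message splits at the wrong place; this is exactly the problem with in-band delimiters.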

An elegant solution is to use a two-part message. Subscriptions are always matched only against the first message part, so we can place the topic into the first message part and the rest of the data into the second one (or even into several subsequent message parts):

zmq_msg_t topic;
zmq_msg_init_size (&topic, 3);
memcpy (zmq_msg_data (&topic), "ABC", 3);
zmq_send (s, &topic, ZMQ_SNDMORE);
zmq_msg_t value;
zmq_msg_init_size (&value, 3);
memcpy (zmq_msg_data (&value), "xyz", 3);
zmq_send (s, &value, 0);

One final remark. When receiving a message you may know that each message consists of two parts, say "topic" and "value". In other scenarios, however, you may have no idea how many parts the message consists of. In such a case ØMQ allows you to ask the socket whether there are more message parts to be received. This is done using the ZMQ_RCVMORE socket option:

zmq_msg_t msg;
zmq_msg_init (&msg);
zmq_recv (s, &msg, 0);
int64_t more;
size_t more_size = sizeof (more);
zmq_getsockopt (s, ZMQ_RCVMORE, &more, &more_size);
if (more) ...

The Long and Winding Road Behind - 07 Apr 2010 10:23 - by martin_sustrik - Comments: 2

The following is a point-by-point account of what we have achieved so far during the development of the ØMQ project. The most interesting aspect is how the project gradually evolved from "just another messaging solution" to "layer of the Internet stack".

Getting the messaging stack into order

Messaging systems have their origin in the mid-80’s corporate environment. At that time the existence of a standard and fully functional network stack wasn’t assumed. The design of corporate messaging took this fact into account and implemented almost all of the functionality that is available in the standard Internet stack today.

The relative isolation of corporate environments, combined with the long life cycles of enterprise software, resulted in a legacy that persists to this day. Functionality remains on the messaging level, virtually duplicating almost every feature of the Internet stack. Moreover, backward compatibility issues with legacy enterprise applications require any new messaging system to duplicate the design of the system being replaced.

In our design we took the completely opposite approach. Instead of duplicating a traditional heavyweight messaging system, we decoupled individual features from the messaging layer and delegated them to the appropriate layer of the Internet stack, leaving ØMQ to implement only the core messaging functionality — the so-called "messaging patterns".

For example: Many legacy systems provide their own implementation of multiplexed virtual circuits. We delegate multiplexing to the IP layer, where it belongs. This approach eliminates the feared "head of line blocking" problem and it is network-friendly, meaning that existing networking components (routers, switches) can transparently apply sensible congestion control, traffic shaping, and so on.

A different example: We deliberately refrain from structured content. ØMQ messages are opaque binary data, in contrast to the rich system of data types provided by legacy messaging systems such as CORBA. Serialisation and deserialisation of data can easily be implemented at the application layer using many existing libraries.

Yet another example: Flow control is delegated to the TCP layer, where it belongs. In this way we avoid broken models of flow control at the messaging layer as well as duplicate transmit and receive windows.

Fixing broken functionality

Delegating functionality to its natural place in the Internet stack led us to compare the algorithms used by messaging systems with the algorithms used by the Internet stack.

We found that the heavily isolated nature of messaging system development prevents the flow of innovation between the Internet community and the world of messaging. For example, such a widely accepted algorithm as fair queueing (first proposed in RFC970 in 1985) has as of 2010 not yet made it across the border.

Simplifying the wire protocol

Enterprise messaging is a world of proprietary protocols; products with closed protocols make up the majority of the market. Several attempts to introduce open protocols to the messaging sphere have been made over the years; however, none of these have yet succeeded.

We have participated in the standardisation of the AMQP protocol almost from its inception in 2004, and over time we came to realise that the major obstacle to adoption of AMQP is its inherent complexity. A complex specification makes implementation a costly prospect and compromises interoperability through the sheer amount of detail that may be misinterpreted by implementers.

To counter this problem, we designed a minimalist wire protocol which can be described in a couple of paragraphs. Moreover, the protocol is designed as any well-behaved Internet protocol should be: It can be layered on top of an arbitrary underlying transport protocol.

Simplifying the API

The complexity of tasks performed by traditional messaging systems is naturally reflected in overgrown and convoluted APIs. Most of them are closed; the only widely accepted open API is JMS, which is Java-specific and comes with a specification that is 138 pages long.

During the design of ØMQ we initially mimicked widely used messaging APIs. After we delegated functionality to the appropriate place in the Internet stack as described above, the original API no longer matched the functionality.
To solve this problem we designed a new API inspired by Berkeley sockets. It is a simple extension of the concepts used by Berkeley sockets, mapping almost 1:1 to the BSD socket API, flattening the learning curve almost to zero and leaving the way open towards the possible inclusion of ØMQ in the large family of protocols available via POSIX sockets.

Making it ubiquitous

For ØMQ as a solution which is steadily evolving towards being a layer in the Internet stack, it is important to run everywhere. Our implementation is designed to be portable and currently runs on POSIX-compliant platforms, Microsoft Windows and OpenVMS.

For exactly the same reason it’s important that the implementation runs – and runs well – on any available hardware, and scales across a range of options. On the top end of the range, it runs on large multi-core servers (we have tested scaling on machines with up to 16 CPU cores). On the low end it runs on embedded systems and vintage computers. Further, ØMQ has been designed and tested to scale from unreliable slow networks such as GPRS all the way to high performance data centre infrastructure such as InfiniBand and 10Gb Ethernet.

For ØMQ to be universally useful, a key requirement is that it be language-agnostic. The core implementation thus provides the lowest common denominator: a C library. Thanks to help from the community, the number of languages that can natively access ØMQ is increasing at an astounding pace.

Providing industrial strength

The ØMQ networking engine is extremely fast. By eliminating unnecessary calls to the underlying network stack, message throughput can exceed the network stack’s performance by a factor of 3-4. For message latency, the engine adds only a couple of microseconds on top of the network stack.

The performance thus achieved is a prerequisite for ØMQ to be suitable for operation at Internet scale, for example on the core backbone.

Historical Highlights - 06 Apr 2010 16:41 - by martin_sustrik - Comments: 0

With the addition of the blog to the ØMQ website, the "highlights" section became obsolete. We've decided to remove it altogether to save precious real estate on the web frontpage.

However, to keep track of what has happened in the past, all historical highlights were collected and moved to this blog.


  • April 1st, 2010: Ada language binding was created by Per Sandberg and can be found here.
  • March 25th, 2010: Performance benchmark for ØMQ/2.0.6 on top of SDP/InfiniBand stack was provided by Michael Santy and can be found here.
  • March 16th, 2010: ØMQ version 2.0.6 (beta) has been released with all-new documentation and many enhancements.
  • March 11th, 2010: Tests showing the impact of message copying on latency were provided by Michael Santy. The tests are also useful to get an idea of the performance of ØMQ/2.0 on top of InfiniBand. See the results here.
  • February 26th, 2010: Improved Python binding (support for polling etc.) was created by Brian Granger. Check it here.
  • February 20th, 2010: Lua binding was created by Aleksey Yeschenko. You can check it here.
  • February 18th, 2010: Recorded Connect community webinar is available here. Slides from the webinar can be downloaded from here.
  • February 16th, 2010: Thanks to Adrian von Bidder and Peter Busser, Debian packages for ØMQ are now available.
  • February 10th, 2010: Webinar on messaging in general and ØMQ in particular will be held on February 18th, 5:00PM GMT. You can register for the webinar here.
  • January 29th, 2010: A demonstration of sending video over ØMQ created by Martin Lucina is available here.
  • January 23rd, 2010: Haskell language binding for ØMQ/2.0 was created by Toralf Wittner. The package can be found here.
  • January 21st, 2010: Linux Weekly News has published an article about ØMQ. Check it here.
  • January 17th, 2010: ØMQ/2.0-beta2 is released. Aside from several bugfixes, it provides a new IPC (inter-process) transport that significantly reduces latency when passing messages between processes on the same box.
  • January 7th, 2010: ØMQ/2.0-beta1 is released. It includes a Common Lisp API, zero-copy for large messages and more.
  • December 4th, 2009: ØMQ IRC chatroom is available here.
  • November 27th, 2009: Common Lisp binding for ØMQ/2.0 was created by Vitaly Mayatskikh. Get the source code here.
  • October 30th, 2009: Open Source firms iMatix Corporation and FastMQ Inc. today announced the acquisition of FastMQ by iMatix. FastMQ's flagship product, ZeroMQ, branded as the "Fastest. Messaging. Ever." will join iMatix's other messaging products, OpenAMQ and Zyre. Read more.
  • September 23rd, 2009: Alpha3 version of ØMQ/2.0 is available. It features multicast bus and support for request/reply scenarios. Download.
  • September 17th, 2009: Alpha2 version of ØMQ/2.0 is available. Check it to try out new multicast support, subscription mechanism, forwarder device and more.
  • September 9th, 2009: Alpha1 version of ØMQ/2.0 is available. Check it to learn about new socket-like API and provide us with feedback.
  • September 7th, 2009: ØMQ/1.0.1 is available. This version brings Lua, Tcl and Delphi bindings to ØMQ. In addition it fixes several bugs found in version 1.0.0.
  • August 4th, 2009: Delphi binding for ØMQ was created by Daniele Teti. The source code can be found here. He'll have a talk about ØMQ for Delphi developers at ITDevCon.
  • July 13th, 2009: ØMQ/1.0.0 is available. Aside from several bug fixes and improvements, the new version comes with an integrated build of the OpenPGM reliable multicast library.
  • April 8th, 2009: ØMQ/0.6 is available. This version introduces load-balancing capabilities, on-disk offload for large queues, support for Mono and OpenVMS, etc.
  • March 1st, 2009: Blog on ØMQ port to OpenVMS platform can be found here.
  • February 8th, 2009: There's an article discussing how low-level monitoring tools like Wireshark can be used to monitor network traffic based on business criteria. Example is included that shows how to get graph representation of bandwidth used by stock quotes vs. trades vs. order confirmations.
  • February 6th, 2009: ØMQ/0.5 is available. New version introduces .NET compatibility, PGM reliable multicast, AMQP and SCTP support, congestion management as well as several other features.
  • January 7th, 2009: Windows installer package for ØMQ/0.4 is available here.
  • December 22nd, 2008: Comprehensive table of existing and prospective functionality in ØMQ is available.
  • December 17th, 2008: There's a new whitepaper describing different messaging architectures and explaining how ØMQ differs from traditional messaging products available.
  • December 15th, 2008: Holger Hoffstätte: In the unlikely case that anybody else is using Gentoo you might be interested in an ebuild for automatic installation, hosted in my private mercurial repository.
  • December 9th, 2008: ØMQ version 0.4 was released. Most importantly, it's the first ØMQ package licensed under the LGPL. Aside from that, ØMQ/0.4 adds support for new platforms (AIX, HP-UX, OpenBSD, QNX Neutrino, Windows) and uses advanced polling mechanisms to cope with thousands of simultaneous network connections.
  • November 14th, 2008: There's a whitepaper presenting performance results for ØMQ/0.3.1 on 10Gb Ethernet available.
  • November 11th, 2008: Interesting paper investigating performance of AMQP over InfiniBand can be found here. The benchmark used is based on ØMQ "stock exchange" example.
  • November 7th, 2008: Novell partners with FastMQ to tune ØMQ for SUSE Linux Enterprise Real Time.
  • October 23rd, 2008: We've run a few tests comparing latencies on standard and real-time Linux kernels. The tests have shown that the real-time kernel is able to eliminate all the latency peaks. Still, it does so at the cost of a slightly increased average latency.
  • October 13th, 2008: There's a whitepaper presenting performance results for ØMQ/0.3.1 on top of the SDP/InfiniBand stack available.
  • October 8th, 2008: ØMQ is LGPL'd from now on! The LGPL version is available in the form of source code in Subversion at the moment. An LGPL'd package is coming soon.
  • September 25th, 2008: ØMQ version 0.3.1 was released. It's backwards compatible with version 0.3 and adds some usability features, like allowing use of a network interface name instead of an IP address, a single centralised include file, man pages, etc.
  • September 25th, 2008: Document describing how to build ØMQ on the QNX Neutrino real-time operating system was contributed by Alexej Lotz.
  • September 22nd, 2008: High Performance on Wall Street: meet FastMQ Inc. at the Roosevelt Hotel, New York, NY
  • September 10th, 2008: "Performance results for Python and C extensions are available."
  • September 9th, 2008: "Python and C extensions for ØMQ are available. Performance results are not yet available, but coming soon."
  • September 4th, 2008: "Tests with ØMQ on the top of SDP-MX/Myricom-10GbE stack show end-to-end latency of 13.4 microseconds. For those striving for ultra-low latency, I would suggest visiting Myricom's stall at High Performance on Wall Street conference (New York, September 22nd, 2008) and asking for detailed performance figures."
  • September 3rd, 2008: "Java extension for ØMQ is available. It's not a part of the official ØMQ package yet, so you have to download it separately and build it by hand. Performance tests show end-to-end latency of 36 microseconds and throughput of 1.5 million messages a second."
  • September 2nd, 2008 (discussion on ØMQ mailing list): "I work in an smallish trading organization where we own licenses to Tibco/Elvin and 29West's LBM in addition to market data system built on another messaging system whose API is hidden to us. Nothing would delight us more than a viable open source solution which will be there for a while and free us from vendor lockin or the threat of end of product lifecycle event."
  • September 2nd, 2008 (discussion on ØMQ mailing list): "I do not see any competitor to ØMQ but I do know a few closed source messaging vendors. These vendors are now stuck with a hard to sell product and an uncertain economic future - remember Talarian? However, once ØMQ gets some traction, they will see the light. Open sourcing to build critical mass will be a natural strategy for them."
  • August 11th, 2008: "We are happy to announce that version 0.3 of ØMQ lightweight messaging kernel is out of the door! The main improvements are improved usability by adhering to standard messaging paradigm of routing messages into message queues and improved performance (over 3 million messages a second)."
  • August 6th, 2008 (Andy Piper's blog): "For some non-IBM messaging middleware updates, just to note that ØMQ sounds intriguing. I’ve done a lot of work with clients in the financial sector in particular, so I’ll be interested to see how this develops. One of the nice things about my other "pet" product, WebSphere Message Broker, is that it sits in the sweet spot of connectivity between different transports and protocols, so I guess I’ll be looking at how to make things talk to one another if ØMQ takes off."

If you want to become a ØMQ blogger, contact us.

All posts

07 Feb 2011 10:21: Installing ZeroMQ and Java bindings on Ubuntu 10.10 - Comments: 0
03 Feb 2011 09:11: Python Multiprocessing With ØMQ - Comments: 1
01 Feb 2011 21:29: ØMQ And QuickFIX - Comments: 3
23 Dec 2010 09:33: Message Oriented Network Programming - Comments: 0
19 Dec 2010 13:03: ØMQ/2.1.0 available on OpenVMS - Comments: 0
13 Dec 2010 12:46: Load Balancing In Java - Comments: 1
10 Nov 2010 10:01: Building a GeoIP server with ZeroMQ - Comments: 0
31 Oct 2010 11:44: ØMQ blog by Gerard Toonstra - Comments: 0
13 Oct 2010 10:47: ØMQ - The Game - Comments: 0
09 Sep 2010 07:45: ZeroMQ and Clojure, a brief introduction - Comments: 0
03 Sep 2010 18:07: ZeroMQ: Modern & Fast Networking Stack - Comments: 0
02 Sep 2010 13:30: September Meetups - Comments: 0
01 Sep 2010 16:05: Multithreading Magic - Comments: 0
01 Sep 2010 11:25: Not To Be Confused - Comments: 2
27 Aug 2010 05:39: ØMQ for Clojure - Comments: 0
17 Aug 2010 07:20: ZeroMQ: What You Need to Know Braindump - Comments: 0
03 Aug 2010 10:46: RFC: ØMQ Contributions, Copyrights and Control - Comments: 1
02 Aug 2010 19:54: New - ØMQ Labs! - Comments: 0
26 Jul 2010 14:02: Video Introduction to ØMQ - Comments: 3
13 Jul 2010 20:19: ØMQ available at CPAN - Comments: 1
12 Jul 2010 12:22: Welcome, Objective-C :-) - Comments: 0
10 Jul 2010 13:42: Few Blogposts on ØMQ - Comments: 2
29 Jun 2010 08:19: Ruby Gem for ØMQ/2.0.7 available! - Comments: 0
23 Jun 2010 09:27: ZeroMQ an introduction - Comments: 1
18 Jun 2010 10:15: Mongrel2 Is "Self-Hosting" - Comments: 0
08 Jun 2010 21:25: Internet Worldview in Messaging World - Comments: 0
07 Jun 2010 16:13: Berlin Buzzwords 2010 - Comments: 1
05 Jun 2010 10:24: Loggly Switches to ØMQ - Comments: 4
04 Jun 2010 18:05: ØMQ/2.0.7 (beta) released - Comments: 12
24 May 2010 09:37: Building ØMQ and pyzmq on Red Hat - Comments: 2
18 May 2010 06:30: To Trie or not to Trie - Comments: 0
12 May 2010 13:02: ØMQ/2.0.6 on OpenVMS - Comments: 3
08 May 2010 09:58: Zero-copy and Multi-part Messages - Comments: 17
07 Apr 2010 10:23: The Long and Winding Road Behind - Comments: 2
06 Apr 2010 16:41: Historical Highlights - Comments: 0
