(Code examples are in Ruby.)
Purpose
The man page for zmq_socket describes DEALER and ROUTER sockets in terms of how they communicate with their lock-step partners REQ and REP. That is, for a DEALER socket it discusses the data requirements for talking to a REP socket; for a ROUTER socket it discusses the data requirements for talking to a REQ socket. It is silent on the requirements for direct DEALER to ROUTER communication, leaving it as an exercise for the reader to infer the proper message management.
This recipe discusses one technique for direct DEALER to ROUTER communications. It also suggests a methodology for debugging packet flow between the endpoints for the inevitable problems that occur in complex distributed systems.
Flow
The main detail to notice when reading the man page is the requirement to insert a null (empty) message part between the identity routing information prepended to every message and the message body containing the application-level data. This null message part acts as a delimiter separating routing information from application-level data. When communicating to or from REQ/REP sockets, the framework silently processes the routing information up to the null message part and then hands the remaining message parts to your application for processing.
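To make the delimiter concrete, here is a tiny illustration with frames modeled as plain Ruby strings. This is not any binding's API, just the on-the-wire layout: a REQ socket inserts the empty frame for you, while a DEALER application must add it itself.

# Frames modeled as plain strings for illustration; index 0 is sent first.
req_sends    = ["", "request body"]   # REQ adds the empty delimiter automatically
dealer_sends = ["", "request body"]   # a DEALER application must prepend "" itself

# A REP socket strips everything up to and including the empty frame;
# only the parts after it reach the application.
delimiter = dealer_sends.index("")
body      = dealer_sends[(delimiter + 1)..-1]   # => ["request body"]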
The first question I had when working with these socket types focused on the identity information prepended to all new messages by the DEALER socket. What is its purpose?
Simply, the identity message part is used as an address for return routing by a REP or ROUTER socket.
The second question concerned the routing information required when connecting a DEALER socket directly to a ROUTER socket. At first glance there appears to be no magic for return routing: just write the response back out on the socket and it should get to its destination.
Except that the message does not get there!
Why
0mq is intended for building distributed systems. It is rare that a distributed system contains only two layers, as in a simple client-server configuration. Oftentimes the network topology includes intermediate aggregation points that route messages to other components, which may in turn route them again until they eventually reach their destination. For a response to make it back to its originating node after hopping through multiple intermediaries, the message must contain the address of every hop along the route.
On the way from source to destination, each ROUTER socket along the path prepends to the message the identity of the peer it received the message from (this is an optimization to save on bandwidth). By the time the message arrives, the destination has a complete record of every socket that touched the packet along the way.
When the application behind the destination ROUTER socket writes a response, it must prepend all of that routing information so each intermediate node knows where to send the message next. Each ROUTER socket along the return path pops the top routing message part off and uses it to address the next hop; the remaining routing parts travel on with the message.
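As a rough model of that bookkeeping, here is the routing stack as a plain Ruby array of strings. This is illustration only; the identities are arbitrary example names, and in a real program the ROUTER sockets do this work for you.

# Illustration only: the routing stack modeled as an array of strings.
# Forward path: each ROUTER prepends the identity of the peer it received
# the message from.
frames = ["request"]      # as written by the source DEALER
frames.unshift "app1"     # added by the first intermediate ROUTER
frames.unshift "queue1"   # added by the second intermediate ROUTER
frames.unshift "queue2"   # added by the destination ROUTER
# frames => ["queue2", "queue1", "app1", "request"]

# Return path: each ROUTER pops the top frame and uses it to address the
# next hop; the remaining frames travel on with the reply.
next_hop = frames.shift   # => "queue2"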
A Picture
The above explanation really needs a picture.
Let's assume we have a DEALER socket at our source with two queue devices between it and the destination: the source's DEALER connects to queue1's ROUTER, queue1's DEALER connects to queue2's ROUTER, and queue2's DEALER connects to the destination's ROUTER. Each box represents a separate process with a pair of DEALER/ROUTER sockets.
Legend
- app1, queue1, queue2, app2 are socket identities used for routing; in the message diagrams below, "source" is the identity of app1's DEALER socket, and queue1.B / queue2.B are the identities of the DEALER sockets inside the two queue devices
- Ø represents a null message part used as a delimiter
- request is the request body originating from app1
- reply is the reply body originating from app2
When sending a message from source to destination, the destination will receive a message that looks like this:
queue2.B
queue1.B
source
body parts / application-level data
Unfortunately, with a message like this it is impossible for the destination to figure out which message parts are routing information and which are application-level data, particularly when the number of intermediate hops changes over time. I suggest mimicking the behavior of the REQ socket and *always* prepending a null message part to each message sent from the source. A DEALER socket does not do this for you; the application must prepend this part itself before sending the body. With that convention in place, the destination receives a message that looks like this:
queue2.B
queue1.B
source
<null message part used as a delimiter>
body parts / application-level data
The destination has a little more work to do than the intermediate hops since it is interested in the application-level data. The ROUTER socket handler should iterate through the message parts and save each one, up to *and including* the null message part, as routing information. The null part is the delimiter telling the application that any remaining message parts are the true body of the message.
By modifying the DEALER application to prepend a null message part, you also ensure interoperability with REP sockets, which require the null message part as a delimiter.
Using REQ/REP Sockets Too
It is almost always simpler to use REQ/REP sockets. We can rework the prior example using those socket types for app1 and app2.
Examining the message contents at the destination, we see that they exactly mimic the second case above, where we inserted a null message part as a delimiter between the routing information and the message body:
queue2.B
queue1.B
source
<null message part used as a delimiter>
body parts / application-level data
By following this convention when using DEALER/ROUTER sockets, we automatically guarantee interoperability with regular REQ/REP sockets. If circumstances change and REQ/REP becomes the better choice, all existing code continues to work.
A Little Code
Here's how it works in Ruby from the perspective of the source DEALER socket.
# Called when the socket can write an entire message without blocking
#
def on_writable socket
  # application sends a null message part before sending the body
  # of the message
  socket.send_message ZMQ::Message.new, true

  request = ZMQ::Message.new "some request data"
  socket.send_message request
end
Here's how it works in Ruby from the perspective of the destination ROUTER socket.
# Called when the socket can read a whole message without blocking
#
def on_readable socket, messages
  # note that +messages+ is modified in-place by this method call
  routing_info = prepare_response_routing messages

  if !messages.empty?
    send_routing routing_info

    # any remaining message parts contain application-level data
    do_something_with messages
  end
end
# Save off the messages pertaining to routing, including the null message
# delimiter
#
def prepare_response_routing messages
  routing = []

  until messages.empty?
    # take the first message from the +messages+ array and push it
    # onto the +routing+ stack
    message = messages.shift
    routing.push message

    # break out of this loop after we hit the null message part delimiter
    break if message.copy_out_string.empty?
  end

  # return the +routing+ array to the caller
  routing
end
# application logic for doing something with the message body
#
def do_something_with body_parts
  result = process body_parts
  response = ZMQ::Message.new result

  # second arg to #send_message defaults to false, i.e. no more message parts
  @socket.send_message response
end
# transmit the routing information using SNDMORE semantics
#
def send_routing messages
  messages.each do |message|
    @socket.send_message(message, (multipart = true))
  end
end
Hopefully the code sample above is easily interpreted.
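For comparison with the REQ/REP variant described earlier, the same two endpoints reduce to roughly the following sketch. It reuses the send_message, ZMQ::Message, and process names from the code above and assumes the same handler style; the REQ socket inserts the null delimiter itself, and the REP socket strips the routing frames on receipt and restores them when the reply is sent.

# Sketch only: app1 rewritten to use a REQ socket. The socket adds the
# null delimiter, so the handler sends just the body.
def on_writable socket
  socket.send_message ZMQ::Message.new("some request data")
end

# Sketch only: app2 rewritten to use a REP socket. Routing frames and the
# delimiter never reach the application, so +messages+ holds only the
# application-level data and no manual routing is required for the reply.
def on_readable socket, messages
  result = process messages
  socket.send_message ZMQ::Message.new(result)
end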
Debugging
When writing my first application that used DEALER/ROUTER sockets end to end, I noticed that messages flowed from source to destination just fine, but the return trip failed. Clearly the routing information was being mismanaged somewhere, but the logs didn't indicate where.
To solve the problem, I modified my queue device to print the contents of every message that passed through it. The device knows the direction of the received message (inbound from source, outbound to destination, inbound from destination, outbound to source), so I also added tags for "in" and "out" to the debug print.
Lastly, I printed the message contents at the source DEALER and the destination ROUTER.
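The debug dump itself can be a few lines; here is a sketch that reuses the copy_out_string accessor from prepare_response_routing, with the direction tag ("in" or "out") supplied by the caller.

# Sketch of the debug dump used in the queue device and at both endpoints;
# +direction+ is a tag such as "in" or "out" supplied by the caller.
def dump_message direction, messages
  messages.each do |message|
    puts "[#{direction}] #{message.copy_out_string}"
  end
end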
The debug data showed the path each message took. By glancing at the "stack" I could see that the destination ROUTER socket was not sending all of the return addresses when replying.
My topology was similar to the example above except it had a single queue between app1 and app2. The debug data from source to destination looked like:
at app1 (source DEALER):
[out]
[out] body

at the queue device:
[in] app1
[in]
[in] body
[out] app1
[out]
[out] body

at app2 (destination ROUTER):
[in] queue
[in] app1
[in]
[in] body
That looks correct.
On the return path, it printed:
at app2 (destination ROUTER):
[out] queue
[out]
[out] reply_body

at the queue device:
[in]
[in] reply_body
The reply got dropped by the queue device. The return path clearly shows that the socket identity for "app1" is missing: when the queue's ROUTER socket went to forward the reply, the first frame it popped for addressing was the null delimiter rather than "app1", so there was no matching peer and the message was silently discarded. This was enough information to find the flaw in "app2", which was failing to preserve the entire identity stack for the reply.
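Stated in terms of frames (plain strings again, purely for illustration), app2 was writing the first sequence when it needed to write the second:

# Illustration only, frames as plain strings.
broken_reply  = ["queue", "", "reply_body"]          # app1's identity is missing
correct_reply = ["queue", "app1", "", "reply_body"]  # full routing stack preserved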