zmq_poll workaround
This issue is fixed in ØMQ/2.1.0 and later. The workaround below is needed only for the 2.0.x versions of ØMQ.
When you call zmq_poll with a timeout, it has an unfortunate tendency to return before the timeout expires, even though there has been no socket activity. Here is a workaround wrapper written by Thomas Guyot-Sionnest, aka 'dermoth', that shows how to use timers to get a poll that honors the full timeout:
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>
#include <zmq.h>

/* Wrapper around zmq_poll, which may return zero without reaching the
 * specified timeout. The timeout is in microseconds, as in zmq_poll 2.0.x. */
int
my_zmqpoll (zmq_pollitem_t *items, const int nitems, const long timeout)
{
    struct timeval tv, te;
    int rc, ret;
    long tmleft;

    /* Populate te with timeout value */
    te.tv_sec = timeout / 1000000;
    te.tv_usec = timeout - (te.tv_sec * 1000000);

    rc = gettimeofday (&tv, NULL);
    assert (rc == 0);

    /* Add current time to the timeout (end time) */
    te.tv_sec += tv.tv_sec;
    te.tv_usec += tv.tv_usec;
    te.tv_sec += te.tv_usec / 1000000;
    te.tv_usec %= 1000000;

    /* Loop over, return either >0, or 0 after a timeout */
    tmleft = timeout;
    while (1) {
        ret = zmq_poll (items, nitems, tmleft);
        assert (ret >= 0);

        rc = gettimeofday (&tv, NULL);
        assert (rc == 0);

        if (ret == 0) {
            /* Keep on looping unless time's up */
            if (te.tv_sec < tv.tv_sec
                || (te.tv_sec == tv.tv_sec && te.tv_usec <= tv.tv_usec))
                return ret;
            /* Retry with whatever is left until the end time */
            tmleft = ((te.tv_sec - tv.tv_sec) * 1000000) + (te.tv_usec - tv.tv_usec);
        }
        else
            return ret;
    }
}
Thomas has released this code to the public domain.
Comments: 10
page revision: 1, last edited: 18 Jan 2011 08:51
I would rewrite the while loop so there are no useless calls to gettimeofday(), and the retry logic is simpler:
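One possible shape for such a loop, offered as a sketch rather than the commenter's actual code: the deadline is computed once as a single microsecond count, and the clock is read again only when the poll returns zero. So that it compiles without libzmq, the poll call is passed in through a hypothetical `poll_fn` pointer that mimics the 2.0.x microsecond timeout. The names `now_us`, `poll_full_timeout`, `early_return_poll` and `ready_poll` are illustrative assumptions, not part of ØMQ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for a zmq_poll-like call (timeout in microseconds),
 * so the sketch builds without libzmq. */
typedef int (*poll_fn) (void *items, int nitems, long timeout_us);

/* Current wall-clock time as a single microsecond count. */
static int64_t
now_us (void)
{
    struct timeval tv;
    int rc = gettimeofday (&tv, NULL);
    assert (rc == 0);
    return (int64_t) tv.tv_sec * 1000000 + tv.tv_usec;
}

/* Poll until there is activity (or an error), or until the full timeout
 * has elapsed. Only meant for non-negative timeouts. */
int
poll_full_timeout (poll_fn do_poll, void *items, int nitems, long timeout_us)
{
    int64_t deadline = now_us () + timeout_us;
    long left = timeout_us;
    int ret;

    /* The clock is consulted only after a zero return. */
    while ((ret = do_poll (items, nitems, left)) == 0) {
        int64_t remaining = deadline - now_us ();
        if (remaining <= 0)
            break;              /* time really is up */
        left = (long) remaining;
    }
    return ret;
}

/* Demonstration stubs (assumptions): one mimics the early zero return,
 * the other reports one ready item straight away. */
int stub_calls = 0;

int
early_return_poll (void *items, int nitems, long timeout_us)
{
    (void) items; (void) nitems; (void) timeout_us;
    stub_calls++;
    return 0;
}

int
ready_poll (void *items, int nitems, long timeout_us)
{
    (void) items; (void) nitems; (void) timeout_us;
    return 1;
}
```

With `early_return_poll` the wrapper keeps retrying until the requested time has passed, and with `ready_poll` it returns on the first call. A real caller would pass a thin adapter around zmq_poll instead of the stubs.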
It's a wiki :-) Feel free to edit…
True, and I did forget about that this time. But more importantly, I wanted to keep both the original and my suggestion, in case mine is flawed.
Seems Martin posted a patch to zmq_poll today, so this case is happily closed.
Any idea when this issue will be fixed? … Using a while (true) loop is not very efficient.
@pieterh where is the patch applied? I am using the latest stable release & the issue exists.
Thanks.
The issue is fixed in 2.1 series.
There appears to be an implementation/documentation gotcha in zmq_poll() as of 2.1.0. The document states the timeout is in microseconds.
The document doesn't mention that the _resolution_ of the underlying poll() is milliseconds. This becomes particularly critical if one tries to implement a timeout under 1000us because they all become 0, which of course is an immediate return (and one ends up with a busy wait…)
I checked the source to confirm and there are several /1000 (zmq.cpp lines 482,570,571,648). This cost me an hour of debugging so it would be great if someone could amend the zmq_poll() page with an appropriate warning to help others.
regards, Kim
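The truncation Kim describes can be illustrated with plain integer arithmetic. This is a sketch under the assumption, taken from the comment above, that 2.1.0 divides the microsecond timeout by 1000 before polling. `effective_poll_ms` and `clamp_timeout_us` are hypothetical helper names, not ØMQ functions.

```c
#include <assert.h>

/* Assumed model of the conversion described above: microseconds are
 * divided by 1000, so anything from 0 to 999 becomes 0 milliseconds.
 * Only meaningful for non-negative timeouts. */
long
effective_poll_ms (long timeout_us)
{
    return timeout_us / 1000;
}

/* Hypothetical caller-side guard: round a positive sub-millisecond
 * timeout up to 1000us so it cannot collapse into an immediate return.
 * Zero and negative values are passed through unchanged. */
long
clamp_timeout_us (long timeout_us)
{
    if (timeout_us > 0 && timeout_us < 1000)
        return 1000;
    return timeout_us;
}
```

For example, a requested timeout of 500us maps to 0ms under this model, while `clamp_timeout_us (500)` yields 1000us, which maps to 1ms. The guard trades sub-millisecond precision for avoiding a busy wait.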
This can be done in 2.x versions, you should probably suggest so on the mailing list. The 3.0 version (current trunk) changes the timeout unit to milliseconds which makes the issue obvious.
Good point. I've added a line to the zmq_poll man page for 2.1, and this'll go into the next release. Thanks!
Good thing I found this wiki. I gained some ideas here.