Please comment and possibly extend this page!
Why ØMQ needs some introspection capabilities
First, some reading on logging and monitoring large (and distributed) systems, to lay the ground for the discussion. This is the article that started it all for me: http://highscalability.com/log-everything-all-time
Other sources (books, articles, stories) suggest the same, see:
- Scalable Internet Architectures, Theo Schlossnagle, ISBN 0-672-32699-X
- Web Operations, John Allspaw & Jesse Robbins, ISBN 978-1-449-37744-1
There are many more sources on the web showing how important good monitoring is. Most published accounts of how a system survived an incident start by mentioning the monitoring that detected something fishy and notified the ops team.
Executive summary:
Gather as much information as possible on what the system does.
To reach this goal we need a way to see what ØMQ's idea of the current state is. The following attributes should be made accessible for this purpose:
- ØMQ socket state (blocked, over HWM …)
- ØMQ queue lengths on a socket
- Messages sent/received since the last query (needs a defined start)
- Number of underlying OS level sockets/connections
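The "since last query" counter in the list above can be sketched as follows. This is only an illustration of the idea, not an existing ØMQ API: the class `SocketCounters` and its methods `record_send`, `record_receive` and `query` are hypothetical names. The sketch assumes the "defined start" is simply the moment of the previous query (or the creation of the counters).

```python
import time
from dataclasses import dataclass, field


@dataclass
class SocketCounters:
    """Hypothetical per-socket counters; not part of any real ØMQ API."""

    sent: int = 0        # total messages sent since creation
    received: int = 0    # total messages received since creation
    # Baseline captured at the previous query (the "defined start").
    _last_sent: int = 0
    _last_received: int = 0
    _last_query: float = field(default_factory=time.monotonic)

    def record_send(self, n=1):
        self.sent += n

    def record_receive(self, n=1):
        self.received += n

    def query(self):
        """Return the deltas since the previous query and move the baseline."""
        now = time.monotonic()
        delta = {
            "sent": self.sent - self._last_sent,
            "received": self.received - self._last_received,
            "interval_s": now - self._last_query,
        }
        self._last_sent = self.sent
        self._last_received = self.received
        self._last_query = now
        return delta


counters = SocketCounters()
counters.record_send(3)
counters.record_receive()
first = counters.query()    # covers everything since creation
second = counters.query()   # nothing happened since the first query
```

Keeping running totals and subtracting a stored baseline means a query never loses messages that arrive while it is being answered; whether a real implementation would reset per query or per observer is left open here.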
Also frequently requested are the following bits of information:
- Endpoints (ip, port) of the underlying OS level sockets
- Buffer sizes of the OS level sockets
With the advent of messaging, the need to monitor the flow of messages rises as well. It doesn't make sense to closely watch the database queries but ignore the path over which the requests and results are transmitted.
Another important point is the angst of admins. If they can't see what something is doing, they reject it.
We currently have exactly this case: the angst of the ops team not seeing what's going on. So this is a great article. Does anybody know about the status of this in zmq?