Multi-Cluster Ganglia Configuration
I’ve discussed Ganglia configuration and RHEL/CentOS installation in the past. In the usual setup, every node belongs to a single cluster. Recently I ran into a requirement to assign nodes to more than one cluster. For example, node1 is already part of the “Production Servers” cluster, but I also needed it in the “Production WebLogic” cluster. This way I can track the performance of multiple groups of nodes based on different aspects of their functionality.
In a nutshell, every node runs a gmond daemon that communicates with a central gmond instance (on the cluster head node) for that cluster over a specific port. This central gmond instance then communicates with the data collection node (the Ganglia server) running gmetad on the same port. For every cluster definition you use a different port.
Here’s a sample configuration for a single-cluster environment, where ganglia_server runs gmetad, head_node1 runs the central gmond, and node1..n are the compute nodes running gmond. The flow of data is node1..n -> head_node1 -> ganglia_server.
ganglia_server:/etc/ganglia/gmetad.conf extract:
data_source "Production Servers" 10 head_node1:8649
head_node1:/etc/ganglia/gmond.conf extract:
cluster { name = "Production Servers" } udp_send_channel { host = ganglia_server port = 8649 ttl = 1 } udp_recv_channel { port = 8649 } tcp_accept_channel { port = 8649 }
node1:/etc/ganglia/gmond.conf extract:
cluster { name = "Production Servers" } udp_send_channel { host = head_node1 port = 8649 ttl = 1 } udp_recv_channel { port = 8649 } tcp_accept_channel { port = 8649 }
Now, let’s say, you want to define the “Production WebLogic” cluster with node1 as a member. However, node1 should also be a member of the original “Production Servers” cluster. You will need to add the cluster definition to the ganglia_server:/etc/ganglia/gmetad.conf like so:
data_source "Production Servers" 10 head_node1:8649 data_source "Production WebLogic" 10 head_node1:8650
The second step is trickier: on head_node1 you will need to create a second gmond.conf to relay the data for the “Production WebLogic” cluster to ganglia_server. This also requires running a second instance of the gmond daemon on a different port. The new configuration file will be called head_node1:/etc/ganglia/gmond_prod_weblogic.conf:
cluster { name = "Production WebLogic" } udp_send_channel { host = ganglia_server port = 8650 ttl = 1 } udp_recv_channel { port = 8650 } tcp_accept_channel { port = 8650 }
Finally, in node1:/etc/ganglia/gmond.conf you will need to add the new send/receive channels for the new port:
cluster { name = "Production Servers" } udp_send_channel { host = head_node1 port = 8649 ttl = 1 } udp_recv_channel { port = 8649 } tcp_accept_channel { port = 8649 } udp_send_channel { host = head_node1 port = 8650 ttl = 1 } udp_recv_channel { port = 8650 } tcp_accept_channel { port = 8650 }
Thus, node1 will be sending the same data to head_node1 on two different ports. Note that the cluster name parameter remains unchanged: a gmond.conf can contain only one cluster definition.
If your main gmond.conf is centrally deployed by a configuration management utility (Puppet, Salt, Chef, etc.), it is best to use the “include” directive in gmond.conf. Add something like this at the very end of the main gmond.conf:
include ('/etc/ganglia/conf.d/custom_*.conf')
And then you create /etc/ganglia/conf.d/custom_ports.conf containing something like this:
/* Production WebLogic */
udp_send_channel {
  host = head_node1
  port = 8650
  ttl = 1
}
udp_recv_channel {
  port = 8650
}
tcp_accept_channel {
  port = 8650
}
This way you keep a portion of the gmond configuration that you can manage manually, without the configuration management software trampling all over it.
You will need to modify the head_node1:/etc/init.d/gmond startup script to launch a separate instance of gmond for every gmond configuration file you created. The best way of doing this is to standardize the naming convention for these files, for example: /etc/ganglia/gmond_prod_servers.conf, /etc/ganglia/gmond_prod_weblogic.conf, and so on.
Doing so allows you to add a “for” loop to the /etc/init.d/gmond script that picks up every configuration file matching “/etc/ganglia/gmond_” and launches a gmond instance for each. Here’s an example:
#!/bin/sh
# $Id: gmond.init 180 2003-03-07 20:38:36Z sacerdoti $
#
# chkconfig: - 70 40
# description: gmond startup script
#
GMOND=/usr/sbin/gmond
CONFDIR=/etc/ganglia
CONFNAME="gmond_"

. /etc/rc.d/init.d/functions

RETVAL=0

case "$1" in
  start)
        # launch one gmond instance per ${CONFDIR}/${CONFNAME}*.conf
        for i in `ls "${CONFDIR}/${CONFNAME}"*`
        do
          # derive the instance name from the file name (e.g. gmond_prod_weblogic)
          in=$(echo ${i} | awk -F/ '{print $NF}' | awk -F. '{print $1}')
          echo -n "Starting GANGLIA gmond for ${in} "
          [ -f $GMOND ] || exit 1
          daemon $GMOND -c ${i}
          RETVAL=$?
          echo
          [ $RETVAL -eq 0 ] && touch /var/lock/subsys/gmond_${in}
        done
        ;;
  stop)
        for i in `ls "${CONFDIR}/${CONFNAME}"*`
        do
          in=$(echo ${i} | awk -F/ '{print $NF}' | awk -F. '{print $1}')
          echo -n "Shutting down GANGLIA gmond: "
          # note: killproc stops all running gmond processes, not just this instance
          killproc gmond
          RETVAL=$?
          echo
          [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/gmond_${in}
        done
        ;;
  restart|reload)
        $0 stop
        $0 start
        RETVAL=$?
        ;;
  status)
        status gmond
        RETVAL=$?
        ;;
  *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
esac

exit $RETVAL
The final step is to bounce gmond on head_node1 and node1, as well as the gmetad daemon on ganglia_server.
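On a stock RHEL/CentOS install that boils down to something like the following; the rrds path is the default gmetad rrd_rootdir, so adjust it if yours differs:

# on head_node1 and node1
service gmond restart

# on ganglia_server
service gmetad restart

# after a minute or two gmetad should create an RRD directory for the new cluster
ls /var/lib/ganglia/rrds/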