Multi-Cluster Ganglia Configuration
I’ve discussed Ganglia configuration and RHEL/CentOS installation in the past. In the usual setup, every node belongs to a single cluster. Recently I ran into a requirement to assign nodes to more than one cluster. For example, node1 is already part of the “Production Servers” cluster, but I also needed it in the “Production WebLogic” cluster. This way I can track the performance of multiple groups of nodes based on different aspects of their functionality.
In a nutshell, every node runs a gmond daemon that communicates with a central gmond instance (on the cluster head node) for that cluster over a specific port. This central gmond instance then communicates with the data collection node (the Ganglia server) running gmetad on the same port. For every cluster definition you use a different port.
Here’s a sample configuration for a single-cluster environment, where ganglia_server runs gmetad, head_node1 runs the central gmond, and node1..n are the compute nodes running gmond. The flow of data is node1..n -> head_node1 -> ganglia_server.
ganglia_server:/etc/ganglia/gmetad.conf extract:
data_source "Production Servers" 10 head_node1:8649
head_node1:/etc/ganglia/gmond.conf extract:
cluster { name = "Production Servers" } udp_send_channel { host = ganglia_server port = 8649 ttl = 1 } udp_recv_channel { port = 8649 } tcp_accept_channel { port = 8649 }
node1:/etc/ganglia/gmond.conf extract:
cluster { name = "Production Servers" } udp_send_channel { host = head_node1 port = 8649 ttl = 1 } udp_recv_channel { port = 8649 } tcp_accept_channel { port = 8649 }
Now, let’s say, you want to define the “Production WebLogic” cluster with node1 as a member. However, node1 should also be a member of the original “Production Servers” cluster. You will need to add the cluster definition to the ganglia_server:/etc/ganglia/gmetad.conf like so:
data_source "Production Servers" 10 head_node1:8649 data_source "Production WebLogic" 10 head_node1:8650
The second step is trickier: on head_node1 you will need to create a second gmond.conf to relay the data for the “Production WebLogic” cluster to ganglia_server. This also requires running a second instance of the gmond daemon on a different port. The new configuration file will be called head_node1:/etc/ganglia/gmond_prod_weblogic.conf:
cluster { name = "Production WebLogic" } udp_send_channel { host = ganglia_server port = 8650 ttl = 1 } udp_recv_channel { port = 8650 } tcp_accept_channel { port = 8650 }
Finally, in node1:/etc/ganglia/gmond.conf you will need to add the new send/receive channels for the new port:
cluster { name = "Production Servers" } udp_send_channel { host = head_node1 port = 8649 ttl = 1 } udp_recv_channel { port = 8649 } tcp_accept_channel { port = 8649 } udp_send_channel { host = head_node1 port = 8650 ttl = 1 } udp_recv_channel { port = 8650 } tcp_accept_channel { port = 8650 }
Thus, node1 will be sending the same data to head_node1 on two different ports. Note that the cluster name parameter remains unchanged: a gmond.conf can contain only one cluster definition.
If your main gmond.conf is centrally deployed by a configuration management utility (Puppet, Salt, Chef, etc.), it is best to use the “include” directive in gmond.conf. Add something like this at the very end of the main gmond.conf:
include ('/etc/ganglia/conf.d/custom_*.conf')
And then you create /etc/ganglia/conf.d/custom_ports.conf containing something like this:
/* Production WebLogic */
udp_send_channel {
  host = head_node1
  port = 8650
  ttl = 1
}
udp_recv_channel {
  port = 8650
}
tcp_accept_channel {
  port = 8650
}
This way you keep a portion of the gmond configuration that you can manage manually, without the configuration management software trampling all over it.
You will need to modify the head_node1:/etc/init.d/gmond startup script to launch a separate instance of gmond for every gmond configuration file you created. The best way of doing this is to standardize the naming convention for these files, for example: /etc/ganglia/gmond_prod_servers.conf, /etc/ganglia/gmond_prod_weblogic.conf, and so on.
Doing so allows you to add a “for” loop to the /etc/init.d/gmond script that picks up every configuration file matching “/etc/ganglia/gmond_” and launches a gmond instance for each. Here’s an example:
#!/bin/sh
# $Id: gmond.init 180 2003-03-07 20:38:36Z sacerdoti $
#
# chkconfig: - 70 40
# description: gmond startup script
#
GMOND=/usr/sbin/gmond
CONFDIR=/etc/ganglia
CONFNAME="gmond_"

. /etc/rc.d/init.d/functions

RETVAL=0

case "$1" in
  start)
        # launch one gmond instance per ${CONFDIR}/${CONFNAME}*.conf
        for i in `ls "${CONFDIR}/${CONFNAME}"*`
        do
          # derive the instance name from the file name (e.g. gmond_prod_weblogic)
          in=$(echo ${i} | awk -F/ '{print $NF}' | awk -F. '{print $1}')
          echo -n "Starting GANGLIA gmond for ${in} "
          [ -f $GMOND ] || exit 1
          daemon $GMOND -c ${i}
          RETVAL=$?
          echo
          [ $RETVAL -eq 0 ] && touch /var/lock/subsys/gmond_${in}
        done
        ;;
  stop)
        for i in `ls "${CONFDIR}/${CONFNAME}"*`
        do
          in=$(echo ${i} | awk -F/ '{print $NF}' | awk -F. '{print $1}')
          echo -n "Shutting down GANGLIA gmond: "
          # note: killproc stops all running gmond processes, not just this instance
          killproc gmond
          RETVAL=$?
          echo
          [ $RETVAL -eq 0 ] && rm -f /var/lock/subsys/gmond_${in}
        done
        ;;
  restart|reload)
        $0 stop
        $0 start
        RETVAL=$?
        ;;
  status)
        status gmond
        RETVAL=$?
        ;;
  *)
        echo "Usage: $0 {start|stop|restart|status}"
        exit 1
esac

exit $RETVAL
The final step is to bounce gmond on head_node1 and node1, as well as the gmetad daemon on ganglia_server.
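On a stock RHEL/CentOS install that boils down to something like the following; the rrds path is the default gmetad rrd_rootdir, so adjust it if yours differs:

# on head_node1 and node1
service gmond restart

# on ganglia_server
service gmetad restart

# after a minute or two gmetad should create an RRD directory for the new cluster
ls /var/lib/ganglia/rrds/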