Monthly Archives: March 2014

Groupnames with pdsh

pdsh is one of my favorite utilities for fiddling with Hadoop clusters. The parallel distributed shell fans a command out to the machines you name on the command line with the -w option, e.g. pdsh -w server1,server2,server3 "ls -l". Note that the host list is comma-separated. There is also support for host ranges, which lets you refer to the machines in the example with the shorthand server[1-3] (quote it if your shell treats square brackets as a glob pattern). You can even exclude machines with the -x option, e.g. -x server2.
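As a rough sketch of the three forms described above, something like the following could be used. The host names server1 to server3 are placeholders, and run() is a small helper invented for this example so that the commands are printed rather than executed by default; it is not part of pdsh.

```shell
# Sketch only: server1..server3 are placeholder host names.
# run() is a helper made up for this example. It prints each command by
# default; set DRY_RUN=0 to execute it (needs pdsh and reachable hosts).
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Explicit host list: pdsh expects the names separated by commas.
run pdsh -w server1,server2,server3 "ls -l"

# The same three machines as a host range. The single quotes stop the
# shell from treating the brackets as a glob pattern.
run pdsh -w 'server[1-3]' "ls -l"

# The range again, minus server2.
run pdsh -w 'server[1-3]' -x server2 "ls -l"
```

Once the printed commands look right, dropping the run prefix (or setting DRY_RUN=0) sends them to the real machines.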

These features are great, but typing all those server names over and over gets a bit tedious, even in the short form. That is when you should start using groupnames. You can create a file in the ~/.dsh/group directory, or in the /etc/dsh/group directory. Name the file after the group you want to create and put the machine names in it, one per line. For example, the file ~/.dsh/group/all could contain a list of all the machines in your cluster, and you would invoke it as pdsh -g all ls to run an ls command on each server in the group. You can still exclude individual machines with the -x option, or an entire groupname with the -X option.
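Setting up a group could look roughly like this. The group name demo and the three host names are made up for illustration, and the script overwrites any existing group file of that name. It also assumes your pdsh build reads groups from ~/.dsh/group, as described above.

```shell
# Sketch only: "demo" and server1..server3 are hypothetical names.
group_dir="$HOME/.dsh/group"

# Create the per-user group directory if it does not exist yet.
mkdir -p "$group_dir"

# One machine name per line; the file name becomes the group name.
printf '%s\n' server1 server2 server3 > "$group_dir/demo"

# Show what was written.
cat "$group_dir/demo"
```

With that file in place, pdsh -g demo "ls -l" should target all three machines, and adding -x server2 would skip one of them.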