Contents:
- Tools to run the tests
- Automatic execution script for setup_daq
- Analysis Tools
- RGANG tool introduction: command line execution on multiple hosts
- Tools on LXSHARE
- rename partition, node selection
Timing tests for the state transitions:
Timing measurements on the state supervisor transitions and the rc transitions can be performed using options in the setup_daq or play_daq scripts. Several variants are suggested:
In setup_daq the option -tt, or in play_daq the command line parameters 'time nogui', will automatically run through the states as indicated in the diagram. The transition times are measured; they are displayed after the shutdown is performed and printed in the results logs file (see setup_daq -h / play_daq -h).
The option -ut in setup_daq runs the same kind of automatic cycle, but follows a user-defined sequence of transitions (see below); the times are measured, displayed and printed in the same way.
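For example (hypothetical invocations; the partition argument and exact option order should be checked with setup_daq -h / play_daq -h):
setup_daq -tt                      # automatic timing cycle via setup_daq
play_daq myPartition time nogui    # the same cycle via play_daq (myPartition is a placeholder)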
Timing test -ut [--usercycle <.play_daq_usercycle>]
The .play_daq_usercycle file is not in the release, but you can copy it from /afs/cern.ch/user/a/atlonl/public/tools/execution/.play_daq_usercycle. The default file .play_daq_usercycle is expected to reside in your $HOME (yes, including the leading '.').
USR_RUNNING_TIME defines the time spent in the running state before stopping. The default value is 30 s.
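As a purely illustrative sketch (the real syntax is defined by the file under /afs above; shell-style assignment is assumed here), the running time could be overridden in the usercycle file as follows:
# fragment of $HOME/.play_daq_usercycle -- illustrative only
USR_RUNNING_TIME=60    # stay 60 s in the running state instead of the default 30 s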
Timing test -ut --usercycle <user_file>
The user can develop and use his/her own sequence of transitions. In order for results to be displayed and stored, he/she must provide the corresponding function. The command for setup_daq is: -ut --usercycle <path_name_to_user_timing_function>
Tests with the text_gui
The text_gui is a command line GUI which allows the user to enter commands for the state transitions. The time is displayed after each transition and stored in the results file. Either setup_daq or play_daq can be used. The text_gui is invoked with the options -ut --usercycle text_gui in setup_daq. The file text_gui must be copied to the home directory, or the full path to text_gui must be entered. The text_gui is contained in the release tdaq-01-02-00.
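For example (assuming text_gui has been copied to the home directory):
setup_daq -ut --usercycle text_gui
# or, without copying it, give the full path:
setup_daq -ut --usercycle /full/path/to/text_gui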
setup_daq_n_times:
Where can I find it? It is not yet in release tdaq-01-02-00, but you can find the script and the example described below in /afs/cern.ch/user/a/atlonl/public/tools/execution
setup_daq_n_times is a script to run setup_daq 'n' times (displayed as a loop count) for every partition listed in a partitions file, with the option of repeating indefinitely from the start of the partition list when it has reached the end. Read the help information given with the -h option for more info. An example of its use is given below for running over a list of partitions 20 times each, executing a user-defined timing script. Optionally, a short summary mail is sent when the script terminates. The command line would look like:
setup_daq_n_times -p myPartitionList -n 20 -ut --usercycle <pathname_of_usercycle_file> -m <email-address>
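The partitions file is assumed here to list one partition name per line, for example (hypothetical names):
# myPartitionList -- illustrative contents, one partition per line
part_efTest_05subFarms
part_efTest_10subFarms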
The results files produced by a series of .play_daq_usercycle runs can be analysed with the tool analyse_results_usercycle3.pl.
Where can I find it? It is not yet in release tdaq-01-02-00, but you can find the script and the example described below in /afs/cern.ch/user/a/atlonl/public/tools/analysis
DESCRIPTION
Normally, using the automatic execution procedure, a set of partitions is run n times, for example using play_daq_n_times (note that the corresponding setup_daq_n_times is in preparation).
The analysis script produces scatter plots for each measured transition (setup, boot, configure, etc.) for each partition which is referenced in the file. Note that the criterion is the datafile name, not the partition name.
In addition, the average of each partition over the respective number of runs is calculated and displayed in a comparison plot per transition for all the listed partitions, versus the number of controllers contained in the respective partition. Failures are not counted. This is particularly useful when partitions of different sizes, in terms of included controllers, have been run. The .ps file can be viewed with gv or another tool.
The analysis script produces a number of intermediate files for each partition and for the average. The .csv files are convenient for importing the sorted data into Excel to produce presentable graphs. The .dat files are created for each partition, for all partitions, and for the average of each partition. These files do not include a header line like the .csv files, but contain only the timing values. The .txt file is the file that is passed by the script to gnuplot to create the .ps file.
SYNOPSIS:
analyse_results_usercycle3.pl [-h] -f <input_file> -i <info_file> -o <output_file_name>
OPTIONS:
-h this help message
-f <input_file> the input file
-i <info_file> the file with the measurement info on the rc state transitions
-o <output_file_name> the file name for the global output files
EXAMPLE of analyse_results_usercycle3 USAGE, with measurements
from the tests on Westgrid:
./analyse_results_usercycle3.pl -f PLAY_DAQ_N_TIMES--meas-May19Final222---timing -i ./timing_info_usercycle.txt -o measure19
<input_file> PLAY_DAQ_N_TIMES--meas-May19Final222---timing
<info_file> timing_info_usercycle.txt
<output_file_name> measure19
This command will generate the following files:
measure19.csv
measure19.dat
measure19.ps
measure19.txt
PLAY_DAQ_N_TIMES--meas-May19Final222---timing_glob.csv
Res_PARTITIONS_part_efTest.data.xml.p840_05subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.TimeOutS.csv
Res_PARTITIONS_part_efTest.data.xml.p840_05subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.TimeOutS.dat
Res_PARTITIONS_part_efTest.data.xml.p840_10subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.TimeOutS.csv
Res_PARTITIONS_part_efTest.data.xml.p840_10subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.TimeOutS.dat
Res_PARTITIONS_part_efTest.data.xml.p840_15subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.TimeOutS.csv
Res_PARTITIONS_part_efTest.data.xml.p840_15subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.TimeOutS.dat
Res_PARTITIONS_part_efTest.data.xml.p840_20subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.TimeOutS.csv
Res_PARTITIONS_part_efTest.data.xml.p840_20subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.TimeOutS.dat
Res_PARTITIONS_part_efTest.data.xml.p840_30subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.Timeouts.csv
Res_PARTITIONS_part_efTest.data.xml.p840_30subFarms_20nodes_1efd_4pt_0rc_0is.ParamsOmni.Timeouts.dat
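With these files in place the plots can be regenerated or viewed directly (assuming gnuplot and gv are installed):
gnuplot measure19.txt    # re-runs the stored gnuplot commands to recreate measure19.ps
gv measure19.ps          # view the plots
The measure19.csv file can be imported into Excel; it includes a header line, unlike the .dat files.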
Note that at the time of running, only the root and intermediate controllers were counted, and not the local controllers; therefore all partitions show 2 controllers. This has been modified for release tdaq-01-02-00 in play_daq.
For the corresponding Excel graphs, an additional column containing the number of EF processes per partition has been created by hand in the Excel file only. It can be viewed here:
timingSeries1&2.xls
RGANG
RGANG TOOL: A SMALL INTRODUCTION (15/12/2004)
(last update: 06/06/2005)
TOOL TO EXECUTE COMMANDS OR SCRIPTS ON SEVERAL MACHINES IN PARALLEL
[Note: The following commands and scripts have not yet been tested on more than 20 machines in parallel in the testbeds.
These tests are still pending until machines become available.]
IMPORTANT NOTES:
The file rgang.py, as well as the scripts described below, should be in the PATH in order to work properly.
All of them can be found under /afs/cern.ch/user/s/sobreira/public.
INSTRUCTIONS:
RGANG options (switch):
--nway=1 ----> 'serial' mode (one machine at a time)
--nway=200 (DEFAULT) ----> very fast (maximum number of machines executing in parallel)
A typical RGANG command has the following syntax:
rgang.py farmlet_name 'command arg1 arg2 ... argn'
--> farmlet_name manually entered:
Range of machines: rgang.py "pcatb{132-140}" 'command arg1 arg2 ... argn'
Set of machines: rgang.py "pcatb{132,140}" 'command arg1 arg2 ... argn'
--> farmlet_name can also be a file, containing the machines' names:
#Comment
pcatb132
pcatb133
pcatb134
pcatb135
pcatb136
pcatb137
pcatb138
pcatb139
pcatb140
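The file is then passed in place of the machine list, for example:
rgang.py my_farmlet 'command arg1 arg2 ... argn'
where my_farmlet is the file shown above.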
RGANG can be very useful for performing commands on several machines (executing in parallel).
Typical commands performed may be:
> Copying files to several nodes (rgang's copy mode - note the -c switch after rgang):
rgang.py -c "pcatb{132-140}" origin_file destination_file
Example: rgang.py -c "pcatb{132-140}" /tmp/test_orig /tmp/test_dest
. Check that the file has really (it should!) been copied: rgang.py -x "pcatb{132-140}" ls /tmp/test_*
(without the -c switch, since we don't want to copy anything; copy mode is not required)
. Remove the file: rgang.py -x "pcatb{132-140}" rm /tmp/test_dest
> Executing files or commands on several nodes (not in copy mode, no -c switch)
The -x switch suppresses some "annoying" messages
rgang.py -x "pcatb{131-140}" path_file_to_execute
Examples:
rgang.py -x "pcatb{132-140}" rm /tmp/test_dest
rgang.py -x "pcatb{132-140}" ~/scripts/testscript ---> Range of machines
rgang.py -x "pcatb{132,137,139}" ~/scripts/testscript ---> Set of machines
rgang.py -x "pcatb{131-140}" ls /tmp
rgang.py -x "pcatb{132-140}" 'ls /tmp | grep test_dest'
USEFUL COMMAND:
rgang.py -x "pcatb{131-140}" ps -u username ---> displays all processes belonging to the user username on the given set of nodes
> Measuring command execution performance:
time rgang.py -x --nway=4 pcatb{131-140} /home/sobreira/rgang/bin/script ---> slow
time rgang.py -x --nway=10 pcatb{131-140} /home/sobreira/rgang/bin/script ---> faster (more done in parallel)
time rgang.py -x pcatb{131-140} /home/sobreira/rgang/bin/script ---> much faster (default nway=200)
########################################### SCRIPTS DEVELOPED USING/FOR (ON TOP OF) RGANG ###########################################
> "show" script
it can be performed using rgang (several machines in paralell) or in just one
machine (regular usage)
The script supports the following options:
-h option: display help and some examples
#---------------------------------INCLUDE USER Section----------------------------------#
-u option: show all processes belonging to the specified user(s)
Examples:
#RGANG USAGE: rgang.py -x pcatb{131-140} show -u "werner\ sobreira\ root"
#REGULAR USAGE: show -u werner\ sobreira\ root
#---------------------------------EXCLUDE USER Section----------------------------------#
-v option: show all users and their processes, except those referring to the specified users
Examples:
#RGANG USAGE: rgang.py -x pcatb{131-140} show -v "werner\ root\ sobreira"
#REGULAR USAGE: show -v werner\ root\ sobreira
#--------------------------------INCLUDE PATH Section-----------------------------------#
-p option: show all users running the processes referred to by their path
Examples:
#RGANG USAGE: rgang.py -x pcatb{131-140} show -p "/sbin\ pmg\ /usr\ /usr/libexec/post\
/afs/cern.ch/"
#REGULAR USAGE: show -p /sbin\ pmg\ /usr\ /usr/libexec/post
#--------------------------------EXCLUDE PATH Section-----------------------------------#
-q option: show all users running any processes except those which are specified
Examples:
#RGANG USAGE: rgang.py -x pcatb{131-140} show -q "pmg\ /afs\ /sbin\ /usr\ crond\
ypbind\ /afs"
#REGULAR USAGE: show -q pmg\ /afs\ /sbin\ /usr\ crond\ ypbind
NOTE: Just the beginning of the arguments (users or paths) needs to be specified, and multiple arguments (users or paths) should be separated by "\ " (backslash and space).
###################
> "collect" script
It can be only be performed using rgang utility.
One run of the script "overlaps" a previous run, so that previous a node's name
directory can be deleted if
that node doesn't contain any searched file
NOTE: * must be preceed by an \ when specifying more than one file
"Collect" certain file(s) from several nodes and send a copy to a specified node
in a specified directory
Files are kept under a directory with the name being the provenience node name
Syntax: rgang.py -x farmlet_name collect path_file destination_node:directory_to_place_files
Examples:
rgang.py -x pcatb{131-140} collect "/tmp/expfile\*" $HOST:~/exp
(collects all files in the /tmp directory starting with 'expfile' and places them on the current machine in a directory called exp)
rgang.py -x pcatb{131-140} collect "/tmp/expfile\*" pcatb135:/tmp
(collects all files in the /tmp directory starting with 'expfile' and places them on the pcatb135 machine in the /tmp directory, under a directory named after each source node)
rgang.py -x pcatb{131-140} collect "/tmp/expfile4.teste" $HOST:~/exp
(collects the file /tmp/expfile4.teste and places it in ~/exp/pcatbxx)
##################
> "machines_up" script
It uses rgang internally; therefore, rgang.py must be in the PATH in order to
perform this script
Shows machines which machines are up in a given set of nodes and stores then in
a given file, so that
this file can be used to specify instructions using rgang
Syntax: machines_up -s set_of_machines_to_analyze [-f file_to_store_machines_up]
Examples:
machines_up -h ---->(help) to display the usage
machines_up -s "pcatb{131-180}" -f /tmp/up
machines_up -s "pcatb{131,135,152,180}" -f ~/up_machines
Tools on LXSHARE
The monitoring system gives the status of nodes, CPU consumption and network traffic; see http://ccs003d.cern.ch/lemon-status/
The Quattor system manages the nodes for software installation, repository updates, the servers and daemons to run, etc., according to an installation list.
Monitoring of the farm (to be confirmed)
-----------------------
The subset of nodes that will be assigned to us should be available in the monitoring system as Atlas-tdaq.
To be made available:
- a file containing the host names of the nominal list of nodes belonging to Atlas-tdaq, one per line.
- a file containing the host names of the actual list of nodes belonging to Atlas-tdaq, one per line. 'actual' means that the node is in a state in which it responds to ssh. It is important that this file is automatically updated when a change in actual availability occurs.
rename partition: copies originalPartitionName to newPartitionName