There are currently 3 levels of script to generate and run EF partitions. The first 2 levels are used to generate the configurations. The top level uses the bottom 2 levels to automatically generate many different configurations, run them and take timing measurements. This document describes how to use the scripts.
The configurations generated are standalone EF partitions which use SFI
and SFO emulators. The SFI emulators generate dummy data which is sent to
the EFD applications in the EF, which forward the events to dummy PT applications
(i.e. PTs which do not run physics algorithms, but simply implement
a configurable "burn time" and acceptance rate). The events are passed back
to the EFD and then forwarded to the SFO emulators (they are not written
to disk). The normal configuration is 1 EFD and ~4 PTs per processing node,
but multiple EFDs can be run per node in order to simulate larger systems.
The EF is split into a number of sub-farms, which will probably consist of
~30 processing nodes per sub-farm. However, one of the main aims of the
large scale tests is to investigate how the sub-farm configuration (number
of sub-farms, number of nodes per sub-farm) affects system performance.
The first aim of the large scale tests is to verify that various EF configurations may be cycled through the entire DAQ FSM (finite-state machine) without error. That is, it must be possible to launch all EF applications, configure them, bring them to the running state, stop them running, unconfigure them and subsequently terminate them.
Once this functional testing is done, the next step is to make timing measurements
for each of the various state transitions for a sample of
representative configurations. The configurations can either be generated
manually or automatically using the highest level script.
Checks on the dataflow behaviour of large-scale EF configurations can also be made. For a number of large scale configurations it should be verified, whilst in the "running" state, that all PTs are receiving events and that, assuming each sub-farm is of a similar size and homogeneous, the total event throughput of each sub-farm is approximately equal.
Finally a long term stability test could be made with a couple of representative large-scale configurations to verify that event throughput stays approximately constant over time.
The following sections describe in detail how to generate and run standalone
EF configurations.
IMPORTANT DISCLAIMER:
The scripts described in this document are under active development and
their current form is due to rather disorganised evolution over the last few months.
We hope to revise the scripts over the summer into a more coherent structure.
In the meantime, the method might not be very beautiful, but it works. Obviously
all constructive feedback on how to improve the procedure and ease-of-use
of the scripts is welcomed (contacts are Per Werner, Gokhan Unel, Sarah Wheeler).
Note: I'm assuming tcsh throughout.
1. Download combined release tdaq-01-01-00 from: http://atlas-onlsw.web.cern.ch/Atlas-onlsw/download/download.htm and install all the patches. Bryan has already installed tdaq-01-01-00 and patches in his home directory on WestGrid (this includes a private patch from Marc to solve the pmg problem caused by the underscore in the name of the ice nodes):
/global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed
2. The database generation scripts are in the DBGeneration package but are not up-to-date in the release. They need to be checked out from cmt and changed slightly. Bryan also has information on how to check out from cmt locally (i.e. from a machine without direct afs access). Perhaps he could provide a link? Here is a recipe to do this from scratch:
Setup cmt for the tdaq-01-01-00 release (this must be done from a machine with CERN afs access):
source /afs/cern.ch/atlas/project/tdaq/cmt/bin/cmtsetup.csh tdaq-01-01-00
make a working directory, e.g.
mkdir ~/DC
check out DBGeneration package into it (this has been under active development - so I suggest you check out a specific tagged version, v1r1p33, which has been tested and is consistent):
cd ~/DC
cmt co -r v1r1p33 DAQ/DataFlow/DBGeneration
For running on WestGrid you will have to make some small changes.
The HW_Tag values in gen_nodes.py need to be set back to rh73; every instance
of:
<attr name="HW_Tag" type="enum">"i686-slc3"</attr>
should be changed to:
<attr name="HW_Tag" type="enum">"i686-rh73"</attr>
In gen_partition.py you will need to add a value for the RepositoryRoot
in order to pick up the private patches currently installed in my home
directory on WestGrid (a script that applies both of these changes is sketched after the example below).
Replace:
'<attr name="RepositoryRoot" type="string">""</attr>'
with:
'<attr name="RepositoryRoot" type="string">"/global/home/swheeler/InstallArea"</attr>'
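If you prefer not to edit the two files by hand, the substitutions above can be applied with a short Python script. This is only a minimal sketch (the file names and strings are exactly those given above, the script name is hypothetical); it keeps a .orig backup of each file before rewriting it:
# patch_for_westgrid.py - apply the two WestGrid-specific edits described above
# (hypothetical helper script, not part of the DBGeneration package)
import shutil

def replace_in_file(filename, old, new):
    shutil.copy(filename, filename + '.orig')   # keep a backup of the original file
    text = open(filename).read()
    open(filename, 'w').write(text.replace(old, new))

# set the HW_Tag values back to rh73 in gen_nodes.py
replace_in_file('gen_nodes.py',
                '<attr name="HW_Tag" type="enum">"i686-slc3"</attr>',
                '<attr name="HW_Tag" type="enum">"i686-rh73"</attr>')

# point the RepositoryRoot at the private patch area in gen_partition.py
replace_in_file('gen_partition.py',
                '<attr name="RepositoryRoot" type="string">""</attr>',
                '<attr name="RepositoryRoot" type="string">"/global/home/swheeler/InstallArea"</attr>')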
For operational monitoring, the Gatherer segment should be included when the database is generated; modify gen_db.sh to include the Gatherer .xml file:
echo '<info name="" num-of-includes="5" num-of-items="'$objs'" oks-format="extended" oks-version="3.0.0" created-by="gensetup scripts" />'
cat <<EOF
<include>
<file path="DFConfiguration/schema/df.schema.xml"/>
<file path="DAQRelease/sw/repository.data.xml"/>
<file path="daq/segments/setup.data.xml"/>
<file path="DFConfiguration/data/efd-cfg.data.xml"/>
<file path="DFConfiguration/segments/gathLocalhost.data.xml"/>
</include>
Make sure the segment is included in the partition but disabled (instructions
on how to enable it come later, in the operational monitoring section).
Change the following (this is a hack - it should really use number_of_segments+1,
but I don't know Python; a sketch of a more general fix follows the example below):
#contains
print ''' <rel name="Segments" num="%d">''' % len(segments)
for seg in segments:
print ''' "Segment"''','''"'''+seg+'''"'''
print ''' </rel>''',
to:
#contains
print ''' <rel name="Segments" num="2">'''
for seg in segments:
print ''' "Segment"''','''"'''+seg+'''"'''
print ''' "Segment"''','''"gathLocalhost"'''
print ''' </rel>''',
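For reference, a cleaner alternative to the hard-coded "2" would be to append the Gatherer segment name to the list of segments before the count is printed, so that the number_of_segments+1 behaviour falls out automatically. This is only a sketch and has not been tested against the rest of gen_partition.py:
#contains
segments = segments + ['gathLocalhost']   # include the Gatherer segment in the list
print ''' <rel name="Segments" num="%d">''' % len(segments)
for seg in segments:
    print ''' "Segment"''','''"'''+seg+'''"'''
print ''' </rel>''',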
To make sure it is disabled, change the line:
<rel name="Disabled" num="0"></rel>
to:
<rel name="Disabled" num="1">
"Segment" "gathLocalhost"
</rel>
Note: I have already checked out, copied and made the above changes to v1r1p33 of DBGeneration; you may find it in my home directory on WestGrid:
/global/home/swheeler/DC/DBGeneration
The DC directory also contains some other directories which are needed
by the create_ef.bash script which is described in the next section (multihost
partition).
3. Go to the gensetupScripts sub-directory of the DBGeneration package and look for the file: input_ef_localhost.cfg. This file is the input to the first-level database generation script. It is currently configured to create a localhost configuration to run on nunatak2.westgrid.ca. If you want to run on a different host change nunatak2.westgrid.ca to the hostname. Note that the hostname must be the same as the result of the 'hostname' command on the machine on which you wish to run and may or may not be qualified with a domain name, e.g.
nunatak1|swheeler|55> hostname
nunatak1.westgrid.ca
ice17_13|1> hostname
ice17_13
If this is not correct the pmg_agent will not be able to start when you
try to run your partition.
4. In the same directory type:
./gensetup efTest input_ef_localhost.cfg
This will generate the part_efTest partition in the part_efTest.data.xml file.
5. Setup the combined release by running the script in the tdaq release installed directory e.g.:
source /global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed/setup.csh
6. Define the following environment variables:
setenv TDAQ_PARTITION part_efTest
setenv TDAQ_DB_PATH $YOUR_DB_PATH:$TDAQ_DB_PATH
setenv TDAQ_DB_DATA $YOUR_DB_PATH/$TDAQ_PARTITION.data.xml
setenv TDAQ_IPC_INIT_REF file:$YOUR_IPC_DIR/ipc_init.ref
Note that the use of environment variables has changed in this release
(it is somewhat simpler than before). For more information read the release
notes available from the download page. For instance the .onlinerc file
is no longer used.
When there is no shared file system, or you do not wish to use it, the ipc
reference can be written straight into the TDAQ_IPC_INIT_REF environment variable.
The recipe is the following. After having run the tdaq setup script
and before running play_daq, start the global ipc server, giving it the port
number to use (in this case 12345):
ice35_11|gensetupScripts|11> ipc_server -P 12345 &
[1] 23237
ice35_11|gensetupScripts|12> 8/4/05 12:51:49 :: ipc_server for partition "initial" has been started.
Make the reference, giving the machine name and port number and set the TDAQ_IPC_INIT_REF environment variable to this value:
ice35_11|gensetupScripts|12> ipc_mk_ref -H ice35_11 -P 12345
corbaloc:iiop:ice35_11:12345/%ffipc/partition%00initial
ice35_11|gensetupScripts|13> setenv TDAQ_IPC_INIT_REF corbaloc:iiop:ice35_11:12345/%ffipc/partition%00initial
Note: you can use the ipc_ls command to list the applications registered with the ipc_server and where they are running (call it with the -h option for an explanation):
ice35_11|gensetupScripts|14> ipc_ls -a -l -R -t
Initial reference is corbaloc:iiop:ice35_11:12345/%ffipc/partition%00initial
Connecting to the "initial" partition server ...done.
Getting information for the "initial" object...done.
Getting list of partitions ...done.
Getting list of object types ...done.
initial ice35_11 23237 swheeler 8/4/05 12:51:49
7. If you want to browse the configuration use the oks_data_editor:
oks_data_editor $TDAQ_DB_DATA
For a graphical view of the partition select Edit/Partition. Left-click on objects to see their attributes, right-click on an object for a menu of possible actions (e.g. show relationships for that object). When re-sizing the display window it often needs to be refreshed. To do this, right-click on blank space and select Refresh.
For more information on the oks_data_editor a good (and still up-to-date) introduction can be found in slides presented at the Second Testbeam training for teams held May 2004.
8. Now you can run play_daq:
play_daq $TDAQ_PARTITION
Note: If you get inscrutable errors on starting play_daq it is often due
to a problem in the database. Running the oks_data_editor gives much more
verbose error information and should help to track down the problem. Occasionally
I have noticed that when the IGUI starts (especially when using larger configurations)
the DAQ Supervisor State will be displayed as "Configuring" and remain like
this. I suspect in these cases there is a problem reading the database. Usually
exiting and restarting clears the problem. To be understood.
9. When the IGUI starts, take the partition through the DAQ FSM. On Boot all the EF applications are started. On Configure, the applications read configuration information from the database and perform the necessary actions. On Start the SFI emulator will start sending dummy events. You can check that data is flowing through the EFD by starting the IS monitor (note that when using an SFI emulator the event statistics are not written to the main IGUI panel - this is only done when using the real SFI). Click the IS button at the top of the IGUI. When the IS panel appears click on "Select Partition" and select the name of your partition. The list of IS servers running in that partition will then be displayed. Select the EF IS server by clicking on "EF-IS1". Select the "Window" icon at the top of the panel (show available information objects). This will give a list of all the information in the server. You will see entries for one EFD and 4 PTs. If you click on the EFD the statistics for that EFD will appear in the bottom panel of the window. The instantaneous throughput should be non-zero and the EventsIn and EventsOut should both be increasing. Here is an example of the IGUI in the "running" state for an EF localhost partition, showing the IS information for the EFD:
To terminate the partition, click on Stop to stop the SFI emulator sending
dummy events. On Unload the EF applications will undo the actions they did
on Configure. On Shutdown the EF applications are terminated. Click on Exit
at the top right-hand corner of the IGUI to kill the IGUI and shutdown the
Online SW infrastructure (various servers).
After exiting, the log files for the applications are archived on the machine
on which they ran in the directory:
/tmp/backup/${USER}
10. Timing information for this partition can be obtained by running play_daq
with the time option; the no_gui option means that the IGUI is not displayed:
play_daq $TDAQ_PARTITION time no_gui
One cycle of the DAQ FSM (including a Pause and Resume) is performed (more
cycles can be specified, I think) and the timing information is printed at
the end:
Executing timing tests...
Booting DAQ ...
Waiting for RC to be in IDLE state ...
Start DAQ ...
Pause DAQ ...
Start DAQ ...
Stopping DAQ ...
Results: 0.217637 13.4832 2.67266 23.7074.
Stopping partition...
Stopping the DSA supervisor, this might take some time (30s timeout)!
4/4/05 12:54:49 INFO [dsa_stop_supervisor] DSA Supervisor safely stopped.
Stopped DSA Supervisor.
Stopped RC IS Monitor.
Stopped RC Tranistion Monitor.
Stopped MRS Audio Receiver.
Stopped MRS Receiver.
Problem stopping CDI via PMG (return value = 1)
Removed partition from Resource Manager.
PMG Agents killed all non-managed processes for partition part_efLocalhost.
Stopping all IPC based servers...
IPC servers stopped
*************************************************************
dsashutdown_start_time 64
shutdown_stop_time 68
boot_start_time 21
Timing results:
---------------
Enter user comment for results log file (return = none):
sarahs test results
***
OnlineSW Timing tests for partition part_efLocalhost on ice17_13
Mon Apr 4 12:55:04 PDT 2005
1 hosts defined in this database
3 run control applications used in this database
Command line parameters were: OBK: CDI: yes
pmg testTime: 0 s
backend setup: 21 s
pure setup: 21 s
shutdown: 4 s
backend close: 2 s
boot: 3 s
cold start: 5 s
cold stop: 28 s
luke warm start: 2.67266 s
luke warm stop: 23.7074 s
warm start: 0.217637 s
warm stop: 13.4832 s
User comment:
sarahs test results
Test completed successfully in 70 seconds
The timing information is also written into the file:
/tmp/results/timing_test_result_list.out
on the machine on which you are running. Note that if you run a number
of timing tests, the results will be concatenated in this file.
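Since successive runs are simply appended to this file, a small script can help to pull the numbers out for comparison. The sketch below is a hypothetical helper which assumes the format shown above (result lines such as "warm start: 0.217637 s"); it prints the header identifying each test together with the value of each timed quantity:
# summarise_timing.py - extract timing values from timing_test_result_list.out
# (hypothetical helper; assumes result lines of the form "<label>: <value> s")
import re
import sys

filename = '/tmp/results/timing_test_result_list.out'
if len(sys.argv) > 1:
    filename = sys.argv[1]

pattern = re.compile(r'^\s*([A-Za-z][A-Za-z ]*):\s+([0-9.]+)\s+s\s*$')

for line in open(filename):
    if 'Timing tests for partition' in line:
        print(line.strip())                        # header identifying the test
    match = pattern.match(line)
    if match:
        print('  %-20s %s s' % (match.group(1), match.group(2)))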
The correspondence of these times with the current DAQ FSM transitions
is a little convoluted. The following diagram should help:
1. On WestGrid it is first necessary to reserve the nodes via PBS. An example script to run a "sleep" job on the requested number of WestGrid resources can be found in:
/global/home/caronb/PBS/atlasEF_res.sh
To submit the job:
qsub atlasEF_res.sh
Where necessary, the number of nodes required and the total wall-time
for the reservation can be modified by changing the relevant lines in the
script:
#PBS -l walltime=24:00:00,nodes=5:ppn=2
will reserve both processors in 5 nodes for 24 hours, and
sleep 86400
will make the submitted job sleep for 24 hours (86400 seconds).
To check the status of the jobs submitted by Bryan:
qstat -nu caronb
Once the status changes to "running" indicated by an R entry in the output,
the list of allocated nodes on which the job is running will be shown, for
example:
teva.weteva.westgrid.ubc:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
--------------- -------- -------- ---------- ------ --- --- ------ ----- - -----
1148531.teva.we caronb ice atlasEF_re -- 5 -- -- 72:00 R 03:27
ice17_13/1+ice17_13/0+ice17_9/1+ice17_9/0+ice17_1/1+ice17_1/0+ice12_5/1
+ice12_5/0+ice1_8/1+ice1_8/0
1149095.teva.we caronb ice atlasEF_re -- 5 -- -- 72:00 R 01:58
ice21_12/1+ice21_12/0+ice15_13/1+ice15_13/0+ice8_9/1+ice8_9/0+ice7_3/1
+ice7_3/0+ice4_3/1+ice4_3/0
1149097.teva.we caronb ice atlasEF_re -- 5 -- -- 72:00 Q --
--
1149099.teva.we caronb ice atlasEF_re -- 5 -- -- 72:00 Q --
--
1149100.teva.we caronb ice atlasEF_re -- 5 -- -- 72:00 Q --
The first two requests are running and the last three are queued. The
names of the nodes on which the jobs are running are displayed, once per
CPU.
Once the list of reserved nodes is known, the node names should be written
into a .txt file (one might envisage a script to do this automatically; a sketch is given below).
This file will act as input to the next-level database configuration script
(create_ef.bash). The purpose of this script is to automate the generation
of the .cfg file which is in turn the input to the gensetup script.
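As a starting point for such a script, the sketch below runs 'qstat -nu caronb', picks out the node entries of the form icexx_yy/cpu from output like that shown above, and writes each node name once to machines-<testbed>.txt. This is a hypothetical helper based only on the qstat output format shown here:
# make_machines_file.py - build machines-<testbed>.txt from the output of 'qstat -nu caronb'
# (hypothetical helper; assumes node entries of the form "ice17_13/1" joined by '+')
import os
import re

testbed = 'ice'
output = os.popen('qstat -nu caronb').read()

nodes = []
for entry in re.findall(r'[\w\-]+/\d+', output):   # e.g. "ice17_13/1"
    name = entry.split('/')[0]
    if name not in nodes:                          # list each node only once
        nodes.append(name)

machines = open('machines-%s.txt' % testbed, 'w')
for name in nodes:
    machines.write(name + '\n')
machines.close()
print('%d nodes written to machines-%s.txt' % (len(nodes), testbed))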
Note from Bryan: it's probably more efficient to put in several small requests
for reservation rather than one big one.
2. It gets a bit messy here: we have to run another setup script to create
the correct environment in which to run create_ef.bash. You could in fact
do this at the very beginning, i.e. before all the steps listed in the previous
section (running the localhost partition), but I thought I'd introduce it here
as it's a bit less confusing (I hope). First set an environment variable
to point to the scripts directory (in my home directory on WestGrid) and
then run the setup script:
ice17_13|gensetupScripts|47> setenv DF_SCRIPTS /global/home/swheeler/DC/scripts/v3r1p18
ice17_13|gensetupScripts|48> source $DF_SCRIPTS/setup.csh -o /global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed -t i686-rh73-gcc32-opt -r
DataFlow setup sucessful:
TDAQ_INST_PATH => /global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed
DF_INST_PATH => /global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed
CMTCONFIG => i686-rh73-gcc32-opt
TDAQ_DB_PATH => /global/home/swheeler/DC/installed/share/data:/global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed/share/data:/home/atdsoft/releases/tdaq-01-01-00/installed/share/data:/global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed/databases:/global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed/databases
DF_WORK => /global/home/swheeler/DC
TDAQ_LOGS_PATH => /global/home/swheeler/tdaq_logs
CMTROOT => /afs/cern.ch/sw/contrib/CMT/v1r16
TDAQ_IPC_INIT_REF => file:/global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed/com/ipc_root.ref
Basically this is another way of setting up the tdaq-01-01-00 release,
but it also sets up some more environment variables needed by the create_ef.bash
script. If you want to learn more about $DF_SCRIPTS/setup.csh, call it with
the -h switch (but at this stage you probably don't need to).
3. You are now ready to run create_ef.bash. Given the example shown in
point 1, the contents of your machines text file should look like:
ice17_13|gensetupScripts|33> cat machines-ice.txt
ice17_13
ice17_9
ice17_1
ice12_5
ice1_8
ice21_12
ice15_13
ice8_9
ice7_3
ice4_3
The naming convention for the file is machines-<name of testbed>.txt.
If a machine becomes unavailable for some reason it can be commented out
temporarily from the list by preceding the name with a '#' followed by a
mandatory blank. So if, say, ice17_9 and ice17_1 were unavailable the file would
look like this:
ice17_13
# Following nodes are dead
# ice17_9
# ice17_1
ice12_5
ice1_8
ice21_12
ice15_13
ice8_9
ice7_3
ice4_3
Use create_ef.bash to generate a .cfg file. The valid options for create_ef
can be displayed using the -h switch:
# A script to generate EF partitions with SFI/O emulators
#
# ***** Note that for 'switch' options 0=false=off and 1=true=on *****
#
# -h This help text
# -D show DEBUG info [Quiet]
# -t <testbed> testbed [32]
# -x 3 RC levels (one ctrl/EFD) [2 levels] [0]
# -H <play_daq host(NOT in machine file)> node starting play_daq [ice17_13]
# -c <EFDConfiguration> EFD configuration name [efd-cfg-in-ext-out]
# -e <#EFDs per node> # of EFDs per node [1]
# -f <EFDs per sub farm> #EFDs per sub farm [32]
# -s <PT result size> PT result size in bytes [1024]
# -b <PT burn time> PT burn time in us [1000000]
# -a <PT accept rate> PT accept rate, 0.0-1.0 [0.1]
# -p <#PTs per EFD> # of PTs per EFD [4]
# -F <no of EF sub farms> # of EF sub farms [1]
# -I One IS server per sub-farm [0]
# -d <ISUpdateInterval in seconds> update interval for IS objects [20]
# -S Stop after cfg, dont create xml [0]
# -u Use Dedicated EFD nodes from the hostlist [0]
#
# The name of the partition (and cfg) file reflects the settings for -F <F> -f <f> -e <e> -p <p> -x -I
# so that the cfg file would be <F>SF<f>x<e>xEFD<p>PT_<logical>RC_<logical>IS,
# where <logical> is the logical values 0 or 1, see top of this help text.
# For these settings the first base name would be 1SFx32x1EFDx4PT_0RC_0IS.cfg
#
An example:
ice17_13|gensetupScripts|70> ./create_ef.bash -t ice -f 4 -F 1
INFO create_ef.bash: *** One controller/EFSubfarm ***
./create_ef.bash: line 1: ANY_HOST[]: bad array subscript
INFO find_nodes(): A total of 10 hosts found in machines-ice.txt. Their use is:
INFO find_nodes(): 10 ANY_HOST hosts found
INFO create_ef.bash: 10 nodes required for configuration
INFO create_ef.bash: part_1SFx4x1EFDx4PT_0RC_0IS: 1 subfarms, 4 EFD nodes, 1 EFDs/node, 4 PTs/EFD
INFO create_ef.bash: nbEFFarms=1, nbEFDnodes=4, TOTAL=24 applications on 10 nodes
INFO create_ef.bash: generating partition part_1SFx4x1EFDx4PT_0RC_0IS, gensetup output to /tmp/createDB_swheeler.out/cfg
reading node info from: /global/home/swheeler/DC/DBGeneration/v1r1p33/gensetupScripts/machines-ice.txt
found 0 nodes in file
Setting binary tag to i686-rh73-gcc32-opt.
Generate DCAppConfig and DC_ISResourceUpdate for EF-IS1.
Dummy SFI SFI-1 has address ice12_5:10000.
Dummy SFO SFO-1 has address ice1_8:11000.
Generate 1 EFDs on 1 nodes.
SFIs are: SFI-1 SFOs are: SFO-1
Generate 1 EFDs on 1 nodes.
SFIs are: SFI-1 SFOs are: SFO-1
Generate 1 EFDs on 1 nodes.
SFIs are: SFI-1 SFOs are: SFO-1
Generate 1 EFDs on 1 nodes.
SFIs are: SFI-1 SFOs are: SFO-1
Generate EF_SubFarm 1 with 4 EFDs and 16 PTs.
Controller will run on ice17_1.
IS server number is 1.
Generate top segment for EF.
We have 1 sub-farms
and 1 EF-IS servers
and 2 other applications.
Top EF controller on ice4_3.
Generating partition object part_efTest
Verification of timeouts.
Parameters found:
Timeouts verified.
Verification of ROS memory clears.
Parameters found:
ROS memory clears verified.
Done!
will use the list of machines in machines-ice.txt to create a 1 sub-farm
system with 4 EFDs. The rest of the parameters will be the defaults shown
above. Before the script creates the .cfg it will calculate the number of
hosts required for the specified configuration and exit with an error if
there are not enough in the machines.txt file (there is a bug in the script
which slightly overestimates the number of machines required for a configuration
- to be fixed). I have changed the create_ef script very slightly to always
write the .xml partition file with the same name, part_efTest.data.xml (this
is my personal bias when running partitions by hand). For automatic generation/running
we probably should use the default name (which gives a name based on a summary
of the actual configuration, in the above case it would be: part_1SFx4x1EFDx4PT_0RC_0IS,
also see explanation in help text) as this makes it easier to trace logging
information following automatic running (but this is for later). Note that
there are errors reported by the script which I suspect are due to the
unusual naming scheme of the ice nodes: to be investigated. The resulting
files look fine though.
4. Make sure your running environment is set correctly:
setenv TDAQ_PARTITION part_efTest
setenv TDAQ_DB_PATH $YOUR_DB_PATH:$TDAQ_DB_PATH
setenv TDAQ_DB_DATA $YOUR_DB_PATH/$TDAQ_PARTITION.data.xml
setenv TDAQ_IPC_INIT_REF file:$YOUR_IPC_DIR/ipc_init.ref
5. Run play_daq as described in the previous section. If you select the pmg panel after the Boot command you will now see all the pmg agents on the allocated nodes. Click on any agent to see the applications running on that node. See the screenshot below for an example:
6. One aim of the large scale tests is to see whether there are timing
penalties imposed by having a 3-tier as opposed to 2-tier run control hierarchy.
It is very simple to switch between these two configurations using the -x
switch of create_ef. Without it (as above) it generates a .cfg file for a
2-tier hierarchy; with it (as below) it generates a 3-tier hierarchy:
ice17_13|gensetupScripts|105> ./create_ef.bash -t ice -f 4 -F 1 -x
INFO create_ef.bash: *** One controller/EFSubfarm AND one controller/EFD***
./create_ef.bash: line 1: ANY_HOST[]: bad array subscript
INFO find_nodes(): A total of 10 hosts found in machines-ice.txt. Their use is:
INFO find_nodes(): 10 ANY_HOST hosts found
INFO create_ef.bash: 10 nodes required for configuration
INFO create_ef.bash: part_1SFx4x1EFDx4PT_1RC_0IS: 1 subfarms, 4 EFD nodes, 1 EFDs/node, 4 PTs/EFD
INFO create_ef.bash: nbEFFarms=1, nbEFDnodes=4, TOTAL=24 applications on 10 nodes
INFO create_ef.bash: generating partition part_1SFx4x1EFDx4PT_1RC_0IS, gensetup output to /tmp/createDB_swheeler.out/cfg
reading node info from: /global/home/swheeler/DC/DBGeneration/v1r1p33/gensetupScripts/machines-ice.txt
found 0 nodes in file
Setting binary tag to i686-rh73-gcc32-opt.
Generate DCAppConfig and DC_ISResourceUpdate for EF-IS1.
Dummy SFI SFI-1 has address ice12_5:10000.
Dummy SFO SFO-1 has address ice1_8:11000.
3rd level CTRL ['ice21_12']
Generate 1 EFDs on 1 nodes.
SFIs are: SFI-1 SFOs are: SFO-1
3rd level CTRL ['ice15_13']
Generate 1 EFDs on 1 nodes.
SFIs are: SFI-1 SFOs are: SFO-1
3rd level CTRL ['ice8_9']
Generate 1 EFDs on 1 nodes.
SFIs are: SFI-1 SFOs are: SFO-1
3rd level CTRL ['ice7_3']
Generate 1 EFDs on 1 nodes.
SFIs are: SFI-1 SFOs are: SFO-1
Generate EF_SubFarm 1 with 4 EFDs and 16 PTs.
Controller will run on ice17_1.
IS server number is 1.
Generate top segment for EF.
We have 1 sub-farms
and 1 EF-IS servers
and 2 other applications.
Top EF controller on ice4_3.
Generating partition object part_efTest
Verification of timeouts.
Parameters found:
Timeouts verified.
Verification of ROS memory clears.
Parameters found:
ROS memory clears verified.
Done!
The difference in configuration is illustrated by the run control tree
displayed in the Run Control panel of the IGUI for the 2 types of configuration:
2-tier hierarchy
3-tier hierarchy
7. Timing measurements can be made by hand as described before.
1. When running multiple sub-farms it is useful to have a summary of the operational statistics for each sub-farm. If sub-farms are of equal size and consist of identical machines one would expect the summary information to be the same for each sub-farm. Statistics can be summed for all EFDs in each sub-farm by adding the Gatherer application to the partition. As mentioned in the first section, the private binary patches for the Gatherer are already taken into account by setting the RepositoryRoot attribute in the partition, and the Gatherer segment is also already included in the partition, but disabled. All that remains to be done is to enable the Gatherer segment.
2. The Gatherer segment can be enabled from the Segment
& Resource panel of the IGUI. Start play_daq as before. Once the IGUI
is displayed, and before booting the partition, select the Segment & Resource
panel. The two top-level segments are displayed (see below): the top-level
EF segment, which is enabled, and the Gatherer segment, which is disabled.
3. Enable the Gatherer segment by right-clicking on
the word disabled and selecting "Enable segment gathLocalhost" from the
menu. The change must then be saved to the database by clicking on the icon
at the right-hand bottom corner of the panel (see below):
Note: The database will only be saved on the machine on which the
IGUI is running. This is OK if we are using the shared file system. If/when
we move to having local copies of the database file on all machines it will
be necessary to copy the file to all machines (e.g. using rgang) once the
save has been made. If the DAQ Supervisor State gets stuck in "Configuring" after
you have made the save, exit from the IGUI and start again. This should clear
the problem - to be understood.
4. Now start the system as before, once in the "Running"
state the run control tree should look like this:
5. The functionality of the Gatherer can be verified
by looking at its output in the IS server. Start 2 copies of the IS monitor.
For the first copy select the EF-IS1 server and choose one of the EFDs.
For the second copy select the Histogramming server and choose the gatherer
entry for efSubFarm-1. The gatherer entry is a vector which contains a sum
of all the numerical IS information for all the EFDs in the sub-farm. In
the example shown below there are 4 EFDs in the sub-farm; the summed output
statistics displayed by the Gatherer are ~4 times those for the single EFD:
6. There is a text utility (is_ls) which may also be used to display
IS information:
ice17_13|gensetupScripts|53> is_ls -h
Usage: is_ls [-p partition-name] [-n server-name] [-R regular-expression]
[-v] [-N] [-T] [-D] [-H]
Options/Arguments:
-p partition-name partition to work in.
-n server-name server to work with.
-R regular-expression regular expression for information to be
printed.
-v print information values.
-N print names of information attributes (if
available).
-T print types of information attributes.
-D print description of information attributes
(if available).
-H print information history (if available).
Description:
Lists Information Service servers in the specific partition
as well as the contents of the servers
The following command will show the Throughput value for all EFDs running
in the partition:
ice17_13|gensetupScripts|54> is_ls -p $TDAQ_PARTITION -n EF-IS1 -v -N | egrep 'Throughput|EFD'
EF-IS1.efSubFarm-1.EFD-2.Stats.efdStats <5/4/05 10:37:44> <EFD>
Throughput 888.917
EF-IS1.efSubFarm-1.EFD-1.Stats.efdStats <5/4/05 10:37:44> <EFD>
Throughput 886.513
EF-IS1.efSubFarm-1.EFD-4.Stats.efdStats <5/4/05 10:37:44> <EFD>
Throughput 889.816
EF-IS1.efSubFarm-1.EFD-3.Stats.efdStats <5/4/05 10:37:44> <EFD>
Throughput 883.016
This will show the output from the Gatherer:
ice17_13|gensetupScripts|65> is_ls -p $TDAQ_PARTITION -n Histogramming -v -N
Server "Histogramming" contains 1 object(s):
Histogramming.gatherer_ice17_13_199641.efSubFarm-1.Stats.efdStats <5/4/05 10:59:45> <vector>
1 attribute(s):
40, 48388, 48359, 3521.7, 16
7. A graphical display of IS information can be obtained using the islogger (see documentation) utility:
ice17_13|gensetupScripts|67> islogger &
[2] 21626
The information to be displayed can be chosen from the hierarchy of IS
information:
A continually updating graph can then be obtained (in this case Throughput for EFD-1):
It is possible to display plots of several different parameters simultaneously.
Drawbacks are:
When only limited hardware resources are available, for the purposes of
large scale testing it is useful to be able to run more than 1 EFD (and associated
PTs) per node in order to simulate a larger system. (Note that running more
than 1 EFD per node is not currently envisaged as a requirement for the final
system). When running multiple EFDs, the communication between each EFD and
its PTs must be implemented via a uniquely named (for that node) socket
and shared heap. This is all taken care of by the database generation scripts.
It wasn't quite implemented before I came here - I had to make small changes
to both the create_ef.bash script and the gen_EFsuite.py module of gensetup.
Do:
diff gen_EFsuite.py gen_EFsuite.py.05Apr05
diff create_ef.bash create_bash.01Apr05
to see the changes made. Currently the scripts support a maximum of 4 EFDs
per node. This is probably a sensible limit in order not to run into problems
with resources. An example call to create_ef is shown below, where the
number of EFDs per node is set to 4:
./create_ef.bash -t ice -e 4 -f 32 -F 2
This command will create a 2 sub-farm system, with 32 EFDs per sub-farm
and 4 EFDs per node. Therefore each sub-farm consists of only 8 nodes. Multiple
EFDs per node are specified in the .cfg file by quoting the required number
in square brackets (with no blank between EFD and the leading bracket), for
example:
EF EFD[4] ice15_13 efd-cfg-in-ext-out 4 SFI-1 SFO-1
If only 1 EFD is required either the number in square brackets is set to
1 or they are omitted altogether, e.g.
EF EFD ice15_13 efd-cfg-in-ext-out 4 SFI-1 SFO-1
An example of the pmg panel when running 4 EFDs per node is shown below:
When running large configurations and/or making timing tests, in order to
avoid possible problems with the shared file system, the partition file (and
the files which it includes) should be copied to the local disk of each machine
in the configuration. It is necessary to copy not only the .xml file created
by create_ef.bash but also all the .xml files to which it refers (these are in
the tdaq release, plus the gatherer .xml files in my home directory). Maybe
Bryan could investigate how best to go about making the copy (with rgang
etc.); a possible starting point is sketched below.
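One possible approach (only a sketch, assuming password-less ssh between the nodes and an example target directory and file list) is to loop over the machines file, skipping commented-out entries, and scp the required files to each node:
# copy_db_files.py - copy the partition file (and the other .xml files it includes)
# to the local disk of every machine listed in the machines file
# (hypothetical helper; the file list and target directory are examples only)
import os

machines_file = 'machines-ice.txt'
files_to_copy = ['part_efTest.data.xml']   # add the other included .xml files here
target_dir = '/tmp/efdb'                   # example local directory on each node

for line in open(machines_file):
    node = line.strip()
    if not node or node.startswith('#'):   # skip blank lines and commented-out nodes
        continue
    os.system('ssh %s mkdir -p %s' % (node, target_dir))
    for f in files_to_copy:
        os.system('scp %s %s:%s/' % (f, node, target_dir))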
When working with local installations of the database remember that your
TDAQ_DB_DATA and TDAQ_DB_PATH environment variables must be reset accordingly.
1. The DVS browser can be started as follows:
ice17_13|gensetupScripts|38> dvs_gui $TDAQ_DB_DATA -p $TDAQ_PARTITION
It provides a GUI (see below) to display log files from all the machines
in the configuration. It avoids the need to ssh to each machine in order
to find the files:
Note: You will not be able to start the DVS browser if you setup your environment
using $DF_SCRIPTS/setup.csh. For some reason it will be trying to look for
something on afs and will fail - to be investigated. If you want to use the
DVS browser I suggest you start a separate session on the same machine, run
the tdaq setup script directly:
source /global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed/setup.csh
and then run dvs_gui as shown above. This should work.
2. rgang is a useful tool for managing files, issuing commands etc. on
multiple machines, in a fast, parallel fashion. To use it on WestGrid, put
it in your path:
ice17_13|swheeler|50> setenv PATH /global/home/caronb/atlas/tools/rgang/bin:${PATH}
For more information check the README files in /global/home/caronb/atlas/tools/rgang. An example, showing how to clear all log files for the part_efTest partition is the following:
ice17_13|gensetupScripts|57> rgang.py -x machines-ice.txt "rm -r /tmp/part_efTest"
ice17_13= ice17_9= ice17_1= ice12_5= ice1_8= ice21_12= ice15_13= ice8_9= ice7_3= ice4_3= ice17_13|gensetupScripts|58>
The -x option stops the printing of errors of the form:
ice32_11= Warning: No xauth data; using fake authentication data for X11 forwarding.
/usr/X11R6/bin/xauth: error in locking authority file /global/home/swheeler/.Xauthority
We should copy over the scripts directory from Antonio Sobreira's area - there's some useful stuff in /cluster/home/sobreira/scripts. For example, in case of a messy shutdown of a configuration, there may be leftover applications running on some machines. The "show" script can be called with rgang to display all the processes on all the machines in the specified file that belong to a particular user:
rgang.py -x machines-ice.txt "/global/home/swheeler/scripts/show -u swheeler" > procs
Currently the full path name to the script has to be given, even if the
scripts directory is put in PATH in the shell login. There is a bug somewhere
which Antonio is investigating.
rgang has also been modified so that it understands the convention used
by create_ef by which machine names are commented out in the machines.txt
file (# followed by a blank). If the name is commented out, rgang will not
attempt to run the command on that machine.
3. For cleaning up partitions (i.e. killing absolutely everything including
OnlineSW infrastructure servers etc.) we have the "nuke" utility from Gokhan
(he developed this for use in the CTB). It is currently installed in my home
directory: /global/home/swheeler/tools and is invoked thus:
./nuke $TDAQ_DB_DATA
Internally it runs "fuser" which is in the same directory. I'm not sure
how it works, but it does seem very effective. One possible drawback is that
it is looking for (I think) binaries from the standard installation area.
Anything in external private patches won't be considered. At the moment this
is just the Gatherer application which sits in the InstallArea of my home
directory on WestGrid.
If you are in any doubt about orphan applications being left over
after a messy shutdown I urge you to use this!!! We wasted a huge
amount of time last March due to messy shutdowns and lack of tools to tidy
up. It does kill all pmg_agents, which will then need to be restarted the next
time you run play_daq (tedious).
4. More... I'm sure
Still to be tested
There are scripts/procedures available to automate analysis. I have not
had time to become familiar with any of this yet. Will look into this on returning
to CERN.
The actual CPU configuration on which I made the tests is here. Obtained by running:
qstat -nu caronb
This was used to generate machines-wg.txt
which was used as input for the 2 tests listed below. Note that when both
CPUs of the same processing node have been allocated it is only listed once
in the machine file.
Also there is a bug in the PMG IGUI panel - to be investigated. The naming
scheme for the ice nodes on WestGrid is the following:
icexx_yy
where xx is the crate number (1-50?) and yy is the node number in the crate
(1-14). When there is a process started on node:
icexx_1
it appears also to be started on nodes:
icexx_10
icexx_11
icexx_12
icexx_13
icexx_14
It is only a display problem. The actual running configuration is correct.
Therefore nodes with names of the form icexx_1 can be included in the configuration.
./create_ef -t wg -p 1 -f 10 -F 10
Produces a configuration with 10 sub-farms, 10 nodes per sub-farm, 1 EFD,
1 PT per node.
.cfg file
gatherer_results
runcontrol_timing
screenshot
Here is an extract from the gatherer results file. The file contains the
sum of the EFD statistical information for each sub-farm, at approximately
1 minute intervals. The sum for each sub-farm is shown as a vector of values.
From left to right the values are: IS update time (not very sensible to sum
this, but it just happens), Events In, Events Out, Instantaneous Throughput,
Number of PTs connected to an EFD. Note: there is a bug with summing for efSubFarm-1.
What is displayed is actually the sum of efSubFarm-1 + efSubFarm-10.
So the Throughput for efSubFarm-1 is really: 7359.06 - 3386.42 = 3972.64
Histogramming.gatherer_ice35_11_159269.efSubFarm-10.Stats.efdStats <7/4/05 11:59:55> <vector>
1 attribute(s):
100, 92557, 92497, 3386.42, 10
Histogramming.gatherer_ice35_11_159264.efSubFarm-1.Stats.efdStats <7/4/05 11:59:55> <vector>
1 attribute(s):
200, 200057, 199939, 7359.06, 20
The options
-ORBthreadPerConnectionPolicy 0 -ORBmaxServerThreadPoolSize 10
had to be added to each instance of the rdb_server in the daq setup segment in:
/global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed/databases/daq/segments/setup.data.xml
Generated thus:
./create_ef.bash -t wg -p 2 -f 10 -F 10
.cfg file
gatherer_results
runcontrol_timing
screenshot - looks exactly the same as above
Once the TDAQ_IPC_TIMEOUT parameter was increased it was possible to run
an even larger system (4 PTs per node). The default value for TDAQ_IPC_TIMEOUT
is 30 seconds and the Configure step was timing out. The timeout was increased
for both PTs and the LocalControllers (lowest level of Run Control). I suspect
it may only have been necessary to make the change for the LocalControllers.
I had to change gen_db.sh (to include a file which defines the environment
variable and set it to 100000ms i.e. 100 seconds) and gen_EFsuite.py (in
gensetupScripts) so that the controller and PT objects point to the environment
variable. I kept the command line options for the rdb_servers the same as
they are described above.
The configuration was generated thus:
./create_ef.bash -t wg -p 4 -f 10 -F 10
This configuration ran correctly and the output statistics can be seen
following the link below. Note: the configuration step takes around 60 seconds.
I was unable to make timing measurements. Note that the subfarm controllers
give warnings at startup. This is only due to the fact that this timeout
has been changed. Running play_daq with the time option did not work due
to timeouts. I expect there is a timeout option I have to increase somewhere
- to be investigated.
gatherer_results
screenshot (yes it really
does work!)
Note: This test has shown that it is possible to run a "standard" EF configuration
on a system consisting of 100 separate WestGrid processing nodes.
I subsequently tried running larger configurations by doubling up EFDs per node. For instance, I tried a configuration with 15 subfarms:
./create_ef.bash -t wg -e 2 -p 4 -f 10 -F 10
This and other larger configurations are failing. It's possible (?) I'm
beginning to run into file descriptor problems. For instance running the
command Bryan provided me:
ls -l /proc | grep $USER | awk -F " " '{ print "ls -R /proc/"$9"/fd/ | wc -l" }' > nfile ; source ./nfile
I see that some processes are using over 900 file descriptors. Once the number of file descriptors on WestGrid has been increased you could try running this configuration again and see if it is any better.
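As an alternative to building the intermediate 'nfile', the sketch below (a hypothetical helper) walks /proc and reports the number of open file descriptors for each of your own processes; it simply counts the entries in /proc/<pid>/fd, which is only readable for processes belonging to you:
# count_fds.py - report the number of open file descriptors per process for this user
import os

uid = os.getuid()
for pid in os.listdir('/proc'):
    if not pid.isdigit():
        continue
    try:
        if os.stat('/proc/' + pid).st_uid != uid:      # only look at our own processes
            continue
        cmd = open('/proc/%s/cmdline' % pid).read().split('\x00')[0]
        nfds = len(os.listdir('/proc/%s/fd' % pid))
    except (OSError, IOError):                         # process may have exited meanwhile
        continue
    if nfds > 100:                                     # only report the big consumers
        print('pid %s (%s): %d file descriptors' % (pid, cmd, nfds))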
However, I think I am also beginning to see scaling problems due to the
LocalController - the problems appear to get worse with the number of applications
the LocalController has to manage. I'm also not familiar enough with the
tuning required in the OnlineSW for large scale testing. I exchanged emails
with Igor and Serguei at CERN today regarding the tuning - I tried to implement
what they suggested (the modified setup segment I created is here, email from Igor here)
but it was not helping - and in fact some of the time it appeared to make
things worse, e.g. the boot step timing out.
If you wish to try with the modified setup segment, one way to do it is
generate your database as above and then edit the .xml file changing the
line:
<file path="daq/segments/setup.data.xml"/>
to:
<file path="/global/home/swheeler/setup.data.xml"/>
assuming you are going to pick up my modified version of the file. Another
way is to copy it (making sure you have saved a copy of the original first!)
to the standard release area:
/global/home/caronb/atlas/tdaq-release/tdaq-01-01-00/installed/databases/daq/segments/setup.data.xml
Other issue: is it possible to run the islogger locally? You would need a local
installation of java, the tdaq-01-01-00 release and the TDAQ_IPC_INIT_REF of
the global ipc_server on WestGrid. To test this you would run the setup.csh
for the release, set TDAQ_IPC_INIT_REF and run the islogger command.
Final Note: Both the OnlineSW tuning and LocalController issues I will
follow up in detail at CERN.
Another Final Note: Roger was saying that it might also be feasible to
run some tests on THOR provided we configure the dummy PTs not to take up
too much CPU time.
Last update by S.Wheeler 15th April 2005