Usage of the Online Software - hints and instructions for other systems that want to use the Online Software on the IT cluster

The Online Software is installed on all the cluster machines. It can be found in /pool/online/online-00-21-01; the area is read/write for the atlonl AFS account and read-only for everyone else.

This implies that DC should not write into that area (and I suggest that atlonl should not either). Take particular care with the log directories, which can be set in the .onlinerc file.

This also implies that DC must keep their own copy of the database files, TDAQ_IPC_INIT_REF, etc. in their own area. For the databases, an alternative is to use the fact that TDAQ_DB_PATH is a path and can hold multiple values separated by colons.
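
As a rough illustration of the colon-separated form, a private database directory can be listed together with the installed one. Only the variable name TDAQ_DB_PATH comes from the text above; both directory names are hypothetical placeholders, and the idea that the first entry is searched first is an assumption by analogy with PATH which should be checked against the release.

```shell
# Hypothetical locations: replace with your own area and the release database directory.
my_db_area="$HOME/online_db"
release_db_area="/path/to/release/databases"

# List the private area first, then the installed one, separated by a colon.
TDAQ_DB_PATH="${my_db_area}:${release_db_area}"
export TDAQ_DB_PATH

# Print each entry on its own line to check the result.
echo "$TDAQ_DB_PATH" | tr ':' '\n'
```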

The appropriate JDK is also available for use. It is installed in /pool/online/jdk1.4.2 on all the machines. The Online Software setup script is configured to use the JDK from that location.

The Online scripts are now available in the release (in previous years they were not). They are located under TDAQ_INST_PATH/installed/share/bin and will be in your PATH once you have sourced the DF or Online setup scripts. Other potentially useful information is kept at the following location: /afs/cern.ch/user/a/atlonl/public/lst04/gen_info/. It contains the list of machines to which we have access and on which the Online Software is installed (hosts_*.txt), and information about these machines (machine-info_*.csv and, in HTML format, machine-info_*.html). There is also an example .onlinerc file in which the log directories are set to a local directory.

The scripts available from the release are described briefly below. Further information can be requested from Marc Dobson.

acquire_host_keys: script to get all host keys into the user's known_hosts file so that you are never prompted to add a host key. Obsolete if you always use the config described in the next section. This script relies on the run_on script being in the PATH (which is the case if the Online Software setup script has been sourced).
machine_list_from_ranges [<base_name> <init_num> <fin_num>] ...: returns a list of machine names (one per line) built from <base_name> and the begin and end numbers <init_num> and <fin_num>. E.g. base_name=tbed0 init_num=101 fin_num=110 gives the names tbed0101, tbed0102, ..., tbed0110. You can specify as many sets of base_name and numbers as you want on the command line. The output can be redirected to a file to be used by other scripts (e.g. run-on.bash).
print_system_info: prints some information about the machine you are on, with the fields separated by commas. In order, the fields are: machine name, number of processors, nominal CPU speed, CPU type, exact CPU speed reported by /proc/cpuinfo, total bogomips, nominal memory, exact memory reported by /proc/meminfo, total swap space, kernel version, system library version (glibc).
gather_system_info: script which uses the run-on and print_system_info scripts to retrieve the system info of all the machines listed in the appropriate file (use the -h option for usage information). Some paths need to be changed in the script if you plan to use it. It adds headers for the fields returned by print_system_info and outputs a CSV (comma-separated values) file, which can then be imported into an application such as Excel or StarOffice. An example of this is the machine-info.csv and xls files mentioned earlier.
run-on: a script to run a command on a list of machines given in a file named on the command line. Use the -h option for usage information. By default it internally uses an expect script, which gives friendlier output and suppresses some warning messages. If the command could take some time, use the -b and -e options, which try to pipeline the command to all the machines (unlike the default or the +e option, which issue the command on each machine and wait for it to complete before going on to the next). This script relies on the run_on_expect.exp script being in the PATH (which is the case if the Online Software setup script has been sourced).
run_on_expect.exp: the expect script used by the run-on script.
play_daq_n_times: a script to run play_daq 'n' times for every partition listed in a partitions file, optionally restarting from the top of the partition list indefinitely once it reaches the end. See the help given by the -h option for more information. For example, to run over a list of partitions 20 times each with a user-defined timing script, the command line would look like:
play_daq_n_times -n 20 -p partition_list.txt user --usercycle user_timing_script.txt no_obk
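
To make the naming scheme of machine_list_from_ranges described above concrete, here is a minimal sketch of the same idea. It is not the released script: the function name list_from_ranges is made up for illustration, and it treats the numbers as plain decimal integers, so zero-padded ranges are not handled.

```shell
# Illustrative re-creation of the naming scheme, not the script from the release.
# Usage: list_from_ranges <base_name> <init_num> <fin_num> [<base_name> <init_num> <fin_num> ...]
list_from_ranges() {
    # Consume the arguments three at a time: base name, first number, last number.
    while [ "$#" -ge 3 ]; do
        lfr_base=$1
        lfr_n=$2
        lfr_last=$3
        shift 3
        # Print one machine name per line, from the first to the last number inclusive.
        while [ "$lfr_n" -le "$lfr_last" ]; do
            printf '%s%s\n' "$lfr_base" "$lfr_n"
            lfr_n=$((lfr_n + 1))
        done
    done
}

# Prints tbed0101 through tbed0110, one per line.
list_from_ranges tbed0 101 110
```

Redirecting the output to a file gives a host list in the one-name-per-line form that the other scripts are described as reading.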

To set up SSH so that it does not prompt you to add a new host key, copy the file /afs/cern.ch/user/a/atlonl/public/lst04/gen_info/ssh_user_config to ~/.ssh/config in the account you are using for your tests. Note, though, that a message is still printed to indicate that a host key has been added to the central file.
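
The copy can be done by hand, but a small helper along the following lines avoids silently overwriting an existing configuration. This is only a sketch: the function name install_ssh_config and the .bak backup name are invented here, and the permission settings are the usual OpenSSH conventions rather than anything specific to this cluster.

```shell
# Hypothetical helper: copy an SSH client config into a .ssh directory.
# Usage: install_ssh_config <source_file> <ssh_dir>
install_ssh_config() {
    isc_src=$1
    isc_dir=$2
    if [ ! -r "$isc_src" ]; then
        echo "cannot read $isc_src" >&2
        return 1
    fi
    mkdir -p "$isc_dir"
    chmod 700 "$isc_dir"
    # Keep a copy of any existing config instead of overwriting it silently.
    if [ -e "$isc_dir/config" ]; then
        cp -p "$isc_dir/config" "$isc_dir/config.bak"
    fi
    cp "$isc_src" "$isc_dir/config"
    chmod 600 "$isc_dir/config"
}

# Source path as given in the text above.
install_ssh_config /afs/cern.ch/user/a/atlonl/public/lst04/gen_info/ssh_user_config "$HOME/.ssh" \
    || echo "ssh config not installed" >&2
```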

To execute commands on all the lxshare machines (which include the tbed machines) there is also the wgsh command supplied by IT. The only disadvantage of this command is that it runs the command on ALL machines in the cluster, not just those we have been authorised to use. Hence I wrote my own script.