Project Configuration File

From Unofficial BOINC Wiki

General

A BOINC Powered Project is described by a configuration file named config.xml in the Project Directory. A config.xml file looks like this:

<boinc>

  <config>
    <host>                  project.hostname.ip   </host>
    <db_name>               databasename          </db_name>
    <db_host>               database.host.ip      </db_host>
    <db_user>               database_user_name    </db_user>
    <db_passwd>             database_password     </db_passwd>
    <shmem_key>             shared_memory_key     </shmem_key>
    <download_url>          http://A/URL          </download_url>
    <download_dir>          /path/to/directory    </download_dir>
    <download_dir_alt>      /path/to/directory    </download_dir_alt>
    <uldl_dir_fanout>       N                     </uldl_dir_fanout>
    <upload_url>            http://A/URL          </upload_url>
    <upload_dir>            /path/to/directory    </upload_dir>
    <cgi_url>               http://A/URL          </cgi_url>
    <stripchart_cgi_url>    http://A/URL          </stripchart_cgi_url>
    <log_dir>               /path/to/directory    </log_dir>

    [ <disable_account_creation/>                                                ]
    [ <show_results/>                                                            ]
    [ <one_result_per_user_per_wu/>                                              ]
    [ <workload_sim>0|1</workload_sim>                                           ]
    [ <max_wus_to_send>                  N    </max_wus_to_send>                 ]
    [ <min_sendwork_interval>            N    </min_sendwork_interval>           ]
    [ <daily_result_quota>               N    </daily_result_quota>              ]
    [ <ignore_delay_bound/>                                                     ]
    [ <locality_scheduling/>                                                     ]
    [ <locality_scheduling_wait_period>  N    </locality_scheduling_wait_period> ]
    [ <min_core_client_version>          N    </min_core_client_version>         ]
    [ <choose_download_url_by_timezone/>                                         ]
    [ <cache_md5_info/>                                                          ]
    [ <min_core_client_version_announced> N </min_core_client_version_announced> ]
    [ <min_core_client_upgrade_deadline>  N </min_core_client_upgrade_deadline>  ]
    [ <choose_download_url_by_timezone>  N </choose_download_url_by_timezone>  ]
    [ <cache_md5_info>  N </cache_md5_info>  ]
    [ <nowork_skip>  N </nowork_skip>  ]
    [ <sched_lockfile_dir>  path </sched_lockfile_dir>  ]


    <!-- optional; defaults as indicated: -->
    <project_dir>  ../      </project_dir>  <!-- relative to location of 'start' -->
    <bin_dir>      bin      </bin_dir>      <!-- relative to project_dir -->
    <cgi_bin_dir>  cgi-bin  </cgi_bin_dir>  <!-- relative to project_dir -->
  </config>

  <daemons>
    <daemon>
      <cmd>          feeder -d 3   </cmd>
      [ <host>       hostname.ip   </host>     ]
      [ <disabled>   1             </disabled> ]
    </daemon>
    <daemon>
    ...
    </daemon>
  </daemons>

  <tasks>
    <task>
      <cmd>          get_load        </cmd>
      <output>       get_load.out    </output>
      <period>       5 min           </period>
      [ <host>       host.ip         </host>       ]
      [ <disabled>   1               </disabled>   ]
      [ <always_run> 1               </always_run> ]
    </task>
    <task>
      <cmd>      echo "HI" | mail root@example.com     </cmd>
      <output>   /dev/null                             </output>
      <period>   1 day                                 </period>
    </task>
    <task>
    ...
    </task>
  </tasks>

</boinc>
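
For illustration, a minimal filled-in config.xml might look like the sketch below. The square brackets in the template above appear to mark optional elements, so they are left out here. Every host name, URL, path, credential, and number is a hypothetical placeholder chosen for this example, not a recommended value.

```xml
<boinc>
  <config>
    <!-- All values below are hypothetical placeholders. -->
    <host>project.example.com</host>
    <db_name>example_project</db_name>
    <db_host>db.example.com</db_host>
    <db_user>boincadm</db_user>
    <db_passwd>change_me</db_passwd>
    <shmem_key>0x1111abcd</shmem_key>
    <download_url>http://project.example.com/download</download_url>
    <download_dir>/home/boincadm/projects/example/download</download_dir>
    <uldl_dir_fanout>1024</uldl_dir_fanout>
    <upload_url>http://project.example.com/cgi-bin/file_upload_handler</upload_url>
    <upload_dir>/home/boincadm/projects/example/upload</upload_dir>
    <cgi_url>http://project.example.com/cgi-bin/</cgi_url>
    <stripchart_cgi_url>http://project.example.com/stripchart/</stripchart_cgi_url>
    <log_dir>/home/boincadm/projects/example/log</log_dir>

    <!-- Two of the optional elements, shown without the template's brackets. -->
    <one_result_per_user_per_wu/>
    <max_wus_to_send>10</max_wus_to_send>
  </config>

  <daemons>
    <daemon>
      <cmd>feeder -d 3</cmd>
    </daemon>
  </daemons>

  <tasks>
    <task>
      <cmd>get_load</cmd>
      <output>get_load.out</output>
      <period>5 min</period>
    </task>
  </tasks>
</boinc>
```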

Configuration Element Definitions

The general project configuration elements are:

host Name of the project's main host, as given by Python's socket.gethostname(). BOINC Server-Side Daemon Programs and Tasks run on this host by default.
db_name Database name
db_host Database host machine
db_user Database user name
db_passwd Database password
shmem_key ID of scheduler shared memory. Must be unique on host.
download_url URL of data server for download
download_dir absolute path of download directory
download_dir_alt absolute path of old download directory (see Hierarchical Upload/Download Directories)
upload_url URL of file upload handler
uldl_dir_fanout fan-out factor of upload and download directories (see Hierarchical Upload/Download Directories)
upload_dir absolute path of upload directory
cgi_url URL of scheduling server
stripchart_cgi_url URL of stripchart server
log_dir Path to the directory where the assimilator, feeder, transitioner and cgi output logs are stored. This allows you to change the default log directory path. If set explicitly, you can also use the 'grep logs' features on the administrative pages. Note: enabling 'grep logs' with very long log files can hang your server, since grepping GB files can take a long time. If you enable this feature, be sure to rotate the logs so that they are not too big.
sched_lockfile_dir Directory where scheduler lockfiles are stored. Must be writable by the Apache user.
workload_sim Do a simulation, based on current client workload, in deciding whether a job's deadline can be met.

Web Site Options

The following elements control features that you may or may not want to make available to Participants.

disable_account_creation If present, disallow account creation
show_results Enable web site features that show results (per user, host, etc.)
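
As a small sketch, a project that wants to close registration while still showing results on its web pages could place both flags inside <config>. The combination is only an illustration; the other elements of <config> are left out.

```xml
<config>
  <!-- Required elements omitted for brevity. -->

  <!-- Disallow creation of new accounts. -->
  <disable_account_creation/>

  <!-- Enable the web pages that show results per user, host, etc. -->
  <show_results/>
</config>
```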


Controlling Results

The following settings control the way in which Results are scheduled, sent, and assigned to Participants and the Participant's Hosts.

one_result_per_user_per_wu

If present, send at most one Result of a given Work Unit to a given Participant. This is useful for checking accuracy/validity of Results. It ensures that the results for a given Work Unit are generated by different Participants.

If you have a Validator that compares different Results for a given Work Unit to ensure that they are equivalent, you should probably enable this. Otherwise, you may end up validating Results from a given Participant against other Results from the same Participant.
max_wus_to_send The maximum number of Results sent per Scheduler RPC. This helps prevent Hosts that are having trouble from getting too many Results and trashing them. Set it large enough, however, that a Host which connects to the net only at intervals has enough work to keep it occupied between connections.
min_sendwork_interval The minimum number of seconds to wait after sending Results to a given Participant's Host before new Results are sent to the same Host. This helps prevent Hosts with download or application problems from trashing lots of Results by returning lots of Error Results. Don't set it so long that a Host goes idle after completing its work, before getting new work.
daily_result_quota The maximum number of Results (per CPU) sent to a given Participant's Host in a 24-hour period. Helps prevent Hosts with download or application problems from returning lots of Error Results. Be sure to set it large enough that a Host does not go idle in a 24-hour period, and can download enough work to keep it busy if disconnected from the net for a few days. The number of CPUs counted is capped at four.
ignore_delay_bound

By default, Results are not sent to Hosts that are too slow to complete them within the delay bound.

If this flag is set, this rule is not enforced.
locality_scheduling

When possible, send work that uses the same files that the Participant's Host already has. This is intended for projects which have large Work Unit Data Files, where many different Work Units use the same data file. In this case, to reduce download demands on the Data Server, it may be advantageous to retain the data files on the hosts, and send them work for the files that they already have.

See Locality Scheduling.
locality_scheduling_wait_period This element only has an effect when used in conjunction with the previous Locality Scheduling element. It tells the Scheduler to use 'trigger files' to inform the Project that more work is needed for specific files. The period is the number of seconds which the Scheduler will wait to see if the Project can create additional work. Together with project-specific Daemons or scripts this can be used for 'just-in-time' Work Unit creation. See Locality Scheduling.
min_core_client_version If the scheduler gets a request from a Participant's Host with a version number less than this, it returns an error message and doesn't do any other processing.
choose_download_url_by_timezone

When the Scheduler sends work to a Participant's Host, it replaces the download URL appearing in the data and executable file descriptions with the download URL closest to the host's timezone. The Project must provide a two-column file called 'download_servers' in the Project Root Directory.

This is a list of all Download Servers that will be inserted when work is sent to Participant's Hosts. The first column is an integer listing the server's offset in seconds from UTC.

The second column is the server URL, in a format such as http://einstein.phys.uwm.edu. The download servers must have identical file hierarchies and contents, and the path to files and executables must start with '/download/...' as in 'http://einstein.phys.uwm.edu/download/123/some_file_name'.
cache_md5_info When creating work, keep a record (in files called foo.md5) of the file length and md5 sum of data files and executables. This can greatly reduce the time needed to create work, if (1) these files are re-used, and (2) there are many of these files, and (3) reading the files from disk is time-consuming.
min_core_client_version_announced Announce a new version of the BOINC Daemon, which in the future will be the minimum required version. In conjunction with the next tag, you can warn Participants with version below this to upgrade by a specified Deadline. Example value: 419.
min_core_client_upgrade_deadline

Use in conjunction with the previous tag. The value given here is a Unix time (seconds since the epoch, as returned by time(2)) until which hosts can update their BOINC Daemon. After this time, they may be shut out of the Project.

Before this time, they will receive messages warning them to upgrade.
nowork_skip

If the Scheduling Server has no work, it replies to RPCs without doing any database access (e.g., without looking up the user or host record).

This reduces the Database load, but it fails to update Preferences when users click on Update.

Use this setting if your Database Server is overloaded.
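
The result-control settings above might be combined as in the following sketch. All numbers are hypothetical and chosen only to make the units visible; 419 is the example version value mentioned above, and 1136073600 is the Unix time for 2006-01-01 00:00:00 UTC.

```xml
<config>
  <!-- Required elements omitted for brevity. All numbers are hypothetical. -->

  <!-- Never send two Results of one Work Unit to the same Participant. -->
  <one_result_per_user_per_wu/>

  <!-- At most 8 Results per Scheduler RPC. -->
  <max_wus_to_send>8</max_wus_to_send>

  <!-- Wait at least 600 seconds before sending more Results to the same Host. -->
  <min_sendwork_interval>600</min_sendwork_interval>

  <!-- At most 50 Results per CPU in a 24-hour period. -->
  <daily_result_quota>50</daily_result_quota>

  <!-- Warn clients older than version 419 to upgrade before the deadline. -->
  <min_core_client_version_announced>419</min_core_client_version_announced>
  <min_core_client_upgrade_deadline>1136073600</min_core_client_upgrade_deadline>
</config>
```

If the quota is applied per CPU with the four-CPU cap described above, a Host with eight CPUs would receive at most 50 × 4 = 200 Results per day under these example values.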

Project Tasks and Daemons

Tasks are periodic, short-running jobs. <cmd> and <period> are required. <output> specifies the output file and defaults to COMMAND_BASE_NAME.out. Commands are run in the <bin_dir> directory, which is a path relative to <project_dir>, and their output is written to <log_dir>.

The BOINC Server-Side Daemon Programs are continuously-running programs. The process ID is recorded in the <pid_dir> directory and the process is sent a SIGHUP in a DISABLE operation.

Both Tasks and BOINC Server-Side Daemon Programs can run on a different Host (specified by the <host> element). The default is the project's main host, which is specified in config.host. A BOINC Server-Side Daemon Program or Task can be turned off by adding the <disabled> element. There may also be some tasks you wish to run via cron regardless of whether or not the project is enabled (for example, a script that logs the current CPU load of the host machine). You can do so by adding the <always_run> element (<disabled> takes precedence over <always_run>).
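
A sketch of how these elements might be combined is shown below. The feeder and get_load commands come from the template at the top of this page; my_work_generator and worker.example.com are hypothetical names invented for this example.

```xml
<boinc>
  <!-- The <config> section is omitted for brevity. -->

  <daemons>
    <!-- Runs on the project's main host, which is the default. -->
    <daemon>
      <cmd>feeder -d 3</cmd>
    </daemon>

    <!-- Hypothetical project-specific daemon on a second host, turned off. -->
    <daemon>
      <cmd>my_work_generator</cmd>
      <host>worker.example.com</host>
      <disabled>1</disabled>
    </daemon>
  </daemons>

  <tasks>
    <!-- Logs the host load every 5 minutes, even when the project is not enabled. -->
    <task>
      <cmd>get_load</cmd>
      <output>get_load.out</output>
      <period>5 min</period>
      <always_run>1</always_run>
    </task>
  </tasks>
</boinc>
```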

UCB Source

Copyright ©

  • 2005 University of California
  • 2005 Paul D. Buck

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
