Work Scheduler

From Unofficial BOINC Wiki


General

The Work Scheduler is the part of the BOINC Daemon that decides which items of work in the Work Buffer will run on the available Central Processing Units (CPUs).

The CPU Scheduling Policy and Work-Fetch Policy were added to the BOINC Daemon starting with version 4.36.


Terminology

Debt

Debt is how much CPU Time is 'owed' to a Project in order to bring it into parity with other Projects, based on the Participant's Resource Share settings. Debt amounts are maintained in seconds.

Negative debt for a project means that it has used more than its share of CPU Time recently, and therefore owes the other projects some CPU time.

Short Term Debt

Short term debt drives which project gets the CPU next when the Work Scheduler is in Highest Debt First mode. All projects with no work on the computer have a short term debt of 0. Short term debt is shifted for all projects so that the average debt for all projects that have active work on the Host is 0.

Long Term Debt

Long term debt is part of the decision of which project to download from next. If a project's long term debt is less than the negative of the Queue Size, no work request will be made of that project unless a CPU is idle (Versions 4.x and 5.x) or the queue is not full (Versions 6.x+). Long term debt is shifted such that the mean is always 0 (the sum of all Long Term Debt values is 0).
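The shifting described for both kinds of debt can be sketched as follows. This is a minimal illustration, not the BOINC Daemon's own code: the function names, the dictionary layout, and the `has_work` flag are assumptions made for the example.

```python
def shift_long_term_debt(debts):
    """Shift every long term debt (seconds) so the mean, and hence the sum, is 0."""
    if not debts:
        return {}
    mean = sum(debts.values()) / len(debts)
    return {project: debt - mean for project, debt in debts.items()}


def shift_short_term_debt(debts, has_work):
    """Projects without work get 0; the others are shifted to average 0."""
    active = [p for p in debts if has_work.get(p, False)]
    if not active:
        return {p: 0.0 for p in debts}
    mean = sum(debts[p] for p in active) / len(active)
    return {p: (debts[p] - mean if p in active else 0.0) for p in debts}
```

After either shift, a project with a positive value is owed CPU Time and a project with a negative value has recently used more than its share.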

Deadlines

The Result Deadline is the time by which a Result must be completed and reported. Deadlines are set by the individual BOINC Powered Projects. Work that is returned after the Deadline may or may not have any value to the Project, and it may or may not be granted Credit, even if it matches the other Results that were returned on time.


Goals of the Work Scheduler

The goals of the Work Scheduler and Work-Fetch Policies are:

There may be times when fetching more work would result in missed Deadlines.

The Work Scheduler has two modes: normal mode (Highest Debt First) and Earliest Deadline First mode (popularly called "panic" mode).

In Highest Debt First mode the Work Scheduler does "round-robin" scheduling among Results, attempting to honor Resource Shares.

In Earliest Deadline First mode, the Work Scheduler runs the Results with the nearest deadlines. This allows the BOINC Client Software to meet deadlines that would otherwise be missed. Earliest Deadline First mode is entered if any of the following holds: a Work Unit has a Deadline that is very near, a Result is due in less than twice the "connect to" time, or the Work Scheduler determines that one of the Results is at risk of not being completed on time.
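The difference between the two modes comes down to the key used to pick the next Result. The sketch below assumes a single CPU; the `Result` record and the `pick_next` function are hypothetical names for illustration and do not reproduce the actual client, which also handles time slices and multiple CPUs.

```python
from dataclasses import dataclass


@dataclass
class Result:
    name: str
    project: str
    deadline: float  # seconds until the Result Deadline


def pick_next(results, short_term_debt, panic):
    """Pick the next Result to run on one CPU (illustrative only)."""
    if not results:
        return None
    if panic:
        # Earliest Deadline First: the nearest deadline wins.
        return min(results, key=lambda r: r.deadline)
    # Highest Debt First: the project owed the most CPU Time wins.
    return max(results, key=lambda r: short_term_debt.get(r.project, 0.0))
```

With the same Work Buffer, the two modes can therefore choose different Results: the one from the most-owed project in normal mode, the one closest to its deadline in panic mode.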

The Work Scheduler decides which mode it is in each time a Result is completed, when the end of the Participant-specified work period is reached, when new work is downloaded, or when the Participant takes some action through the BOINC Manager.

The Work-Fetch Policy has three modes: no download, download OK, and download required.

  • No downloads ("No Work Fetch") means no work will be downloaded from any Project.
  • In the downloads OK mode, projects with high long term debt can download work, but projects with very low long term debt cannot. Very low long term debt projects have probably recently caused a panic mode, or they have been dominating the work on the computer in some other way.
  • Download required means that there are not enough results to keep all of the CPUs busy, or there is not enough work to last until the next time that you have indicated you are likely to connect. Work should be retrieved from some project, even if that means retrieving it from a project with negative long term debt.
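The three modes can be read as a per-project decision on whether a work request is allowed. The following sketch is an assumption-laden illustration: the enum, the function name, and the use of the Queue Size (in seconds) as the cutoff for "very low" long term debt follow the description in the Long Term Debt section, not the client's source code.

```python
from enum import Enum


class FetchMode(Enum):
    NO_DOWNLOAD = "no download"
    DOWNLOAD_OK = "download OK"
    DOWNLOAD_REQUIRED = "download required"


def may_request_work(mode, long_term_debt, queue_size):
    """Return True if a project may be asked for work (illustrative only).

    long_term_debt and queue_size are both in seconds (an assumption).
    """
    if mode is FetchMode.NO_DOWNLOAD:
        return False  # no work from any Project
    if mode is FetchMode.DOWNLOAD_REQUIRED:
        return True  # fetch even from projects with negative debt
    # DOWNLOAD_OK: skip projects whose debt is below -queue_size.
    return long_term_debt >= -queue_size
```

For example, with a Queue Size of one day (86400 seconds), a project with a long term debt of -100000 seconds is skipped in download OK mode but may still be asked for work in download required mode.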

The Work Scheduler and the Work-Fetch Policy are independent of each other. Therefore, it is possible to be in panic mode, and still download more results.

BOINC Work Fetch and CPU Policy Design

Problem

The old Work-Fetch Policy and Work Scheduler policy can miss deadlines for a number of reasons: the computer is slow, too many projects are attached, a short deadline work unit is downloaded, or a work unit with a tight deadline is downloaded.

There is a difference between short deadlines and tight deadlines:

  • A short deadline is a deadline that would be missed because the debt did not increase to a level where the first time slice was given to the project before the work unit expired. For example, the early work units from Pirates had a one hour deadline, and the CharMM work units from Protein Predictor have a 24 hour deadline.
  • A tight deadline is one where the time to crunch the work unit is a large fraction of the deadline. For example, on one of my machines a Sulfur Cycle work unit from Climate Prediction.Net is estimated to take 145 days and has a 180 day deadline, so it needs more than half (about 80%) of the CPU's processing time for the duration of the work. In this case the deadline is not short, but it is tight.

With the current policies, the slower the computer, the lower the fraction of time that the computer is on, and the tighter the deadlines for the projects that are attached to that computer, the fewer projects that computer may successfully attach to. In the case of the slowest computers the number may be one, even though there are several which could be run successfully individually.

Design goals

In order to keep the work the computer is running as varied as possible, each computer should be able to attach to as many projects as the Participant desires if that computer is capable of running each of the projects in isolation. The combination of the Work-Fetch Policy and the Work Scheduler should not download too much work for the CPU to complete on time, and should attempt to complete all work that is downloaded on time. Faster computers will be able to keep work from more different projects on hand than slower computers.

Design of the Work Scheduler

The Work Scheduler has two modes, normal and panic. In normal mode, the Work Scheduler uses the current debt calculations to attempt to balance the resource share with the work on hand. Some Participants with just a few projects and balanced Resource Shares may never leave this mode. In panic mode, the Work Scheduler processes the results with the nearest deadlines. It is possible to switch into panic mode at any time, but the Work Scheduler will finish the current time segment processing the current result. It is only possible to switch out of panic mode when the CPUs would be rescheduled. Having the Work Scheduler in panic mode is one of the drivers of the Work-Fetch Policy.

General Solution

Many people become concerned when their computers go into Earliest Deadline First mode. This is generally not going to cause a result to be returned late, nor will it destroy your long term Resource Share. To ensure your results are returned by their deadlines, even if you are only attached to one BOINC Powered Project, be sure to set your "connect to" value to less than the shortest Deadline of the BOINC Powered Projects you are attached to, preferably 1/2 of that Deadline or less.

If you want to reduce the chances that the Work Scheduler enters Earliest Deadline First mode, you must keep your "Connect to" settings below an even lower value. The "quick and dirty" way of finding this value is by dividing the shortest deadline by the number of BOINC Powered Projects you are connected to.

For example: You are connected to SETI@Home, Climateprediction.net (CPDN), Einstein@Home and Predictor@Home. Predictor@Home has the shortest Result Deadline with a length of 7 days. So, with your 4 projects, your maximum connect to value is (Result Deadline / # of Projects):

7/4 = 1.75 days

Similarly, with 2 or 3 projects (that includes Predictor@Home), the suggested maximum connect values are:

  • 7/3 = 2.33 days
  • 7/2 = 3.50 days
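The "quick and dirty" rule above is a single division. As a small sketch (the function name is made up for this example):

```python
def max_connect_days(shortest_deadline_days, project_count):
    """Rule of thumb: shortest Result Deadline divided by number of Projects."""
    if project_count < 1:
        raise ValueError("project_count must be at least 1")
    return shortest_deadline_days / project_count


# The Predictor@Home example: a 7 day deadline and 4, 3, or 2 projects.
for count in (4, 3, 2):
    print(f"{count} projects: {max_connect_days(7, count):.2f} days")
```

The loop prints the same three values as the worked example: 1.75, 2.33, and 3.50 days.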

Do not exceed a Work Buffer size of 2.5 days in any of the Projects' General Preferences.

This "rule of thumb" should work regardless of Resource Share, but please note that there will always be some specific setups that go into panic mode.

Details - Short deadline - Less than 1 day until deadline

The Result Deadline is less than 24 hours away or has already passed. This triggers the Work Scheduler into panic mode.

Details - Short deadline - Deadline is before reconnect time

The Result Deadline is within 2 * the "connect to" interval (Connect Every x Days). This triggers the Work Scheduler into panic mode.
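The two short-deadline triggers can be combined into one check. This is a sketch under assumptions: the function name and the choice of seconds for the deadline and days for the connect interval are illustrative, and the real client may compare these values differently.

```python
DAY = 86400.0  # seconds in one day


def short_deadline_panic(seconds_to_deadline, connect_every_days):
    """Return True if either short-deadline trigger fires (illustrative only)."""
    # Trigger 1: less than 24 hours left, or the deadline has already
    # passed (a negative value).
    if seconds_to_deadline < DAY:
        return True
    # Trigger 2: the deadline falls within twice the connect interval.
    return seconds_to_deadline < 2 * connect_every_days * DAY
```

A Result due in 3 days does not trigger panic mode with a connect interval of 1 day, but does with a connect interval of 2 days.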

Details - CPU queue overload - Computer is overcommitted

  • Version 4.45: Sort the Work Units by Deadline, with the earliest Deadline first. If at any point in this list the sum of the remaining processing time, up to and including that Work Unit, is greater than 0.8 * up_frac * time to deadline, the CPU queue is overloaded. This triggers both no work fetch and Earliest Deadline First mode in the Work Scheduler. If you don't follow that (I didn't), see the example under Computer is overcommitted.
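The overload test is easier to follow as a running sum. The sketch below assumes a single CPU and takes `up_frac` to be the fraction of time the computer is on, which is an interpretation rather than something stated above; the function name and the tuple layout are also assumptions.

```python
def cpu_queue_overloaded(work, up_frac, safety=0.8):
    """Return True if the queue is overcommitted (illustrative only).

    work: iterable of (remaining_cpu_seconds, seconds_to_deadline) pairs.
    """
    total_remaining = 0.0
    # Walk the Work Units from the earliest Deadline to the latest.
    for remaining, to_deadline in sorted(work, key=lambda w: w[1]):
        total_remaining += remaining
        # Everything due by this Deadline must fit in the usable time.
        if total_remaining > safety * up_frac * to_deadline:
            return True
    return False
```

For example, on a computer that is always on (up_frac of 1.0), a 10 hour Work Unit due in 24 hours plus a 30 hour Work Unit due in 48 hours is overloaded, because 40 hours of work is more than 0.8 * 48 = 38.4 hours.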

Summary

Because of the interaction of the "Connect every" setting and the Work Scheduler, we recommend the following:

  1. Don't set the "Connect every" setting higher than about 2.5 days if at all possible, or be very careful about which Projects you attach.
  2. If attached to several Projects, the need to have a setting significantly higher than 0.1 days is lessened.
  3. If attached to several Projects the maximum recommended setting is 40% of the shortest Deadline divided by the number of Projects.
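Recommendations 1 and 3 can be folded into one small calculation. Combining them with a minimum is an interpretation made for this sketch, and the function and parameter names are hypothetical.

```python
def recommended_connect_days(shortest_deadline_days, project_count, cap_days=2.5):
    """40% of the shortest Deadline per Project, never above cap_days."""
    if project_count < 1:
        raise ValueError("project_count must be at least 1")
    per_project = 0.4 * shortest_deadline_days / project_count
    return min(cap_days, per_project)
```

With a shortest Deadline of 7 days and 4 Projects this gives roughly 0.7 days. With a single Project that has a 180 day Deadline, the 2.5 day cap applies instead.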

Related Messages

Work Scheduler Messages (General)

"Trigger" Messages

Work Fetch Policy Messages

Work Schedule Policy Messages

Also See
