Redundancy and Errors
From Unofficial BOINC Wiki
General
A BOINC 'Result' abstracts an instance of a computation, possibly not performed yet. Typically, a BOINC Server sends 'Results' to Participants' Hosts, and the BOINC Daemon on each Host performs the computation and replies to the Project. But many things can happen to a Result:
- The client computes the Result correctly and returns it.
- The client computes the Result incorrectly and returns it.
- The client fails to download or upload files.
- The application crashes on the client.
- The client never returns anything because it breaks or stops running the BOINC Client Software.
- The Scheduler isn't able to send the Result because it requires more resources than any client has.
The BOINC System provides a form of Redundant Computing in which each computation is performed on multiple clients, the Results are compared, and are accepted only when a 'consensus' is reached. In some cases new Results must be created and sent.
BOINC manages most of the details; however, there are two places where the application developer gets involved:
- Validation: This performs two functions. First, when a sufficient number (a 'quorum') of successful Results have been returned, it compares them and sees if there is a 'consensus'. The method of comparing Results (which may need to take into account platform-varying floating point arithmetic) and the policy for determining consensus (e.g., best two out of three) are supplied by the application. If a consensus is reached, a particular Result is designated as the 'Canonical' Result. Second, if a Result arrives after a consensus has already been reached, the new Result is compared with the Canonical Result; this determines whether the user gets Credit.
- Assimilation: This is the mechanism by which the project is notified of the completion (successful or unsuccessful) of a Work Unit. It is performed exactly once per Work Unit. If the Work Unit was completed successfully (i.e., if there is a Canonical Result), the project-supplied function reads the output file(s) and handles the information, e.g., by recording it in a Master Science Database. If the Work Unit failed, the function might write an entry in a log, send an email, etc.
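The two application-supplied steps above can be sketched in Python. This is only an illustration, not BOINC's actual validator or assimilator interface: the function names, the tolerance value, the representation of a Result's output as a list of floats, and the dictionary standing in for the Master Science Database are all assumptions made for the example. The consensus policy shown is simply "at least min_quorum Results agree".

```python
import math

def results_match(a, b, rel_tol=1e-6):
    """Fuzzy comparison that tolerates platform-varying floating point.

    The tolerance is an arbitrary value chosen for this sketch.
    """
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )

def find_canonical(outputs, min_quorum):
    """Return the index of a Canonical Result, or None if no consensus.

    `outputs` holds the output of each successful Result returned so far.
    """
    if len(outputs) < min_quorum:
        return None  # quorum not reached, nothing to compare yet
    for i, candidate in enumerate(outputs):
        # A candidate always agrees with itself, so it counts toward the quorum.
        agreeing = sum(results_match(candidate, other) for other in outputs)
        if agreeing >= min_quorum:
            return i
    return None

def late_result_gets_credit(canonical, late):
    """A Result arriving after consensus is compared with the Canonical Result."""
    return results_match(canonical, late)

def assimilate(wu_name, canonical_output, science_db, error_log):
    """Called once per Work Unit, whether it succeeded or failed."""
    if canonical_output is not None:
        science_db[wu_name] = canonical_output  # hypothetical science database
    else:
        error_log.append(f"work unit {wu_name} failed")
```

With three returned outputs of which two agree within the tolerance, `find_canonical(outputs, 2)` picks the first agreeing Result; with fewer than two outputs it returns `None` and validation waits.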
Examples
In the following examples, the project creates a Work Unit with:
- min_quorum = 2
- target_nresults = 3
- max_delay = 10
Example #1
The BOINC System automatically creates three Results, which are sent at various times. At time 8, two successful Results have returned so the Validator is invoked. It finds a consensus, so the Work Unit is assimilated. At time 10 Result 3 arrives; Validation is performed again, this time to check whether Result 3 gets Credit.
time        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14
            created                         validate; assimilate
WU          x                               x   x
            created sent            success
result 1    x       x---------------x
            created     sent                success
result 2    x           x-------------------x
            created         sent                    success
result 3    x               x-----------------------x
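The sequence in this example can be replayed with a small Python sketch. The function name `validation_events` is hypothetical, and the sketch assumes that a consensus is found as soon as the quorum is reached. The return times 8 and 10 come from the text; the return time of 6 for Result 1 is an assumption made only so the example has three inputs.

```python
def validation_events(success_times, min_quorum):
    """List (time, action) pairs for the Validator, in time order.

    Assumes the Results agree, so consensus is found at the first quorum.
    """
    events = []
    for count, t in enumerate(sorted(success_times), start=1):
        if count == min_quorum:
            # Quorum reached: compare Results, then assimilate the Work Unit.
            events.append((t, "validate and assimilate"))
        elif count > min_quorum:
            # Consensus already exists: only decide whether to grant Credit.
            events.append((t, "check late result for credit"))
    return events

# Result 1 at time 6 is assumed; 8 and 10 are the times given in the example.
print(validation_events([6, 8, 10], min_quorum=2))
```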
Example #2
In the next example, Result 2 is lost (i.e., there's no reply to the BOINC Scheduler). When Result 3 arrives a consensus is found and the Work Unit is assimilated. At time 13 the Scheduler 'gives up' on Result 2 (this allows it to delete the Canonical Result's output files, which are needed to validate late-arriving Results).
time        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14
            created                                 validate; assimilate
WU          x                                       x   x
            created sent            success
result 1    x       x---------------x
            created     sent    lost                            give-up
result 2    x           x--------                               x
            created         sent                    success
result 3    x               x-----------------------x
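The give-up time can be expressed as a one-line rule. This sketch assumes that max_delay is measured from the moment a Result is sent, and the function names are hypothetical. A send time of 3 for Result 2 is not stated in the text; it is inferred here because 3 + 10 matches the give-up time of 13.

```python
def give_up_time(sent_time, max_delay):
    """Time at which the Scheduler stops waiting for a sent Result."""
    return sent_time + max_delay

def is_timed_out(now, sent_time, max_delay, returned):
    """True once an unreturned Result has passed its give-up time."""
    return (not returned) and now >= give_up_time(sent_time, max_delay)

# Assumed send time of 3 with max_delay = 10, as in the example parameters.
print(give_up_time(3, 10))
```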
Example #3
In the next example, Result 2 returns an error at time 5. This reduces the number of outstanding Results to 2; because target_nresults is 3, BOINC creates another Result (result 4). A consensus is reached at time 9, before Result 4 is returned.
time        0   1   2   3   4   5   6   7   8   9   10  11  12  13  14
            created                             validate; assimilate
WU          x                                   x   x
            created sent            success
result 1    x       x---------------x
            created     sent    error
result 2    x           x-------x
            created         sent                success
result 3    x               x-------------------x
                                created sent                   success
result 4    x                   x       x----------------------x
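The replacement rule in this example can be sketched as follows. The state names and the function `results_to_create` are hypothetical, and the sketch assumes that unsent, in-progress, and successful Results all count toward target_nresults while errored or lost ones do not. It does not model what happens once a consensus already exists.

```python
# States assumed to count toward target_nresults in this sketch.
USABLE_STATES = {"unsent", "in_progress", "success"}

def results_to_create(states, target_nresults):
    """Number of new Results needed to get back up to target_nresults."""
    usable = sum(1 for s in states if s in USABLE_STATES)
    return max(0, target_nresults - usable)

# At time 5: Result 2 has errored, the other two are still usable.
print(results_to_create(["in_progress", "error", "in_progress"], 3))
```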
Copyright ©
- 2005 University of California
- 2005 Paul D. Buck
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.

