SETI@Home Data Integrity
From Unofficial BOINC Wiki
Accidents happen and mistakes are made, so how do we know whether the Results returned from a SETI@Home Science Application are legitimate? Because errors are inevitable, every result needs to be checked. Fortunately, SETI@Home has enough volunteers that we can process each piece of data more than once and compare the potential signals detected by different computers with one another. We use this comparison to rank our Results by how confident we are that they were processed correctly.
Comparing a signal across the results for its Work Unit has four possible outcomes:
- We mark a signal as fully verified if 60% or more of the results for its Work Unit contain a matching signal.
- If a signal cannot be verified, we mark it as unverified. This can happen for two reasons. First, early in the Project, when we had fewer users, we were unable to process every Work Unit multiple times, so some early Work Units cannot be Validated. Second, many Work Units were processed by more than one version of the SETI@Home Science Application. More recent versions include analysis that was not present in earlier versions, so certain signals can only be found by the newer versions and have no result from an older version to be compared against.
- If a signal is present in more than one of the compared results, but in less than 60% of them, we mark it as questionable.
- If a signal is present in only one result, but should have been detected in the others, we mark it as an incorrect signal.
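The four outcomes above can be sketched as a small classification function. This is a minimal illustration, not the project's actual validation code. It assumes two counts per signal: `n_matching`, the number of results for the Work Unit that contain a matching signal (including the one it came from), and `n_capable`, the number of results produced by an application version able to detect that kind of signal. Both names, and the idea of excluding older-version results from the comparison, are hypothetical readings of the rules above.

```python
from enum import Enum


class Verification(Enum):
    """The four possible outcomes of comparing a signal."""
    VERIFIED = "fully verified"
    UNVERIFIED = "unverified"
    QUESTIONABLE = "questionable"
    INCORRECT = "incorrect"


def classify_signal(n_matching: int, n_capable: int) -> Verification:
    """Classify one signal by how many comparable results agree on it.

    n_matching: results containing a matching signal (hypothetical count).
    n_capable:  results whose application version could have found this
                kind of signal (hypothetical count, an assumption).
    """
    if n_matching < 1 or n_matching > n_capable:
        raise ValueError("need 1 <= n_matching <= n_capable")
    # Nothing independent to compare against: an early Work Unit that was
    # processed only once, or a signal type older versions cannot find.
    if n_capable < 2:
        return Verification.UNVERIFIED
    # 60% threshold in integer arithmetic: n_matching / n_capable >= 3/5.
    if 5 * n_matching >= 3 * n_capable:
        return Verification.VERIFIED
    # Seen by several results, but by fewer than 60% of them.
    if n_matching > 1:
        return Verification.QUESTIONABLE
    # Seen by exactly one result although others should have found it.
    return Verification.INCORRECT
```

For example, `classify_signal(3, 5)` returns `VERIFIED` (exactly 60%), `classify_signal(2, 4)` returns `QUESTIONABLE`, `classify_signal(1, 3)` returns `INCORRECT`, and `classify_signal(1, 1)` returns `UNVERIFIED`. Comparing `5 * n_matching` with `3 * n_capable` avoids floating-point rounding right at the 60% boundary.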
Based on this comparison, we assign each result a numerical score. We then choose the best-scoring result in each group of redundant results and copy it to our master database, where it will be examined further.
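The article does not say how the numerical score is computed, so the following is only a sketch under an invented scheme: each signal in a result contributes a hypothetical weight according to its verification label, and the result with the highest total in its group is chosen. The `WEIGHTS` values, result ids, and function names are all hypothetical.

```python
from enum import Enum
from typing import Dict, List


class Verification(Enum):
    VERIFIED = "fully verified"
    UNVERIFIED = "unverified"
    QUESTIONABLE = "questionable"
    INCORRECT = "incorrect"


# Hypothetical weights: the real scoring scheme is not documented here.
# Incorrect signals are penalized so that a result reporting spurious
# signals loses to one whose signals were confirmed by other computers.
WEIGHTS: Dict[Verification, int] = {
    Verification.VERIFIED: 2,
    Verification.QUESTIONABLE: 1,
    Verification.UNVERIFIED: 0,
    Verification.INCORRECT: -3,
}


def score_result(signal_labels: List[Verification]) -> int:
    """Sum the weights of the labels assigned to one result's signals."""
    return sum(WEIGHTS[label] for label in signal_labels)


def pick_best(group: Dict[str, List[Verification]]) -> str:
    """Return the id of the best-scoring result in one redundant group.

    group maps a (hypothetical) result id to the labels of its signals.
    On a tie, max() keeps the first maximal id in insertion order.
    """
    return max(group, key=lambda rid: score_result(group[rid]))
```

For example, given `{"result_a": [VERIFIED, VERIFIED, INCORRECT], "result_b": [VERIFIED, VERIFIED, QUESTIONABLE]}`, the scores are 1 and 5, so `pick_best` returns `"result_b"`, the result that would be copied to the master database.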
The verification scores will be used in later processing when choosing potential candidate signals: fully verified signals will be given higher priority than those that cannot be verified, and signals marked as incorrect will not be considered further.
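The candidate-selection rule above can be sketched as a filter followed by a sort. This is a minimal illustration, not the project's pipeline: it drops incorrect signals and orders fully verified signals ahead of unverified ones, as described. Where questionable signals fall is not stated, so placing them between the two is an assumption, as are the `Signal` type and its field names.

```python
from enum import Enum
from typing import List, NamedTuple


class Verification(Enum):
    VERIFIED = "fully verified"
    UNVERIFIED = "unverified"
    QUESTIONABLE = "questionable"
    INCORRECT = "incorrect"


# Lower rank means higher priority. VERIFIED before UNVERIFIED follows the
# text; putting QUESTIONABLE between them is an assumption.
PRIORITY = {
    Verification.VERIFIED: 0,
    Verification.QUESTIONABLE: 1,
    Verification.UNVERIFIED: 2,
}


class Signal(NamedTuple):
    signal_id: str        # hypothetical identifier
    label: Verification


def candidate_order(signals: List[Signal]) -> List[Signal]:
    """Drop incorrect signals and sort the rest by verification priority.

    sorted() is stable, so signals sharing a label keep their input order.
    """
    kept = [s for s in signals if s.label is not Verification.INCORRECT]
    return sorted(kept, key=lambda s: PRIORITY[s.label])
```

For example, signals `a` (unverified), `b` (incorrect), `c` (fully verified) and `d` (questionable) come out in the order `c`, `d`, `a`, with `b` discarded.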