BOINC FAQ: Performance
From Unofficial BOINC Wiki
How can I make BOINC run as fast as possible?
Let us start by talking about "tuning" your current system. There are a number of things you can do that may give you an immediate boost in speed. These include:
- Turn off the display and do not run the BOINC Screen Saver, or any other Screen Saver. Set the screen to go blank after 1 or 2 minutes instead.
- Turn off all background programs like instant messengers, disk organizers, etc.
- Do not use the computer to play music.
- Remove the "rapid start" applets from the start up folder. Programs like Quicken and Microsoft Office install these applets, which put an icon in the Taskbar but mostly just take up memory space.
- Set the Preference Setting to retain suspended Work Units in memory.
- Change the Preference Setting for how often a Checkpoint is written to the disk from every minute to somewhere between every 15 minutes and every 3 hours. Keep in mind that this setting may be ignored by some of the Science Applications.
- Make sure the Preference Setting for using processors is correctly set for your computer. It usually is correctly set, but if you have a Hyper-Threading capable processor this should be set to 2 instead of 1.
- While in the BIOS check the setting to use the Performance Defaults, or Optimized Defaults rather than the "Fail-Safe" Defaults.
- Make sure the Preference Setting for the use of Virtual Memory is set to a value in the general vicinity of 75% to 90%.
- Leave the computer on 24x7 and don't use it for anything else.
- Do not try to run more than one instance of the BOINC Client Software or a Science Application per each logical or physical CPU. Since the BOINC Client Software does the "smart" thing naturally, don't try to help!
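As a quick check on the processor setting in the list above, here is a minimal Python sketch that reports how many logical processors the operating system sees. It uses only the standard library call `os.cpu_count()`; the variable name is illustrative, and the count includes Hyper-Threaded logical CPUs, not just physical ones.

```python
import os

# Number of logical CPUs the operating system reports. The call can return
# None on unusual platforms, so fall back to 1 in that case.
logical_cpus = os.cpu_count() or 1

print(f"Logical processors visible to the OS: {logical_cpus}")
print(f"The processor preference should not exceed: {logical_cpus}")
```

If the number printed is higher than the value in your preferences, some of the processors are probably sitting idle.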
OK, now let's talk about upgrading the hardware.
For the most part, SETI@Home on the BOINC Software and the majority of the Science Applications need little beyond the computer features that emphasize raw speed. There will, however, be Projects with unique needs that may change the recommendations below. With that being said, if you are serious about getting maximum speed, the general order of things to consider getting faster and bigger is:
- Increased number of processors (either increase the number of Physical Processors or get a CPU that has some version of hyper-threading for additional logical processors).
- Make sure the processor is not a low-end, bargain-basement type. CPUs like the Celeron are inexpensive, but they are also poor performers in the mathematically oriented Science Applications that we will be running. In general, the more expensive the processor, the better it will run a BOINC Powered Project.
- Increased cache sizes means more of the program and data can be stored in faster cache memory.
- Increased cache speed (Note this does not normally vary within a processor family at a particular clock speed).
- Increased number of cache levels: Level 1 plus Level 2 is faster than Level 1 alone, and adding a Level 3 cache can also increase throughput.
- Increased speed of the Arithmetic and Logic Unit (ALU), which includes the Floating Point Unit (FPU) and integer unit. Again this does not normally vary within a processor family.
- Increased memory speed. Memory speed can be increased through changes in timing and the number of memory channels. Make sure that you have at least dual channel capability for your RAM.
- When buying a motherboard, make sure you get one that uses a performance chip set. They are usually easy to spot - they cost more.
- Increased RAM capacity. The more RAM in the computer the merrier BOINC will be. Most computers come with 512 Megabytes of RAM or less. My typical machine starts with 1 Gigabyte of RAM (My Macintosh has 2.5 Gigabytes right now, my next will likely be 3 or more Gigabytes).
- Increased speed (in RPM) of the disk drive (increased RPM decreases the Rotational Latency).
- Decreased disk drive Seek Time (usually rated by access time in milliseconds (ms)); the lower the number, the better the performance.
- Increased size of the disk drive's Cache. This is mostly going to affect those projects which have Work Units that cannot completely fit into the main memory.
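To put a number on the Rotational Latency item in the list above, here is a small Python sketch. It assumes the usual definition of average rotational latency as the time for half a revolution; the function name is made up for this example.

```python
def avg_rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency: the time for half a revolution, in ms."""
    ms_per_revolution = 60_000.0 / rpm
    return ms_per_revolution / 2.0


# Common drive speeds; a faster spindle means less waiting for the platter.
for rpm in (7_200, 10_000, 15_000):
    print(f"{rpm:>6} RPM -> {avg_rotational_latency_ms(rpm):.2f} ms")
```

The differences are a couple of milliseconds per access, which is why the disk sits so far down the list for compute-heavy work.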
If that is too technical, in general buy a computer that is more expensive than what you want (well, what your spouse wants) to pay for a new system. In general, more expensive implies greater ability. SETI@Home running on the BOINC Client Software (and the other science applications generally) will do better with faster components in the CPU and memory arenas with only very, very, very small gains with faster disk drives and the other parts.
We will say that if you are upgrading only for improved Science Application performance, well, this may not be one of the most important things. Upgrade when you have another real problem that you need to address with an improved computer. At that point you can try to determine where BOINC and the specific project's Science Applications could get a boost from improved components.
Ask on the message boards for recommendations. If at all possible, try before you buy.
Oh, and one last thing, the newest 64-bit processors coming on line will not, just because they are a 64-bit architecture, grant a significantly greater throughput for SETI@Home and / or any other Distributed Computing project. This may change in the future, but for the moment, if you do not have any other reason to buy a 64-bit machine it will be fine to get the fastest of the 32-bit Intel or AMD microprocessors.
One last choice, and one that I, Paul D. Buck, of international fame, do not recommend, is to "Overclock" your computer. If you know what you are doing, you can get a performance boost this way. If you do not know what you are doing you can, quite literally, melt the processor. Once again, not recommended.
Will the BOINC Software run faster with more RAM (e.g., 256 MB instead of 128 MB)?
The BOINC Client Software uses about 16-50 MB of RAM while it's running. Beyond a certain point (typically 128 MB, more if you run memory-intensive applications) more RAM won't make it run faster. As a general rule, if the normal applications on your computer run without any problem, the BOINC Client Software will run just as well. If you see significant slowdowns when running your normal software, you will likely need more memory for them before it seriously affects the BOINC Client Software.
Most Science Applications require high throughput in the CPU arena because they are "compute-bound" rather than "memory-bound" or "I/O-bound". The only memory factor that can be almost as important as the CPU throughput is the Memory Bandwidth. Though nothing prevents a project from processing large amounts of data, most current scientific problems can be pared down to smaller chunks of data upon which we beat (seemingly to death) to generate a Result.
As an example, the SETI@Home Classic process did about 275 Billion operations on 107 seconds of data that was about 380 Kilo-Bytes in size. A very small chunk of data with lots, and lots, and lots, and lots of processing.
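To see just how lopsided that is, here is a back-of-the-envelope Python sketch using the figures quoted above. The only assumption added here is that a Kilo-Byte means 1024 bytes; the constant names are illustrative.

```python
OPERATIONS = 275e9        # about 275 billion operations (from the text)
DATA_BYTES = 380 * 1024   # about 380 Kilo-Bytes, assuming 1 KB = 1024 bytes
DATA_SECONDS = 107        # seconds of recorded data (from the text)

# How much arithmetic is done for each byte, and each second, of input.
ops_per_byte = OPERATIONS / DATA_BYTES
ops_per_data_second = OPERATIONS / DATA_SECONDS

print(f"Operations per byte of input: {ops_per_byte:,.0f}")
print(f"Operations per second of recorded data: {ops_per_data_second:,.0f}")
```

Hundreds of thousands of operations per byte is what "compute-bound" looks like in practice: the data is loaded once and then worked over for hours.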
You should note that the Science Applications will have distinct and separate requirements on their own. In other words, we do have some "it depends" based on the exact requirements and demands for the science application itself. So, this means that you not only have to be concerned about the needs of the BOINC Client Software (which are minimal) but also those demands posed by the science applications. For that information you will have to check the Project's Web Site for details.
It is possible that for some types of research the data models will be very large and could be helped by more RAM. But this is not likely. For the most part, the science models are sliced and diced to very small chunks of stuff so that they will run well on the usual desktop PC. The researchers know what is out there and are not likely to create a system that is so large and ugly that no one will run it because they cannot.
Those programs would still be run on the monolithic Supercomputers of yore.
What is meant by the term "Compute-Bound"?
The limiting factor in producing work is the number of operations the computer can do per second, because the work is complex or involves a large number of iterations. The faster the CPU gets, the less this limit matters. To put this into a historical context, I owned an early 8-Bit CPU that ran at about 1 MHz; it had a cache memory (the memory chips cost more than either the processor or the Floating Point Unit (FPU)) and it took several days to produce a calculated fractal image. Several years later, running the same process on a Pentium Pro, that same image was done in seconds. The work was the same in both cases, but the later generation CPU was able to produce results faster. So we can say that a process that was Compute-Bound on the first computer (the number of calculations needed made the total process slow) became an I/O-Bound process (the time to compute each result was shorter than the time it took to display that result) on the later generation machine.
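The fractal story can be boiled down to a comparison of two times. The Python sketch below is only an illustration: the function name and the timings are made up, not measurements from either machine.

```python
def bottleneck(compute_seconds: float, io_seconds: float) -> str:
    """Name the stage that limits throughput for one result."""
    return "compute-bound" if compute_seconds > io_seconds else "I/O-bound"


# Hypothetical timings per displayed result, for illustration only.
old_machine = bottleneck(compute_seconds=2.0, io_seconds=0.01)
new_machine = bottleneck(compute_seconds=0.0001, io_seconds=0.01)

print("Old machine:", old_machine)
print("New machine:", new_machine)
```

The work did not change between the two calls; only the compute time shrank, and that was enough to move the bottleneck.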
What is meant by the term "Memory-Bound"?
The limiting factor in a process that is Memory-Bound is the amount of available memory to hold the data. As an example, we could look at climate prediction (running something like Climateprediction.net) as a problem that can run into trouble with the number of variables and the sampling interval. To get a good model we might (theoretically) need to take samples of temperature, barometric pressure, wind direction, etc. at 1 Kilometer intervals in cubes over a continent. Now we have billions of measurements taken over time that need to be processed. But the sheer mass of data becomes a limiting factor, simply because we cannot fit it into memory.
Oh, by the way, this problem is usually Compute-Bound at the same time as it is Memory-Bound. The major distinction is that you usually run out of memory first.
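A rough sizing exercise shows how quickly such a grid outgrows main memory. Everything in this Python sketch is a stand-in: the continent area, the height of the atmosphere modeled, the number of variables, and the 4 bytes per value are hypothetical figures chosen for illustration, not Climateprediction.net's actual model.

```python
def grid_bytes(area_km2, height_km, variables, bytes_per_value=4):
    """Bytes needed for one time step of a grid of 1 km cubes."""
    cells = area_km2 * height_km  # one cell per cubic kilometre
    return cells * variables * bytes_per_value


# Hypothetical: 10 million km^2 of land, 10 km of atmosphere, 3 variables.
one_step = grid_bytes(10_000_000, 10, 3)

print(f"One time step: {one_step / 1e9:.1f} GB")
print(f"One day of hourly samples: {24 * one_step / 1e9:.1f} GB")
```

Even with these modest stand-in numbers, a single snapshot is larger than the RAM in a typical desktop of the kind discussed here.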
What do you mean by the term "Memory Bandwidth"?
The total speed of the system in moving data from the memory to the processor and back. Increasing the memory clock speed raises the Memory Bandwidth; so does making the data path wider. This is why we went from 4-bit to 8-bit to 16-bit to 32-bit to 64-bit machines, and likely why we might start to see 128-bit machines show up eventually. Just for fun, I have to point out that some numbers are actually more efficient if we represent them as a shorter value (like the value for 0), in which case increasing the memory width provides no benefit and in fact can reduce efficiency.
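Both levers, clock speed and path width, show up in the standard formula for theoretical peak bandwidth. The Python sketch below is a simplification that ignores latency and real-world overhead; the function and parameter names are made up for this example.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, bus_width_bits, channels=1):
    """Theoretical peak: transfers per second x bytes per transfer x channels."""
    return mega_transfers_per_s * (bus_width_bits // 8) * channels


single = peak_bandwidth_mb_s(400, 64)            # DDR-400 on a 64-bit path
dual = peak_bandwidth_mb_s(400, 64, channels=2)  # same memory, dual channel

print(f"Single channel: {single} MB/s")
print(f"Dual channel:   {dual} MB/s")
```

The single-channel figure is where the "3200" in PC3200 comes from. Doubling either the transfer rate or the width doubles the result.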
I was thinking of upgrading my computer to make the BOINC Client Software and my Science Applications run faster. What should I buy?
For the most part the BOINC Client Software and the majority of Science Applications do not require many things besides those computer features which emphasize raw speed. There will be, however, projects that will have unique needs and may change the recommendations below.
Some of the things that affect the processing speed of the computer are:
- FSB Speed: Front Side Bus speed, this is the master clock that regulates how fast the CPU/RAM and all else in the system can be.
- CPU Speed: A multiple of FSB speed. (example 800 FSB x 4.0 = 3.2 GHz)
- L1, L2, L3 Cache size: Most systems have only L1 and L2 cache. An L2 cache of 1024K seems to be the sweet spot for SETI@Home. Data is reprocessed a few times for most operations, and if more of the data stays in Cache, fewer accesses have to go across the slower memory bus to RAM.
- RAM Speed: How fast the CPU can get the RAM to return its data (or program segments). This is controlled by the FSB speed. Some FSB settings won't allow the maximum speed of some RAM (example: FSB 533 with DDR 3200 RAM).
- DIMM: Dual In-line Memory Module. Pentium 3, and older.
- DDR: Double (or Dual) Data Rate RAM, gets more data per access. Examples: PC2100, PC2700, PC3200 DDR.
- DDR2: ummmm...Not quad, but better than DDR, Examples: DDR2-400, DDR2-533
- Single vs. Dual RAM sticks: Some motherboards can take advantage of two RAM sticks and get data faster from them. Each stick has a maximum retrieval rate, but if the motherboard uses one, then the other, alternately, it can double the speed capacity. Some computers will only take advantage of dual-channel mode, and the resulting increase in speed, if both RAM sticks are of identical size.
- Laptop CPU throttling: Newer CPUs can have their Front Side Bus, and internal clocks changed while they are running. If a laptop wants to save battery, or cut down on heat, the BIOS/Windows® drivers may be written to reduce these clocks for either purpose. The result is slower Work Unit crunching.
- BIOS RAM settings: Things called RAS and CAS settings can affect how fast RAM is used. Even if you have fast RAM, slower settings for these values will under-utilize the RAM's speed capabilities.
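Two of the items in the list above reduce to simple arithmetic: the CPU clock as a multiple of the bus speed, and the time cost of a CAS setting. The Python sketch below uses made-up function names, and the CAS figures are hypothetical examples rather than recommendations.

```python
def cpu_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """CPU clock as a multiple of the bus clock."""
    return fsb_mhz * multiplier


def cas_latency_ns(cas_cycles: int, memory_clock_mhz: float) -> float:
    """Time spent waiting on CAS: cycles divided by clock rate, in ns."""
    return cas_cycles * 1000.0 / memory_clock_mhz


print(cpu_clock_mhz(800, 4.0), "MHz")  # the 800 x 4.0 example from the list
print(cas_latency_ns(3, 200), "ns")    # hypothetical: CAS 3 at a 200 MHz clock
print(cas_latency_ns(2, 200), "ns")    # hypothetical: CAS 2 at the same clock
```

The second function is why a slow CAS setting wastes fast RAM: the clock is the same, but every access waits for more cycles.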
With that being said, if you are serious about getting maximum speed the general order of things to consider getting faster and bigger are:
- Increase the number of Processors (either Dual-CPUs or a version of Hyper-threading for additional logical CPUs).
- Increased cache sizes means more of the program and data can be stored in faster memory.
- Increased cache speed, this does not normally vary within a processor family.
- Increase the number of cache levels: Level 1 plus Level 2 is faster than Level 1 alone, and adding a Level 3 cache can also increase throughput.
- Increased speed of the Arithmetic and Logic Unit (ALU), which includes the Floating Point Unit (FPU) and integer unit. Again this does not normally vary within a processor family.
- Increased memory speed. Memory speed can be increased through changes in timing, number of memory channels, wider data paths, and higher bus (data transfer) speeds.
- A motherboard with dual channel RAM access capability.
- Increased speed of the disk drive (increased RPM decreases Rotational Latency) with 15,000 RPM being better than 10,000 RPM which is better than 7,200 RPM.
- Decreased disk drive Seek Time (rated usually in milliseconds (ms), with lower numbers being better).
- Increased size of the disk drive cache or buffer (the standard size today is about 2 Mega-Bytes, with 8 Mega-Bytes being the newer standard, and it is better to have more).
- Locate a software "RAM Disk" program (though if the computer shuts down before saving the work to the real disk drive you will lose all that work).
- Increased Motherboard Front Side Bus Speed (FSB)
If that is too technical, in general buy a computer that is more expensive than what you want (well, what your spouse wants) to pay for a new system. In general, more expensive implies greater ability. The BOINC Software and the science applications generally do better with faster components in the CPU and memory arenas with only very, very, very small gains with faster disk drives and the other parts.
We will say that upgrading only for the BOINC Software and your science applications may not be one of the most important things. Upgrade when you have another real problem that you need to address with an improved computer. At that point you can try to determine where the BOINC Software and the specific project's science applications could get a boost from improved components.
In addition, even though most BOINC science applications are compute-bound rather than memory-bound, applications other than BOINC may benefit substantially from increased RAM size. Increasing RAM size may also be beneficial if you plan to run BOINC in the background while doing other work in the foreground.
Ask on the message boards for recommendations. If at all possible, try before you buy.
Oh, and one last thing, the newest 64-bit processors coming on line will not, just because they are a 64-bit architecture, grant a significantly greater throughput for most of the science applications. This may change in the future, but for the moment, if you do not have any other reason to buy a 64-bit machine it will be fine to get the fastest 32-bit Intel or AMD microprocessors.
Does a Macintosh run BOINC faster than a PC?
It is hard to compare apples and oranges, even harder to compare Apples and PCs. For one thing, raw speed as expressed in MegaHertz (MHz) is not a valid way to determine which computer is really "faster". By comparing the amount of work done by the processors you can determine which does your work faster. In general, once again, buy your computer to do what you need and want it to do. Then install the BOINC Client Software and go on about your life.
If you can do the work, the parts for a rough equivalent Intel (or AMD) computer system will cost less than an equivalent Macintosh. But the key here is that you have to find all the parts, put them together, and then make it work. If there are any problems, well, unfortunately they are your problems.
Just as a point of comparison my Macintosh is a PowerMac, Dual Processor G5 (2.0 GHz) with 1.5 GBytes of memory and it gets roughly the same throughput of Work Units with SETI@Home Classic as does my Hyper-Threaded 3.0 GHz Intel based system. Because of the internal components and displays I bought (I got the 23" wide screen for the Macintosh where with the Intel I have to suffer with a 20.1" LCD display) the cost for the two systems was comparable. For being robust and capable I have to give the nod towards the Macintosh, but that is my opinion.
What are logical CPUs?
Well, you can go to the Intel web site to get an in-depth technical description. Suffice it to say that there is a lot of repetition within most programs. There are many things that can also be done in parallel if we get sufficiently clever. What Intel did is to come up with a processor that has some "extra" internal resources that can be used when the processor determines that it can actually do several things at the same time. With the extra resources available it can take advantage of this, and it "looks" like there are really two CPUs in operation when in real life there is just one Physical CPU. It is being "clever" and uses its internal resources to process the second of two threads of execution, running it on the resources that are not needed by the other thread.
Effectively we increase the internal capabilities (more Floating Point Unit (FPU) stuff) and try to keep two balls in the air at the same time. Since the early days of computers each generation increases the amount of work done in a computer by increasing the available resources, increasing the speeds of the resources, and making better use of the available resources. If we keep all, and I mean all available resources busy, well, incredible things happen.
By the way, the BOINC Client Software demonstrates this very clearly. You are keeping your computer busy doing something useful instead of just heating the room. By keeping millions of PCs fully occupied we are getting awesome results.
I heard about "over-clocking"; will that make my systems faster?
Over-clocking should be attempted with caution, as you could cause problems with your computer. If you get it wrong, the best case scenario is system instability, which will give incorrect results and may cause random shut-downs. The worst case scenario is that the motherboard, memory, or CPU burns out, meaning you have to buy a new one. If you do want to give it a go, there are lots of useful guides out there. It can be a cheap way of running your computer at faster speeds than specified, though you will probably have to shell out for a better heatsink/fan than your stock one.
Should I "over-clock" my system?
No. But you asked me a "should I" question, and I don't think the risk is really worth the reward. Long time over-clocking fans will tell you that it is a perfectly safe thing to do. IBM, Intel, and other chip makers will beg to differ. I have to tell you that I am going to side with the largest collection of engineers, who say that it is not the thing to do.
With the instabilities created by "pushing" the processor past its performance "envelope" it is possible that the Floating Point Unit (FPU) will not return accurate results. This in turn will cause the Results to be, shall we say, less than accurate.
Less accurate calculations may mean that your Results will not pass Validation.
Besides, this is to do accurate science. It is not a race for Credit.
Are there settings that I can make that will improve performance?
There are several "General Preferences" that you can set to make the best use of your system.
| Preference | Setting |
|---|---|
| Do work while computer is running on batteries? | Yes |
| Do work while computer is in use? | Yes |
| Do work only between the hours of | (no restriction) |
| Leave applications in memory while preempted? | Yes |
| Switch between applications every | 60 minutes |
| Connect to network about every | 4 days |
| On multiprocessors, use at most | (your-max) processors |
| Write to disk at most every | 1800 seconds |
Our recommendation would be to keep the work units resident in memory, and to set the time interval for switching between applications to between 60 minutes and 3 hours. In general there is little reason to increase it above 60 minutes (the default) but if you want every sliver of performance you can increase it.
I will wager you will not be able to detect the difference.
The "Keep resident" setting is the one that will get you more bang for the buck. However, if you do not have a good amount of physical memory (512 Megabytes to 1 Gigabyte or more) and a large amount of free space on your disk drive, you won't see much from this.
The last setting is the one for the "write to disk interval". This setting, made high enough, can get you a lot more if you do not mind the risk of losing an hour or so of work.
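To make that risk concrete, here is a small Python sketch of the trade-off. It assumes an interruption (crash, power cut) is equally likely at any moment between two checkpoints, and that the Science Application honors the interval; the function name is made up for this example.

```python
def work_at_risk_seconds(checkpoint_interval_s: float):
    """Return (average, worst-case) CPU seconds lost to one interruption."""
    # Assumes the interruption is equally likely at any point in the interval,
    # so on average half an interval of work is lost.
    return checkpoint_interval_s / 2.0, checkpoint_interval_s


for interval in (60, 1800, 3600):
    avg, worst = work_at_risk_seconds(interval)
    print(f"{interval:>5} s interval: avg {avg:.0f} s, worst {worst:.0f} s lost")
```

A longer interval saves disk writes on every Work Unit, while the loss is only paid when something actually goes wrong.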
Running the processing all the time has, usually, no discernible slowing down of the normal work done on the machine, but you get maximum benefit of those spare CPU cycles.
What is an "optimized" program?
When a program is translated from the original source language into machine executable instructions there is a possibility that the program is not in a form that will ensure the fastest possible execution speed or smallest size.
So, when we talk about optimization we have to be specific on which optimization we desire. If we have a machine with very little memory we may want to compile the programs for the smallest memory footprint/size. Since memory is not an issue in most modern computers we shall move onto the more challenging issue, execution speed.
A compiler program is intended to take the code that the programmer wrote and to change that into a form that can be processed by a computer. The most significant issue is to have an ensured conversion that will perform as programmed. In other words, the translated/compiled program must be correct. It must faithfully carry out the programmer's intention (well, what the programmer wrote down as his intention. The compiler cannot read the programmer's mind ...).
A secondary purpose is to make the program as fast as it can be with no fancy footwork. However, it is likely that the compiler will make a generic program and will choose instructions that work, but will not necessarily be optimal for a specific model of computer. The classic situation is the PC world where we have two (actually more than two, but we shall stick to the two main providers) companies designing and selling processors. The internal architectures of these machines, even within a single company's processor line, are different. So much so, that a program that is compiled with optimizations for a specific architecture, a specific CPU model and "stepping", is not likely to be optimal for another CPU model and "stepping".
So, we need bunches of compiled programs for the various CPUs, "steppings", Operating Systems, etc. to achieve the best and fastest processing.
Where can I get a version of BOINC or a Science Application that has been "optimized"?
Optimized versions of BOINC Client Software and the SETI@Home Science Application are available for:
Mac OS X and Solaris: http://boinc.berkeley.edu/download_other.php
Linux:
What's the average processing time per Work Unit?
The first difficulty is that Work Units' processing times across the various BOINC Powered Projects depend on a lot of factors, not the least of which are the actual characteristics of the Participant's Computer, the load on the computer from non-BOINC activities, etc.
Complicating things, some Projects have Work Units of varying length within the Project itself. For example, LHC@Home has Work Units that will run for a different number of "turns" (10,000, 100,000, or 1,000,000), with the implication that the processing time for a Work Unit within this Project differs by a factor of 100 between the shortest and longest run times.
With that being said, the order of the Science Applications from shortest to longest maximum processing time is: mfoldB125, mfoldB120, setiathome, einstein, sixtrack, hadsm3.
How did I determine this?
Well, I took 7,625 rows of data from my BOINCView Logs and entered them into a MySQL DataBase table (a REAL UGLY table), and then I ran this query:
```sql
SELECT AppName,
       Min(FinalCPUTime),
       Max(FinalCPUTime) MyMax,
       Avg(FinalCPUTime) MyAvg,
       Count(appname)
FROM BOINCViewLog
WHERE FinalCPUTime > 0
GROUP BY AppName
ORDER BY MyMax;
```
Which gave me this data:
| AppName | Min(FinalCPUTime) | Max(FinalCPUTime) | Avg(FinalCPUTime) | Count(AppName) |
|---|---|---|---|---|
| mfoldB125 | 4059.515625 | 9951.40625 | 7238.0598221144 | 956 |
| mfoldB120 | 4.25898504257202 | 15765.953125 | 7100.4262898567 | 382 |
| setiathome | 22.90625 | 30884.96875 | 13010.96382848 | 2641 |
| einstein | 28794.828125 | 51789.125 | 39240.910247093 | 43 |
| sixtrack | 2.8125 | 63085.390625 | 2671.8234310456 | 3115 |
| hadsm3 | 1793074.375 | 2325737.25 | 2041320.1964286 | 14 |
Now, the data in this table includes the actual Science Application names, 3 columns of the times to process Work Units (Minimum above 0, Maximum, and Average), and a count of the number of records that the other data is based upon. The values are ordered on the maximum processing time.
If we look at the question from a slightly different perspective we have to change some things, and our answer changes. The order of the Science Applications from shortest to longest average processing time is: sixtrack, mfoldB120, mfoldB125, setiathome, einstein, hadsm3.
Which is derived from the table:
| AppName | Min(FinalCPUTime) | Max(FinalCPUTime) | Avg(FinalCPUTime) | Count(AppName) |
|---|---|---|---|---|
| sixtrack | 2.8125 | 63085.390625 | 2671.8234310456 | 3115 |
| mfoldB120 | 4.25898504257202 | 15765.953125 | 7100.4262898567 | 382 |
| mfoldB125 | 4059.515625 | 9951.40625 | 7238.0598221144 | 956 |
| setiathome | 22.90625 | 30884.96875 | 13010.96382848 | 2641 |
| einstein | 28794.828125 | 51789.125 | 39240.910247093 | 43 |
| hadsm3 | 1793074.375 | 2325737.25 | 2041320.1964286 | 14 |
This was done by changing the SQL very slightly, to:
```sql
SELECT AppName,
       Min(FinalCPUTime),
       Max(FinalCPUTime) MyMax,
       Avg(FinalCPUTime) MyAvg,
       Count(appname)
FROM BOINCViewLog
WHERE FinalCPUTime > 0
GROUP BY AppName
ORDER BY MyAvg;
```
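If you want to try this kind of query without setting up MySQL, the Python sketch below runs the same aggregate against an in-memory SQLite table using the standard library `sqlite3` module. SQLite is swapped in for MySQL here purely for convenience, the table has only the two columns the query needs, and the rows are made up for illustration; they are not the BOINCView data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE BOINCViewLog (AppName TEXT, FinalCPUTime REAL)")

# Made-up rows for illustration; these are not the author's data.
rows = [("app_x", 2.0), ("app_x", 4.0), ("app_x", 90.0), ("app_x", 0.0),
        ("app_y", 40.0), ("app_y", 50.0)]
conn.executemany("INSERT INTO BOINCViewLog VALUES (?, ?)", rows)

# Same shape as the query above; only the ORDER BY column changes.
QUERY = """
SELECT AppName, MIN(FinalCPUTime), MAX(FinalCPUTime) AS MyMax,
       AVG(FinalCPUTime) AS MyAvg, COUNT(AppName)
FROM BOINCViewLog
WHERE FinalCPUTime > 0
GROUP BY AppName
ORDER BY {column}
"""

by_max = [r[0] for r in conn.execute(QUERY.format(column="MyMax"))]
by_avg = [r[0] for r in conn.execute(QUERY.format(column="MyAvg"))]

print("Ordered by maximum:", by_max)
print("Ordered by average:", by_avg)
```

The sample rows are chosen so that one application has a low average but a single very long run, which is the same pattern that moves sixtrack between the two tables.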
My AMD based computer is faster than my Intel based computer on one project, but not the other. Why is this?
Simple analogy: The Pentium 4 is a dragster, the Athlon is a sports car. On a quarter mile track the Pentium 4 rules, on a Rally Course the Athlon wins.
The Prescott can perform more operations per second than the Athlon under ideal conditions. The real world is a different story. The Pentium 4 has a slow floating point unit (because the designers counted on SSE/2/3 speeding up most tasks ... unfortunately Einstein@Home is not one of those), about half the speed of the Athlon, so even under ideal conditions a 3GHz Pentium 4 is only a match for a 1.5GHz Athlon.
To further complicate matters, the information that the processor needs must be available but often is not. That can be caused by a Cache "miss" (which requires pulling the information from either L2 cache or main memory, which are tens or hundreds of times slower). Due to the higher clock speed and longer pipeline (more operations being processed at the same time) the Pentium 4 suffers a higher penalty for misses. Branches are also a problem since the processor does not know in advance which way the program will go. Both processors use "branch prediction" to try to make an educated guess as to which way the program will go and have those instructions ready to go (incorrect guesses often suffer a cache miss and a flush of "speculative" operations from the incorrectly assumed path). The two processors have different branch predictors, so they will get different "hit" rates for the same program, and again the Athlon suffers a smaller penalty for misses.
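The effect of the miss penalty can be sketched with the textbook formula for effective cycles per instruction (CPI). The Python below is only an illustration: the function names are made up, and the branch fraction, miss rate, penalties, and clock speeds are hypothetical round numbers, not measurements of any Pentium 4 or Athlon.

```python
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, penalty_cycles):
    """Average cycles per instruction once mispredicted branches are counted."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


def instructions_per_second(clock_hz, cpi):
    """Throughput is clock rate divided by cycles per instruction."""
    return clock_hz / cpi


# Hypothetical figures, chosen only to show the shape of the trade-off.
long_pipe = effective_cpi(1.0, 0.2, 0.1, penalty_cycles=30)
short_pipe = effective_cpi(1.0, 0.2, 0.1, penalty_cycles=10)

print(f"Long pipeline CPI:  {long_pipe:.2f}")
print(f"Short pipeline CPI: {short_pipe:.2f}")
print(f"3.0 GHz long:  {instructions_per_second(3.0e9, long_pipe):.3g} instr/s")
print(f"2.0 GHz short: {instructions_per_second(2.0e9, short_pipe):.3g} instr/s")
```

With the same miss rate, the longer pipeline pays more cycles per miss, so part of its clock speed advantage is spent recovering from wrong guesses.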
Source: Senator2

