[Home]History of Benchmark

HomePage | Recent Changes | Preferences

Revision 5 . . (edit) November 1, 2001 12:13 am by WojPob
Revision 4 . . (edit) October 29, 2001 8:07 pm by (logged).104.217.xxx [Minor grammar changes.]
Revision 2 . . (edit) October 26, 2001 5:38 pm by Rkinder [Added links for Dhrystone, Whetstone]
  

Difference (from prior major revision) (minor diff, author diff)

Changed: 5c5,11
Types - chip level, subsystem level, system level, os level, application level???
As computer architecture advanced, it became more and more difficult to compare the performance of various computer systems simply by looking at their specifications. Therefore, tests were developed that could be performed on different systems, allowing the results from these tests to be compared across different architectures.

Benchmarks are designed to mimic a particular type of workload on a component or system. "Synthetic" benchmarks do this by running specially created programs that impose the workload on the component. "Application" benchmarks, instead, run actual real-world programs on the system. Whilst application benchmarks usually give a much better idea of real system performance than synthetic benchmarks, they tend to reflect all aspects of a system rather than individual parts of it, so synthetic benchmarks can be useful for evaluating, say, a hard disk or networking device in isolation.
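To illustrate the "synthetic" approach described above, the following is a minimal sketch of a synthetic floating-point benchmark in Python; the function name, iteration count, and scoring (operations per second) are illustrative choices, not a description of any real benchmark suite:

```python
import time

def synthetic_fp_benchmark(iterations=1_000_000):
    """Impose a pure floating-point workload and report throughput.

    This is the essence of a synthetic benchmark: the loop does no
    useful work, it exists only to exercise one part of the system
    (here, scalar floating-point arithmetic).
    """
    start = time.perf_counter()
    x = 1.0
    for _ in range(iterations):
        x = x * 1.000001 + 0.000001  # two FP operations per iteration
    elapsed = time.perf_counter() - start
    # Return x as well so the loop cannot be optimized away entirely.
    return (2 * iterations) / elapsed, x

ops_per_sec, _ = synthetic_fp_benchmark()
print(f"{ops_per_sec:,.0f} floating-point ops/sec")
```

Note that such a score reflects only the one operation mix the loop exercises, which is exactly why the text below warns against reading too much into synthetic results.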

Computer manufacturers have a long history of trying to set up their systems to give unrealistically high performance on benchmark tests that is not replicated in real usage. For instance, during the 1980s some compilers could detect a specific mathematical operation used in a well-known floating-point benchmark and replace it with a mathematically equivalent operation that was much faster. However, such a transformation was rarely useful outside the benchmark.

More generally, users are recommended to take benchmarks, particularly those provided by manufacturers themselves, with ample quantities of salt. If performance is really critical, the only benchmark that matters is the actual workload that the system is to be used for. If that is not possible, benchmarks that resemble real workloads as closely as possible should be used, and even then treated with scepticism. It is quite possible for system A to outperform system B when running program furble on workload X (the workload in the benchmark), and for the order to be reversed with the same program on your own workload.

Added: 10a17,21
* SPEC?
* BAPco?
* 3DMark?
* Quake
* Khornerstone
