WikiPedia: SummarizingStatisticalData

In general, statistical data can be described as a list of subjects, or Units, together with the data associated with each of them. Although most research records several data items for each Unit, this simple introduction limits itself to one data item per Unit.

We have two objectives for our summary:

1) How are the data for different Units similar? Statistical textbooks call the answer to this question a measure of CentralTendency.

2) How do they differ? This is often called variability.

When we are summarizing a quantity like length or weight or age, it is common to answer the first question with the MeaN, the MediaN, or the MoDe.

The mean is what we call the average in ordinary English: the sum of all the observations divided by the number of observations. Once we have chosen the mean to describe what the observations have in common, we usually use the standard deviation to describe how they differ. The SD (as we abbreviate it) is the square root of the average of the squared deviations from the mean.
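As a rough illustration of these two definitions, here is a minimal Python sketch. The function names and the sample data are made up for this example. It divides by the number of observations, as the description above does; many textbooks divide by one less than that when the data are a sample from a larger population.

```python
import math

def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)

def standard_deviation(values):
    """Square root of the average squared deviation from the mean.

    Divides by n, following the definition in the text. Dividing by
    n - 1 instead gives the usual sample standard deviation.
    """
    m = mean(values)
    squared_deviations = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / len(values))

# Hypothetical data: one observation per Unit.
ages = [2, 4, 4, 4, 5, 5, 7, 9]

print(mean(ages))                # 5.0
print(standard_deviation(ages))  # 2.0
```

Python's standard library offers the same calculations as statistics.mean and statistics.pstdev (divide by n) or statistics.stdev (divide by n - 1).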

The median is the value that separates the highest half of the sample from the lowest half. When we use the median to describe what the observations have in common, there are several choices for a measure of variability: the range, the interquartile range, and the absolute deviation. You can find out more about these from a good introductory statistics textbook.
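The median and its companion measures of variability might be computed as in the sketch below, which uses Python's standard statistics module (Python 3.8 or later for statistics.quantiles). The data are invented. Two choices here are assumptions, not something this page specifies: quartiles are taken with the module's default method (textbooks differ on the exact rule), and "absolute deviation" is read as the median of the absolute deviations from the median.

```python
import statistics

# Hypothetical data: one observation per Unit.
weights = [1, 3, 4, 6, 8, 9, 12]

# Median: the middle value of the sorted observations.
med = statistics.median(weights)

# Range: the largest observation minus the smallest.
data_range = max(weights) - min(weights)

# Interquartile range: third quartile minus first quartile.
# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, _q2, q3 = statistics.quantiles(weights, n=4)
iqr = q3 - q1

# Absolute deviation (assumed reading): the median distance
# of the observations from the median.
abs_dev = statistics.median([abs(x - med) for x in weights])

print(med)         # 6
print(data_range)  # 11
print(iqr)         # 6.0
print(abs_dev)     # 3
```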

The mode is the value that occurs most often among the observations. Unlike the mean and the median, the mode is not necessarily unique: two or more values can tie for the largest number of observations.
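To see that the mode need not be unique, consider the short Python sketch below. The data are invented, and statistics.multimode (Python 3.8 or later) is used because it returns every value tied for the highest count.

```python
import statistics

# Hypothetical data in which two values tie for the highest count.
shoe_sizes = [1, 2, 2, 3, 3, 4]

# multimode returns all of the most frequent values,
# in the order they first appear in the data.
modes = statistics.multimode(shoe_sizes)

print(modes)                          # [2, 3]
print(statistics.mean(shoe_sizes))    # 2.5
print(statistics.median(shoe_sizes))  # 2.5
```

The same data yield exactly one mean and one median but two modes.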

[RABeldin ]