Measures of Central Tendency
Measures of central tendency locate a distribution of data along an appropriate scale.
The following table lists the functions that calculate the measures of central tendency.
Function Name | Description |
---|---|
`geomean` | Geometric mean |
`harmmean` | Harmonic mean |
`mean` | Arithmetic average |
`median` | 50th percentile |
`mode` | Most frequent value |
`trimmean` | Trimmed mean |
The average is a simple and popular estimate of location. If the data sample comes from a normal distribution, then the sample mean is also optimal: it is the minimum variance unbiased estimator (MVUE) of µ.
Unfortunately, outliers, data entry errors, or glitches exist in almost all real data. The sample mean is sensitive to these problems. One bad data value can move the average away from the center of the rest of the data by an arbitrarily large distance.
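The sensitivity described above can be checked with a short sketch. It uses only the base function `mean`; the variable names (`clean`, `bad`, `shifted`) and the contaminating values are assumptions chosen for illustration, not data from this page.

```matlab
% Illustrative sketch: one bad value drags the mean arbitrarily far.
clean = ones(1,6);                 % six well-behaved observations, all equal to 1
bad = [10 100 1000];               % hypothetical outlier magnitudes
shifted = zeros(size(bad));        % mean of each contaminated sample
for k = 1:numel(bad)
    shifted(k) = mean([clean bad(k)]);   % append one outlier and recompute
end
disp(shifted)   % each entry lies farther from 1 than the one before
```

Because the outlier enters the sum directly, making it larger moves the mean further without limit, even though six of the seven values never change.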
The median and trimmed mean are two measures that are resistant (robust) to outliers. The median is the 50th percentile of the sample, so it changes only slightly if you add a large perturbation to any value. The idea behind the trimmed mean is to ignore a small percentage of the highest and lowest values of a sample when determining the center of the sample.
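The trimming idea can be sketched with base functions only. This is a simplified illustration, not a reimplementation of `trimmean`: the number of values cut from each tail (`k`) is fixed by hand here as an assumption, whereas `trimmean` derives it from a percentage.

```matlab
% Simplified trimmed mean: drop k values from each end, average the rest.
x = [ones(1,6) 100];          % sample with one outlier
k = 1;                        % assumed number of values cut from each tail
xs = sort(x);                 % order the sample from lowest to highest
kept = xs(k+1:end-k);         % discard the k lowest and k highest values
trimmedCenter = mean(kept);   % average of the remaining values
medianCenter = median(x);     % 50th percentile, for comparison
disp([trimmedCenter medianCenter])
```

Both estimates stay at 1 for this sample, because the single large value is either discarded (trimmed mean) or never reaches the middle of the ordered data (median).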
The geometric mean and harmonic mean, like the average, are not robust to outliers. They are useful when the sample is lognormally distributed or heavily skewed.
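For positive data, the geometric mean equals the exponential of the mean of the logarithms, and the harmonic mean equals the reciprocal of the mean of the reciprocals. The first identity is why the geometric mean suits lognormal data: it averages on the log scale, where such data is normal. The sketch below computes both from these identities using base functions only; the sample `r` is a hypothetical set of growth factors, not data from this page.

```matlab
% Geometric and harmonic means from their defining identities.
r = [1.10 0.95 1.20 1.05];     % hypothetical positive growth factors
gm = exp(mean(log(r)));        % geometric mean: average on the log scale
hm = 1/mean(1./r);             % harmonic mean: average of the reciprocals
am = mean(r);                  % arithmetic mean, for comparison
disp([hm gm am])               % for positive data, hm <= gm <= am
```

Both identities require strictly positive values, since the logarithm and the reciprocal are otherwise undefined or misleading.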
Measures of Central Tendency
This example shows how to compute and compare measures of location for sample data that contains one outlier.
Generate sample data that contains one outlier.
x = [ones(1,6),100]
x = 1×7
1 1 1 1 1 1 100
Compute the geometric mean, harmonic mean, mean, median, and trimmed mean for the sample data.
locate = [geomean(x) harmmean(x) mean(x) median(x)...
    trimmean(x,25)]
locate = 1×5
1.9307 1.1647 15.1429 1.0000 1.0000
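The first three values can be reproduced by hand from the standard definitions, which is a useful check on the output. With six ones and a single 100, so that n = 7:

```latex
\begin{aligned}
\text{geometric mean} &= \Bigl(\prod_{i=1}^{7} x_i\Bigr)^{1/7} = 100^{1/7} \approx 1.9307 \\
\text{harmonic mean}  &= \frac{7}{\sum_{i=1}^{7} 1/x_i} = \frac{7}{6 + 0.01} \approx 1.1647 \\
\text{arithmetic mean} &= \frac{1}{7}\sum_{i=1}^{7} x_i = \frac{106}{7} \approx 15.1429
\end{aligned}
```

The median is the fourth of the seven ordered values, which is 1.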
The mean (`mean`) is far from any data value because of the influence of the outlier. The geometric mean (`geomean`) and the harmonic mean (`harmmean`) are influenced by the outlier, but not as significantly. The median (`median`) and trimmed mean (`trimmean`) ignore the outlier value and describe the location of the rest of the data values.