<< Chapter < Page Chapter >> Page >

An important characteristic of any set of data is the variation in the data. In some data sets, the data values are concentrated closely near the mean; in other data sets, the data values are more widely spread out from the mean. The most common measure of variation, or spread, is the standard deviation. The standard deviation is a number that measures how far data values are from their mean.

The standard deviation

  • provides a numerical measure of the overall amount of variation in a data set, and
  • can be used to determine whether a particular data value is close to or far from the mean.

The standard deviation provides a measure of the overall variation in a data set

The standard deviation is always positive or zero. The standard deviation is small when the data are all concentrated close to the mean, exhibiting little variation or spread. The standard deviation is larger when the data values are more spread out from the mean, exhibiting more variation.

Suppose that we are studying the amount of time customers wait in line at the checkout at supermarket A and supermarket B . The average wait time at both supermarkets is five minutes. At supermarket A , the standard deviation for the wait time is two minutes; at supermarket B . The standard deviation for the wait time is four minutes.

Because supermarket B has a higher standard deviation, we know that there is more variation in the wait times at supermarket B . Overall, wait times at supermarket B are more spread out from the average; wait times at supermarket A are more concentrated near the average.

Calculating the standard deviation

If x is a number, then the difference " x – mean" is called its deviation . In a data set, there are as many deviations as there are items in the data set. The deviations are used to calculate the standard deviation. If the numbers belong to a population, in symbols a deviation is x μ . For sample data, in symbols a deviation is x x .

The procedure to calculate the standard deviation depends on whether the numbers are the entire population or are data from a sample. The calculations are similar, but not identical. Therefore the symbol used to represent the standard deviation depends on whether it is calculated from a population or a sample. The lower case letter s represents the sample standard deviation and the Greek letter σ (sigma, lower case) represents the population standard deviation. If the sample has the same characteristics as the population, then s should be a good estimate of σ .

To calculate the standard deviation, we need to calculate the variance first. The variance is the average of the squares of the deviations (the x x values for a sample, or the x μ values for a population). The symbol σ 2 represents the population variance; the population standard deviation σ is the square root of the population variance. The symbol s 2 represents the sample variance; the sample standard deviation s is the square root of the sample variance. You can think of the standard deviation as a special average of the deviations. Formally, the variance is the second moment of the distribution or the first moment around the mean. Remember that the mean is the first moment of the distribution.

Get Jobilize Job Search Mobile App in your pocket Now!

Get it on Google Play Download on the App Store Now




Source:  OpenStax, Business statistics -- bsta 200 -- humber college -- version 2016reva -- draft 2016-04-04. OpenStax CNX. Apr 05, 2016 Download for free at http://legacy.cnx.org/content/col11969/1.5
Google Play and the Google Play logo are trademarks of Google Inc.

Notification Switch

Would you like to follow the 'Business statistics -- bsta 200 -- humber college -- version 2016reva -- draft 2016-04-04' conversation and receive update notifications?

Ask