Standard deviation.
There are three schools of thought about standard deviation.
- Those who don’t know what it is and are happier for it.
- Those who are ready to pull out a stats program and merrily calculate away.
- Those who buried the memory of a long-ago stats class and only vaguely remember the concept.
Any dataset will contain oddities. One common way of removing outlier data is to consider the standard deviation, a measure of how much the values of a random variable can be expected to vary about the mean (average).
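For reference, the population standard deviation of N values x_1, ..., x_N with mean μ is the square root of the average squared deviation from the mean. When the data are a sample rather than the whole population, the divisor N - 1 is conventionally used instead, giving the sample standard deviation s:

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i, \qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2}, \qquad
s = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2}
```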
Determining the standard deviation of a dataset and using it as the criterion for identifying outliers is a common technique. Medicare, for example, occasionally uses the standard deviation approach to identify data to be excluded from a calculation. (To be fair, Medicare also employs other statistical tools to find outliers.) As with any tool, care has to be taken, because sometimes legitimate data simply does not fit the expected model.
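A minimal sketch of the standard-deviation rule in Python, assuming a value is treated as an outlier when it lies more than k standard deviations from the mean. The cutoff k = 2 is an assumption (the text names no threshold; 2 and 3 are common choices), as is the use of the population standard deviation via the standard library's `statistics.pstdev`. The helper name `split_by_sd` and the sample values are hypothetical.

```python
import statistics


def split_by_sd(values, k=2.0):
    """Split values into (kept, flagged) using a k-standard-deviation cutoff.

    Hypothetical helper: a value is flagged when it lies more than k
    population standard deviations from the mean. k=2.0 is an assumed default.
    """
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)  # population standard deviation
    if sd == 0:
        # All values are identical, so nothing can be an outlier.
        return list(values), []
    kept = [v for v in values if abs(v - mean) <= k * sd]
    flagged = [v for v in values if abs(v - mean) > k * sd]
    return kept, flagged


# Made-up sample data with one obvious oddity.
data = [10, 12, 11, 13, 12, 11, 50]
kept, flagged = split_by_sd(data)
print("kept:", kept)        # kept: [10, 12, 11, 13, 12, 11]
print("flagged:", flagged)  # flagged: [50]
```

Note that the mean and standard deviation are computed with the suspect value still in the data, so a single extreme value inflates the very cutoff used to catch it. That is one reason to apply such a rule with care, especially on small datasets.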
For example, suppose you are trying to determine the average number of hospital outpatient visits across Ohio acute care hospitals. A small, rural hospital may have only several hundred. Conversely, the Cleveland Clinic typically has hundreds of thousands. Should both be included? Or should one or the other be excluded? The analyst's choice will affect the outcome and any conclusions drawn from it.
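The effect of that choice shows up in a quick calculation. The visit counts below are invented for illustration only and are not real Ohio figures; the hospital labels and the helper `mean_excluding` are hypothetical.

```python
from statistics import mean

# Invented annual outpatient visit counts (illustration only, not real data).
visits = {
    "small rural hospital": 400,
    "community hospital A": 12_000,
    "community hospital B": 25_000,
    "regional hospital": 40_000,
    "large academic center": 300_000,
}


def mean_excluding(counts, excluded=()):
    """Average visit count after dropping the named hospitals."""
    return mean(v for name, v in counts.items() if name not in excluded)


print(f"All five hospitals:   {mean_excluding(visits):,.0f}")  # 75,480
print(f"Without the largest:  {mean_excluding(visits, {'large academic center'}):,.0f}")  # 19,350
print(f"Without the smallest: {mean_excluding(visits, {'small rural hospital'}):,.0f}")  # 94,250
```

With these made-up numbers, dropping the largest hospital cuts the average by roughly three quarters, while dropping the smallest raises it by about a quarter.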
When reporting results, the methodology, including how outliers were identified and handled, should be disclosed and documented, because an analyst's choices help determine the conclusions that can be drawn from the data.