Microsoft Excel is generally considered the industry standard in spreadsheet software, largely because it adapts to so many tasks and is easy to use. Once you learn how to use Excel, that knowledge stays useful for life. Excel has a variety of uses, from preparing simple order sheets to running calculations to creating complex graphical statistical analyses. In this article, we will walk you through how to find outliers in Excel and explain why finding outliers is an essential part of statistical data analysis.
What Is the Outliers Function in Excel?
Before learning how to find outliers in Excel, you should know that Excel does not have a single dedicated outlier function. Instead, it gives you built-in tools that make it easy to work out what is and isn't an outlier. There are two methods of doing this: a helpful graph that gives you a visual of the outliers, and a set of formulas that flags the outliers without forcing you to identify them by hand.
Let's take a look at some vocabulary you'll need to know before you start learning how to find outliers in Excel.
Outliers
An outlier is a value that is significantly higher or lower than most of the values in your data. It can skew your data enough that you might draw an incorrect conclusion from it. Outliers aren't bad, but they definitely aren't good, either. You can think of an outlier as a bit of imperfection that can ruin your data, like finding a fly on your cake.
Box Plot
A box plot is a kind of graph that makes it easy to visually spot outliers. You can, of course, use Excel to create a box plot if you are so inclined, although that is covered in another tutorial. To put it simply, a box plot shows how data is clustered around a central measure such as the median (middle) value. The box covers the middle 50% of the data, from the first quartile to the third, and the lines extending above and below it mark the limits of the typical range. Values beyond those limits are outliers, and on a box plot they become very obvious and stick out like a sore thumb.
Quartiles
Quartiles are the dividing points that split your data into quarters. They are a central tool in learning how to find outliers in Excel, since the outlier calculation relies on them. There are 3 quartiles: Q1, Q2, and Q3. The first quartile (Q1) marks the top of the lower quarter of your data, where the lesser values live. The middle values live between Q1 and Q3, with Q2 (the median) representing the center of your data, and values above Q3 are the upper quarter of the data. Outliers live well outside the interquartile range, which is the span between Q1 and Q3. We'll go over how to calculate quartiles below.
Array and Quart
The array is the range of values that you are evaluating. The quart is a number that represents the quartile you wish to return (e.g., 1 for the 1st quartile, 2 for the 2nd quartile, and so on). We mention array and quart in this vocabulary lesson because they are the two inputs you need to actually perform the calculation in Excel, so it is vital to have both meanings firmly in mind. Be careful not to confuse the two.
Now that you have a firm grasp on the terms you will be using to input the formulas for how to find outliers in Excel, it's time to get down to the step-by-step instructions.
Steps for How to Find Outliers in Excel
As you know, Excel has a ton of functions for statistical analysis that can greatly improve your data analysis. With a few of those functions, you can build formulas that find outliers in your data. Outliers are pests in statistical analysis since the extreme data points can skew your results and lead to misleading conclusions. Outliers can drag your average up or down, which means you could be looking at a wildly inaccurate picture of your data. Identifying outliers, and removing them where that is justified, gives you a better idea of your actual data.
So, how do you do it? There are a few steps involved in finding outliers in Excel, enough that the process could look intimidating at first. But if you stay with it and follow each step, you will be finding outliers very quickly.
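To see how a single extreme value can drag an average, here is a small sketch in Python rather than Excel. The numbers are made up for illustration and do not come from any real data set.

```python
from statistics import mean, median

# Hypothetical values chosen only to illustrate the effect.
typical = [12, 13, 14, 15, 16]
with_outlier = typical + [48]  # add one extreme value

# The mean moves a lot when the outlier is added...
mean_before = mean(typical)
mean_after = mean(with_outlier)

# ...while the median barely changes.
median_before = median(typical)
median_after = median(with_outlier)

print(mean_before, round(mean_after, 2))
print(median_before, median_after)
```

In this sketch the mean jumps from 14 to roughly 19.67, while the median only moves from 14 to 14.5, which is why the outlier method in this article leans on quartiles rather than the average.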
STEP ONE: CALCULATE THE QUARTILES
This first step is preparation for the main outlier formula. You need to get these values right in order to move on to the next step. Remember, quartiles are the points that split your data into quarters.
In Excel, you can easily get quartile values by using the QUARTILE function. This function requires two arguments: a range of data (array) and the quartile number you want (quart). Your data in Excel is usually a list of values in a column, and that list is your data array. The second argument is the number of the quartile you want to return.
You should identify the 1st and 3rd quartiles by using these formulas:
=QUARTILE(array, 1)
=QUARTILE(array, 3)
Excel will do most of the hard work for you. Say that your target data array covers the cells from B3 to B22. When you input the formula to find the quartile, your array will be B3:B22 (the colon sets the range of the cells). That means that your final formula for quartile 1 will be:
=QUARTILE(B3:B22, 1)
and your final formula for quartile 3 will be:
=QUARTILE(B3:B22, 3)
After inputting the formula, hit enter and Excel will do the rest.
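If you want to cross-check Excel's quartiles outside the spreadsheet, the following is a minimal Python sketch. It assumes that `statistics.quantiles` with `method="inclusive"` interpolates the same way QUARTILE does; the function name `excel_quartiles` and the sample values standing in for B3:B22 are hypothetical.

```python
from statistics import quantiles

def excel_quartiles(values):
    """Return (Q1, Q2, Q3) using inclusive interpolation.

    Assumption: this matches the interpolation used by Excel's QUARTILE.
    """
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q1, q2, q3

# Hypothetical sample data standing in for the 20 cells B3:B22.
sample = [12, 15, 14, 10, 13, 16, 14, 15, 11, 13,
          14, 12, 15, 13, 14, 16, 12, 13, 48, 14]

q1, q2, q3 = excel_quartiles(sample)
print(q1, q3)  # 12.75 15.0
```

For this sample, Q1 falls three quarters of the way between the 5th and 6th sorted values (12 and 13), which is where the 12.75 comes from.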
STEP TWO: CALCULATE THE INTERQUARTILE RANGE
The interquartile range (or IQR) measures the spread of the middle 50% of values in your data. You calculate it by subtracting the 1st quartile (Q1) from the 3rd quartile (Q3).
You can use any cells you want for these calculations, but for the sake of this example, we will use F cells. Assuming you entered the Q1 formula in cell F2 and the Q3 formula in cell F3 (placements chosen here for illustration), you would simply input the following in cell F4:
=F3-F2
and hit enter.
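The same subtraction can be sketched in Python as a sanity check. The helper name `interquartile_range` and the sample values are hypothetical, and the sketch assumes inclusive interpolation in the style of QUARTILE.

```python
from statistics import quantiles

def interquartile_range(values):
    """Return Q3 - Q1, the spread of the middle 50% of the values."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1

# Hypothetical sample data standing in for the cells B3:B22.
sample = [12, 15, 14, 10, 13, 16, 14, 15, 11, 13,
          14, 12, 15, 13, 14, 16, 12, 13, 48, 14]

iqr = interquartile_range(sample)
print(iqr)  # 2.25
```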
STEP THREE: FIND THE LOWER BOUND
The lower and upper bounds are the limits of the range of values we consider typical. Any values smaller than the lower bound or larger than the upper bound are the outliers. We'll calculate the lower bound in cell F5 by multiplying the IQR value by 1.5 and then subtracting the result from the Q1 data point. Assuming Q1 is in cell F2 and the IQR is in cell F4, the formula will look like this:
=F2-(1.5*F4)
STEP FOUR: FIND THE UPPER BOUND
To calculate the upper bound in cell F6, we'll multiply the IQR by 1.5 again, but this time add it to the Q3 data point. Assuming Q3 is in cell F3 and the IQR is in cell F4, that makes the formula
=F3+(1.5*F4)
Calculating the lower and upper bounds might seem tedious at first glance, but they are the vital pieces of information that you will use in the next step. By knowing where the typical range of your data begins and ends, you can more easily find data that doesn't quite fit. If any of your data falls below the lower bound or above the upper bound, it will be considered an outlier.
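Both bounds can be sketched together in Python. The function name `outlier_bounds`, the parameter `k` for the 1.5 multiplier, and the sample values are all hypothetical, and inclusive quartile interpolation is assumed.

```python
from statistics import quantiles

def outlier_bounds(values, k=1.5):
    """Return (lower, upper): Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical sample data standing in for the cells B3:B22.
sample = [12, 15, 14, 10, 13, 16, 14, 15, 11, 13,
          14, 12, 15, 13, 14, 16, 12, 13, 48, 14]

lower, upper = outlier_bounds(sample)
print(lower, upper)  # 9.375 18.375
```

Keeping the multiplier as a parameter is only a convenience in this sketch; the article's method uses 1.5 throughout.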
STEP FIVE: IDENTIFY THE OUTLIERS
Now that we've got all our underlying data set up, it's time to identify our outlying data points. We're looking for values that are lower than the lower bound value or higher than the upper bound value.
In Excel, you can use the OR function to perform this logical test and flag the values that meet these criteria. With the data in B3:B22 and the bounds in F5 and F6, enter the following formula into cell C3:
=OR(B3<$F$5,B3>$F$6)
After you hit enter, Excel will return TRUE or FALSE for the value in B3. Copy this formula down into the remaining C cells of your data array, C4 to C22. The dollar signs keep the references to F5 and F6 fixed as the formula is copied.
A TRUE value indicates an outlier, while a FALSE value indicates that the value is not an outlier.
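Putting the steps together, this Python sketch produces the same TRUE/FALSE flag for each value. The name `flag_outliers` and the sample data are hypothetical, and the quartiles are assumed to use inclusive interpolation in the style of QUARTILE.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Pair each value with True (outlier) or False (not an outlier)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # Same logical test as an OR of "below lower bound" and "above upper bound".
    return [(v, v < lower or v > upper) for v in values]

# Hypothetical sample data standing in for the cells B3:B22.
sample = [12, 15, 14, 10, 13, 16, 14, 15, 11, 13,
          14, 12, 15, 13, 14, 16, 12, 13, 48, 14]

flags = flag_outliers(sample)
outliers = [v for v, is_outlier in flags if is_outlier]
print(outliers)  # [48]
```

Note that 10, the smallest value in this sample, is not flagged because it still sits above the lower bound of 9.375.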
Why Should You Find Outliers?
When performing data analysis, you usually assume that your values cluster around some central data point (a median). But sometimes a few of the values fall too far from the central point. These values are called outliers (they lie outside the expected range). Outliers can skew your statistical analyses, leading you to false or misleading conclusions about your data. Checking for outliers is also an easy way to test your statistical assumptions. If a large share of your data points turn out to be outliers, then the chances are that your data is not going to fall in line with your hypothesis.
It's easier than you think to learn how to find outliers in Excel. And since an outlier is a value that can lead to a misleading conclusion, it's pretty important to know how to find outliers and how to deal with them in order to accurately interpret your data. Excel turns this involved piece of statistical analysis into something that anybody can perform.