Problem 42485. Eliminate Outliers Using Interquartile Range
Given a vector of data, find the outliers and remove them.
To determine whether data contains an outlier:
- Identify the point furthest from the mean of the data.
- Determine whether that point is further than 1.5*IQR away from the mean.
- If so, that point is an outlier and should be eliminated from the data, resulting in a new data set.
- Repeat these steps on the new data set until it no longer contains an outlier.
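IQR: The interquartile range is the difference between the median of the upper half and the median of the lower half of the sorted data: http://www.wikihow.com/Find-the-IQR. When the data has an odd number of points, the worked example below leaves the middle value out of both halves.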
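The IQR definition above can be sketched as follows. This is a Python illustration rather than a MATLAB solution for Cody, and the function name iqr_by_halves is made up for this example. It assumes at least two data points and, for an odd count, drops the middle value from both halves, as the worked example in the problem text does.

```python
from statistics import median

def iqr_by_halves(data):
    """IQR as median(upper half) - median(lower half) of the sorted data.

    Assumption: with an odd number of points the middle value belongs to
    neither half. Needs at least two points so both halves are non-empty.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    lower = ordered[:half]                   # smallest `half` values
    upper = ordered[len(ordered) - half:]    # largest `half` values
    return median(upper) - median(lower)

print(iqr_by_halves([53, 55, 51, 50, 60, 52]))  # 55 - 51
print(iqr_by_halves([53, 55, 51, 50, 52]))      # 54 - 50.5
```

Only statistics.median from the standard library is used here, so the sketch does not depend on a toolbox-style iqr function.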
To find an outlier by hand:
Data: [ 53 55 51 50 60 52 ] we will check for outliers.
Sorted: [ 50 51 52 53 55 60 ] where the mean is 53.5 and 60 is the furthest away (60-53.5 > 53.5-50).
1.5 * IQR = 1.5 * (55-51) = 6
Since 60-53.5 = 6.5 > 6, 60 is an outlier.
New Data: [ 53 55 51 50 52 ] we will check for outliers.
New Data Sorted: [ 50 51 52 53 55 ] where the mean is 52.2 and 55 is the furthest away.
1.5 * IQR = 1.5 * (54-50.5) = 5.25
Since 55-52.2 = 2.8 < 5.25, 55 is NOT an outlier.
Our original data had one outlier, which was 60.
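To double-check the hand calculation, a single round of the test can be sketched in Python. The name check_furthest and its return layout are assumptions made for this illustration. It reports the point furthest from the mean, its distance from the mean, the 1.5*IQR threshold, and whether the point counts as an outlier.

```python
from statistics import median

def check_furthest(data):
    """One round of the outlier test described in the problem statement."""
    ordered = sorted(data)
    half = len(ordered) // 2
    # IQR = median of upper half - median of lower half
    # (middle value skipped when the count is odd)
    iqr = median(ordered[len(ordered) - half:]) - median(ordered[:half])
    mean = sum(data) / len(data)
    point = max(data, key=lambda x: abs(x - mean))  # furthest from the mean
    distance = abs(point - mean)
    threshold = 1.5 * iqr
    return point, distance, threshold, distance > threshold

print(check_furthest([53, 55, 51, 50, 60, 52]))  # first round of the example
print(check_furthest([53, 55, 51, 50, 52]))      # second round, after removing 60
```

For the first round this gives 60 at distance 6.5 against a threshold of 6, so 60 is an outlier. For the second round it gives 55 at a distance of about 2.8 against a threshold of 1.5 * 3.5 = 5.25, so 55 is not.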
Example:
Input data = [53 55 51 50 60 52]
Output new_data = [53 55 51 50 52]
since 60 is an outlier, it is removed
*Note: A number may be repeated within a dataset that is an outlier. You should not remove all instances, but remove only the first instance and check the new dataset to determine whether this number is still an outlier (see 5th test suite).*
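Putting the steps together, the whole procedure might look like the Python sketch below. It is not the reference solution and not MATLAB; remove_outliers is a hypothetical name. Two choices here are assumptions: if several points are equally far from the mean, the first one in the input order is tested, and the loop stops once fewer than two points remain. Only one element is deleted per round, so a repeated value is re-checked against the new data set before another copy is removed.

```python
from statistics import median

def remove_outliers(data):
    """Repeatedly drop the point furthest from the mean while it lies more
    than 1.5 * IQR away from the mean. The original order is preserved."""
    result = list(data)  # work on a copy, leave the input untouched
    while len(result) >= 2:
        ordered = sorted(result)
        half = len(ordered) // 2
        iqr = median(ordered[len(ordered) - half:]) - median(ordered[:half])
        mean = sum(result) / len(result)
        # index of the first point that is furthest from the mean
        idx = max(range(len(result)), key=lambda i: abs(result[i] - mean))
        if abs(result[idx] - mean) <= 1.5 * iqr:
            break          # furthest point is not an outlier, so none are
        del result[idx]    # remove this single instance only, then re-check
    return result

print(remove_outliers([53, 55, 51, 50, 60, 52]))
```

On the example input this prints [53, 55, 51, 50, 52], matching the expected output above.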
Problem Comments
7 Comments
Hi,
first of all, nice problem.
But there might be a problem with the 5th test case. At one point the 61 is further away from the mean than the 65.
I agree: it looks like the two 61's in correct_data should be 65's.
Fixed. Thanks for catching.
58.5 should be 53.5 in a few places in the description. Also, I'm getting weird behavior with this problem: I can get my function to pass all the test cases on my local machine, but all test cases fail on the Cody server no matter what I've tried so far.
Fixed the typos, thanks for noticing. I'm not sure why your code may not be working. I'd suggest checking you aren't using something like iqr which is only in the Statistics (I think?) Toolbox.
That was it. I wish that Cody would give a warning that an unsupported function was being used. Better yet, I wish the Cody computer had all the toolboxes activated.
But why, in your example, do you say that 1.5 * (54-50.5) = 4.5? Shouldn't this be 5.25?