What you will learn
- calculate the five-number summary for a data set,
- construct and interpret boxplots (box-and-whisker plots),
- use the interquartile range (IQR) to measure spread,
- identify outliers using the $1.5 \times \text{IQR}$ rule,
- compare distributions using parallel boxplots,
- organise categorical data in two-way tables,
- critically analyse statistical claims and identify sources of bias.
Two classes sit the same test. Class A scores: . Class B scores: .
- Class A five-number summary: min , , median , , max .
- Class B five-number summary: min , , median , , max .
- Class A has a higher median and smaller IQR () vs Class B ().
- Class A performed more consistently; Class B had wider variation.
Key idea: the boxplot reveals both the typical score (median) and how spread out the scores are (IQR).
1. The five-number summary
The five-number summary consists of:
| Statistic | Meaning |
|---|---|
| Minimum | Smallest value |
| $Q_1$ (lower quartile) | Median of the lower half |
| Median ($Q_2$) | Middle value |
| $Q_3$ (upper quartile) | Median of the upper half |
| Maximum | Largest value |
The interquartile range (IQR) measures the spread of the middle 50% of the data:

$$\text{IQR} = Q_3 - Q_1$$
Data (already ordered): .
There are $n = 12$ values.
- Minimum , Maximum .
- Median (average of the 6th and 7th values).
- Lower half: . .
- Upper half: . .
- IQR .
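The steps in this worked example can be sketched in Python. This is a minimal sketch, assuming the median-of-halves method described in the table above (for an odd number of values, the middle value is left out of both halves) and at least two values. The function name `five_number_summary` and the `sample` data are hypothetical and are not the data set from the example.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values below the median position
    upper = data[half + n % 2:]    # for odd n, the middle value is skipped
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical ordered data set of 12 values (illustration only).
sample = [3, 5, 7, 8, 9, 11, 13, 14, 16, 18, 20, 25]
minimum, q1, med, q3, maximum = five_number_summary(sample)
iqr = q3 - q1
print(minimum, q1, med, q3, maximum)  # 3 7.5 12.0 17.0 25
print("IQR =", iqr)                   # IQR = 9.5
```

Software packages often use slightly different quartile conventions, so results from a calculator or spreadsheet may differ a little from this method.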
2. Constructing and interpreting boxplots
A boxplot displays the five-number summary graphically:
- The box spans from $Q_1$ to $Q_3$ and contains the middle 50% of the data.
- The line inside the box marks the median.
- The whiskers extend to the minimum and maximum (or to the most extreme non-outlier values).
3. Identifying outliers
An outlier is a value that is unusually far from the rest of the data. The standard rule:
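Outlier boundaries

$$\text{Lower fence} = Q_1 - 1.5 \times \text{IQR}$$

$$\text{Upper fence} = Q_3 + 1.5 \times \text{IQR}$$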
Any data value below the lower fence or above the upper fence is classified as an outlier.
A data set has , , IQR . The maximum value is . Is an outlier?
- Upper fence .
- Since , the value is an outlier.
- On a boxplot, the upper whisker would stop at (or the largest non-outlier value), and would be plotted as an individual dot.
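The fence check can also be sketched in code. This is a minimal sketch, assuming the standard $1.5 \times \text{IQR}$ rule. The helper names `outlier_fences` and `find_outliers` and all numbers below are hypothetical and are not the values from the example above.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) using the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Return the values lying strictly outside the fences."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical quartiles and data (illustration only).
lower, upper = outlier_fences(20, 30)
print(lower, upper)  # 5.0 45.0
print(find_outliers([4, 18, 22, 27, 31, 45, 52], 20, 30))  # [4, 52]
```

Note that a value exactly on a fence (45 here) is not counted as an outlier, because the rule asks for values below the lower fence or above the upper fence.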
4. Comparing distributions and two-way tables
Parallel boxplots (drawn on the same scale) allow direct comparison of centre, spread, and shape.
When comparing, comment on:
- Centre: which group has a higher/lower median?
- Spread: which group has a larger/smaller IQR?
- Shape: is either distribution symmetric or skewed?
- Outliers: does either group have unusual values?
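The centre and spread comparisons in the checklist above can be sketched in Python. This is a minimal sketch, assuming each group is given as a (min, $Q_1$, median, $Q_3$, max) tuple. The function name `compare_groups` and both summaries are hypothetical.

```python
def compare_groups(name_a, summary_a, name_b, summary_b):
    """Build centre and spread statements from two five-number summaries."""
    _, q1_a, med_a, q3_a, _ = summary_a
    _, q1_b, med_b, q3_b, _ = summary_b
    iqr_a, iqr_b = q3_a - q1_a, q3_b - q1_b
    statements = []
    if med_a != med_b:
        higher = name_a if med_a > med_b else name_b
        statements.append(
            f"Centre: {higher} has the higher median ({med_a} vs {med_b}).")
    else:
        statements.append(f"Centre: both groups have median {med_a}.")
    if iqr_a != iqr_b:
        tighter = name_a if iqr_a < iqr_b else name_b
        statements.append(
            f"Spread: {tighter} has the smaller IQR ({iqr_a} vs {iqr_b}).")
    else:
        statements.append(f"Spread: both groups have IQR {iqr_a}.")
    return statements

# Hypothetical five-number summaries (illustration only).
group_p = (40, 55, 62, 70, 85)
group_q = (30, 45, 58, 75, 95)
for line in compare_groups("Group P", group_p, "Group Q", group_q):
    print(line)
# Centre: Group P has the higher median (62 vs 58).
# Spread: Group P has the smaller IQR (15 vs 30).
```

Shape and outliers still need to be judged from the plot itself; the sketch only covers the two comparisons that follow directly from the summary numbers.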
A two-way table organises categorical data by two variables. It shows frequencies and can reveal associations.
A survey of 100 students asks about pet ownership and gender.
| | Owns a pet | No pet | Total |
|---|---|---|---|
| Female | 32 | 18 | 50 |
| Male | 28 | 22 | 50 |
| Total | 60 | 40 | 100 |
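- Proportion of females who own a pet: $\frac{32}{50} = 64\%$.
- Proportion of males who own a pet: $\frac{28}{50} = 56\%$.
- Proportion of all students who own a pet: $\frac{60}{100} = 60\%$.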
- Females are slightly more likely to own a pet in this sample, but the difference is small.
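The row proportions for this table can be checked with a short Python sketch. The counts come from the table above; the dictionary layout and the function name `row_proportions` are assumptions made for illustration.

```python
# Counts from the pet-ownership table (row totals are computed, not stored).
table = {
    "Female": {"Owns a pet": 32, "No pet": 18},
    "Male": {"Owns a pet": 28, "No pet": 22},
}

def row_proportions(table):
    """Convert each row of counts into proportions of that row's total."""
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())
        result[row] = {col: n / total for col, n in counts.items()}
    return result

props = row_proportions(table)
for row, cols in props.items():
    print(row, {col: f"{p:.0%}" for col, p in cols.items()})
# Female {'Owns a pet': '64%', 'No pet': '36%'}
# Male {'Owns a pet': '56%', 'No pet': '44%'}
```

Comparing proportions within each row, rather than raw counts, is what makes an association visible when the row totals differ.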
5. Analysing statistical claims
When evaluating a statistical claim, consider:
- Sample size: is it large enough to be reliable?
- Sampling method: is it random and representative, or biased?
- Measures used: does the claim use mean, median, or mode? Which is most appropriate?
- Visualisation tricks: are axes truncated or scales distorted?
- Causation vs correlation: does the claim imply cause when only association is shown?
A company claims “9 out of 10 dentists recommend our toothpaste.” What questions should you ask?
- How were the dentists selected? (If they were paid by the company, the sample is biased.)
- What was the exact question? (“Do you recommend brushing teeth?” is different from “Do you recommend this specific brand?”)
- How large was the sample? (10 dentists is too small to generalise.)
- Were dentists who disagreed excluded from the report?
Practice
Tier 1: basic skills
- Find the five-number summary for: .
- Calculate the IQR for the data in Q1.
- A data set has , . Find the upper and lower fences for outlier detection.
- The five-number summary for a data set is: . Sketch a boxplot.
- A boxplot has its median closer to than to . Is the distribution positively or negatively skewed?
- In a two-way table, out of people surveyed are left-handed. What proportion is left-handed?
- A data set has and IQR . What is ?
- True or false: the median always lies exactly in the centre of the box in a boxplot.
Tier 2: mixed practice
- The heights (cm) of students are: . Find the five-number summary, identify any outliers, and sketch a boxplot.
- Two classes have the following five-number summaries for a maths test (out of ):
  - Class X: .
  - Class Y: .

  Draw parallel boxplots and write two comparison statements.
- A two-way table shows transport mode and year level:

  | | Bus | Car | Walk | Total |
  |---|---|---|---|---|
  | Year 9 | 30 | 20 | 10 | 60 |
  | Year 10 | 15 | 35 | 10 | 60 |
  | Total | 45 | 55 | 20 | 120 |

  Find and . What do you notice?
- A newspaper reports “Average house prices rose by .” Explain why the median might be a better measure than the mean for house prices, and how a few expensive sales could distort the mean.
- A data set has values: . Show that is an outlier using the $1.5 \times \text{IQR}$ rule.
Tier 3: explain and apply
- A study claims students who eat breakfast score higher on tests. The data shows a correlation. Explain why this does not prove causation and suggest a confounding variable.
- Two factories produce bolts. Factory A: median length mm, IQR mm. Factory B: median length mm, IQR mm. Which factory produces more consistent bolts? Which is closer to the target of mm? Discuss trade-offs.
- A survey of people finds that support a new policy. The survey was conducted online and only advertised on one social media platform. Identify two sources of potential bias and explain how each could affect the results.
- Explain the difference between the range and the IQR as measures of spread. Give an example where the range is misleading but the IQR is not.
Challenge
Harder reasoning
- A data set of values has , median , . If the value is added to the data set, explain qualitatively how each part of the five-number summary might change and whether would be classified as an outlier.
- Two data sets both have median and IQR , but one is symmetric and the other is positively skewed. Sketch boxplots for both and explain how the whisker lengths differ.
- A researcher collects data from people and presents a boxplot showing no outliers. A critic argues that with data points, some outliers are expected. Evaluate this argument.
- Design a two-way table for students that shows an association between “plays sport” and “gets more than hours of sleep.” Then modify it so there is no association. Explain the difference.