Year 10 Mathematics | Victorian Curriculum 2.0
Boxplots and distributions
Topic 14 | Statistics & Probability | Practice

What you will learn

  • calculate the five-number summary for a data set,
  • construct and interpret boxplots (box-and-whisker plots),
  • use the interquartile range (IQR) to measure spread,
  • identify outliers using the $1.5 \times \text{IQR}$ rule,
  • compare distributions using parallel boxplots,
  • organise categorical data in two-way tables,
  • critically analyse statistical claims and identify sources of bias.
Why boxplots?

A boxplot packs five key statistics into one compact diagram. When you place two boxplots side by side, differences in centre, spread, and skewness become immediately visible. This makes boxplots one of the most efficient tools for comparing groups — far more informative than just quoting two averages.

Where you'll see this
  • Sport: comparing race times of two athletes across a season.
  • Health: comparing blood pressure distributions before and after a treatment.
  • Business: comparing customer wait times at different branches.
  • Media literacy: spotting misleading claims that cherry-pick a single statistic while ignoring spread.
Worked example 0 Real-world example: comparing test scores

Two classes sit the same test. Class A scores: $42, 55, 60, 63, 65, 68, 70, 72, 78, 85$. Class B scores: $35, 50, 52, 58, 60, 62, 65, 80, 82, 95$.

  1. Class A five-number summary: min $= 42$, $Q_1 = 60$, median $= 66.5$, $Q_3 = 72$, max $= 85$.
  2. Class B five-number summary: min $= 35$, $Q_1 = 52$, median $= 61$, $Q_3 = 80$, max $= 95$.
  3. Class A has a higher median and a smaller IQR ($72 - 60 = 12$) than Class B ($80 - 52 = 28$).
  4. Class A performed more consistently; Class B had wider variation.

Key idea: the boxplot reveals both the typical score (median) and how spread out the scores are (IQR).

1. The five-number summary

The five-number summary consists of:

| Statistic | Meaning |
| --- | --- |
| Minimum | Smallest value |
| $Q_1$ (lower quartile) | Median of the lower half |
| Median ($Q_2$) | Middle value |
| $Q_3$ (upper quartile) | Median of the upper half |
| Maximum | Largest value |

The interquartile range (IQR) measures the spread of the middle $50\%$ of data:

Interquartile range

$$\text{IQR} = Q_3 - Q_1$$

Worked example 1 Finding the five-number summary

Data (already ordered): $12, 15, 18, 20, 22, 25, 27, 30, 35, 40, 42, 50$.

There are $12$ values.

  1. Minimum $= 12$, Maximum $= 50$.
  2. Median $= \dfrac{25 + 27}{2} = 26$ (average of the 6th and 7th values).
  3. Lower half: $12, 15, 18, 20, 22, 25$. $Q_1 = \dfrac{18 + 20}{2} = 19$.
  4. Upper half: $27, 30, 35, 40, 42, 50$. $Q_3 = \dfrac{35 + 40}{2} = 37.5$.
  5. IQR $= 37.5 - 19 = 18.5$.
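The steps in this worked example can be sketched in Python. This is a minimal illustration, not a standard library routine: the function name `five_number_summary` is made up for this sketch, and it assumes the "median of each half" method used above. For an odd number of values it leaves the middle value out of both halves, which is one common school convention; calculators and software may use slightly different quartile rules.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method.

    Assumes at least two values. When the count is odd, the middle value
    is excluded from both halves (an assumed convention).
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # the lower half of the ordered data
    upper = data[half + n % 2:]   # the upper half (skips the middle value if n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]


data = [12, 15, 18, 20, 22, 25, 27, 30, 35, 40, 42, 50]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
print(minimum, q1, med, q3, maximum)  # 12 19.0 26.0 37.5 50
print(iqr)                            # 18.5
```

The printed values match the hand calculation: $Q_1 = 19$, median $= 26$, $Q_3 = 37.5$ and IQR $= 18.5$.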

2. Constructing and interpreting boxplots

A boxplot displays the five-number summary graphically:

[Figure: boxplot drawn on a number line from 10 to 60, labelled Min $= 12$, $Q_1 = 19$, Median $= 26$, $Q_3 = 37.5$, Max $= 50$.]
Labelled boxplot showing the five-number summary.
  • The box spans from $Q_1$ to $Q_3$ and contains the middle $50\%$ of data.
  • The line inside the box marks the median.
  • The whiskers extend to the minimum and maximum or, when outliers are present, to the most extreme values that are not outliers.

3. Identifying outliers

An outlier is a value that is unusually far from the rest of the data. The standard rule:

Outlier boundaries

Lower fence

$$\text{Lower fence} = Q_1 - 1.5 \times \text{IQR}$$

Upper fence

$$\text{Upper fence} = Q_3 + 1.5 \times \text{IQR}$$

Any data value below the lower fence or above the upper fence is classified as an outlier.
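The fence rule translates directly into a short Python sketch. The function names `outlier_fences` and `find_outliers` are invented for this illustration, and the quartiles and data values in the example are hypothetical numbers chosen only to show the rule in action.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(values, q1, q3):
    """Return the values lying strictly below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]


# Hypothetical example: Q1 = 20 and Q3 = 28, so IQR = 8.
print(outlier_fences(20, 28))                            # (8.0, 40.0)
print(find_outliers([5, 18, 22, 25, 30, 41], 20, 28))    # [5, 41]
```

A value that sits exactly on a fence is not counted as an outlier here, since the rule asks for values below the lower fence or above the upper fence.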

Worked example 2 Detecting an outlier

A data set has $Q_1 = 19$, $Q_3 = 37.5$, IQR $= 18.5$. The maximum value is $80$. Is $80$ an outlier?

  1. Upper fence $= 37.5 + 1.5 \times 18.5 = 37.5 + 27.75 = 65.25$.
  2. Since $80 > 65.25$, the value $80$ is an outlier.
  3. On a boxplot, the upper whisker would stop at the largest value that is not an outlier (which is at most $65.25$), and $80$ would be plotted as an individual dot.
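To see where the whisker would actually end, the sketch below uses a hypothetical data set. The worked example gives only the quartiles and the maximum, so the twelve values here are an assumption, chosen so that the median-of-halves method gives $Q_1 = 19$ and $Q_3 = 37.5$ with a maximum of $80$.

```python
# Hypothetical data, chosen to match Q1 = 19, Q3 = 37.5 and a maximum of 80.
data = [12, 15, 18, 20, 22, 25, 27, 30, 35, 40, 42, 80]
q1, q3 = 19, 37.5

iqr = q3 - q1                  # 18.5
lower_fence = q1 - 1.5 * iqr   # -8.75
upper_fence = q3 + 1.5 * iqr   # 65.25

# Values within the fences decide where the whiskers end.
inside = [x for x in data if lower_fence <= x <= upper_fence]
# Values beyond the fences are plotted as individual dots.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

whisker_low, whisker_high = min(inside), max(inside)
print(whisker_low, whisker_high, outliers)  # 12 42 [80]
```

For this assumed data set the upper whisker ends at $42$, the largest value inside the fence, and $80$ is drawn as a separate point.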

4. Comparing distributions and two-way tables

Parallel boxplots (drawn on the same scale) allow direct comparison of centre, spread, and shape.

When comparing, comment on:

  • Centre: which group has a higher/lower median?
  • Spread: which group has a larger/smaller IQR?
  • Shape: is either distribution symmetric or skewed?
  • Outliers: does either group have unusual values?
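The centre and spread comparisons can be sketched in Python from two five-number summaries. The class `FiveNumberSummary` and the function `compare` are names invented for this illustration; the numbers are the Class A and Class B summaries from worked example 0. Shape and outliers still need a look at the plot itself, so this sketch covers only the first two points of the checklist.

```python
from typing import NamedTuple


class FiveNumberSummary(NamedTuple):
    minimum: float
    q1: float
    median: float
    q3: float
    maximum: float

    @property
    def iqr(self):
        """Interquartile range, Q3 - Q1."""
        return self.q3 - self.q1


def compare(name_a, a, name_b, b):
    """Return statements comparing centre and spread.

    Simplification: assumes the two medians differ and the two IQRs differ.
    """
    higher = name_a if a.median > b.median else name_b
    tighter = name_a if a.iqr < b.iqr else name_b
    return [
        f"{higher} has the higher median ({a.median} vs {b.median}).",
        f"{tighter} has the smaller IQR ({a.iqr} vs {b.iqr}).",
    ]


class_a = FiveNumberSummary(42, 60, 66.5, 72, 85)
class_b = FiveNumberSummary(35, 52, 61, 80, 95)
for statement in compare("Class A", class_a, "Class B", class_b):
    print(statement)
# Class A has the higher median (66.5 vs 61).
# Class A has the smaller IQR (12 vs 28).
```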

A two-way table organises categorical data by two variables. It shows frequencies and can reveal associations.

Worked example 3 Two-way table

A survey of $100$ students asks about pet ownership and gender.

| | Owns a pet | No pet | Total |
| --- | --- | --- | --- |
| Female | 32 | 18 | 50 |
| Male | 28 | 22 | 50 |
| Total | 60 | 40 | 100 |

  1. $P(\text{owns a pet}) = \dfrac{60}{100} = 0.6$.
  2. $P(\text{owns a pet} \mid \text{female}) = \dfrac{32}{50} = 0.64$.
  3. $P(\text{owns a pet} \mid \text{male}) = \dfrac{28}{50} = 0.56$.
  4. Females are slightly more likely to own a pet in this sample, but the difference is small.
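The same calculations can be sketched in Python by storing the table as a nested dictionary. The layout and the helper names `marginal` and `conditional` are choices made for this illustration; the counts are those in the table above.

```python
# Rows are gender, columns are pet ownership (row totals are computed, not stored).
table = {
    "Female": {"Owns a pet": 32, "No pet": 18},
    "Male": {"Owns a pet": 28, "No pet": 22},
}


def marginal(table, column):
    """P(column): the column total divided by the grand total."""
    grand_total = sum(sum(row.values()) for row in table.values())
    column_total = sum(row[column] for row in table.values())
    return column_total / grand_total


def conditional(table, column, given_row):
    """P(column | row): the cell count divided by that row's total."""
    row = table[given_row]
    return row[column] / sum(row.values())


print(marginal(table, "Owns a pet"))               # 0.6
print(conditional(table, "Owns a pet", "Female"))  # 0.64
print(conditional(table, "Owns a pet", "Male"))    # 0.56
```

The conditional probabilities divide by a row total ($50$) rather than the grand total ($100$), which is the key difference from the overall probability in step 1.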

5. Analysing statistical claims

When evaluating a statistical claim, consider:

  • Sample size: is it large enough to be reliable?
  • Sampling method: is it random and representative, or biased?
  • Measures used: does the claim use mean, median, or mode? Which is most appropriate?
  • Visualisation tricks: are axes truncated or scales distorted?
  • Causation vs correlation: does the claim imply cause when only association is shown?
Worked example 4 Spotting bias

A company claims “9 out of 10 dentists recommend our toothpaste.” What questions should you ask?

  1. How were the dentists selected? (If they were paid by the company, the sample is biased.)
  2. What was the exact question? (“Do you recommend brushing teeth?” is different from “Do you recommend this specific brand?”)
  3. How large was the sample? ($10$ dentists is too small to generalise.)
  4. Were dentists who disagreed excluded from the report?

Practice

Fluency

Tier 1: basic skills

    1. Find the five-number summary for: $5, 8, 12, 15, 18, 20, 22, 25, 30$.
    2. Calculate the IQR for the data in question 1.
    3. A data set has $Q_1 = 10$, $Q_3 = 30$. Find the upper and lower fences for outlier detection.
    4. The five-number summary for a data set is: $2, 8, 14, 20, 28$. Sketch a boxplot.
    5. A boxplot has its median closer to $Q_1$ than to $Q_3$. Is the distribution positively or negatively skewed?
    6. In a two-way table, $40$ out of $100$ people surveyed are left-handed. What proportion is left-handed?
    7. A data set has $Q_1 = 25$ and IQR $= 12$. What is $Q_3$?
    8. True or false: the median always lies exactly in the centre of the box in a boxplot.
Reasoning

Tier 2: mixed practice

    1. The heights (cm) of $15$ students are: $152, 155, 158, 160, 162, 164, 165, 167, 170, 172, 175, 178, 180, 195, 198$. Find the five-number summary, identify any outliers, and sketch a boxplot.

    2. Two classes have the following five-number summaries for a maths test (out of $50$):

      • Class X: $15, 28, 35, 40, 48$.
      • Class Y: $20, 25, 30, 42, 50$.

      Draw parallel boxplots and write two comparison statements.

    3. A two-way table shows transport mode and year level:

      | | Bus | Car | Walk | Total |
      | --- | --- | --- | --- | --- |
      | Year 9 | 30 | 20 | 10 | 60 |
      | Year 10 | 15 | 35 | 10 | 60 |
      | Total | 45 | 55 | 20 | 120 |

      Find $P(\text{bus} \mid \text{Year 9})$ and $P(\text{bus} \mid \text{Year 10})$. What do you notice?

    4. A newspaper reports “Average house prices rose by $20\%$.” Explain why the median might be a better measure than the mean for house prices, and how a few expensive sales could distort the mean.

    5. A data set has values: $3, 5, 7, 8, 10, 12, 14, 15, 50$. Show that $50$ is an outlier using the $1.5 \times \text{IQR}$ rule.

Reasoning

Tier 3: explain and apply

    1. A study claims students who eat breakfast score higher on tests. The data shows a correlation. Explain why this does not prove causation and suggest a confounding variable.
    2. Two factories produce bolts. Factory A: median length $50.2$ mm, IQR $= 0.8$ mm. Factory B: median length $50.0$ mm, IQR $= 2.5$ mm. Which factory produces more consistent bolts? Which is closer to the target of $50.0$ mm? Discuss trade-offs.
    3. A survey of $200$ people finds that $60\%$ support a new policy. The survey was conducted online and only advertised on one social media platform. Identify two sources of potential bias and explain how each could affect the results.
    4. Explain the difference between the range and the IQR as measures of spread. Give an example where the range is misleading but the IQR is not.

Challenge

Reasoning

Harder reasoning

    1. A data set of $20$ values has $Q_1 = 15$, median $= 22$, $Q_3 = 30$. If the value $60$ is added to the data set, explain qualitatively how each part of the five-number summary might change and whether $60$ would be classified as an outlier.
    2. Two data sets both have median $= 50$ and IQR $= 10$, but one is symmetric and the other is positively skewed. Sketch boxplots for both and explain how the whisker lengths differ.
    3. A researcher collects data from $500$ people and presents a boxplot showing no outliers. A critic argues that with $500$ data points, some outliers are expected. Evaluate this argument.
    4. Design a two-way table for $80$ students that shows an association between “plays sport” and “gets more than $8$ hours of sleep.” Then modify it so there is no association. Explain the difference.
Year 10 Mathematics study companion | Practice