Year 10 Mathematics | Victorian Curriculum 2.0
Boxplots and distributions
Topic 14 | Statistics & Probability | Answer key

Tier 1

    1. Min $= 5$, $Q_1 = 10$ (median of $5, 8, 12, 15$: average of $8$ and $12$), median $= 18$ (5th value), $Q_3 = 23.5$ (median of $20, 22, 25, 30$: average of $22$ and $25$), max $= 30$.
    2. IQR $= Q_3 - Q_1 = 23.5 - 10 = 13.5$.
    3. IQR $= 30 - 10 = 20$. Lower fence $= 10 - 1.5 \times 20 = 10 - 30 = -20$. Upper fence $= 30 + 1.5 \times 20 = 30 + 30 = 60$.
    4. Boxplot with whisker at $2$, box from $8$ to $20$, median line at $14$, whisker to $28$.
    5. Positively skewed (the data is more spread out above the median than below).
    6. $\dfrac{40}{100} = 0.4$ or $40\%$.
    7. $Q_3 = Q_1 + \text{IQR} = 25 + 12 = 37$.
    8. False. The median is only centred if the distribution is symmetric. In a skewed distribution, the median is closer to one quartile.
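
The quartile method used in question 1 (take the median of each half, leaving out the middle value when the number of values is odd) can be sketched in Python. This is a minimal illustration, not the worksheet's own method statement: the function name `five_number_summary` is hypothetical, and the data list is the set of nine values quoted in the answer to question 1. Statistical software often uses a different quartile convention, so its results may differ slightly.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]          # values below the median position
    upper = data[half + n % 2:]  # skips the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


# Values taken from the worked answer to Tier 1, question 1.
scores = [5, 8, 12, 15, 18, 20, 22, 25, 30]
minimum, q1, med, q3, maximum = five_number_summary(scores)
iqr = q3 - q1
print(minimum, q1, med, q3, maximum, iqr)
```

Running the sketch reproduces the quartiles and IQR given in questions 1 and 2.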

Tier 2

    1. Five-number summary: min $= 152$, $Q_1 = 160$ (median of positions 1 to 7), median $= 167$ (8th value), $Q_3 = 178$ (median of positions 9 to 15), max $= 198$. IQR $= 178 - 160 = 18$. Upper fence $= 178 + 1.5 \times 18 = 178 + 27 = 205$. Lower fence $= 160 - 27 = 133$. Both $195$ and $198$ are below $205$, and the minimum $152$ is above $133$, so there are no outliers.
    2. Class X has a higher median ($35$ vs $30$) and a smaller IQR ($40 - 28 = 12$ vs $42 - 25 = 17$). Class X performed better overall and more consistently. Class Y has a higher maximum ($50$) and also a higher minimum ($20$ vs $15$). Both classes have similar ranges.
    3. $P(\text{bus} \mid \text{Year 9}) = \dfrac{30}{60} = 0.5$. $P(\text{bus} \mid \text{Year 10}) = \dfrac{15}{60} = 0.25$. Year 9 students are twice as likely to catch the bus as Year 10 students.
    4. House prices are often positively skewed: most houses cluster around a typical value, but a few very expensive properties pull the mean upward. The median is resistant to extreme values and better represents the “typical” house price. A few multi-million-dollar sales can raise the mean significantly without affecting most buyers’ experience.
    5. Ordered: $3, 5, 7, 8, 10, 12, 14, 15, 50$. $Q_1 = \dfrac{5 + 7}{2} = 6$. $Q_3 = \dfrac{14 + 15}{2} = 14.5$. IQR $= 14.5 - 6 = 8.5$. Upper fence $= 14.5 + 1.5 \times 8.5 = 14.5 + 12.75 = 27.25$. Since $50 > 27.25$, the value $50$ is an outlier.
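
The $1.5 \times \text{IQR}$ fence check from question 5 can be sketched as follows. This is an illustrative sketch only: the helper name `fences` is hypothetical, and quartiles are found with the same median-of-halves convention as the worked answers.

```python
from statistics import median


def fences(values):
    """Return (lower_fence, upper_fence) using Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])          # median of the lower half
    q3 = median(data[n // 2 + n % 2:])   # median of the upper half
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


# Data from Tier 2, question 5.
data = [3, 5, 7, 8, 10, 12, 14, 15, 50]
lower, upper = fences(data)
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```

Any value outside the two fences is flagged; here only $50$ lies above the upper fence of $27.25$.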

Tier 3

    1. Correlation does not prove causation because a third variable could explain both. For example, students from families with higher socioeconomic status may be more likely to eat breakfast and have access to tutoring, quiet study spaces, and parental support. The breakfast itself may not cause higher scores; the underlying variable (family resources) may drive both outcomes.
    2. Factory A is more consistent (IQR $= 0.8$ mm vs $2.5$ mm). Factory B has a median closer to the target of $50.0$ mm. Trade-off: Factory A produces bolts of very uniform length but slightly above target; Factory B hits the target on average but with much greater variability. If precision matters (e.g. safety-critical components), Factory A is preferable despite the slight offset, which could be corrected by recalibrating.
    3. Sources of bias: (i) Self-selection bias — only people who chose to respond are counted; those with strong opinions may be overrepresented. (ii) Platform bias — users of that particular social media platform may not be representative of the general population (e.g. younger demographic, specific political leanings). Both could overestimate or underestimate true support depending on the platform’s user base.
    4. Range uses only the two most extreme values, so a single outlier can make the range very large. IQR uses the middle $50\%$ and is resistant to outliers. Example: $\{10, 12, 14, 15, 16, 18, 100\}$. Range $= 100 - 10 = 90$ (misleadingly large). IQR $= 18 - 12 = 6$ (reflects the actual spread of most data).
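
To see how differently the range and the IQR react to one extreme value, the example set from question 4 can be compared with the same set after the value $100$ is dropped. The second list is an added illustration, not part of the original answer, and the function name `spread` is hypothetical.

```python
from statistics import median


def spread(values):
    """Return (range, IQR) using the median-of-halves quartile method."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])
    q3 = median(data[n // 2 + n % 2:])
    return data[-1] - data[0], q3 - q1


with_outlier = [10, 12, 14, 15, 16, 18, 100]  # set from Tier 3, question 4
without_outlier = [10, 12, 14, 15, 16, 18]    # illustrative: same set minus 100

print(spread(with_outlier))
print(spread(without_outlier))
```

Removing the single extreme value cuts the range from $90$ to $8$, while the IQR only moves from $6$ to $4$.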

Challenge

    1. Adding $60$: the minimum is unchanged and the maximum becomes $60$. With $Q_1 = 15$ and $Q_3 = 30$, IQR $= 30 - 15 = 15$. Upper fence $= 30 + 1.5 \times 15 = 52.5$. Since $60 > 52.5$, yes, $60$ is an outlier. The median may shift slightly upward (from the average of the 10th and 11th values to the 11th value of the new 21-value set). $Q_1$ and $Q_3$ may shift slightly but the effect is small.
    2. Symmetric: both whiskers are approximately equal length, extending evenly from the box. Positively skewed: the right whisker is much longer than the left; data extends further above $Q_3$ than below $Q_1$. Both have the same box size (IQR $= 10$) and median ($50$), but the skewed version has the median closer to $Q_1$.
    3. The argument has some merit: in a normal distribution, about $0.7\%$ of values lie beyond $Q_1 - 1.5 \times \text{IQR}$ or $Q_3 + 1.5 \times \text{IQR}$, so we might expect roughly $0.007 \times 500 \approx 3$ to $4$ outliers. However, if the data is truly free of measurement errors and follows a tight distribution, it is possible (though unlikely) to have no outliers. The researcher should report the distribution shape and explain why outliers are absent.
    4. With association: Sport-yes/Sleep-yes $= 30$, Sport-yes/Sleep-no $= 10$, Sport-no/Sleep-yes $= 15$, Sport-no/Sleep-no $= 25$. $P(\text{sleep} \mid \text{sport}) = \dfrac{30}{40} = 0.75$, $P(\text{sleep} \mid \text{no sport}) = \dfrac{15}{40} = 0.375$. These differ, showing an association. No association: Sport-yes/Sleep-yes $= 22.5$, Sport-yes/Sleep-no $= 17.5$, Sport-no/Sleep-yes $= 22.5$, Sport-no/Sleep-no $= 17.5$ (using whole numbers: $23, 17, 22, 18$). Now $P(\text{sleep} \mid \text{sport}) \approx P(\text{sleep} \mid \text{no sport}) \approx 0.5625$, so the variables are approximately independent.
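
The two-way table reasoning in question 4 can be checked with a short sketch. It assumes the table is stored as (sleep-yes, sleep-no) counts per row, and the function names `p_sleep_given` and `expected_counts` are hypothetical. The "no association" counts are computed as row total times column total divided by the grand total, which is one standard way to get the figures quoted in the answer.

```python
def p_sleep_given(row):
    """P(sleep = yes | row), where row = (sleep_yes, sleep_no) counts."""
    sleep_yes, sleep_no = row
    return sleep_yes / (sleep_yes + sleep_no)


def expected_counts(table):
    """Counts each cell would have if rows and columns were independent."""
    rows = list(table.values())
    grand_total = sum(sum(r) for r in rows)
    col_totals = [sum(r[j] for r in rows) for j in range(2)]
    return {
        name: tuple(sum(row) * col / grand_total for col in col_totals)
        for name, row in table.items()
    }


# Counts from the "with association" table in Challenge question 4.
observed = {"sport": (30, 10), "no sport": (15, 25)}
expected = expected_counts(observed)

print(p_sleep_given(observed["sport"]), p_sleep_given(observed["no sport"]))
print(expected)
print(p_sleep_given(expected["sport"]))
```

The observed conditional probabilities differ ($0.75$ vs $0.375$), while the expected table gives the same value in both rows, matching the $0.5625$ quoted above.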
Year 10 Mathematics study companion | Answer key