Year 9 Mathematics | Victorian Curriculum 2.0
Data analysis and distributions
Topic 14 | Statistics & Probability | Practice

What you will learn

  • construct and interpret back-to-back stem-and-leaf plots and comparative histograms,
  • describe the shape of a distribution: symmetric, positively skewed, negatively skewed, or bimodal,
  • explain the effect of outliers on the mean, median, and range,
  • identify and compare sampling methods (random, systematic, stratified, convenience) and recognise bias,
  • choose appropriate data displays for different data types,
  • plan and conduct a statistical investigation.
Why compare distributions?

A single set of summary statistics — mean, median, range — tells part of the story, but comparing two or more groups side by side reveals patterns that individual summaries hide. Are Year 9 students taller than Year 7 students? Do morning shifts outperform afternoon shifts? Comparative displays let you see differences in centre, spread, and shape at a glance, so you can draw evidence-based conclusions.

Where you'll see this
  • Sport: comparing player statistics across teams or seasons using side-by-side box plots and histograms.
  • Public health: analysing the distribution of infection rates in different regions to allocate resources.
  • Market research: survey data is split by age group or location; skewness and outliers guide product decisions.
  • Education: NAPLAN results are compared across states using distribution shape and centre to measure progress.
Worked example 0 Real-world example: comparing test scores

Two classes sit the same maths test. Class A scores: 52, 58, 61, 63, 65, 67, 68, 70, 72, 74. Class B scores: 40, 55, 60, 62, 64, 66, 68, 70, 85, 90.

  1. Class A mean = 650/10 = 65. Class B mean = 660/10 = 66.
  2. Class A median = (65 + 67)/2 = 66. Class B median = (64 + 66)/2 = 65.
  3. Class A range = 74 − 52 = 22. Class B range = 90 − 40 = 50.
  4. The means are nearly equal, but Class B has much greater spread and potential outliers (40 and 90).

Key idea: similar centres can mask very different spreads — always look at both.
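
The three calculations above can also be checked with a short Python sketch. It uses only the standard library; the function name summarise is an assumption chosen for this illustration, not part of the worked example.

```python
from statistics import mean, median

class_a = [52, 58, 61, 63, 65, 67, 68, 70, 72, 74]
class_b = [40, 55, 60, 62, 64, 66, 68, 70, 85, 90]


def summarise(scores):
    """Return the mean, median and range of a list of scores."""
    return {
        "mean": mean(scores),
        "median": median(scores),
        "range": max(scores) - min(scores),
    }


summary_a = summarise(class_a)
summary_b = summarise(class_b)
print("Class A:", summary_a)
print("Class B:", summary_b)
```

Comparing the two printed summaries side by side shows nearly equal centres but a much larger range for Class B.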

1. Back-to-back stem-and-leaf plots

A back-to-back stem-and-leaf plot displays two data sets sharing a common stem. The leaves for one group extend to the left, and the leaves for the other extend to the right.

Fitness group (leaf) | Stem | Control group (leaf)
               8 6 2 |  5   | 4 6 8 8
           8 6 4 2 0 |  6   | 2 5 6 8
             6 4 2 0 |  7   | 0 2 4 6 8
                   2 |  8   | 0 2 4 6 8
                     |  9   | 2 4
Key: 2 | 6 | 5 means 62 (fitness) and 65 (control)
Back-to-back stem-and-leaf plot comparing pulse rates (beats per minute) for a fitness group and a control group.

Reading the plot: the fitness group’s pulse rates cluster in the 60s, while the control group’s data is more spread out and shifted higher.

Worked example 1 Constructing a back-to-back stem-and-leaf plot

Group X times (seconds): 23, 25, 28, 31, 34, 35, 37, 42, 45. Group Y times: 20, 22, 26, 29, 30, 33, 38, 40, 41, 48.

  1. Stems are the tens digits: 2, 3, 4.
  2. Write Group X leaves to the left, increasing away from the stem (so they read in descending order from left to right), and Group Y leaves to the right in ascending order.

Group X | Stem | Group Y
  8 5 3 |  2   | 0 2 6 9
7 5 4 1 |  3   | 0 3 8
    5 2 |  4   | 0 1 8

  3. Group X median = 34 (the 5th of 9 values). Group Y median = (30 + 33)/2 = 31.5. Group X's typical time is slightly higher, so Group X is slightly slower.
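
The construction steps above can be sketched in Python. This is a minimal illustration, assuming non-negative whole numbers whose stems are the tens digits; the function name back_to_back_rows is hypothetical.

```python
from collections import defaultdict


def back_to_back_rows(left, right):
    """Return (left_leaves, stem, right_leaves) rows for whole-number data."""
    left_leaves = defaultdict(list)
    right_leaves = defaultdict(list)
    for value in sorted(left):
        left_leaves[value // 10].append(value % 10)
    for value in sorted(right):
        right_leaves[value // 10].append(value % 10)

    lowest = min(min(left), min(right)) // 10
    highest = max(max(left), max(right)) // 10
    rows = []
    for stem in range(lowest, highest + 1):
        # Reverse the left leaves so the smallest leaf sits next to the stem.
        left_text = " ".join(str(leaf) for leaf in reversed(left_leaves[stem]))
        right_text = " ".join(str(leaf) for leaf in right_leaves[stem])
        rows.append((left_text, stem, right_text))
    return rows


group_x = [23, 25, 28, 31, 34, 35, 37, 42, 45]
group_y = [20, 22, 26, 29, 30, 33, 38, 40, 41, 48]

rows = back_to_back_rows(group_x, group_y)
width = max(len(left_text) for left_text, _, _ in rows)
for left_text, stem, right_text in rows:
    # Right-align the left leaves so they line up against the stem column.
    print(f"{left_text:>{width}} | {stem} | {right_text}")
```

Every stem between the lowest and highest is included, even if one side has no leaves, so gaps in the data stay visible.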

2. Shape of distributions

When you look at a histogram or stem-and-leaf plot, describe its shape:

  • Symmetric: roughly the same on both sides of the centre (mean ≈ median).
  • Positively skewed (right-skewed): a long tail to the right. Most data is on the left. Mean > median.
  • Negatively skewed (left-skewed): a long tail to the left. Most data is on the right. Mean < median.
  • Bimodal: two distinct peaks, suggesting two sub-groups in the data.
Skew direction = tail direction

The skew is named for the direction of the tail, not the peak. A distribution with most values clustered low and a few very high values is positively skewed — the tail stretches to the positive (right) end.
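
The mean-versus-median comparison in the list above can be sketched in Python as a rough check. It is a rule of thumb rather than a definitive test: the function name suggest_skew, the tolerance value and the three small data sets are all assumptions made for illustration, and the check cannot detect a bimodal shape.

```python
from statistics import mean, median


def suggest_skew(data, tolerance=0.05):
    """Suggest a skew direction by comparing the mean with the median.

    tolerance is an assumed cut-off: the mean and median are treated as
    about equal when they differ by less than this fraction of the range.
    """
    spread = max(data) - min(data)
    if spread == 0:
        return "roughly symmetric"
    gap = (mean(data) - median(data)) / spread
    if gap > tolerance:
        return "positively skewed"
    if gap < -tolerance:
        return "negatively skewed"
    return "roughly symmetric"


# Hypothetical data sets, invented only to exercise the function.
tail_right = [1, 2, 2, 3, 3, 4, 10, 15]
tail_left = [1, 6, 12, 13, 13, 14, 14, 15]
balanced = [2, 4, 5, 6, 6, 7, 8, 10]

for name, data in [("tail_right", tail_right),
                   ("tail_left", tail_left),
                   ("balanced", balanced)]:
    print(name, "->", suggest_skew(data))
```

A histogram or stem-and-leaf plot is still the more reliable way to judge shape; the numbers only back up what the display shows.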

Worked example 2 Identifying shape

A histogram of house prices in a suburb shows many homes in the $400,000–$600,000 range, fewer in the $600,000–$800,000 range, and a small number above $1,000,000.

  1. The bulk of the data is on the left.
  2. A long tail extends to the right (high-priced homes).
  3. The distribution is positively skewed.
  4. The mean will be pulled higher than the median by the expensive homes, so the median is a better measure of centre for this data.

3. Effect of outliers

An outlier is a data value that lies well outside the main body of the data.

  • Mean: strongly affected — one extreme value can pull the mean significantly.
  • Median: resistant — it depends only on the middle value(s), so one outlier barely changes it.
  • Range: strongly affected — it uses only the maximum and minimum.
Worked example 3 Outlier impact

Data set: 12, 14, 15, 15, 16, 17, 18, 50.

  1. Mean = 157/8 = 19.625. Without the outlier 50: mean = 107/7 ≈ 15.3.
  2. Median = (15 + 16)/2 = 15.5. Without 50: median = 15. Barely changed.
  3. Range = 50 − 12 = 38. Without 50: range = 18 − 12 = 6. Dramatically reduced.

When outliers are present, the median better represents the typical value.
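
The comparison above can be reproduced with a short Python sketch that calculates each statistic with and without the outlier. The variable names and the helper three_stats are assumptions chosen for this illustration.

```python
from statistics import mean, median

data = [12, 14, 15, 15, 16, 17, 18, 50]
without_outlier = [x for x in data if x != 50]


def three_stats(values):
    """Return (mean, median, range) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values)


with_stats = three_stats(data)
without_stats = three_stats(without_outlier)

for label, before, after in zip(("mean", "median", "range"),
                                with_stats, without_stats):
    # The change column shows how far each statistic moves when 50 is removed.
    print(f"{label:>6}: with = {before:.3f}, without = {after:.3f}, "
          f"change = {before - after:.3f}")
```

The change column makes the contrast concrete: the median moves by half a unit while the range moves by 32.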

4. Sampling methods and bias

When the population is too large to survey entirely, we take a sample. The method of sampling affects the reliability of conclusions.

Method | Description | Strengths | Weaknesses
Simple random | Every member has an equal chance of selection | Unbiased, representative | Needs a complete list of the population
Systematic | Select every k-th member from a list | Easy to implement | Can miss patterns if the list has a hidden cycle
Stratified | Divide into subgroups (strata), sample proportionally from each | Ensures all subgroups are represented | Requires knowledge of subgroup sizes
Convenience | Choose whoever is easiest to reach | Quick and cheap | Often biased; not representative
Bias traps

Bias occurs when the sampling method systematically favours certain outcomes. Surveying shoppers at a luxury mall about average income will overestimate the population mean. Always ask: “Who is missing from this sample?”
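
Three of the methods in the table can be sketched in Python to show how differently they choose from the same list. This is an illustration only: the school roll of 100 student numbers, the function names and the seed value are all assumptions, and the systematic version assumes the sample size is no larger than the population.

```python
import random


def simple_random_sample(population, size, rng):
    """Every member has an equal chance of being chosen."""
    return rng.sample(population, size)


def systematic_sample(population, size, rng):
    """Pick a random starting point, then take every k-th member."""
    k = len(population) // size  # assumes size <= len(population)
    start = rng.randrange(k)
    return population[start::k][:size]


def convenience_sample(population, size):
    """Take whoever is easiest to reach: here, the first names on the list."""
    return population[:size]


rng = random.Random(9)      # fixed seed (arbitrary) so reruns give the same picks
roll = list(range(1, 101))  # hypothetical roll: student numbers 1 to 100

random_pick = simple_random_sample(roll, 10, rng)
systematic_pick = systematic_sample(roll, 10, rng)
convenience_pick = convenience_sample(roll, 10)

print("Simple random:", sorted(random_pick))
print("Systematic:   ", systematic_pick)
print("Convenience:  ", convenience_pick)
```

The convenience sample always returns students 1 to 10, which is exactly the kind of pattern the question "Who is missing from this sample?" is meant to expose.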

Worked example 4 Identifying bias

A school surveys students about favourite sports by asking only those at basketball training.

  1. This is a convenience sample: it selects only students who are already interested in basketball.
  2. Basketball is likely to be overrepresented; other sports underrepresented.
  3. A better approach: take a stratified random sample from each year level to capture the full school population.
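
The stratified approach suggested in step 3 can be sketched as follows. The year-level sizes (300, 240 and 260 students), the sample size of 40 and the function names are hypothetical values chosen so the proportions come out as whole numbers; with other numbers, rounding can make the parts add to slightly more or less than the target.

```python
import random


def proportional_allocation(strata_sizes, sample_size):
    """Share the sample between strata in proportion to their sizes."""
    total = sum(strata_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in strata_sizes.items()}


def stratified_sample(strata, sample_size, rng):
    """Take a simple random sample of the allocated size from each stratum."""
    sizes = {name: len(members) for name, members in strata.items()}
    allocation = proportional_allocation(sizes, sample_size)
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}


# Hypothetical school: student labels grouped by year level.
school = {
    "Year 7": [f"Y7-{i}" for i in range(300)],
    "Year 8": [f"Y8-{i}" for i in range(240)],
    "Year 9": [f"Y9-{i}" for i in range(260)],
}

rng = random.Random(3)  # fixed seed (arbitrary) so reruns give the same picks
allocation = proportional_allocation(
    {name: len(members) for name, members in school.items()}, 40)
sample = stratified_sample(school, 40, rng)

print(allocation)
```

Each year level contributes the same fraction of its students (here 1 in 20), so no year level is over- or underrepresented.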

5. Choosing displays and planning investigations

Different data types suit different displays:

  • Categorical data: bar chart, pie chart.
  • Numerical (discrete): dot plot, bar chart.
  • Numerical (continuous): histogram, stem-and-leaf plot, box plot.
  • Comparing two groups: back-to-back stem-and-leaf, side-by-side box plots, comparative histograms.

A well-planned statistical investigation follows these steps:

  1. Pose a question that can be answered with data.
  2. Plan data collection — choose sampling method, sample size, and variables.
  3. Collect data systematically.
  4. Analyse — calculate summary statistics, construct appropriate displays.
  5. Conclude — interpret results, acknowledge limitations.

Practice

Fluency

Tier 1: basic skills

    1. Classify each distribution shape: (a) tail on the right, (b) two peaks, (c) mirror-image shape, (d) tail on the left.
    2. Data: 3, 5, 6, 7, 7, 8, 9, 40. Find the mean, median, and range.
    3. Remove the outlier from Q2 and recalculate mean, median, and range. Which statistic changed most?
    4. For the data in Q2, which measure of centre better represents the typical value? Explain.
    5. A sample is taken by selecting every 10th student on a school roll. Name this sampling method.
    6. A survey asks 50 people at a train station about their preferred mode of transport. Explain why this sample might be biased.
    7. Construct a stem-and-leaf plot for: 14, 18, 22, 25, 27, 31, 33, 36, 38, 42.
    8. What type of display would you use to compare the heights of Year 9 boys and Year 9 girls?
    9. State whether the mean or median is higher for a positively skewed distribution.
    10. A histogram has bars of heights 2, 5, 8, 6, 3, 1. Describe the shape of this distribution.
Reasoning

Tier 2: mixed practice

    1. Two classes recorded the number of books read last term. Class A: 2, 3, 3, 4, 5, 5, 6, 7, 8, 12. Class B: 1, 2, 4, 5, 5, 6, 6, 7, 7, 8. Construct a back-to-back stem-and-leaf plot and compare the distributions.
    2. A data set has mean 24 and median 18. Is the distribution likely symmetric, positively skewed, or negatively skewed? Explain.
    3. A researcher wants to survey 200 out of 2000 students about study habits. The school has 800 Year 7, 700 Year 8, and 500 Year 9 students. Calculate how many students should be sampled from each year level using stratified sampling.
    4. Explain why the median is preferred over the mean when reporting typical house prices.
    5. A factory records the time (in seconds) to assemble a part. Morning shift: 42, 44, 45, 46, 47, 48, 50. Afternoon shift: 43, 45, 46, 48, 50, 52, 58. Compare using mean, median, and range.
    6. Describe a situation where a bimodal distribution would be expected. Explain what causes the two peaks.
    7. A student claims: “My sample of 10 friends is representative of the whole school.” Critique this claim.
    8. State three features you should always comment on when comparing two distributions.
Reasoning

Tier 3: explain and apply

    1. A company reports that the “average salary” is $95,000. The CEO earns $800,000 and the other 19 employees earn between $50,000 and $70,000 each. Explain how the company’s claim could be technically true but misleading.
    2. Design a statistical investigation to determine whether Year 9 students spend more time on homework than Year 7 students. State the question, sampling method, variables, and how you would display the results.
    3. Two histograms have the same mean and range, but different shapes. Sketch two possible histograms and explain how this is possible.
    4. A data set of 20 values has mean 30. An extra value of 80 is added. Calculate the new mean and explain why the median might be a better summary.
    5. Explain the difference between a population and a sample. Give an example where surveying the whole population is impractical.

Challenge

Reasoning

Harder reasoning

    1. Two data sets each have n values. Set A has mean x̄_A and set B has mean x̄_B. If the two sets are combined, show that the combined mean is (n·x̄_A + n·x̄_B) / (2n). What happens if the sets have different sizes n_A and n_B?
    2. A researcher adds a constant c to every value in a data set. How does this affect (a) the mean, (b) the median, (c) the range, (d) the standard deviation? Justify each answer.
    3. Construct a data set of 10 values where the mean is 50, the median is 45, and the distribution is positively skewed. Verify your answer.
    4. A school of 1200 students is surveyed using stratified sampling by year level. Year 7: 350, Year 8: 320, Year 9: 280, Year 10: 250. If 120 students are to be sampled, calculate the number from each year level. One Year 10 student in the sample scored 0 on the test (absent). Discuss how this outlier should be handled.