Year 9 Mathematics | Victorian Curriculum 2.0
Data analysis and distributions
Topic 14 | Statistics & Probability | Answer key

Tier 1

    1. (a) positively skewed, (b) bimodal, (c) symmetric, (d) negatively skewed.
    2. Mean \(= \dfrac{85}{8} = 10.625\). Median \(= \dfrac{7+7}{2} = 7\). Range \(= 40 - 3 = 37\).
    3. Without 40: mean \(= \dfrac{45}{7} \approx 6.43\), median \(= 7\), range \(= 9 - 3 = 6\). The range changed most (from 37 to 6), followed by the mean (from 10.625 to 6.43). The median did not change (it stays at 7).
    4. The median (7) better represents the typical value because the outlier (40) inflates the mean.
    5. Systematic sampling.
    6. People at a train station are more likely to prefer trains, so train travel would be overrepresented. People who drive, cycle, or walk are less likely to be at the station.
    7. Stem | Leaf: 1 | 4 8; 2 | 2 5 7; 3 | 1 3 6 8; 4 | 2.
    8. A back-to-back stem-and-leaf plot or side-by-side box plots would both work well for comparing two numerical distributions.
    9. The mean is higher than the median in a positively skewed distribution (the tail of high values pulls the mean up).
    10. The bars rise then fall with a single peak, so the distribution is approximately symmetric (or very slightly positively skewed if the tail on the right is longer).
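The calculations in answers 2, 3 and 7 can be checked with Python's standard `statistics` module. This is a sketch under stated assumptions: the question's data set for answers 2 and 3 is not reproduced in this key, so `data` below is a hypothetical list chosen only to match the summary values above (8 values, sum 85, two middle values of 7, minimum 3, maximum 40, and 9 as the largest value once 40 is removed). The answer 7 values are read off the stem-and-leaf plot, assuming the key 1 | 4 = 14.

```python
from collections import defaultdict
from statistics import mean, median

# Hypothetical data: matches the summaries in answers 2 and 3,
# but is not necessarily the data set given in the question.
data = [3, 5, 6, 7, 7, 8, 9, 40]


def summarise(values):
    """Return (mean, median, range) for a list of numbers."""
    return mean(values), median(values), max(values) - min(values)


with_outlier = summarise(data)
without_outlier = summarise([x for x in data if x != 40])
print("With 40:    mean=%.3f, median=%g, range=%d" % with_outlier)
print("Without 40: mean=%.3f, median=%g, range=%d" % without_outlier)


def stem_and_leaf(values):
    """Group two-digit values by tens digit (stem) and units digit (leaf)."""
    plot = defaultdict(list)
    for v in sorted(values):
        plot[v // 10].append(v % 10)
    return dict(plot)


# Answer 7 values, assuming the key 1 | 4 = 14.
q7_values = [14, 18, 22, 25, 27, 31, 33, 36, 38, 42]
for stem, leaves in stem_and_leaf(q7_values).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Removing the single value 40 changes the range and mean sharply while the median stays put, which is the point of answers 3 and 4.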

Tier 2

    1. Back-to-back stem-and-leaf: Stem 0: Class A leaves 8 7 6 5 5 4 3 3 2 | Class B leaves 1 2 4 5 5 6 6 7 7 8. Stem 1: Class A leaf 2 | Class B (none). Class A has a wider spread (range 10 vs 7), with an unusually high value at 12 (a possible outlier). Class B is more tightly clustered. The medians are similar (A: 5, B: 5.5).
    2. Positively skewed. The mean (24) is greater than the median (18), which indicates a tail of high values pulling the mean up.
    3. Total \(= 2000\). Proportions: Year 7 \(= \dfrac{800}{2000} \times 200 = 80\), Year 8 \(= \dfrac{700}{2000} \times 200 = 70\), Year 9 \(= \dfrac{500}{2000} \times 200 = 50\).
    4. House prices are typically positively skewed (a few very expensive houses push the mean up). The median gives a better sense of what a “typical” house costs because it is not affected by the extreme values.
    5. Morning: mean \(= 46\), median \(= 46\), range \(= 8\). Afternoon: mean \(\approx 48.9\), median \(= 48\), range \(= 15\). The afternoon shift is slightly slower on average and has more variation, possibly due to the outlier at 58.
    6. Example: heights of a mixed group of adult men and women. The two peaks correspond to the average female height and the average male height — two overlapping subpopulations create bimodality.
    7. A sample of 10 friends is a convenience sample that is not random. Friends tend to share interests, backgrounds, and demographics, so the sample is likely biased and not representative of the whole school. A random or stratified sample would be more reliable.
    8. When comparing two distributions, comment on: (i) centre (mean or median), (ii) spread (range or IQR), and (iii) shape (symmetric, skewed, or bimodal). Also note any outliers.
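Answers 1 and 3 can be checked the same way. The class scores below are read directly off the back-to-back plot in answer 1 (stem = tens digit, leaf = units digit). `proportional_allocation` is a hypothetical helper name for the stratified-sample calculation in answer 3; it returns exact proportional sizes and does not round, so populations that do not divide evenly would still need rounding to whole students.

```python
from statistics import median

# Scores read off the back-to-back stem-and-leaf plot in answer 1.
class_a = [2, 3, 3, 4, 5, 5, 6, 7, 8, 12]
class_b = [1, 2, 4, 5, 5, 6, 6, 7, 7, 8]

for name, scores in (("Class A", class_a), ("Class B", class_b)):
    print(f"{name}: median={median(scores)}, range={max(scores) - min(scores)}")


def proportional_allocation(strata, sample_size):
    """Share a sample among strata in proportion to each stratum's size."""
    total = sum(strata.values())
    # Multiply before dividing so whole-number results stay exact.
    return {group: count * sample_size / total for group, count in strata.items()}


# Answer 3: a sample of 200 from year levels of 800, 700 and 500 students.
allocation = proportional_allocation({"Year 7": 800, "Year 8": 700, "Year 9": 500}, 200)
print(allocation)
```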

Tier 3

    1. The CEO’s salary of $800,000 pulls the mean up. If the other 19 employees earn an average of $60,000, the total is \(19 \times 60\,000 + 800\,000 = 1\,940\,000\), giving a mean of \(\dfrac{1\,940\,000}{20} = 97\,000\) dollars. So the “average” (mean) is close to $95,000 (exactly $97,000 under this assumption), but the median is around $60,000. Most employees earn far less than the reported average, so the company uses the mean to create a misleading impression.
    2. Question: “Do Year 9 students spend more time per week on homework than Year 7 students?” Sampling: stratified random sample of 30 students from each year level. Variables: year level (categorical), homework hours per week (continuous). Display: side-by-side box plots or back-to-back stem-and-leaf plot. Calculate mean and median for each group and compare.
    3. Example: Histogram 1 is symmetric (bell-shaped). Histogram 2 is bimodal with one peak below the mean and one above. Both can have the same mean (balanced around the centre) and the same range (same min and max) but very different shapes. The bimodal histogram has more data at the extremes and less near the centre.
    4. Original sum \(= 20 \times 30 = 600\). New sum \(= 600 + 80 = 680\). New mean \(= \dfrac{680}{21} \approx 32.4\). The mean increased by about 2.4. The median changes from the average of the 10th and 11th values to the 11th of the 21 values (the original 11th value), so it rises by only half the gap between the original 10th and 11th values. This small change makes the median more stable and representative.
    5. A population is the entire group of interest; a sample is a subset selected for study. Example: surveying every one of Australia’s approximately 26 million residents about exercise habits is impractical due to cost and time. A representative sample of a few thousand provides useful estimates instead.
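A quick numerical check of answers 1 and 4, as a minimal sketch. It assumes, more strongly than answer 1 does, that each of the other 19 employees earns exactly $60,000 (the answer only fixes their average), which makes the median exactly $60,000.

```python
from statistics import mean, median

# Answer 1 (assumption): 19 employees on exactly $60,000 plus the CEO on $800,000.
salaries = [60_000] * 19 + [800_000]
print("Mean salary:", mean(salaries))      # pulled up by the single large salary
print("Median salary:", median(salaries))  # stays at the typical salary

# Answer 4: 20 values with mean 30, then one new value of 80 is added.
n, old_mean, new_value = 20, 30, 80
new_mean = (n * old_mean + new_value) / (n + 1)
print(f"New mean: {new_mean:.1f} (increase of {new_mean - old_mean:.1f})")
```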

Challenge

    1. Combined sum \(= n\bar{x}_A + n\bar{x}_B\). Combined count \(= 2n\). Combined mean \(= \dfrac{n\bar{x}_A + n\bar{x}_B}{2n} = \dfrac{\bar{x}_A + \bar{x}_B}{2}\). For different sizes: combined mean \(= \dfrac{n_A \bar{x}_A + n_B \bar{x}_B}{n_A + n_B}\), which is a weighted average of the two means.
    2. (a) Mean increases by \(c\) (every value increases by \(c\), so the sum increases by \(nc\) and the mean by \(c\)). (b) Median increases by \(c\) (the middle value shifts by \(c\)). (c) Range is unchanged (max and min both increase by \(c\), so their difference is the same). (d) Standard deviation is unchanged (each value and the mean both shift by \(c\), so every deviation from the mean stays the same).
    3. One possible set: 30, 35, 38, 40, 44, 46, 50, 55, 62, 100. Sum \(= 500\), mean \(= 50\). Median \(= \dfrac{44+46}{2} = 45\). The high value 100 creates a right tail, giving positive skew. Mean \(>\) median, confirming positive skewness.
    4. Proportions: Year 7 \(= \dfrac{350}{1200} \times 120 = 35\), Year 8 \(= \dfrac{320}{1200} \times 120 = 32\), Year 9 \(= \dfrac{280}{1200} \times 120 = 28\), Year 10 \(= \dfrac{250}{1200} \times 120 = 25\). The student who scored 0 was absent, not genuinely scoring zero. This value should be treated as missing data and excluded from the analysis (or the student should be resurveyed). Including it would unfairly lower Year 10’s statistics and misrepresent that year level’s performance.
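The results in answers 1 to 3 can also be confirmed numerically. In this sketch, `combined_mean` is a hypothetical helper implementing the weighted-average formula from answer 1, the constant `c = 7` is an arbitrary illustrative choice for answer 2, and the data are the set from answer 3. `pstdev` (population standard deviation) is used for the standard-deviation check; the sample version `stdev` would show the same behaviour.

```python
from statistics import mean, median, pstdev


def combined_mean(n_a, mean_a, n_b, mean_b):
    """Weighted average of two group means (answer 1)."""
    return (n_a * mean_a + n_b * mean_b) / (n_a + n_b)


print("Equal sizes:", combined_mean(10, 60, 10, 70))    # simple average of 60 and 70
print("Unequal sizes:", combined_mean(10, 60, 30, 70))  # pulled towards the larger group

# Answer 3's data set, reused for answer 2.
values = [30, 35, 38, 40, 44, 46, 50, 55, 62, 100]
c = 7  # arbitrary constant added to every value
shifted = [x + c for x in values]

print("Mean change:", mean(shifted) - mean(values))
print("Median change:", median(shifted) - median(values))
print("Range change:", (max(shifted) - min(shifted)) - (max(values) - min(values)))
print("SD change:", pstdev(shifted) - pstdev(values))  # 0, up to floating-point rounding

# Answer 3: mean 50 exceeds median 45, consistent with positive skew.
print("Sum:", sum(values), "Mean:", mean(values), "Median:", median(values))
```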
Year 9 Mathematics study companion | Answer key