Topic 15 | Statistics & Probability

Sampling & statistical investigations

Year 8 core: populations vs samples, sampling techniques and bias, comparing samples of the same size, and planning fair statistical investigations.

Time: 50-65 min
How to use this page

Read the explanation, work through the examples, then complete the core practice before printing.


Start here: why we can’t always ask everyone

Imagine you want to know the average height of all Year 8 students in Australia. There are hundreds of thousands of them. You can’t measure every single one — it would take years.

Instead, you pick a group (maybe 200 Year 8 students, chosen carefully), measure their heights, and use their average to estimate the whole-country average.

That group is called a sample. The whole country of Year 8 students is called the population. This whole topic is about: how do you pick a good sample, and how much can you trust what it tells you?
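To see the idea in action, here is a minimal sketch with made-up numbers. It invents a hypothetical population of 300 000 heights (randomly generated around 160 cm with a spread of 8 cm, not real Australian data), finds the true mean as a census would, then estimates it from a random sample of 200. The function names `make_population` and `sample_mean` are invented for this example.

```python
import random
import statistics

def make_population(size, seed=1):
    """Generate a hypothetical population of heights in cm (synthetic data)."""
    rng = random.Random(seed)
    return [rng.gauss(160, 8) for _ in range(size)]

def sample_mean(population, n, seed=2):
    """Estimate the population mean from a simple random sample of size n."""
    rng = random.Random(seed)
    sample = rng.sample(population, n)  # n different people, chosen at random
    return statistics.fmean(sample)

population = make_population(300_000)
census_mean = statistics.fmean(population)  # census: everyone is measured
estimate = sample_mean(population, 200)     # sample: only 200 are measured
print(f"Census mean:   {census_mean:.1f} cm")
print(f"Sample of 200: {estimate:.1f} cm")
```

The two means will usually be close but not identical; how close depends on how the sample was chosen and how big it is.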

What you will learn

1. Population vs sample

The population is the entire group you want information about. A sample is a subset of the population you actually collect data from.

A census collects data from every member of the population. A sample survey collects from a subset, usually because a census would be too costly or slow.

Worked example E (very easy): name the population and the sample

A school surveys 50 students from one Year 8 class to ask about screen time. What is the population? The sample?

  • Population: the group the school wants conclusions about, which depends on the question. For Year 8 screen time, that is all Year 8 students at the school.
  • Sample: the 50 students who were actually surveyed. Because they all come from one class, they may not represent the whole year level.

2. Sampling methods

Common sampling methods

Simple random

Every member of the population has an equal chance of being chosen. Use a random number generator or draw names from a hat.

Systematic

Choose every kth member after a random start. E.g. every 10th name on the roll.

Stratified

Divide the population into groups (strata) by a feature (year level, gender, region), then randomly sample within each group in proportion to its size.

Cluster

Divide into natural clusters (e.g. whole classes), randomly pick clusters, survey everyone in the chosen clusters.

Convenience

Ask whoever is easy to reach. Fast but usually biased.

Quota / judgement

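Fill preset numbers of people in chosen categories (quota), or let the researcher pick people who seem typical (judgement). Individuals are not chosen at random, so both methods are subjective and prone to bias.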
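The four random methods above can be sketched in a few lines each. This is an illustration under stated assumptions, not an official procedure: the school roll is hypothetical (classes of 20 students, with three Year 7 classes, two Year 8 classes and one Year 9 class), and the function names are invented for this example. Classes act as clusters and year levels act as strata.

```python
import random

def simple_random(roll, n, rng):
    """Every student has the same chance; n different students are chosen."""
    return rng.sample(roll, n)

def systematic(roll, k, rng):
    """Random start among the first k names, then every kth name after it."""
    start = rng.randrange(k)
    return roll[start::k]

def stratified(strata, fraction, rng):
    """Take the same fraction of each stratum (rounded to a whole number)."""
    chosen = []
    for members in strata.values():
        n = round(len(members) * fraction)
        chosen.extend(rng.sample(members, n))
    return chosen

def cluster(clusters, how_many, rng):
    """Randomly pick whole clusters, then include everyone in them."""
    picked = rng.sample(sorted(clusters), how_many)
    return [student for name in picked for student in clusters[name]]

# Hypothetical roll: 3 Year 7 classes, 2 Year 8 classes, 1 Year 9 class,
# 20 students in each (120 students in total).
letters = {"7": "ABC", "8": "AB", "9": "A"}
classes = {f"{y}{c}": [f"{y}{c}-{i:02d}" for i in range(1, 21)]
           for y, cs in letters.items() for c in cs}
roll = [s for members in classes.values() for s in members]
year_levels = {y: [s for s in roll if s.startswith(y)] for y in letters}

rng = random.Random(42)
print(len(simple_random(roll, 12, rng)))       # 12
print(len(systematic(roll, 10, rng)))          # 12 (every 10th of 120)
print(len(stratified(year_levels, 0.1, rng)))  # 6 + 4 + 2 = 12
print(len(cluster(classes, 2, rng)))           # 2 classes of 20 = 40
```

Notice that the stratified sample takes more students from the bigger year level, while the cluster sample surveys whole classes and so can miss a year level entirely.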

3. Sources of bias

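A sample is biased when certain members of the population are systematically more or less likely to be sampled. Common traps in surveys include selection bias (the method reaches some groups more than others), non-response bias (people who don't reply differ from those who do), self-selection bias (people decide for themselves whether to take part) and question bias (leading or loaded wording).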

Worked example 1 Spot the bias

A school wants to know how many students favour longer recess. A survey is handed out at the canteen during lunch and only those who hand it back are counted.

Biases:

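  • Selection: only students at the canteen during lunch are reached, so students who eat elsewhere are missed.
  • Non-response: students who feel strongly (for example, those in favour) may be more likely to hand the survey back.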

A better method: stratify by year level, pick a random sample from each stratum, follow up non-responders.
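Non-response bias can be surprisingly large. The sketch below uses invented numbers: a hypothetical school of 500 where 40% are truly in favour, with in-favour students assumed to hand the survey back 80% of the time and other students 40% of the time. The function name is made up for this example.

```python
import random

def survey_with_nonresponse(population, p_return_yes, p_return_no, rng):
    """Percentage 'in favour' among only the students who return the survey."""
    returned = [in_favour for in_favour in population
                if rng.random() < (p_return_yes if in_favour else p_return_no)]
    return 100 * sum(returned) / len(returned)

rng = random.Random(7)
# Hypothetical school: 500 students, 200 in favour (True), i.e. 40%.
population = [True] * 200 + [False] * 300
true_pct = 100 * sum(population) / len(population)
survey_pct = survey_with_nonresponse(population, 0.8, 0.4, rng)
print(f"True: {true_pct:.0f}%   Survey says: about {survey_pct:.0f}%")
```

With these assumed return rates the survey tends to report well over half in favour, even though only 40% are. Following up non-responders is one way to reduce this.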

4. Sample size and variation

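Different random samples of the same size will give slightly different results, but bigger samples are more stable. Random variation shrinks in proportion to 1/√n, where n is the sample size, so you need roughly four times the sample size to halve it. Doubling the sample size only reduces the variation by about 30%.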
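One common way to measure the typical random variation of a sample percentage is the standard error, √(p(1 − p)/n), where p is the true proportion. The sketch below assumes simple random sampling from a population much larger than the sample (it ignores the small correction used when the sample is a big fraction of the population); `standard_error_pct` is a name chosen for this example.

```python
import math

def standard_error_pct(p, n):
    """Typical random variation of a sample percentage, in percentage points,
    for true proportion p and sample size n (large-population approximation)."""
    return 100 * math.sqrt(p * (1 - p) / n)

for n in (25, 50, 100, 400):
    print(n, round(standard_error_pct(0.6, n), 1))
# 25 9.8
# 50 6.9
# 100 4.9
# 400 2.4
```

Going from 25 to 100 (four times as many) halves the variation, while going from 25 to 50 only cuts it from about 9.8 to 6.9 percentage points.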

Worked example 2 Variation in small samples

A school of 500 students has 60% in favour of a uniform change. Three separate random samples of 10 students each give percentages of 70%, 50% and 80%: wide variation. Three samples of 100 might give 58%, 62% and 59%, much closer to the true 60%.
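You can rerun this worked example as a simulation. The school below is hypothetical (300 of 500 students in favour, coded as 1s); each run draws fresh random samples, so your exact percentages will differ from the ones quoted above. `sample_percentages` is a name invented for this sketch.

```python
import random

def sample_percentages(population, n, repeats, rng):
    """Percentage in favour in each of several simple random samples of size n."""
    return [100 * sum(rng.sample(population, n)) / n for _ in range(repeats)]

rng = random.Random(2024)
school = [1] * 300 + [0] * 200   # hypothetical: 60% of 500 students in favour
small = sample_percentages(school, 10, 3, rng)
large = sample_percentages(school, 100, 3, rng)
print("Samples of 10: ", small)
print("Samples of 100:", large)
```

Running it several times with different seeds is a quick way to see that the samples of 100 cluster much more tightly around 60% than the samples of 10.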

5. Planning an investigation

A basic workflow:

  1. Question: what do we want to find out?
  2. Population: whom does it apply to?
  3. Sampling plan: method, size, how to pick.
  4. Data collection: tool, timing.
  5. Analysis: summary statistics, displays, comparisons.
  6. Report: findings with uncertainty acknowledged.
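For the Analysis step, a few summary statistics go a long way. This sketch uses an invented sample of travel times to school (in minutes) purely for illustration; `summarise` is a hypothetical helper built on Python's standard `statistics` module.

```python
import statistics

def summarise(values):
    """Summary statistics for the Analysis step: count, mean, median, range."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "range": max(values) - min(values),
    }

# Hypothetical sample: travel times to school in minutes (invented data).
travel_minutes = [12, 25, 8, 30, 15, 10, 20, 15, 5, 18]
print(summarise(travel_minutes))
# {'n': 10, 'mean': 15.8, 'median': 15.0, 'range': 25}
```

In the Report step, these numbers would be given alongside a display (such as a dot plot) and a note that a different sample could give slightly different values.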

Practice: Year 8 core

Fluency

Population, sample, census

    1. A school has 820 students. The Principal surveys every student in the school. Census or sample?
    2. A shop owner asks every tenth customer about satisfaction. Sampling method?
    3. A researcher wants to know heights of all Australian 13-year-olds. Census or sample? Why?
    4. A market researcher surveys only people in shopping centres. Name one likely bias.
    5. State the population and suggest a suitable sample for: “What proportion of Year 8 students at our school ride a bike to school?”
Fluency

Sampling methods and bias

    1. Which sampling method divides the population into strata and samples from each? Stratified, cluster, or convenience?
    2. What type of bias arises from a survey question like “Do you agree that more homework is harmful?”
    3. Explain why a phone-in survey is usually biased.
    4. A school has 300 Year 7, 280 Year 8 and 260 Year 9 students. Using stratified sampling with a 10% sample, how many from each year?
Reasoning

Explain and spot the mistake

    1. Sam claims “a sample of 20 is enough to be certain about a school of 500”. Is Sam correct? Explain.
    2. Explain why two random samples of the same size can give different summary statistics.
    3. Write an unbiased version of this question: “Don’t you agree that our coach is doing a great job?”
    4. A newspaper reports a poll of 500 readers showing 70% support a policy. What caveats should be stated before trusting the result?
Problem solving

Plan and analyse

    1. Design a statistical investigation to answer: “How much sleep do Year 8 students at our school get on a school night?” Include population, sample method, sample size, and a data display.
    2. A school has 1000 students. You take four random samples of 50 and count those who cycle: 18, 22, 16, 21. Calculate the mean percentage and comment on variability.
    3. A factory tests 1% of its daily output of 80 000 screws. Is 800 a large enough sample? What factors matter?
    4. Two weather stations collect rainfall each day for two weeks. Station A records 10 days; Station B records 14 days. Which would you trust more for “average daily rainfall this fortnight”?
Answers

Answer key

Attempt the practice first, then check your answers.


Year 8 core - answers

Fluency

Population, sample, census

    1. Census (everyone in the population).
    2. Systematic.
    3. Sample. Reason: a census of every 13-year-old in Australia is impractical and costly.
    4. Selection bias toward shoppers; non-shoppers are under-represented.
    5. Population: all Year 8 students at our school. Sample: simple random sample of at least 30 from the Year 8 roll.
Fluency

Sampling methods and bias

    1. Stratified.
    2. Question bias (loaded or leading wording).
    3. Self-selection: only motivated listeners call in, and they may hold strong or particular views.
    4. Year 7: 30. Year 8: 28. Year 9: 26. Method: 10% of each year level.
Reasoning

Explain and spot the mistake

    1. No. No sample smaller than the whole population can give certainty. A sample of 20 is only 4% of 500, and random variation alone can shift a sample percentage by 10 percentage points or more, so a bigger sample is needed for reasonable confidence.
    2. Each random sample contains different individuals, so the summary statistics change from sample to sample (sampling variation). The smaller the samples, the bigger these differences tend to be.
    3. “How would you rate the coach’s performance this season on a scale from 1 (poor) to 5 (excellent)?” This avoids leading wording.
    4. Were the 500 readers a random sample, or did they choose to respond (self-selection)? Even a random sample of readers only represents the newspaper’s audience, not the wider public. What is the margin of error?
Problem solving

Plan and analyse

    1. Population: all Year 8 students at our school. Method: stratified random sample across classes. Sample size: at least 30. Ask: “How many hours did you sleep last school night?” Display: dot plot or column graph. Report mean, median and range, and acknowledge uncertainty.
    2. Mean cycling percentage = (18 + 22 + 16 + 21) ÷ (4 × 50) × 100 = 77/200 × 100 = 38.5%. Variability: the counts range from 16 to 22 out of 50, i.e. 32% to 44%, about ±6 percentage points around the mean, which is modest for samples of 50.
    3. Probably, if it is chosen well. 800 screws is a fairly large sample, and its reliability depends mainly on its size and how it is chosen rather than on being 1% of output. Factors: is the sample random across shifts and machines? How rare are defects, and how precisely does the defect rate need to be known for the required tolerance?
    4. Station B (more days = more data to average, less random day-to-day noise) - provided both stations are in the same area and used comparable instruments.
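To check the arithmetic in Problem solving answer 2, a short sketch can convert each sample count to a percentage and pool all four samples for the mean. `cycling_summary` is a name made up for this example; the counts are the ones given in the question.

```python
def cycling_summary(counts, n):
    """Pooled mean percentage plus the lowest and highest sample percentages."""
    percentages = [100 * c / n for c in counts]
    mean_pct = 100 * sum(counts) / (len(counts) * n)
    return mean_pct, min(percentages), max(percentages)

mean_pct, low, high = cycling_summary([18, 22, 16, 21], 50)
print(mean_pct, low, high)  # 38.5 32.0 44.0
```

Because every sample has the same size, pooling the counts gives the same result as averaging the four sample percentages.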
