Topic 09 | Science inquiry

Scientific inquiry: variables, validity, argument

Year 9 (Levels 9-10 band): designing fair tests, distinguishing validity/reliability/accuracy, sources of error and bias, and constructing arguments from evidence.

50-70 min | Printable practice | Answer key | Challenge included
How to use this page

Read the explanation, work through the examples, then complete the core practice before printing.

Worked example 0 Real-world example: does caffeine improve reaction time?

A student plans to test whether caffeine improves reaction time.

  1. Question: Does consuming caffeine reduce reaction time in 14-15 year olds?
  2. Hypothesis: If caffeine raises alertness, then reaction time will be shorter after 100 mg caffeine than after no caffeine.
  3. Independent variable: caffeine dose (0 mg, 100 mg).
  4. Dependent variable: reaction time (ms) on a standard online test.
  5. Controlled variables: time of day, sleep before test, noise, test type, familiarity with the test.
  6. Reliability: repeat the test multiple times per condition, average the results.
  7. Validity: the reaction-time test must measure what we think it measures; caffeine must actually enter the bloodstream (give 20 min).

Key idea: one thing changes (IV), one thing is measured (DV), everything else is held constant. That is a fair test.
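
Step 6 (repeat, then average) can be sketched in a few lines of Python. This is only an illustration: the reaction times below are invented, and the names `trials_ms` and `summarise` are hypothetical, not part of any required method.

```python
from statistics import mean

# Hypothetical reaction times in milliseconds, invented for illustration only.
trials_ms = {
    "0 mg": [282, 275, 290, 279, 284],
    "100 mg": [258, 263, 251, 260, 256],
}

def summarise(trials):
    """Return the mean reaction time for each caffeine condition."""
    return {condition: mean(times) for condition, times in trials.items()}

means = summarise(trials_ms)
for condition, value in means.items():
    print(f"{condition}: mean reaction time {value:.1f} ms")
```

Comparing the two means is only meaningful if the controlled variables in step 5 really were held constant for every trial.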

1. Questions and hypotheses

An investigable question is specific and answerable by an experiment, not a broad opinion or a value judgement.

A hypothesis is a testable prediction, usually in “if … then …” form, that states an expected direction or relationship.

Good hypotheses are:

2. Variables

| Variable type | What it is | Example (plant growth test) |
|---|---|---|
| Independent (IV) | what you change | amount of sunlight per day |
| Dependent (DV) | what you measure | height after 2 weeks |
| Controlled | what you keep the same | soil, water, plant species, pot size, room temperature |

A fair test changes only the IV and measures the DV, with everything else controlled. Only then can you reasonably attribute a change in DV to a change in IV.

3. Validity, reliability, accuracy

These three are often confused.

| Term | What it asks | How to improve |
|---|---|---|
| Validity | Does the experiment actually measure what it claims to? | Control confounding variables; use a valid method |
| Reliability | Does repeating give similar results? | Take multiple readings; increase sample size |
| Accuracy | How close is a measurement to the true value? | Use calibrated instruments; minimise systematic error |
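
Two of the three ideas can be estimated from numbers, as in the rough sketch below. It assumes a set of repeated readings and a known true value; the readings and the function name `describe` are hypothetical. Validity cannot be calculated this way, because it depends on the design of the experiment rather than on the readings.

```python
from statistics import mean

def describe(readings, true_value):
    """Summarise repeated readings of one quantity.

    spread: range of the readings (smaller suggests better reliability).
    offset: mean minus true value (closer to zero suggests better accuracy).
    """
    spread = max(readings) - min(readings)
    offset = mean(readings) - true_value
    return spread, offset

# Hypothetical thermometer readings (degrees C) in boiling water at sea level.
readings = [102.1, 101.9, 102.0, 102.0, 102.0]
spread, offset = describe(readings, true_value=100.0)
print(f"spread = {spread:.1f} C, offset = {offset:+.1f} C")
```

A small spread with a large offset describes readings that are reliable but not accurate.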

4. Errors and bias

Random error: unpredictable fluctuations around the true value. Sources: reading scales, slight variation in timing.

Systematic error: a consistent bias in one direction. Sources: mis-calibrated instruments, parallax, incorrect zero.

Bias: a skew in sampling or interpretation that makes results unrepresentative.

Worked example 1 Spotting an error

Five students measure the length of a bench with a ruler: 1.23 m, 1.24 m, 1.22 m, 1.25 m, 1.23 m. Another student reads 1.10 m. Which is likely a random error and which is a larger problem?

  1. The five readings spread by 3 cm, which is likely random error from reading the scale.
  2. The 1.10 m reading is an outlier, inconsistent with the others. Possibly a systematic error (mis-read starting from 13 cm instead of 0) or a mistake; it should be investigated, not averaged in blindly.
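
The same check can be sketched in Python. The 5 cm threshold is an arbitrary choice made for this illustration, not a standard rule; the point is that a suspicious reading is flagged for investigation rather than deleted silently.

```python
from statistics import mean, median

# Bench lengths in metres, taken from the worked example above.
readings_m = [1.23, 1.24, 1.22, 1.25, 1.23, 1.10]

# Assumed rule for this sketch only: flag any reading more than 5 cm
# from the median. The threshold is an illustrative choice.
THRESHOLD_M = 0.05

mid = median(readings_m)
flagged = [r for r in readings_m if abs(r - mid) > THRESHOLD_M]
kept = [r for r in readings_m if abs(r - mid) <= THRESHOLD_M]

print("median (m):", mid)
print("flagged for investigation:", flagged)
print("spread of the rest (cm):", round((max(kept) - min(kept)) * 100, 1))
print("mean of the rest (m):", round(mean(kept), 3))
```

The median is used as the reference point because, unlike the mean, it is barely shifted by a single extreme reading.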

5. Designing a good experiment

A good experimental design includes:

  1. Clear question and hypothesis with reasoning.
  2. Identified IV, DV, controlled variables.
  3. Control group (where appropriate) or baseline measurement.
  4. Replication: repeat measurements or multiple trials.
  5. Suitable range of values for the IV.
  6. Calibrated and appropriate instruments.
  7. Risk assessment — ethical and safety considerations.
  8. Plan for recording data (tables) and processing (means, graphs).
Worked example 2 Improving a design

A student tests whether music speeds up homework. They do maths for 30 min with music and 30 min without. Critique the design.

  1. Only one trial each — no replication.
  2. No control of homework type (could be easier/harder).
  3. Only one student — no sample; results may not generalise.
  4. Time of day, fatigue not controlled.
  5. DV (“faster”) is vague — should be problems per minute or accuracy.

Better: 20 students, randomised order of music/no-music, same standardised test, measured time and accuracy, repeated multiple times, results averaged.
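
One way to randomise the order of conditions is sketched below. It assumes 20 students with hypothetical labels S01 to S20 and splits them at random into two equal groups, one starting with music and one starting without. The function name `assign_orders` and the fixed seed are illustrative choices.

```python
import random

def assign_orders(student_ids, seed=0):
    """Randomly assign half the students to each condition order.

    Balancing the order means practice and fatigue effects
    should not systematically favour one condition.
    """
    rng = random.Random(seed)  # fixed seed so the allocation is repeatable
    shuffled = list(student_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    orders = {}
    for student in shuffled[:half]:
        orders[student] = ("music", "no music")
    for student in shuffled[half:]:
        orders[student] = ("no music", "music")
    return orders

# Hypothetical student labels, matching the 20 students suggested above.
students = [f"S{n:02d}" for n in range(1, 21)]
orders = assign_orders(students)
music_first = sum(1 for order in orders.values() if order[0] == "music")
print(f"{music_first} start with music, {len(orders) - music_first} start without")
```

Letting chance decide who starts with music stops the experimenter from (even unintentionally) putting the strongest students in one group.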

6. Analysing data and drawing conclusions

A scientific argument has three parts: a claim, the evidence that supports it, and the reasoning that links the evidence to the claim.

Worked example 3 Writing a conclusion

Data from a plant-light experiment: plants in 12 h light grew 8.4 cm on average; in 6 h light grew 3.2 cm; in 2 h light grew 0.9 cm. Write a conclusion.

Claim: Increasing daily sunlight increased plant growth over two weeks.

Evidence: Mean growth rose from 0.9 cm (2 h) to 3.2 cm (6 h) to 8.4 cm (12 h), a clear upward trend.

Reasoning: Plants use light for photosynthesis to make glucose. More hours of light allows more photosynthesis and more material for growth, consistent with the observed trend.

Note the conclusion should not over-reach: “plants” here means the species tested, “sunlight” means the lamp used, and the range tested was 2-12 h.
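
As a small illustration, the evidence step can be checked in Python using the three means from this example. The sketch only confirms that growth rises at each step and reports the range tested; it says nothing about what happens outside that range.

```python
# Mean growth (cm) for each daily light duration (h), from the worked example.
results = {2: 0.9, 6: 3.2, 12: 8.4}

hours = sorted(results)
growth = [results[h] for h in hours]

# The claim "more light, more growth" fits this data only if each
# step up in light hours comes with a step up in mean growth.
increasing = all(a < b for a, b in zip(growth, growth[1:]))

print("trend is increasing:", increasing)
print(f"conclusion applies to {hours[0]}-{hours[-1]} h of light only")
```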

7. Evaluating a claim

When judging a scientific claim (or a news story), ask:


Practice: Year 9

Fluency

Question, hypothesis, variables

    1. Write an investigable question about how temperature affects the dissolving rate of salt.
    2. Write a hypothesis for the above question in “if … then … because …” form.
    3. For a test of “does fertiliser amount change tomato yield?”: identify the IV, DV, and three controlled variables.
    4. Explain the difference between an IV and a DV.
    5. State what a “control group” is and give an example.
Reasoning

Validity, reliability, accuracy

    1. Define validity, reliability, and accuracy in your own words.
    2. A thermometer reads 102 °C in boiling water at sea level (true value 100 °C). Classify the error.
    3. A stopwatch gives times of 12.40 s, 12.41 s, 12.39 s, 12.42 s. Classify: reliable? accurate? valid?
    4. Give an example of an experiment that is reliable but not valid.
    5. Why does repeating measurements improve reliability but not necessarily accuracy?
Problem solving

Designing and evaluating

    1. A student investigates whether a ball dropped from higher bounces more. Design a plan: IV, DV, three controlled variables, what data to collect, and how to analyse.
    2. Critique this design: “I tested a new fertiliser on my tomato plant. It grew taller than my neighbour’s tomato. Therefore the fertiliser works.” List three issues.
    3. A company funds a study concluding its sugary drink is “not linked to weight gain”. Suggest two potential sources of bias and how to address them.
    4. A class of 30 students has 28 results between 2.0 and 2.5 for an experiment. Two students report results of 8.7. Discuss whether to include or exclude the outliers and how to decide.
Reasoning

Arguments from data

    1. A graph shows ice-cream sales and drowning rates rising together through summer. A headline reads “Ice cream causes drownings.” Evaluate this causal claim (hint: think about a common cause).
    2. Data: reaction time (ms) after caffeine dose (mg): 0 -> 280, 50 -> 260, 100 -> 250, 150 -> 245, 200 -> 260. Describe the pattern and the most plausible interpretation.
    3. Write a three-part argument (claim, evidence, reasoning) for: “an LED bulb is more efficient than an incandescent bulb”, using typical figures from your topic knowledge.

Challenge

Reasoning

Harder reasoning

    1. A medical trial uses “double-blind” design: neither patient nor doctor knows who got the drug or placebo. Explain why this controls bias, and what would go wrong if either side knew.
    2. Two studies disagree about the effect of a new diet. Study A: n = 15, 12 weeks, self-reported weight. Study B: n = 500, 6 months, weighed by researchers. Using the ideas of validity, reliability, and sample size, argue which result deserves more weight.
    3. A student claims their experiment “proves” their hypothesis. Explain why science never “proves” a hypothesis, only supports or falsifies it — and why that makes science more trustworthy, not less.
    4. A graph of test scores vs hours studied shows scatter but a clear upward trend. Write a balanced conclusion that distinguishes correlation from causation and identifies at least one confounding variable.
Answers

Answer key

Attempt the practice first, then check your work against the answers below.

Year 9 answers

Fluency

Question, hypothesis, variables

    1. E.g. “Does increasing water temperature reduce the time for 5 g of salt to dissolve in 100 mL of water?”
    2. “If water temperature is increased, then the time for salt to dissolve will decrease, because particles move faster at higher temperature, giving more frequent collisions with the solvent.”
    3. IV: mass of fertiliser per plant. DV: tomato yield (mass or count of fruit). Controlled: variety of tomato, size of pot, amount of water, light exposure, soil type, duration of experiment.
    4. The IV is the variable the experimenter changes; the DV is what is measured and is expected to respond to changes in the IV.
    5. A control group is a comparison group that does not receive the treatment/change, showing what happens without the IV. Example: placebo group in a drug trial.
Reasoning

Validity, reliability, accuracy

    1. Validity: the experiment measures what it claims to measure. Reliability: repeated measurements give consistent results. Accuracy: measurements are close to the true value.
    2. Systematic error: every reading is off by a consistent +2 °C.
    3. Reliable (very consistent); reasonably accurate if the true time is near 12.40 s. Validity depends on whether the timing actually measures the quantity under investigation.
    4. E.g. weighing people on a cheap bathroom scale that always reads 2 kg too low: the readings are repeatable (reliable) but consistently wrong, so the method is not valid for a “true weight” study.
    5. Repeating averages out random fluctuations, improving reliability. It does not fix systematic errors, which push every reading the same way.
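
The idea in answer 5 can be illustrated with a short simulation. It assumes the thermometer from question 2 reads 2 °C too high, plus a random reading error of about 0.5 °C; both the size of the random error and the seed are arbitrary choices made for this sketch.

```python
import random
from statistics import mean

TRUE_VALUE = 100.0       # true boiling point in degrees C (sea level)
SYSTEMATIC_OFFSET = 2.0  # assumed calibration error, as in question 2
RANDOM_SD = 0.5          # assumed size of the random reading error

def simulate_readings(n, seed=1):
    """Simulate n thermometer readings with random and systematic error."""
    rng = random.Random(seed)
    return [TRUE_VALUE + SYSTEMATIC_OFFSET + rng.gauss(0, RANDOM_SD)
            for _ in range(n)]

for n in (3, 30, 3000):
    avg = mean(simulate_readings(n))
    print(f"n = {n:4d}: mean = {avg:.2f} C, error = {avg - TRUE_VALUE:+.2f} C")
```

With more readings the mean settles down, but it settles near 102 °C rather than 100 °C: averaging removes the random scatter and leaves the systematic offset untouched.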
Problem solving

Designing and evaluating

    1. IV: drop height (e.g. 25, 50, 75, 100, 125 cm). DV: bounce height (cm) from the floor to top of first bounce. Controlled: same ball, same surface, same ball release technique (no push), same temperature, same measurer. Data: measure bounce height 3 times per drop height; average. Analysis: plot bounce height (y) vs drop height (x); look for a linear trend and comment on outliers.
    2. Issues: (i) n = 1, so no replication; (ii) no control (different plants and different conditions are uncontrolled confounds); (iii) a single outcome (taller) doesn’t prove the fertiliser is responsible; (iv) no randomisation; (v) no measure of variability.
    3. Bias sources: (i) selective reporting of favourable results; (ii) study design choices favouring the sponsor (short duration, specific group). Address by independent replication, pre-registering the study, and full public data access.
    4. Investigate first: were the two outliers from a procedural mistake (e.g. different method)? If yes, exclude and state this. If there’s no mistake, keep them but report them — they may reflect real variation. Use a consistent rule (e.g. outlier test) rather than discarding data to make the result look cleaner.
Reasoning

Arguments from data

    1. Correlation does not imply causation. Both ice-cream sales and drownings rise in summer because of higher temperatures and more swimming; the common cause is hot weather, not ice cream.
    2. Reaction time falls from 280 ms (0 mg) to 245 ms (150 mg), showing caffeine shortens reaction time up to a point. At 200 mg it rises again (260 ms), suggesting a “too much” effect (jitteriness, over-arousal). Plausible interpretation: moderate doses improve alertness; high doses may impair.
    3. Claim: LED bulbs are more efficient than incandescents. Evidence: a typical LED outputs about 60% of input as visible light vs about 5% for incandescents; LEDs use 10 W to produce about the same light as a 60 W incandescent. Reasoning: both convert electrical input into light and heat; LEDs use semiconductor electroluminescence, which diverts little energy to heat, while incandescents rely on a heated filament where most energy becomes heat. Therefore, for the same useful light, LEDs use far less electrical input, which is the definition of higher efficiency.
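
For answer 2 above, the pattern in the caffeine data can be described step by step with a short Python sketch. It uses only the five data points given in the question; the variable names are illustrative.

```python
# Caffeine dose (mg) -> reaction time (ms), from question 2.
data = {0: 280, 50: 260, 100: 250, 150: 245, 200: 260}

doses = sorted(data)
# Dose with the shortest (best) reaction time in this data set.
best_dose = min(data, key=data.get)
# Change in reaction time between each pair of neighbouring doses.
changes = [(a, b, data[b] - data[a]) for a, b in zip(doses, doses[1:])]

print("shortest reaction time at", best_dose, "mg")
for low, high, change in changes:
    print(f"{low} -> {high} mg: {change:+d} ms")
```

The improvements shrink at each step and then reverse at the highest dose, which is the "up to a point" pattern described in the answer. With only one reading per dose, the reversal could also be random variation, so replication would be needed before drawing a firm conclusion.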
Reasoning

Challenge

    1. In a double-blind trial, neither side can consciously or unconsciously influence outcomes. If doctors knew, they might treat the drug group differently (more attentive care, interpret symptoms differently); if patients knew, the placebo effect and reporting would differ. Double-blinding removes both channels of bias.
    2. Study B deserves more weight: much larger n (better reliability), researcher-measured weight (more accurate and valid than self-report), longer duration (captures real effects). Study A’s small sample and self-reported DV make both reliability and validity weaker.
    3. “Proof” in everyday use means certainty. In science, no finite set of observations can establish certainty — there may always be a future experiment that contradicts a theory. Science “supports” hypotheses tentatively and is open to revision. Paradoxically, this is a strength: self-correction is why science advances, while dogma that claims proof cannot be improved.
    4. Claim: higher hours of study are associated with higher test scores. Evidence: positive trend on the graph. Reasoning: more time on content could plausibly improve retention and skill. But correlation is not causation; confounding variables (e.g. motivation, sleep, subject aptitude, prior knowledge) might cause both more study and better scores. A controlled experiment or statistical control is needed to distinguish the effect of study time itself.
