Scientific inquiry: variables, validity, argument

What you will learn

write an investigable question and a testable hypothesis,
identify independent, dependent, and controlled variables,
distinguish validity, reliability, and accuracy,
recognise random and systematic errors and sources of bias,
evaluate experimental design and construct a scientific argument from data.

Worked example 0 Does caffeine improve reaction time?

A student plans to test whether caffeine improves reaction time.

Question: Does consuming caffeine reduce reaction time in 14–15-year-olds?
Hypothesis: If caffeine raises alertness, then reaction time will be shorter after $100$ mg of caffeine than after no caffeine.
Independent variable: caffeine dose ($0$ mg, $100$ mg).
Dependent variable: reaction time (ms) on a standard online test.
Controlled variables: time of day, sleep before the test, noise, test type, familiarity with the test.
Reliability: repeat the test several times per condition and average the results.
Validity: the reaction-time test must measure what we think it measures, and the caffeine must have time to enter the bloodstream (wait about 20 min after consumption before testing).

Key idea: one thing changes (IV), one thing is measured (DV), everything else is held constant. That is a fair test.

1. Questions and hypotheses

An investigable question is specific and answerable by an experiment, not a broad opinion or a value judgement.

Weak: “Is exercise good?”
Better: “Does 10 minutes of jogging raise heart rate more than 10 minutes of walking?”

A hypothesis is a testable prediction, usually in “if … then …” form, that states an expected direction or relationship.

“If the surface area of a solid reactant increases, then the reaction rate will increase, because more particles are exposed and available for collisions.”

Good hypotheses are:

Specific,
Falsifiable (could be shown wrong by evidence),
Linked to existing theory (“because …”).

2. Variables

Variable type	What it is	Example (plant growth test)
Independent (IV)	what you change	amount of sunlight per day
Dependent (DV)	what you measure	height after 2 weeks
Controlled	what you keep the same	soil, water, plant species, pot size, room temperature

A fair test changes only the IV and measures the DV, with everything else controlled. Only then can you reasonably attribute a change in the DV to the change in the IV.

3. Validity, reliability, accuracy

These three are often confused.

Term	What it asks	How to improve
Validity	Does the experiment actually measure what it claims to?	Control confounding variables; use a valid method
Reliability	Does repeating give similar results?	Take multiple readings; increase sample size
Accuracy	How close is a measurement to the true value?	Use calibrated instruments; minimise systematic error

4. Errors and bias

Random error: unpredictable fluctuations above and below the true value. Sources: judging a reading that falls between scale marks, slight variations in reaction time when starting and stopping a timer.

Reduced by: repeating measurements and averaging.

Systematic error: a consistent shift in one direction (readings always too high or always too low). Sources: mis-calibrated instruments, parallax when a scale is always read from the same angle, an incorrect zero.

Reduced by: calibrating, checking zero, using better equipment.
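The difference between the two kinds of error can be seen in a short simulation. This is a sketch only: the true length, the 1 cm scatter, and the 2 cm offset are made-up values, and the code assumes random errors follow a normal (bell-shaped) distribution, which real measurements need not do.

```python
import random
import statistics

TRUE_LENGTH_M = 1.230  # hypothetical true value, chosen for illustration


def take_readings(n, offset_m, spread_m, rng):
    """Simulate n readings: true value + fixed offset + random scatter."""
    return [TRUE_LENGTH_M + offset_m + rng.gauss(0.0, spread_m) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the run is repeatable

# Random error only: readings scatter above and below the true value.
random_only = take_readings(1000, offset_m=0.0, spread_m=0.01, rng=rng)
# Random error plus a systematic error: every reading is shifted by +2 cm.
with_offset = take_readings(1000, offset_m=0.02, spread_m=0.01, rng=rng)

print(f"mean, random error only: {statistics.mean(random_only):.3f} m")
print(f"mean, with +2 cm offset: {statistics.mean(with_offset):.3f} m")
```

Averaging many readings brings the first mean very close to the true value, but the second mean stays about 2 cm too high no matter how many readings are taken. Only calibrating or checking the zero would remove that offset.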

Bias: a skew in sampling or interpretation that makes results unrepresentative.

Sampling bias: surveying only your friends.
Observer bias: expecting a result and seeing it.
Reduced by: random sampling, blinding, pre-registering the hypothesis.
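Random sampling can be sketched in a few lines. The idea is to choose from a list of the whole group you want to describe, giving every member the same chance of selection. The year group of 180 students and the sample size of 30 are hypothetical numbers chosen for illustration.

```python
import random

# Hypothetical sampling frame: every student in a year group, not just friends.
year_group = [f"student_{i:03d}" for i in range(1, 181)]

rng = random.Random(7)  # fixed seed so the selection can be reproduced and checked
sample = rng.sample(year_group, k=30)  # each student equally likely, no repeats

print(len(sample), "students chosen, e.g.", sample[:3])
```

Fixing the seed is a design choice for transparency. Someone else can rerun the selection and confirm that the sample was not hand-picked.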

Worked example 1 Spotting an error

Five students measure the length of a bench with a ruler: $1.23$ m, $1.24$ m, $1.22$ m, $1.25$ m, $1.23$ m. Another student reads $1.10$ m. Which is likely a random error and which is a larger problem?

The five readings span $3$ cm ($1.22$ m to $1.25$ m), which is likely random error from reading the scale.
The $1.10$ m reading is an outlier, about $13$ cm below the others. It could be a systematic error in that student's method (for example, a $13$ cm offset from misjudging where the measurement started) or a one-off mistake; it should be investigated, not averaged in blindly.
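The check above can be sketched as a short calculation using the readings from this example. The median is a good reference point because one stray reading hardly moves it, whereas it would drag a mean down. The 5 cm threshold for flagging a reading is an assumption made for this sketch, not a standard rule. A flagged reading is marked for investigation, not automatically deleted.

```python
import statistics

# Readings from Worked example 1, in metres.
readings = [1.23, 1.24, 1.22, 1.25, 1.23, 1.10]

# Assumed rule of thumb for this sketch: flag any reading more than
# 5 cm from the median so it can be investigated.
THRESHOLD_M = 0.05

median = statistics.median(readings)
flagged = [r for r in readings if abs(r - median) > THRESHOLD_M]
kept = [r for r in readings if abs(r - median) <= THRESHOLD_M]

print("median (m):", median)
print("flagged for investigation:", flagged)
print("range of the rest (cm):", round((max(kept) - min(kept)) * 100))
print("mean of the rest (m):", round(statistics.mean(kept), 3))
```

Only $1.10$ m is flagged. The remaining five readings have a range of $3$ cm and a mean of about $1.234$ m.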

5. Designing a good experiment

A good experimental design includes:

Clear question and hypothesis with reasoning.
Identified IV, DV, controlled variables.
Control group (where appropriate) or baseline measurement.
Replication: repeat measurements or multiple trials.
Suitable range of values for the IV.
Calibrated and appropriate instruments.
Risk assessment — ethical and safety considerations.
Plan for recording data (tables) and processing (means, graphs).

Worked example 2 Improving a design

A student tests whether music speeds up homework. They do maths for 30 min with music and 30 min without. Critique the design.

Only one trial each — no replication.
No control of homework type (could be easier/harder).
Only one student — no sample; results may not generalise.
Time of day, fatigue not controlled.
The DV (“faster”) is vague: it should be a measured quantity, such as problems completed per minute, with accuracy recorded too.

Better: 20 students, randomised order of music/no-music, same standardised test, measured time and accuracy, repeated multiple times, results averaged.
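One simple way to randomise the order for 20 students is to shuffle the list and then split it into two equal halves. One half does the music condition first and the other half does it second, so any practice or fatigue effect is shared equally between the two conditions. The student labels and the seed are hypothetical.

```python
import random

# Hypothetical participant labels for the improved design (20 students).
students = [f"S{i:02d}" for i in range(1, 21)]

rng = random.Random(2024)  # fixed seed so the allocation can be checked later
shuffled = students[:]
rng.shuffle(shuffled)

# First half of the shuffled list: music first; second half: no music first.
half = len(shuffled) // 2
order = {s: ("music", "no music") for s in shuffled[:half]}
order.update({s: ("no music", "music") for s in shuffled[half:]})

for s in students:
    print(s, "->", " then ".join(order[s]))
```

Tossing a coin for each student would also randomise the order, but it could leave the two orders unequal in size. The shuffle-and-split approach keeps them at 10 each.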

6. Analysing data and drawing conclusions

Tabulate raw data with clear units and headings.
Summarise with a mean, and sometimes range or standard deviation to show spread.
Plot IV on the x-axis, DV on the y-axis.
Look for patterns, trends, and outliers.
Check that the conclusion actually follows from the data.
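Summarising repeated trials can be sketched with Python's standard `statistics` module. The reaction times below are made-up numbers for the two conditions of Worked example 0, included only to show the calculation. They are not real results about caffeine. The code uses the sample standard deviation (`statistics.stdev`), which is the usual choice when the trials are a sample of possible measurements.

```python
import statistics

# Made-up reaction times (ms), five trials per condition, for illustration only.
trials = {
    "0 mg": [282, 275, 290, 279, 284],
    "100 mg": [251, 262, 248, 255, 259],
}

for condition, times in trials.items():
    mean = statistics.mean(times)
    spread = max(times) - min(times)  # range
    sd = statistics.stdev(times)  # sample standard deviation
    print(f"{condition}: mean {mean:.1f} ms, range {spread} ms, SD {sd:.1f} ms")
```

With these illustrative numbers, the means are 282.0 ms and 255.0 ms and the ranges are 15 ms and 14 ms. The ranges help judge whether the difference between the means is large compared with the scatter within each condition.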

A scientific argument has three parts:

Claim: a statement that answers the question.
Evidence: data or observations that support the claim.
Reasoning: a link explaining why the evidence supports the claim, using scientific principles.

Worked example 3 Writing a conclusion

Data from a plant-light experiment: plants given 12 h of light per day grew $8.4$ cm on average, plants given 6 h grew $3.2$ cm, and plants given 2 h grew $0.9$ cm. Write a conclusion.

Claim: Increasing daily sunlight increased plant growth over two weeks.

Evidence: Mean growth rose from $0.9$ cm (2 h) to $3.2$ cm (6 h) to $8.4$ cm (12 h), a clear upward trend.

Reasoning: Plants use light for photosynthesis to make glucose. More hours of light allow more photosynthesis and therefore more material for growth, consistent with the observed trend.

Note that the conclusion should not over-reach: “plants” here means the species tested, “sunlight” means the lamp used, and the range tested was 2–12 h per day, so the trend should not be assumed to continue outside that range.
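Checking that the evidence really shows an upward trend can be sketched as a small calculation on the three means from this example. The code puts the IV in increasing order, confirms that the mean growth rises at every step, and records the tested range so the conclusion is not stretched beyond it. With only three means, it says nothing about spread or about the exact shape of the relationship.

```python
# Mean growth (cm) from Worked example 3, keyed by hours of light per day.
mean_growth_cm = {2: 0.9, 6: 3.2, 12: 8.4}

# Put the IV in increasing order, then check the DV rises at every step.
hours = sorted(mean_growth_cm)
growth = [mean_growth_cm[h] for h in hours]
always_increasing = all(later > earlier for earlier, later in zip(growth, growth[1:]))

print("hours of light:", hours)
print("mean growth (cm):", growth)
print("growth increases at every step:", always_increasing)
print("tested range:", min(hours), "to", max(hours), "h per day")
```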

7. Evaluating a claim

When judging a scientific claim (or a news story), ask:

What was the sample size? Was the sample representative?
Was there a control or baseline?
Were confounding variables controlled?
Has the result been replicated by others?
Who funded or conducted the work? Could bias influence conclusions?
Does the claim go beyond what the data actually show?

Practice: Year 9

Fluency

Question, hypothesis, variables

Write an investigable question about how temperature affects the dissolving rate of salt.
Write a hypothesis for the above question in “if … then … because …” form.
For a test of “does fertiliser amount change tomato yield?”: identify the IV, DV, and three controlled variables.
Explain the difference between an IV and a DV.
State what a “control group” is and give an example.

Reasoning

Validity, reliability, accuracy

Define validity, reliability, and accuracy in your own words.
A thermometer reads $102^{\circ}\text{C}$ in boiling water at sea level (true value $100^{\circ}\text{C}$). Classify the error.
A stopwatch gives times of 12.40 s, 12.41 s, 12.39 s, 12.42 s. Classify: reliable? accurate? valid?
Give an example of an experiment that is reliable but not valid.
Why does repeating measurements improve reliability but not necessarily accuracy?

Problem solving

Designing and evaluating

A student investigates whether a ball dropped from higher bounces more. Design a plan: IV, DV, three controlled variables, what data to collect, and how to analyse.
Critique this design: “I tested a new fertiliser on my tomato plant. It grew taller than my neighbour’s tomato. Therefore the fertiliser works.” List three issues.
A company funds a study concluding its sugary drink is “not linked to weight gain”. Suggest two potential sources of bias and how to address them.
A class of 30 students has 28 results between 2.0 and 2.5 for an experiment. Two students report results of 8.7. Discuss whether to include or exclude the outliers and how to decide.

Reasoning

Arguments from data

A graph shows ice-cream sales and drowning rates rising together through summer. A headline reads “Ice cream causes drownings.” Evaluate this causal claim (hint: think about a common cause).
Data: reaction time (ms) after caffeine dose (mg): 0 -> 280, 50 -> 260, 100 -> 250, 150 -> 245, 200 -> 260. Describe the pattern and the most plausible interpretation.
Write a three-part argument (claim, evidence, reasoning) for the statement “an LED bulb is more efficient than an incandescent bulb”, using typical figures from your topic knowledge.

Challenge

Reasoning

Harder reasoning

A medical trial uses “double-blind” design: neither patient nor doctor knows who got the drug or placebo. Explain why this controls bias, and what would go wrong if either side knew.
Two studies disagree about the effect of a new diet. Study A: $n = 15$, $12$ weeks, self-reported weight. Study B: $n = 500$, $6$ months, weighed by researchers. Using the ideas of validity, reliability, and sample size, argue which result deserves more weight.
A student claims their experiment “proves” their hypothesis. Explain why science never “proves” a hypothesis, only supports or falsifies it — and why that makes science more trustworthy, not less.
A graph of test scores vs hours studied shows scatter but a clear upward trend. Write a balanced conclusion that distinguishes correlation from causation and identifies at least one confounding variable.