Topic 13 | Statistics & Probability

Data display

Year 7 core: dot plots, stem-and-leaf plots and column graphs; describing distributions by shape, centre and spread; choosing the right display.

45-60 min | Printable practice | Answer key

How to use this page

Read the explanation, work through the examples, then complete the core practice before printing.

What you will learn

1. Types of data

Data types

Categorical

Data that sorts into groups or categories. Examples: eye colour, favourite subject, yes/no.

Numerical - discrete

Counting data - separate whole values. Examples: number of siblings, number of goals.

Numerical - continuous

Measuring data - can take any value in a range. Examples: height, time, weight.

2. Frequency tables

A frequency is the count of how many times a value or category appears.
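
As a minimal sketch of that counting process, the Python below tallies a short list of yes/no answers with `collections.Counter` from the standard library. The list of answers is made up for illustration and does not come from this page.

```python
from collections import Counter

# Hypothetical survey answers, invented for this illustration only.
responses = ["yes", "no", "yes", "yes", "no", "yes"]

# Counter tallies how many times each distinct answer appears.
frequency = Counter(responses)

for answer, count in frequency.most_common():
    print(f"{answer}: {count}")

# The frequencies should add up to the number of answers collected.
total = sum(frequency.values())
print("Total:", total)
```

Comparing the total with the number of answers collected is a quick check on the tally.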

Worked example 1 Frequency table

Twenty students were asked their favourite colour. Build a frequency table from this list:

Red, Blue, Red, Green, Red, Blue, Yellow, Red, Blue, Green, Red, Red, Blue, Yellow, Green, Red, Blue, Other, Green, Blue.

Colour | Frequency
Red    | 7
Blue   | 6
Green  | 4
Yellow | 2
Other  | 1

Check: 7 + 6 + 4 + 2 + 1 = 20.

As a column graph:

[Column graph: Colour on the horizontal axis, Frequency (0 to 8) on the vertical axis; bars Red 7, Blue 6, Green 4, Yellow 2, Other 1]
Column graph of favourite colours. Each bar's height matches the frequency in the table above. Gaps between bars show this is categorical data.
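
For readers who want to see the same picture built from the numbers, here is a rough Python sketch that prints a text version of the graph. It is an illustration only: the bars run sideways instead of upwards, and the function name `text_column_graph` is invented for this example.

```python
# Frequencies copied from the favourite-colour table above.
frequencies = {"Red": 7, "Blue": 6, "Green": 4, "Yellow": 2, "Other": 1}

def text_column_graph(freqs):
    """Return one line per category, with one '#' per counted response."""
    width = max(len(name) for name in freqs)
    return [
        f"{name.ljust(width)} | {'#' * count} {count}"
        for name, count in freqs.items()
    ]

lines = text_column_graph(frequencies)
print("\n".join(lines))
```

Each category gets its own separate bar, which mirrors the gaps between bars in a column graph of categorical data.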

3. Types of graph

Choosing a display

Column / bar graph

Use for categorical data or discrete numerical data. Bars have gaps between them.

Dot plot

Use for small discrete numerical data sets. Each dot represents one data value stacked above its number on the axis.

Stem-and-leaf plot

Use for numerical data when you want to keep individual values. The stem is the leading digit(s), the leaf is the final digit.

Line graph

Use when data is continuous over time, e.g. temperature during a day.

[Dot plot: Number of siblings (0 to 5) on the horizontal axis]
Dot plot: number of siblings for 12 students. Each dot is one student. The stack above 2 (four dots) shows that 2 siblings is the mode.

Worked example 2 A stem-and-leaf plot

Build a stem-and-leaf plot for: 42, 37, 51, 46, 38, 49, 52, 41, 35, 45, 50, 48.

Sort the values in order, then split each one into its tens digit (the stem) and its units digit (the leaf):

Stem | Leaf
  3  | 5 7 8
  4  | 1 2 5 6 8 9
  5  | 0 1 2

Read: 3 | 5 = 35, 4 | 1 = 41, etc.
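
The same sort-and-split process can be sketched in Python. This is one possible approach, assuming whole-number data; the function name `stem_and_leaf` is invented for this example, and stems with no values are simply left out.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group sorted values by stem (all digits but the last) and leaf (last digit)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)  # e.g. 42 -> stem 4, leaf 2
        plot[stem].append(leaf)
    return dict(plot)

# Data from worked example 2.
data = [42, 37, 51, 46, 38, 49, 52, 41, 35, 45, 50, 48]
plot = stem_and_leaf(data)

print("Stem | Leaf")
for stem, leaves in plot.items():
    print(f"{stem:>3}  | {' '.join(str(leaf) for leaf in leaves)}")
```

Sorting first means the leaves in each row come out in increasing order, as they should in a finished plot.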

4. Interpreting graphs

When reading any graph, ask:

[Line graph: Time (9 am to 5 pm) on the horizontal axis, Temperature (°C, 16 to 28) on the vertical axis; peak labelled 27 °C]
Line graph: temperature (°C) during a school day. The trend rises until 2 p.m. then falls. The peak (27 °C) is easy to read from the graph.

The line graph above shows continuous data over time. You can read the peak (27 °C at 2 p.m.), the trend (rises then falls), and the symmetry (roughly even climb and descent).


Practice

Fluency

Tier 1: basic skills

    1. Classify as categorical, discrete numerical, or continuous numerical: eye colour.
    2. Classify: number of pets owned.
    3. Classify: weight of a parcel.
    4. Classify: gender identity.
    5. Classify: temperature at noon.
    6. Classify: shoe size (UK sizing: 5, 5.5, 6, …).
    7. Build a frequency table from: A, B, A, C, B, A, A, C, B, A.
    8. A frequency table shows frequencies of 5, 7, 3 in three categories. What is the total sample size?
    9. Which graph is best for categorical data: line graph, column graph, or stem-and-leaf?
    10. Which graph keeps individual values visible: dot plot or column graph?
    11. Read from the stem-and-leaf plot: 2 | 3 5 8. Write the three values.
    12. In a dot plot, 4 dots stack above the number 7. What does this mean?
    13. A column graph has heights 8, 12, 5, 15. What is the sum of frequencies?
    14. A bar graph’s vertical axis starts at 50 instead of 0. Why might this be misleading?

Reasoning

Tier 2: mixed practice

    Use this data set for questions 1-5: shoe sizes of 15 students: 7, 8, 8, 9, 7, 6, 8, 10, 7, 9, 8, 9, 7, 8, 9.

    1. Build a frequency table.
    2. What is the modal shoe size (the most common)?
    3. Describe the distribution (symmetrical, skewed, or otherwise).
    4. If you were to draw a dot plot, how many dots would stack above 8?
    5. What type of graph would you not use for this data, and why?

    Use this stem-and-leaf plot for questions 6-9. It shows exam marks out of 100 for a class:

    Stem | Leaf
      4  | 2 5 8
      5  | 0 3 3 7 9
      6  | 1 1 4 8
      7  | 0 2 5

    6. How many students are in the class?
    7. What is the lowest score? The highest score?
    8. What mark was scored by the most students?
    9. What is the range of the scores? (Range = maximum - minimum.)

Reasoning

Tier 3: explain and spot the mistake

    1. Ben plots temperatures taken every hour from 6 a.m. to 6 p.m. as a column graph with gaps between bars. Is the column graph the best choice here? Explain.
    2. A graph shows sales for three products with bar heights 50, 51, 52, and the y-axis starts at 49. Explain why this graph could mislead a reader.
    3. Can a single data point be both an outlier and the mode? Explain.
    4. A friend says “categorical data can be averaged”. Is this correct? Give an example that supports your view.
Problem solving

Tier 4: real-world problems

    1. A class survey of favourite sports gave: AFL 9, Soccer 7, Basketball 5, Cricket 4, Other 2. How many students were surveyed? Draw (or describe) a column graph for this data.
    2. In one week a shop recorded daily customer numbers: Mon 42, Tue 38, Wed 45, Thu 50, Fri 65, Sat 80, Sun 60. Which graph type would you use? How many customers were served in total?
    3. The temperatures in a city (°C) every hour from 9 a.m. to 5 p.m. were: 18, 20, 22, 24, 26, 27, 26, 24, 22. Which display is best? At what time was the maximum reached?
    4. A class measured the heights (cm) of 14 students: 145, 150, 152, 150, 155, 148, 162, 158, 150, 155, 160, 153, 149, 156. Construct a stem-and-leaf plot.
    5. A town’s population over 5 decades was 12 000, 15 000, 22 000, 28 000, 31 000. Which graph shows the trend best, and why?

Answers

Answer key

Attempt the practice first. When you're ready to check, expand the answers below.

Tier 1: basic skills

Fluency

    1. Categorical
    2. Discrete numerical
    3. Continuous numerical
    4. Categorical
    5. Continuous numerical
    6. Discrete numerical (values come in fixed jumps)
    7. A: 5, B: 3, C: 2. Total 10.
    8. 15
    9. Column graph
    10. Dot plot
    11. 23, 25, 28
    12. The value 7 appeared four times in the sample.
    13. 40
    14. It stretches small differences so bars look very different when they are actually close.

Tier 2: mixed practice

Reasoning

    1. Size 6: 1, Size 7: 4, Size 8: 5, Size 9: 4, Size 10: 1.
    2. 8 (appears most often).
    3. Roughly symmetrical around 8.
    4. 5 dots.
    5. A line graph would be inappropriate: shoe sizes are discrete, not a continuous change over time.
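
One way to check answers 1, 2 and 4 is to count the shoe sizes in Python. The sketch below is only a checking aid: it tallies the data from the question with `collections.Counter` and prints a sideways text dot plot, one `*` per student.

```python
from collections import Counter

# Shoe sizes of the 15 students, copied from the question.
shoe_sizes = [7, 8, 8, 9, 7, 6, 8, 10, 7, 9, 8, 9, 7, 8, 9]

counts = Counter(shoe_sizes)
mode_size, mode_count = counts.most_common(1)[0]

# Text dot plot: one '*' per student beside each shoe size.
for size in range(min(shoe_sizes), max(shoe_sizes) + 1):
    print(f"{size:>2} | {'*' * counts[size]}")

print("Mode:", mode_size, "with", mode_count, "dots")
```

The rows of stars rise to a peak at size 8 and fall away evenly on both sides, which is what "roughly symmetrical" describes.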

    Questions 6-9 from the stem-and-leaf plot:

    6. 15 students.
    7. Lowest 42; highest 75.
    8. 53 (two students scored 53) and 61 (two students scored 61) - both are modes; the data is bimodal.
    9. Range = 75 - 42 = 33.
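
These stem-and-leaf answers can be checked with a short Python sketch. It assumes the plot is typed in as a dictionary of stems and leaves (a representation chosen for this example) and uses `statistics.multimode` from the standard library to find every mode.

```python
from statistics import multimode

# Stem-and-leaf plot of exam marks, copied from the practice question.
plot = {4: [2, 5, 8], 5: [0, 3, 3, 7, 9], 6: [1, 1, 4, 8], 7: [0, 2, 5]}

# Rebuild each mark as stem * 10 + leaf.
marks = [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]

class_size = len(marks)
lowest, highest = min(marks), max(marks)
modes = multimode(marks)          # every value tied for most frequent
score_range = highest - lowest

print(class_size, lowest, highest, modes, score_range)
```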

Tier 3: explain and spot the mistake

Reasoning

    1. A line graph would be better. Temperature varies continuously with time, so joining the hourly readings with a line shows the trend clearly. Columns with gaps suggest separate, independent categories rather than a single continuous variable.
    2. Starting the y-axis at 49 exaggerates tiny differences. The bars are drawn only 1, 2 and 3 units tall, so the bar for 52 looks three times as tall as the bar for 50. A reader glancing at the bar heights might think the third product sells vastly more than the first, when it sells only (52 - 50)/50 = 4% more. Always check whether the y-axis starts at zero before comparing bar heights.
    3. Usually not. The mode is the most frequent value while an outlier is a value unusually far from the rest. In an extreme case (e.g. a dataset where one far value appears many times) a single value could be both - but in typical distributions the mode sits in the middle of the bulk, not at the tail.
    4. Not in the arithmetic sense - you cannot average “red”, “blue”, “green”. You can count frequencies for each category and quote the mode (the most common category), but the mean and median don’t apply to purely categorical data.
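
The arithmetic behind answer 2 can be laid out as a short Python sketch. The variable names are invented for this example; the numbers come from the question.

```python
values = [50, 51, 52]   # bar heights from the question
axis_start = 49         # where the shortened y-axis begins

# Height each bar is actually drawn at, measured up from the axis start.
drawn = [v - axis_start for v in values]

apparent_ratio = drawn[-1] / drawn[0]    # how many times taller the last bar looks
true_ratio = values[-1] / values[0]      # how many times larger it really is
percent_more = (values[-1] - values[0]) * 100 / values[0]

print(drawn, apparent_ratio, true_ratio, percent_more)
```

The gap between the apparent ratio and the true ratio is what makes the graph misleading.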

Tier 4: real-world problems

Problem solving

    1. 27 students. Column graph: bars for each sport with heights 9, 7, 5, 4, 2; y-axis shows frequency, x-axis shows sport.

    2. Line graph (daily values over the week, with days on the x-axis). Total customers served: 42 + 38 + 45 + 50 + 65 + 80 + 60 = 380.

    3. Line graph. Maximum at 2 p.m. (27 °C).

    4. Stem-and-leaf plot:

      Stem | Leaf
        14 | 5 8 9
        15 | 0 0 0 2 3 5 5 6 8
        16 | 0 2
    5. Line graph. It shows the trend (steady growth) over time clearly.
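
The totals in answers 1 and 2 and the peak time in answer 3 can be checked with the Python sketch below. It assumes the nine temperature readings line up in order with the hours from 9 a.m. to 5 p.m.; the list of time labels is written out for this example.

```python
# Question 1: survey counts per sport.
sports = {"AFL": 9, "Soccer": 7, "Basketball": 5, "Cricket": 4, "Other": 2}
students_surveyed = sum(sports.values())

# Question 2: customers per day.
customers = {"Mon": 42, "Tue": 38, "Wed": 45, "Thu": 50,
             "Fri": 65, "Sat": 80, "Sun": 60}
total_customers = sum(customers.values())

# Question 3: hourly temperatures, assumed to match these times in order.
times = ["9 a.m.", "10 a.m.", "11 a.m.", "12 p.m.", "1 p.m.",
         "2 p.m.", "3 p.m.", "4 p.m.", "5 p.m."]
temps = [18, 20, 22, 24, 26, 27, 26, 24, 22]
peak_temp = max(temps)
peak_time = times[temps.index(peak_temp)]

print(students_surveyed, total_customers, peak_temp, peak_time)
```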
