What you will learn
- distinguish categorical from numerical data, and discrete from continuous data,
- build a frequency table,
- construct and read column graphs, dot plots, stem-and-leaf plots, and line graphs,
- choose the best display for a given data set.
1. Types of data
Data types
- Categorical: data that sorts into groups or categories. Examples: eye colour, favourite subject, yes/no.
- Discrete numerical: counting data that takes separate values, usually whole numbers. Examples: number of siblings, number of goals.
- Continuous numerical: measuring data that can take any value in a range. Examples: height, time, weight.
2. Frequency tables
A frequency is the count of how many times a value or category appears.
Twenty students were asked their favourite colour. Build a frequency table from this list:
Red, Blue, Red, Green, Red, Blue, Yellow, Red, Blue, Green, Red, Red, Blue, Yellow, Green, Red, Blue, Red, Green, Blue.
| Colour | Frequency |
|---|---|
| Red | 8 |
| Blue | 6 |
| Green | 4 |
| Yellow | 2 |
| Other | 0 |
Check: 8 + 6 + 4 + 2 + 0 = 20, which matches the twenty students surveyed.
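As a column graph, each colour gets one bar whose height equals its frequency: Red 8, Blue 6, Green 4, Yellow 2, Other 0.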
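The tally can also be checked with a short Python sketch. It counts the twenty responses listed above and prints a rough text version of the graph, turned on its side so each row is one bar. The function names `frequency_table` and `text_column_graph` are made up for this illustration.

```python
from collections import Counter

# The twenty survey responses from the worked example.
responses = (
    "Red, Blue, Red, Green, Red, Blue, Yellow, Red, Blue, Green, "
    "Red, Red, Blue, Yellow, Green, Red, Blue, Red, Green, Blue"
).split(", ")


def frequency_table(values, categories):
    """Count how often each category appears; unseen categories get 0."""
    counts = Counter(values)
    return {category: counts[category] for category in categories}


def text_column_graph(table):
    """Return one text row per category; each '#' stands for one response."""
    width = max(len(name) for name in table)
    return [
        f"{name:<{width}} | {'#' * count} {count}"
        for name, count in table.items()
    ]


table = frequency_table(responses, ["Red", "Blue", "Green", "Yellow", "Other"])
for row in text_column_graph(table):
    print(row)
print("Total:", sum(table.values()))
```

The final line repeats the check from the worked example: the frequencies should add up to the number of students surveyed.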
3. Types of graph
Choosing a display
- Column graph: use for categorical data or discrete numerical data. Bars have gaps between them.
- Dot plot: use for small discrete numerical data sets. Each dot represents one data value, stacked above its number on the axis.
- Stem-and-leaf plot: use for numerical data when you want to keep individual values. The stem is the leading digit(s); the leaf is the final digit.
- Line graph: use when data is continuous over time, e.g. temperature during a day.
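The rules of thumb above can be written as a small decision function. This is only a sketch of the four rules in this lesson, not a complete guide to choosing graphs; the function name `suggest_display` and its flag names are invented for the illustration.

```python
def suggest_display(data_type, over_time=False, keep_values=False, small_set=False):
    """Suggest a display using only the four rules of thumb from this lesson.

    data_type is "categorical", "discrete" or "continuous".
    Returns None when the lesson's rules do not cover the situation.
    """
    if data_type not in ("categorical", "discrete", "continuous"):
        raise ValueError(f"unknown data type: {data_type!r}")
    if data_type == "categorical":
        return "column graph"
    if data_type == "continuous" and over_time:
        return "line graph"
    if data_type == "discrete" and small_set:
        return "dot plot"
    if keep_values:
        return "stem-and-leaf plot"
    if data_type == "discrete":
        return "column graph"
    return None  # not covered by the rules listed in this lesson


print(suggest_display("categorical"))                  # eye colour
print(suggest_display("continuous", over_time=True))   # temperature during a day
print(suggest_display("discrete", small_set=True))     # number of siblings
print(suggest_display("continuous", keep_values=True)) # heights of a class
```

Real data sets often fit more than one display, so treat the result as a starting point rather than the only correct answer.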
Build a stem-and-leaf plot for a data set made up of these twelve values: 35, 37, 38, 41, 42, 45, 46, 48, 49, 50, 51, 52.
Sort mentally and split each value into “tens” and “units”:
Stem | Leaf
3 | 5 7 8
4 | 1 2 5 6 8 9
5 | 0 1 2
Read: each leaf joins its stem to make one value, so 3 | 5 is 35, 4 | 1 is 41, and so on.
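The same split into tens and units can be sketched in Python. The twelve values below are read back from the plot above; the shuffled order is an assumption made so the sorting step has something to do. The function names are invented for this illustration, and the sketch only handles non-negative whole numbers.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Group whole numbers by stem (all digits except the last)."""
    rows = defaultdict(list)
    for value in sorted(values):          # sorting first keeps leaves in order
        stem, leaf = divmod(value, 10)    # e.g. 48 -> stem 4, leaf 8
        rows[stem].append(leaf)
    return dict(rows)


def format_plot(rows):
    """Return the plot as text lines in the layout used in this lesson."""
    lines = ["Stem | Leaf"]
    for stem in sorted(rows):
        leaves = " ".join(str(leaf) for leaf in rows[stem])
        lines.append(f"{stem} | {leaves}")
    return lines


# Values taken from the plot above, in an assumed unsorted order.
data = [48, 35, 52, 41, 37, 50, 45, 38, 49, 42, 51, 46]
for line in format_plot(stem_and_leaf(data)):
    print(line)
```

One simplification to note: a stem with no values is left out by this sketch, whereas a hand-drawn plot would normally still show the empty stem.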
4. Interpreting graphs
When reading any graph, ask:
- What is the variable on each axis?
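- Which value is the largest, and which is the smallest? (On a column graph or dot plot, the tallest bar or stack marks the mode; on a line graph, the highest point marks the maximum.)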
- Is there a pattern or trend?
- Are there any unusual values (outliers)?
The line graph above shows continuous data over time: temperature in °C against time of day. You can read the peak (the highest temperature and the p.m. time at which it occurs), the trend (rises then falls), and the symmetry (roughly even climb and descent).
Practice
Tier 1: basic skills
- Classify as categorical, discrete numerical, or continuous numerical: eye colour.
- Classify: number of pets owned.
- Classify: weight of a parcel.
- Classify: gender identity.
- Classify: temperature at noon.
- Classify: shoe size (UK sizing: 5, 5.5, 6, …).
- Build a frequency table from: A, B, A, C, B, A, A, C, B, A.
- A frequency table shows in three categories. What is the total sample size?
- Which graph is best for categorical data: line graph, column graph, or stem-and-leaf?
- Which graph keeps individual values visible: dot plot or column graph?
- Read from the stem-and-leaf plot: . Write the three values.
- In a dot plot, four dots stack above one number on the axis. What does this mean?
- A column graph has heights . What is the sum of frequencies?
- A bar graph’s vertical axis starts at a value above zero instead of at 0. Why might this be misleading?
Tier 2: mixed practice
Use this data set for questions 1-5: shoe sizes of students: .
- Build a frequency table.
- What is the modal shoe size (the most common)?
- Describe the distribution (symmetrical, skewed, or otherwise).
- If you were to draw a dot plot, how many dots would stack above ?
- What type of graph would you not use for this data, and why?
The following stem-and-leaf plot shows exam marks for a class. Use it for questions 6-9:
Stem | Leaf
4 | 2 5 8
5 | 0 3 3 7 9
6 | 1 1 4 8
7 | 0 2 5
- How many students are in the class?
- What is the lowest score? The highest score?
- What mark was scored by the most students?
- What is the range of the scores? (max − min)
Tier 3: explain and spot the mistake
- Ben plots temperatures taken every hour from a.m. to p.m. as a column graph with gaps between bars. Is the column graph the best choice here? Explain.
- A graph shows sales for three products whose bar heights are close together, and the y-axis starts well above zero. Explain why this graph could mislead a reader.
- Can a single data point be both an outlier and the mode? Explain.
- A friend says “categorical data can be averaged”. Is this correct? Give an example that supports your view.
Tier 4: real-world problems
- A class survey of favourite sports gave: AFL , Soccer , Basketball , Cricket , Other . How many students were surveyed? Draw (describe) a column graph for this data.
- In one week a shop recorded daily customer numbers: Mon , Tue , Wed , Thu , Fri , Sat , Sun . Which graph type would you use? What total was served?
- The temperatures in a city ( degC) every hour from a.m. to p.m. were: . Which display is best? At what time was the maximum reached?
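- A class measured the heights (cm) of 14 students. In ascending order they are: 145, 148, 149, 150, 150, 150, 152, 153, 155, 155, 156, 158, 160, 162. Construct a stem-and-leaf plot.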
- A town’s population over decades was , , , , . Which graph shows the trend best, and why?
Answer key
Attempt the practice first, then check your answers against the key below.
Tier 1: basic skills
- Categorical
- Discrete numerical
- Continuous numerical
- Categorical
- Continuous numerical
- Discrete numerical (values come in fixed jumps)
- A: 5, B: 3, C: 2. Total 10.
- Column graph
- Dot plot
- Four data values equal to the number under the stack appeared in the sample.
- It stretches small differences so bars look very different when they are actually close.
Tier 2: mixed practice
- Size : , Size : , Size : , Size : , Size : .
- (appears most often).
- Roughly symmetrical around .
- dots.
- A line graph would be inappropriate: shoe sizes are discrete, not a continuous change over time.
Questions 6-9 from the stem-and-leaf plot:
- 15 students.
- Lowest 42; highest 75.
- 53 (two students scored 53) and 61 (two students scored 61) - both are modes; the data is bimodal.
- Range 75 − 42 = 33.
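The answers read from the exam-marks plot can be double-checked with a short Python sketch. The dictionary `plot` is just one convenient way to type in the stems and leaves from the Tier 2 practice set; nothing here is required for the exercise itself.

```python
from collections import Counter

# Stems and leaves copied from the exam-marks plot in Tier 2.
plot = {4: [2, 5, 8], 5: [0, 3, 3, 7, 9], 6: [1, 1, 4, 8], 7: [0, 2, 5]}

# Rebuild each mark: stem gives the tens, leaf gives the units.
marks = [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]

# A data set can have more than one mode, so collect every most-common mark.
counts = Counter(marks)
highest_count = max(counts.values())
modes = sorted(mark for mark, count in counts.items() if count == highest_count)

print("Students:", len(marks))
print("Lowest:", min(marks), "Highest:", max(marks))
print("Modes:", modes)
print("Range:", max(marks) - min(marks))
```

Collecting all the most common marks, rather than stopping at the first one, is what reveals that this data set has two modes.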
Tier 3: explain and spot the mistake
- A line graph would be better. Temperature varies continuously with time, so joining the hourly readings with a line shows the trend clearly. Columns with gaps suggest separate, independent categories rather than a single continuous variable.
- Starting the y-axis above zero exaggerates tiny differences: the gap between two bars looks several times larger than it should. A reader glancing at the bar heights might think one product sells vastly more than another, when the real difference is small. Always check whether the y-axis starts at zero before comparing bar heights.
- Usually not. The mode is the most frequent value while an outlier is a value unusually far from the rest. In an extreme case (e.g. a dataset where one far value appears many times) a single value could be both - but in typical distributions the mode sits in the middle of the bulk, not at the tail.
- Not in the arithmetic sense - you cannot average “red”, “blue”, “green”. You can count frequencies for each category and quote the mode (the most common category), but the mean and median don’t apply to purely categorical data.
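The truncated-axis effect described in the second answer above can be made concrete with a little arithmetic. The sales figures and the axis starting value below are hypothetical numbers chosen for illustration, not the ones from the question, and `drawn_height_ratio` is a made-up helper name.

```python
def drawn_height_ratio(smaller, larger, axis_start=0):
    """Compare how tall two bars are drawn when the y-axis begins at axis_start."""
    if axis_start >= smaller:
        raise ValueError("axis_start must be below the smaller value")
    return (larger - axis_start) / (smaller - axis_start)


# Hypothetical sales for two products.
product_a, product_b = 100, 104

honest = drawn_height_ratio(product_a, product_b)                    # axis starts at 0
truncated = drawn_height_ratio(product_a, product_b, axis_start=96)  # axis starts at 96

print("Axis from 0: bar B is", honest, "times as tall as bar A")
print("Axis from 96: bar B is", truncated, "times as tall as bar A")
```

With the axis starting at 96, the bars are drawn 4 and 8 units tall, so product B looks twice as big even though its sales are only 4% higher.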
Tier 4: real-world problems
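- The number of students surveyed is the sum of the five frequencies. Column graph: one bar per sport, with each bar's height equal to that sport's frequency; the y-axis shows frequency and the x-axis shows sport.
- Line graph (daily values over the week, with days on the x-axis). The total served is the sum of the seven daily counts.
- Line graph. The maximum is the highest point on the graph, reached at a p.m. reading.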
- Stem-and-leaf plot:
Stem | Leaf
14 | 5 8 9
15 | 0 0 0 2 3 5 5 6 8
16 | 0 2
- Line graph. It shows the trend (steady growth) over time clearly.