2. Decide which statistics to use
The next step is to think about how you can use different statistics to answer the questions you identified in ‘Analyse your data’.
Percentages: It is often helpful to present numbers as percentages of a total, as this gives readers a sense of scale and proportion; for example, 50% of all service users. However, be wary of using percentages when presenting data from small samples. We recommend avoiding percentages for samples of fewer than 50, and avoiding firm conclusions based on small differences in percentages for samples of 50 to 100. Make sure you use the correct number of respondents as your base. For example, if analysing data from a survey, use the number of people who responded to a specific question rather than the number of people who responded to the survey as a whole.
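As an illustration only, the Python sketch below applies both points: the base is the number of people who answered this particular question, and results for fewer than 50 respondents are reported as counts rather than percentages. The answers, the function name `describe` and the `min_n` parameter are hypothetical.

```python
# Hypothetical answers to one survey question; None means the respondent
# completed the survey but skipped this question.
answers = ["yes", "no", "yes", None, "yes", None, "no", "yes"]


def describe(answers, value, min_n=50):
    """Summarise how many respondents gave `value`.

    The base is the number of people who answered this question,
    not the number of people who took part in the survey.
    """
    answered = [a for a in answers if a is not None]
    n = len(answered)
    if n == 0:
        return "No responses to this question"
    count = sum(1 for a in answered if a == value)
    if n < min_n:
        # Small sample: report a count rather than a percentage.
        return f"{count} of {n} respondents"
    return f"{100 * count / n:.0f}% of {n} respondents"


print(describe(answers, "yes"))
```

Here 8 people took the survey but only 6 answered the question, so the base is 6, and because 6 is below 50 the result is given as a count.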
Measuring change: If you have asked the same questions before and after your intervention, you can subtract the ‘before’ score from the ‘after’ score to find out how much change has occurred. For example, if 50% of participants stated they felt confident before an activity, and this rose to 75% after the activity, you can report an increase of 25 percentage points. Note that this is not the same as a 25% increase: relative to the starting figure of 50%, it is a 50% increase. You can work out the average change for your whole group or for sub-groups, or the percentage of respondents who experienced positive or negative change.
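A minimal Python sketch of these calculations, assuming each respondent's ‘before’ and ‘after’ scores can be matched by a respondent ID. The IDs and scores (a hypothetical confidence rating out of 10) are invented for illustration.

```python
from statistics import mean

# Percentage-point change from the example above: 50% -> 75%.
pp_change = 75 - 50  # 25 percentage points

# Hypothetical paired scores, keyed by respondent ID.
before = {"A": 4, "B": 6, "C": 5, "D": 7}
after = {"A": 6, "B": 6, "C": 8, "D": 5}

# Change for each person = 'after' score minus 'before' score,
# only for people who answered both times.
changes = {pid: after[pid] - before[pid] for pid in before if pid in after}

average_change = mean(changes.values())
n = len(changes)
pct_positive = 100 * sum(1 for c in changes.values() if c > 0) / n
pct_negative = 100 * sum(1 for c in changes.values() if c < 0) / n

print(f"Average change: {average_change:+.2f} points")
print(f"Improved: {pct_positive:.0f}%, worsened: {pct_negative:.0f}%")
```

To calculate the average change for a sub-group, filter `changes` to that group's respondent IDs before taking the mean. With only four respondents, as here, the counts (two improved, one worsened) would be more honest to report than percentages.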
Cross-tabulation is a way of comparing results for different types of respondent. For example, if you want to know whether your intervention is more effective for people who are unemployed or for those who are in employment, you could use cross-tabulation to compare the two groups’ results.
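One way to build a simple cross-tabulation with Python's standard library is sketched below: count respondents in each combination of group and outcome. The records and the outcome labels (‘improved’, ‘no change’) are hypothetical.

```python
from collections import Counter

# Hypothetical records: (employment status, reported outcome) per respondent.
records = [
    ("unemployed", "improved"), ("unemployed", "improved"),
    ("unemployed", "no change"), ("unemployed", "improved"),
    ("employed", "improved"), ("employed", "no change"),
    ("employed", "no change"), ("employed", "no change"),
]

# Each (group, outcome) pair is one cell of the cross-tabulation.
table = Counter(records)
groups = sorted({group for group, _ in records})
outcomes = sorted({outcome for _, outcome in records})

# One row per group, one column per outcome, plus the row total.
print("".ljust(12) + "".join(o.ljust(12) for o in outcomes) + "total")
for group in groups:
    row = [table[(group, o)] for o in outcomes]
    print(group.ljust(12) + "".join(str(c).ljust(12) for c in row) + str(sum(row)))
```

Cells with no respondents show as 0, because a `Counter` returns 0 for missing keys. Groups this small should be compared using counts, in line with the guidance on percentages.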
Averages are used to summarise a dataset with a single number that represents its typical or central value. They can be used to report on the average experience of users; for example, the average score for the class was 7.3 out of 10. There are three main types of average.
- Mean: This is what we normally mean when we say ‘average’. The mean is the total of all values divided by the number of values. For example, if the values are 2, 3, 4 and 5, the mean is the total (14) divided by the number of values (4) = 3.5. The mean is less helpful if your data is skewed (most values are bunched at one end, with a long tail of much higher or much lower values) or if your data has outliers (values far above or below the majority of values). For example, when comparing how long users spend using a service, one unusually long visit will pull the mean up disproportionately.
- Median: This is the middle value when your data is arranged from smallest to largest; for example, if the values are 1, 2, 3, 4 and 5, the median is 3. If there is an even number of values, the median is the mean of the two middle values (for 2, 3, 4 and 5, it is 3.5). The median can be helpful if your data is skewed and/or contains outliers. However, it does not take into account all of the information in the dataset, only the middle value. For example, if the median visit duration is four minutes, this tells you nothing about how long other users spent.
- Mode: This is the value that occurs most frequently in your dataset. There may be more than one mode. The mode is the only type of average that can be used with non-numerical categories that have no natural order. For example, if 40% of your users engage with your service online, 30% by phone and 30% in person, no mean or median can be calculated, but the mode is online, as this is the most common channel.
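The three averages can be checked with Python's built-in `statistics` module, as in the sketch below. The visit durations (in minutes) are hypothetical and include one unusually long visit to show how an outlier affects the mean but barely moves the median. The channel counts mirror the 40% / 30% / 30% example, scaled to 10 users.

```python
from statistics import mean, median, multimode

values = [2, 3, 4, 5]
print(mean(values))              # 14 / 4 = 3.5
print(median([1, 2, 3, 4, 5]))   # 3
print(median(values))            # even count: mean of 3 and 4 = 3.5

# Hypothetical visit durations in minutes; the 60-minute visit is an outlier.
durations = [3, 4, 4, 5, 6, 60]
print(mean(durations))           # about 13.7, pulled up by the outlier
print(median(durations))         # 4.5, barely affected

# Non-numerical categories: only the mode applies.
channels = ["online"] * 4 + ["phone"] * 3 + ["in-person"] * 3
print(multimode(channels))       # ['online']
```

`multimode` (available from Python 3.8) returns every most-common value, which covers datasets with more than one mode.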
Variation: To understand how much variation there is in your dataset, you can use two calculations:
- Range: This is the difference between the largest and smallest value in your dataset.
- Standard deviation: This is, roughly, the typical distance between a value and the mean (strictly, the square root of the average squared distance from the mean). It shows how well the mean represents your dataset: the higher the standard deviation, the more spread out the values are.
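A short Python sketch of both measures of variation, using a hypothetical set of scores. The standard deviation is worked out by hand first to show the definition, and then compared with the `statistics` module's functions.

```python
import math
from statistics import mean, pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores

# Range: largest value minus smallest value.
value_range = max(scores) - min(scores)

# Standard deviation by hand: square each value's distance from the mean,
# average those squares, then take the square root.
m = mean(scores)
manual_sd = math.sqrt(sum((x - m) ** 2 for x in scores) / len(scores))

print(value_range)      # 7
print(m, manual_sd)     # 5 2.0
print(pstdev(scores))   # 2.0, same calculation
print(stdev(scores))    # about 2.14, divides by n - 1 instead of n
```

`pstdev` treats the data as the whole group of interest, while `stdev` divides by one fewer than the number of values and is the usual choice when your respondents are a sample from a larger population. The difference shrinks as the sample grows.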
3. Think critically about your data
Once you have chosen the most useful statistics, examine your data and ask yourself what it is telling you.