Making sense of quantitative data
Wai-ching Leung tells you the best way of dealing with your data
I discussed in a previous article how surveys are used to collect information from a large number of people.1 In surveys, closed - that is, forced choice - format questions are most often used, as they allow quantitative analysis to be carried out and the results to be generalised.
Imagine that you aim to investigate the possible effects of part time jobs on course achievement among medical students in your medical school. You have just got 1000 questionnaires back, each with more than 20 responses covering the student's jobs, course assessment results, and personal details. How are you going to make sense of them and present the information to other students? Armed with only paper, a pencil, and a calculator, it will probably take many hours just to work out the average response for each question asked. Just to calculate the correlation coefficient between any two variables would involve literally thousands of arithmetic operations, and errors are almost inevitable. This is why statistical software and a systematic method of analysing such quantitative data are essential.
Organising the information
The range of computer software available might initially seem daunting - from simple spreadsheet packages such as Excel, through simple statistical packages such as Minitab and SPSS, to more complicated statistical packages such as SAS. However, they all share a simple format for organising the information. Furthermore, once data files are stored in this format they can be converted from one package to another. Data are basically organised as in a spreadsheet - each row holds the information relating to one subject, while each column holds the information on one variable.
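As a rough illustration of this row and column layout, here is a minimal Python sketch using only the standard library. The variable names (id, gender, year, has_job, hours_per_week, exam_mark) and the three records are invented for the example and are not taken from any real survey.

```python
import csv
import io

# Hypothetical survey data: one row per student, one column per variable.
RAW = """id,gender,year,has_job,hours_per_week,exam_mark
1,F,1,Y,8,0.64
2,M,1,N,0,0.71
3,F,2,Y,12,0.58
"""

# Each row becomes a dictionary keyed by the column (variable) name.
rows = list(csv.DictReader(io.StringIO(RAW)))

# Reading across a row gives everything known about one subject.
for row in rows:
    print(row["id"], row["gender"], row["exam_mark"])

# Reading down a column gives one variable for all subjects.
exam_marks = [float(row["exam_mark"]) for row in rows]
print(exam_marks)
```

The same layout is what a spreadsheet or statistical package shows on screen, which is why files in this form move easily between them.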
Coding and entering data
Coding data means adopting a simple and consistent way of representing each observation so that it can be entered easily into the software for further analysis. To do this, it is important to understand the different types of scales used to measure variables (box 1). If the variable is on a nominal or ordinal scale, code the data with a single easily recognisable symbol to avoid mistakes during data entry - for example, B for British, F for French, I for Indian, rather than 1 for British, 2 for French, etc. Interval variables should be entered as numbers.
Sometimes, you might have collected continuous data - for example, age - but may wish to analyse the data in ordered categories - for example, age bands. Do not try to convert the data during data entry. To minimise errors, you should always enter the data in the same format as they are collected. Most software - for example, SPSS - allows you to convert interval data into ordered categories easily.
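The idea of entering ages exactly as collected and banding them afterwards can be sketched in a few lines of Python. The band boundaries below are assumptions chosen for illustration; use whatever bands suit your own project.

```python
def age_band(age):
    """Return an ordered age band label for an age in whole years.

    The cut-off points are hypothetical and chosen only for illustration.
    """
    if age < 20:
        return "under 20"
    if age < 25:
        return "20-24"
    if age < 30:
        return "25-29"
    return "30 and over"


# Ages are stored exactly as they were collected...
ages = [18, 19, 22, 24, 25, 31]

# ...and the banded variable is derived afterwards, leaving the raw data intact.
bands = [age_band(a) for a in ages]
print(bands)
```

Because the original ages are kept, you can change the bands later without going back to the questionnaires.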
It is probably easy to enter data directly into a spreadsheet if the number of observations is relatively small. Otherwise, it is often best to set up a form using Access software. The form looks like a questionnaire, and data can be entered as if you were answering the questions for your subjects. There are two advantages in using a form. Firstly, it is much more user friendly for someone who is not entirely computer literate. Secondly, you can preset the range of responses allowable for each variable. For example, you can set the "gender" variable to allow a choice of either "M" or "F". For the variable on examination marks, you can allow only a number between 0 and 1. In this way, the risk of errors during data entry is minimised.
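The same kind of preset range check can be sketched outside Access. The Python function below is a hypothetical illustration, not a description of how an Access form works; the field names gender and exam_mark are assumptions.

```python
def validate_record(record):
    """Return a list of problems found in one questionnaire record.

    The rules mirror the examples in the text: gender must be "M" or "F",
    and the examination mark must be a number between 0 and 1.
    """
    problems = []
    if record.get("gender") not in {"M", "F"}:
        problems.append("gender must be 'M' or 'F'")
    mark = record.get("exam_mark")
    if not isinstance(mark, (int, float)) or not 0 <= mark <= 1:
        problems.append("exam_mark must be a number between 0 and 1")
    return problems


print(validate_record({"gender": "F", "exam_mark": 0.72}))  # no problems found
print(validate_record({"gender": "N", "exam_mark": 72}))    # both fields rejected
```

Rejecting a record at the moment of entry is cheaper than hunting for the same mistake after 1000 questionnaires have been typed in.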
Converting data files from spreadsheets to statistical package
Fortunately, almost any file in spreadsheet format can be converted for analysis by most statistical packages. As the commonest spreadsheet and statistical packages used are probably Excel and SPSS, it is helpful to know how to convert Excel files to SPSS files. Ensure that the labels for each variable appear in the first row of the Excel file. Save the Excel file as a DBF3 file. Then run the SPSS software and open the DBF3 file. Your data will be automatically converted into SPSS format, ready for analysis.
Cleaning up data
No matter how careful you might be, occasional errors inevitably creep in when you enter your data into the spreadsheet. While some types of error hardly matter, with other types even a single error may seriously distort your results. For example, to study the effects of part time jobs on examination performance among medical students, you might decide to enter the marks of the students as a decimal - for example, 0.72 to represent 72%. However, if you enter 72 instead of 0.72 for just one student, your results would be seriously distorted.
As mentioned previously, a form created by Access software preset to accept only data in the correct format or with values in a given range may minimise errors. Another way is to check your data using the statistical software itself before your analysis. For all nominal or ordinal variables, perform a "frequency count" - that is, a count of the number of responses in each category. For example, you might find that in your gender variable, you have a few erroneous entries such as "N" and "D" in addition to your correct entries - that is, "M" (male) and "F" (female).
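A frequency count of this kind is easy to sketch in Python with the standard library. The list of gender codes below is invented, and it deliberately includes the stray "N" and "D" entries from the example above.

```python
from collections import Counter

# Hypothetical gender column containing two data entry errors ("N" and "D").
gender = ["M", "F", "F", "N", "M", "F", "D", "M"]

# Frequency count: number of responses in each category.
counts = Counter(gender)
print(counts)

# Any code outside the allowed set is a likely data entry error.
allowed = {"M", "F"}
unexpected = {code: n for code, n in counts.items() if code not in allowed}

# Row numbers (starting at 1) to check against the returned questionnaires.
bad_rows = [i for i, code in enumerate(gender, start=1) if code not in allowed]
print(unexpected, bad_rows)
```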
Then, perform basic descriptive statistics for all your interval variables. For almost all software, descriptive statistics will automatically include the minimum, maximum, mean, median, and standard deviation. The minimum and maximum may immediately alert you to some data which are impossible - for example, an assessment mark below 0% or over 100%, or an age below 0 or over 100. You can locate the errors and correct them. Other errors may not be so obvious. For example, you may have entered a student with 65% as 15%. To spot such errors, you should use your software to list all "outliers" for each interval variable - that is, values more than, say, two standard deviations away from the mean in either direction. You can then check these outliers against the original documents - for example, returned questionnaires.
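The descriptive statistics and the two standard deviation rule can be sketched as follows. The marks are invented, with one student's 65% deliberately mistyped as 15%; the threshold of two standard deviations follows the "say, two" in the text and is a rule of thumb rather than a fixed standard.

```python
import statistics

# Hypothetical examination marks (%); the tenth entry should have been 65.
marks = [62, 58, 71, 65, 60, 68, 55, 63, 66, 15, 59, 64]

# Basic descriptive statistics for an interval variable.
summary = {
    "minimum": min(marks),
    "maximum": max(marks),
    "mean": statistics.mean(marks),
    "median": statistics.median(marks),
    "sd": statistics.stdev(marks),  # sample standard deviation
}
print(summary)

# List values more than two standard deviations from the mean, with their
# row numbers (starting at 1), so they can be checked against the originals.
limit = 2 * summary["sd"]
outliers = [
    (row, value)
    for row, value in enumerate(marks, start=1)
    if abs(value - summary["mean"]) > limit
]
print(outliers)
```

An outlier is only a candidate for checking. A genuine low mark should stay in the data once the questionnaire confirms it.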
Basic steps in analysing data using software
Now that powerful statistical software packages are widely available in medical schools, it is no longer necessary to perform the calculations for any statistical test by hand. However, it is important that you know which statistical test to use for any given problem. The important factors determining which tests to use are:
- The types of data (box 1)
- The hypotheses you wish to test
- The assumptions underlying each statistical test
Box 1: Types of data
- Nominal (categorical) scale
Observations are grouped in categories not in ranked order.
For example, Eye colour - blue, black, brown, etc
Nationality - British, French, Indian, etc
Gender - male, female
Blood group - O, A, B, AB
Binary (dichotomous) scale - a nominal scale with only two categories.
For example, gender, yes/no response
- Ranked (ordinal) scale
Observations are grouped in ranked categories. However, differences between any two categories are not necessarily the same or measurable.
For example, Social classes - I to V
Examination grades - A, B, C, etc
Likert scale - strongly agree, agree, neutral, disagree, strongly disagree
Council tax banding
Using the software in practice
The layout and the interfaces differ among software packages, but it does not take long to become familiar with them, and the basic procedures for performing the tests are broadly similar.
- Select the data you wish to analyse. You may wish to analyse all or only a selection of your subjects
- Select the test you wish to perform. Choose the appropriate test using the menu bar
- Input the variables you want to study. Use the dialogue boxes to choose the variables you wish to analyse once you have selected the test
Analysing the data
Examine structure of data
Firstly, it usually makes sense to look at your data for each independent variable. Use the software to display the frequencies for each nominal or ordinal variable. This will tell you, for example, the number of male and female students you have surveyed, the number of students surveyed in each year of study, and the number of students who do and do not have a part time job. Use the software to produce the summary statistics - that is, minimum, maximum, mean, median, standard deviation - for each interval variable. This will tell you, for example, the mean and spread of the ages of the students. You will need these results to describe the background of your subjects in your report.
Next, do the same for the dependent variables. It might tell you, for example, how many students pass or fail their end of year examinations, or the mean and spread of the examination marks of the students.
Explore the relationship between variables
Next, explore your data by examining the relationship between variables. Before deciding how to examine the association, first decide:
- What types of variables they are
- If they are interval variables, whether they are likely to be normally distributed
Box 2 shows the relevant methods to explore associations between variables.

Box 2: Commonly used tests for association
Cross tabulation often shows simple and interesting results that you have not anticipated. For example, it might show you that a higher proportion of women students have a part time job than men, or that a higher proportion of first year students have a part time job. See box 3 for a hypothetical example.
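A cross tabulation of this kind can be sketched in Python as a count of each combination of two categorical variables. The eight records below are invented for illustration and are not the figures from box 3.

```python
from collections import Counter

# Hypothetical (gender, has part time job) pairs, one per student.
students = [
    ("F", "Y"), ("F", "Y"), ("F", "N"), ("M", "Y"),
    ("M", "N"), ("M", "N"), ("F", "Y"), ("M", "N"),
]

# Count how many students fall into each cell of the table.
table = Counter(students)

# Print the table and the proportion with a job within each gender.
proportion_with_job = {}
print("gender  job  no job  proportion with job")
for g in ("F", "M"):
    with_job = table[(g, "Y")]
    without_job = table[(g, "N")]
    proportion_with_job[g] = with_job / (with_job + without_job)
    print(f"{g:6}  {with_job:3}  {without_job:6}  {proportion_with_job[g]:.2f}")
```

Reporting proportions within each row, and not just raw counts, is what makes groups of different sizes comparable.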

Box 3: Cross tabulation of gender and part time jobs
It may also be interesting to explore the relationship between two interval variables. For example, you may wish to explore the relationship between the number of hours a student works per week during term time and the examination marks. A significantly negative correlation coefficient might suggest the harmful effects of part time work on the course results.
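To show what the software computes here, the following sketch calculates Pearson's correlation coefficient from its definition. The hours and marks are invented, and the sketch assumes both variables are on an interval scale; it does not test whether the coefficient is statistically significant.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours worked per week in term time and examination mark (%).
hours = [0, 4, 8, 12, 16, 20]
marks = [72, 70, 65, 66, 58, 55]

r = pearson_r(hours, marks)
print(f"r = {r:.2f}")  # a negative value: more hours go with lower marks
```

Even a strongly negative coefficient shows only an association. Students who work long hours may differ from other students in ways the survey did not measure.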
Comparing groups
Another type of analysis compares two or more groups. If the dependent outcome is binary - for example, whether a student passes or fails an examination - you need to compare the proportion of students who fail in each group, and the chi-squared test is appropriate.
If the dependent variable is on an interval scale - for example, the examination marks - you need to compare the differences between groups. Box 4 shows the appropriate tests depending on your sample size and whether the observations are normally distributed.
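For a two by two table, the chi-squared calculation that the software performs can be sketched with the standard library alone. The counts are invented. The sketch uses the Pearson chi-squared statistic without a continuity correction, which is an assumption (some packages also report a corrected value), and relies on the fact that with one degree of freedom the p value equals erfc(sqrt(statistic / 2)).

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table

        [[a, b],
         [c, d]]

    Returns the statistic and its p value on one degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    # Expected count in each cell if the two variables were independent.
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(chi-squared > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts: rows are job / no job, columns are fail / pass.
stat, p_value = chi_squared_2x2(30, 170, 40, 360)
print(f"chi-squared = {stat:.2f}, p = {p_value:.3f}")
```

Here 15% of students with a job fail compared with 10% of those without, but with these invented counts the difference does not reach the conventional 5% significance level.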
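As one illustration of comparing an interval variable between two groups, the sketch below computes Welch's t statistic and its approximate degrees of freedom. The choice of this test is an assumption made for the example, suitable only when the marks in both groups are roughly normally distributed, and the marks are invented. A statistical package would also convert the statistic into a p value.

```python
import math
import statistics


def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    # Sample variance of each group divided by its size.
    va = statistics.variance(group_a) / na
    vb = statistics.variance(group_b) / nb
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical examination marks (%) for students with and without a job.
job = [58, 62, 55, 60, 64, 57]
no_job = [66, 70, 63, 68, 72, 65]

t, df = welch_t(job, no_job)
print(f"t = {t:.2f} on about {df:.0f} degrees of freedom")
```

With small samples or clearly skewed marks, a test that does not assume a normal distribution would be the safer choice.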

Box 4: Comparing two or more groups
a) Comparing proportions
Chi-squared test if the dependent variable is binary (for example, comparing the proportions of male and female students who pass their examinations the first time)
b) Comparing differences
Reporting and presenting your findings
Firstly, present your objectives and how you collected your data. How you report and present your findings depends somewhat on the precise purposes of your project, but the following general principles apply:
Response rate
If your project is a survey, state the proportion of subjects you invited who actually participated in your study - that is, the response rate. It might be possible to give some indication of whether those who did not participate were different from those who did - for example, you might be able to compare the age or gender profile of the population you wish to survey with those who actually participated in your survey.
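The arithmetic is simple, as this short sketch shows. The number invited and the gender proportions are invented for illustration; only the 1000 returned questionnaires echo the earlier example.

```python
invited = 1500   # hypothetical number of questionnaires sent out
returned = 1000  # questionnaires returned

# Response rate: the proportion of those invited who took part.
response_rate = returned / invited
print(f"Response rate: {response_rate:.1%}")

# Hypothetical gender profile of the whole student body and of respondents.
population = {"F": 0.55, "M": 0.45}
respondents = {"F": 0.62, "M": 0.38}

# A large gap suggests that respondents may not represent the population.
differences = {g: respondents[g] - population[g] for g in population}
for g, diff in differences.items():
    print(f"{g}: respondents differ from population by {diff:+.0%}")
```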
Basic characteristics of subjects
If you are studying the effects of part time jobs on examination results, these characteristics might include gender, age, year of study, and the time spent in part time jobs per week. Make use of charts - for example, pie charts, bar charts - to present your results.
Overview of the outcome variables
If you are studying the effects of part time jobs on examination results, give an overview of the examination marks.
Testing of hypotheses
Next, present the results of your hypothesis testing - for example, comparisons between groups and associations between variables.
Conclusions
Finally, draw your conclusions.
Conclusions
A systematic method will make analysis of quantitative data easy. It is essential to have a consistent way of coding your data, and to clean up your data before you analyse them. Then, examine the structure of your data before testing your hypotheses. With modern statistical software, knowing when to use a particular statistical test is much more important than knowing how to carry it out by hand.
Further reading
- Bryman A. Quantitative data analysis with SPSS release 10 for Windows: a guide for social scientists. London: Routledge, 2001.
- Pallant J. SPSS survival manual: a step-by-step guide to data analysis using SPSS for Windows (version 10). Buckingham: Open University Press, 2001.
- Kinnear PR, Gray CD. SPSS for Windows made simple. Hove: Psychology Press, 1997.
Wai-Ching Leung, lecturer in public health medicine, University of East Anglia
Email: w-c.leung@uea.ac.uk
studentBMJ 2001;09:217-260 July ISSN 0966-6494
- Leung WC. How to conduct a survey. studentBMJ 2001;9:143-5. (May.)