Estimating with samples
In the fifth part of our series on medical statistics, Wai-Ching Leung takes us through sampling
The last article showed that an important use of medical statistics is to draw conclusions from observations.1 We cannot be absolutely certain about our conclusions, however, partly because it is impractical or impossible to collect information from all relevant subjects. For example, we may wish to compare how much money undergraduate students spend at one university with how much they spend at another. Although it is theoretically possible to survey all students in both universities, we do not have the time and resources to do so. A common approach is to survey a sample of, say, 500 students from each university and generalise our findings.
Potential errors in estimating from samples
In generalising our results, we assume that the sample of subjects we collect information on (the study sample) is similar to, or representative of, the group of subjects we want to draw conclusions about (the study population). But this might not be true for two reasons. Firstly, our study sample may be distorted by the way we choose our subjects.2 For example, if we wish to find out how much money undergraduate students spend but decide to survey those students who visit luxurious nightclubs, our findings are likely to show a higher level of expenditure. This is because our sample of students is likely to be more extravagant than the university as a whole (the study population).
Secondly, even if we choose our study subjects properly, our study sample may still differ from our study population by chance. This is more likely to occur the fewer students we decide to sample. Taken to the extreme, if we survey only one student from each university, we would not be too surprised if the monthly expenditure of the student differs considerably from the average by chance. However, if we survey and take the mean of, say, 100 students, such differences would be less likely. It would be useful to know how accurate our estimate is.
In this article, we will look at how we should select our study sample to minimise bias. We will also look at ways to measure the accuracy of our estimates assuming we have selected our study sample properly.
Selecting the study sample
Our aim is to select a proportion of subjects so that they are representative of our study population. First of all, we must be clear what our study population is. The next step is to select our sample. As we have seen, some convenient selection methods can sometimes give rise to misleading results. We can avoid bias in selecting our sample if each subject in our study population has an equal chance of being selected.
Simple random sampling--We might obtain a list of all students' registration numbers and select them randomly using computer generated random numbers. This is the ideal method.
Systematic sampling--It would be cumbersome to select 1000 names using computer generated random numbers. Suppose we wish to sample 2% of all students. We could select every 50th student according to the student number. This method is satisfactory because the allocation of student number is unlikely to be related in any way to monthly expenditure.
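The two selection methods above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the register of 50 000 students numbered 1 to 50000 is hypothetical, and the function names are ours rather than part of any survey package.

```python
import random


def simple_random_sample(registration_numbers, sample_size, seed=None):
    """Select sample_size students so that each has an equal chance of selection."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no student is picked twice.
    return rng.sample(registration_numbers, sample_size)


def systematic_sample(registration_numbers, interval, start=0):
    """Take every interval-th student from the ordered register, beginning at index start."""
    ordered = sorted(registration_numbers)
    return ordered[start::interval]


# Hypothetical register: 50 000 students with registration numbers 1 to 50000.
students = list(range(1, 50001))

random_sample = simple_random_sample(students, 1000, seed=1)  # 2% at random
every_50th = systematic_sample(students, 50)                  # 2% systematically

print(len(random_sample), len(every_50th))
```

Both calls select 2% of the register. The systematic version is simpler to carry out by hand, but it relies on the ordering of student numbers being unrelated to expenditure.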
Accuracy of the estimate and sample size
Suppose we survey a number of students on their monthly expenditure and use the mean expenditure of this sample as an estimate of the average monthly expenditure of all students in the university. How accurate is this estimate?
How can we assess the accuracy of our estimate if we do not know what the actual average monthly expenditure of all the students in the university is? It is easier to approach the question in another way. You may remember the experiment in your physics lessons at school when you tried to measure how long an object takes to fall 100 metres. You would consider your results to be accurate if you obtained roughly the same answer each time you repeated the experiment. In the same way, we can ask ourselves another question: if we repeat our survey and estimate many times, how widely will the results vary from one another? If the estimates hardly vary at all, then we can infer that our estimates are pretty accurate. Conversely, if the estimates vary widely, we can say that they are inaccurate. Standard deviation is a good measure of the spread of observations, especially when they are roughly normally distributed.3
Clearly, the accuracy depends on the number of students we survey. If we base our estimates on only a small number of students, these few students might by chance have a significantly higher or lower expenditure than average. Spurious results are less likely if our estimates are based on a large number of students.
Suppose we sample only one student in each survey and repeat the surveys a hundred times. What will be the average estimate and how widely will the results vary from one another? Each estimate will be based only on the expenditure of one student. So, the spread of these estimates is exactly the same as the spread of the expenditure amongst individual students. In other words, the standard deviation of the estimates will be identical to the standard deviation of the monthly expenditure of each student. Figure 1 shows the example of 100 such estimates in a university, each based on the expenditure of a single student. In this example, the mean expenditure is £453 per month and the standard deviation is £68.
Fig 1 Estimates from 100 surveys (one student per survey)
Suppose that, instead of sampling only one student, we base our estimates on a sample of four students in each of the 100 surveys. Each estimate is now the mean expenditure of four students. If we carry out the survey many times, the average of our estimates is simply the average of the means of groups of four students, so we would expect it to be the same as the mean expenditure of all students. We would expect less variation in our estimates than if we sampled one student in each survey because unusually high or low values are likely to be averaged out. In other words, our estimates have become more accurate. In fact, the standard deviation of the estimates is only half what it was when we sampled one student in each survey. Figure 2 shows the distribution of 100 estimates, each the mean of four students, in the same university as in figure 1. The mean is again £453 per month but the standard deviation is now reduced to £34. Comparing figures 1 and 2, you will see that although the mean estimates are identical, the estimates are more accurate if we sample four students (fig 2) rather than one (fig 1) in each survey.
Fig 2 Estimates from 100 surveys (mean of four students per survey)
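The repeated surveys behind figures 1 and 2 can be imitated with a small simulation. The sketch below is not the data used for the figures: it assumes a hypothetical population of 20 000 students whose monthly expenditure is roughly normally distributed with mean £453 and standard deviation £68, and it repeats each survey 10 000 times (rather than 100) so that the comparison is stable.

```python
import random
import statistics


def simulate_surveys(population, students_per_survey, n_surveys, seed=0):
    """Repeat a survey n_surveys times; each estimate is the mean of one random sample."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_surveys):
        sample = rng.sample(population, students_per_survey)
        estimates.append(statistics.mean(sample))
    return estimates


# Hypothetical study population (an assumption made for illustration only).
population_rng = random.Random(42)
population = [population_rng.gauss(453, 68) for _ in range(20000)]

one = simulate_surveys(population, 1, 10000)   # one student per survey
four = simulate_surveys(population, 4, 10000)  # mean of four students per survey

sd_one = statistics.stdev(one)
sd_four = statistics.stdev(four)

print(f"one student per survey:   mean {statistics.mean(one):.0f}, SD {sd_one:.0f}")
print(f"four students per survey: mean {statistics.mean(four):.0f}, SD {sd_four:.0f}")
print(f"ratio of spreads: {sd_four / sd_one:.2f}")
```

The two sets of estimates centre on the same mean, but the spread of the estimates based on four students should come out at about half that of the estimates based on one.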
The spread of our estimates is a good measure of how accurate they are. The smaller the spread, the more accurate our estimates are. One useful measure of this spread is the standard deviation of our estimates. The standard error of the mean uses the standard deviation and shows the degree of uncertainty in calculating an estimate from a sample. As we saw in the example above, it depends on the spread in the study population and the number of subjects we sample (sample size). The formula is SEM=SD/√n where SD is the standard deviation in the study population and n is the sample size.
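As a quick check on the arithmetic, the standard error of the mean can be computed directly. Consistent with the halving of the spread when the sample grew from one student to four, the standard error shrinks with the square root of the sample size. The function name below is ours, and the sample sizes of 16 and 100 are added only for illustration.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of the mean: population standard deviation divided by sqrt(n)."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / math.sqrt(n)


# SD of monthly expenditure among individual students, as in figure 1.
sd = 68

for n in (1, 4, 16, 100):
    print(f"n={n:>3}  SEM=£{standard_error_of_mean(sd, n):.1f}")
```

Note that quadrupling the sample size only halves the standard error, so each further gain in accuracy costs progressively more subjects.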
Confidence intervals for a single estimate
Although standard error of the mean gives an indication of the variability of our estimates, it gives no idea about where the actual mean of the population might lie. It would be useful to have a range of values for which we are, say, 95% certain that the actual population mean falls within. We can use the standard error to calculate this.
Again, it would be easier to frame the question slightly differently: if we repeat our estimates many times, what is the range within which they would fall on 95% of occasions? In our previous example with four students per survey, the mean estimate is £453 per month and the standard error of the mean is £34. You will remember that about 95% of all observations fall within two standard deviations of the mean. The range within which 95% of the estimates would fall is between:
mean-2SEM=£453-(2*£34)=£385 and mean+2SEM=£453+(2*£34)=£521
We sometimes write the results as £453 (95% confidence interval £385 to £521). Although 95% confidence intervals are widely used, it is also possible to report 90% or 99% confidence intervals.
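The calculation can be written as a short function. The sketch below works from the figures quoted for figures 1 and 2 (a standard deviation of £68 among individual students and four students per survey) and uses the rule of thumb of two standard errors on either side of the mean. The multipliers listed for other confidence levels are the usual normal approximations and are our addition; 1.96 is the more exact value that is rounded to 2 here.

```python
import math


def confidence_interval(mean, sem, multiplier=2.0):
    """Return (lower, upper) limits as mean minus/plus multiplier times the SEM."""
    return mean - multiplier * sem, mean + multiplier * sem


# Approximate multipliers from the normal distribution (not given in the article).
NORMAL_MULTIPLIERS = {90: 1.645, 95: 1.96, 99: 2.576}

mean_estimate = 453
sem = 68 / math.sqrt(4)  # SD of £68 among students, four students per survey

lower, upper = confidence_interval(mean_estimate, sem)
print(f"£{mean_estimate} (95% confidence interval £{lower:.0f} to £{upper:.0f})")

for level, z in NORMAL_MULTIPLIERS.items():
    lo, hi = confidence_interval(mean_estimate, sem, z)
    print(f"{level}%: £{lo:.0f} to £{hi:.0f}")
```

A higher confidence level needs a larger multiplier and therefore gives a wider interval for the same data.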
Drawing conclusions from confidence intervals
In the example above, we saw how the confidence interval for a single value estimate gives an idea about the range of values between which the real mean might actually fall with 95% probability. Confidence intervals for estimates calculated from more than one measurement, however, such as the difference or ratio of two measurements, often allow us to draw useful conclusions.
Suppose we have sampled 500 students from each university and found that the mean expenditures of these samples were £470 and £430 at universities A and B respectively. The estimated difference between the universities is £40. We can calculate the 95% confidence interval for the difference between universities A and B, and although you don't need to know how it is done, you should know how to interpret it. Suppose the 95% confidence interval for the difference (A-B) is £40 (£10 to £70). It tells us that if we repeat our survey many times, the difference between the two universities will be between £10 and £70 on 95% of occasions. Not only does it give us an idea about the likely size of the difference, but we can also conclude that it is unlikely that there is no difference between universities A and B, since the lower confidence limit--that is, the minimum likely difference, £10--is positive.
What if the 95% confidence interval for the difference (A-B) was £40 (-£10 to £90)? This also gives us an idea about the likely size of the difference. The wider confidence interval tells us we are less confident about our estimate. Further, it tells us that the likely difference can be positive (up to £90) or negative (down to -£10), so we cannot draw any conclusion about whether the expenditure in university A is higher or lower than in university B.
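The reasoning in these two examples comes down to asking whether the confidence interval for the difference includes zero. A minimal sketch of that rule is shown below; the function name and the wording of the conclusions are ours, and the sketch does not calculate the interval itself.

```python
def interpret_difference_ci(lower, upper):
    """Interpret a 95% confidence interval for the difference A-B."""
    if lower > upper:
        raise ValueError("lower limit must not exceed upper limit")
    if lower > 0:
        # The whole interval is positive: A is likely to be higher than B.
        return "expenditure is likely to be higher in university A"
    if upper < 0:
        # The whole interval is negative: B is likely to be higher than A.
        return "expenditure is likely to be higher in university B"
    # The interval includes zero: no difference remains plausible.
    return "no conclusion can be drawn about which university is higher"


print(interpret_difference_ci(10, 70))   # first example: £40 (£10 to £70)
print(interpret_difference_ci(-10, 90))  # second example: £40 (-£10 to £90)
```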
Key points
- Collecting data from an entire population is often impossible. Generalising the findings in a proportion (or sample) of subjects may be more practical
- Bias can be avoided by ensuring that every subject in the study population has an equal chance of being sampled
- Even with fair sampling, subjects might differ from the study population by chance
- The standard error of the mean (SEM) is a good estimate of the accuracy of estimates. The smaller the SEM, the more accurate our estimate is
- The standard error of the mean is larger when the standard deviation in the study population is larger and when the sample size is smaller
- Confidence intervals can be calculated from the estimated mean and standard error of the mean
- The larger the sample size, the smaller the confidence interval will be and hence the estimate will be more accurate
- Confidence intervals for the difference between two estimates allow comparison of the two estimates
Glossary
study population--total group of subjects which we are interested in investigating
study sample--proportion of the study population that we select to collect information on
sample size--number of subjects in the study sample
standard error of the mean (SEM)--standard deviation (SD) of the sample means obtained if we repeatedly sample subjects and calculate their means; measure of how accurately our sample mean estimates the true mean in the study population; given by SEM=SD/√n
95% confidence interval--range of values in which our estimates will fall on 95% of the occasions if we repeat our study many times; thought of as a range of values for which we are 95% certain that the actual population mean falls within; upper confidence limit is given by mean +2SEM and the lower confidence limit by mean -2SEM
Wai-Ching Leung, locum general practitioner
Email: w.c.leung@doctors.org.uk
studentBMJ 2002;10:397-440 November ISSN 0966-6494
1 Leung WC. Testing hypotheses. studentBMJ 2002;10:367-8. (October.)
2 Leung WC. Conducting a survey. studentBMJ 2001;9:143-4.
3 Leung WC. Summarising information. studentBMJ 2002;10:311-2. (September.)