Sampling bias

Sampling bias occurs when some members of a population are systematically more likely to be selected for a sample than others. In medical fields, it is also called ascertainment bias. Sampling bias limits the generalizability of results because it threatens external validity, specifically population validity. In other words, findings from a biased sample can only be generalized to populations that share the sample's characteristics.

Causes

Your choice of research design or data collection method can lead to sampling bias. This type of research bias can occur in both probability and non-probability sampling.

In probability sampling, each member of the population has a known chance of being selected. For example, you can use a random number generator to select a simple random sample from your population.

Although this procedure reduces the risk of sampling bias, it cannot eliminate it. If your sampling frame – the actual list of individuals from which the sample is drawn – does not match the population, this can result in a biased sample.

You want to study the levels of procrastination and social anxiety among undergraduate students at your university using a simple random sample. You assign a number to each student in the research participant database, from 1 to 1,500, and use a random number generator to select 120 numbers.

Even though you used a random sample, not all members of your target population (the undergraduate students at your university) had a chance of being selected. Your sample excludes everyone who did not sign up to be contacted about research participation. This may skew your sample toward students who experience less social anxiety and are more willing to take part in research.
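The procedure in this example can be sketched in Python as follows. This is only a minimal illustration: the total of 6,000 undergraduates is a hypothetical figure (the example does not state it), and the function name is invented for this sketch. It shows that random selection is fair within the sampling frame, while students outside the frame can never be selected.

```python
import random

# The 1,500 database entries and the sample of 120 come from the example.
# The total of 6,000 undergraduates is an assumption for illustration only.
POPULATION_SIZE = 6000   # all undergraduates at the university (assumed)
FRAME_SIZE = 1500        # students registered in the participant database
SAMPLE_SIZE = 120

def draw_simple_random_sample(frame_ids, n, seed=None):
    """Draw n distinct IDs from the sampling frame, each with equal probability."""
    rng = random.Random(seed)
    return rng.sample(list(frame_ids), n)

frame = range(1, FRAME_SIZE + 1)
sample = draw_simple_random_sample(frame, SAMPLE_SIZE, seed=42)

# Within the frame, every student has the same chance of selection.
inclusion_prob_in_frame = SAMPLE_SIZE / FRAME_SIZE

# Share of the target population that the frame covers at all.
# Students outside the frame have an inclusion probability of zero.
frame_coverage = FRAME_SIZE / POPULATION_SIZE

print(f"Inclusion probability inside the frame: {inclusion_prob_in_frame:.2f}")
print(f"Share of target population covered by the frame: {frame_coverage:.2f}")
```

Under these assumed numbers, three quarters of the target population could not have been selected, no matter how the random numbers fell.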

A non-probability sample is selected based on non-random criteria. For example, in a convenience sample, participants are selected based on their accessibility and availability.

Non-probability sampling often results in biased samples, as some members of the population are more likely to be included than others.

You want to study the popularity of plant-based foods among undergraduate students at your university. For convenience, you send a survey to everyone enrolled in introductory psychology courses at your university. They all complete it in exchange for course credits.

Because this is a convenience sample, it may not be representative of your target population. Students taking these courses may, for example, be more liberal and more attracted to plant-based foods than other students at your university.

Types of sampling bias

Self-selection bias

People with specific characteristics are more likely than others to agree to participate in a study.
People who are more thrill-seeking are more likely to take part in pain research studies, which can skew the data.

Non-response bias

People who refuse to participate or drop out of a study differ systematically from those who participate.
In a study on stress and workload, employees with a high workload were less likely to participate. The resulting sample may therefore show little variation in workload.
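A small calculation can make this mechanism concrete. The sketch below uses invented figures (the workload shares and response rates are assumptions, not data from any study) to show how unequal response rates shift the composition of the sample away from that of the population.

```python
def respondent_shares(population_shares, response_rates):
    """Return each group's expected share among the respondents.

    population_shares: group -> share of the population (should sum to 1)
    response_rates:    group -> probability that a member of the group responds
    """
    responding = {
        group: population_shares[group] * response_rates[group]
        for group in population_shares
    }
    total = sum(responding.values())
    return {group: value / total for group, value in responding.items()}

# Hypothetical values for illustration only.
population_shares = {"high_workload": 0.40, "low_workload": 0.60}
response_rates = {"high_workload": 0.20, "low_workload": 0.50}

shares = respondent_shares(population_shares, response_rates)
for group, share in shares.items():
    print(
        f"{group}: {population_shares[group]:.0%} of population, "
        f"{share:.1%} of respondents"
    )
```

With these assumed rates, employees with a high workload make up 40% of the workforce but only about 21% of respondents, so the sample understates how common high workloads are.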

Undercoverage bias

Some members of a population are insufficiently represented in the sample.
Administering general national surveys online risks overlooking groups with limited Internet access, such as older adults and low-income households.

Survivorship bias

Successful observations, people and objects are more likely to be represented in the sample than unsuccessful observations.
In scientific journals, there is a strong publication bias towards positive results. Successful research results are published much more often than unsuccessful results.

Pre-selection or advertising bias

The way participants are pre-selected or where a study is advertised can bias a sample.
When looking for volunteers to test a new sleep intervention, you may end up with a sample that is more motivated to improve their sleep habits than the rest of the population. As a result, they could have improved their sleep habits regardless of the effects of your intervention.

Healthy user bias

Volunteers for preventive interventions are more likely to engage in health-promoting behaviors and activities than other members of the population.
Participants in a preventive intervention may eat a better diet, be more physically active, and be more likely to abstain from alcohol and smoking than most of the population. The experimental results may then stem from the interaction of the treatment with these characteristics of the sample, rather than from the treatment itself.

How to avoid sampling bias

Using careful research design and sampling procedures can help you avoid sampling bias.

Define a target population and a sampling frame (the list of individuals from which the sample will be drawn). Match the sampling frame to the target population as much as possible to reduce the risk of sampling bias.

Make online surveys as short and accessible as possible.

Follow up on non-respondents.

Avoid convenience sampling.

Oversampling can be used to avoid sampling bias in situations where members of defined groups are underrepresented (undercoverage). It is a method of selecting respondents from certain groups so that they make up a larger proportion of the sample than they do of the population.

Once all the data has been collected, the responses of the oversampled groups are weighted down to their actual share of the population, so that the deliberate overrepresentation does not bias estimates for the population as a whole.

Example of correction

A researcher wishes to study the political opinions of different ethnic groups in the United States and focus in depth on Asian Americans, who represent only 5.6% of the US population. The researcher wants to study each ethnic group separately, but also gather enough data on Asian Americans to be able to draw accurate conclusions.

They assemble a nationally representative sample of 1,500 respondents, which oversamples Asian Americans. Random sampling is used to contact American households, and disproportionately larger samples are drawn from areas with higher Asian American populations. Of the 1,500 respondents, 336 are Asian Americans. Based on this sample size, the researcher can be confident in their findings about Asian Americans.

A weighting system is then applied so that responses from Asian Americans account for 5.6% of the weighted total. This allows for accurate estimates for the population as a whole.
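The weighting step can be illustrated with the figures from this example. The text does not specify which weighting system the researcher uses, so the sketch below assumes the simplest proportional scheme: each respondent's weight is their group's population share divided by the group's share of the sample. Lumping all other respondents into a single "other" group is a simplification made here, and the function and variable names are hypothetical.

```python
def proportional_weights(sample_counts, population_shares):
    """Per-respondent weight for each group: population share / sample share."""
    n = sum(sample_counts.values())
    return {
        group: population_shares[group] / (sample_counts[group] / n)
        for group in sample_counts
    }

# Figures from the example: 336 of 1,500 respondents are Asian American,
# and Asian Americans represent 5.6% of the population.
# Treating everyone else as one "other" group is an assumption of this sketch.
sample_counts = {"asian_american": 336, "other": 1500 - 336}
population_shares = {"asian_american": 0.056, "other": 1 - 0.056}

weights = proportional_weights(sample_counts, population_shares)

# After weighting, each group's weighted share equals its population share.
total_weight = sum(weights[g] * sample_counts[g] for g in sample_counts)
weighted_shares = {
    g: weights[g] * sample_counts[g] / total_weight for g in sample_counts
}

for group in sample_counts:
    print(
        f"{group}: weight {weights[group]:.3f}, "
        f"weighted share {weighted_shares[group]:.1%}"
    )
```

Under this scheme, each Asian American respondent receives a weight of 0.25, because the group makes up 22.4% of the sample but 5.6% of the population. The unweighted 336 responses remain available for the in-depth analysis of that group on its own.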