Information bias is a type of error that occurs when key study variables are mismeasured or misclassified. Information bias can affect the results of observational or experimental studies due to systematic differences in how data are obtained from different study groups. Information bias is also called measurement bias or misclassification.
Information bias occurs when the information used in a study is either measured or recorded inaccurately. These measurements can take various forms, such as:
Information bias is one of the most common sources of bias in research. It affects the validity of observational studies as well as clinical experiments and trials. Information bias can occur when:
The study is not double-blind: the researchers know whether a participant is assigned to the control group or the experimental group.
Researchers use different methods to assess outcomes in each group. For example, when studying disease status, they use medical records for one group and self-assessment questionnaires for the other.
The independent variable (e.g., exposure to toxic substances) and/or the dependent variable (e.g., lung cancer risk) are recorded inaccurately. This may be due to errors in recording an individual's medical history, differing definitions of the disease, or varying diagnostic criteria among experts.
Instruments intended for objective measurements (e.g., a scale for weight) are not properly calibrated, results are recorded incorrectly, or values are swapped during data entry or cleaning.
In general, information bias produces results or conclusions that differ systematically from the truth.
Information bias can arise from non-differential misclassification if the experimental and control groups are affected in the same way, or from differential misclassification if it affects one group more than the other. Here, misclassification refers to classifying an individual or attribute in a category other than the one to which it should be assigned.
Non-differential misclassification is caused by measurements that are equally inaccurate in all study groups. This can occur when study participants in both comparison groups have difficulty accurately recalling something that is not objectively verifiable, such as alcohol consumption levels.
Non-differential misclassification tends to make groups appear more similar than they actually are. As a result, researchers tend to underestimate the association between variables (for example, between alcohol consumption and the risk of lung cancer).
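The attenuation described above can be illustrated with a small calculation. The sketch below is not taken from any particular study: the counts, the sensitivity and specificity values, and the helper names `odds_ratio` and `misclassify` are all hypothetical, chosen only to show how equal measurement error in both groups pulls an odds ratio toward 1.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds ratio of a 2x2 exposure-by-outcome table."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected counts recorded as exposed / unexposed under imperfect measurement.

    sensitivity: probability that a truly exposed person is recorded as exposed.
    specificity: probability that a truly unexposed person is recorded as unexposed.
    """
    recorded_exposed = exposed * sensitivity + unexposed * (1 - specificity)
    recorded_unexposed = exposed * (1 - sensitivity) + unexposed * specificity
    return recorded_exposed, recorded_unexposed


# Hypothetical true counts (illustrative only).
true_cases = (200, 300)      # (exposed, unexposed) among cases
true_controls = (100, 400)   # (exposed, unexposed) among controls
true_or = odds_ratio(*true_cases, *true_controls)  # about 2.67

# Non-differential error: the SAME sensitivity and specificity in both groups.
sensitivity, specificity = 0.8, 0.9
observed_cases = misclassify(*true_cases, sensitivity, specificity)
observed_controls = misclassify(*true_controls, sensitivity, specificity)
observed_or = odds_ratio(*observed_cases, *observed_controls)  # roughly 1.94

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

With these assumed numbers the observed odds ratio stays above 1 but is smaller than the true one, so the association looks weaker than it really is.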
Differential misclassification is caused by a difference in measurement accuracy between study groups, such as a case group and a control group. Since participants in the case group already have the attribute of interest, such as a specific health problem, they may recall prior exposure to risk factors more accurately than the healthy control group.
Differential misclassification can lead to either an underestimation or an overestimation of the association between variables.
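To see why the direction of the error is not fixed, consider a minimal sketch in which exposure truly has no effect (true odds ratio of 1) and the two groups differ only in how completely they recall their exposure. The counts, the recall probabilities, and the function name `observed_odds_ratio` are assumptions made for illustration; the sketch also assumes that unexposed people never report exposure.

```python
def observed_odds_ratio(exposed_cases, unexposed_cases,
                        exposed_controls, unexposed_controls,
                        recall_cases, recall_controls):
    """Expected odds ratio when each group recalls true exposure with its own probability.

    Simplifying assumption: unexposed participants never report exposure,
    so the only error is exposed participants forgetting their exposure.
    """
    a = exposed_cases * recall_cases                               # cases recorded as exposed
    b = unexposed_cases + exposed_cases * (1 - recall_cases)       # cases recorded as unexposed
    c = exposed_controls * recall_controls                         # controls recorded as exposed
    d = unexposed_controls + exposed_controls * (1 - recall_controls)
    return (a * d) / (b * c)


# Hypothetical truth: 30% exposed in both groups, so the true odds ratio is 1.
counts = dict(exposed_cases=150, unexposed_cases=350,
              exposed_controls=150, unexposed_controls=350)
true_or = observed_odds_ratio(**counts, recall_cases=1.0, recall_controls=1.0)

# Cases recall exposure better than controls: the association is overestimated.
overestimate = observed_odds_ratio(**counts, recall_cases=0.9, recall_controls=0.6)   # about 1.68

# Controls recall exposure better than cases: the association is underestimated.
underestimate = observed_odds_ratio(**counts, recall_cases=0.6, recall_controls=0.9)  # about 0.59

print(f"true OR = {true_or:.2f}")
print(f"cases recall better:    OR = {overestimate:.2f}")
print(f"controls recall better: OR = {underestimate:.2f}")
```

The same mechanism produces a spurious harmful effect in one scenario and a spurious protective effect in the other, which is why differential misclassification is harder to reason about than the non-differential kind.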
Information bias is a general term describing systematic errors in the way data is collected or measured. There are several types of information bias:
Recall bias occurs when participants in one study group are better able to remember past events or behaviors than those in the other group.
Observer bias occurs when researchers know the hypothesis being studied or which group each participant is assigned to. This information can influence how researchers collect, measure, or interpret data.
Performance bias refers to situations in which researchers or study participants alter their behavior or responses because they are aware of the group allocation, i.e., they know who is in the control group and who is in the treatment group.
Regression to the mean is a phenomenon in which a variable that shows an extreme value on its first measurement (far above or below the mean) tends to be closer to the mean on a second measurement. Regression to the mean can lead researchers to believe that an intervention or treatment is more effective than it actually is.
Recall bias arises because memory is imperfect: people may remember events differently from how they actually happened, and new experiences continually reshape older memories. The main causes of recall bias are as follows:
Social desirability bias
One major cause of recall bias is social desirability bias. Many people adjust how they report their memories to appear more appealing and to meet social expectations. They may overstate or understate what happened, which causes recall bias.
Limitations of the human brain
Another cause of recall bias is the limitations of human memory. The brain can hold a large amount of information, but it cannot retain all of it. New information can interfere with older memories, and old memories can fade over time. Both effects contribute to recall bias.
Selective recall bias
Some people remember or forget certain life events based on their tastes, emotional relevance, and personal biases. This is also a cause of recall bias. Events that have a significant mental or emotional impact are more likely to be remembered than those with a weak impact.
Telescoping effect
The telescoping effect is also a cause of recall bias. Most people do not remember exactly when events occurred. When they report them, they may place distant events more recently than they happened, or push recent events further into the past. This inability to place events correctly in time leads to recall bias.
Mental health issues
Memory can also be affected by conditions such as amnesia, Alzheimer's disease, stress, or depression. These conditions can cause substantial forgetting, and people may confuse old memories with new ones when reporting. This also leads to recall bias.
“Imagine a study aimed at understanding the relationship between pesticide exposure and the development of certain diseases. All study participants are asked to recall their recent exposure to pesticides.
When responses are collected from sick people, they readily recall their recent pesticide exposure because they are motivated to find the cause of their illness. In contrast, when responses are collected from healthy people, they often fail to recall any recent or past pesticide exposure.”
Observer bias usually occurs when a researcher is aware of the objective and assumptions of a study and has expectations about what will happen.
If a researcher is looking for a result that supports the study hypothesis, or has a predetermined idea of what the results should be, they may be tempted, consciously or not, to distort the data so that they fit those predictions.
This bias occurs most often in observational studies or any type of research where measurements are taken and recorded manually. In observational studies, a researcher records behaviors or takes measurements from participants without attempting to influence the outcome of the experiment.
Observational studies are used in a number of different research fields, most notably medicine, psychology, behavioral science, and ethnography.
“You are conducting a study to investigate the effects of a new anti-nausea drug. Group A receives the actual treatment with the new drug, while group B receives a placebo.
Participants do not know which group they belong to, but you, the researcher, do.
Subconsciously, you treat the two groups differently, asking more negative questions of group B and noting that the members of group A seem more energetic and optimistic.”
Performance bias occurs when one group of subjects in an experiment (for example, a control group or a treatment group) receives more attention from researchers than another group. The difference in the level of care leads to systematic differences between the groups, making it difficult, if not impossible, to conclude that a drug or other intervention, rather than the level of care, caused an effect. A related bias is detection bias (also called ascertainment bias), where outcomes are more likely to be found in treatment groups because researchers know which individuals belong to which group.
Performance bias can also refer to the fact that participants may alter their responses or behavior if they know which group they are assigned to. For example, if a weight-loss study is investigating whether a high-protein diet reduces weight, participants who know their group may increase their protein intake on their own. This kind of behavior change is closely related to the Hawthorne effect, in which people behave differently because they know they are being studied.
Performance bias is a major threat to internal validity. Internal validity is the degree to which you can be confident that the outcome of your experiment is due to the independent variable you manipulated and not to other factors.
"If researchers know that an experimental group received an active drug, they may focus their attention on that group. Participants could be screened more frequently and given more diagnostic tests. The experimental group might be more likely to have a positive result, not because they received an active drug, but because they received more focused attention."
Regression to the mean may prove to be problematic, particularly in research studies that measure the effectiveness of an intervention, program or policy.
It can mislead researchers into believing that an intervention caused an observed change, when in fact it was caused by chance. This is particularly evident when researchers focus on measures of people, cases, or organizations at the extremes, such as the least successful, the most educated, or the most unhealthy.
Regression to the mean (RTM) implies that, statistically, the lowest-scoring cases are likely to improve the second time around, while those at their peak are likely to perform worse, even without any intervention. Because it can skew results, you should account for regression to the mean both when designing your research and when analyzing your results.
Otherwise, you run the risk of attributing some results to a particular cause, when in reality they are most likely due to chance. Regression to the mean occurs when a non-random sample is selected from a population and you measure two imperfectly correlated variables, such as two consecutive measurements of blood pressure.
The lower the correlation between the two variables, the greater the effect of RTM.
The further a value lies from the population mean, the more room there is for it to regress toward the mean.
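Both properties can be read off a standard formula. As a sketch, assume the two measurements $X_1$ and $X_2$ share the same population mean $\mu$ and the same standard deviation, have correlation $\rho$, and are linearly related on average (as they are under a bivariate normal model). These symbols are introduced here for illustration and do not appear in the text above.

```latex
\begin{aligned}
\mathbb{E}[X_2 \mid X_1 = x] &= \mu + \rho\,(x - \mu) \\
\text{expected regression toward the mean} &= x - \mathbb{E}[X_2 \mid X_1 = x] = (1 - \rho)\,(x - \mu)
\end{aligned}
```

The expected shift grows as $\rho$ decreases and as $x$ moves further from $\mu$. For a hypothetical blood pressure example with $\mu = 120$ mmHg, a first reading of $x = 160$ mmHg, and $\rho = 0.7$, the expected second reading is $120 + 0.7 \times 40 = 148$ mmHg, a drop of 12 mmHg without any treatment.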
Regression to the mean can be explained by considering, for example, that skill and performance are imperfectly correlated because of the role of luck. This can lead you to find a cause-and-effect relationship where there is none.
“You want to know if a self-paced online course can help middle school students fill in their math gaps. A school in your area agrees to participate in the pilot study.
To find out which students need it most, you administer a math test to a class of 8th graders. You choose the lowest-performing 10% of students and assign them to the online course.
After the course is over, the lowest-performing 10% of students take another test. Their results, on average, show improvement. The principal, satisfied with the result, decides to launch the online course for all 8th grade students who have lower results in mathematics.
At the end of the year, these students' results are not much better than they were the year before. They certainly have not improved to the degree you would expect based on the results of the lowest-performing 10% of students.
The problem here is regression to the mean. Among the students who did poorly on the first test, some did poorly by chance: perhaps they didn't sleep well the night before, were sick, or were stressed. These students were going to do better on the second test regardless of the intervention (the online program), so they pushed up the average score of the lowest-performing 10%.”
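The scenario can be reproduced with a short simulation in which no intervention takes place at all. This is a sketch under stated assumptions: each test score is modeled as a stable skill plus independent luck, and the function name `simulate_retest`, the score scale, and every parameter value are hypothetical.

```python
import random
import statistics


def simulate_retest(n_students=1000, skill_sd=10.0, luck_sd=10.0,
                    bottom_fraction=0.10, seed=0):
    """Return the mean first and second test scores of the bottom group.

    Each score is modeled as stable skill plus independent luck.
    No intervention is applied between the two tests.
    """
    rng = random.Random(seed)
    skill = [rng.gauss(70.0, skill_sd) for _ in range(n_students)]
    test1 = [s + rng.gauss(0.0, luck_sd) for s in skill]
    test2 = [s + rng.gauss(0.0, luck_sd) for s in skill]

    # Select the lowest-performing students based on the FIRST test only.
    group_size = int(n_students * bottom_fraction)
    bottom = sorted(range(n_students), key=lambda i: test1[i])[:group_size]

    mean_first = statistics.mean(test1[i] for i in bottom)
    mean_second = statistics.mean(test2[i] for i in bottom)
    return mean_first, mean_second


mean_first, mean_second = simulate_retest()
print(f"bottom 10% on test 1: {mean_first:.1f}")
print(f"same students on test 2: {mean_second:.1f}")
```

The selected group scores higher on the second test even though nothing was done for them, because selection on the first test favored students who were unlucky that day. A comparison group chosen the same way but not given the course would show how much of the observed improvement is regression to the mean.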
Information bias arises from the approach used to collect or measure the data in your study. There are several steps you can take to minimize information bias when collecting data: