Monday, February 8, 2010

A Short Description of Multiple Regression

The context for evidence-based practice revolves around the diagnosis and management of a patient. When a physician is confronted with a patient for whom he or she does not know how best to proceed, that physician can search the literature for guidance and direction. One of the challenges of living in an evidence-based world is being able to then interpret the literature that is uncovered. And because most of us involved in patient care trained as clinical practitioners, we do not generally have statistical expertise. We come across terms that confuse us: confidence intervals, p-values, t tests, Pearson’s r, Cronbach’s alpha, linear and multiple regression, analysis of variance (ANOVA), etc. And we simply gloss over those terms, looking to see if our answer is somehow embedded in that paper, but not understanding that answer when it appears. To help, I wish to discuss one form of analysis we often come across: multiple regression.

Multiple regression is nothing more than a statistical method for studying the relationship between several independent (or predictor) variables and a single dependent (or criterion) variable. It is a technique widely used in the social sciences and increasingly common in biological and clinical research. It uses linear equations with more than one predictor, of the form y = a + b1x1 + b2x2, where y is the dependent variable, a is a constant (the intercept), b1 and b2 are regression coefficients, and x1 and x2 are independent variables.
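For readers who want to see this concretely, here is a minimal sketch in Python (my own illustration, not from the article, with made-up numbers) of fitting an equation of that form by ordinary least squares:

import numpy as np
# Two independent (predictor) variables for five hypothetical patients
x1 = np.array([23, 31, 45, 52, 60], dtype=float)   # e.g., age in years
x2 = np.array([2, 4, 3, 6, 5], dtype=float)        # e.g., past episodes of pain
# One dependent (criterion) variable
y = np.array([3.1, 4.8, 5.0, 7.2, 7.0])            # e.g., pain score
# Design matrix: a column of ones for the constant a, then x1 and x2
X = np.column_stack([np.ones_like(x1), x1, x2])
# The least-squares solution gives the constants a, b1 and b2
a, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]
print(f"y = {a:.2f} + {b1:.2f}*x1 + {b2:.2f}*x2")

Once a, b1 and b2 have been estimated, plugging a new patient's x1 and x2 into the equation gives the predicted y.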

There are two main uses for multiple regression. One is for prediction, and the other is for causal analysis. In a prediction study, what the researcher is attempting to do is develop a formula for making predictions about the dependent variable, based on the observed values of the independent variables. (1) (Note: A dependent variable is what you measure in the experiment and what is affected during the experiment. The dependent variable responds to the independent variable. It is called dependent because it "depends" on the independent variable. In a scientific experiment, you cannot have a dependent variable without an independent variable. Dependent variables are also called response or outcome variables; independent variables may be called predictor or explanatory variables.) We might, for example, want to predict future episodes of low back pain based on such variables as past episodes of pain, pain intensity, length of episode, and age.

In a causal study, the independent variables are seen as causes of the dependent variable, and the aim of the study is to determine whether a given independent variable affects the dependent variable in a meaningful way and to estimate how large that effect is. We might have data showing that people who participate in a back school have less severe episodes of later back pain. A multiple regression can help determine whether this relationship is real or whether it could be explained away by the fact that the people who took the back school were younger, fitter and did more exercise than those who did not.
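Here is a hedged sketch of how that back school question might look as a regression, using simulated data (the variable names, numbers and the use of the statsmodels library are my own assumptions, not the article's):

import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(0)
n = 1000
age = rng.normal(45, 10, n)
fitness = rng.normal(50, 10, n)
exercise = rng.normal(3, 1, n)                     # hours per week
# In this simulation, younger and fitter people are more likely to attend
p_attend = 1 / (1 + np.exp(0.1 * (age - 45) - 0.05 * (fitness - 50)))
attend = (rng.random(n) < p_attend).astype(float)
# True model: later pain severity depends on age, fitness and exercise,
# but NOT on back school attendance
severity = 2.0 + 0.05 * age - 0.03 * fitness - 0.1 * exercise + rng.normal(0, 1, n)
# Crude model: severity on attendance alone
crude = sm.OLS(severity, sm.add_constant(attend)).fit()
# Adjusted model: attendance plus the other independent variables
X = sm.add_constant(np.column_stack([attend, age, fitness, exercise]))
adjusted = sm.OLS(severity, X).fit()
print("crude attendance coefficient:   ", round(crude.params[1], 3))
print("adjusted attendance coefficient:", round(adjusted.params[1], 3))
# The crude coefficient suggests the back school helps; the adjusted one
# shrinks toward zero, because age and fitness explain the difference.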

Multiple regression has some truly notable attributes. In prediction studies, it makes it possible to combine variables to optimize predictions about the dependent variable. In causal studies, it separates the effects of the independent variables on the dependent variable, so you can look at the contribution of each variable on its own.

Some caveats: First, one of the main conceptual problems with regression techniques is that they can ascertain relationships but cannot establish the underlying causal mechanisms. Second, the more predictor variables you add to the model, the more likely it is that some will appear to be significant due to chance alone.
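The second caveat is easy to demonstrate with a small simulation (again my own sketch, using purely random data):

import numpy as np
import statsmodels.api as sm
rng = np.random.default_rng(1)
n, k = 100, 20                        # 100 subjects, 20 predictors of pure noise
X = rng.normal(size=(n, k))           # predictors unrelated to anything
y = rng.normal(size=n)                # outcome is also pure noise
model = sm.OLS(y, sm.add_constant(X)).fit()
false_hits = int((model.pvalues[1:] < 0.05).sum())
print(f"{false_hits} of {k} noise predictors have p < 0.05")
# On average, about 1 in 20 unrelated predictors will cross p = 0.05 by chance.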

References
1. Allison PD. Multiple Regression: A Primer. Thousand Oaks, CA: Pine Forge Press, 1999.
