Question 1

What is linear regression and what are regression coefficients?

Accepted Answer

Linear regression models the relationship between a dependent variable (Y) and one or more independent variables (X). In simple linear regression, with a single predictor, it fits the straight line Y = b0 + b1*X. The regression coefficients are b0 (the intercept, the predicted value of Y when X = 0) and b1 (the slope, the change in predicted Y for each one-unit increase in X). In biostatistics, regression is used to model dose-response relationships, predict outcomes from biomarkers, and adjust for confounding variables. Adjusting for confounders requires multiple regression, which estimates one coefficient per predictor. The coefficients are usually estimated by ordinary least squares (OLS), which chooses the values that minimize the sum of squared residuals, that is, the squared differences between observed and predicted values of Y.
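For a single predictor, the OLS estimates have a closed form: b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, where Sxy = Σ(xi - x̄)(yi - ȳ) and Sxx = Σ(xi - x̄)². The sketch below implements that formula with the standard library only. The function names and the dose/response numbers are hypothetical, chosen for illustration. The last two lines check that moving either coefficient away from the OLS solution makes the sum of squared residuals larger.

```python
# Minimal sketch: closed-form OLS for simple linear regression (one predictor).
# Function names and data values are hypothetical, for illustration only.


def fit_simple_ols(x, y):
    """Return (b0, b1) that minimize the sum of squared residuals."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length and at least 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # S_xy: sum of cross-products of deviations; S_xx: sum of squared x-deviations
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sum_squared_residuals(x, y, b0, b1):
    """Sum of squared differences between observed y and the line b0 + b1*x."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))


dose = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical X values
response = [2.1, 3.9, 6.2, 7.8, 10.1]  # hypothetical Y values
b0, b1 = fit_simple_ols(dose, response)
print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")

# Nudging either coefficient away from the OLS solution increases the SSE.
best = sum_squared_residuals(dose, response, b0, b1)
print(best < sum_squared_residuals(dose, response, b0 + 0.1, b1))
print(best < sum_squared_residuals(dose, response, b0, b1 + 0.1))
```

With more than one predictor there is no such simple scalar formula, and in practice a statistics package solves the least-squares problem in matrix form.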

Question 2

How do I interpret the slope coefficient (b1)?

Accepted Answer

The slope b1 is the average change in Y for each one-unit increase in X. In multiple regression, it is interpreted while holding the other predictors in the model constant. For example, if b1 = 2.05, Y increases by 2.05 units on average for each one-unit increase in X. A positive slope indicates a positive relationship, and a negative slope indicates an inverse relationship. The p-value for the slope tests the null hypothesis that b1 = 0, that is, that Y has no linear association with X. In a biological context, if X is drug dose (mg) and Y is systolic blood pressure (mmHg), b1 = -0.5 means each additional mg is associated with a blood pressure 0.5 mmHg lower on average. If Y were instead defined as the reduction in blood pressure, the same effect would appear as b1 = +0.5.
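The sketch below fits the dose/blood-pressure example with made-up numbers chosen so that b1 comes out at exactly -0.5. It then shows where the slope's p-value comes from: the standard error of b1, sqrt(MSE / Sxx) with MSE = SSE / (n - 2), gives the t statistic t = b1 / SE(b1). The p-value is then read from a t distribution with n - 2 degrees of freedom. That last step needs a t-distribution CDF, which the Python standard library lacks, so it is omitted here. SciPy's `scipy.stats.linregress`, for example, reports it as `pvalue`. The names `slope_inference` and `predict` are hypothetical helpers.

```python
import math


def slope_inference(x, y):
    """Simple OLS fit plus the slope's standard error and t statistic for H0: b1 = 0."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length x and y with at least 3 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual variance uses n - 2 degrees of freedom (two coefficients estimated)
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)
    if se_b1 == 0:
        raise ValueError("perfect fit: the t statistic is undefined")
    return b0, b1, se_b1, b1 / se_b1


def predict(b0, b1, x):
    """Predicted Y on the fitted line at a given X."""
    return b0 + b1 * x


# Hypothetical data: drug dose (mg) vs systolic blood pressure (mmHg)
dose_mg = [10, 20, 30, 40, 50]
bp_mmhg = [150, 146, 139, 136, 130]
b0, b1, se_b1, t_stat = slope_inference(dose_mg, bp_mmhg)
print(f"b1 = {b1:.2f} mmHg/mg, SE = {se_b1:.4f}, t = {t_stat:.2f}, df = {len(dose_mg) - 2}")

# The slope equals the difference in predicted Y for a one-unit step in X.
step = predict(b0, b1, 31) - predict(b0, b1, 30)
print(f"predicted change from 30 mg to 31 mg: {step:.2f} mmHg")
```

A large |t| relative to the t distribution with n - 2 degrees of freedom corresponds to a small p-value. With only five hypothetical points, the estimate would still be imprecise in a real study.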

Question 3

What does R-squared tell me about my regression model?

Accepted Answer

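R-squared (the coefficient of determination) is the proportion of the variance in Y explained by the regression model. R-squared = 0.85 means 85% of the variability in Y is accounted for by the linear relationship with X. Adjusted R-squared penalizes the number of predictors relative to the sample size. Unlike R-squared, it does not automatically increase when a predictor is added, which makes it more appropriate for comparing models with different numbers of predictors. However, a high R-squared does not mean the model is correct, so always examine residual plots for patterns that suggest non-linearity. What counts as a good R-squared depends on the field and the outcome. Values of 0.7 or higher are often treated as good for biological data, but a model of a noisy outcome can be useful with a much lower value.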
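The sketch below computes R-squared as 1 - SS_res / SS_tot. It computes adjusted R-squared with the usual formula 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors, not counting the intercept. The data and the fitted line (b0 = 155.2, b1 = -0.5, the OLS fit to these made-up points) are hypothetical.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y accounted for by the fitted values y_hat."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained variation
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    if n - p - 1 <= 0:
        raise ValueError("adjusted R^2 needs n > p + 1")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Hypothetical dose/blood-pressure data and the OLS line fitted to it
dose_mg = [10, 20, 30, 40, 50]
bp_mmhg = [150, 146, 139, 136, 130]
b0, b1 = 155.2, -0.5
fitted = [b0 + b1 * d for d in dose_mg]

r2 = r_squared(bp_mmhg, fitted)
adj = adjusted_r_squared(r2, n=len(bp_mmhg), p=1)
print(f"R^2 = {r2:.4f}, adjusted R^2 = {adj:.4f}")
```

Because (n - 1)/(n - p - 1) grows as p grows, each added predictor must reduce the unexplained variation enough to offset the penalty. Otherwise adjusted R-squared falls.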

Question 4

What are the assumptions of linear regression?

Accepted Answer

Linear regression assumes: (1) Linearity - the relationship between X and the mean of Y is linear. (2) Independence - observations, and therefore residuals, are independent of one another. (3) Homoscedasticity - the variance of the residuals is constant across all X values. (4) Normality - the residuals are approximately normally distributed. (5) No significant outliers or influential points. Violations can lead to biased estimates, incorrect standard errors, and invalid p-values. Check these with residual diagnostics: residuals vs fitted values for linearity and homoscedasticity, a normal Q-Q plot of the residuals for normality, and Cook's distance for influential points. Independence is usually judged from the study design; for example, repeated measurements on the same subject are not independent.
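The plots themselves need a plotting library, but the numbers behind them are easy to compute. The sketch below returns fitted values and residuals, which you would plot against each other. It also returns the leverage of each point, h_i = 1/n + (x_i - x̄)²/Sxx for simple regression, and Cook's distance, D_i = e_i² / (p·MSE) · h_i / (1 - h_i)², with p = 2 estimated coefficients. The function name and data are hypothetical, and the last y value is deliberately placed off the trend. Cutoffs such as D > 1 or D > 4/n are common rules of thumb for flagging points to inspect, not formal tests.

```python
def regression_diagnostics(x, y):
    """Fitted values, residuals, leverages and Cook's distances for simple OLS."""
    n = len(x)
    p = 2  # number of estimated coefficients: b0 and b1
    if n != len(y) or n <= p:
        raise ValueError("need equal-length x and y with more than 2 points")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    if s_xx == 0:
        raise ValueError("x has no variation, so the slope is undefined")
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    b0 = y_bar - b1 * x_bar
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    mse = sum(e ** 2 for e in resid) / (n - p)
    # Leverage (hat value) of each point in simple regression
    lev = [1 / n + (xi - x_bar) ** 2 / s_xx for xi in x]
    # Cook's distance combines residual size and leverage
    cooks = [e ** 2 / (p * mse) * h / (1 - h) ** 2 for e, h in zip(resid, lev)]
    return fitted, resid, lev, cooks


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.0, 4.1, 5.9, 8.2, 9.9, 12.1, 13.8, 25.0]  # hypothetical; last point off-trend
fitted, resid, lev, cooks = regression_diagnostics(x, y)
for xi, f, e, d in zip(x, fitted, resid, cooks):
    print(f"x={xi}: fitted={f:6.2f}  residual={e:6.2f}  Cook's D={d:.3f}")
```

Here the off-trend point at x = 8 has both the largest residual and the highest leverage, so it has by far the largest Cook's distance. A curved band in the residuals would point to non-linearity, and a fan shape would point to heteroscedasticity.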

