Prediction Study Pack

Kibin's free study pack on Prediction includes a 3-section study guide, 8 quiz questions, 10 flashcards, and 1 open-ended Explain review question.

Last updated May 21, 2026

Prediction Study Guide

Master the mechanics of simple linear regression and prediction by working through the least-squares equation ŷ = a + bx, the role of Pearson's r in calculating slope, and the coefficient of determination r². This pack also clarifies the critical distinction between valid interpolation and unreliable extrapolation, helping you know exactly when a regression line can and cannot be trusted for prediction.

Key Takeaways

  • In simple linear regression, the least-squares regression line is expressed as ŷ = a + bx, where b is the slope and a is the y-intercept, both calculated from sample data.
  • The slope b equals r(sy/sx), meaning it is directly tied to the Pearson correlation coefficient and the ratio of the standard deviations of y and x.
  • The point (x̄, ȳ) — the means of both variables — always lies exactly on the least-squares regression line.
  • Prediction with the regression equation is reliable only for x-values within the range of the original data (interpolation); applying the equation beyond that range is called extrapolation and can produce unreliable results, because nothing in the data confirms that the linear pattern continues there.
  • The coefficient of determination r² ranges from 0 to 1 and measures the proportion of total variation in y that is explained by the linear relationship with x.
  • A regression line should only be used for prediction when a statistically significant linear relationship between x and y has been established.
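
The interpolation rule above can be enforced in code. The following is a minimal sketch, not a standard library function: the name predict_within_range, its parameters, and the coefficients a = 2.2 and b = 0.6 are all hypothetical and chosen only for illustration.

```python
def predict_within_range(x_new, a, b, x_min, x_max):
    """Return y-hat = a + b * x_new, refusing to extrapolate."""
    if not (x_min <= x_new <= x_max):
        raise ValueError(
            f"x = {x_new} is outside the observed range [{x_min}, {x_max}]; "
            "this would be extrapolation"
        )
    return a + b * x_new


# Hypothetical fit: a = 2.2, b = 0.6, with observed x-values from 1 to 5.
inside = predict_within_range(3, a=2.2, b=0.6, x_min=1, x_max=5)

try:
    predict_within_range(12, a=2.2, b=0.6, x_min=1, x_max=5)
    extrapolation_blocked = False
except ValueError:
    extrapolation_blocked = True
```

Raising an error is one possible design choice; a softer version might return the prediction together with a warning flag.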

Building the Least-Squares Regression Line

The foundation of linear prediction is a single equation, the least-squares regression line, which minimizes the sum of the squared vertical distances (residuals) between the observed data points and the line.

General Form of the Regression Equation

  • The equation is written ŷ = a + bx, where ŷ (read 'y-hat') is the predicted value of the response variable for a given value of the explanatory variable x.
  • The symbol ŷ signals that the output is a prediction, not a guaranteed observed value — actual data points will rarely fall exactly on the line.
  • This equation describes a straight line, so it applies only to relationships that are linear in nature.
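
To see why ŷ is a prediction rather than an observed value, the sketch below compares predicted and observed values. The five data points are made up for illustration, and the coefficients a = 2.2 and b = 0.6 are assumed here (they are the least-squares values for this particular sample).

```python
# Made-up sample, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Assumed coefficients (the least-squares values for this sample).
a, b = 2.2, 0.6

# Predicted values on the line, and residuals (observed minus predicted).
y_hat = [a + b * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]

for xi, yi, yh, e in zip(x, y, y_hat, residuals):
    print(f"x={xi}  observed y={yi}  predicted y-hat={yh:.1f}  residual={e:+.1f}")
```

In this sample no observed point falls exactly on the line, yet the residuals sum to zero (up to rounding), which is a property of every least-squares line that includes an intercept.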

Calculating the Slope (b)

  • The slope is computed as b = r · (sy / sx), where r is the Pearson correlation coefficient, sy is the standard deviation of the y-values, and sx is the standard deviation of the x-values.
  • A positive slope indicates that as x increases, predicted y also increases; a negative slope indicates the opposite.
  • The value of the slope tells you how many units ŷ changes for each one-unit increase in x; for example, b = 0.6 means the prediction rises by 0.6 units per one-unit increase in x.
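
The slope formula can be checked numerically. This is a small sketch using Python's standard library and a made-up sample; the variable names are illustrative. It computes b from r, sy, and sx, then compares the result with the equivalent deviation form of the least-squares slope.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up sample, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)

# Pearson's r from deviations about the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
r = s_xy / sqrt(s_xx * s_yy)

b = r * (s_y / s_x)    # slope formula from the text
b_check = s_xy / s_xx  # equivalent deviation form of the slope

print(f"r = {r:.4f}, b = {b:.4f}, b_check = {b_check:.4f}")
```

Both routes give the same slope because the n − 1 divisors in sy and sx cancel in the ratio.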

Calculating the Y-Intercept (a)

  • Once the slope is known, the y-intercept is found using a = ȳ − b · x̄, where x̄ and ȳ are the sample means of x and y respectively.
  • This formula guarantees that the point (x̄, ȳ) lies on the regression line — a key property of least-squares lines.
  • The y-intercept represents the predicted value of y when x equals zero, though this value is only meaningful in context if x = 0 is a realistic scenario.
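
The intercept formula and the mean-point property can be verified together. The sketch below reuses a made-up sample and computes the slope in deviation form, which is algebraically equal to r · (sy / sx).

```python
from statistics import mean

# Made-up sample, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope in deviation form (equal to r * sy / sx).
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar  # intercept formula from the text

# Prediction at x = x-bar; it should reproduce y-bar.
y_hat_at_mean = a + b * x_bar

print(f"a = {a:.2f}, b = {b:.2f}")
print(f"y-hat at x-bar = {y_hat_at_mean:.2f}, y-bar = {y_bar:.2f}")
```

Substituting a = ȳ − b · x̄ into ŷ = a + bx at x = x̄ gives ŷ = ȳ − b · x̄ + b · x̄ = ȳ, which is why the check holds for any sample.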

The Role of Correlation in Regression

The Pearson correlation coefficient r and the coefficient of determination r² both measure how well the linear model fits the data, and they directly influence how much confidence you should place in any prediction.

Pearson Correlation Coefficient (r)

  • The value of r ranges from −1 to +1 and quantifies the strength and direction of a linear relationship between two quantitative variables.
  • Values of r close to +1 or −1 indicate a strong linear relationship; values near 0 suggest little to no linear association.
  • Because the slope formula b = r · (sy / sx) contains r directly, a weaker correlation (for the same ratio sy / sx) produces a slope closer to zero, flattening the regression line.
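
The flattening effect can be illustrated by holding sy / sx fixed and letting r shrink. The ratio of 2.0 and the r-values below are hypothetical numbers picked only to show the pattern.

```python
sd_ratio = 2.0  # hypothetical sy / sx, held fixed

# Slope b = r * (sy / sx) for progressively weaker correlations.
slopes = {r: r * sd_ratio for r in (0.9, 0.5, 0.1, 0.0)}

for r, b in slopes.items():
    print(f"r = {r:.1f}  ->  b = {b:.1f}")
```

As r approaches 0 the slope approaches 0 as well, so the line approaches the horizontal line ŷ = ȳ and every prediction collapses toward the mean of y.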

Coefficient of Determination (r²)

  • The coefficient of determination r² is calculated by squaring the correlation coefficient and is interpreted as the proportion of the total variation in y that is explained by the linear regression on x.
  • For example, r² = 0.80 means that 80% of the variability in y is accounted for by the regression model, while the remaining 20% is left unexplained by the linear relationship with x (other variables, nonlinearity, or random variation).
  • A higher r² value indicates that, within the range of the data, predictions made from the regression line will tend to fall closer to the observed values.
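
The "proportion of variation explained" reading of r² can be checked directly. This sketch, again on a made-up sample with illustrative variable names, computes r² two ways: by squaring r, and as 1 minus the ratio of unexplained variation to total variation.

```python
from math import sqrt
from statistics import mean

# Made-up sample, used only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / sqrt(s_xx * s_yy)  # Pearson correlation coefficient
b = s_xy / s_xx               # least-squares slope
a = y_bar - b * x_bar         # least-squares intercept

# Unexplained variation: squared residuals about the fitted line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
# Total variation: squared deviations of y about its mean.
sst = s_yy

r_squared_from_r = r ** 2
r_squared_from_fit = 1 - sse / sst

print(f"r^2 from r   = {r_squared_from_r:.4f}")
print(f"r^2 from fit = {r_squared_from_fit:.4f}")
```

The two values agree for simple linear regression fitted by least squares; the equality does not carry over to lines fitted by other methods.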

About this Study Pack

Created by Kibin to help students review key concepts, prepare for exams, and study more effectively. This Study Pack was checked for accuracy and curriculum alignment using authoritative educational sources.