QCE General Mathematics - Unit 3 - Bivariate data analysis 2

Fitting a Linear Model to Numerical Data | QCE General Mathematics

Learn least-squares lines, residual plots, interpolation, extrapolation and prediction for QCE General Mathematics bivariate data.

Updated 2026-05-18 - 5 min read

QCAA official coverage - General Mathematics 2025 v1.3

Exact syllabus points covered

  1. Model a linear relationship by using technology to fit a least-squares line to the data, in the form $y=mx+c$ where $m$ is slope (gradient) and $c$ is $y$-intercept.
  2. Understand and use $m=r\frac{s_y}{s_x}$ and $c=\bar y-m\bar x$ to determine the equation of a least-squares line, where $r$ is correlation coefficient, $s_y$ and $s_x$ are sample standard deviations, and $\bar y$ and $\bar x$ are means.
  3. Construct a residual plot and use it to assess the appropriateness of fitting a linear model to the data.
  4. Interpret the $y$-intercept and slope (gradient) of the fitted line.
  5. Distinguish between interpolation and extrapolation.
  6. Use the equation of the least-squares line to make predictions.
  7. Recognise and explain the potential dangers of extrapolation.

A linear model summarises a roughly straight scatterplot with an equation. In General Mathematics the fitted line is usually found with technology, then interpreted and tested for reasonableness.

Least-squares line and residuals

Original Sylligence diagram for general least squares residuals.

The least-squares line

The least-squares line has the form:

$ y=mx+c $

where $m$ is the slope and $c$ is the $y$-intercept. The slope tells you the predicted change in $y$ for a one-unit increase in $x$. The intercept tells you the predicted $y$ value when $x=0$, but it is only meaningful if $x=0$ makes sense in the context.
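To make the "found with technology" step less of a black box, here is a minimal Python sketch of a least-squares fit. The data set (hours studied and test mark) and the function name `fit_line` are made up for illustration; a calculator or spreadsheet regression tool should give the same line for the same data.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return (m, c) for the least-squares line y = m*x + c."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    m = sxy / sxx          # slope (gradient)
    c = y_bar - m * x_bar  # y-intercept
    return m, c

# Hypothetical data: hours studied (x) and test mark in % (y).
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 69, 70]

m, c = fit_line(hours, marks)
print(f"y = {m:.1f}x + {c:.1f}")  # prints: y = 4.7x + 47.9
```

For this made-up data the slope says each extra hour of study is associated with a predicted increase of about 4.7 percentage points. The intercept, 47.9, is the predicted mark for zero hours, which lies outside the observed hours (1 to 5), so it should be interpreted with caution.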

The syllabus also gives:

$ m=r\frac{s_y}{s_x} $

and

$ c=\bar y-m\bar x $

where $r$ is the correlation coefficient, $s_x$ and $s_y$ are sample standard deviations, and $\bar x,\bar y$ are sample means.

Residuals

A residual is:

$ \text{residual}=\text{actual }y-\text{predicted }y $

Residual plots check whether a straight-line model is appropriate. A good residual plot has points scattered randomly around zero with no obvious curve or fan shape. A curved residual pattern suggests the original relationship is not well modelled by a straight line.
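The residual calculation can be sketched in a few lines of Python. The data and the fitted line $y=4.7x+47.9$ below are hypothetical, and `residuals` is an illustrative helper name rather than a standard function.

```python
def residuals(xs, ys, m, c):
    """Return actual y minus predicted y for each data point."""
    return [y - (m * x + c) for x, y in zip(xs, ys)]

# Hypothetical data: hours studied (x) and test mark in % (y).
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 69, 70]
m, c = 4.7, 47.9  # assumed least-squares line for this data

res = residuals(hours, marks, m, c)
for x, r in zip(hours, res):
    # A positive residual means the point is above the line.
    print(f"x = {x}: residual = {r:+.1f}")
```

Plotting each residual against its $x$ value gives the residual plot. For a least-squares line the residuals sum to zero (up to rounding), so the plot is always centred on zero; the thing to look for is whether there is a pattern.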

Interpolation and extrapolation

Interpolation means predicting inside the observed $x$ range. Extrapolation means predicting outside it. Extrapolation is risky because the trend may not continue. A phone's resale value cannot fall below zero, and a plant's height cannot keep increasing linearly without limit.

Calculating a line from summary statistics

Technology often gives the least-squares line directly. If summary statistics are provided, use:

$ m=r\frac{s_y}{s_x} $

then substitute into:

$ c=\bar y-m\bar x $

This order matters because the intercept formula needs $m$. Find the gradient first, then use the fact that the least-squares line passes through the mean point $(\bar x,\bar y)$ to find the intercept.
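As a quick illustration of the two-step calculation, the sketch below uses made-up summary statistics ($r=0.8$, $s_x=2$, $s_y=5$, $\bar x=10$, $\bar y=40$); the function name `line_from_summary` is also just an illustrative choice.

```python
def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (m, c) for y = m*x + c from summary statistics."""
    m = r * s_y / s_x      # step 1: gradient
    c = y_bar - m * x_bar  # step 2: intercept, using the mean point
    return m, c

# Hypothetical summary statistics.
m, c = line_from_summary(r=0.8, s_x=2.0, s_y=5.0, x_bar=10.0, y_bar=40.0)
print(f"y = {m:.1f}x + {c:.1f}")  # prints: y = 2.0x + 20.0

# Check: substituting the mean of x should return the mean of y.
print(m * 10.0 + c)  # prints: 40.0
```

By hand this is $m=0.8\times\frac{5}{2}=2$ and then $c=40-2\times 10=20$.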

Model diagnostics

Residuals show what the line misses. A residual plot should be read like a diagnostic graph:

| Residual pattern | Interpretation |
|---|---|
| random scatter around zero | linear model is reasonably appropriate |
| U-shape or curve | a non-linear model may be better |
| fan shape | variation changes as $x$ changes |
| one very large residual | possible outlier or unusual observation |

Do not only report the equation. A complete modelling answer should say whether the equation is appropriate for the data.

Prediction boundaries

Interpolation uses an $x$ value inside the observed range. Extrapolation uses an $x$ value outside the observed range. Extrapolation can become absurd even if the equation is mathematically easy to use. A model for test mark versus hours studied might predict more than $100\%$ if used far beyond the observed hours.
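The test-mark example can be made concrete with a short Python sketch. The line $y=4.7x+47.9$ and the observed range of 1 to 5 hours are assumptions chosen for illustration, not values from real data.

```python
# Assumed fitted line for test mark (%) versus hours studied.
M, C = 4.7, 47.9
OBSERVED_HOURS = (1, 5)  # hypothetical range of x values in the data

def predict_mark(hours):
    """Predicted test mark (%) from the assumed line y = 4.7x + 47.9."""
    return M * hours + C

for h in (3, 8, 12):
    inside = OBSERVED_HOURS[0] <= h <= OBSERVED_HOURS[1]
    label = "interpolation" if inside else "extrapolation"
    flag = " (impossible: above 100%)" if predict_mark(h) > 100 else ""
    print(f"{h} h: {predict_mark(h):.1f}% [{label}]{flag}")
```

With these assumed values, 3 hours gives 62.0% (inside the data), 8 hours gives 85.5% (plausible but unsupported by the data), and 12 hours gives 104.3%, which is impossible for a percentage mark.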

Depth: least-squares line and residuals

The least-squares regression line is the line that minimises the sum of the squared residuals. A residual is:

$ \text{residual}=y-\hat{y} $

where $y$ is the observed value and $\hat{y}$ is the value predicted by the model. A positive residual means the actual point is above the fitted line. A negative residual means the actual point is below the fitted line.
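One way to see what "minimises the sum of the squared residuals" means is to compare the least-squares line with a few nearby lines. The sketch below uses hypothetical data for which the least-squares line is assumed to be $y=4.7x+47.9$; the alternative lines are arbitrary choices.

```python
def sse(xs, ys, m, c):
    """Sum of squared residuals for the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: hours studied (x) and test mark in % (y).
hours = [1, 2, 3, 4, 5]
marks = [52, 58, 61, 69, 70]

best = sse(hours, marks, 4.7, 47.9)  # assumed least-squares line
print(f"least-squares line: SSE = {best:.2f}")

# Arbitrary nearby lines, each given as (slope, intercept).
for m, c in [(5.0, 47.0), (4.5, 48.5), (4.7, 48.9)]:
    print(f"y = {m}x + {c}: SSE = {sse(hours, marks, m, c):.2f}")
```

Each alternative line gives a larger sum of squared residuals than the least-squares line, which is what the definition requires.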

If summary statistics are supplied, the slope can be found using:

$ m=r\frac{s_y}{s_x} $

and the intercept is:

$ c=\bar{y}-m\bar{x} $

so the fitted line is $\hat{y}=mx+c$.

Checking whether the model is suitable

A linear model is more suitable when:

  • the scatterplot is roughly linear
  • the association is reasonably strong
  • residuals are scattered randomly around zero
  • there is no obvious curve or fan shape in the residual plot
  • predictions are made within the data range

| Residual pattern | Meaning |
|---|---|
| random scatter around zero | linear model may be suitable |
| curved pattern | a non-linear model may be better |
| increasing spread | prediction accuracy changes across the range |
| one very large residual | an outlier may be influencing the model |

Interpolation and extrapolation

Interpolation means predicting within the range of observed $x$ values. Extrapolation means predicting outside the observed range. Extrapolation is riskier because the pattern may not continue.

Reporting predictions

A prediction answer should include:

  1. the substituted value of $x$
  2. the predicted value $\hat{y}$
  3. units and context
  4. a comment on reliability if the prediction is outside the data range
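The checklist above can be sketched as a small Python helper that builds a prediction sentence. The function name `report_prediction`, the line $y=4.7x+47.9$ and the observed range of 1 to 5 hours are all hypothetical.

```python
def report_prediction(x, m, c, x_min, x_max, x_unit, y_unit):
    """Build a short prediction report from the line y = m*x + c."""
    y_hat = m * x + c  # predicted value
    sentence = f"When x = {x} {x_unit}, the predicted value is {y_hat:.1f} {y_unit}."
    if not (x_min <= x <= x_max):
        # Outside the observed x range, so add a reliability comment.
        sentence += " This is an extrapolation, so the prediction may be unreliable."
    return sentence

# Assumed line y = 4.7x + 47.9, fitted to data with x between 1 and 5 hours.
print(report_prediction(4, 4.7, 47.9, 1, 5, "hours", "%"))
print(report_prediction(9, 4.7, 47.9, 1, 5, "hours", "%"))
```

In a written answer the same four parts appear in words: the substituted $x$, the predicted $\hat{y}$, the units in context, and a reliability comment when the $x$ value is outside the data.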
