QCE General Mathematics - Unit 3 - Bivariate data analysis 2

Fitting a Linear Model to Numerical Data | QCE General Mathematics

Learn least-squares lines, residual plots, interpolation, extrapolation and prediction for QCE General Mathematics bivariate data.

Updated 2026-05-18 - 5 min read

QCAA official coverage - General Mathematics 2025 v1.3

Exact syllabus points covered

Model a linear relationship by using technology to fit a least-squares line to the data, in the form $y=mx+c$ where $m$ is slope (gradient) and $c$ is $y$-intercept.
Understand and use $m=r\frac{s_y}{s_x}$ and $c=\bar y-m\bar x$ to determine the equation of a least-squares line, where $r$ is correlation coefficient, $s_y$ and $s_x$ are sample standard deviations, and $\bar y$ and $\bar x$ are means.
Construct a residual plot and use it to assess the appropriateness of fitting a linear model to the data.
Interpret the $y$-intercept and slope (gradient) of the fitted line.
Distinguish between interpolation and extrapolation.
Use the equation of the least-squares line to make predictions.
Recognise and explain the potential dangers of extrapolation.

A linear model summarises a roughly straight scatterplot with an equation. In General Mathematics the fitted line is usually found with technology, then interpreted and tested for reasonableness.

Figure: least-squares line and residuals (original Sylligence diagram).

The least-squares line

The least-squares line has the form:

$ y=mx+c $

where $m$ is the slope and $c$ is the $y$-intercept. The slope tells you the predicted change in $y$ for a one-unit increase in $x$. The intercept tells you the predicted $y$ value when $x=0$, but it is only meaningful if $x=0$ makes sense in the context.
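To make the slope and intercept interpretations concrete, here is a minimal Python sketch. The line $y=2.5x+10$ and the function name `predict` are hypothetical, chosen only for illustration.

```python
# Hypothetical fitted line y = 2.5x + 10 (illustrative values, not from real data).
def predict(x, m=2.5, c=10.0):
    """Return the predicted y for a given x using y = mx + c."""
    return m * x + c

# Increasing x by one unit changes the predicted y by the slope m.
change = predict(5) - predict(4)

# The intercept is the predicted y when x = 0.
at_zero = predict(0)

print(change, at_zero)
```

Whether the value at $x=0$ means anything still depends on the context, not on the arithmetic.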

The syllabus also gives:

$ m=r\frac{s_y}{s_x} $

and

$ c=\bar y-m\bar x $

where $r$ is the correlation coefficient, $s_x$ and $s_y$ are sample standard deviations, and $\bar x,\bar y$ are sample means.

Residuals

A residual is:

$ \text{residual}=\text{actual }y-\text{predicted }y $

Residual plots check whether a straight-line model is appropriate. A good residual plot has points scattered randomly around zero with no obvious curve or fan shape. A curved residual pattern suggests the original relationship is not well modelled by a straight line.
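As a rough sketch of how residuals are calculated, the Python below uses a small made-up data set and an assumed line $y=2x+1$ that passes close to the points. The data, the line and the variable names are all hypothetical.

```python
# Hypothetical data and an assumed line y = 2x + 1 (illustration only).
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
m, c = 2.0, 1.0

# residual = actual y - predicted y, calculated for each data point
residuals = [y - (m * x + c) for x, y in zip(xs, ys)]

# Round for display; plotting these against xs would give the residual plot.
print([round(r, 1) for r in residuals])
```

The residuals here switch between positive and negative with no clear pattern, which is what you hope to see in a residual plot.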

Worked example

Interpolation and extrapolation

Interpolation means predicting inside the observed $x$ range. Extrapolation means predicting outside it. Extrapolation is risky because the trend may not continue. A phone's value cannot fall below zero, and a plant's height cannot keep increasing linearly without limit.
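The distinction comes down to comparing the new $x$ value with the smallest and largest observed $x$ values. A minimal Python sketch, where the function name `prediction_type` and the sample values are hypothetical:

```python
def prediction_type(x_new, observed_xs):
    """Label a prediction as interpolation or extrapolation."""
    # Inside (or on the edge of) the observed x range counts as interpolation.
    if min(observed_xs) <= x_new <= max(observed_xs):
        return "interpolation"
    return "extrapolation"

observed = [1, 2, 5, 8]  # hypothetical observed x values
print(prediction_type(3, observed))
print(prediction_type(10, observed))
```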

Calculating a line from summary statistics

Technology often gives the least-squares line directly. If summary statistics are provided, use:

$ m=r\frac{s_y}{s_x} $

then substitute into:

$ c=\bar y-m\bar x $

This order matters because the intercept formula needs $m$. Find the gradient first, then use the mean point $(\bar x,\bar y)$ to find the intercept.
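The two-step calculation can be sketched in Python as follows. The summary statistics ($r=0.8$, $s_x=2$, $s_y=5$, $\bar x=10$, $\bar y=40$) and the function name `line_from_summary` are made up for illustration.

```python
def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (m, c) for the least-squares line y = mx + c."""
    m = r * s_y / s_x      # step 1: gradient
    c = y_bar - m * x_bar  # step 2: intercept, which uses m
    return m, c

# Hypothetical summary statistics.
m, c = line_from_summary(r=0.8, s_x=2, s_y=5, x_bar=10, y_bar=40)
print(m, c)
```

With these values the gradient is $0.8\times\frac{5}{2}=2$ and the intercept is $40-2\times 10=20$, so the line is $y=2x+20$.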

Model diagnostics

Residuals show what the line misses. A residual plot should be read like a diagnostic graph:

| Residual pattern | Interpretation |
|---|---|
| random scatter around zero | linear model is reasonably appropriate |
| U-shape or curve | a non-linear model may be better |
| fan shape | variation changes as $x$ changes |
| one very large residual | possible outlier or unusual observation |

Do not only report the equation. A complete modelling answer should say whether the equation is appropriate for the data.
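To see what a U-shaped residual pattern looks like in numbers, the Python sketch below fits a straight line to deliberately curved data ($y=x^2$ for $x=1$ to $5$). The data set is hypothetical, and the line $y=6x-7$ is the least-squares line for these five points.

```python
# Deliberately curved, hypothetical data: y = x^2.
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]  # 1, 4, 9, 16, 25

# Least-squares line for these five points: y = 6x - 7.
m, c = 6, -7

residuals = [y - (m * x + c) for x, y in zip(xs, ys)]
signs = ["+" if r > 0 else "-" for r in residuals]

print(residuals)
print(signs)
```

The residuals are positive at both ends and negative in the middle. That systematic pattern is the signal that a straight line is missing a curve in the data.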

Prediction boundaries

Interpolation uses an $x$ value inside the observed range. Extrapolation uses an $x$ value outside the observed range. Extrapolation can become absurd even if the equation is mathematically easy to use. A model for test mark versus hours studied might predict more than $100\%$ if used far beyond the observed hours.
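A short Python sketch of the test-mark example. The model $\text{mark}=8\times\text{hours}+40$, the observed range of 1 to 6 hours, and the function name `predict_mark` are all hypothetical.

```python
# Hypothetical model: mark (%) = 8 * hours + 40, fitted on 1 to 6 hours of study.
M, C = 8.0, 40.0
OBSERVED_HOURS = (1.0, 6.0)

def predict_mark(hours):
    """Return (predicted mark, is it extrapolation?, is the mark possible?)."""
    mark = M * hours + C
    low, high = OBSERVED_HOURS
    is_extrapolation = hours < low or hours > high
    is_possible = 0 <= mark <= 100
    return mark, is_extrapolation, is_possible

print(predict_mark(4))   # inside the observed range
print(predict_mark(10))  # well outside the observed range
```

At 10 hours the equation gives 120%, which is impossible. The arithmetic is fine, but the model is being used outside the range where it was fitted.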

Depth: least-squares line and residuals

The least-squares regression line is the line that minimises the sum of the squared residuals. A residual is:

$ \text{residual}=y-\hat{y} $

where $y$ is the observed value and $\hat{y}$ is the value predicted by the model. A positive residual means the actual point is above the fitted line. A negative residual means the actual point is below the fitted line.
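The "minimises the sum of the squared residuals" idea can be checked numerically. In the Python sketch below, the data set is hypothetical and $y=1.97x+1.09$ is its least-squares line, as technology would report it. The function name and the comparison lines are arbitrary choices for illustration.

```python
# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

def sum_squared_residuals(m, c):
    """Sum of (y - y_hat)^2 for the line y_hat = mx + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

# Least-squares line for this data: y = 1.97x + 1.09.
best = sum_squared_residuals(1.97, 1.09)

# A few other plausible-looking lines, given as (m, c) pairs.
other_lines = [(2.0, 1.0), (1.9, 1.3), (2.07, 0.89), (1.97, 1.29)]
others = [sum_squared_residuals(m, c) for m, c in other_lines]

print(round(best, 3), [round(s, 3) for s in others])
```

Every other line gives a larger sum of squared residuals than the least-squares line, even the neat-looking $y=2x+1$.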

If summary statistics are supplied, the slope can be found using:

$ b=r\frac{s_y}{s_x} $

and the intercept is:

$ a=\bar{y}-b\bar{x} $

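so the fitted line is $\hat{y}=a+bx$. Here $b$ and $a$ play the same roles as the slope $m$ and intercept $c$ in the syllabus form $y=mx+c$.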
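When raw data is available instead of summary statistics, the same formulas still apply once the statistics are calculated. A minimal Python sketch using a hypothetical data set and the standard-library `statistics` module; the correlation coefficient is computed from its usual definition, which is an addition to what this page states:

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data set.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations (divide by n - 1)

# Correlation coefficient r from sums of squares and products.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / sqrt(sxx * syy)

slope = r * s_y / s_x              # gradient first
intercept = y_bar - slope * x_bar  # then the intercept

print(round(slope, 2), round(intercept, 2))
```

For this data the fitted line works out to $\hat{y}=1.09+1.97x$.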

Checking whether the model is suitable

A linear model is more suitable when:

the scatterplot is roughly linear
the association is reasonably strong
residuals are scattered randomly around zero
there is no obvious curve or fan shape in the residual plot
predictions are made within the data range

| Residual pattern | Meaning |
|---|---|
| random scatter around zero | linear model may be suitable |
| curved pattern | a non-linear model may be better |
| increasing spread | prediction accuracy changes across the range |
| one very large residual | an outlier may be influencing the model |

Interpolation and extrapolation

Interpolation means predicting within the range of observed $x$ values. Extrapolation means predicting outside the observed range. Extrapolation is riskier because the pattern may not continue.
