QCE General Mathematics - Unit 3 - Bivariate data analysis 2
Fitting a Linear Model to Numerical Data | QCE General Mathematics
Learn least-squares lines, residual plots, interpolation, extrapolation and prediction for QCE General Mathematics bivariate data.
Updated 2026-05-18 - 5 min read
QCAA official coverage - General Mathematics 2025 v1.3
Exact syllabus points covered
- Model a linear relationship by using technology to fit a least-squares line to the data, in the form $y=mx+c$ where $m$ is slope (gradient) and $c$ is $y$-intercept.
- Understand and use $m=r\frac{s_y}{s_x}$ and $c=\bar y-m\bar x$ to determine the equation of a least-squares line, where $r$ is correlation coefficient, $s_y$ and $s_x$ are sample standard deviations, and $\bar y$ and $\bar x$ are means.
- Construct a residual plot and use it to assess the appropriateness of fitting a linear model to the data.
- Interpret the $y$-intercept and slope (gradient) of the fitted line.
- Distinguish between interpolation and extrapolation.
- Use the equation of the least-squares line to make predictions.
- Recognise and explain the potential dangers of extrapolation.
A linear model summarises a roughly straight scatterplot with an equation. In General Mathematics the fitted line is usually found with technology, then interpreted and tested for reasonableness.
[Figure: Sylligence diagram of least-squares residuals.]
The least-squares line
The least-squares line has the form:
$ y=mx+c $
where $m$ is the slope and $c$ is the $y$-intercept. The slope tells you the predicted change in $y$ for a one-unit increase in $x$. The intercept tells you the predicted $y$ value when $x=0$, but it is only meaningful if $x=0$ makes sense in the context.
The syllabus also gives:
$ m=r\frac{s_y}{s_x} $
and
$ c=\bar y-m\bar x $
where $r$ is the correlation coefficient, $s_x$ and $s_y$ are sample standard deviations, and $\bar x,\bar y$ are sample means.
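As a minimal sketch of how these two formulas fit together, the Python below computes $r$, $s_x$, $s_y$ and the means from a small data set, then builds the line from them. The data values are made up for illustration, and in an assessment you would normally use a calculator or spreadsheet rather than code.

```python
from statistics import mean, stdev

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)

# Pearson correlation coefficient from its definition
n = len(x)
r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)

m = r * s_y / s_x      # slope
c = y_bar - m * x_bar  # y-intercept

print(f"r = {r:.3f}")
print(f"y = {m:.2f}x + {c:.2f}")
```

For this data the sketch gives a slope of about 0.6 and an intercept of about 2.2, which should match the line reported by a calculator's linear regression function.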
Residuals
A residual is:
$ \text{residual}=\text{actual }y-\text{predicted }y $
Residual plots check whether a straight-line model is appropriate. A good residual plot has points scattered randomly around zero with no obvious curve or fan shape. A curved residual pattern suggests the original relationship is not well modelled by a straight line.
Worked example
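As an illustrative sketch, suppose technology reports the fitted line $y=0.6x+2.2$ for five data points. Both the data and the line are hypothetical, chosen only to show how each residual is calculated as actual minus predicted.

```python
# Hypothetical data and fitted line (y = 0.6x + 2.2), made up for illustration
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
m, c = 0.6, 2.2

# Predicted y for each observed x, then residual = actual y - predicted y
predicted = [m * xi + c for xi in x]
residuals = [round(yi - pi, 2) for yi, pi in zip(y, predicted)]

for xi, yi, pi, ri in zip(x, y, predicted, residuals):
    print(f"x={xi}  actual={yi}  predicted={pi:.1f}  residual={ri:+.1f}")
```

The residuals come out as $-0.8$, $0.6$, $1.0$, $-0.6$ and $-0.2$. Plotting them against $x$ gives the residual plot. They add to zero, which is always the case for a least-squares line.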
Interpolation and extrapolation
Interpolation means predicting inside the observed $x$ range. Extrapolation means predicting outside it. Extrapolation is risky because the trend may not continue beyond the data. A phone's value cannot fall below zero, and a plant's height cannot keep increasing linearly without limit.
Calculating a line from summary statistics
Technology often gives the least-squares line directly. If summary statistics are provided, use:
$ m=r\frac{s_y}{s_x} $
then substitute into:
$ c=\bar y-m\bar x $
This order matters because the intercept formula uses $m$. Find the gradient first, then use the mean point $(\bar x,\bar y)$, which always lies on the least-squares line, to find the intercept.
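For example, suppose the summary statistics are $r=0.8$, $s_x=2$, $s_y=5$, $\bar x=10$ and $\bar y=40$. These values are made up for illustration. The gradient is:
$ m=0.8\times\frac{5}{2}=2 $
and the intercept is:
$ c=40-2\times 10=20 $
so the least-squares line would be $y=2x+20$. As a quick check, substituting $x=10$ gives $y=40$, so the line passes through the mean point.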
Model diagnostics
Residuals show what the line misses. A residual plot should be read like a diagnostic graph:
| Residual pattern | Interpretation |
|---|---|
| random scatter around zero | linear model is reasonably appropriate |
| U-shape or curve | a non-linear model may be better |
| fan shape | variation changes as $x$ changes |
| one very large residual | possible outlier or unusual observation |
Do not only report the equation. A complete modelling answer should say whether the equation is appropriate for the data.
Prediction boundaries
Interpolation uses an $x$ value inside the observed range. Extrapolation uses an $x$ value outside the observed range. Extrapolation can become absurd even if the equation is mathematically easy to use. A model for test mark versus hours studied might predict more than $100\%$ if used far beyond the observed hours.
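To see how this can happen, suppose the fitted model is $\text{mark}=6\times\text{hours}+45$ and the observed study times ran from 1 to 8 hours. These numbers are purely illustrative and not from a real data set. Predicting at 5 hours is interpolation:
$ \text{mark}=6(5)+45=75\% $
Predicting at 12 hours is extrapolation:
$ \text{mark}=6(12)+45=117\% $
The second calculation is just as easy as the first, but a mark above $100\%$ is impossible, so the model has been pushed beyond the range where it applies.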
Depth: least-squares line and residuals
The least-squares regression line is the line that minimises the sum of the squared residuals. A residual is:
$ \text{residual}=y-\hat{y} $
where $y$ is the observed value and $\hat{y}$ is the value predicted by the model. A positive residual means the actual point is above the fitted line. A negative residual means the actual point is below the fitted line.
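One way to see what "minimises the sum of the squared residuals" means is to compare the least-squares line with nearby lines. The sketch below uses hypothetical data for which the least-squares line is $\hat{y}=0.6x+2.2$, and the helper name `ssr` is just a label chosen for this example.

```python
def ssr(m, c, x, y):
    """Sum of squared residuals for the line y = m*x + c."""
    return sum((yi - (m * xi + c)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, made up for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

best = ssr(0.6, 2.2, x, y)     # least-squares line for this data
steeper = ssr(0.7, 2.2, x, y)  # same intercept, slightly steeper slope
higher = ssr(0.6, 2.5, x, y)   # same slope, line shifted upwards

print(round(best, 2), round(steeper, 2), round(higher, 2))
```

Both altered lines give a larger total (about 2.95 and 2.85) than the least-squares line (about 2.4). Any other change to the slope or intercept also increases the total for this data.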
If summary statistics are supplied, the slope can be found using:
$ m=r\frac{s_y}{s_x} $
and the intercept is:
$ c=\bar{y}-m\bar{x} $
so the fitted line is $\hat{y}=mx+c$.
Checking whether the model is suitable
A linear model is more suitable when:
- the scatterplot is roughly linear
- the association is reasonably strong
- residuals are scattered randomly around zero
- there is no obvious curve or fan shape in the residual plot
- predictions are made within the data range
| Residual pattern | Meaning |
|---|---|
| random scatter around zero | linear model may be suitable |
| curved pattern | a non-linear model may be better |
| increasing spread | prediction accuracy changes across the range |
| one very large residual | an outlier may be influencing the model |
Interpolation and extrapolation
Interpolation means predicting within the range of observed $x$ values. Extrapolation means predicting outside the observed range. Extrapolation is riskier because the pattern may not continue.
Reporting predictions
A prediction answer should include:
- the substituted value of $x$
- the predicted value $\hat{y}$
- units and context
- a comment on reliability if the prediction is outside the data range
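The checklist above can be sketched as a small Python function. The function name, its parameters, and the model and data range in the usage lines are all hypothetical, chosen only to show how the four parts of a prediction answer fit together.

```python
def report_prediction(x_value, m, c, x_min, x_max, x_label, y_label, y_units):
    """Build a prediction sentence for the line y = m*x + c (hypothetical helper)."""
    y_hat = m * x_value + c
    inside = x_min <= x_value <= x_max  # is x within the observed data range?
    kind = "interpolation" if inside else "extrapolation"
    sentence = (
        f"When {x_label} = {x_value}, predicted {y_label} = "
        f"{m} * {x_value} + {c} = {y_hat:.1f} {y_units} ({kind})."
    )
    if not inside:
        sentence += " This is outside the observed data range, so the prediction may be unreliable."
    return sentence

# Hypothetical model: mark = 6 * hours + 45, observed hours from 1 to 8
print(report_prediction(5, 6, 45, 1, 8, "hours studied", "test mark", "%"))
print(report_prediction(12, 6, 45, 1, 8, "hours studied", "test mark", "%"))
```

The first call is an interpolation and needs no reliability warning. The second is an extrapolation, so the sentence ends with a comment on reliability.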