
Sample Means | QCE Specialist Mathematics

Learn QCE Specialist sample means, sampling distributions, standard error and approximate normality for large samples.

Updated 2026-05-18 - 5 min read

QCAA official coverage - Specialist Mathematics 2025 v1.4

Exact syllabus points covered

  1. Understand the concept of the sample mean $\bar X$ as a random variable whose value varies between samples where $X$ is a random variable with mean $\mu$ and standard deviation $\sigma$.
  2. Use repeated random sampling data from a variety of distributions and a range of sample sizes to examine properties of the distribution of $\bar X$ across samples of a fixed size $n$, including its mean $\mu$, its standard deviation $\frac{\sigma}{\sqrt n}$, and its approximate normality if $n$ is large.
  3. Recognise and use the link between the normal distribution of the sample mean and the statistical notation $\bar X\sim N\left(\mu,\frac{\sigma^2}{n}\right)$.
  4. Use repeated random sampling data from a variety of distributions and a range of sample sizes to examine the approximate standard normality of $\frac{\bar X-\mu}{s/\sqrt n}$ for large samples, $n\ge30$, where $s$ is the sample standard deviation (Central limit theorem).
  5. Model and solve problems that involve sample means, with and without technology.

Mathematical Methods focuses heavily on sample proportions. Specialist extends the same inference idea to sample means, where each observation is numerical rather than just success or failure.

Suppose a random sample has observations:

$ X_1,X_2,\ldots,X_n. $

Here, "random sample" means that the observations are independent and identically distributed. Independent means one observation does not change the probability behaviour of another. Identically distributed means the observations come from the same population model, with the same mean and standard deviation.

For example, if $X_1,\ldots,X_{20}$ represent the results of rolling the same fair die twenty times, the variables have the same distribution and are independent. If the die changes halfway through, or if observations are chosen in a biased way, the usual sample-mean formulas no longer automatically apply.

The sample mean is:

$ \bar X=\frac{X_1+X_2+\cdots+X_n}{n}. $

The sample mean is a random variable

Before the sample is collected, $\bar X$ can vary from sample to sample. That means $\bar X$ is a random variable. After data is collected, the observed value is written as $\bar x$.
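A minimal Python sketch of this idea, assuming the fair-die example from earlier: three separate samples of twenty rolls each give three observed values of $\bar x$, and they generally differ. The seed and the helper name `roll_sample` are illustrative choices, not part of the syllabus.

```python
import random

rng = random.Random(1)  # arbitrary seed so the run is repeatable

def roll_sample(n):
    """Return n independent rolls of a fair six-sided die."""
    return [rng.randint(1, 6) for _ in range(n)]

# Each sample produces its own observed mean x-bar.
observed_means = []
for _ in range(3):
    sample = roll_sample(20)
    observed_means.append(sum(sample) / len(sample))

print(observed_means)
```

Before the rolls are made, each of these means is a value of the random variable $\bar X$. After the rolls, each printed number is one observed $\bar x$.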

If the population has mean $\mu$ and standard deviation $\sigma$, then:

$ E(\bar X)=\mu $

and

$ \operatorname{SD}(\bar X)=\frac{\sigma}{\sqrt n}. $

The standard deviation of the sample mean is called the standard error.
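As a rough illustration, the sketch below computes the standard error for the fair-die population at a few sample sizes. The sample sizes and the function name `standard_error` are assumptions chosen for the example.

```python
import math

# Population: one roll of a fair six-sided die.
faces = [1, 2, 3, 4, 5, 6]
mu = sum(faces) / len(faces)  # population mean, 3.5
variance = sum((x - mu) ** 2 for x in faces) / len(faces)  # population variance, 35/12
sigma = math.sqrt(variance)

def standard_error(sigma, n):
    """Standard deviation of the sample mean for a random sample of size n."""
    return sigma / math.sqrt(n)

for n in (5, 20, 80):
    print(n, round(standard_error(sigma, n), 4))
```

Because $n$ sits under a square root, multiplying the sample size by four halves the standard error.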

Distribution of sample means

Figure: Distribution of sample means.

This is a distribution across repeated samples, not a statement about one sample becoming less spread out. Individual observations still have standard deviation $\sigma$. The sample mean has smaller variation because averaging cancels some of the random variation from observation to observation.

Approximate normality

For large samples, the distribution of $\bar X$ is approximately normal:

$ \bar X\sim N\left(\mu,\frac{\sigma^2}{n}\right). $

If $\sigma$ is unknown, use the sample standard deviation $s$. For large samples, QCAA uses the approximate standardisation:

$ \frac{\bar X-\mu}{s/\sqrt n}\sim N(0,1)\ \text{approximately}, \quad n\ge30. $

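This standardised statistic is the bridge between sample data and probability calculations: it converts a sample mean into a value that can be compared with the standard normal distribution.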

The central limit theorem explains why this works for a wide range of population shapes. If $n$ is large, the distribution of $\bar X$ is often approximately normal even when the original population is not normal. The syllabus uses $n\ge30$ as the large-sample threshold for the approximate standard normality involving $s$.
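One way to examine this claim is a small simulation. The sketch below assumes a uniform population on $(0,1)$, which is not normal, and samples of size $n=30$. The population, seed and number of repeats are arbitrary choices for illustration.

```python
import math
import random
import statistics

rng = random.Random(7)  # arbitrary seed for repeatability

MU = 0.5        # mean of the uniform(0, 1) population
N = 30          # sample size at the syllabus threshold
REPEATS = 5000  # number of repeated samples (an arbitrary choice)

def standardised_mean(n):
    """Return (x_bar - mu) / (s / sqrt(n)) for one random sample."""
    sample = [rng.random() for _ in range(n)]
    x_bar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample SD, n - 1 denominator
    return (x_bar - MU) / (s / math.sqrt(n))

values = [standardised_mean(N) for _ in range(REPEATS)]

# Proportion of standardised values within 1.96 of zero.
inside = sum(abs(v) < 1.96 for v in values) / REPEATS
print(round(inside, 3))
```

For a standard normal variable, about 95% of values lie within $\pm1.96$. The simulated proportion should be close to that, often a little lower because $s$ is itself an estimate.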

Sample standard deviation

The sample standard deviation uses $n-1$ in the denominator. You usually calculate it with technology, but conceptually it measures how spread out the observed sample values are around their sample mean.

The reason $s$ appears in:

$ \frac{\bar X-\mu}{s/\sqrt n} $

is that the true population standard deviation $\sigma$ is often unknown. For large samples, $s$ is a reasonable estimate of $\sigma$, so the statistic can be treated as approximately standard normal.

The sample standard deviation is:

$ s=\sqrt{\frac{\sum_{i=1}^n(x_i-\bar x)^2}{n-1}}. $

The $n-1$ denominator is why calculator menus usually distinguish between sample standard deviation and population standard deviation. For inference, use the sample version unless the question explicitly gives the population standard deviation.
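The distinction can be seen directly in Python's standard `statistics` module, where `stdev` divides by $n-1$ and `pstdev` divides by $n$. The data values below are invented purely for illustration.

```python
import math
import statistics

data = [3.1, 4.7, 2.8, 5.0, 4.4]  # hypothetical observations

n = len(data)
x_bar = sum(data) / n

# Sample standard deviation computed from the formula above.
s_manual = math.sqrt(sum((x - x_bar) ** 2 for x in data) / (n - 1))

s = statistics.stdev(data)         # sample SD, divides by n - 1
sigma_n = statistics.pstdev(data)  # population SD, divides by n

print(s_manual, s, sigma_n)
```

The first two printed values agree, and the population version is slightly smaller because it divides by $n$ rather than $n-1$.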

Why the formulas work

Linearity of expectation gives:

$ E(\bar X) =E\left(\frac{X_1+\cdots+X_n}{n}\right) =\frac{E(X_1)+\cdots+E(X_n)}{n} =\mu. $

For independent observations, variances add:

$ \operatorname{Var}(X_1+\cdots+X_n)=n\sigma^2. $

Since $\bar X=\frac1n(X_1+\cdots+X_n)$ and $\operatorname{Var}(aX)=a^2\operatorname{Var}(X)$:

$ \operatorname{Var}(\bar X)=\frac1{n^2}n\sigma^2=\frac{\sigma^2}{n}. $

Taking the square root gives the standard error $\frac{\sigma}{\sqrt n}$.
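For a small case, both results can be checked exactly rather than by simulation. The sketch below assumes three rolls of a fair die and lists all $6^3=216$ equally likely samples, using exact fractions so there is no rounding.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
n = 3  # sample size: three rolls (chosen to keep the enumeration small)

# Population mean and variance of a single roll.
mu = Fraction(sum(faces), 6)
sigma_sq = sum((x - mu) ** 2 for x in faces) / 6

# Sample mean of every possible ordered sample of n rolls.
means = [Fraction(sum(rolls), n) for rolls in product(faces, repeat=n)]

mean_of_means = sum(means) / len(means)
var_of_means = sum((m - mean_of_means) ** 2 for m in means) / len(means)

print(mean_of_means == mu)            # E(X-bar) equals mu
print(var_of_means == sigma_sq / n)   # Var(X-bar) equals sigma^2 / n
```

Both comparisons hold exactly, matching $E(\bar X)=\mu$ and $\operatorname{Var}(\bar X)=\frac{\sigma^2}{n}$.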

Simulating sample means

A simulation makes the sampling distribution visible:

  1. Choose a population distribution.
  2. Generate a sample of size $n$.
  3. Calculate the sample mean.
  4. Repeat the process many times.
  5. Plot the resulting sample means.

For example, if individual waiting times are exponential with mean $4$, then single observations are right-skewed with standard deviation $4$ (for an exponential distribution the standard deviation equals the mean). But if you simulate many samples of size $1000$ and record each sample mean, the histogram of sample means is much tighter and more symmetric, with standard error $\frac{4}{\sqrt{1000}}\approx0.13$. Standardising those sample means using the standard error should produce values whose distribution is close to standard normal when the sample size is large.

Histogram of repeated sample means

Figure: Histogram of repeated sample means.

The important detail in a simulation is what the histogram is actually showing. It is not a histogram of all the raw waiting times. It is a histogram of many values of $\bar x$, one from each repeated sample. That is why the scale is much tighter than the original population distribution and why the central limit theorem becomes visible.
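The five steps can be sketched in Python for the exponential waiting-time example. This is one possible implementation using only the standard library. The seed, the number of repeats and the helper name `sample_mean` are assumptions, and the plotting step is replaced by summary statistics to keep the sketch short.

```python
import math
import random
import statistics

rng = random.Random(2024)  # arbitrary seed so the run is repeatable

MU = 4.0        # population mean of the waiting times
SIGMA = 4.0     # an exponential distribution's SD equals its mean
N = 1000        # sample size
REPEATS = 1000  # number of repeated samples (an arbitrary choice)

def sample_mean(n):
    """Generate one sample of n waiting times and return its mean."""
    # random.expovariate takes the rate, which is 1 / mean.
    sample = [rng.expovariate(1 / MU) for _ in range(n)]
    return statistics.fmean(sample)

# One value of x-bar per repeated sample.
means = [sample_mean(N) for _ in range(REPEATS)]

standard_error = SIGMA / math.sqrt(N)
z_values = [(m - MU) / standard_error for m in means]
share_inside = sum(abs(z) < 1.96 for z in z_values) / REPEATS

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:", round(statistics.stdev(means), 3))
print("theoretical standard error:", round(standard_error, 3))
print("share with |z| < 1.96:", share_inside)
```

The list `means` is the data a histogram would display. Its centre should sit near $4$, its spread near the theoretical standard error, and roughly 95% of the standardised values should fall within $\pm1.96$.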

Solving probability questions

For a known population standard deviation, standardise the sample mean using:

$ Z=\frac{\bar X-\mu}{\sigma/\sqrt n}. $

Then use the normal distribution to find the probability. If the question gives a sample standard deviation instead and the sample is large, use $s$ in place of $\sigma$.
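As a worked illustration with invented numbers, suppose $\mu=4$, $\sigma=4$ and $n=64$, and the question asks for $P(\bar X>5)$. The sketch below uses `NormalDist` from Python's standard `statistics` module in place of a calculator. All question values are hypothetical.

```python
import math
from statistics import NormalDist

mu = 4.0         # population mean (hypothetical question data)
sigma = 4.0      # known population standard deviation (hypothetical)
n = 64           # sample size (hypothetical)
threshold = 5.0  # find P(X-bar > 5)

se = sigma / math.sqrt(n)    # standard error: 4 / 8 = 0.5
z = (threshold - mu) / se    # standardised value: 2.0
p = 1 - NormalDist().cdf(z)  # upper-tail probability P(Z > 2)

print(round(p, 4))  # 0.0228
```

By hand, the same steps give $Z=\frac{5-4}{4/\sqrt{64}}=2$, so the answer is $P(Z>2)\approx0.0228$.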
