Sample size

As described in "Sampling distributions" (under Inferential statistics), a
larger sample contains more information. Hence, a parameter (like the
population mean) can be estimated with more precision (a lower standard
error) as the sample size increases.
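
As a rough illustration of this point, the sketch below computes the standard
error of the sample mean, sigma / sqrt(n), for a few sample sizes. The function
name and the value sigma = 0.725 are assumptions chosen only for illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean for a sample of size n."""
    return sigma / math.sqrt(n)

# sigma = 0.725 is an assumed value, used only for illustration
for n in (10, 40, 160):
    print(n, round(standard_error(0.725, n), 4))
```

Each fourfold increase in n halves the standard error, so precision improves
with sample size, but with diminishing returns.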

The number of observations in a sample helps to control the probability of
making a Type II error (the probability of failing to reject a false null
hypothesis).

For the purpose of hypothesis testing about the population mean, the following
rule is applied in practice.

Is n large (n>=30)?
    no  -- Is the population
           approximately normal?
            no  -- increase sample size
                   to 30 or more
            yes -- Is the value of
                   sigma known?
                   no  -- estimate sigma -- use the t-distribution
                   yes -- use the normal distribution (z)
    yes -- Is the value of
           sigma known?
           no  -- estimate sigma -- use the normal distribution (z)
           yes -- use the normal distribution (z)
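
The rule above can be written as a small function. This is only a sketch of the
decision tree as drawn; the function name, argument names, and returned labels
are hypothetical and chosen for illustration.

```python
def choose_distribution(n, population_normal, sigma_known):
    """Follow the decision rule for tests about the population mean.

    Returns "z", "t", or "increase n" (hypothetical labels).
    """
    if n >= 30:
        # Large sample: use z whether sigma is known or estimated
        return "z"
    if not population_normal:
        # Small sample from a non-normal population
        return "increase n"
    # Small sample from an approximately normal population
    return "z" if sigma_known else "t"

print(choose_distribution(n=12, population_normal=True, sigma_known=False))
```

For a sample of 12 from an approximately normal population with sigma unknown,
the rule leads to the t-distribution.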

The determination of an "appropriate" sample size depends on the parameter in
question. For the population mean the following information is necessary:

1. The population standard deviation (or some estimate of it, e.g., the range
   divided by four).
2. The degree of confidence.
3. Some specified bound on the error of estimation (how far the sample mean
   may be from the population mean).

Therefore, the estimate of the sample size can be obtained by applying the
following formula:

    n = ( z(a/2) * sigma / bound ) ** 2

where z(a/2) is the value from the table of the standard normal distribution
that leaves an area of alpha over two (a/2) in the upper tail (alpha is one
minus the degree of confidence), sigma is the population standard deviation,
and bound is the largest acceptable difference between the sample mean and the
population mean. The whole expression is squared (** 2).


Illustration.

Suppose we are interested in estimating the mean GPA at UWF, with 95%
confidence, to within 0.25 of a point.

z(a/2) = z(0.025) = 1.96

bound = 0.25

Suppose that sigma = 0.725 (for instance, assuming that the lowest GPA is 1.1
and the highest is 4.0, the range would be 4 - 1.1 = 2.9, and 2.9/4 = 0.725).

The estimated sample size would be:

 n = (1.96 * 0.725 / 0.25) ** 2 = 32.308

Rounding up, we need to sample at least 33 students.
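
The same calculation can be sketched in Python. The function and argument names
below are hypothetical; the z value is taken from the standard library's
NormalDist rather than from a table, so it is slightly more precise than 1.96,
and the result is rounded up to a whole number of observations.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, bound, confidence=0.95):
    """Sample size needed to estimate a mean to within +/- bound."""
    alpha = 1.0 - confidence
    # z(a/2): standard normal value with area alpha/2 in the upper tail
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Round up, since a fraction of an observation cannot be sampled
    return math.ceil((z * sigma / bound) ** 2)

# GPA illustration: sigma = 0.725, bound = 0.25, 95% confidence
print(required_sample_size(sigma=0.725, bound=0.25, confidence=0.95))  # 33
```

Halving the bound to 0.125 roughly quadruples the required sample size, because
the bound appears squared in the denominator.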