Statistics

Data

  • Continuous data can take any value within a range; e.g. height.
  • Discrete data can only take distinct, separate values; e.g. the number of people taller than 2 meters.

Outlier

  • Any extreme data value, either very small or very large compared to the rest of the data set, is said to be an outlier.

Sampling techniques

  • Simple random sampling: every possible sample of a given size has an equal chance of being selected; often not practical.
  • Convenience sampling: respondents are chosen based on their availability; practical; can introduce bias, especially if the group is small.
  • Systematic sampling: participants are taken at regular intervals from a list of the population; more practical than simple random sampling, but still not very practical in general.
  • Stratified sampling: the population is split into groups based on factors relevant to the research, then a random sample is taken from each group in proportion to the size of that group; creates a representative sample; not very practical.
  • Quota sampling: the population is split into groups based on factors relevant to the research, then convenience sampling is used within each group until the required number of participants is found; creates a representative sample; can introduce bias if the sample contains only similar members.
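
Three of the techniques above can be sketched in Python. This is only an illustration: the function names, the made-up population of 100 ID numbers and the two strata are assumptions, not part of the syllabus.

```python
import random

def simple_random_sample(population, n, rng):
    # Every possible sample of size n is equally likely.
    return rng.sample(population, n)

def systematic_sample(population, step, rng):
    # Pick a random starting point, then take every step-th member of the list.
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, n, rng):
    # Take a random sample from each group, in proportion to the group's size.
    # Note: with awkward group sizes the rounded shares may not add up to exactly n.
    total = sum(len(group) for group in strata.values())
    sample = []
    for group in strata.values():
        share = round(n * len(group) / total)
        sample.extend(rng.sample(group, share))
    return sample

rng = random.Random(0)               # fixed seed so the run is repeatable
population = list(range(100))        # hypothetical list of 100 ID numbers
strata = {"group A": list(range(60)), "group B": list(range(60, 100))}

random_sample = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 10, rng)   # 6 from group A, 4 from group B
print(random_sample, systematic, stratified, sep="\n")
```

Convenience and quota sampling are left out because they depend on who happens to be available, which a random number generator does not model well.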

Mean

  • Mean = total sum of the values / number of values
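
As a quick check of the formula, here is a minimal Python sketch; the function name and the height values are made up for illustration.

```python
def mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

heights_cm = [150, 160, 170, 180]   # made-up data
print(mean(heights_cm))             # 660 / 4 = 165.0
```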

Interquartile range

  • IQR = Q3 – Q1
  • Q3 = upper quartile
  • Q1 = lower quartile
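
Quartile conventions differ between textbooks and calculators, so the following Python sketch is only one possibility. It assumes the median-of-halves method, where the median is left out of both halves when the number of values is odd. The function name `quartiles` and the data are illustrative.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method (an assumption)."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]                                  # values below the median
    upper = data[half:] if n % 2 == 0 else data[half + 1:]  # values above the median
    return median(lower), median(data), median(upper)

data = [1, 3, 4, 6, 7, 9, 12]    # made-up data
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)               # 3 9 6
```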

Outlier using IQR

  • To identify outliers, we use the following formula.
  • The data value x is an outlier if x < Q1 - 1.5(Q3 - Q1) or x > Q3 + 1.5(Q3 - Q1).
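
The rule can be applied directly once Q1 and Q3 are known. In this Python sketch the function name `find_outliers` and the data are assumptions, and the quartiles are passed in by hand so that the result does not depend on any particular quartile convention.

```python
def find_outliers(values, q1, q3):
    """Return the values lying more than 1.5 x IQR below Q1 or above Q3."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]

data = [1, 3, 4, 6, 7, 9, 30]    # made-up data with Q1 = 3 and Q3 = 9
# Fences: 3 - 1.5 * 6 = -6 and 9 + 1.5 * 6 = 18, so only 30 lies outside.
outliers = find_outliers(data, q1=3, q3=9)
print(outliers)                  # [30]
```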

Exam Tip

This formula is not included in the formula booklet, so make sure you have it memorized before the exam.

Effects of constants on data

  • Adding a constant, k, to every data value will:
    • Increase the mean, median and mode by k (a decrease if k is negative)
    • Not change the standard deviation or IQR
  • Multiplying every data value by a constant, k, will:
    • Multiply the mean, median and mode by k
    • Multiply the standard deviation and IQR by |k| (measures of spread are never negative)
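
These effects are easy to verify numerically. The Python sketch below uses made-up data and an arbitrary positive constant k = 10, and checks the standard deviation only (the IQR behaves in the same way).

```python
from math import isclose
from statistics import mean, median, mode, pstdev

data = [2, 4, 4, 6, 9]             # made-up values
k = 10                             # arbitrary positive constant

shifted = [x + k for x in data]    # add k to every value
scaled = [x * k for x in data]     # multiply every value by k

# Adding k moves the averages by k and leaves the spread unchanged.
print(mean(shifted) == mean(data) + k)             # True
print(median(shifted) == median(data) + k)         # True
print(mode(shifted) == mode(data) + k)             # True
print(isclose(pstdev(shifted), pstdev(data)))      # True

# Multiplying by k multiplies both the averages and the spread by k.
print(mean(scaled) == mean(data) * k)              # True
print(median(scaled) == median(data) * k)          # True
print(isclose(pstdev(scaled), k * pstdev(data)))   # True
```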

Box and whisker diagrams

  • Shows five values: the smallest value, lower quartile (Q1), median, upper quartile (Q3) and largest value.
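
The five values needed to draw the diagram can be collected with Python's standard library, as in the rough sketch below. The function name `five_number_summary` and the data are illustrative, and the choice of `method="exclusive"` in `statistics.quantiles` is an assumption; other quartile conventions can give slightly different Q1 and Q3.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest) for a data set."""
    q1, q2, q3 = quantiles(values, n=4, method="exclusive")
    return min(values), q1, q2, q3, max(values)

data = [1, 3, 4, 6, 7, 9, 12]    # made-up data
summary = five_number_summary(data)
print(summary)                   # (1, 3.0, 6.0, 9.0, 12)
```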