Design and Bias

Types of Data Sets

The best data set is an exhaustive one that contains every member of the population you are studying. This ideal data set is called a census.

More commonly, one works with a subset, or sample, of the population. Samples come in several types, depending on how they were collected and who participated:

  • **Chance:** Participants are selected at random.

  • **Self-selected:** People choose for themselves whether to participate.

  • **Administrative database:** A convenient data set formed from records that were originally collected for administrative purposes.

Types of Samples

Judgmental Sample

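Participants are chosen by the judgment of the person collecting the data rather than by chance, so only the people that this selection method picks out are able to become participants.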

Simple Random Sample

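A simple random sample (SRS) is a sample drawn so that every possible group of the chosen size is equally likely to be selected. In particular, every individual in the population has the same chance of being chosen:

\[P_1 = P_2 = ... = P_N\]

  • \(P_i\) : Probability that participant \(i\) is in the sample

  • \(N\) : Number of people in the population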
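As a worked check on the equal-probability claim, suppose (as an illustrative assumption) that the population has \(N\) people and the SRS has size \(n\), drawn without replacement so that every group of \(n\) people is equally likely. Counting the groups that contain a particular participant \(i\) shows that the common probability is \(n/N\):

\[P_i = \frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!} = \frac{n}{N}, \quad i = 1, 2, ..., N\]

For instance, an SRS of 3 people from a population of 10 gives each person a \(3/10 = 0.3\) chance of being selected.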

An SRS is the ideal sample, but it is not always possible to collect one.
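A minimal Python sketch of drawing an SRS, assuming a hypothetical population of participants numbered 1 to 10. It relies on the standard library's `random.sample`, which draws without replacement so that every subset of the requested size is equally likely, and then estimates each \(P_i\) by repeating the draw many times. `simple_random_sample` and `estimate_inclusion` are illustrative names, not part of any library.

```python
import random
from collections import Counter


def simple_random_sample(population, n, rng=random):
    """Draw an SRS of size n without replacement.

    random.sample makes every subset of size n equally likely.
    """
    return rng.sample(population, n)


def estimate_inclusion(population, n, trials=20_000, seed=0):
    """Estimate P_i, the chance each participant lands in the sample."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(trials):
        counts.update(simple_random_sample(population, n, rng))
    return {person: counts[person] / trials for person in population}


if __name__ == "__main__":
    population = list(range(1, 11))  # hypothetical participants 1..10
    print(simple_random_sample(population, 3, random.Random(42)))
    # Each estimate should be close to n / N = 3 / 10 = 0.3
    for person, p in estimate_inclusion(population, 3).items():
        print(person, round(p, 3))
```

Every estimate should land near 0.3 and none should be singled out, which is exactly the condition that all the \(P_i\) are equal.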

Cluster Sample

In a cluster sample, individuals are not selected one at a time. Instead, the population is divided into groups, or clusters, and whole clusters are chosen, typically at random; everyone in a chosen cluster becomes a participant.

For example, suppose odd-numbered participants are in group O and even-numbered participants are in group E. If we choose group O to form \(\text{Sample A}\), then every odd-numbered participant is certainly in the sample and every even-numbered participant is certainly not. Writing \(P_i\) for the probability that participant \(i\) is in \(\text{Sample A}\):

\[\text{Group O}: \{1, 3, 5, ..., O\}\]
\[\text{Group E}: \{2, 4, 6, ..., E\}\]
\[P_1 = P_3 = ... = P_O = 1\]
\[P_2 = P_4 = ... = P_E = 0\]

  • \(E\) : Highest-numbered even participant

  • \(O\) : Highest-numbered odd participant
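The odd/even example can be reproduced with a minimal Python sketch. It assumes a hypothetical population of 10 participants, and the helper names are illustrative. Before any group is drawn, picking one of the two groups at random gives every participant a \(1/2\) chance; once group O has been chosen, the probabilities become 1 for odd-numbered participants and 0 for even-numbered ones.

```python
def make_groups(num_participants):
    """Split participants 1..num_participants into odd group O and even group E."""
    people = range(1, num_participants + 1)
    return {
        "O": [p for p in people if p % 2 == 1],
        "E": [p for p in people if p % 2 == 0],
    }


def inclusion_before_choice(groups):
    """P_i before drawing, if exactly one group is picked uniformly at random.

    Each participant belongs to exactly one group, so P_i = 1 / (number of groups).
    """
    share = 1 / len(groups)
    return {p: share for members in groups.values() for p in members}


def inclusion_given_cluster(groups, chosen):
    """P_i once the chosen group is known: 1 if i is in it, otherwise 0."""
    in_sample = set(groups[chosen])
    return {p: (1 if p in in_sample else 0)
            for members in groups.values() for p in members}


if __name__ == "__main__":
    groups = make_groups(10)  # hypothetical population of 10 participants
    print(inclusion_before_choice(groups))       # every value is 0.5
    print(inclusion_given_cluster(groups, "O"))  # Sample A = group O
```

The contrast with an SRS is that, once the cluster is drawn, individual probabilities are all-or-nothing rather than equal.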

Example in a Sentence:

Grouping everyone by state, randomly choosing one state, and including everyone who lives in that state.
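The state example can be sketched in Python as follows. The residents and states are hypothetical, `cluster_sample` is an illustrative name, and the sketch assumes each person belongs to exactly one cluster. Whole clusters are drawn with the standard library's `random.sample`, and every member of a drawn cluster joins the sample.

```python
import random


def cluster_sample(clusters, num_clusters=1, rng=random):
    """Pick whole clusters at random and return every member of each.

    clusters: dict mapping a cluster name to a list of its members.
    Returns (chosen cluster names, list of sampled members).
    """
    chosen = rng.sample(list(clusters), num_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


if __name__ == "__main__":
    # Hypothetical residents grouped by state
    by_state = {
        "Ohio": ["Ana", "Ben"],
        "Texas": ["Cai", "Dee", "Eli"],
        "Utah": ["Fay"],
    }
    states, sample = cluster_sample(by_state, rng=random.Random(7))
    print(states, sample)
```

With the default `num_clusters=1`, one state is drawn and all of its residents are included, while residents of the other states have no chance of appearing.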

Stratified Sample

Like a cluster sample, a stratified sample first divides participants into groups (called strata). Instead of selecting whole groups, however, we select participants from within every group, so every member of every group has a chance of being chosen. For example, one may decide to randomly select only one participant from each group.

Example in a Sentence:

Grouping everyone by state and randomly choosing one participant from each state.
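A matching sketch for the stratified example, again with hypothetical states and residents and an illustrative function name. It draws `per_stratum` members at random from every stratum (one by default), so every state is represented in the sample.

```python
import random


def stratified_sample(strata, per_stratum=1, rng=random):
    """Randomly draw per_stratum members from every stratum.

    strata: dict mapping a stratum name to a list of its members.
    Every stratum is represented, and within a stratum each member
    has the same chance of being drawn.
    """
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


if __name__ == "__main__":
    # Hypothetical residents grouped by state
    by_state = {
        "Ohio": ["Ana", "Ben"],
        "Texas": ["Cai", "Dee", "Eli"],
        "Utah": ["Fay"],
    }
    print(stratified_sample(by_state, rng=random.Random(7)))
```

Because each stratum contributes the same number of people here, members of small strata are more likely to be picked than members of large ones; in this data the lone Utah resident is always chosen. Note that `random.sample` raises `ValueError` if `per_stratum` exceeds a stratum's size.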