Design and Bias

Types of Data Sets#

The ideal data set is exhaustive: it contains every member of the population you care about. Such a data set is called a census.

More commonly, one works with a subset, or sample, of a census. Samples differ in how the data were collected and who participated:

**Chance:** Participants are selected at random.
**Self-selected:** People choose for themselves whether to participate.
**Administrative database:** A data set that was originally collected for administrative purposes and is reused because it is conveniently available.

Types of Samples#

Judgmental Sample#

Participants are chosen deliberately rather than by chance, so only the people that the selection method singles out can become participants.

Simple Random Sample#

A simple random sample (SRS) is a sample in which every member of the population has an equal chance of being chosen. Writing \(P_i\) for the probability that participant \(i\) is chosen,

\[P_1 = P_2 = \dots = P_n\]

An SRS is ideal, but it is not always possible to obtain one.
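As a minimal sketch of drawing an SRS, the snippet below uses Python's standard `random` module. The function name `simple_random_sample`, the population of ten numbered participants, and the sample size of 3 are assumptions made for illustration only.

```python
import random


def simple_random_sample(population, k, seed=None):
    """Draw k distinct members so that each member is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(list(population), k)


# Hypothetical population: participants numbered 1 through 10.
population = list(range(1, 11))
sample = simple_random_sample(population, k=3, seed=0)

# Under an SRS every participant has the same inclusion probability, k / n.
inclusion_probability = 3 / len(population)
print(sample, inclusion_probability)
```

Sampling is done without replacement here, so no participant can appear twice in the sample.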

Cluster Sample#

In a cluster sample, individuals are not sampled one by one. Instead, they are grouped into clusters, and whole clusters are chosen as participants.

For example, suppose odd-numbered participants are in group O and even-numbered participants are in group E. If we choose group O to form \(\text{Sample A}\), then every odd-numbered participant is certainly in the sample and every even-numbered participant is certainly not in the sample:

\[\text{Group O}: \{1, 3, 5, ..., O\}\]

\[\text{Group E}: \{2, 4, 6, ..., E\}\]

\[P_1 = P_3 = ... = P_O = 1\]

\[P_2 = P_4 = ... = P_E = 0\]

\(E\): the largest even participant number
\(O\): the largest odd participant number

Example in a sentence:

Group everyone by state, then randomly choose a state; everyone in the chosen state is in the sample.
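The odd/even example can be sketched in code as follows. This is only an illustration: the helper `cluster_sample`, the ten numbered participants, and the choice of a single cluster are assumptions, not part of the original notes.

```python
import random


def cluster_sample(clusters, n_clusters=1, seed=None):
    """Pick whole clusters at random and return every member of the chosen ones."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # choose cluster names
    members = [m for name in chosen for m in clusters[name]]
    return chosen, members


# Hypothetical participants 1..10, split into odd (O) and even (E) clusters.
participants = range(1, 11)
clusters = {
    "O": [p for p in participants if p % 2 == 1],
    "E": [p for p in participants if p % 2 == 0],
}

chosen, sample_a = cluster_sample(clusters, n_clusters=1, seed=1)
print(chosen, sample_a)
```

Once the cluster is chosen, membership in the sample is all or nothing: every member of the chosen cluster is included, and nobody from the other cluster is.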

Stratified Sample#

Like a cluster sample, a stratified sample first divides participants into groups. The difference is that no group is chosen or excluded as a whole: members of every group are eligible, and we sample from within each group. For example, one may decide to select exactly one participant from each group.
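A minimal sketch of selecting one participant per group is shown below. The helper `stratified_sample` and the odd/even strata of ten numbered participants are hypothetical and chosen only to mirror the grouping used for the cluster sample.

```python
import random


def stratified_sample(strata, per_stratum=1, seed=None):
    """Draw the same number of members at random from every stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, per_stratum)
        for name, members in strata.items()
    }


# Hypothetical strata: odd-numbered and even-numbered participants 1..10.
strata = {
    "O": [1, 3, 5, 7, 9],
    "E": [2, 4, 6, 8, 10],
}

picked = stratified_sample(strata, per_stratum=1, seed=2)
print(picked)
```

Unlike the cluster sample, every group is represented in the result, but only some members of each group are included.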