Design and Bias#
Types of Data Sets#
The ideal data set is exhaustive: it contains everything you need from every member of the population. Such a data set is called a census.
More commonly, one works with a subset, or sample, of a census. Samples come in several types, depending on how the data were collected and who participated:
**Chance:** Participants are selected at random.
**Self-selected:** People participate by their own choice.
**Administrative database:** A data set of convenience, originally collected for administrative purposes.
Types of Samples#
Judgmental Sample#
Participants are chosen by judgment rather than by chance, so only certain people can become participants, as determined by the selection method.
Simple Random Sample#
A simple random sample (SRS) is a sample in which every individual has an equal chance of being chosen; more precisely, every possible sample of a given size is equally likely.
An SRS is the ideal, but it is not always possible to obtain.
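As a rough illustration of the definition, the sketch below draws an SRS with Python's standard `random` module. The participant IDs, the sample size, and the helper name `simple_random_sample` are assumptions made up for this example.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)  # a seed only makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population: participant IDs 1 through 10.
participants = list(range(1, 11))
sample = simple_random_sample(participants, 3, seed=0)
print(sample)
```

Sampling is done without replacement, so no participant can appear in the sample twice.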
Cluster Sample#
In a cluster sample, individuals are not sampled one at a time. Instead, they are grouped into clusters, and whole clusters are chosen as participants.
For example, suppose odd-numbered participants are in group O and even-numbered participants are in group E. If we choose group O to form \(\text{Sample A}\), then every odd-numbered participant is certainly in the sample and every even-numbered participant is certainly not.
\(E\): Number of even-numbered participants
\(O\): Number of odd-numbered participants
Example in a Sentence:
Grouping everyone by state and randomly choosing a state; everyone in the chosen state is in the sample.
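The odd/even example can be sketched in Python as follows. This is a minimal illustration, assuming ten hypothetical participant IDs and a single cluster chosen at random; the helper name `cluster_sample` is invented for this example.

```python
import random


def cluster_sample(clusters, seed=None):
    """Pick one cluster at random and take every member of it."""
    rng = random.Random(seed)
    chosen = rng.choice(sorted(clusters))  # choose a cluster label
    return chosen, list(clusters[chosen])


# Hypothetical participants 1..10, split into odd (O) and even (E) clusters.
participants = range(1, 11)
clusters = {
    "O": [p for p in participants if p % 2 == 1],
    "E": [p for p in participants if p % 2 == 0],
}

chosen_group, sample_a = cluster_sample(clusters, seed=0)
print(chosen_group, sample_a)
```

Whichever cluster is chosen, every one of its members is in the sample and no member of the other cluster is.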
Stratified Sample#
Like a cluster sample, a stratified sample groups participants. However, instead of choosing whole groups, we select participants from within every group, so members of any group can end up in the sample. For example, one may decide to select only one participant per group.
Example in a Sentence:
Grouping everyone by state and randomly choosing a participant from each state.
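A stratified draw of this kind might look like the sketch below. The state names, participant names, and the helper `stratified_sample` are all hypothetical, and one participant per group is only one possible choice.

```python
import random


def stratified_sample(strata, per_stratum=1, seed=None):
    """Randomly pick `per_stratum` members from every group."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, per_stratum)
        for name, members in strata.items()
    }


# Hypothetical groups: participants keyed by state.
strata = {
    "State A": ["Ana", "Ben", "Cy"],
    "State B": ["Dee", "Eli"],
    "State C": ["Fay", "Gus", "Hal", "Ivy"],
}

picked = stratified_sample(strata, per_stratum=1, seed=0)
print(picked)
```

Unlike the cluster approach, every group is represented in the result, but only some members of each group are included.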