Cost Estimation
| Statistics | Description | 
|---|---|
| NTuples | # of tuples in a table | 
| NPages | # of disk pages in a table | 
| Low/High | min/max value in a column | 
| NKeys | # of distinct values in a column | 
| IHeight | Index height | 
| INPages | # of disk pages in an index | 
Selectivity
The ratio of the output size to the size of all possible outputs. It behaves like a probability: the chance that a given tuple satisfies the predicate.
- col = value \(P_\text{sel} = \frac{1}{\text{NKeys}(t)}\) 
- col1 = col2 \(P_\text{sel} = \frac{1}{\max(\text{NKeys}(t_1), \text{NKeys}(t_2))}\) 
- col > value \(P_\text{sel} = \frac{\text{High}(t) - \text{value}}{\text{High}(t) - \text{Low}(t) + 1}\) 
- Missing Information \(\rightarrow\) assume \(1/10\) 
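
The rules above can be sketched in Python. This is a minimal illustration, not the code of any particular optimizer: `ColumnStats` and the function names are hypothetical, the 1/10 fallback is the default from the list, and clamping the range estimate to [0, 1] is an added assumption for values outside [Low, High].

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional

# Default used when the needed statistics are missing (the 1/10 rule above).
DEFAULT_SELECTIVITY = Fraction(1, 10)


@dataclass
class ColumnStats:
    """Hypothetical per-column statistics; fields mirror the table above."""
    nkeys: Optional[int] = None  # NKeys: number of distinct values
    low: Optional[int] = None    # Low: minimum value
    high: Optional[int] = None   # High: maximum value


def sel_equals_value(col: ColumnStats) -> Fraction:
    """col = value: 1 / NKeys."""
    if not col.nkeys:
        return DEFAULT_SELECTIVITY
    return Fraction(1, col.nkeys)


def sel_equals_column(c1: ColumnStats, c2: ColumnStats) -> Fraction:
    """col1 = col2: 1 / max(NKeys(t1), NKeys(t2))."""
    if not c1.nkeys or not c2.nkeys:
        return DEFAULT_SELECTIVITY
    return Fraction(1, max(c1.nkeys, c2.nkeys))


def sel_greater_than(col: ColumnStats, value: int) -> Fraction:
    """col > value: (High - value) / (High - Low + 1)."""
    if col.low is None or col.high is None:
        return DEFAULT_SELECTIVITY
    sel = Fraction(col.high - value, col.high - col.low + 1)
    # Assumption: clamp so values outside [Low, High] still give a valid ratio.
    return min(max(sel, Fraction(0)), Fraction(1))
```

For example, with a column holding 50 distinct values between 18 and 67, `sel_equals_value` gives 1/50 and `sel_greater_than(col, 57)` gives (67 - 57) / (67 - 18 + 1) = 1/5.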
Joint Selectivity
Selectivity across two or more predicates or tables, combined with AND, OR, or JOIN, is calculated under the following two assumptions:
- The records within each bin or group of the table are uniformly distributed 
- Each predicate is independent of the other 
Thus, the three operations have the following selectivities:
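- AND \(P_\text{sel}(A \land B) = P_\text{sel}(A) \cdot P_\text{sel}(B)\) 
- OR \(P_\text{sel}(A \lor B) = P_\text{sel}(A) + P_\text{sel}(B) - P_\text{sel}(A) \cdot P_\text{sel}(B)\) 
- JOIN \(P_\text{sel} = \frac{1}{\max(\text{NKeys}(t_1), \text{NKeys}(t_2))}\), the same rule as col1 = col2 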
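
A minimal Python sketch of the three combinations, assuming the usual probability identities for independent events (product for AND, inclusion-exclusion for OR) and the col1 = col2 rule for an equijoin. The function names are hypothetical, and the cardinality estimate NTuples(t1) × NTuples(t2) × selectivity is the common textbook form rather than something stated in these notes.

```python
from fractions import Fraction


def sel_and(p1: Fraction, p2: Fraction) -> Fraction:
    """Both predicates hold; independence lets the selectivities multiply."""
    return p1 * p2


def sel_or(p1: Fraction, p2: Fraction) -> Fraction:
    """At least one predicate holds; subtract the overlap counted twice."""
    return p1 + p2 - p1 * p2


def sel_join(nkeys1: int, nkeys2: int) -> Fraction:
    """Equijoin on col1 = col2: 1 / max(NKeys(t1), NKeys(t2))."""
    return Fraction(1, max(nkeys1, nkeys2))


def join_cardinality(ntuples1: int, ntuples2: int,
                     nkeys1: int, nkeys2: int) -> Fraction:
    """Assumed estimate: size of the cross product times the join selectivity."""
    return ntuples1 * ntuples2 * sel_join(nkeys1, nkeys2)
```

With selectivities 1/10 and 1/5, AND gives 1/50 and OR gives 7/25. Joining a 1000-tuple table (100 distinct keys) with a 200-tuple table (50 distinct keys) is estimated at 1000 × 200 / 100 = 2000 tuples.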