Inter-Operator Parallelism
Types of Partitioning
Shared data can be partitioned across machines in three ways:
Range
Great for equijoins, range queries, and group by
Hash
Great for equijoins and group by
Random (Round-Robin)
Good for spreading load evenly, but gives no way to locate a tuple by its key
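The three schemes can be sketched as functions that map a tuple to a machine number. This is a minimal illustration, not a reference implementation: the names `range_partition`, `hash_partition`, and `RoundRobinPartitioner`, the boundary values, and the choice of CRC32 as the hash function are all assumptions made for the example.

```python
import bisect
import zlib
from itertools import count


def range_partition(key, boundaries):
    """Return the machine for `key` under range partitioning.

    `boundaries` is a sorted list of upper bounds (hypothetical values).
    Machine i holds keys <= boundaries[i]; the last machine holds the rest.
    """
    return bisect.bisect_left(boundaries, key)


def hash_partition(key, num_machines):
    """Return the machine for `key` under hash partitioning.

    CRC32 is used here only because it is deterministic across runs;
    any reasonably uniform hash function would do.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_machines


class RoundRobinPartitioner:
    """Assign tuples to machines in rotation, ignoring the key entirely."""

    def __init__(self, num_machines):
        self.num_machines = num_machines
        self._counter = count()

    def assign(self, _key=None):
        return next(self._counter) % self.num_machines


if __name__ == "__main__":
    boundaries = [10, 20, 30]  # 4 machines: <=10, 11..20, 21..30, >30
    print([range_partition(k, boundaries) for k in (5, 15, 25, 35)])
    print(hash_partition("alice", 4) == hash_partition("alice", 4))
    rr = RoundRobinPartitioner(3)
    print([rr.assign() for _ in range(5)])
```

Range and hash partitioning send equal keys to the same machine, which is why both suit equijoins and group by. Only range partitioning keeps neighbouring keys together, which is what makes range queries cheap.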
Queries
Scan
Each machine scans its own partition in parallel; the results are merged before output
For a selection \(\sigma_p\), skip entire machines that have no tuples satisfying \(p\)
Search by Key
If the data is range-partitioned on the key, send the lookup only to the machine whose range contains the key.
Otherwise, have every machine do the lookup.
Insert
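If the data is partitioned on the key, insert at the machine that owns the key.
Otherwise, insert at any machine.
If the key field must be unique, first check for an existing tuple with that key: on the owning machine if the data is partitioned on the key, otherwise on every machine.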
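The query rules above can be illustrated with two toy in-memory tables, one range-partitioned on the key and one round-robin partitioned. This is only a sketch under stated assumptions: the class names, the method names, and the use of Python dicts and lists as stand-ins for machines are hypothetical, and real systems do this work over a network rather than in one process.

```python
import bisect


class RangePartitionedTable:
    """Toy table range-partitioned on an integer key."""

    def __init__(self, boundaries):
        self.boundaries = boundaries  # sorted upper bounds, one fewer than machines
        self.machines = [dict() for _ in range(len(boundaries) + 1)]

    def owner(self, key):
        return bisect.bisect_left(self.boundaries, key)

    def scan_range(self, lo, hi):
        """Selection lo <= key <= hi: visit only machines that can hold matches."""
        visited = list(range(self.owner(lo), self.owner(hi) + 1))
        rows = [(k, v)
                for m in visited
                for k, v in sorted(self.machines[m].items())
                if lo <= k <= hi]
        return rows, visited

    def lookup(self, key):
        # Only the owning machine is consulted.
        return self.machines[self.owner(key)].get(key)

    def insert_unique(self, key, value):
        # The uniqueness check is local: duplicates could only live on the owner.
        machine = self.machines[self.owner(key)]
        if key in machine:
            raise ValueError(f"duplicate key {key}")
        machine[key] = value


class RoundRobinTable:
    """Toy table whose placement ignores the key."""

    def __init__(self, num_machines):
        self.machines = [[] for _ in range(num_machines)]
        self._next = 0

    def lookup(self, key):
        # No machine can be ruled out, so every machine does the lookup.
        return [v for m in self.machines for k, v in m if k == key]

    def insert(self, key, value, unique=False):
        # A unique key forces a check of every machine before inserting.
        if unique and self.lookup(key):
            raise ValueError(f"duplicate key {key}")
        self.machines[self._next].append((key, value))
        self._next = (self._next + 1) % len(self.machines)


if __name__ == "__main__":
    t = RangePartitionedTable([10, 20, 30])
    for k, v in [(5, "a"), (15, "b"), (25, "c"), (35, "d")]:
        t.insert_unique(k, v)
    print(t.scan_range(12, 18))  # touches a single machine
    print(t.lookup(25))
```

The contrast between the two classes is the point of the example: with key-based partitioning, lookups and uniqueness checks touch one machine, while round-robin placement makes inserts trivial but turns every key-based operation into a broadcast.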