Speedup

There are many ways to speed up a program. We’ll use matrix operations as the classic example. This section does not cover parallel processing.

Unrolling

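Do several iterations' worth of work in each pass through the loop. This cuts loop overhead (counter updates and branch checks) and exposes independent instructions that can occupy more registers at once, at the cost of a larger binary.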

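result = 0
n = len(arr)
limit = n - n % 4  # largest multiple of 4 not exceeding n
# Unrolled by 4: one loop check per four additions
for i in range(0, limit, 4):
    result += arr[i]
    result += arr[i+1]
    result += arr[i+2]
    result += arr[i+3]
# Handle the 0-3 leftover elements so we never index past the end
for i in range(limit, n):
    result += arr[i]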
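
The loop above adds everything into a single variable, so each addition still waits on the previous one. A common variant, sketched here as an illustration rather than as part of the original notes, gives each unrolled statement its own accumulator so the four additions do not depend on each other. The function name `unrolled_sum` and the unroll factor of 4 are assumptions made for this example. In interpreted Python the effect on speed is negligible; the pattern matters mainly in compiled languages.

```python
def unrolled_sum(arr):
    """Sum arr using a 4-way unrolled loop with independent accumulators."""
    s0 = s1 = s2 = s3 = 0
    n = len(arr)
    limit = n - n % 4  # largest multiple of 4 not exceeding n

    # Each statement updates a different accumulator, so no addition
    # in the loop body depends on another one in the same iteration.
    for i in range(0, limit, 4):
        s0 += arr[i]
        s1 += arr[i + 1]
        s2 += arr[i + 2]
        s3 += arr[i + 3]

    total = s0 + s1 + s2 + s3

    # Leftover elements when len(arr) is not a multiple of 4.
    for i in range(limit, n):
        total += arr[i]
    return total
```

Note that with floating-point data the result can differ slightly from a plain left-to-right sum, because the additions are grouped differently.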

Cache Blocking

Improve temporal locality by reordering data accesses so that data already loaded into the cache is reused before it is evicted.

  • For NxN matrix operations, one method is to split the matrices into sub-matrices
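
As a rough sketch of the sub-matrix idea, the following multiplies two NxN matrices one block at a time. The function name `blocked_matmul` and the `block` parameter are hypothetical, chosen for this example; in practice the block size would be tuned so that the blocks being worked on fit in cache together. Plain Python lists are used only to keep the example self-contained, so this shows the access pattern and not a real performance gain.

```python
def blocked_matmul(a, b, block=2):
    """Multiply two NxN matrices (lists of lists) using cache blocking."""
    n = len(a)
    c = [[0] * n for _ in range(n)]

    # The outer three loops pick one block of c, a, and b at a time.
    for ii in range(0, n, block):
        for jj in range(0, n, block):
            for kk in range(0, n, block):
                # The inner three loops do an ordinary matrix multiply
                # restricted to the current blocks; min() handles the
                # ragged edge when n is not a multiple of block.
                for i in range(ii, min(ii + block, n)):
                    for j in range(jj, min(jj + block, n)):
                        acc = c[i][j]
                        for k in range(kk, min(kk + block, n)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c
```

For example, `blocked_matmul(m, identity)` returns a matrix equal to `m`, the same as an unblocked triple loop would. Only the order in which elements are visited changes.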