When blast encounters the equation a = a * b * 4, the assembly that Burst executes depends on whether the data is aligned (only the important parts are shown).

To multiply two variables in a datasegment that is aligned:


The result is then multiplied by a constant held in a register of the interpreter. Blast knows to execute the move only once for all records combined. In future versions, for ssmd, values will be treated as constant when they are equal across all records in the input. This will enable big optimizations that are not normally possible with pre-compiled code.
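As a rough illustration of the idea (not blast's actual code), the sketch below loads the constant once, outside the per-record loop, and reuses it for every record. The function name `mul_all_records` and the flat layout of `a` and `b` are assumptions made for this example only.

```c
#include <stddef.h>

/* Hypothetical layout: a[i] and b[i] hold the two variables of record i. */
void mul_all_records(float *a, const float *b, size_t record_count)
{
    /* The constant is set up once for all records combined,
       instead of being reloaded for each record. */
    const float k = 4.0f;

    for (size_t i = 0; i < record_count; ++i) {
        a[i] = a[i] * b[i] * k;   /* a = a * b * 4 for record i */
    }
}
```

Calling it with four records where `a = {1, 2, 3, 4}` and every `b` is 2 multiplies each `a[i]` by 8 in place.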


The datasegment and stack are set up as arrays of arrays and are said to be aligned when the base pointers of the subarrays are all the same offset apart. If that's not the case, we call the data unaligned and blast has to calculate each base index, resulting in unpacked instructions but still very fast code:
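To make the alignment condition concrete, here is a minimal C sketch of how such a check could look. It is not blast's implementation. The names `rows_are_aligned` and `mul_records` and the indices `ia` and `ib` are hypothetical, and the aligned path assumes all records live in one contiguous buffer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true when every pair of consecutive base pointers is separated by
   the same byte offset. On success the offset is written to *stride_out. */
bool rows_are_aligned(float *const *rows, size_t row_count, ptrdiff_t *stride_out)
{
    if (row_count < 2) {
        *stride_out = 0;
        return true;
    }
    intptr_t stride = (intptr_t)rows[1] - (intptr_t)rows[0];
    for (size_t i = 2; i < row_count; ++i) {
        if ((intptr_t)rows[i] - (intptr_t)rows[i - 1] != stride) {
            return false;   /* offsets differ: data is unaligned */
        }
    }
    *stride_out = (ptrdiff_t)stride;
    return true;
}

/* Computes a = a * b * 4 for every record; ia and ib are the positions of
   a and b inside a record (assumed layout for this sketch). */
void mul_records(float *const *rows, size_t row_count, size_t ia, size_t ib)
{
    ptrdiff_t stride;
    if (rows_are_aligned(rows, row_count, &stride)) {
        /* Aligned: one base pointer plus a fixed stride reaches every record. */
        char *base = (char *)rows[0];
        for (size_t r = 0; r < row_count; ++r) {
            float *rec = (float *)(base + (ptrdiff_t)r * stride);
            rec[ia] = rec[ia] * rec[ib] * 4.0f;
        }
    } else {
        /* Unaligned: each record's base pointer is looked up separately. */
        for (size_t r = 0; r < row_count; ++r) {
            float *rec = rows[r];
            rec[ia] = rec[ia] * rec[ib] * 4.0f;
        }
    }
}
```

In the aligned branch the address of each record follows from a single base and stride, which is the property that lets a compiler emit packed loads. The unaligned branch needs one pointer lookup per record.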


And that's it. The above script executes in 1.7 ns on my machine if I feed it 64 data records.