4d66e8c12e
The loop uses a 32-bit accumulator, but the current code would zero only its lower 16 bits.
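
To illustrate the failure mode, here is a minimal C sketch. It is not the code touched by this commit: the helper names (`clear_low16`, `clear_full32`, `sum_bytes`) and the stale value are hypothetical, and the 16-bit clear is modeled with a mask as an assumption about how the partial zeroing behaves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the partial clear: only the low 16 bits are zeroed,
 * so whatever was in the upper 16 bits survives. */
static uint32_t clear_low16(uint32_t acc) {
    return acc & 0xFFFF0000u;
}

/* Hypothetical model of the intended clear: all 32 bits are zeroed. */
static uint32_t clear_full32(uint32_t acc) {
    (void)acc;
    return 0u;
}

/* A loop that accumulates into a 32-bit value, starting from `acc`. */
static uint32_t sum_bytes(uint32_t acc, const uint8_t *data, size_t n) {
    for (size_t i = 0; i < n; i++) {
        acc += data[i];
    }
    return acc;
}

/* Compares both clears against a stale accumulator value (assumed here). */
static void check_accumulator_clearing(void) {
    const uint8_t data[] = {1, 2, 3};
    const uint32_t stale = 0xDEADBEEFu;

    /* Partial clear: the stale upper half leaks into the result. */
    assert(sum_bytes(clear_low16(stale), data, 3) == 0xDEAD0006u);

    /* Full clear: the result is just the sum of the inputs. */
    assert(sum_bytes(clear_full32(stale), data, 3) == 6u);
}
```

Calling `check_accumulator_clearing()` passes both assertions: with the partial clear the sum carries the stale upper 16 bits, while the full clear yields the plain sum.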