Implement a more efficient version of the function by using a word of data type unsigned long to pack copies of c (eight copies on a machine where unsigned long is 8 bytes), and then step through the region using word-level writes. You might find it helpful to do additional loop unrolling as well. On our reference machine, we were able to reduce the CPE from 1.00 for the straightforward implementation to 0.127. That is, the program is able to write about 8 bytes every clock cycle (1/0.127 ≈ 7.9).

Here are some additional guidelines. To ensure portability, let K denote the value of sizeof(unsigned long) for the machine on which you run your program. You may not call any library functions.
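One possible shape for such a function is sketched below. It assumes the function has the standard memset-style signature void *f(void *s, int c, size_t n); the name word_memset is hypothetical, and this is not the reference solution. It packs K copies of c into one unsigned long without assuming K == 8, writes single bytes until the destination is K-aligned, stores whole words in a loop unrolled four ways, and then finishes any leftover bytes one at a time. No library functions are called; the headers only supply types and CHAR_BIT.

```c
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* size_t */
#include <stdint.h>   /* uintptr_t */

/* Hypothetical name; a sketch of word-level memset, not the reference solution. */
void *word_memset(void *s, int c, size_t n)
{
    const size_t K = sizeof(unsigned long);   /* portable word size */
    unsigned char cb = (unsigned char) c;     /* memset semantics: low byte of c */
    unsigned char *bytes = (unsigned char *) s;

    /* Pack K copies of cb into one word; works for any K. */
    unsigned long word = 0;
    for (size_t i = 0; i < K; i++)
        word = (word << CHAR_BIT) | cb;

    /* Byte writes until the address is a multiple of K (or n runs out). */
    while (n > 0 && (uintptr_t) bytes % K != 0) {
        *bytes++ = cb;
        n--;
    }

    /* Main loop: aligned word writes, unrolled 4 ways. */
    unsigned long *words = (unsigned long *) bytes;
    while (n >= 4 * K) {
        words[0] = word;
        words[1] = word;
        words[2] = word;
        words[3] = word;
        words += 4;
        n -= 4 * K;
    }

    /* Remaining full words (fewer than 4). */
    while (n >= K) {
        *words++ = word;
        n -= K;
    }

    /* Tail: fewer than K bytes left. */
    bytes = (unsigned char *) words;
    while (n > 0) {
        *bytes++ = cb;
        n--;
    }
    return s;
}
```

Aligning to a multiple of K before the word loop is a design choice: it avoids unaligned word stores, which are slow or illegal on some machines, at the cost of up to K - 1 extra byte writes at each end. Storing through an unsigned long pointer into a buffer declared as a char array can run afoul of C's strict-aliasing rules, so treat this as an illustration of the technique rather than production code.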