Implementing block matrix multiplication with MPI

5/28/2023

This paper focuses on Intel Advanced Vector Extensions (AVX), a SIMD instruction set available on recent Intel and AMD processors. AVX instructions operate on a chunk of data elements at once rather than one element at a time, and they support a variety of applications such as image processing. Our goal is to accelerate and optimize square single-precision matrix multiplication for matrix sizes from 2080 to 4512. The optimization combines AVX instructions, OpenMP parallelization, and memory access optimization to overcome bandwidth limitations. The paper differs from other work by concentrating on these few main techniques and the results they produce. It also presents guidelines for parallel implementations of these algorithms, in which the characteristics of the target architecture must be taken into account. In addition, it compares single-core and multicore platforms, and two popular compilers: the Intel C++ Compiler 17.0 and the Microsoft Visual Studio 2015 C++ compiler.
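To make the combination of memory access optimization and OpenMP parallelization more concrete, here is a minimal sketch of a cache-blocked matrix multiply in C++. It is not the paper's implementation: the function name `blocked_matmul`, the row-major layout, and the default tile size of 64 are assumptions chosen for illustration. It uses no AVX intrinsics and no MPI; the inner loop is simply written with unit stride so that a compiler can vectorize it when AVX code generation is enabled.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the paper): computes C += A * B for square
// n x n single-precision matrices stored in row-major order, working on
// tiles of `block` x `block` elements so that each tile stays cache-resident.
void blocked_matmul(const std::vector<float>& A,
                    const std::vector<float>& B,
                    std::vector<float>& C,
                    std::size_t n,
                    std::size_t block = 64) {
    assert(A.size() == n * n && B.size() == n * n && C.size() == n * n);
    assert(block > 0);

    const std::ptrdiff_t N = static_cast<std::ptrdiff_t>(n);
    const std::ptrdiff_t bs = static_cast<std::ptrdiff_t>(block);

    // Each iteration of the outer loop owns a distinct band of rows of C,
    // so threads never write to the same element. Without OpenMP enabled,
    // the pragma is ignored and the loop runs serially.
    #pragma omp parallel for schedule(static)
    for (std::ptrdiff_t ii = 0; ii < N; ii += bs) {
        const std::ptrdiff_t i_end = std::min(ii + bs, N);
        for (std::ptrdiff_t kk = 0; kk < N; kk += bs) {
            const std::ptrdiff_t k_end = std::min(kk + bs, N);
            for (std::ptrdiff_t jj = 0; jj < N; jj += bs) {
                const std::ptrdiff_t j_end = std::min(jj + bs, N);
                // Multiply one tile of A by one tile of B.
                for (std::ptrdiff_t i = ii; i < i_end; ++i) {
                    for (std::ptrdiff_t k = kk; k < k_end; ++k) {
                        const float a = A[i * N + k];
                        // Unit-stride access to B and C: a candidate for
                        // compiler auto-vectorization.
                        for (std::ptrdiff_t j = jj; j < j_end; ++j) {
                            C[i * N + j] += a * B[k * N + j];
                        }
                    }
                }
            }
        }
    }
}
```

With GCC or Clang, flags such as `-O3 -mavx -fopenmp` enable vectorization and threading; MSVC and the Intel compiler on Windows use `/arch:AVX` and `/openmp` (or `/Qopenmp`) for the same purpose. The best tile size depends on the cache sizes of the target machine and would need to be tuned by measurement.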
