Lorenzo La Corte - S4784539 - 2023/2024 - Università degli Studi di Genova
The goal of this project is to parallelize and accelerate the mandelbrot program using OpenMP and CUDA. This report focuses on the iterative process followed to achieve both optimizations. In particular, it aims to discuss the following points:
mandelbrot.cu, with a comparison between different possible configurations and also against the OpenMP version.

Before starting, I set up the environment in which icc is installed (on the laboratory machines) by opening a shell and executing the command:
source /opt/intel/oneapi/setvars.sh
For the third part, I also need to set up the environment for the NVIDIA HPC SDK (Software Development Kit), which contains the nvc++ compiler, through the commands:
NVARCH=`uname -s`_`uname -m`; export NVARCH; NVCOMPILERS=/opt/nvidia/hpc_sdk; export NVCOMPILERS; MANPATH=$MANPATH:$NVCOMPILERS/$NVARCH/23.7/compilers/man; export MANPATH; PATH=$NVCOMPILERS/$NVARCH/23.7/compilers/bin:$PATH; export PATH;
mandelbrot.cpp Metrics and Performance

First, I benchmark the time spent by the program without enabling any kind of optimization. To gather further insights, I also use Intel Advisor:
$ icc mandelbrot.cpp -diag-disable=10441 -o mandelbrot && ./mandelbrot out.txt
Time elapsed: 13 seconds.
$ icc mandelbrot.cpp -O3 -g -o mandelbrot_profiling && ./mandelbrot_profiling
icc: remark #10441: The Intel(R) C++ Compiler Classic (ICC) is deprecated and will be removed from product release in the second half of 2023. The Intel(R) oneAPI DPC++/C++ Compiler (ICX) is the recommended compiler moving forward. Please transition to use this compiler. Use '-diag-disable=10441' to disable this message.
Time elapsed: 13 seconds.
Please specify the output file as a parameter.
$ advix-gui
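The elapsed time above is reported in whole seconds. As a minimal sketch of how a finer-grained measurement could be taken, assuming the computation can be wrapped in a callable, the snippet below uses `std::chrono::steady_clock`. The names `elapsed_seconds` and `dummy_workload` are hypothetical and do not come from mandelbrot.cpp.

```cpp
#include <chrono>

// Hypothetical helper (not part of mandelbrot.cpp): runs `work` once and
// returns the wall-clock time it took, in seconds, as a double.
template <typename Work>
double elapsed_seconds(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}

// Hypothetical stand-in for the real computation: sum of squares below n.
long long dummy_workload(int n) {
    long long sum = 0;
    for (int i = 0; i < n; ++i) {
        sum += static_cast<long long>(i) * i;
    }
    return sum;
}
```

Wrapping the rendering loop in a lambda and passing it to `elapsed_seconds` would give sub-second resolution, which makes small improvements between successive versions easier to see.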
The analysis is conducted on the sequential version of the program:
| Metric | Value | 
|---|---|
| Program Elapsed Time | 13.10s | 
| Vector Instruction Set | SSE2, SSE | 
| Number of CPU Threads | 1 | 
The survey already makes it clear that the compiler did not vectorize any loop:
| Metrics | Total | 
|---|---|
| CPU Time | 13.09s (100%) | 
| Time in scalar code | 13.09s (100%) | 
| Vectorization Gain/Efficiency | Not Available (No vectorized loops found or not enough data) | 
| Function Call Sites and Loops | Total Time % | Total Time | Self Time | Why No Vectorization? | 
|---|---|---|---|---|
| [loop in main at mandelbrot.cpp:33] | 100% | 13.090s | 13.090s | outer loop was not auto-vectorized: consider using SIMD directive | 
Both the 100% share of time spent in scalar code and the absence of vectorized loops confirm this. For the outer loop at mandelbrot.cpp:33, Advisor suggests a SIMD directive, that is, a compiler hint requesting that the loop be vectorized, as a way to improve the efficiency of the code.
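To illustrate what such a directive looks like, here is a minimal sketch that applies OpenMP's `#pragma omp simd` to an outer loop over the pixels of one image row. It is not the code from mandelbrot.cpp: the function `mandelbrot_row`, its parameters, and the branch-free inner loop are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical sketch (not the loop from mandelbrot.cpp): for each pixel of
// one row, counts the iterations during which |z|^2 <= 4 still held.
// Build with OpenMP SIMD support, e.g. -qopenmp-simd (icc) or -fopenmp-simd
// (gcc/clang); without it the pragma is ignored and the loop stays scalar.
std::vector<int> mandelbrot_row(double y, double x_min, double x_max,
                                int width, int max_iter) {
    assert(width > 1 && max_iter > 0);
    std::vector<int> row(width);
    int* out = row.data();
    const double step = (x_max - x_min) / (width - 1);

    // Ask the compiler to vectorize across pixels (the outer loop).
    #pragma omp simd
    for (int px = 0; px < width; ++px) {
        const double cx = x_min + px * step;
        double zx = 0.0, zy = 0.0;
        int count = 0;
        // Fixed trip count and no early exit: a data-dependent break would
        // make the lanes diverge and usually hinders vectorization.
        for (int it = 0; it < max_iter; ++it) {
            const double zx2 = zx * zx;
            const double zy2 = zy * zy;
            const bool inside = (zx2 + zy2) <= 4.0;
            const double new_zx = zx2 - zy2 + cx;
            const double new_zy = 2.0 * zx * zy + y;
            // Freeze z once the point has escaped, so it stays escaped.
            zx = inside ? new_zx : zx;
            zy = inside ? new_zy : zy;
            count += inside ? 1 : 0;
        }
        out[px] = count;
    }
    return row;
}
```

The trade-off in this sketch is that every pixel always runs `max_iter` iterations, so the vectorized loop does more arithmetic per pixel than a scalar loop with an early exit. Whether it is faster in practice has to be measured.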