Lorenzo La Corte - S4784539 - 2023/2024 - Università degli Studi di Genova
The goal of this homework is to parallelise and vectorise the following program, an implementation of the Discrete Fourier Transform (DFT) algorithm.
This report focuses on code analysis and on the iterative process used to speed up the application. In particular, it aims to discuss the following points:
Before starting, I set up the environment in which icc is installed (on the laboratory machines) by opening a shell and executing the command:
source /opt/intel/oneapi/setvars.sh
Compiling the program without an explicit optimization level and running it produces these results:
$ icc -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000
DFTW computation in 1.021012 seconds
Xre[0] = 10000.000000
icc optimizations comparison
Changing the optimization level is a useful first experiment:
$ icc -O0 -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000
DFTW computation in 14.842891 seconds
Xre[0] = 10000.000000
$ icc -O1 -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000
DFTW computation in 3.886813 seconds
Xre[0] = 10000.000000
$ icc -O2 -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000
DFTW computation in 1.020945 seconds
Xre[0] = 10000.000000
$ icc -O3 -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000
DFTW computation in 1.025665 seconds
Xre[0] = 10000.000000
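The largest drop in computation time is between -O0 and -O1 (about 3.8x). -O2 cuts the time by a further factor of about 3.8 and matches the first run, consistent with -O2 being the icc default when no level is given. From -O2 to -O3 the time does not improve; it increases slightly: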
| Optimization level | Time (s) |
|---|---|
| -O0 | 14.842891 |
| -O1 | 3.886813 |
| -O2 | 1.020945 |
| -O3 | 1.025665 |
This is the only part of the code with a quadratic complexity: