Lorenzo La Corte - S4784539 - 2023/2024 - Università degli Studi di Genova


The goal of this homework is to parallelise and vectorise the following program, an implementation of the Discrete Fourier Transform (DFT) algorithm.

This report focuses on code analysis and on the iterative process used to speed up the application. In particular, it discusses the following points:

  1. hotspot identification,
  2. possible vectorization issues,
  3. scalability using a proper number of threads on the laboratory workstation.

Setup

Before starting, I set up the environment for icc (which is installed on the laboratory machines) by opening a shell and running:

source /opt/intel/oneapi/setvars.sh
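As an optional sanity check that is not part of the homework, a tiny C file can report whether it was really built by icc and with OpenMP enabled, using only predefined macros: __INTEL_COMPILER is defined by the classic icc compiler and _OPENMP by any compiler invoked with OpenMP enabled. The function name print_build_info is a hypothetical choice for this sketch.

```c
#include <stdio.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints which compiler and OpenMP support this file was built with and
 * returns 1 if OpenMP is enabled, 0 otherwise. __INTEL_COMPILER is
 * predefined by the classic icc compiler; _OPENMP is predefined by any
 * compiler invoked with OpenMP enabled (icc -qopenmp, gcc -fopenmp, ...). */
int print_build_info(FILE *out)
{
#ifdef __INTEL_COMPILER
    fprintf(out, "compiler: icc, __INTEL_COMPILER = %d\n", __INTEL_COMPILER);
#else
    fprintf(out, "compiler: not classic icc\n");
#endif

#ifdef _OPENMP
    fprintf(out, "OpenMP: enabled (_OPENMP = %d), max threads = %d\n",
            _OPENMP, omp_get_max_threads());
    return 1;
#else
    fprintf(out, "OpenMP: disabled, omp pragmas are ignored\n");
    return 0;
#endif
}
```

Built with icc -qopenmp, both lines should report icc and OpenMP; if the OpenMP line says disabled, the -qopenmp flag did not take effect.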

Running the Program

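Compiling the program with OpenMP enabled (-qopenmp) and no explicit optimization flag, then running it, produces these results (the -diag-disable=10441 option only silences icc's remark that the Classic compiler is deprecated):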

$ icc -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000 
DFTW computation in 1.021012 seconds
Xre[0] = 10000.000000
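The reported time is a wall-clock measurement of the DFT computation. The homework's own timing code is not reproduced here; the sketch below assumes the common OpenMP approach, omp_get_wtime(), with a POSIX clock_gettime fallback when OpenMP is disabled. The helpers wall_time and time_workload are hypothetical names.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdio.h>
#include <time.h>

#ifdef _OPENMP
#include <omp.h>
#endif

/* Wall-clock time in seconds. With OpenMP this uses omp_get_wtime();
 * otherwise it falls back to POSIX clock_gettime(CLOCK_MONOTONIC). */
double wall_time(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + 1e-9 * (double)ts.tv_nsec;
#endif
}

/* Hypothetical helper: times one call of a workload and reports it in a
 * style similar to the homework output. Returns the elapsed seconds. */
double time_workload(void (*workload)(void), FILE *out)
{
    double start = wall_time();
    workload();
    double elapsed = wall_time() - start;
    fprintf(out, "computation in %f seconds\n", elapsed);
    return elapsed;
}
```

Wall-clock time is the right metric for speedup: the standard clock() function measures processor time, which on Linux accumulates across all threads of the process, so a well-parallelised run would not appear faster.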

Comparison of icc Optimization Levels

A first experiment is to change the optimization level:

$ icc -O0 -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000 
DFTW computation in 14.842891 seconds
Xre[0] = 10000.000000 

$ icc -O1 -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000 
DFTW computation in 3.886813 seconds
Xre[0] = 10000.000000

$ icc -O2 -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000 
DFTW computation in 1.020945 seconds
Xre[0] = 10000.000000

$ icc -O3 -qopenmp omp_homework.c -diag-disable=10441 -o output
$ ./output
DFTW calculation with N = 10000 
DFTW computation in 1.025665 seconds
Xre[0] = 10000.000000 

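The largest gain in absolute terms is from -O0 to -O1: about 11 s less, a speedup of about 3.8. -O2 is again about 3.8 times faster than -O1, plausibly helped by icc enabling auto-vectorization from -O2 onward; the default build above (1.021012 s) performs like -O2, consistent with -O2 being icc's default level. Going from -O2 to -O3 brings no benefit: the time increases slightly, by about 5 ms (0.5%), a difference small enough to be run-to-run noise. The measured times are: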

Optimization level   Time (s)
-O0                  14.842891
-O1                   3.886813
-O2                   1.020945
-O3                   1.025665
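The comparison can be made quantitative by dividing times: the speedup of a level is T(-O0) / T(level). The sketch below only hard-codes the four measurements from the table (no new data) and prints speedups relative to -O0 and to the previous level; the function names are hypothetical.

```c
#include <stdio.h>

/* Measured -O0..-O3 timings from the table above, in seconds. */
static const char *const levels[] = { "-O0", "-O1", "-O2", "-O3" };
static const double times_s[] = { 14.842891, 3.886813, 1.020945, 1.025665 };
enum { NUM_LEVELS = 4 };

/* Speedup of level i relative to the unoptimised -O0 build. */
double speedup_vs_O0(int i)
{
    return times_s[0] / times_s[i];
}

/* Speedup of level i relative to the previous level (requires i >= 1). */
double speedup_vs_previous(int i)
{
    return times_s[i - 1] / times_s[i];
}

/* Prints one line per optimized level. */
void print_speedups(FILE *out)
{
    for (int i = 1; i < NUM_LEVELS; i++)
        fprintf(out, "%s: %.3fx vs -O0, %.3fx vs previous level\n",
                levels[i], speedup_vs_O0(i), speedup_vs_previous(i));
}
```

On these numbers the -O3 versus -O2 ratio comes out just below 1, which is the slight slowdown noted above.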

1. Hotspot Identification

This is the only part of the code with quadratic complexity (O(N^2) in the number of samples N):