performance - ifort -ipo flag strange behavior
I have the following code, which tests the Intel MKL daxpy routine.
    program test
      implicit none
      integer, parameter :: n = 50000000
      integer, parameter :: nloop = 100
      real(8), dimension(:), allocatable :: a, b
      integer start_t, end_t, rate, i

      allocate(a(n))
      allocate(b(n))

      a = 1.0d0
      b = 2.0d0
      call system_clock(start_t, rate)
      do i = 1, nloop
        call sumarray(a, b, a, 3.0d0, n)
      end do
      call system_clock(end_t)
      print *, sum(a)
      print *, "sumarray time: ", real(end_t-start_t)/real(rate)

      a = 1.0d0
      b = 2.0d0
      call system_clock(start_t, rate)
      do i = 1, nloop
        call daxpy(n, 3.0d0, b, 1, a, 1)
      end do
      call system_clock(end_t)
      print *, sum(a)
      print *, "daxpy time: ", real(end_t-start_t)/real(rate)

      a = 1.0d0
      b = 2.0d0
      call system_clock(start_t, rate)
      do i = 1, nloop
        a = a + 3.0d0*b
      end do
      call system_clock(end_t)
      print *, sum(a)
      print *, "a + 3*b time: ", real(end_t-start_t)/real(rate)
    end program test

    subroutine sumarray(x, y, z, alfa, n)
      implicit none
      integer n, i
      real(8) x(n), y(n), z(n), alfa
      !$omp parallel do
      do i = 1, n
        z(i) = x(i) + alfa*y(i)
      end do
      !$omp end parallel do
    end subroutine sumarray
Here, sumarray is a handwritten subroutine with OpenMP, similar to daxpy. When I compile the code with

    ifort test.f90 -o test -O3 -openmp -mkl

the results are (approximately):

    sumarray time: 5.7 sec
    daxpy time: 5.7 sec
    a + 3*b time: 1.9 sec
However, when I compile with

    ifort test.f90 -o test -O3 -openmp -mkl -ipo

the results for a + 3*b change dramatically:

    sumarray time: 5.7 sec
    daxpy time: 5.7 sec
    a + 3*b time: 9.3 sec
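As a rough sanity check on these numbers, here is the effective memory bandwidth each timing implies. This assumes each pass reads a and b once and writes a once (8-byte reals), and it ignores caching and write-allocate traffic:

$$
\begin{aligned}
\text{bytes per pass} &= 3 \times 8\,\mathrm{B} \times 5\times10^{7} = 1.2\times10^{9}\,\mathrm{B} \\
\text{bytes for } 100 \text{ passes} &= 1.2\times10^{11}\,\mathrm{B} \\
1.2\times10^{11}\,\mathrm{B} / 5.7\,\mathrm{s} &\approx 21\,\mathrm{GB/s} \\
1.2\times10^{11}\,\mathrm{B} / 1.9\,\mathrm{s} &\approx 63\,\mathrm{GB/s} \\
1.2\times10^{11}\,\mathrm{B} / 9.3\,\mathrm{s} &\approx 13\,\mathrm{GB/s}
\end{aligned}
$$

Suppose roughly 63 GB/s is more than the machine can sustain from main memory. In that case, the 1.9 s result would suggest that the compiler avoids part of that memory traffic in the looped a + 3*b case, rather than simply streaming the arrays faster.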
So, firstly, why is the naive array sum faster than MKL? And why does the naive array sum slow down when I add -ipo?

Also, something else bothers me. When I eliminate the loops, that is, when I time each operation only once, the times are those of the first case divided by 1000 (around 5.7 ms for sumarray and daxpy, and 9.3 ms for a + 3*b), regardless of whether I use -ipo. My guess is that having the naive sum inside a loop lets the compiler optimize further, and the -ipo flag interferes with that optimization. Note: I know that -ipo is useless in this case, since there is only a single file.
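The guess above could refer to several transformations. One of them is loop interchange. The outer `do i = 1, nloop` loop only repeats `a = a + 3.0d0*b`, and the elements are independent, so a compiler is allowed to swap the two loops. Each element is then loaded once, updated nloop times in a register, and stored once. The sketch below puts the two loop orders side by side. The module and procedure names (`axpy_variants`, `sweep_outer`, `sweep_inner`) are hypothetical. Whether ifort actually applies this transformation, or a partial or blocked form of it, with or without -ipo, is an assumption that the timings alone cannot confirm.

```fortran
! Hypothetical sketch: two loop orders for "repeat a = a + alfa*b nloop times".
! It is an ASSUMPTION that ifort performs anything like sweep_inner on its own.
module axpy_variants
  implicit none
  integer, parameter :: dp = kind(1.0d0)
contains

  ! Order as written in the question: nloop full sweeps over the arrays,
  ! so a and b are streamed through memory once per sweep.
  subroutine sweep_outer(a, b, alfa, nloop)
    real(dp), intent(inout) :: a(:)
    real(dp), intent(in)    :: b(:)
    real(dp), intent(in)    :: alfa
    integer,  intent(in)    :: nloop
    integer :: k
    do k = 1, nloop
      a = a + alfa*b
    end do
  end subroutine sweep_outer

  ! Interchanged order: each element is loaded once, updated nloop times
  ! in a local temporary, and stored once.
  subroutine sweep_inner(a, b, alfa, nloop)
    real(dp), intent(inout) :: a(:)
    real(dp), intent(in)    :: b(:)
    real(dp), intent(in)    :: alfa
    integer,  intent(in)    :: nloop
    integer :: j, k
    real(dp) :: t
    do j = 1, size(a)
      t = a(j)
      do k = 1, nloop
        t = t + alfa*b(j)
      end do
      a(j) = t
    end do
  end subroutine sweep_inner

end module axpy_variants
```

Each element's update depends only on that element, so both orders apply the same sequence of updates to every element. For example, with a = 1, b = 2, alfa = 3 and nloop = 100, every element ends up as 601 either way. The interchanged order touches main memory far less often.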