performance - ifort -ipo flag strange behavior


I have the following code for testing the Intel MKL daxpy routine.

    program test
        implicit none
        integer, parameter :: n = 50000000
        integer, parameter :: nloop = 100
        real(8), dimension(:), allocatable :: a, b
        integer start_t, end_t, rate, i

        allocate(a(n))
        allocate(b(n))

        ! Case 1: handwritten OpenMP subroutine
        a = 1.0d0
        b = 2.0d0
        call system_clock(start_t, rate)
        do i = 1, nloop
            call sumarray(a, b, a, 3.0d0, n)
        end do
        call system_clock(end_t)
        print *, sum(a)
        print *, "sumarray time: ", real(end_t-start_t)/real(rate)

        ! Case 2: MKL daxpy (a = 3*b + a)
        a = 1.0d0
        b = 2.0d0
        call system_clock(start_t, rate)
        do i = 1, nloop
            call daxpy(n, 3.0d0, b, 1, a, 1)
        end do
        call system_clock(end_t)
        print *, sum(a)
        print *, "daxpy time: ", real(end_t-start_t)/real(rate)

        ! Case 3: naive whole-array expression
        a = 1.0d0
        b = 2.0d0
        call system_clock(start_t, rate)
        do i = 1, nloop
            a = a + 3.0d0*b
        end do
        call system_clock(end_t)
        print *, sum(a)
        print *, "a + 3*b time: ", real(end_t-start_t)/real(rate)
    end program test

    subroutine sumarray(x, y, z, alfa, n)
        implicit none
        integer n, i
        real(8) x(n), y(n), z(n), alfa
        !$omp parallel do
        do i = 1, n
            z(i) = x(i) + alfa*y(i)
        end do
        !$omp end parallel do
    end subroutine sumarray

Here, sumarray is a handwritten subroutine with OpenMP that does something similar to daxpy. When I compile the code with ifort test.f90 -o test -O3 -openmp -mkl, the results are (approximately):

    sumarray time: 5.7 sec
    daxpy time: 5.7 sec
    a + 3*b time: 1.9 sec

However, when I compile with ifort test.f90 -o test -O3 -openmp -mkl -ipo, the results for a + 3*b change dramatically:

    sumarray time: 5.7 sec
    daxpy time: 5.7 sec
    a + 3*b time: 9.3 sec

So firstly, why is the naive array sum better than MKL? And why does -ipo slow down the naive array sum?

Also, what bothers me is that when I eliminate the loops, that is, when I time each operation just once, the times are those of the first case divided by 1000 (around 5.7 ms for sumarray and daxpy, and 9.3 ms for a + 3*b), regardless of whether I use -ipo. My guess is that the naive sum inside a loop allows the compiler to optimize further, but the -ipo flag messes up this optimization.

Note: I know that -ipo is useless in this case, since there is only a single file.
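For reference, the single-pass variant mentioned above might look roughly like the sketch below. It is not the exact code that was timed: it covers only the a + 3*b case so that it builds without MKL or OpenMP, and the program name single_pass is made up for illustration. The array size is the one from the program above.

```fortran
! Sketch only: times one pass of a = a + 3*b, with no outer repeat loop.
program single_pass
    implicit none
    integer, parameter :: n = 50000000
    real(8), dimension(:), allocatable :: a, b
    integer :: start_t, end_t, rate

    allocate(a(n))
    allocate(b(n))

    a = 1.0d0
    b = 2.0d0

    call system_clock(start_t, rate)
    a = a + 3.0d0*b              ! one pass over the arrays, no repeat loop
    call system_clock(end_t)

    ! Using the result afterwards keeps the update from being discarded.
    print *, sum(a)
    print *, "a + 3*b single pass: ", real(end_t - start_t)/real(rate)

    deallocate(a, b)
end program single_pass
```

Building this once with and once without -ipo, using otherwise the same flags as above, gives a direct comparison for the single-pass case, separate from whatever the compiler does with the surrounding do loop in the full program.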

