Optimizing Max Kernel Operation in IDL

Thursday, December 01, 2016

I found this optimization question on the comp.lang.idl-pvwave, and decided to give it a try. The question was to implement an algorithm that replaces every element in a 2-D array with its neighborhood maximum value. In this case the neighborhood size was 101x101 (i.e. -50 to +50), and the array size was 3200x3248. The nested FOR loop approach looks like the following code snippet.

  data = randomu(seed,3200,3248)
  dim = size(data,/dimension)
  nx = dim[0]
  ny = dim[1]
 
  t0 = tic('Nested FOR')
  result2 = data
  for i=0,nx-1 do begin
    for j=0,ny-1 do begin
      result2[i,j] = max(data[(i-50)>0:(i+50)<(nx-1),(j-50)>0:(j+50)<(ny-1)])
    endfor
  endfor
  toc,t0
 

My first thought was to use the > operator which returns the maximum of 2 arguments. It operates on arrays, and in conjunction with the SHIFT function it serves to return the larger of 2 neighbors. The other trick here is that since we are looking for a 101x101 neighborhood maximum, we can use a combination of smaller neighborhood maxima as input in a structured way in order to achieve the exact 101x101 neighborhood size. The code that I ended up with after some trial and error was the following.

  t0 = tic('Iterative >')
  ; Using SHIFT and > in an iterative way
  padded = replicate(min(data),size(data,/dimension)+100)
  padded[50,50] = data
  tmp3 = shift(padded,1,0) > padded > shift(padded,-1,0)
  tmp9 = shift(tmp3,3,0) > tmp3 > shift(tmp3,-3,0)
  tmp27 = shift(tmp9,9,0) > tmp9 > shift(tmp9,-9,0)
  tmp81 = shift(tmp27,27,0) > tmp27 > shift(temporary(tmp27),-27,0)
  tmp99 = shift(tmp9,44,0) > temporary(tmp81) > shift(temporary(tmp9),-44,0)
  tmp101 = shift(tmp3,49,0) > temporary(tmp99) > shift(temporary(tmp3),-49,0)
 
  ; Same for Y-dim
  tmp3 = shift(tmp101,0,1) > tmp101 > shift(temporary(tmp101),0,-1)
  tmp9 = shift(tmp3,0,3) > tmp3 > shift(tmp3,0,-3)
  tmp27 = shift(tmp9,0,9) > tmp9 > shift(tmp9,0,-9)
  tmp81 = shift(tmp27,0,27) > tmp27 > shift(temporary(tmp27),0,-27)
  tmp99 = shift(tmp9,0,44) > temporary(tmp81) > shift(temporary(tmp9),0,-44)
  tmp101 = shift(tmp3,0,49) > temporary(tmp99) > shift(temporary(tmp3),0,-49)
 
  result1 = (temporary(tmp101))[50:50+nx-1,50:50+ny-1]
  toc,t0
 

I didn’t say that optimized code always looks pretty, but the goal here is to run fast. Adding in some result comparison checking to make sure the results are equivalent.

 
  print, array_equal(result1,result2) ? 'Results are matching' : 'SOMETHING went wrong'
 

Finally, here are the results, which yielded an impressive amount of speed-up, the execution time went from 189.4 seconds to 1.4 seconds, and the results are identical:

  % Time elapsed Nested FOR: 189.39470 seconds.
  % Time elapsed Iterative >: 1.4241931 seconds.
  Results are matching
 
 
 
Rate this article:
No rating
Tags:

Number of views (912)/Comments (0)

Please login or register to post comments.

MONTHLY ARCHIVE

MOST POPULAR POSTS

AUTHORS

Authors

Authors

Authors

Authors

Authors

Authors

Authors

Authors

Authors

Authors

GUEST AUTHORS

Authors

Authors

Authors

Authors

Authors

Authors

Authors



© 2017 Exelis Visual Information Solutions, Inc., a subsidiary of Harris Corporation