Ruby: create a tarball in chunks to avoid an out-of-memory error


I'm trying to re-use the following code to create a tarball:

```ruby
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar", "w")
Gem::Package::TarWriter.new(tarfile) do |tar|
  Dir[File.join(path, "**/*")].each do |file|
    mode = File.stat(file).mode
    relative_file = file.sub /^#{Regexp::escape path}\/?/, ''
    if File.directory?(file)
      tar.mkdir relative_file, mode
    else
      tar.add_file relative_file, mode do |tf|
        File.open(file, "rb") { |f| tf.write f.read }
      end
    end
  end
end
tarfile.rewind
tarfile
```

It works fine so far with small folders, but anything involving large files fails with the following error:

Error: Application used more memory than the safety cap

How can I do this in chunks to avoid the memory problems?

It looks like the problem is in this line:

```ruby
File.open(file, "rb") { |f| tf.write f.read }
```

You're "slurping" the input file by doing `f.read`. Slurping means the entire file is read into memory at once, which isn't scalable at all; it's the result of calling `read` without a length.

Instead, I'd read and write the file in blocks so memory usage stays consistent. This reads in blocks of 1024 * 1000 bytes (roughly 1MB); you can adjust that to your own needs:

```ruby
BLOCKSIZE_TO_READ = 1024 * 1000

File.open(file, "rb") do |fi|
  while buffer = fi.read(BLOCKSIZE_TO_READ)
    tf.write buffer
  end
end
```
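The same loop can be pulled out into a small helper to check that it copies everything while holding only one block in memory at a time. This is a sketch: `copy_in_blocks` is a hypothetical name (not part of Ruby or RubyGems), and `StringIO` objects stand in for the real input file and the tar entry stream.

```ruby
require 'stringio'

# Hypothetical helper: copy everything from +src+ to +dst+ in fixed-size
# blocks and return the number of bytes copied. At most one block of
# +blocksize+ bytes is held in memory at any time.
def copy_in_blocks(src, dst, blocksize = 1024 * 1000)
  copied = 0
  # read(blocksize) returns nil at EOF, which ends the loop
  while (buffer = src.read(blocksize))
    dst.write(buffer)
    copied += buffer.bytesize
  end
  copied
end

src = StringIO.new("a" * 2_500_000)   # stand-in for a large input file
dst = StringIO.new                    # stand-in for the tar entry stream
total = copy_in_blocks(src, dst)

puts total                              # prints 2500000
puts dst.string.bytesize == 2_500_000   # prints true
```

With the default block size, a 2,500,000-byte input is read as two full 1,024,000-byte blocks followed by one 452,000-byte block.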

Here's what the documentation says about `IO#read`:

If length is a positive integer, it tries to read length bytes without any conversion (binary mode). It returns nil or a string whose length is 1 to length bytes. nil means it met EOF at the beginning. A string of 1 to length-1 bytes means it met EOF after reading the result. A string of length bytes means it didn't meet EOF. The resulting string is always ASCII-8BIT encoded.
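To see those return values concretely, here's a small sketch that writes ten known bytes to a temporary file (the file name and sizes are arbitrary choices for the demo) and reads them back four bytes at a time.

```ruby
require 'tempfile'

chunks = []
encodings = []

Tempfile.create('read-demo') do |tmp|
  tmp.binmode
  tmp.write('0123456789')
  tmp.flush

  File.open(tmp.path, 'rb') do |f|
    4.times do
      chunk = f.read(4)          # "0123", "4567", then the short "89", then nil
      chunks << chunk
      encodings << chunk.encoding if chunk
    end
  end
end

p chunks   # prints ["0123", "4567", "89", nil]
```

The two full reads return exactly 4 bytes, the third hits EOF partway through and returns the remaining 2 bytes, and the fourth returns `nil` because it met EOF at the beginning. Every non-nil chunk is ASCII-8BIT.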

An additional problem is that it looks like you're not opening the output file correctly:

```ruby
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar", "w")
```

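You're writing in "text" mode because of "w". Instead, you need to write in binary mode, "wb", because a tarball is binary data: "wb" turns off newline translation (which would corrupt the archive on Windows) and sets the external encoding to ASCII-8BIT.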
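A quick way to see the difference between the two modes is to ask the opened file. This sketch uses a throwaway file in a temporary directory (an assumption for the demo); the bytes written include a NUL, a 0xFF, and both CR LF and a bare LF, which are exactly the kind of bytes text mode may alter.

```ruby
require 'tmpdir'

text_binmode = bin_binmode = bin_encoding = written_bytes = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'mode-demo.bin')

  # "w" is text mode: binmode? is false, and on Windows each "\n" written
  # would be translated to "\r\n".
  text_binmode = File.open(path, 'w') { |f| f.binmode? }

  # "wb" is binary mode: no newline translation, external encoding ASCII-8BIT.
  bin_binmode, bin_encoding = File.open(path, 'wb') do |f|
    f.write("\x00\xFF\r\n\n".b)
    [f.binmode?, f.external_encoding]
  end

  written_bytes = File.binread(path).bytes
end

p text_binmode    # prints false
p bin_binmode     # prints true
p written_bytes   # prints [0, 255, 13, 10, 10]
```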

```ruby
tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar", "wb")
```

Rewriting the original code more like I'd want to see it results in:

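```ruby
require 'pathname'
require 'rubygems/package' # provides Gem::Package::TarWriter

# Tweakable: how many bytes to read from each input file at a time.
BLOCKSIZE_TO_READ = 1024 * 1000

def create_tarball(path)

  tar_filename = Pathname.new(path).realpath.to_path + '.tar'

  # Binary mode; the block form closes the file when the block finishes.
  File.open(tar_filename, 'wb') do |tarfile|

    Gem::Package::TarWriter.new(tarfile) do |tar|

      Dir[File.join(path, '**/*')].each do |file|

        mode = File.stat(file).mode
        relative_file = file.sub(/^#{ Regexp.escape(path) }\/?/, '')

        if File.directory?(file)
          tar.mkdir(relative_file, mode)
        else

          tar.add_file(relative_file, mode) do |tf|
            File.open(file, 'rb') do |f|
              # Copy in blocks so only one buffer is in memory at a time.
              while buffer = f.read(BLOCKSIZE_TO_READ)
                tf.write buffer
              end
            end
          end

        end
      end
    end
  end

  tar_filename

end
```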
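One way to sanity-check an archive is to read it back with `Gem::Package::TarReader`, again in blocks. The sketch below is self-contained and makes a few assumptions for the demo: it builds a tiny archive in a `StringIO` rather than a real `.tar` file, uses a deliberately tiny block size so the loops run several times, and the entry names are arbitrary. It uses the same blockwise write loop as above, then reads each entry with `TarReader::Entry#read(length)`.

```ruby
require 'rubygems/package'
require 'stringio'

DEMO_BLOCKSIZE = 4 # tiny on purpose so each loop runs several times

# Build a small archive in memory, writing the file body in blocks.
archive = StringIO.new(''.b)
Gem::Package::TarWriter.new(archive) do |tar|
  tar.mkdir('docs', 0o755)
  tar.add_file('docs/readme.txt', 0o644) do |tf|
    src = StringIO.new('hello, tarball')
    while (buffer = src.read(DEMO_BLOCKSIZE))
      tf.write(buffer)
    end
  end
end

# Read it back, again in blocks, collecting each entry's name and content.
archive.rewind
contents = {}
Gem::Package::TarReader.new(archive) do |reader|
  reader.each do |entry|
    if entry.directory?
      contents[entry.full_name] = :dir
    else
      data = +''
      while (buffer = entry.read(DEMO_BLOCKSIZE))
        data << buffer
      end
      contents[entry.full_name] = data
    end
  end
end

p contents['docs/readme.txt']   # prints "hello, tarball"
```

For a real archive produced by `create_tarball`, the same reading loop would work on `File.open(tar_filename, 'rb')` in place of the `StringIO`.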

`BLOCKSIZE_TO_READ` should be at the top of the file, since it's a constant and "tweakable", so it's more likely to be changed than the body of the code.

The method returns the path to the tarball, not an IO handle like the original code. Using the block form of `IO.open` automatically closes the output, which means any subsequent open automatically starts at the beginning, so no rewind is needed. I much prefer passing around path strings rather than IO handles for files.
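The block-form behavior is easy to check. In this sketch the file name is arbitrary, and the handle is returned from the block only so it can be inspected afterwards; normal code would not leak it.

```ruby
require 'tmpdir'

closed = size = reread = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'block-form.txt')

  # The block form returns the block's value and closes the file afterwards.
  handle = File.open(path, 'wb') do |f|
    f.write('abc')
    f # returned only so we can inspect it after the block
  end
  closed = handle.closed?

  # Passing the path around: every fresh open starts at byte 0, no rewind needed.
  size = File.size(path)
  reread = File.open(path, 'rb') { |f| f.read }
end

p closed   # prints true
p size     # prints 3
p reread   # prints "abc"
```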

I also wrapped some of the method parameters in enclosing parentheses. While parentheses aren't required around method parameters in Ruby, and some people eschew them, I think they make the code more maintainable by delimiting where the parameters start and end. They also avoid confusing Ruby when you're passing parameters and a block to a method, which is a well-known cause of bugs.
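Here's the kind of block-binding bug that parentheses prevent. `takes_arg` is a hypothetical method that only reports whether it received a block; the point is which call the block attaches to.

```ruby
# Hypothetical method: returns :outer_got_block if called with a block,
# otherwise returns its argument unchanged.
def takes_arg(arg)
  block_given? ? :outer_got_block : arg
end

# Braces bind to the nearest call, so map receives the block.
with_braces = takes_arg [1, 2, 3].map { |x| x * 2 }

# Without parentheses, do...end binds to the outermost call: takes_arg gets
# the block, and map gets none (so it just returns an Enumerator).
with_do_end = takes_arg [1, 2, 3].map do |x| x * 2 end

# Parentheses delimit the argument, so do...end binds to map again.
with_parens = takes_arg([1, 2, 3].map do |x| x * 2 end)

p with_braces   # prints [2, 4, 6]
p with_do_end   # prints :outer_got_block
p with_parens   # prints [2, 4, 6]
```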
