Ruby: create a tarball in chunks to avoid an out-of-memory error
I'm trying to reuse the following code to create a tarball:

    tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","w")

    Gem::Package::TarWriter.new(tarfile) do |tar|
      Dir[File.join(path, "**/*")].each do |file|
        mode = File.stat(file).mode
        relative_file = file.sub /^#{Regexp::escape path}\/?/, ''

        if File.directory?(file)
          tar.mkdir relative_file, mode
        else
          tar.add_file relative_file, mode do |tf|
            File.open(file, "rb") { |f| tf.write f.read }
          end
        end
      end
    end

    tarfile.rewind
    tarfile

It works fine as long as only small folders are involved, but anything large fails with the following error:

    Error: application used more memory than the safety cap

How can I do this in chunks to avoid the memory problems?
It looks like the problem is in this line:

    File.open(file, "rb") { |f| tf.write f.read }

You are "slurping" your input file by doing f.read. Slurping means the entire file is read into memory, which isn't scalable at all, and is the result of using read without a length.
Instead, I'd read and write the file in blocks so you have consistent memory usage. This reads in blocks of roughly 1 MB (1,024,000 bytes). You can adjust that for your own needs:

    BLOCKSIZE_TO_READ = 1024 * 1000

    File.open(file, "rb") do |fi|
      while buffer = fi.read(BLOCKSIZE_TO_READ)
        tf.write buffer
      end
    end

Here's what the documentation says about read:
If length is a positive integer, it tries to read length bytes without any conversion (binary mode). It returns nil or a string whose length is 1 to length bytes. nil means it met EOF at the beginning. A string of 1 to length-1 bytes means it met EOF after reading the result. A string of length bytes means it doesn’t meet EOF. The resulting string is always in ASCII-8BIT encoding.
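The behavior described in that excerpt can be checked with a small sketch. It uses a StringIO in place of a real file, and `chunk_sizes` is a hypothetical helper introduced only for this illustration:

```ruby
require 'stringio'

# Hypothetical helper: record the size of every chunk that read(length) returns.
# The loop ends when read returns nil, which only happens at EOF.
def chunk_sizes(io, blocksize)
  sizes = []
  while (buffer = io.read(blocksize))
    sizes << buffer.bytesize
  end
  sizes
end

io = StringIO.new('a' * 10)
p chunk_sizes(io, 4)  # two full chunks, then a short final chunk
p io.read(4)          # nil, because EOF was already reached
```

Because read returns nil rather than an empty string at EOF when given a positive length, the `while buffer = ...` loop terminates without a separate EOF check, and at most one block is held in memory at a time.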
An additional problem is that it looks like you're not opening the output file correctly:
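
    tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","w")

You're writing in "text" mode because of the "w". Instead, you need to write in binary mode, "wb", because tarballs contain binary data. A plain .tar is not compressed, but the archived file contents are arbitrary bytes, and text mode can translate line endings on platforms such as Windows: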

    tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","wb")

Rewriting the original code to be more like I'd want to see it results in:

    require 'pathname'
    require 'rubygems/package'

    BLOCKSIZE_TO_READ = 1024 * 1000

    def create_tarball(path)
      tar_filename = Pathname.new(path).realpath.to_path + '.tar'

      File.open(tar_filename, 'wb') do |tarfile|
        Gem::Package::TarWriter.new(tarfile) do |tar|
          Dir[File.join(path, '**/*')].each do |file|
            mode = File.stat(file).mode
            relative_file = file.sub(/^#{ Regexp.escape(path) }\/?/, '')

            if File.directory?(file)
              tar.mkdir(relative_file, mode)
            else
              tar.add_file(relative_file, mode) do |tf|
                # Copy the file one block at a time instead of slurping it.
                File.open(file, 'rb') do |f|
                  while buffer = f.read(BLOCKSIZE_TO_READ)
                    tf.write buffer
                  end
                end
              end
            end
          end
        end
      end

      tar_filename
    end

BLOCKSIZE_TO_READ should be at the top of your file since it's a constant and a "tweakable", something more likely to be changed than the body of the code.
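The method returns the path to the tarball, not an IO handle like the original code. Using the block form of File.open automatically closes the output, and any subsequent open of that path starts reading from the beginning, so no explicit rewind is needed. I much prefer passing around path strings to passing IO handles for files.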
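As a rough usage sketch of that design, the snippet below builds a small throwaway directory, archives it, and reopens the archive from the returned path. `create_tarball` is repeated in condensed form so the snippet runs on its own; `tar_entry_names` and the sample file names are hypothetical and exist only for this illustration.

```ruby
require 'pathname'
require 'rubygems/package'
require 'tmpdir'

BLOCKSIZE_TO_READ = 1024 * 1000

# Condensed copy of create_tarball from the answer, so the sketch is self-contained.
def create_tarball(path)
  tar_filename = Pathname.new(path).realpath.to_path + '.tar'
  File.open(tar_filename, 'wb') do |tarfile|
    Gem::Package::TarWriter.new(tarfile) do |tar|
      Dir[File.join(path, '**/*')].each do |file|
        mode = File.stat(file).mode
        relative_file = file.sub(/^#{Regexp.escape(path)}\/?/, '')
        if File.directory?(file)
          tar.mkdir(relative_file, mode)
        else
          tar.add_file(relative_file, mode) do |tf|
            File.open(file, 'rb') do |f|
              while (buffer = f.read(BLOCKSIZE_TO_READ))
                tf.write(buffer)
              end
            end
          end
        end
      end
    end
  end
  tar_filename
end

# Hypothetical helper: list entry names by reopening the archive from its path.
def tar_entry_names(tar_path)
  names = []
  File.open(tar_path, 'rb') do |io|
    Gem::Package::TarReader.new(io) do |tar|
      tar.each { |entry| names << entry.full_name }
    end
  end
  names.sort
end

Dir.mktmpdir do |tmp|
  source = File.join(tmp, 'data')
  Dir.mkdir(source)
  Dir.mkdir(File.join(source, 'sub'))
  File.write(File.join(source, 'a.txt'), 'hello')
  File.binwrite(File.join(source, 'sub', 'b.bin'), "\x00\x01\x02")

  tar_path = create_tarball(source) # a String path, not an open File
  puts tar_path
  puts tar_entry_names(tar_path)
end
```

Since the caller only holds a path, it can open the archive in whatever mode it needs, and there is no shared file position to reset between uses.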
I also wrapped some of the method parameters in enclosing parentheses. While parentheses aren't required around method parameters in Ruby, and some people eschew them, I think they make the code more maintainable by delimiting where the parameters start and end. They also avoid confusing Ruby when you're passing parameters and a block to a method, which is a well-known cause of bugs.
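A minimal sketch of that block-binding pitfall follows. Both `takes_block?` and `value` are hypothetical helpers written only to show which call ends up receiving the block:

```ruby
# Hypothetical helper: reports whether this call itself received a block.
def takes_block?(*_args)
  block_given?
end

# Hypothetical helper: a plain method that ignores any block passed to it.
def value
  42
end

# Without parentheses, the braces bind to the nearest call, `value`,
# so takes_block? is called with 42 and no block.
without_parens = takes_block? value { :ignored }

# With parentheses, the argument list is closed before the block,
# so the block goes to takes_block?.
with_parens = takes_block?(value) { :ignored }

p without_parens # false
p with_parens    # true
```

The two calls look nearly identical, which is why the silent difference is easy to miss in a code review.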