Ruby: create a tarball in chunks to avoid an out-of-memory error
I'm trying to reuse the following code to create a tarball:
    tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","w")
    Gem::Package::TarWriter.new(tarfile) do |tar|
      Dir[File.join(path, "**/*")].each do |file|
        mode = File.stat(file).mode
        relative_file = file.sub /^#{Regexp::escape path}\/?/, ''
        if File.directory?(file)
          tar.mkdir relative_file, mode
        else
          tar.add_file relative_file, mode do |tf|
            File.open(file, "rb") { |f| tf.write f.read }
          end
        end
      end
    end
    tarfile.rewind
    tarfile
It works fine as long as only small folders are involved, but anything large fails with the following error:
    Error: application used more memory than the safety cap
How can I do it in chunks to avoid the memory problems?
It looks like the problem could be in this line:
    File.open(file, "rb") { |f| tf.write f.read }
You are "slurping" your input file by doing f.read. Slurping means the entire file is read into memory at once, which isn't scalable at all, and it is the result of calling read without a length.
Instead, I'd read and write the file in blocks so that memory usage stays consistent. This reads in blocks of roughly 1 MB. You can adjust the size for your own needs:
    BLOCKSIZE_TO_READ = 1024 * 1000

    File.open(file, "rb") do |fi|
      while buffer = fi.read(BLOCKSIZE_TO_READ)
        tf.write buffer
      end
    end
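As a standalone illustration of the same block-wise loop, here is a minimal sketch that copies between any two IO-like objects. The helper name copy_in_blocks and the returned byte count are assumptions added for illustration; they are not part of the original code.

```ruby
require 'stringio'

# Block size taken from the snippet above; tune it to your own needs.
BLOCKSIZE_TO_READ = 1024 * 1000

# Hypothetical helper: copy source_io to dest_io one block at a time,
# so at most block_size bytes are held in memory at any moment.
# Returns the total number of bytes copied.
def copy_in_blocks(source_io, dest_io, block_size = BLOCKSIZE_TO_READ)
  total = 0
  # read(length) returns nil at EOF, which ends the loop.
  while (buffer = source_io.read(block_size))
    dest_io.write(buffer)
    total += buffer.bytesize
  end
  total
end

# Usage with in-memory IO objects and a deliberately tiny block size.
source = StringIO.new('x' * 10)
dest = StringIO.new(''.b)
copy_in_blocks(source, dest, 4)
```

Inside the tar code, tf would play the role of dest_io and the opened input file the role of source_io.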
Here's what the documentation says about read:
If length is a positive integer, it tries to read length bytes without any conversion (binary mode). It returns nil or a string whose length is 1 to length bytes. nil means it met EOF at the beginning. A string of 1 to length-1 bytes means it met EOF after reading the result. A string of length bytes means it didn't meet EOF. The resulting string is always in ASCII-8BIT encoding.
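The three cases described in that passage can be observed with a short experiment. This is only a sketch: read_results is a hypothetical helper, and the temporary file exists purely for the demonstration.

```ruby
require 'tempfile'

# Hypothetical helper: write data to a temporary file, then collect every
# value returned by read(length), including the final nil.
def read_results(data, length)
  Tempfile.create('read-demo') do |f|
    f.binmode
    f.write(data)
    f.rewind
    results = []
    loop do
      chunk = f.read(length)
      results << chunk
      break if chunk.nil? # nil signals EOF at the start of a read
    end
    results
  end
end

# 8 bytes read 5 at a time: a full block, a short block, then nil.
results = read_results('abcdefgh', 5)
# results == ["abcde", "fgh", nil]
# results[0].encoding == Encoding::ASCII_8BIT
```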
An additional problem is that it looks like you're not opening the output file correctly:
    tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","w")
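You're writing it in "text" mode because of "w". Instead, you need to write in binary mode, "wb", because tarballs contain binary data (a plain .tar is not compressed, but its headers and file contents are still raw bytes that must not go through line-ending conversion):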
    tarfile = File.open("#{Pathname.new(path).realpath.to_s}.tar","wb")
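To see what the mode string changes, a handle can be asked whether it is in binary mode. This is a small sketch; binmode_flags is a hypothetical helper, and note that it truncates the file at the given path.

```ruby
require 'tempfile'

# Hypothetical helper: open the same path in text mode ("w") and in binary
# mode ("wb") and report IO#binmode? for each handle.
# Warning: both opens truncate the file at path.
def binmode_flags(path)
  text_flag = File.open(path, 'w') { |f| f.binmode? }
  binary_flag = File.open(path, 'wb') { |f| f.binmode? }
  [text_flag, binary_flag]
end

# Usage against a throwaway temporary file.
flags = Tempfile.create('mode-demo') { |tmp| binmode_flags(tmp.path) }
# flags == [false, true]
```

On Unix-like systems the bytes written are the same in both modes; the difference matters mostly on Windows, where text mode converts line endings.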
Rewriting the original code to look more like I'd want to see it results in:
    require 'pathname'
    require 'rubygems/package'

    BLOCKSIZE_TO_READ = 1024 * 1000

    def create_tarball(path)
      tar_filename = Pathname.new(path).realpath.to_path + '.tar'

      # 'wb' opens the output in binary mode.
      File.open(tar_filename, 'wb') do |tarfile|
        Gem::Package::TarWriter.new(tarfile) do |tar|
          Dir[File.join(path, '**/*')].each do |file|
            mode = File.stat(file).mode
            relative_file = file.sub(/^#{ Regexp.escape(path) }\/?/, '')

            if File.directory?(file)
              tar.mkdir(relative_file, mode)
            else
              tar.add_file(relative_file, mode) do |tf|
                # Copy the file in blocks instead of slurping it.
                File.open(file, 'rb') do |f|
                  while buffer = f.read(BLOCKSIZE_TO_READ)
                    tf.write buffer
                  end
                end
              end
            end
          end
        end
      end

      tar_filename
    end
BLOCKSIZE_TO_READ should be at the top of your file since it's a constant and a "tweakable", something more likely to be changed than the body of the code.
The method returns the path to the tarball, not an IO handle like the original code did. Using the block form of File.open automatically closes the output, so any subsequent open starts reading from the beginning and no explicit rewind is needed. I much prefer passing around path strings rather than IO handles for files.
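One way to sanity-check a tarball is to read it back with Gem::Package::TarReader. The sketch below builds a tiny archive in memory so that it stays self-contained; build_sample_tar, list_tar_entries and the sample file names are hypothetical and exist only for illustration. For a tarball on disk, the same reader can be given a handle from File.open(path, 'rb') instead of a StringIO.

```ruby
require 'rubygems/package'
require 'stringio'

# Hypothetical helper: build a small tar archive in memory.
def build_sample_tar
  io = StringIO.new(''.b)
  Gem::Package::TarWriter.new(io) do |tar|
    tar.mkdir('docs', 0o755)
    tar.add_file('docs/hello.txt', 0o644) { |tf| tf.write("hello\n") }
  end
  io.string
end

# Hypothetical helper: return [name, directory?, size] for each entry.
def list_tar_entries(tar_data)
  entries = []
  Gem::Package::TarReader.new(StringIO.new(tar_data)) do |tar|
    tar.each do |entry|
      entries << [entry.full_name, entry.directory?, entry.header.size]
    end
  end
  entries
end

list_tar_entries(build_sample_tar).each do |name, is_dir, size|
  puts "#{is_dir ? 'dir ' : 'file'} #{name} (#{size} bytes)"
end
```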
I also wrapped some of the method parameters in enclosing parentheses. While parentheses aren't required around method parameters in Ruby, and some people eschew them, I think they make the code more maintainable by delimiting where the parameters start and end. They also avoid confusing Ruby when you're passing parameters and a block to a method, which is a well-known cause of bugs.
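The block-binding pitfall can be reproduced in a few lines. This is an illustrative sketch only; log, without_parentheses and with_parentheses are hypothetical names.

```ruby
# Hypothetical method: reports whether it (rather than its argument)
# received the block.
def log(value)
  block_given? ? :block_went_to_log : value
end

# Without parentheses, do...end binds to the outermost call (log),
# so map never sees the block.
def without_parentheses
  log [1, 2, 3].map do |x|
    x * 2
  end
end

# With parentheses, the block clearly belongs to map.
def with_parentheses
  log([1, 2, 3].map do |x|
    x * 2
  end)
end

p without_parentheses # prints :block_went_to_log
p with_parentheses    # prints [2, 4, 6]
```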