R, GET and GZ compression
I am building clients onto RESTful APIs. Some links let me download attachments (files) from the server, and in the best case these are .txt files. I only mention the RESTful part since it means I have to send some headers and potentially a body with each POST, so the standard R 'filename'=URL logic won't work.
Sometimes people bundle many txts into a zip. These are awkward since I don't know what they contain until I download many of them.
For the moment, I'm unpackaging these, gzipping the files (which adds a .gz extension) and re-uploading them. They can then be indexed and downloaded.
I'm using Hadley's cute httr package, but I can't see an elegant way to decompress the gz files.
When using read.csv or similar, any files with a gz ending are automatically decompressed (convenient!). What's the equivalent when using httr or curl?
    content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz"))
    [1] 1f 8b 08 08 4e 9e 9b 51 00 03 73 ...
That looks nice: a compressed byte stream with the correct header (1f 8b). Now I need the text contents, so I tried using memDecompress, which says it should do this:
    memDecompress(content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz")), type = "gzip")
    Error in memDecompress(content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz")), :
      internal error -3 in memDecompress(2)
What's the proper solution here?
Also, is there a way to get R to pull the index of a remote .zip file without downloading all of it?
You can add a parser to handle the MIME type. Look at ?content and the line "You can add new parsers by adding appropriately named functions to httr:::parsers".

    ls(httr:::parsers)
    #[1] "application/json"                  "application/x-www-form-urlencoded"
    #[3] "image/jpeg"                        "image/png"
    #[5] "text/html"                         "text/plain"
    #[7] "text/xml"
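To make the mechanism concrete, here is a minimal sketch of a registry keyed by MIME type. It does not touch httr at all: `my_parsers`, `register_parser` and `parse_body` are hypothetical names invented for illustration, and the sketch only assumes that httr's internal `parsers` environment works along similar lines.

```r
# Hypothetical registry standing in for httr's internal environment.
my_parsers <- new.env(parent = emptyenv())

# Store a parser function under its MIME type (illustrative helper).
register_parser <- function(type, fun) {
  assign(type, fun, envir = my_parsers)
}

# Look up the parser for a MIME type and apply it to a raw body.
parse_body <- function(x, type) {
  if (!exists(type, envir = my_parsers, inherits = FALSE)) {
    stop("no parser registered for ", type)
  }
  get(type, envir = my_parsers)(x)
}

# Register a trivial text parser and use it on a raw vector.
register_parser("text/plain", function(x) rawToChar(x))
parsed <- parse_body(charToRaw("hello"), "text/plain")
```

Adding a parser for another type is then just one more `assign` into the environment, which is the same move the `assign(..., envir = httr:::parsers)` calls in this answer make.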
We can add one to handle gz content. I don't have a better answer at this point than the one you gave, so you can incorporate your function.
    assign("application/octet-stream",
           function(x, ...) {
             # x is the raw response body; decompress it and split on newlines
             scan(gzcon(rawConnection(x)), "", , , "\n")
           },
           envir = httr:::parsers)

    content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz"), as = "parsed")
    Read 1 item
    [1] "These are not the droids you are looking for"
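The core of that parser can be checked without a network connection. The sketch below uses only base R and builds its own gzip payload through a temporary file; `gunzip_raw_lines` is a hypothetical helper name, and it uses `readLines` rather than `scan`, which is a substitution of mine and not the code from the answer.

```r
# Decompress a gzip byte stream held in a raw vector and return its lines.
gunzip_raw_lines <- function(x) {
  stopifnot(is.raw(x))
  con <- gzcon(rawConnection(x))
  on.exit(close(con))  # closing the gzcon also closes the raw connection
  readLines(con, warn = FALSE)
}

# Build a small gzip payload in memory (via a temp file) to stand in for
# the body that content() returns for a .gz download.
tmp <- tempfile(fileext = ".gz")
gz <- gzfile(tmp, open = "w")
writeLines(c("first line", "second line"), gz)
close(gz)
payload <- readBin(tmp, what = "raw", n = file.info(tmp)$size)
unlink(tmp)

# The payload starts with the gzip magic bytes 1f 8b, as in the question.
has_gzip_header <- identical(payload[1:2], as.raw(c(0x1f, 0x8b)))

lines <- gunzip_raw_lines(payload)
```

Working on a connection avoids writing the response to disk, which is what distinguishes this approach from the tempfile-based alternative.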
EDIT: I hacked up an alternative:
    assign("application/octet-stream",
           function(x, ...) {
             # write the raw body to a temp file, then read it back as text
             f <- tempfile()
             writeBin(x, f)
             untar(f)
             readLines(f, warn = FALSE)
           },
           envir = httr:::parsers)

    content(GET("http://glimmer.rstudio.com/alexbbrown/gz/sample.txt.gz"), as = "parsed")
    #[1] "These are not the droids you are looking for"
With regards to listing the files in an archive, maybe you can adjust the function somewhat. If we try to get the httr source files, they have a MIME type of "application/x-gzip":
    assign("application/x-gzip",
           function(x, ...) {
             f <- tempfile()
             writeBin(x, f)
             if (!is.null(list(...)$list)) {
               if (list(...)$list) {
                 # list = TRUE: return the archive index only
                 return(untar(f, list = TRUE))
               } else {
                 untar(f, ...)
                 readLines(f)
               }
             } else {
               untar(f, ...)
               readLines(f)
             }
           },
           envir = httr:::parsers)

    content(GET("http://cran.r-project.org/src/contrib/httr_0.2.tar.gz"), as = "parsed", list = TRUE)

    # > head(content(GET("http://cran.r-project.org/src/contrib/httr_0.2.tar.gz"), as = "parsed", list = TRUE))
    #[1] "httr/"                 "httr/MD5"              "httr/tests/"
    #[4] "httr/tests/test-all.R" "httr/README.md"        "httr/R/"
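The listing step can also be tried offline. This is a sketch under stated assumptions: it builds a small .tar.gz locally instead of downloading one, `list_tar_gz` is a hypothetical helper name, and `tar = "internal"` is chosen here so the example does not depend on a system tar binary. Note that it still needs the whole archive in memory, so it does not answer the question about reading a remote index without a full download.

```r
# List the entries of a gzipped tarball held in a raw vector.
list_tar_gz <- function(x) {
  stopifnot(is.raw(x))
  f <- tempfile(fileext = ".tar.gz")
  on.exit(unlink(f))
  writeBin(x, f)
  untar(f, list = TRUE, tar = "internal")
}

# Build a tiny archive to stand in for a downloaded package tarball.
work <- tempfile("demo")
dir.create(file.path(work, "pkg"), recursive = TRUE)
writeLines("hello", file.path(work, "pkg", "a.txt"))
writeLines("world", file.path(work, "pkg", "b.txt"))

archive <- tempfile(fileext = ".tar.gz")
old <- setwd(work)
tar(archive, files = "pkg", compression = "gzip", tar = "internal")
setwd(old)

payload <- readBin(archive, what = "raw", n = file.info(archive)$size)
entries <- list_tar_gz(payload)
```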