Storing XML node values with R's xmlEventParse for filtered output


I have a huge XML file (260 MB) with tons of information that looks like this.

Example:

    <mydocument>
      <positions eventtime="2012-09-29t20:31:21" internalmatchid="0000t0">
        <frameset gamesection="1sthalf" match="0000t0" club="referee" object="00011d">
          <frame n="0" t="2012-09-29t18:31:21" x="-0.1158" y="0.2347" s="1.27" />
          <frame n="1" t="2012-09-29t18:31:21" x="-0.1146" y="0.2351" s="1.3" />
          <frame n="2" t="2012-09-29t18:31:21" x="-0.1134" y="0.2356" s="1.33" />
        </frameset>
        <frameset gamesection="2ndhalf" match="0000t0" club="referee" object="00011d">
          <frame n="0" t="2012-09-29t18:31:21" x="-0.1158" y="0.2347" s="1.27" />
          <frame n="1" t="2012-09-29t18:31:21.196" x="-0.1146" y="0.2351" s="1.3" />
          <frame n="2" t="2012-09-29t18:31:21.243" x="-0.1134" y="0.2356" s="1.33" />
        </frameset>
      </positions>
    </mydocument>

There are around 40 different frameset nodes, each with a different gamesection="..." and object="...".

I would love to extract the information of the <frame> nodes into a list object, but I cannot load the whole XML file because it is too large. Is there a way I can use the xmlEventParse function to filter for a specific gamesection and a specific object, and get all the information from the corresponding <frame> elements?

It might be that the 'internal' representation is not that large:

    xml = xmlTreeParse("file.xml", useInternalNodes=TRUE)

and XPath will then be your best bet. If that doesn't work, you'll need to get your head around closures. I'm going to aim for the branches argument of xmlEventParse, which allows a hybrid of event parsing to iterate through the file, coupled with DOM parsing on each node. Here's a function that returns a list of functions.

    branchfactory <- function() {
        env <- new.env(parent=emptyenv())   # safety

        frameset <- function(elt) {
            id <- paste(xmlAttrs(elt), collapse=":")
            env[[id]] <- xpathSApply(elt, "//frame", xmlAttrs)
        }

        get <- function() env

        list(get=get, frameset=frameset)
    }

Inside the function we're going to create a place to store our results as we iterate through the file. This could be a list, but it'll be better to use an environment. This will allow us to insert new results without copying all the results we've already inserted. So here's our environment:

    env <- new.env(parent=emptyenv()) 
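To illustrate why an environment is the more convenient container here, the following is a minimal base-R sketch. The helper names add_to_list and add_to_env are made up for this example and are not part of the XML package. It shows that a function can update an environment in place, whereas a list modified inside a function is only a local copy.

```r
# Hypothetical helpers for illustration only.
add_to_list <- function(lst, key, value) {
    lst[[key]] <- value   # modifies a local copy; the caller's list is untouched
    invisible(NULL)
}

add_to_env <- function(env, key, value) {
    env[[key]] <- value   # environments have reference semantics
    invisible(NULL)
}

results_list <- list()
results_env <- new.env(parent=emptyenv())

add_to_list(results_list, "a", 1)
add_to_env(results_env, "a", 1)

length(results_list)   # 0: nothing was stored in the caller's list
ls(results_env)        # "a": the environment was updated in place
```

To get the same effect with a list, the function would have to return the modified list and the caller would have to reassign it on every insertion.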

We use the parent argument as a measure of safety, even if it's not relevant in our present case. Now we define a function that will be invoked whenever a "frameset" node is encountered:

    frameset <- function(elt) {
        id <- paste(xmlAttrs(elt), collapse=":")
        env[[id]] <- xpathSApply(elt, "//frame", xmlAttrs)
    }

It turns out that, when we use the branches argument, xmlEventParse will have arranged to parse the entire node into an object that we can manipulate via the DOM, e.g., using xmlAttrs and xpathSApply. The first line of this function creates a unique identifier for this frame set (maybe that's not the case for the full data set? You'll need a unique identifier). We then parse the "//frame" part of the element and store that in our environment. Storing the result is trickier than it looks: we're assigning to a variable called env. env doesn't exist in the body of the frameset function, so R uses its lexical scoping rules to search for a variable named env in the environment in which the frameset function was defined. And lo, it finds the env that we have already created. This is where the result of xpathSApply gets added. That's it for our frameset node parser.
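The lexical scoping rule described above can be seen in isolation with a small base-R sketch. The names make_greeter, greeting and greet are invented for this illustration. The inner function refers to a variable it does not define itself, and R resolves it in the environment where the function was created, not where it is called.

```r
make_greeter <- function() {
    greeting <- "hello"                    # lives in make_greeter's environment
    function(name) paste(greeting, name)   # greeting is a free variable here
}

greet <- make_greeter()

# A different variable with the same name in the calling environment;
# the closure does not see it.
greeting <- "goodbye"

greet("world")   # "hello world"
```

The frameset function finds env in exactly the same way that the inner function here finds greeting.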

We'd like a convenience function that we can use to retrieve env, like this:

    get <- function() env

Again, this is going to use lexical scoping to find the env variable created at the top of branchfactory. We end branchfactory by returning a list of the functions that we've defined:

    list(get=get, frameset=frameset) 

This can be surprisingly tricky: we're returning a list of functions. The functions are defined in the environment created when we invoke branchfactory and, for lexical scope to work, that environment has to persist. So we're actually returning not only the list of functions but also, implicitly, the variable env. In brief, the returned functions carry env along with them.
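As a rough illustration of that persistence, here is a base-R sketch that uses the same pattern without any XML parsing. The names make_store, put, s1 and s2 are hypothetical. Each call to the factory creates a fresh env that outlives the call, and each returned set of functions sees only its own env.

```r
make_store <- function() {
    env <- new.env(parent=emptyenv())
    put <- function(key, value) {
        env[[key]] <- value   # env is found in make_store's environment
        invisible(NULL)
    }
    get <- function() env
    list(put=put, get=get)
}

s1 <- make_store()
s2 <- make_store()

s1$put("a", 1)
s1$put("b", 2)
s2$put("z", 26)

sort(ls(s1$get()))   # "a" "b"
ls(s2$get())         # "z"
```

The two stores do not interfere with each other, which is why one factory call per parse gives a clean, private place to collect results.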

We're now ready to parse our file. Do this by creating an instance of the branch parser, with its own unique versions of the get and frameset functions and of the env variable created to store results. Then parse the file:

    b <- branchfactory()
    xx <- xmlEventParse("file.xml", handlers=list(), branches=b)

We can retrieve the results using b$get(), and can cast this to a list if that's convenient.

    > as.list(b$get())
    $`1sthalf:0000t0:referee:00011d`
      [,1]                  [,2]                  [,3]
    n "0"                   "1"                   "2"
    t "2012-09-29t18:31:21" "2012-09-29t18:31:21" "2012-09-29t18:31:21"
    x "-0.1158"             "-0.1146"             "-0.1134"
    y "0.2347"              "0.2351"              "0.2356"
    s "1.27"                "1.3"                 "1.33"

    $`2ndhalf:0000t0:referee:00011d`
      [,1]                  [,2]                      [,3]
    n "0"                   "1"                       "2"
    t "2012-09-29t18:31:21" "2012-09-29t18:31:21.196" "2012-09-29t18:31:21.243"
    x "-0.1158"             "-0.1146"                 "-0.1134"
    y "0.2347"              "0.2351"                  "0.2356"
    s "1.27"                "1.3"                     "1.33"
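The question asked for only one gamesection and one object, whereas the parser above stores every frameset. One possible way to add the filter is sketched below. It is not part of the original answer: filteredfactory and its two arguments are hypothetical names, the check uses xmlGetAttr from the XML package, and the sample file is a cut-down stand-in written to a temporary path. It assumes the attribute names are spelled as in the example document.

```r
library(XML)

# Hypothetical variant of branchfactory: stores only framesets whose
# gamesection and object attributes match the requested values.
filteredfactory <- function(gamesection, object) {
    env <- new.env(parent=emptyenv())

    frameset <- function(elt) {
        keep <- identical(xmlGetAttr(elt, "gamesection"), gamesection) &&
            identical(xmlGetAttr(elt, "object"), object)
        if (keep) {
            id <- paste(xmlAttrs(elt), collapse=":")
            env[[id]] <- xpathSApply(elt, "//frame", xmlAttrs)
        }
    }

    get <- function() env

    list(get=get, frameset=frameset)
}

# Small stand-in for the real 260 MB file (assumption: same layout).
path <- tempfile(fileext=".xml")
writeLines(c(
    '<mydocument><positions eventtime="2012-09-29t20:31:21" internalmatchid="0000t0">',
    '<frameset gamesection="1sthalf" match="0000t0" club="referee" object="00011d">',
    '<frame n="0" t="2012-09-29t18:31:21" x="-0.1158" y="0.2347" s="1.27" />',
    '</frameset>',
    '<frameset gamesection="2ndhalf" match="0000t0" club="referee" object="00011d">',
    '<frame n="0" t="2012-09-29t18:31:21" x="-0.1158" y="0.2347" s="1.27" />',
    '<frame n="1" t="2012-09-29t18:31:21.196" x="-0.1146" y="0.2351" s="1.3" />',
    '</frameset>',
    '</positions></mydocument>'
), path)

f <- filteredfactory("2ndhalf", "00011d")
invisible(xmlEventParse(path, handlers=list(), branches=f))

res <- as.list(f$get())
names(res)   # identifiers of the framesets that passed the filter
```

Because non-matching framesets are never stored, memory use should grow only with the data that is actually kept. Each stored element is a character matrix like the ones shown above, so something like t(res[[1]]) gives one row per frame if that shape is more convenient.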
