r - Recalculate column only for highest date in each category -


i want recalculate column in data.table rows, depending on condition, category (cat) , date.

a row may qualify recalculated if condition==true. among rows condition==true, rows highest date respective cat should selected.

a simplified example:

     df = data.frame(cat=rep(c("a","b","c"),each=3), date=rep(c("01-08-2013","01-07-2013","01-04-2013"),3),             condition=c(true, true, false, false, false, false, false, false, true),             data1=c(1:9), data2=rep(c(1:3),3), result=c(1:1))      df$date = as.date(df$date , "%m-%d-%y")      dt = data.table(df)      dt          cat       date condition data1 data2 result      1:   2013-01-08      true     1     1      1      2:   2013-01-07      true     2     2      1      3:   2013-01-04     false     3     3      1      4:   b 2013-01-08     false     4     1      1      5:   b 2013-01-07     false     5     2      1      6:   b 2013-01-04     false     6     3      1      7:   c 2013-01-08     false     7     1      1      8:   c 2013-01-07     false     8     2      1      9:   c 2013-01-04      true     9     3      1 

i found out how extract cat's , date's of rows, result must recalculated:

    setkey(dt, condition, cat, date)     dt[j(true), max(date), by=cat]         cat         v1     1:   2013-01-08     2:   c 2013-01-04 

however, don't know how calculate new result these rows. in simplified example, new result should data1+data2.

edit:
inspired eddi's answer, came 2 more possible solutions:

approach using .i:

    dt[dt[condition==true , .i[which.max(date)], by=cat][[2]], result:=data1+data2] 

approach using .sd (see eddi's note of caution):

    max_dates=dt[condition==true , .sd[which.max(date)], by=cat]     setkey(dt, cat, date)     dt[max_dates, result:=data1 + data2] 

are there recommendations solution choose regard speed / efficiency?

something work:

dt = data.table(df) max_dates = dt[condition == true,                list(date = max(date), condition = true),                = cat]  setkey(dt, cat, date, condition) dt[max_dates, result := data1 + data2] dt #   cat       date condition data1 data2 result #1:   2013-01-04     false     3     3      1 #2:   2013-01-07      true     2     2      1 #3:   2013-01-08      true     1     1      2 #4:   b 2013-01-04     false     6     3      1 #5:   b 2013-01-07     false     5     2      1 #6:   b 2013-01-08     false     4     1      1 #7:   c 2013-01-04      true     9     3     12 #8:   c 2013-01-07     false     8     2      1 #9:   c 2013-01-08     false     7     1      1 

a note of warning: above relies on max_dates not having key - if change have key (e.g. if by column that's part of key), you'd have either erase key, or make have same key dt later in code merge work correctly.

and here's approach:

dt = data.table(df)  dt[, result := result + (data1 + data2 - result) * condition * (date == max(date)),      = list(cat, condition)] # could've used ifelse here instead, ifelse slow 

Comments

Popular posts from this blog

blackberry 10 - how to add multiple markers on the google map just by url? -

php - guestbook returning database data to flash -

delphi - Dynamic file type icon -