R - Averaging values in a table by recognizing the name of the column header

August 15, 2011

I have the following table called m:

    identifier  dat_sn_e15.5_1  dat_sn_e15.5_2  dat_sn_p2_1  dat_sn_p2_2
    100009600   3               1               0            0
    100009609   13              4               1            6
    100009614   0               0               0            0
    100009664   9               17              5            7
    100012      0               0               0            0
    100017      0               0               0            0
    100019      1275            70              54           353
    100033459   0               0               0            0
    100034251   0               0               0            0
    100034361   277             4               114          830

Column 1 is the gene identifier, columns 2 and 3 are biological replicates of dat_sn_e15.5, and columns 4 and 5 are biological replicates of dat_sn_p2. The real-world data consists of 56 such samples, each having 2 replicates. Is there a way to recognize replicates based on the name, the only difference being the 1 or 2 at the end of the name?

If so, how can I create a new table m.rep that averages the 2 values for each identifier and each sample, and that contains the gene identifier plus columns named dat_sn_e15.5_ave and dat_sn_p2_ave?

One idea is to use a fuzzy search, that is, approximate pattern matching, using agrep.

    ## replace nn with your column names
    nn <- c('dat_sn_e15.5_1', 'dat_sn_e15.5_2', 'dat_sn_p2_1', 'dat_sn_p2_2')
    ## for each column name, find the columns that are approximately similar
    ll <- lapply(seq_along(nn), function(x)
      nn[agrep(nn[x], nn)])
    ## remove duplicates, since if a is similar to b then b is similar to a
    ll[!duplicated(ll)]

    [[1]]
    [1] "dat_sn_e15.5_1" "dat_sn_e15.5_2"

    [[2]]
    [1] "dat_sn_p2_1" "dat_sn_p2_2"

Edit: here is how you can use the above with your data:

    dat <- read.table(text='identifier  dat_sn_e15.5_1  dat_sn_e15.5_2  dat_sn_p2_1  dat_sn_p2_2
    100009600   3      1    0    0
    100009609   13     4    1    6
    100009614   0      0    0    0
    100009664   9      17   5    7
    100012      0      0    0    0
    100017      0      0    0    0
    100019      1275   70   54   353
    100033459   0      0    0    0
    100034251   0      0    0    0
    100034361   277    4    114  830', header = TRUE)

    ## all column names except the identifier
    nn <- colnames(dat)[-1]

    ## group approximately similar column names and drop duplicate groups
    ll <- lapply(seq_along(nn), function(x)
      nn[agrep(nn[x], nn)])
    ll <- ll[!duplicated(ll)]

    ## average the replicate columns of each group, row by row
    res <- lapply(ll, function(x) rowMeans(dat[, x]))
    res <- t(do.call(rbind, res))
    ## take the first element of each pair as the column name
    colnames(res) <- sapply(ll, '[[', 1)
    res

          dat_sn_e15.5_1 dat_sn_p2_1
     [1,]            2.0         0.0
     [2,]            8.5         3.5
     [3,]            0.0         0.0
     [4,]           13.0         6.0
     [5,]            0.0         0.0
     [6,]            0.0         0.0
     [7,]          672.5       203.5
     [8,]            0.0         0.0
     [9,]            0.0         0.0
    [10,]          140.5       472.0
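The agrep result above keeps the name of the first replicate and drops the identifier column, whereas the question asks for a table m.rep with the identifier and columns ending in _ave. Fuzzy matching could also group unrelated samples whose names happen to be close once all 56 samples are present. A minimal alternative sketch, assuming every replicate column ends in an underscore followed by a number (as in _1 and _2), is to strip that suffix with sub() and group on the exact remainder. The helper names rep_cols, sample_id, groups and ave are only illustrative, and the data is a four-row subset of the table in the question.

```r
## Sketch under the assumption that replicate columns end in "_<number>".
m <- read.table(text = "
identifier dat_sn_e15.5_1 dat_sn_e15.5_2 dat_sn_p2_1 dat_sn_p2_2
100009600  3    1  0  0
100009609  13   4  1  6
100009664  9    17 5  7
100019     1275 70 54 353
", header = TRUE)

## all columns except the identifier
rep_cols <- colnames(m)[-1]
## strip the trailing replicate number to get the sample name
sample_id <- sub("_[0-9]+$", "", rep_cols)

## group column names by sample, keeping the original sample order
groups <- split(rep_cols, factor(sample_id, levels = unique(sample_id)))

## row-wise mean of the replicate columns of each sample
ave <- do.call(cbind, lapply(groups, function(cols)
  rowMeans(m[, cols, drop = FALSE])))

## assemble the result: identifier first, then one _ave column per sample
m.rep <- data.frame(identifier = m$identifier, ave, check.names = FALSE)
colnames(m.rep)[-1] <- paste0(colnames(ave), "_ave")
m.rep
```

Because the grouping is an exact match on the stripped name, it does not depend on how similar two different sample names look, but it will not pair columns whose names deviate from the assumed suffix pattern.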
