R - Averaging values in a table by recognizing the name of the column header
I have the following table called m:
    identifier  dat_sn_e15.5_1  dat_sn_e15.5_2  dat_sn_p2_1  dat_sn_p2_2
    100009600                3               1            0            0
    100009609               13               4            1            6
    100009614                0               0            0            0
    100009664                9              17            5            7
    100012                   0               0            0            0
    100017                   0               0            0            0
    100019                1275              70           54          353
    100033459                0               0            0            0
    100034251                0               0            0            0
    100034361              277               4          114          830
Column 1 is the gene identifier, columns 2 and 3 are biological replicates of dat_sn_e15.5, and columns 4 and 5 are biological replicates of dat_sn_p2. The real-world data consists of 56 such samples, each having 2 replicates. Is there a way to recognize the replicates based on the name, the only difference being the 1 or 2 at the end of the name?
If so, how can I create a new table m.rep that averages the 2 values for each identifier and each sample, and that contains the gene identifier plus columns named dat_sn_e15.5_ave and dat_sn_p2_ave?
One idea is to use a fuzzy search (approximate pattern matching) using agrep.
    ## replace nn with your colnames
    nn <- c('dat_sn_e15.5_1', 'dat_sn_e15.5_2', 'dat_sn_p2_1', 'dat_sn_p2_2')
    ## for each column name, find the columns that are approximately similar
    ll <- lapply(seq_along(nn), function(x) nn[agrep(nn[x], nn)])
    ## remove duplicates, since a is similar to b and b is similar to a
    ll[!duplicated(ll)]

    [[1]]
    [1] "dat_sn_e15.5_1" "dat_sn_e15.5_2"

    [[2]]
    [1] "dat_sn_p2_1" "dat_sn_p2_2"
Edit: here is how you can use the above with your data:
    dat <- read.table(text='identifier dat_sn_e15.5_1 dat_sn_e15.5_2 dat_sn_p2_1 dat_sn_p2_2
    100009600 3 1 0 0
    100009609 13 4 1 6
    100009614 0 0 0 0
    100009664 9 17 5 7
    100012 0 0 0 0
    100017 0 0 0 0
    100019 1275 70 54 353
    100033459 0 0 0 0
    100034251 0 0 0 0
    100034361 277 4 114 830', header=TRUE)

    ## all column names except the identifier
    nn <- colnames(dat)[-1]
    ## group approximately similar column names and drop duplicate groups
    ll <- lapply(seq_along(nn), function(x) nn[agrep(nn[x], nn)])
    ll <- ll[!duplicated(ll)]
    ## average the replicate columns of each group, row by row
    res <- lapply(ll, function(x) rowMeans(dat[, x]))
    res <- t(do.call(rbind, res))
    ## take the first element of each pair as the column name
    colnames(res) <- sapply(ll, '[[', 1)
    res

          dat_sn_e15.5_1 dat_sn_p2_1
     [1,]            2.0         0.0
     [2,]            8.5         3.5
     [3,]            0.0         0.0
     [4,]           13.0         6.0
     [5,]            0.0         0.0
     [6,]            0.0         0.0
     [7,]          672.5       203.5
     [8,]            0.0         0.0
     [9,]            0.0         0.0
    [10,]          140.5       472.0
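One caveat with agrep: it tolerates any small edit, not only a change in the trailing replicate number, so with 56 samples two different samples whose names differ by a single character (say, a hypothetical dat_sn_p3 next to dat_sn_p2) could end up in the same group. A stricter alternative is sketched below. It assumes every replicate column ends in exactly `_1` or `_2`, strips that suffix with `sub()`, and averages the columns that share the remaining name. The helper names `rep_cols`, `sample_id` and `ave` are made up for this example; `m` and `m.rep` follow the question.

```r
# Same example data as above, read into a data frame called m as in the question.
m <- read.table(text = "identifier dat_sn_e15.5_1 dat_sn_e15.5_2 dat_sn_p2_1 dat_sn_p2_2
100009600 3 1 0 0
100009609 13 4 1 6
100009614 0 0 0 0
100009664 9 17 5 7
100012 0 0 0 0
100017 0 0 0 0
100019 1275 70 54 353
100033459 0 0 0 0
100034251 0 0 0 0
100034361 277 4 114 830", header = TRUE)

# All columns except the identifier are replicate columns.
rep_cols <- colnames(m)[-1]

# Assumption: the replicate number is a trailing "_1" or "_2".
# Removing it leaves one sample name per replicate column.
sample_id <- sub("_[12]$", "", rep_cols)

# For each sample, average its replicate columns row by row.
ave <- sapply(unique(sample_id), function(s) {
  rowMeans(m[, rep_cols[sample_id == s], drop = FALSE])
})
colnames(ave) <- paste0(unique(sample_id), "_ave")

# Put the gene identifier back in front of the averaged columns.
m.rep <- data.frame(identifier = m$identifier, ave, check.names = FALSE)
m.rep
```

Because the grouping is an exact match on the stripped name, a column is only ever averaged with its own replicate, and the result keeps the identifier column and the `_ave` names asked for in the question.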