Breaking up (melting) text data in a column in R?
I have a CSV file that contains data in the following format:
prjid, objective
1001 , (i) improve efficiency (ii) decrease cost (iii) maximize revenue
1002 , a) have fun b) learn new things
1003 , (1) getting tricky (2) challanging task
The first variable is an id and the second is a text variable, "objective". Each project has data on multiple objectives in a single column, separated by (i), (ii), etc., or (a), (b), (c), etc., or (1), (2), (3), etc. I want one observation created for each objective of each project, like this:
prjid, objective
1001 , (i) improve efficiency
1001 , (ii) decrease cost
1001 , (iii) maximize revenue
1002 , a) have fun
1002 , b) learn new things
1003 , (1) getting tricky
1003 , (2) challanging task
A project that has one objective keeps one row; a project with multiple objectives is split into one observation per objective.
I am quite new to handling text data in R. Can an R pro help me get started on this problem? Thanks in advance!
Here is one idea.
- Insert a new delimiter into the objective column, using a clever regular expression.
- Use that delimiter with strsplit to split the sentence into a vector.
- Using by, repeat the previous steps for each id.
Following these steps, the code is:
ll <- by(dat, dat$prjid, FUN = function(x) {
  # put '#' in place of the space before each marker such as (i), a) or (1)
  x.delim <- gsub(" (\\(?[a-x,0-9]*\\))", '#\\1', x$objective)
  # split on the new delimiter; the first piece is empty, so drop it below
  obj <- unlist(strsplit(x.delim, '#'))
  data.frame(prjid = x$prjid, objective = obj[-1])
})
## transform the list into a data.frame
do.call(rbind, ll)
       prjid              objective
1001.1  1001 (i) improve efficiency
1001.2  1001     (ii) decrease cost
1001.3  1001 (iii) maximize revenue
1002.1  1002            a) have fun
1002.2  1002    b) learn new things
1003.1  1003     (1) getting tricky
1003.2  1003   (2) challanging task
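To see what the first two steps do in isolation, here is a minimal sketch on a single objective string. The variable names x and parts are only for illustration, and the string is assumed to start with a space, which is what read.table leaves after the comma when sep=',' is used.

```r
# One objective string, with the leading space left after the comma
x <- " (i) improve efficiency (ii) decrease cost (iii) maximize revenue"

# Step 1: replace the space in front of each marker, e.g. (i), a) or (1), with '#'
x.delim <- gsub(" (\\(?[a-x,0-9]*\\))", "#\\1", x)

# Step 2: split on '#'; the string starts with a marker, so the first piece is empty
parts <- unlist(strsplit(x.delim, "#"))
parts <- parts[-1]
print(parts)
```

Note that the pattern treats any run of the characters a-x, digits, or commas followed by ")" as a marker, so a word directly before a closing parenthesis inside an objective would also trigger a split. Markers y) and z) fall outside a-x and would not be split.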
PS: here is dat:
dat <- read.table(text='prjid, objective
1001 , (i) improve efficiency (ii) decrease cost (iii) maximize revenue
1002 , a) have fun b) learn new things
1003 , (1) getting tricky (2) challanging task', sep=',', header=TRUE)
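As a possible variation (not part of the answer above), the same delimiter trick can be applied to the whole column at once, which avoids the by() loop. This is only a sketch: the data frame is built by hand here so the block runs on its own, with the leading space in each objective assumed to mirror what read.table produces, and the names delim, pieces and res are made up for the example.

```r
# Hand-built copy of the sample data (assumption: leading space mirrors read.table output)
dat <- data.frame(
  prjid = c(1001L, 1002L, 1003L),
  objective = c(" (i) improve efficiency (ii) decrease cost (iii) maximize revenue",
                " a) have fun b) learn new things",
                " (1) getting tricky (2) challanging task"),
  stringsAsFactors = FALSE
)

# Same regular expression as above, applied to every row in one call
delim <- gsub(" (\\(?[a-x,0-9]*\\))", "#\\1", dat$objective)

# strsplit returns one character vector per row; drop the empty first piece of each
pieces <- lapply(strsplit(delim, "#"), function(p) p[-1])

# Repeat each id once per objective that belongs to it
res <- data.frame(prjid = rep(dat$prjid, lengths(pieces)),
                  objective = unlist(pieces),
                  stringsAsFactors = FALSE)
print(res)
```

The row names here are plain 1 to 7 rather than 1001.1, 1001.2 and so on, because nothing is bound together with rbind.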