Data munging in Python compared with R (from an Excel sheet)

I have a hypothetical example here (Excel file attached). I'm loading in an Excel file whose formatting I need to work around before I can either analyse the data or store it more permanently.
In R, the following few lines make it usable:

```r
library(gdata)
tmp = read.xls('~/Desktop/sample.xlsx', 1, stringsAsFactors = F)
tmp = tmp[, !sapply(tmp, function(x) all(is.na(x)))]
tmp$date = tmp[1, 2]
for (i in 2:ncol(tmp)) {
  ifelse(tmp[2, i] == '', tmp[2, i] <- tmp[2, i - 1], tmp[2, i] <- tmp[2, i])
}
tmp[2, 1] = tmp[1, 1]
names(tmp) = paste(tmp[2, ], tmp[3, ], sep = '.')
tmp = tmp[-c(1:3), ]
names(tmp)[10] = 'date'
```
In Python I've gotten this far:

```python
import xlrd

myf = '/home/me/desktop/sample.xlsx'
sheet = xlrd.open_workbook(myf).sheet_by_index(0)
s1 = []
r = sheet.nrows
c = sheet.ncols
for row in range(0, r):
    t1 = []
    for col in range(0, c):
        t1.append(sheet.cell_value(row, col))
    s1.append(t1)
```
But I've had little success getting rid of the empty rows and columns. All of the following fail:

```
>>> s1[0]
['', '', '', '', '', '', '', '', '', '', '', '']
>>> s1[0] == []
False
>>> s1[0] == ''
False
>>> s1[0] == all('')
False
```
So I'm not clear on how to check whether an entire list is empty.
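For reference, one way to test for an all-empty-string row is `all()` over a generator expression, and `zip(*rows)` transposes the list of lists so the same test works on columns. A small self-contained sketch with made-up data:

```python
# Sketch: all() with a generator tests for an all-empty row;
# zip(*rows) transposes so the same predicate drops blank columns.
rows = [
    ['', '', ''],
    ['', u'cat1', u'a'],
    ['', '', ''],
    ['', u'cat2', u'b'],
]

def is_blank(row):
    # True when every cell in the row is the empty string
    return all(cell == '' for cell in row)

rows = [r for r in rows if not is_blank(r)]
cols = [list(c) for c in zip(*rows) if not is_blank(c)]  # transpose, drop blank columns

print(rows)  # blank rows removed
print(cols)  # remaining columns, blanks removed
```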
I can zip rows 2 & 3 (rows 5 & 6 in the Python list):

```
>>> zip(s1[5], s1[6])
[('', ''), (u'cat1', u'a'), ('', u'b'), ('', ''), (u'cat2', u'c'), ('', u'd'),
 ('', ''), (u'cat3', u'e'), ('', u'f'), ('', ''), (u'cat4', u'g'), ('', u'h')]
```
But I don't know how I'd paste the category names forwards in a for loop.
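For what it's worth, a plain for loop can carry the last non-empty category forward and join each pair with `'.'`, much like R's `paste(..., sep='.')`. A sketch using pairs shaped like the zipped output above (the data is illustrative):

```python
# Sketch: forward-fill the first element of each (category, sub) pair,
# then join the pair with '.' to build combined column names.
pairs = [('', ''), (u'cat1', u'a'), ('', u'b'),
         ('', ''), (u'cat2', u'c'), ('', u'd')]

names = []
last = ''
for cat, sub in pairs:
    if cat == '' and sub == '':
        continue            # skip the blank spacer columns
    if cat != '':
        last = cat          # remember the most recent category
    names.append(last + '.' + sub)

print(names)
```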
A very n00b question, reflective of where my understanding of Python is. Thoughts welcome. Submitted with some trepidation, because I recognise the question has a 'homework' feel to it, though it is purely a personal learning exercise. Thanks.
Edit: after a bit of messing about I've worked up the rough-and-ready working example below. I'd be grateful for pointers on how to do this more efficiently.

I had a try at pandas but found the learning curve quite steep. If someone posts a working MWE using it, I'd be pleased to mark the question as answered.
```python
import os
import xlrd
import pandas as pd
import pprint
import re
import csv

'''
Create a few helper functions to save time finding things,
picking out empties and selecting items.
'''

def nulls(x):
    # indices of rows (or columns) made up entirely of empty strings
    g = lambda r: all(i == '' for i in r)
    out = [i for i, j in enumerate(x) if g(j)]
    return out

def fill(x):
    # carry the last non-empty value forward
    for i in range(1, len(x)):
        if x[i] == '':
            x[i] = x[i - 1]
    return x

def which(x, y):
    out = [i for i, j in enumerate(x) if y(j)]
    return out

def t(x):
    # transpose a list of lists
    out = [list(i) for i in zip(*x)]
    return out

def rbind(x, y, sep=None):
    if sep is None:
        sep = '.'
    out = [str(i[0]) + sep + str(i[1]) for i in zip(x, y)]
    return out

# def csvit(x):
#     tmp = next(key for key, val in globals().items()
#                if val == x and not key.startswith('_'))
#     f = open('/home/me/desktop/csvs/' + tmp + '.csv', 'wb')
#     writer = csv.writer(f, quotechar='"', quoting=csv.QUOTE_ALL, dialect=csv.excel)
#     [writer.writerow(i) for i in x]
#     f.close()

#
# Load the spreadsheet file and convert it to a Python list
#
sheet = xlrd.open_workbook('/home/me/downloads/sample.xlsx').sheet_by_index(0)
s = [sheet.row_values(i) for i in range(sheet.nrows)]

#
# Get rid of unnecessary Excel formatting and spacing
#

# rows first
s = [j for i, j in enumerate(s) if i not in nulls(s)]
# transpose and do the columns (surely there is a more elegant way?)
s = t(s)
s = [j for i, j in enumerate(s) if i not in nulls(s)]

# title from the primary category column
title = s[0][0]
# date from the secondary category column
date = [j[1] for j in s if str(j[0]) == 'date']

#
# Combine the columns into a single category variable
# (could have left them separate)
#
cols = ['category']
s = t(s)
cols.extend(rbind(fill(s[2]), s[3])[1:])
s = s[4:len(s)]
s = t(s)
category = [str(i) for i in s[0]]
s = s[1:len(s)]
c1 = [date for i in range(len(s[0]))]  # create date column
c2 = [title for i in range(len(s[0]))]  # create title column
cols.insert(0, 'title')
cols.insert(1, 'date')
s.insert(0, c2)
s.insert(1, c1)
s = t(s)
```
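Since a pandas MWE was asked for: a minimal, hedged sketch of just the two awkward steps (the file path and data below are made up, not the attached sheet). `dropna(axis=1, how='all')` corresponds to the R `sapply`/`is.na` column drop, and `replace`/`ffill` corresponds to the `ifelse` header-filling loop:

```python
# Hedged pandas sketch with invented data: drop all-empty columns and
# forward-fill a merged-cell header row.
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [np.nan, np.nan, np.nan],   # an entirely empty column
    'B': [1.0, 2.0, 3.0],
    'C': [4.0, np.nan, 6.0],
})
df = df.dropna(axis=1, how='all')    # drop columns where every value is NaN
print(df.columns.tolist())           # column A is gone

# Forward-fill a header row that only names the first column of each group
header = pd.Series(['cat1', '', '', 'cat2', ''])
filled = header.replace('', np.nan).ffill()
print(filled.tolist())
```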
One suggestion: if it's possible to export the Excel worksheet to CSV, you might want to have a look at numpy.genfromtxt:

http://docs.scipy.org/doc/numpy/user/basics.io.genfromtxt.html

It seems to have similar capabilities to pandas without the steep learning curve. It has `delimiter`, `autostrip`, `missing_values` and `filling_values` options, handles column names, and returns the columns in numpy.array form.
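To illustrate, a minimal sketch of genfromtxt on CSV-shaped data (fed from a string here rather than an exported file; the column names and values are invented):

```python
# Sketch: numpy.genfromtxt reading comma-separated data with a header row.
# names=True takes the column names from the first line.
import numpy as np
from io import StringIO

csv_data = StringIO("date,a,b\n20130101,1.5,2.0\n20130102,3.5,4.0")
arr = np.genfromtxt(csv_data, delimiter=',', names=True)

print(arr.dtype.names)   # column names parsed from the header
print(arr['a'])          # a column accessed by name
```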