Python: What's a fast way to read and split a file?


I need to read a file, split it into lines, and then split those lines in half on tab characters, getting rid of all the speech marks. At the moment I have a working function; however, it is rather slow:

    temp = []
    fp = open(fname, "r")
    for line in fp:
        # drop the quotes and the trailing newline, then split on tabs
        temp.append(line.replace("\"", "").rstrip("\n").split("\t"))
    print(temp)

This splits the file into a list of lists. I would rather have it as one list; it's pretty easy to redivide it into pairs later, as long as the order is retained.
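Going from one flat list back to pairs, and the other way round, really is cheap as long as the order is kept. This is a minimal sketch that assumes the flat list alternates first and second columns. The helper names `to_pairs` and `flatten` are hypothetical:

```python
from itertools import chain


def to_pairs(flat):
    """Regroup a flat [a1, b1, a2, b2, ...] list into [[a1, b1], [a2, b2], ...]."""
    it = iter(flat)
    # zip pulls from the same iterator twice per step, so it pairs consecutive items
    return [list(pair) for pair in zip(it, it)]


def flatten(pairs):
    """Turn [[a1, b1], [a2, b2], ...] back into one flat list, preserving order."""
    return list(chain.from_iterable(pairs))


flat = ["carmilla", "35", "jonathan r", "aa2", "m", "3"]
pairs = to_pairs(flat)
print(pairs)                   # [['carmilla', '35'], ['jonathan r', 'aa2'], ['m', '3']]
print(flatten(pairs) == flat)  # True
```

Note that `zip` silently drops a trailing unpaired item, so this assumes every line really has two fields.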

There must be a faster way of doing this. Can anyone put me on the right track?

Thank you!

[EDIT] The file I'm working with is massive, but I'll add part of it here. (Is there a way to upload files on Stack Overflow?)

    "carmilla"	"35"
    "jonathan r"	"aa2"
    "m"	"3"
    "emma"	"350"
    "old"	"aa"

It should return:

    ["carmilla", "35", "jonathan r", "aa2", "m", "3", "emma", "350", "old", "aa"]

although my code returns a list of lists of 2 strings, which is also fine.

Sorry, I should have noted that the print statement is standing in for a return statement; since I took the code out of the function, I changed it to print to make more sense here.

I think a list comprehension would be faster than calling .append for each line:

    from itertools import chain

    with open('file.txt') as f:
        # split each line at most once on the tab, then chain the pairs together
        lines = chain.from_iterable([l.replace(r'"','').rstrip('\n').split('\t',1) for l in f])

EDIT: this produces a flattened iterator; calling list(lines) on it gives:

>>>  ['carmilla', '35', 'jonathan r', 'aa2', 'm', '3', 'emma', '350', 'old', 'aa'] 
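If a real list is wanted instead of the lazy iterator that `chain.from_iterable` returns, a nested list comprehension can flatten in the same pass. This is a sketch that assumes a tab-separated, double-quoted file like the sample in the question. `io.StringIO` stands in for the real file, and `read_flat` is a hypothetical name:

```python
import io


def read_flat(f):
    """Return one flat list of fields from a tab-separated, double-quoted file."""
    return [
        field
        for line in f
        # strip the quotes and the newline, then split each line once on the tab
        for field in line.replace('"', '').rstrip('\n').split('\t', 1)
    ]


sample = io.StringIO(
    '"carmilla"\t"35"\n'
    '"jonathan r"\t"aa2"\n'
    '"m"\t"3"\n'
    '"emma"\t"350"\n'
    '"old"\t"aa"\n'
)
print(read_flat(sample))
# ['carmilla', '35', 'jonathan r', 'aa2', 'm', '3', 'emma', '350', 'old', 'aa']
```

With a real file, the call would be `with open('file.txt') as f: data = read_flat(f)`.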

The non-flattening version:

    with open('file.txt') as f:
        lines = [l.replace(r'"','').rstrip('\n').split('\t',1) for l in f]
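The comprehensions above pass a second argument to `split`, but the question's loop does not. `str.split(sep, maxsplit)` stops after `maxsplit` splits, so each line yields at most two items. For the sample data both forms give the same result. The difference only shows up on a line with more tabs, as in this made-up three-column line:

```python
# hypothetical line with an extra third column, only to show the difference
line = '"emma"\t"350"\t"extra"\n'
cleaned = line.replace('"', '').rstrip('\n')

print(cleaned.split('\t'))     # ['emma', '350', 'extra']
print(cleaned.split('\t', 1))  # ['emma', '350\textra']
```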

And timing them, it turns out the OP's version is the fastest?

    import timeit

    print("chain, list", timeit.timeit(r"""
    with open('file.txt') as f:
        lines = chain.from_iterable([l.replace(r'"','').rstrip('\n').split('\t',1) for l in f])""",
        setup="from itertools import chain", number=1000))
    print("flat       ", timeit.timeit(r"""
    with open('file.txt') as f:
        lines = [l.replace(r'"','').rstrip('\n').split('\t',1) for l in f]""",
        setup="from itertools import chain", number=1000))
    print("op's       ", timeit.timeit(r"""
    temp = []
    fp = open('file.txt', "r")
    for line in fp:
        temp.append(line.replace("\"","").rstrip("\n").split("\t"))""",
        number=1000))
    print("jamlyks    ", timeit.timeit(r"""
    with open('file.txt', 'rb') as f:
        r = csv.reader(f, delimiter=' ', skipinitialspace=True)
        list(chain.from_iterable(r))""",
        setup="from itertools import chain; import csv", number=1000))
    print("lennart    ", timeit.timeit(r"""list(csv.reader(open('file.txt'), delimiter='\t', quotechar='"'))""",
        setup="from itertools import chain; import csv", number=1000))

This yields:

    c:\users\henry\desktop>k.py
    ('chain, list', 0.04725674146159321)
    ('my flat    ', 0.04629905135295972)
    ("op's       ", 0.04391255644624917)
    ('jamlyks    ', 0.048360870934994915)
    ('lennart    ', 0.04569112379085424)
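On Python 3, the same comparison can be written with callables instead of statement strings, which avoids the nested raw-string quoting. This is a sketch that uses an in-memory copy of the sample (repeated to make it larger) rather than the real file, so the absolute numbers will not match those above. The function names are hypothetical. `csv.reader` with `delimiter='\t'` and `quotechar='"'` removes the quotes itself:

```python
import csv
import io
import timeit

# Hypothetical data: the question's sample repeated to get a measurable size
SAMPLE = ('"carmilla"\t"35"\n'
          '"jonathan r"\t"aa2"\n'
          '"m"\t"3"\n'
          '"emma"\t"350"\n'
          '"old"\t"aa"\n') * 2000


def with_append(text):
    """The question's loop: strip quotes, drop the newline, split on tabs."""
    temp = []
    for line in io.StringIO(text):
        temp.append(line.replace('"', '').rstrip('\n').split('\t'))
    return temp


def with_comprehension(text):
    """The same work written as a list comprehension."""
    return [line.replace('"', '').rstrip('\n').split('\t', 1)
            for line in io.StringIO(text)]


def with_csv(text):
    """Let csv.reader handle the tab delimiter and the double quotes."""
    return list(csv.reader(io.StringIO(text), delimiter='\t', quotechar='"'))


# All three should agree on the parsed rows before their speed is compared
assert with_append(SAMPLE) == with_comprehension(SAMPLE) == with_csv(SAMPLE)

for func in (with_append, with_comprehension, with_csv):
    seconds = timeit.timeit(lambda: func(SAMPLE), number=20)
    print(f"{func.__name__:20} {seconds:.4f} s")
```

Timings depend on the data and the interpreter, so it is worth running this against the real file before choosing. One design point besides speed is that `csv.reader` keeps a tab that appears inside a quoted field as part of that field. The replace-and-split approach would split on it.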
