Python: What's a fast way to read and split a file?
I need to read a file, split it into lines, and then split each line in half on the tab character, getting rid of all speech marks along the way. At the moment I have a working function. However, it is rather slow:
    temp = []
    fp = open(fname, "r")
    for line in fp:
        # strip the quotes and the newline, then split on tabs
        temp.append(line.replace("\"","").rstrip("\n").split("\t"))
    print(temp)
This splits the file into a list of lists. It could just as well be one list; it is pretty easy to redivide it into pairs later, as long as the order is retained.
There must be a faster way of doing this. Could anyone put me on the right track?
Thank you!
[Edit] The file I'm working with is massive, but I'll add a sample of it. (Is there a way to upload files on Stack Overflow?)

    "carmilla"	"35"
    "jonathan r"	"aa2"
    "m"	"3"
    "emma"	"350"
    "old"	"aa"

This should return:
["carmilla", "35", "jonathan r", "aa2", "m", "3", "emma", "350", "old", "aa"]
Although my code returns a list of lists of 2 strings, that is fine.
Sorry, I should have noted that the print statement is standing in for a return statement. Since I took this out of a function, I changed it to print so it would make more sense here.
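For reference, here is a rough sketch of what the function form with a real return might look like, plus the "redivide into pairs later" step mentioned above. The names `read_fields` and `regroup` are made up for illustration, the temporary file is just a stand-in for the real data, and the code is written for Python 3.

```python
import os
import tempfile


def read_fields(fname):
    """Return one flat list of tab-separated fields, with quotes removed."""
    fields = []
    with open(fname, "r") as fp:
        for line in fp:
            fields.extend(line.replace('"', "").rstrip("\n").split("\t"))
    return fields


def regroup(flat, size=2):
    """Redivide a flat list into consecutive chunks of `size` items."""
    return [flat[i:i + size] for i in range(0, len(flat), size)]


# Small stand-in for the real (massive) file.
sample = '"carmilla"\t"35"\n"jonathan r"\t"aa2"\n"m"\t"3"\n'
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

flat = read_fields(tmp.name)
pairs = regroup(flat)
os.remove(tmp.name)

print(flat)
print(pairs)
```

Since the order is retained, going from the flat list back to pairs is just a matter of slicing in steps of two.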
I think a list comprehension would be faster than calling .append for each line:
    from itertools import chain

    with open('file.txt') as f:
        # build the [left, right] pairs, then flatten them into one sequence
        lines = chain.from_iterable([l.replace(r'"','').rstrip('\n').split('\t',1) for l in f])
Edit: this produces a flattened list:
>>> ['carmilla', '35', 'jonathan r', 'aa2', 'm', '3', 'emma', '350', 'old', 'aa']
The non-flattening version:

    with open('file.txt') as f:
        lines = [l.replace(r'"','').rstrip('\n').split('\t',1) for l in f]
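One caveat on the flattened version: `chain.from_iterable` returns a lazy iterator rather than a list, so it has to be passed to `list()` (or looped over) before the flattened values actually exist. A minimal sketch of that behaviour, using an in-memory list as a stand-in for the file and assuming Python 3:

```python
from itertools import chain

# In-memory stand-in for the lines of the file.
raw_lines = ['"carmilla"\t"35"\n', '"jonathan r"\t"aa2"\n']

pairs = [l.replace('"', '').rstrip('\n').split('\t', 1) for l in raw_lines]

lazy = chain.from_iterable(pairs)  # an iterator; nothing is flattened yet
flat = list(lazy)                  # materialize it into a real list
leftover = list(lazy)              # the iterator is exhausted after one pass

print(flat)
print(leftover)
```

This also means that a benchmark which only builds the chain object, without consuming it, is mostly measuring the inner list comprehension.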
And timing them, it turns out the OP's version is the fastest?
    # Python 2 script (csv.reader is given a file opened in 'rb' mode)
    import timeit

    print("chain, list", timeit.timeit(r"""
    with open('file.txt') as f:
        lines = chain.from_iterable([l.replace(r'"','').rstrip('\n').split('\t',1) for l in f])""",
        setup="from itertools import chain", number=1000))

    print("my flat ", timeit.timeit(r"""
    with open('file.txt') as f:
        lines = [l.replace(r'"','').rstrip('\n').split('\t',1) for l in f]""",
        setup="from itertools import chain", number=1000))

    print("op's ", timeit.timeit(r"""temp = []
    fp = open('file.txt', "r")
    for line in fp:
        temp.append(line.replace("\"","").rstrip("\n").split("\t"))
    """, number=1000))

    print("jamlyks ", timeit.timeit(r"""
    with open('file.txt', 'rb') as f:
        r = csv.reader(f, delimiter=' ', skipinitialspace=True)
        list(chain.from_iterable(r))""",
        setup="from itertools import chain; import csv", number=1000))

    print("lennart ", timeit.timeit(r"""
    list(csv.reader(open('file.txt'), delimiter='\t', quotechar='"'))""",
        setup="from itertools import chain; import csv", number=1000))
Yields:

    c:\users\henry\desktop>k.py
    ('chain, list', 0.04725674146159321)
    ('my flat ', 0.04629905135295972)
    ("op's ", 0.04391255644624917)
    ('jamlyks ', 0.048360870934994915)
    ('lennart ', 0.04569112379085424)
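The figures above are all within a few milliseconds of each other, so a single `timeit.timeit` run per variant may not be enough to rank them. One way to reduce the noise, sketched here under my own assumptions (Python 3, a generated 2000-line test file, and the hypothetical helper names `with_append` and `with_comprehension`), is to use `timeit.repeat` and keep the minimum:

```python
import os
import tempfile
import timeit

# Hypothetical test file: the sample pairs repeated; the size is arbitrary.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write('"carmilla"\t"35"\n"jonathan r"\t"aa2"\n' * 1000)
path = tmp.name


def with_append():
    # The original loop-and-append approach.
    temp = []
    with open(path) as fp:
        for line in fp:
            temp.append(line.replace('"', '').rstrip('\n').split('\t'))
    return temp


def with_comprehension():
    # The list comprehension approach.
    with open(path) as f:
        return [l.replace('"', '').rstrip('\n').split('\t', 1) for l in f]


results = {}
for func in (with_append, with_comprehension):
    # Best of 5 runs of 10 calls each; the minimum is the least noisy figure.
    results[func.__name__] = min(timeit.repeat(func, repeat=5, number=10))
    print(func.__name__, results[func.__name__])

same_output = with_append() == with_comprehension()
os.remove(path)
```

I would not read much into the exact numbers this prints; on a file this small the cost is probably dominated by opening and reading the file, so the real test is the massive file itself.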