function - How to predict how long it will take for Python to run a script? -


I want to make sure I run my program at an optimal time. For example, if it will take 5 hours to complete, I should run it overnight!

I know when my program will end, and theoretically I should be able to base the run length on the size of the data. Here is the actual problem:

I need to open 16 pickled files of pandas dataframes that add up to a total of 1.5 gigs. Note that eventually I will need dataframes that add up to 20 gigs, so the answer needs a way of telling how long the following code will take given the total amount of gigs:

import pickle
import os

def picklesave(data, picklefile):
    output = open(picklefile, 'wb')
    pickle.dump(data, output)
    output.close()
    print("file has been saved to %s" % picklefile)

def pickleload(picklefile):
    pkl_file = open(picklefile, 'rb')
    data = pickle.load(pkl_file)
    pkl_file.close()
    return data

directory = '/users/ryansaxe/desktop/kaggle_parkinsons/gps/'
files = os.listdir(directory)
dfs = [pickleload(directory + i) for i in files]

new_file = directory + 'new_file_dataframe'
picklesave(dfs, new_file)

So I need to write a function like the following:

def time_fun(data_size_in_gigs):
    # some algorithm here
    print("your code will take ___ hours to run")

I have no clue how to approach this, or whether it is even possible. Any ideas?

The execution time is entirely dependent on your system, i.e., hard drive / SSD, processor, etc. No one can tell you upfront how long it will take to run on your computer. The only way you'll get a precise estimate is to run the script on sample files that add up to a small size, such as 100 MB, take note of how long it took, and base your estimations off of that.
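A minimal sketch of taking that sample measurement with the standard library's `time.perf_counter`; the `workload` callable here stands in for loading your 100 MB sample and is an assumption, not part of the original code:

```python
import time

def measure_benchmark(workload):
    # Time one run of the workload and return elapsed seconds.
    start = time.perf_counter()
    workload()
    return time.perf_counter() - start

# Stand-in workload; replace with a call that loads your 100 MB sample.
elapsed = measure_benchmark(lambda: sum(range(10**6)))
print("sample run took %.3f seconds" % elapsed)
```

Run this a couple of times and average the results, since disk caching can make the second run noticeably faster than the first.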

def time_fun(data_size_in_gigs):
    benchmark = time_you_manually_tested_for_100mb  # hours measured for 0.1 gigs
    time_to_run = data_size_in_gigs / 0.1 * benchmark
    print("your code will take %s hours to run" % time_to_run)

Edit: in fact, you may want to save the benchmark (size, time) pairs to a file, and automatically add new entries whenever you run the script. Then, inside the function, you could for example retrieve the 2 benchmarks closest to the data_size you're estimating and estimate off of them, interpolating proportionally to the data_size you need. Each adjacent pair of benchmarks defines a different linear slope that is accurate for data sizes near it.
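A sketch of that bookkeeping, assuming a simple text file of `size,hours` lines (the file format and function names here are made up for illustration):

```python
def load_benchmarks(path):
    # Each line of the file holds "size_in_gigs,hours".
    pairs = []
    with open(path) as f:
        for line in f:
            size, hours = line.strip().split(',')
            pairs.append((float(size), float(hours)))
    return sorted(pairs)

def estimate(benchmarks, data_size_in_gigs):
    # Pick the two benchmarks closest to the requested size and
    # interpolate along the line they define.
    nearest = sorted(benchmarks, key=lambda p: abs(p[0] - data_size_in_gigs))
    (s1, t1), (s2, t2) = nearest[:2]
    slope = (t2 - t1) / (s2 - s1)
    return t1 + slope * (data_size_in_gigs - s1)

# With benchmarks at 0.1, 0.5 and 1.0 gigs, estimate a 0.75 gig run:
print(estimate([(0.1, 0.2), (0.5, 1.0), (1.0, 2.1)], 0.75))
```

Note that `slope` divides by the size difference of the two benchmarks, which is another reason not to keep entries with nearly identical sizes.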

     |                  .
     |                 .
time |               .
     |            .
     |       .
     |_._________________
              size

Just avoid saving 2 benchmarks that differ by less than, say, 200 MB, because the actual time may vary between runs and ruin your estimation with entries such as (999 MB, 100 minutes) followed by (1 GB, 95 minutes).

The projection of the line defined by the last 2 points is the closest estimate you have for new all-time-high data sizes.
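That projection is a plain linear extrapolation; a sketch, assuming the benchmark list is already sorted by size:

```python
def extrapolate(benchmarks, data_size_in_gigs):
    # Extend the line through the two largest benchmarks
    # past the measured range.
    (s1, t1), (s2, t2) = benchmarks[-2:]
    slope = (t2 - t1) / (s2 - s1)
    return t2 + slope * (data_size_in_gigs - s2)

# Benchmarks up to 1 gig, projected out to a 20 gig run:
print(extrapolate([(0.1, 0.2), (0.5, 1.0), (1.0, 2.0)], 20.0))
```

Treat the result as a rough upper-range guess: once the data no longer fits in RAM and the machine starts swapping, the real curve bends upward and a straight line will underestimate the time.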

