multithreading - Learning Python and threading. I think my code runs infinitely. Can you help me find the bugs?
So I've started learning Python now, and I'm absolutely in love with it.
I'm building a small-scale Facebook data scraper. Basically, it uses the Graph API to scrape the first names of a specified number of users. It works fine in a single thread (or no thread, I guess).
I used online tutorials to come up with the following multithreaded version (updated code):
```python
import json
import queue
import threading
import time

import requests

graphurl = 'http://graph.facebook.com/'
first_names = {}  # store first names and their counts
id_queue = queue.Queue()


def getoneuser(url):
    http_response = requests.get(url)  # open a request to the URL
    if http_response.status_code == 200:
        data = http_response.text  # text of the response
        json_obj = json.loads(data)  # load it as a JSON object
        # name = json_obj['name']
        return json_obj['first_name']
        # last = json_obj['last_name']
    return None


class ThreadGet(threading.Thread):
    """Threaded name scraper."""

    def __init__(self, queue):
        threading.Thread.__init__(self)
        self.queue = queue

    def run(self):
        while True:
            # print('thread started')
            url = graphurl + str(self.queue.get())
            first = getoneuser(url)  # one user's first name
            if first is not None:
                if first in first_names:  # name has been encountered before
                    first_names[first] = first_names[first] + 1  # increment count
                else:
                    first_names[first] = 1  # add new name
            self.queue.task_done()
            # print('thread ended')


def main():
    start = time.time()
    for i in range(6):
        t = ThreadGet(id_queue)
        t.daemon = True
        t.start()

    for i in range(100):
        id_queue.put(i)

    id_queue.join()

    for name in first_names.keys():
        print(name + ': ' + str(first_names[name]))

    print('----------------------------------------------------------------')
    print('================================================================')

    # print the top first names
    for key in first_names.keys():
        if first_names[key] > 2:
            print(key + ': ' + str(first_names[key]))

    print('it took ' + str(time.time() - start) + 's')


if __name__ == '__main__':
    main()
```
To be honest, I don't understand some parts of the code, but I get the main idea. The output is nothing. I mean the shell has nothing in it, and I believe it keeps on running.
So what I'm doing is filling the queue with integers that are user IDs on FB. Each ID is used to build an API call URL. `getoneuser` returns the name of one user at a time. The task (ID) is then marked as 'done' and the thread moves on.
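The per-ID step described above can be split into a URL builder and a parser. That makes it testable without the network, and it avoids one way the threaded version can hang: in the question's `getoneuser`, a 200 response without a `first_name` key raises `KeyError` (as does a failed `requests.get` with its own exception), which ends the worker thread before it reaches `task_done()`, so `queue.join()` never returns. This is a sketch assuming the response body is a JSON object, as the question's code expects; `build_url` and `parse_first_name` are hypothetical helper names.

```python
import json

GRAPH_URL = 'http://graph.facebook.com/'  # base URL taken from the question


def build_url(user_id):
    """Build one API call URL from a numeric user ID."""
    return GRAPH_URL + str(user_id)


def parse_first_name(status_code, body):
    """Return the 'first_name' field of a JSON response body, or None.

    Failed requests, invalid JSON and a missing key all give None
    instead of raising, so a worker thread calling this cannot die
    before it reaches task_done().
    """
    if status_code != 200:
        return None
    try:
        data = json.loads(body)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    if not isinstance(data, dict):
        return None
    return data.get('first_name')
```

In a worker you would call `parse_first_name(resp.status_code, resp.text)` on the result of `requests.get(build_url(user_id))`.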
What is wrong with the code above?
Your original `run` function only processed one item from the queue. In all, you've only removed 5 items from the queue.
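A small reproduction of that effect (the class and function names are made up for the demo): each thread's `run()` takes a single item and returns, so after five threads finish, 95 of the 100 items are still queued and never marked done. Calling `q.join()` at that point would wait forever, which matches the "runs infinitely" symptom.

```python
import queue
import threading


class OneShotThread(threading.Thread):
    """Mimics the original bug: run() handles one item, then returns."""

    def __init__(self, q):
        super().__init__()
        self.q = q

    def run(self):
        self.q.get()        # take exactly one item...
        self.q.task_done()  # ...mark it done, then the thread exits


def demo(num_threads=5, num_items=100):
    q = queue.Queue()
    for i in range(num_items):
        q.put(i)
    threads = [OneShotThread(q) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for the threads themselves, not for the queue
    # q.join() here would block forever: the leftover items are never done.
    return q.qsize()  # items nobody processed


if __name__ == '__main__':
    print(demo())  # 95
```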
Usually, `run` functions look like:

```python
def run(self):
    while True:
        doUsefulWork()
```

i.e. they have a loop that causes the recurring work to be done.
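Here is a sketch of such a looping worker that also calls `task_done()` in a `finally` block, so one failing item (for example a `KeyError` from a bad response) cannot kill the thread and leave `queue.join()` waiting forever. `LoopingWorker`, `run_all` and the simulated failure are hypothetical; only `queue.Queue` and `threading.Thread` are the real standard-library APIs.

```python
import queue
import threading


class LoopingWorker(threading.Thread):
    """Worker whose run() keeps pulling items until the program exits."""

    def __init__(self, q, handle):
        super().__init__(daemon=True)  # daemon: don't keep the process alive
        self.q = q
        self.handle = handle

    def run(self):
        while True:
            item = self.q.get()
            try:
                self.handle(item)
            except Exception as exc:
                # Report and keep going; an uncaught exception would end
                # the thread and leave this item unfinished forever.
                print('item %r failed: %r' % (item, exc))
            finally:
                self.q.task_done()  # always balance the get()


def run_all(items, handle, num_workers=6):
    q = queue.Queue()
    for _ in range(num_workers):
        LoopingWorker(q, handle).start()
    for item in items:
        q.put(item)
    q.join()  # returns once every item has had task_done() called


if __name__ == '__main__':
    seen = []
    lock = threading.Lock()

    def handle(i):
        if i == 13:
            raise KeyError('first_name')  # simulated bad response
        with lock:
            seen.append(i)

    run_all(range(100), handle)
    print(len(seen))  # 99
```

The `daemon=True` flag plays the same role as `setDaemon(True)` in the question: once `join()` returns, the still-looping workers do not stop the program from exiting.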
[Edit] The OP has edited the code to include this change.
Some other useful things to try:
- Add a print statement to the `run` function: you'll find it's called only 5 times.
- Remove the `queue.join()` call. It is what causes the module to block; without it, you'll be able to probe the state of the queue.
- Put the entire body of `run` into a function. Verify that you can use that function in a single-threaded manner to get the desired results, then
- try a single worker thread, then go for
- multiple worker threads.
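The progression in the list above can be built into the code itself: write the per-ID work as one function, run it from a plain loop first, then with one worker thread, then with several, and check that the results agree. This is a sketch; `count_first_names` and the offline name lookup are hypothetical stand-ins for the question's code. The lock is there because `counts[name] += 1` is a read-modify-write that can lose updates when several threads do it at once (the question's shared `first_names` dict has the same issue).

```python
import queue
import threading
from collections import Counter


def count_first_names(user_ids, get_name, num_workers=0):
    """Count first names; num_workers=0 runs everything in the calling thread."""
    counts = Counter()
    if num_workers == 0:
        # Step 1: plain single-threaded loop, easy to debug with print().
        for uid in user_ids:
            name = get_name(uid)
            if name is not None:
                counts[name] += 1
        return counts

    # Steps 2 and 3: the same work, done by 1..N worker threads.
    q = queue.Queue()
    lock = threading.Lock()

    def worker():
        while True:
            uid = q.get()
            try:
                name = get_name(uid)
                if name is not None:
                    with lock:  # Counter updates are not atomic
                        counts[name] += 1
            finally:
                q.task_done()

    for _ in range(num_workers):
        threading.Thread(target=worker, daemon=True).start()
    for uid in user_ids:
        q.put(uid)
    q.join()
    return counts


if __name__ == '__main__':
    fake_names = ['Ann', 'Bob', None, 'Ann']

    def fake_get_name(uid):
        # Hypothetical offline stand-in for the Graph API lookup.
        return fake_names[uid % 4]

    ids = range(100)
    single = count_first_names(ids, fake_get_name)
    for n in (1, 6):
        assert count_first_names(ids, fake_get_name, n) == single
    print(single)  # Counter({'Ann': 50, 'Bob': 25})
```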