Fetch HTML that is loaded dynamically (Python)
I am writing a crawler in Python that must extract links to the PDFs listed on this page:
http://www.peekyou.com/barack_obama
(Scroll down; there is a "Documents" section with links to PDFs.)
The problem is that the "Documents" section is loaded in the background by JavaScript, a few seconds after the page itself, so the function I am using to fetch the HTML of the page does not fetch that section.
To fetch the HTML, I have been given this code:
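    ...
    req = urllib2.Request(url)
    # Send a randomly chosen User-Agent string with the request
    req.add_header('User-Agent', random.choice(listagent))
    page = urllib2.urlopen(req)
    # Only read the body when the response is a text type
    if page.info().getmaintype() == "text":
        html = page.read()
    ...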
which does not fetch that section, as I said.
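To make the symptom concrete, here is a minimal sketch of the link-extraction step, written for Python 3's standard library (the code above is Python 2). The names `PdfLinkParser` and `extract_pdf_links` and both sample HTML strings are hypothetical and are not the real markup of the page; they only illustrate that a plain HTTP fetch returns the page before any JavaScript has run, so there is nothing to extract yet.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect absolute URLs of <a href="..."> links that end in .pdf."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension
        if href and href.split("?")[0].lower().endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def extract_pdf_links(html, base_url):
    """Return the PDF links found in an HTML string (hypothetical helper)."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


BASE_URL = "http://www.peekyou.com/barack_obama"

# Hypothetical markup: what a plain HTTP fetch might return (section still empty)
STATIC_HTML = '<div id="documents"></div>'

# Hypothetical markup: the same section after the page's JavaScript has filled it
RENDERED_HTML = (
    '<div id="documents">'
    '<a href="/files/report.pdf">Report</a>'
    '<a href="/about">About</a>'
    '</div>'
)

static_links = extract_pdf_links(STATIC_HTML, BASE_URL)
rendered_links = extract_pdf_links(RENDERED_HTML, BASE_URL)
```

Under these assumptions, the extraction logic itself is not the issue: it finds the link only when it is given HTML in which the script-generated section is already present.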
What is the proper way to deal with this problem? Is there an API I can use? Thank you.