Monday, 15 April 2013

python - Scrapy crawl from script always blocks script execution after scraping

I followed this guide, http://doc.scrapy.org/en/0.16/topics/practices.html#run-scrapy-from-a-script, to run Scrapy from a script. Here is the relevant part of the script:

crawler = Crawler(Settings(settings))
crawler.configure()
spider = crawler.spiders.create(spider_name)
crawler.crawl(spider)
crawler.start()
log.start()
reactor.run()
print "It can't be printed out!"

It works as it should: it visits the pages, scrapes the needed info and stores the output JSON where I told it to (via FEED_URI). But when the spider finishes its work (I can see the number of items in the output JSON), execution of the script doesn't resume. I guess it isn't a Scrapy problem, and the answer should be somewhere in Twisted's reactor. How can I release the thread so execution continues?
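What you are seeing is standard Twisted behaviour: reactor.run() hands the main thread over to the event loop and only returns after reactor.stop() is called from inside it. Here is a minimal sketch to illustrate, using plain Twisted and nothing from the question's code:

from twisted.internet import reactor

def shut_down():
    reactor.stop()  # the only thing that makes reactor.run() return

reactor.callLater(2, shut_down)  # schedule the stop two seconds from now
reactor.run()                    # blocks here until shut_down() fires
print "reached only after reactor.stop() was called"

In the Scrapy case, nothing ever calls reactor.stop(), so the script blocks forever after the crawl finishes.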

You need to stop the reactor when the spider finishes. You can accomplish this by listening for the spider_closed signal:

from twisted.internet import reactor
from scrapy import log, signals
from scrapy.crawler import Crawler
from scrapy.settings import Settings
from scrapy.xlib.pydispatch import dispatcher
from testspiders.spiders.followall import FollowAllSpider

def stop_reactor():
    reactor.stop()

dispatcher.connect(stop_reactor, signal=signals.spider_closed)
spider = FollowAllSpider(domain='scrapinghub.com')
crawler = Crawler(Settings())
crawler.configure()
crawler.crawl(spider)
crawler.start()
log.start()
log.msg('Running reactor...')
reactor.run()  # the script will block here until the spider is closed
log.msg('Reactor stopped.')

And the command-line log output might look like this:

stav@maia:/srv/scrapy/testspiders$ ./api
2013-02-10 14:49:38-0600 [scrapy] INFO: Running reactor...
2013-02-10 14:49:47-0600 [followall] INFO: Closing spider (finished)
2013-02-10 14:49:47-0600 [followall] INFO: Dumping Scrapy stats:
    {'downloader/request_bytes': 23934,...}
2013-02-10 14:49:47-0600 [followall] INFO: Spider closed (finished)
2013-02-10 14:49:47-0600 [scrapy] INFO: Reactor stopped.
stav@maia:/srv/scrapy/testspiders$
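As a side note, Scrapy 0.17 and later expose a per-crawler SignalManager, so the same hook can be registered without scrapy.xlib.pydispatch. A sketch of that variant, with the rest of the setup unchanged from the answer above:

from twisted.internet import reactor
from scrapy import signals
from scrapy.crawler import Crawler
from scrapy.settings import Settings

def stop_reactor():
    reactor.stop()

crawler = Crawler(Settings())
# connect on this crawler's own SignalManager instead of the global dispatcher
crawler.signals.connect(stop_reactor, signal=signals.spider_closed)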

Tags: python, twisted, scrapy
