scrapy - Enabling HttpProxyMiddleware in scrapyd
After reading the Scrapy documentation, I thought that HttpProxyMiddleware was enabled by default. However, when I start a spider via scrapyd's webservice interface, HttpProxyMiddleware is not enabled. I receive the following output:
2013-02-18 23:51:01+1300 [scrapy] INFO: Scrapy 0.17.0-120-gf293d08 started (bot: pde)
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled extensions: FeedExporter, LogStats, CloseSpider, WebService, CoreStats, SpiderState
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, MetaRefreshMiddleware, HttpCompressionMiddleware, RedirectMiddleware, CookiesMiddleware, ChunkedTransferMiddleware, DownloaderStats
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
2013-02-18 23:51:02+1300 [scrapy] DEBUG: Enabled item pipelines: pdepipeline
2013-02-18 23:51:02+1300 [shotgunsupplements] INFO: Spider opened
Note that HttpProxyMiddleware is not enabled. How can I enable it for scrapyd? Any help would be appreciated.
My scrapy.cfg:

# Automatically created by: scrapy startproject
#
# For more info about the [deploy] section see:
# http://doc.scrapy.org/topics/scrapyd.html

[settings]
default = pd.settings

[deploy]
url = http://localhost:6800/
project = pd
I have the following settings.py:

BOT_NAME = 'pd'  # this gets replaced by a function
BOT_VERSION = '1.0'

SPIDER_MODULES = ['pd.spiders']
NEWSPIDER_MODULE = 'pd.spiders'
DEFAULT_ITEM_CLASS = 'pd.items.product'
ITEM_PIPELINES = 'pd.pipelines.pdpipeline'
USER_AGENT = '%s/%s' % (BOT_NAME, BOT_VERSION)

TELNETCONSOLE_HOST = '127.0.0.1'  # defaults to 0.0.0.0; set
TELNETCONSOLE_PORT = '6073'       # so only we can see it.
TELNETCONSOLE_ENABLED = False

WEBSERVICE_ENABLED = True
LOG_ENABLED = True
ROBOTSTXT_OBEY = False

ITEM_PIPELINES = [
    'pd.pipelines.pdpipeline',
]

DATA_DIR = '/home/pd/scraped_data'  # directory to store export files to.
DOWNLOAD_DELAY = 2.0

DOWNLOADER_MIDDLEWARES = {
    'scrapy.contrib.downloadermiddleware.httpproxy.HttpProxyMiddleware': 750,
}
Regards,
Pranshu
After spending forever trying to debug this, it turns out that HttpProxyMiddleware expects the http_proxy environment variable to be set. The middleware is not loaded if http_proxy is not set. Therefore, I set http_proxy and Bob's your uncle! It works!
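A quick way to check this before starting a crawl is to ask Python which proxies it can see in the environment. The sketch below is not Scrapy's own code; it only assumes that the middleware discovers proxies the same way the standard library's getproxies() does (from variables such as http_proxy). The function name http_proxy_configured and the proxy address are made up for illustration. With scrapyd, the variable presumably has to be set in the environment of the scrapyd process itself, since that is the process that launches the spiders.

```python
import os
from urllib.request import getproxies


def http_proxy_configured():
    """Return True if the standard library detects a proxy for plain HTTP.

    getproxies() reads variables such as http_proxy from the environment
    (and, on some platforms, system proxy settings when none are set).
    """
    return "http" in getproxies()


if __name__ == "__main__":
    # Hypothetical proxy address, for illustration only.
    os.environ["http_proxy"] = "http://127.0.0.1:8080"
    print("HTTP proxy detected:", http_proxy_configured())
    print("Proxies seen:", getproxies())
```

If this reports no HTTP proxy in the environment from which scrapyd is started, that would be consistent with the middleware missing from the "Enabled downloader middlewares" line in the log.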
scrapy scrapyd