Wednesday, 15 February 2012

python - Global Variable Reset not working in Google App Engine -




I am calling a web crawling function from a handler in GAE. It retrieves a few images and then displays them. It works fine on the first call, but the next time it displays all of the same images, and the crawler starts up from where the last one left off. I think the problem is that my global variables are not being reset correctly.

Every time I redeploy the app it works correctly the first time, but then the problem begins.

Here is my code. Please let me know if you need me to clarify anything, but I think it should make sense.

Here is the scraper function:

from collections import deque

# helpers such as soupify_url, categorize_links, find_pages, add_pics and
# scrape_bing are defined elsewhere in the project

visited_pages = []
visit_queue = deque([])
collected_pages = []
collected_pics = []
count = 0
pic_count = 0

def scrape_pages(url, root_url, keywords=[], recurse=True):
    #variables
    max_count = 16
    pic_num = 100
    global count
    global pic_count
    global collected_pics
    global collected_pages

    print 'the keywords and url are'
    print keywords
    print url

    #all of the links that have been scraped from this page
    the_links = []
    soup = soupify_url(url)

    #only add new pages onto the queue if the recursion argument is True
    if recurse:
        #find the links on the page
        try:
            for tag in soup.findAll('a'):
                the_links.append(tag.get('href'))
        except AttributeError:
            return
        try:
            external_links, internal_links, root_links, primary_links = categorize_links(the_links, url, root_url)
        except TypeError:
            return

        #change this depending on the input
        links_to_visit = external_links + internal_links + root_links

        #build the queue
        for link in links_to_visit:
            if link not in visited_pages and link not in visit_queue:
                visit_queue.append(link)

    visited_pages.append(url)
    count = count + 1
#    print 'number of pages visited'
#    print count

    #add pages to collected_pages depending on the criteria given, if keywords are given
    if keywords:
        page_to_add = find_pages(url, soup, keywords)
#        print 'page to add'
#        print page_to_add
        if page_to_add and page_to_add not in collected_pages:
            collected_pages.append(page_to_add)

    pics_to_add = add_pics(url, soup)
#    print 'pics to add'
#    print pics_to_add
    if pics_to_add:
        collected_pics.extend(pics_to_add)

    #here the actual recursion happens, finishing off the queue
    while visit_queue:
        if count >= max_count:
            return
        if pic_count > pic_num:
            return
        link = visit_queue.popleft()
#        print link
        scrape_pages(link, root_url, keywords)
#    print '***done***'
###done with the recursive scraping function here

#get a list of links from bing, add them to the queue and go through them, then reset the global variables
def scrape_bing_src(keywords):
    visit_queue, the_url = scrape_bing.get_links(keywords, a_list=False)
    scrape_pages(visit_queue.popleft(), the_url, keywords, recurse=True)

    global collected_pics
    global pic_count
    global count
    global visited_pages
    global visit_queue
    pic_count = 0
    count = 0
    visited_pages = []
    visit_queue = deque([])
    pics_to_return = collected_pics
    collected_pics = []
    return pics_to_return
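As an aside, one way to avoid this class of bug entirely is to keep the crawl bookkeeping on an object that each request creates fresh, instead of module-level globals. This is only a sketch (the `CrawlState` class and its fields are made up here, mirroring the globals above, not part of the original post):

```python
from collections import deque

class CrawlState(object):
    """Holds all crawl bookkeeping for a single request.

    A fresh instance is built per request, so there is nothing to reset
    afterwards and stale state can never leak between calls.
    """
    def __init__(self, max_count=16, pic_num=100):
        self.visited_pages = []
        self.visit_queue = deque()
        self.collected_pages = []
        self.collected_pics = []
        self.count = 0
        self.pic_count = 0
        self.max_count = max_count
        self.pic_num = pic_num

    def done(self):
        # stop crawling once either limit is reached
        return self.count >= self.max_count or self.pic_count > self.pic_num

# usage sketch: each request builds its own state object and passes it
# down to the scraping functions instead of touching globals
state = CrawlState()
state.visit_queue.append('http://example.com')
state.count += 1
```

With this shape, `scrape_pages` would take `state` as a parameter and the `global` declarations (and the manual reset at the end of `scrape_bing_src`) disappear.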

Here is the handler that calls the scraper function:

#this displays the images
class Try(BlogHandler):
    def get(self, keyword):
        keyword = str(keyword)
        keyword_list = keyword.split()
        img_list = scraper.scrape_bing_src(keyword_list)
        for img in img_list:
            self.response.write("""<br><img src='""" + img + """'>""")
        self.response.write('we are done here')

Your code isn't guaranteed to run within one "server" and one instance, as you may have noticed from the Instances tab in the admin console. There is a chance that between calls you were switched to a different server, or that the process was "restarted" (you can read more here). During the warmup process the application is read from disk into memory and then starts to handle requests. So each time you may be getting a new precached Python instance with its own global variable values.
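The effect can be simulated outside App Engine. Within one process, a module-level global survives across "requests"; only a new process (a new instance) starts from a clean slate. A minimal illustration, with made-up names:

```python
# Simulating what happens inside one App Engine instance: module-level
# state survives across every request served by the same process.
collected_pics = []  # module-level "global", as in the question

def handle_request(new_pics):
    # each request appends, but nothing clears the list between requests
    collected_pics.extend(new_pics)
    return list(collected_pics)

first = handle_request(['a.jpg'])   # first request on this instance
second = handle_request(['b.jpg'])  # second request, same instance
# `second` now contains images from both requests; a request routed to a
# *different* (freshly started) instance would see an empty list again.
```

This is exactly the pattern in the question: the reset code runs on one instance, while the next request may land on another instance whose globals were never reset.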

In this case it is better to use memcache.
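A hedged sketch of that approach follows. The key scheme (`'crawl:' + keyword`) and the helper names are invented for illustration; on App Engine the real client is `google.appengine.api.memcache`, and a plain dict stands in below so the pattern can be exercised locally:

```python
try:
    from google.appengine.api import memcache  # available on App Engine
except ImportError:
    # local stand-in with the same get/set shape, for trying the pattern
    class _FakeMemcache(object):
        def __init__(self):
            self._store = {}
        def get(self, key):
            return self._store.get(key)
        def set(self, key, value, time=0):
            self._store[key] = value
            return True
    memcache = _FakeMemcache()

def get_cached_pics(keyword, fetch_fn):
    """Return pictures for `keyword`, fetching and caching on a miss."""
    key = 'crawl:' + keyword  # made-up key scheme
    pics = memcache.get(key)
    if pics is None:
        pics = fetch_fn(keyword)            # e.g. scraper.scrape_bing_src
        memcache.set(key, pics, time=3600)  # cache for an hour
    return pics

pics = get_cached_pics('cats', lambda kw: ['cat1.jpg', 'cat2.jpg'])
```

Because memcache is shared across all instances, the cached result is the same no matter which instance serves the next request, which removes the dependence on per-instance globals (though memcache entries can be evicted, so treat it as a cache, not storage).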

python google-app-engine global-variables webapp2
