Monday, 15 June 2015

javascript - PhantomJS using too many threads




I wrote a PhantomJS app to crawl a site I built and check that a JavaScript file is included. The JavaScript is similar to Google's, where some inline code loads in another JS file. The app looks for that other JS file, which is why I used Phantom.

What's the expected result?

The console output should read through a ton of URLs and then tell whether the script is loaded or not.

What's happening?

The console output reads as expected for about 50 requests and then starts spitting out this error:

2013-02-21T10:01:23 [FATAL] QEventDispatcherUNIXPrivate(): Can not continue without a thread pipe
QEventDispatcherUNIXPrivate(): Unable to create thread pipe: Too many open files

This is the block of code that opens a page and searches for the script include:

    page.open(url, function (status) {
        console.log(yellow, url, status, clear);
        // Runs inside the page: true if the script include is present.
        var found = page.evaluate(function () {
            if (document.querySelectorAll("script[src='***']").length) {
                return true;
            } else {
                return false;
            }
        });
        if (found) {
            console.log(green, 'javascript found on', url, clear);
        } else {
            console.log(red, 'javascript not found on', url, clear);
        }
        self.crawledurls[url] = true;
        self.crawlurls(self.getalllinks(page), depth-1);
    });

The crawledurls object is just an object of the URLs I've crawled. The crawlurls function goes through the links returned by the getalllinks function and calls the open function on every link that has the base domain of the domain the crawler started on.
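The link filtering described above could look roughly like the sketch below. It is not the code from the question: hostOf and linksToCrawl are hypothetical helper names, and the host comparison is a simple regex match that assumes absolute http(s) URLs and a lowercase baseHost.

```javascript
// Hypothetical helper: extract the host part of an absolute http(s) URL.
// Returns null for anything it cannot parse (relative links, mailto:, etc.).
function hostOf(url) {
  var match = /^https?:\/\/([^\/?#]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// Keep only links on the starting host that have not been crawled yet.
// `crawled` is a plain object used as a set, like crawledurls in the question.
function linksToCrawl(links, baseHost, crawled) {
  var result = [];
  var seen = {};
  for (var i = 0; i < links.length; i++) {
    var link = links[i];
    if (hostOf(link) !== baseHost) { continue; }   // other domain or unparsable
    if (crawled[link] || seen[link]) { continue; } // already visited or duplicate
    seen[link] = true;
    result.push(link);
  }
  return result;
}
```

Filtering out duplicates and already-crawled URLs before opening anything keeps the number of page.open calls down, which matters when each open page holds file descriptors.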

Edit

I modified the last block of code as follows, but I still have the same issue. I have added page.close() to the file.

    if (!found) {
        console.log(red, 'javascript not found on', url, clear);
    }
    self.crawledurls[url] = true;
    // Collect the links first, then release the page before crawling further.
    var links = self.getalllinks(page);
    page.close();
    self.crawlurls(links, depth-1);

From the documentation:

Due to some technical limitations, the web page object might not be garbage collected. This is often encountered when the same object is used over and over again.

The solution is to explicitly call close() of the web page object (i.e. page in many cases) at the right time.

Some included examples, such as follow.js, demonstrate multiple page objects with explicit close.
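One way to read "at the right time" is to keep only one page open at once: create a page, open a URL, inspect it, close it, and only then move on. The sketch below illustrates that pattern under some assumptions. crawlSequentially, createPage, onResult and onDone are hypothetical names, not part of PhantomJS or of the code in the question, and the '***' selector is kept exactly as elided in the question.

```javascript
// Sketch: crawl URLs one at a time, using a fresh page per URL and closing it
// before the next one is opened. `createPage` is an injected factory that
// returns an object with open(), evaluate() and close() methods.
function crawlSequentially(urls, createPage, onResult, onDone) {
  var queue = urls.slice(); // copy so the caller's array is not modified

  function next() {
    if (queue.length === 0) {
      onDone();
      return;
    }
    var url = queue.shift();
    var page = createPage();
    page.open(url, function (status) {
      var found = false;
      if (status === 'success') {
        // Runs in the page context; selector is elided as in the question.
        found = page.evaluate(function () {
          return document.querySelectorAll("script[src='***']").length > 0;
        });
      }
      page.close(); // release the page before opening another one
      onResult(url, status, found);
      next();
    });
  }

  next();
}
```

Under PhantomJS, createPage could be function () { return require('webpage').create(); } and onDone could call phantom.exit(). Whether this removes the "Too many open files" error depends on what else the crawler keeps open, so treat it as a starting point rather than a confirmed fix.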

javascript web-crawler phantomjs
