Wednesday, 15 August 2012

PHP - How to get page content using cURL?




I'm trying to scrape the content of a Google search results page using cURL. I've tried setting different user agents and other options, but I can't seem to get the page content: I'm either redirected or I get a "page moved" error.

I believe it has something to do with the query string getting encoded somewhere, but I'm not sure how to get around that.
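If you suspect the query string is being mangled, one way to rule that out is to build it with PHP's built-in `http_build_query()` instead of pasting it by hand. This is a minimal sketch. The parameter names `q` and `hl` and the example.com base URL are placeholders, not the link from the question.

```php
<?php
// Placeholder search parameters. The term contains spaces, '&', '"' and '#',
// which must be percent-encoded before they can appear in a URL.
$params = [
    'q'  => 'php curl & "page moved" #1',
    'hl' => 'en',
];

// PHP_QUERY_RFC3986 encodes spaces as %20 (rawurlencode style).
// The default, PHP_QUERY_RFC1738, would encode them as '+'.
$query = http_build_query($params, '', '&', PHP_QUERY_RFC3986);

// Placeholder base URL (assumption, not the URL used in the question).
$url = 'https://www.example.com/search?' . $query;

echo $url, PHP_EOL;
// prints https://www.example.com/search?q=php%20curl%20%26%20%22page%20moved%22%20%231&hl=en
```

Because every value passes through the same encoder exactly once, special characters cannot end up half-encoded or double-encoded.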

```php
// $url is the same link as above
$ch = curl_init();
$user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
curl_setopt($ch, CURLOPT_HEADER, 0);          // no headers in the output
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);  // follow redirects
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);  // return the body as a string
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 120);
curl_setopt($ch, CURLOPT_TIMEOUT, 120);
curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
curl_setopt($ch, CURLOPT_COOKIEFILE, "cookie.txt");
curl_setopt($ch, CURLOPT_COOKIEJAR, "cookie.txt");
echo curl_exec($ch);
curl_close($ch);
```
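To see where a "page moved" redirect actually points, you can turn `CURLOPT_FOLLOWLOCATION` off and `CURLOPT_HEADER` on. With that setup, the first response's headers come back in front of the body, and `CURLINFO_HEADER_SIZE` tells you where they end. This is a sketch under that assumption. `parse_redirect()` is a hypothetical helper, and the live request is left commented out because it needs network access.

```php
<?php
/**
 * Hypothetical helper: given the raw header block of one HTTP response,
 * return its status code and its Location header (null if absent).
 */
function parse_redirect(string $rawHeaders): array
{
    $lines = preg_split('/\r\n|\n/', trim($rawHeaders));
    $status = 0;
    $location = null;

    // The status line looks like "HTTP/1.1 302 Found" or "HTTP/2 301".
    if (preg_match('#^HTTP/\S+\s+(\d{3})#', $lines[0] ?? '', $m)) {
        $status = (int) $m[1];
    }

    foreach (array_slice($lines, 1) as $line) {
        // Header names are case-insensitive.
        if (stripos($line, 'location:') === 0) {
            $location = trim(substr($line, strlen('location:')));
        }
    }

    return ['status' => $status, 'location' => $location];
}

// Live request (needs network, so not executed here):
// $ch = curl_init($url);
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($ch, CURLOPT_HEADER, true);          // include headers in output
// curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // stop at the first redirect
// $response   = curl_exec($ch);
// $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
// curl_close($ch);
// print_r(parse_redirect(substr($response, 0, $headerSize)));

// Offline demo with a made-up header block:
print_r(parse_redirect("HTTP/1.1 302 Found\r\nLocation: https://www.example.com/sorry\r\n\r\n"));
```

Comparing the `Location` value with the URL you requested usually shows whether the server is rejecting the request or only normalizing the URL.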

What PHP code do I need to show the exact content of the page as I see it in the browser? What am I missing? Can anyone point me in the right direction?

I've seen similar questions on SO, but none of the replies helped me.

Edit:

I tried opening the link with Selenium WebDriver, and it gives the same results as cURL. I still think the problem is special characters in the query string getting messed up somewhere in the process.
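A URL copied from the browser's address bar can mix `+`, `%20` and raw characters. One way to test the "special characters" theory is to decode the query string and re-encode it in one consistent style before handing it to cURL. This is a minimal sketch. `normalize_query_url()` is a hypothetical helper built on PHP's `parse_url()`, `parse_str()` and `http_build_query()`, and the example URL is a placeholder.

```php
<?php
/**
 * Hypothetical helper (not from the question): decode the query string of a
 * URL copied from the browser, then re-encode it in one consistent style.
 * Fragments (#...) are dropped because browsers never send them to the server.
 * User info (user:pass@) is ignored in this sketch.
 */
function normalize_query_url(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    $params = [];
    if (isset($parts['query'])) {
        // Decodes %XX escapes and '+'. Note: parse_str() also turns '.' and
        // ' ' in parameter *names* into '_', which is fine for names like q or hl.
        parse_str($parts['query'], $params);
    }

    $result = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '')
        . ($parts['path'] ?? '/');

    if ($params !== []) {
        // Re-encode every value with spaces as %20.
        $result .= '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);
    }
    return $result;
}

echo normalize_query_url('https://www.example.com/search?q=php+curl%20%26+more&hl=en#top'), PHP_EOL;
// prints https://www.example.com/search?q=php%20curl%20%26%20more&hl=en
```

If cURL still gets redirected with the normalized URL, the encoding is probably not the cause.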

This is how:

```php
/**
 * Get a web file (HTML, XHTML, XML, image, etc.) from a URL. Return an
 * array containing the HTTP server response header fields and content.
 */
function get_web_page($url)
{
    $user_agent = 'Mozilla/5.0 (Windows NT 6.1; rv:8.0) Gecko/20100101 Firefox/8.0';

    $options = array(
        CURLOPT_CUSTOMREQUEST  => "GET",        // set request type POST or GET
        CURLOPT_POST           => false,        // set to GET
        CURLOPT_USERAGENT      => $user_agent,  // set user agent
        CURLOPT_COOKIEFILE     => "cookie.txt", // set cookie file
        CURLOPT_COOKIEJAR      => "cookie.txt", // set cookie jar
        CURLOPT_RETURNTRANSFER => true,         // return web page
        CURLOPT_HEADER         => false,        // don't return headers
        CURLOPT_FOLLOWLOCATION => true,         // follow redirects
        CURLOPT_ENCODING       => "",           // handle all encodings
        CURLOPT_AUTOREFERER    => true,         // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,          // timeout on connect
        CURLOPT_TIMEOUT        => 120,          // timeout on response
        CURLOPT_MAXREDIRS      => 10,           // stop after 10 redirects
    );

    $ch = curl_init($url);
    curl_setopt_array($ch, $options);
    $content = curl_exec($ch);
    $err     = curl_errno($ch);
    $errmsg  = curl_error($ch);
    $header  = curl_getinfo($ch);
    curl_close($ch);

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
```

Example:

```php
// Read a web page and check for errors:
$result = get_web_page($url);

if ($result['errno'] != 0) {
    // ... error: bad URL, timeout, redirect loop ...
}

if ($result['http_code'] != 200) {
    // ... error: no page, no permissions, no service ...
}

$page = $result['content'];
```
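The two checks in the example can be folded into one function that either returns the page body or throws with a readable message. This is a sketch only. `page_or_fail()` is a hypothetical name, and it assumes the array shape returned by `get_web_page()` (the `curl_getinfo()` fields plus `errno`, `errmsg` and `content`).

```php
<?php
/**
 * Hypothetical helper built on the array returned by get_web_page():
 * returns the page body, or throws RuntimeException describing the failure.
 */
function page_or_fail(array $result): string
{
    // Transport-level failure: bad URL, timeout, redirect loop, ...
    $errno = (int) ($result['errno'] ?? 0);
    if ($errno !== 0) {
        throw new RuntimeException('cURL error ' . $errno . ': ' . ($result['errmsg'] ?? ''));
    }

    // HTTP-level failure: no page, no permissions, no service, ...
    $code = (int) ($result['http_code'] ?? 0);
    if ($code !== 200) {
        throw new RuntimeException('HTTP status ' . $code . ' for ' . ($result['url'] ?? '(unknown URL)'));
    }

    return (string) ($result['content'] ?? '');
}

// Usage with get_web_page() (requires network, so left commented out):
// $page = page_or_fail(get_web_page($url));
```

Throwing keeps the happy path to a single line at the call site, and the message records which of the two failure classes occurred.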

