Tuesday, 15 June 2010

redirect - About handling a redirection in python -




I am new to Python and am trying to learn new modules. Fortunately or unfortunately, I picked the urllib2 module and started using it with one URL that is causing me problems.

To begin with, I created a Request object and then called read() on the response object. It was failing. It turns out the request is getting redirected, but the error code is still 200. I am not sure what is going on. Here is the code:

    import urllib2

    def get_url_data(url):
        print "getting url " + url
        user_agent = "Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0.1"
        headers = { 'User-Agent' : user_agent }
        # Pass the headers by keyword: the second positional argument of
        # urllib2.Request is the POST body (data), not the headers.
        request = urllib2.Request(url, headers=headers)
        try:
            response = urllib2.urlopen(request)
        except urllib2.HTTPError, e:
            # 'response' was never assigned if urlopen raised, so inspect the error
            print e.geturl()
            print e.info()
            print e.code
            return False
        else:
            print response
            print response.info()
            print response.getcode()
            print response.geturl()
            return response

I am calling the above function with "http://www.chilis.com".

I was expecting to receive a 301, 302, or 303, but instead I see 200. Here is the response I see:

    getting url http://www.chilis.com
    <addinfourl at 4354349896 whose fp = <socket._fileobject object at 0x1037513d0>>
    Cache-Control: private
    Server: Microsoft-IIS/7.5
    SPRequestGuid: 48bbff39-f8b1-46ee-a70c-bcad16725a4d
    X-SharePointHealthScore: 0
    X-AspNet-Version: 2.0.50727
    X-Powered-By: ASP.NET
    MicrosoftSharePointTeamServices: 14.0.0.6120
    X-MS-InvokeApp: 1; RequireReadOnly
    Date: Wed, 13 Feb 2013 11:21:27 GMT
    Connection: close
    Content-Length: 0
    Set-Cookie: BIGipServerPool_http_chilis.com=359791882.20480.0000; path=/

    200
    http://www.chilis.com/(x(1)s(q24tqizldxqlvy55rjk5va2j))/pages/chilisvariationroot.aspx?aspxautodetectcookiesupport=1

Can someone explain what this URL is and how to handle it? I know I can use the "Handling redirects" section from diveintopython.net, but with the code on that page I also see the same response, 200.

Edit: Using the code from diveintopython, I see that it is a temporary redirection. What I don't understand is why the HTTP code in my code is 200. Isn't that supposed to be the actual return code?

Edit2: Now that I see it better, it is not a weird redirection at all. I am editing the title.

Edit3: If urllib2 follows the redirection automatically, I am not sure why the following code does not get the front page of chilis.com.

    from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 with lxml installed

    docobj = get_url_data(url)
    doc = docobj.read()
    soup = BeautifulSoup(doc, 'lxml')
    print(soup.prettify())

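If I use the URL that the browser ends up getting redirected to, it works ("http://www.chilis.com/en/pages/home.aspx").

urllib2 automatically follows redirects, so the information you're seeing is from the page it was redirected to. That is why the status code is 200: it belongs to the final response, not to the intermediate 3xx response.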
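To make that visible, one option is to compare the URL you asked for with the one the response reports. The helper below is only a sketch: the name describe_response is made up for illustration, and it only relies on the geturl() and getcode() methods that the question's code already calls on the response object.

```python
def describe_response(requested_url, response):
    """Summarise a urllib2-style response object.

    `response` is anything with geturl() and getcode(), such as the
    object returned by urllib2.urlopen() (urllib.request.urlopen() on
    Python 3). The function name is hypothetical, not a library API.
    """
    final_url = response.geturl()
    return {
        # Status of the last response in the chain, not of the redirect itself
        "status": response.getcode(),
        "final_url": final_url,
        # A differing URL suggests at least one redirect was followed.
        # This is a plain string comparison, so it is only a rough signal.
        "redirected": final_url != requested_url,
    }
```

Called as describe_response(url, get_url_data(url)), this should report the 200 together with the long ChilisVariationRoot.aspx address, which shows that the 200 comes from the redirect target.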

If you don't want it to follow the redirect, you'll need to subclass urllib2.HTTPRedirectHandler. Here's a relevant posting on how to do that: "How do I prevent Python's urllib(2) from following a redirect".
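A minimal sketch of that subclassing approach, not taken from the linked posting: the names NoRedirectHandler and fetch_without_redirect are made up here. The import fallback is an addition so the same code also runs on Python 3, where urllib2 became urllib.request. Returning None from redirect_request tells the handler not to build a follow-up request, so the 3xx response ends up raised as an HTTPError.

```python
try:
    import urllib2 as urlreq          # Python 2, as used in the question
except ImportError:
    import urllib.request as urlreq   # Python 3 equivalent (assumption)


class NoRedirectHandler(urlreq.HTTPRedirectHandler):
    """Hypothetical handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # None means "do not follow"; the default error handler then
        # raises HTTPError carrying the original 3xx status code.
        return None


def fetch_without_redirect(url):
    """Return (status_code, location_header_or_None) for a single request."""
    opener = urlreq.build_opener(NoRedirectHandler)
    try:
        response = opener.open(url)
    except urlreq.HTTPError as e:
        # For a redirect, e.code is 301/302/303/307 and Location is the target
        return e.code, e.headers.get("Location")
    return response.getcode(), None
```

With this, fetch_without_redirect("http://www.chilis.com") would be expected to return the redirect status and target rather than 200, assuming the site still behaves as in the question.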

Regarding Edit 3: it looks like www.chilis.com requires accepting cookies. This can be implemented using urllib2, but I suggest installing the requests module (http://pypi.python.org/pypi/requests/).

The following seems to do what you want (without any error handling):
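For completeness, the urllib2 route could look roughly like the sketch below. It assumes that cookies are the only thing the site is missing, which has not been verified here. The function name build_cookie_opener is made up, and the import fallback is an addition for Python 3, where the modules are urllib.request and http.cookiejar.

```python
try:
    import urllib2 as urlreq              # Python 2, as used in the question
    import cookielib as cookiejar
except ImportError:
    import urllib.request as urlreq       # Python 3 equivalents (assumption)
    import http.cookiejar as cookiejar


def build_cookie_opener():
    """Return (opener, jar); the opener stores and resends cookies.

    Cookies set by one response (including a redirect response) are kept
    in the jar and sent with later requests made through the same opener.
    """
    jar = cookiejar.CookieJar()
    opener = urlreq.build_opener(urlreq.HTTPCookieProcessor(jar))
    return opener, jar
```

You would then call opener.open(url).read() in place of urllib2.urlopen(request).read(), and inspect jar afterwards to see which cookies the server set.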

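    import requests
    from bs4 import BeautifulSoup  # assumes BeautifulSoup 4 with lxml installed

    r = requests.get(url)  # follows redirects by default
    soup = BeautifulSoup(r.text, 'lxml')
    print(soup.prettify())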

python redirect urllib2
