Thursday, 15 January 2015

Python - Parsing XML RSS feed byte stream for <item> tag

I'm attempting to parse an RSS feed for the first instance of the element "<item>".

    import urllib2

    def pagereader(url):
        try:
            readpage = urllib2.urlopen(url)
        # HTTPError is a subclass of URLError, so it has to be caught first.
        except urllib2.HTTPError as e:
            # print 'The server couldn\'t fulfill the request.'
            # print 'Error code: ', e.code
            return 404
        except urllib2.URLError as e:
            # print 'We failed to reach a server.'
            # print 'Reason: ', e.reason
            return 404
        else:
            outputpage = readpage.read()
            return outputpage
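For anyone on Python 3, where urllib2 no longer exists, the function above could look roughly like the sketch below. This is my assumption of an equivalent, not the original code: the name page_reader and the opener parameter (there only so the function can be exercised without a network connection) are made up for illustration, and it returns None on failure rather than 404 so a failed fetch cannot be mistaken for feed content.

```python
import urllib.error
import urllib.request


def page_reader(url, opener=urllib.request.urlopen):
    """Return the response body as bytes, or None if the fetch fails.

    `opener` is a hypothetical hook for swapping in a fake fetcher.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status (4xx/5xx).
        # HTTPError is a subclass of URLError, so it is caught first.
        print("HTTP error:", err.code)
    except urllib.error.URLError as err:
        # No usable response: DNS failure, refused connection, and so on.
        print("Failed to reach the server:", err.reason)
    return None
```

Note that in Python 3 read() returns a bytes object rather than str, so a type check written against str would need to change accordingly.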

Assume the arguments being passed are correct. The function returns a str object whose value is the entire RSS feed; I've confirmed the type with:

    a = isinstance(value, str)
    if not a:
        return -1

So, the entire RSS feed has been returned from the function call, and it's at this point that I hit a brick wall. I've tried parsing the feed with BeautifulSoup, lxml, and various other libraries, but with no success (I had some success with BeautifulSoup, but it wasn't able to pull certain child elements from the parent). I'm ready to resort to writing my own parser, but I'd like to know if anyone has suggestions.

To recreate the error, call the above function with an argument similar to:

http://www.cert.org/nav/cert_announcements.rss

You'll see I'm trying to return the first <item> child.

    <item>
      <title>New Blog Entry: Common Sense Guide to Mitigating Insider Threats - Best Practice 16 (of 19)</title>
      <link>http://www.cert.org/blogs/insider_threat/2013/02/common_sense_guide_to_mitigating_insider_threats_-_best_practice_16_of_19.html</link>
      <description>This sixteenth of 19 blog posts about the 4th edition of the Common Sense Guide to Mitigating Insider Threats describes Practice 16: Develop a formalized insider threat program.</description>
      <pubDate>Wed, 06 Feb 2013 06:38:07 -0500</pubDate>
    </item>

As I've said, BeautifulSoup fails to find both pubDate and link, which are crucial to my app.

Any advice would be appreciated.

I had success using BeautifulStoneSoup and passing lowercase tags, like so:

    from BeautifulSoup import BeautifulStoneSoup

    xml = ('<item>'
           '<title>New Blog Entry: Common Sense Guide to Mitigating Insider Threats - Best Practice 16 (of 19)</title>'
           '<link>http://www.cert.org/blogs/insider_threat/2013/02/common_sense_guide_to_mitigating_insider_threats_-_best_practice_16_of_19.html</link>'
           '<description>This sixteenth of 19 blog posts about the 4th edition of the Common Sense Guide to Mitigating Insider Threats describes Practice 16: Develop a formalized insider threat program.</description>'
           '<pubDate>Wed, 06 Feb 2013 06:38:07 -0500</pubDate>'
           '</item>')

    soup = BeautifulStoneSoup(xml)
    item = soup('item')[0]  # first <item> in the document
    # Look the child tags up by their lowercase names.
    print item('pubdate'), item('link')
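If BeautifulSoup 3 isn't an option, the standard library's xml.etree.ElementTree can probably do the same lookup. The following is only a minimal Python 3 sketch, assuming the feed follows the usual RSS 2.0 layout (rss > channel > item). The sample feed, the example.com URLs, and the function name first_item are all made up for illustration; they are not taken from the real CERT feed.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, made-up stand-in for a feed; a real one has more fields.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example announcements</title>
    <item>
      <title>First entry</title>
      <link>http://example.com/first</link>
      <pubDate>Wed, 06 Feb 2013 06:38:07 -0500</pubDate>
    </item>
    <item>
      <title>Second entry</title>
      <link>http://example.com/second</link>
      <pubDate>Tue, 05 Feb 2013 09:00:00 -0500</pubDate>
    </item>
  </channel>
</rss>"""


def first_item(feed_text):
    """Return title, link and pubDate of the first <item>, or None."""
    root = ET.fromstring(feed_text)
    # find() returns only the first match in document order.
    item = root.find("./channel/item")
    if item is None:
        return None
    # ElementTree is case-sensitive, so the tag is spelled pubDate here.
    return {tag: item.findtext(tag) for tag in ("title", "link", "pubDate")}


print(first_item(SAMPLE_FEED))
```

Unlike the BeautifulStoneSoup lookup, this keeps the tag names exactly as they appear in the feed, so pubDate has to be written in its original mixed case.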

