How to scrape HTML tags spread over multiple lines in Python?
I am trying to scrape a webpage in Python. I am able to get results for tags that sit on a single line, but for tags spread over multiple lines, my code cannot retrieve anything.
In the HTML source, the single-line tags are present as:
<td><span class="facultyname">john matthew falletta, md</span>
and the multi-line tags are present as:
<td><span class="label">division:</span> </td><td>hematology/oncology</td>
Here is what I wrote:
patfinderfullname = re.compile('<span class="facultyname">(.*)</span>')
fullname = re.findall(patfinderfullname,webpage) #works fine
patfinderdivision = re.compile('<span class="label">division:</span> </td><td>(.*)</td>')
partition = re.findall(patfinderdivision,webpage) #doesn't work
Here the webpage variable holds the content of the URL that has to be scraped. Can anyone point out what I am missing or doing wrong?
I highly recommend using BeautifulSoup. It is a Python library for parsing HTML documents.
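As a rough illustration of that suggestion, here is a minimal sketch using BeautifulSoup (the third-party bs4 package). The html string is a hypothetical stand-in for the downloaded page, and the surrounding table markup and the position of the line break are assumptions, since the question only shows fragments of the source.

```python
from bs4 import BeautifulSoup

# Hypothetical sample that mirrors the fragments shown in the question;
# the <table>/<tr> wrapper and the line breaks are assumed.
html = """
<table>
<tr><td><span class="facultyname">john matthew falletta, md</span></td></tr>
<tr>
<td><span class="label">division:</span> </td>
<td>hematology/oncology</td>
</tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# Single-line case: look the span up by its class.
fullname = soup.find("span", class_="facultyname").get_text(strip=True)

# Multi-line case: find the label span, climb to its cell, then take
# the next <td>. Whitespace and newlines between the cells are ignored.
label = soup.find("span", class_="label", string="division:")
division = label.find_parent("td").find_next_sibling("td").get_text(strip=True)

print(fullname)
print(division)
```

Because the parser works on the document tree rather than on raw text, it does not matter how the tags are laid out across lines.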
P.S.: If you want to stick with your own code, use \s* to skip whitespace (including newlines) in the regex.
patfinderdivision = re.compile(r'<span class="label">division:</span>\s* \s*</td><td>(.*)</td>')
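To see why \s* helps, here is a small self-contained sketch. The webpage string is a hypothetical sample, and it assumes the line break falls between the two cells, which the question does not confirm; the tolerant pattern is a variant of the one above with \s* added at that position as well.

```python
import re

# Hypothetical page fragment: the second cell starts on a new,
# indented line (an assumption about the real page).
webpage = (
    '<td><span class="label">division:</span> </td>\n'
    '    <td>hematology/oncology</td>'
)

# Pattern from the question: expects "</td><td>" with nothing in between.
strict = re.compile(r'<span class="label">division:</span> </td><td>(.*)</td>')

# \s* matches any run of whitespace, newlines included, so the layout
# between the tags no longer matters.
tolerant = re.compile(
    r'<span class="label">division:</span>\s*</td>\s*<td>(.*?)</td>'
)

strict_result = strict.findall(webpage)
tolerant_result = tolerant.findall(webpage)

print(strict_result)    # []
print(tolerant_result)  # ['hematology/oncology']
```

The non-greedy (.*?) is a precaution so that the capture stops at the first closing </td> when several cells share one line.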
python scripting web-scraping