php - How can I parse this HTML with a regular expression?
I am trying to write a regular expression to extract the href and the anchor text from a list of URLs in an HTML source. The anchor text can be any value.
The HTML part goes as follows:
<div class="links"><a rel="nofollow" target="_blank" href="http://url1.com" class="see-all">url1</a><a rel="nofollow" target="_blank" href="http://url2.com" class="see-all">this url-2</a><a rel="nofollow" target="_blank" href="http://url3.com" class="see-all">this url-3</a><a rel="nofollow" target="_blank" href="http://url4.com" class="see-all">sweet url 4</a></div>
I tried the following regular expression, but it's not working, since it grabs everything before the last </a> tag and fails.
preg_match_all('/<a rel="nofollow" target="_blank" href="(.*)" class="see-all">(.*)<\/a>/', $source , $website_array);
What would be a working regular expression to extract the required data?
If you must know, your expression is greedy, so it matches from the start of the first anchor to the end of the last; the /U (ungreedy) modifier fixes that:
preg_match('/<a rel="nofollow" target="_blank" href="(.*)" class="see-all">(.*)<\/a>/U', $source , $website_array);
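To make the difference concrete, here is a minimal sketch comparing the greedy and ungreedy forms. The two-anchor $source string is a shortened, hypothetical stand-in for the markup in the question, and preg_match_all with PREG_SET_ORDER is my choice here so that every anchor is collected rather than only the first.

```php
<?php
// Hypothetical sample input mirroring the structure of the question's markup.
$source = '<div class="links">'
    . '<a rel="nofollow" target="_blank" href="http://url1.com" class="see-all">url1</a>'
    . '<a rel="nofollow" target="_blank" href="http://url2.com" class="see-all">this url-2</a>'
    . '</div>';

$pattern = '<a rel="nofollow" target="_blank" href="(.*)" class="see-all">(.*)<\/a>';

// Greedy: each (.*) runs as far as it can, so a single match spans both anchors.
preg_match_all('/' . $pattern . '/', $source, $greedy, PREG_SET_ORDER);

// Ungreedy (/U): each (.*) stops at the earliest point that lets the match succeed.
preg_match_all('/' . $pattern . '/U', $source, $ungreedy, PREG_SET_ORDER);

echo count($greedy), " greedy match(es)\n";
echo count($ungreedy), " ungreedy match(es)\n";
foreach ($ungreedy as $m) {
    // $m[1] is the href, $m[2] is the anchor text.
    echo $m[1], ' => ', $m[2], "\n";
}
```

Note the capital U: in PCRE the lowercase /u modifier switches on UTF-8 handling and has no effect on greediness.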
Note that pcre.backtrack_limit applies to ungreedy mode.
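If the limit is a concern, one option is to check for it explicitly, since preg_match_all returns false when PCRE gives up. The sketch below is only an illustration: match_all_or_fail is a hypothetical helper name, not something from the question or from PHP itself.

```php
<?php
/**
 * Hypothetical helper: run preg_match_all and surface PCRE errors
 * (such as hitting pcre.backtrack_limit) instead of failing silently.
 */
function match_all_or_fail(string $pattern, string $subject): array
{
    $result = preg_match_all($pattern, $subject, $matches, PREG_SET_ORDER);
    if ($result === false) {
        $code = preg_last_error();
        $reason = $code === PREG_BACKTRACK_LIMIT_ERROR
            ? 'pcre.backtrack_limit (' . ini_get('pcre.backtrack_limit') . ') exceeded'
            : 'PCRE error code ' . $code;
        throw new RuntimeException($reason);
    }
    return $matches;
}

// Small made-up subject, just to show the call shape.
$rows = match_all_or_fail('/href="(.*)"/U', '<a href="http://url1.com">url1</a>');
echo $rows[0][1], "\n";
```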
Using negated character classes instead might give improved performance:
preg_match('/<a rel="nofollow" target="_blank" href="([^"]*)" class="see-all">([^<]*)<\/a>/', $source , $website_array);
This will have a problem with tags within the anchor itself.
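A short sketch of that limitation, using a made-up two-anchor input in which the second anchor wraps part of its text in a <b> tag. Again, preg_match_all with PREG_SET_ORDER is an assumption on my part so that all anchors are attempted.

```php
<?php
// Hypothetical input: the second anchor contains a nested <b> element.
$source = '<a rel="nofollow" target="_blank" href="http://url1.com" class="see-all">url1</a>'
    . '<a rel="nofollow" target="_blank" href="http://url2.com" class="see-all">this <b>url-2</b></a>';

$pattern = '/<a rel="nofollow" target="_blank" href="([^"]*)" class="see-all">([^<]*)<\/a>/';
preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

// Only the first anchor is found: ([^<]*) stops at "<b>", where "</a>" is
// expected next, so the second anchor never matches.
foreach ($matches as $m) {
    echo $m[1], ' => ', $m[2], "\n";
}
echo count($matches), " of 2 anchors matched\n";
```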
Given the aforementioned limitations, you should consider using an HTML parser:
$d = new DOMDocument;
$d->loadHTML($source);
$xp = new DOMXPath($d);
// Select anchors by attribute values, regardless of attribute order.
foreach ($xp->query('//a[@class="see-all"][@rel="nofollow"][@target="_blank"]') as $anchor) {
    $href = $anchor->getAttribute('href');
    $text = $anchor->nodeValue;
}
This will happily handle attributes in a different order and gives you the ability to query further inside, etc.
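As a rough check of that claim, the sketch below feeds the parser a made-up input whose second anchor has its attributes reordered and contains a nested <b> tag. The $links array is a name introduced for this example only, and the PHP DOM extension is assumed to be available.

```php
<?php
// Hypothetical input: second anchor has reordered attributes and a nested <b>.
$source = '<div class="links">'
    . '<a rel="nofollow" target="_blank" href="http://url1.com" class="see-all">url1</a>'
    . '<a class="see-all" href="http://url2.com" target="_blank" rel="nofollow">this <b>url-2</b></a>'
    . '</div>';

$d = new DOMDocument;
$d->loadHTML($source);
$xp = new DOMXPath($d);

// Collect href => anchor text for every matching anchor.
$links = [];
foreach ($xp->query('//a[@class="see-all"][@rel="nofollow"][@target="_blank"]') as $anchor) {
    // nodeValue is the text content with nested markup stripped.
    $links[$anchor->getAttribute('href')] = $anchor->nodeValue;
}

print_r($links);
```

Both anchors are picked up here, including the one that the negated-class pattern would miss.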
php regex