PHP - Preserving <br> tags when parsing HTML text content
I have a little issue. I want to parse a simple HTML document in PHP. Here is the HTML:
<html>
<body>
<table>
  <tr> <td>colombo <br> coucou</td> <td>30</td> <td>sunny</td> </tr>
  <tr> <td>hambantota</td> <td>33</td> <td>sunny</td> </tr>
</table>
</body>
</html>
And the PHP code:
$dom = new DOMDocument();
$html = $dom->loadHTMLFile("test.html");
$dom->preserveWhiteSpace = false;
$tables = $dom->getElementsByTagName('table');
$rows = $tables->item(0)->getElementsByTagName('tr');
foreach ($rows as $row) {
    $cols = $row->getElementsByTagName('td');
    // nodeValue returns only the text content, so the <br> is dropped here
    echo $cols->item(0)->nodeValue.'<br />';
    echo $cols->item(1)->nodeValue.'<br />';
    echo $cols->item(2)->nodeValue;
}
But as you can see, the first cell contains a <br> tag and I need it; when the PHP code runs, it removes the tag. Can anyone explain how I can keep it?
I recommend capturing the values of the table cells with the help of XPath:
$values = array();
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//tr') as $row) {
    $row_values = array();
    // 'td' is evaluated relative to the current row
    foreach ($xpath->query('td', $row) as $cell) {
        $row_values[] = innerHTML($cell);
    }
    $values[] = $row_values;
}
Also, I've had the same problem with <br> tags being stripped out of fetched content, the reason being that they are considered empty nodes; unfortunately they're not automatically replaced with a newline character (\n).
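If plain text with line breaks is all you need, one option is to do that replacement yourself before reading nodeValue. This is only a minimal sketch: the function name brToNewlines and the inline sample markup are my own for illustration, not part of the original code.

```php
<?php
// Hypothetical helper: swap every <br> element for a "\n" text node.
function brToNewlines(DOMDocument $dom)
{
    // Copy the live node list into an array first, because replacing
    // nodes while iterating the live list would skip elements.
    $breaks = iterator_to_array($dom->getElementsByTagName('br'));
    foreach ($breaks as $br) {
        $br->parentNode->replaceChild($dom->createTextNode("\n"), $br);
    }
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td>colombo <br> coucou</td><td>30</td></tr></table></body></html>');
brToNewlines($dom);

// The text content of the first cell now carries a newline where the <br> was.
$text = $dom->getElementsByTagName('td')->item(0)->nodeValue;
echo $text;
```

You could then run nl2br() on the result if you want the break back as markup when echoing.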
So what I've done is design my own innerHTML function, which has proved invaluable in many projects. Here I share it with you:
function innerHTML(DOMElement $element, $trim = true, $decode = true) {
    $innerHTML = '';
    foreach ($element->childNodes as $node) {
        // Import each child into a scratch document and serialize it on its own
        $temp_container = new DOMDocument();
        $temp_container->appendChild($temp_container->importNode($node, true));
        $innerHTML .= ($trim ? trim($temp_container->saveHTML()) : $temp_container->saveHTML());
    }
    return ($decode ? html_entity_decode($innerHTML) : $innerHTML);
}
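As a side note, on PHP 5.3.6 or later DOMDocument::saveHTML() accepts a node argument, so a shorter variant is possible without the temporary document. The sketch below is a different technique from the function above, not a drop-in copy of it: the name innerHtmlViaSaveHtml is made up for this example, and it does no trimming or entity decoding.

```php
<?php
// Hypothetical alternative: serialize each child with saveHTML($node).
function innerHtmlViaSaveHtml(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML() with a node argument serializes just that subtree
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr><td>colombo <br> coucou</td><td>30</td></tr></table></body></html>');
$cell = $dom->getElementsByTagName('td')->item(0);

$plain  = $cell->nodeValue;              // text only, the <br> is gone
$markup = innerHtmlViaSaveHtml($cell);   // the <br> survives as markup
echo $markup;
```

Which of the two to use mostly depends on whether you need the trimming and decoding options.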
php dom xpath html-parsing