Regex for an open SGML node that contains <p>, </p>, and <br /> tags
I have SGML that I'm trying to clean up by adding closing tags to the opening ones. Right now, the document has a construction like this:
<cat> <name>daniel <color>white <desc>daniel is a white cat <p>he was born in july</p><br />he's super cute.<p><br />he does not have siblings. <country>usa </cat>
So far, I can match an open tag and capture its content in a group using this regexp:
<name>([^\\<]+)[^<]
This works as long as the content area doesn't contain any <p>, </p>, or <br /> elements. But with <desc>([^\\<]+)[^<], the pattern stops matching right before the first <p>.
The reason I'm using < at the end of the pattern is that the other open nodes don't contain any HTML elements that would stop the match early.
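To make the behaviour concrete, here is a small sketch in Python (an assumed choice; the question doesn't name a language) that runs the original pattern, read as a literal regex, against a sample modeled on the document above. NAME_RE, DESC_RE, and P_RE are names made up for illustration.

```python
import re

# Sample modeled on the question's document.
DOC = ("<cat> <name>daniel <color>white <desc>daniel is a white cat "
       "<p>he was born in july</p><br />he's super cute.<p><br />"
       "he does not have siblings. <country>usa </cat>")

# The question's pattern: a greedy run of characters other than '\' and '<',
# followed by one more character that is not '<'.
NAME_RE = re.compile(r"<name>([^\\<]+)[^<]")
DESC_RE = re.compile(r"<desc>([^\\<]+)[^<]")
P_RE = re.compile(r"<p>([^\\<]+)[^<]")

print(NAME_RE.search(DOC).group(1))  # daniel
print(DESC_RE.search(DOC).group(1))  # daniel is a white cat  (stops before the first <p>)
print(P_RE.search(DOC).group(1))     # he was born in jul  (trailing [^<] took the "y")
```

The trailing [^<] forces the greedy group to hand back its last character. In this sample that is usually the space before the next tag, which is why name and desc look clean, but on the <p> element it drops the final "y" instead.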
How can I create a regexp that matches the <desc> node, including its <p>, </p>, and <br /> tags, and ends before the <country> node?
How about this:
<desc>((?:</?p>|<br />|[^\\<])+)
This allows those three tags to match, and stops at the next < that doesn't belong to one of the three.
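A minimal Python sketch (Python is an assumed choice) of the suggested pattern, run against a sample modeled on the question's document. It also applies the same alternation to the stated goal of adding closing tags; the LEAF_RE list of leaf tags (name, color, desc, country) and the close_open_nodes helper are hypothetical, and the sketch assumes leaf nodes never nest.

```python
import re

# Sample modeled on the question's document.
DOC = ("<cat> <name>daniel <color>white <desc>daniel is a white cat "
       "<p>he was born in july</p><br />he's super cute.<p><br />"
       "he does not have siblings. <country>usa </cat>")

# Suggested pattern: one or more of <p>, </p>, <br />, or any character
# other than '\' and '<'. The greedy + stops at the first other '<'.
DESC_RE = re.compile(r"<desc>((?:</?p>|<br />|[^\\<])+)")

# Hypothetical list of leaf nodes to close; <cat> is the container.
LEAF_RE = re.compile(r"<(name|color|desc|country)>((?:</?p>|<br />|[^\\<])+)")


def close_open_nodes(doc):
    """Insert a closing tag after each leaf node's content (hypothetical helper)."""
    def add_close(m):
        tag, content = m.group(1), m.group(2)
        body = content.rstrip()
        trailing = content[len(body):]  # keep whitespace outside the new tag
        return f"<{tag}>{body}</{tag}>{trailing}"
    return LEAF_RE.sub(add_close, doc)


print(DESC_RE.search(DOC).group(1))
print(close_open_nodes(DOC))
```

The captured desc text runs through both <p> blocks and the <br /> tags and ends just before <country>. Keeping the trailing space outside the inserted closing tag is only a stylistic choice, and the <p> elements inside <desc> are left exactly as they were, including the unclosed one.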
By the way, why aren't you allowing backslash as a valid character?
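On the backslash point: read as a literal regex (for example a Python raw string), [^\\<] excludes both \ and <, so any backslash in the content ends the match early. If the pattern was instead written inside an ordinary escaped string literal, most regex flavors would receive [^\<], which excludes only <. A small comparison with hypothetical content containing a backslash; RELAXED_RE is a name made up here.

```python
import re

# Hypothetical content containing backslashes (a Windows-style path).
DOC = r"<desc>saved in C:\cats\daniel <p>photo</p> <country>usa"

ANSWER_RE = re.compile(r"<desc>((?:</?p>|<br />|[^\\<])+)")  # backslash excluded
RELAXED_RE = re.compile(r"<desc>((?:</?p>|<br />|[^<])+)")    # backslash allowed

print(ANSWER_RE.search(DOC).group(1))   # saved in C:
print(RELAXED_RE.search(DOC).group(1))  # saved in C:\cats\daniel <p>photo</p>
```

Dropping the backslash from the negated class lets the match run on to the next tag that isn't <p>, </p>, or <br />.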