html - Regex for an open SGML node that contains <p>, </p>, and <br /> tags
I have some SGML that I'm trying to clean up by adding closing tags to the opening ones. Right now, the document has a construction like this:
<cat>
<name>daniel
<color>white
<desc>daniel white cat <p>he born in july</p><br />he's super cute.<p><br />he not have siblings.
<country>usa
</cat>
So far I can match an open tag and capture its content in a group using this regexp:
<name>([^\\<]+)[^<]
as long as the node doesn't have <p>, </p>, or <br /> elements within its content area. But if I use <desc>([^\\<]+)[^<], the pattern stops matching right before the first <p>.
The reason I'm using < to end the pattern is that the other open nodes don't contain HTML elements, so the match stops at the next tag.
How can I create a regexp that matches a <desc> node that includes <p>, </p>, and <br />, and ends before the <country> node?
How about this:
<desc>((?:</?p>|<br />|[^\\<])+)
This allows these three tags to match, and stops at the next < that doesn't belong to one of the three.
By the way, why aren't you allowing backslash as a valid character?
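To see the difference between the two patterns, here is a minimal sketch in Python (the question doesn't say which regex engine is in use, so that choice is an assumption). The sample is the one from the question, laid out one node per line, which is also an assumption. The names NAIVE, TOLERANT, and close_desc are made up for this illustration.

```python
import re

# Sample from the question; the one-node-per-line layout is an assumption.
SGML = """<cat>
<name>daniel
<color>white
<desc>daniel white cat <p>he born in july</p><br />he's super cute.<p><br />he not have siblings.
<country>usa
</cat>"""

# Original attempt (without the trailing [^<]): stops at the first '<' of any kind.
NAIVE = re.compile(r"<desc>([^\\<]+)")

# Suggested pattern: <p>, </p> and <br /> are consumed as whole tokens;
# any other '<' (or a backslash) ends the match.
TOLERANT = re.compile(r"<desc>((?:</?p>|<br />|[^\\<])+)")

naive_body = NAIVE.search(SGML).group(1)
tolerant_body = TOLERANT.search(SGML).group(1)


def close_desc(text: str) -> str:
    """Hypothetical helper: append </desc> right after the captured content."""
    # A function is used as the replacement so that backslashes in the
    # captured text are never interpreted as escape sequences.
    return TOLERANT.sub(
        lambda m: "<desc>" + m.group(1).rstrip() + "</desc>\n", text
    )


print(repr(naive_body))
print(repr(tolerant_body))
print(close_desc(SGML))
```

Note that the capture group also picks up the whitespace before <country>, which is why the helper strips it before adding the closing tag. The alternation only recognises the exact spellings <p>, </p>, and <br />, so variants such as <br> or <p class="x"> would still end the match.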
html xml regex sgml