Saturday, 15 June 2013


html - Regex for an open SGML node that contains <p>, </p>, and <br /> tags

I have SGML that I'm trying to clean up by adding closing tags to the opening ones. Right now, the document has a construction like this:

<cat> <name>daniel <color>white <desc>daniel white cat <p>he born in july</p><br />he's super cute.<p><br />he not have siblings. <country>usa </cat>

So far I can match an open tag and capture its content in a group using the regexp <name>([^\\<]+)[^<], as long as the content area doesn't contain <p>, </p>, or <br /> elements. But if I use <desc>([^\\<]+)[^<], the pattern matching stops right before the first <p>.

The reason I'm using < at the end of the pattern is that the other open nodes don't contain HTML elements, so the next < is where matching should stop.
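
The behaviour described above can be reproduced with a short Python sketch. The choice of Python's re module and the variable names are assumptions for illustration only; the question does not say which regex engine is in use. The two patterns are copied unchanged from the question.

```python
import re

# Sample document from the question, kept on one logical line.
sample = (
    "<cat> <name>daniel <color>white <desc>daniel white cat "
    "<p>he born in july</p><br />he's super cute.<p><br />"
    "he not have siblings. <country>usa </cat>"
)

# The question's patterns. [^\\<] matches any character except a
# backslash or "<"; the trailing [^<] consumes one more non-"<" character,
# so the final space before the next tag is left out of the group.
name_match = re.search(r"<name>([^\\<]+)[^<]", sample)
desc_match = re.search(r"<desc>([^\\<]+)[^<]", sample)

name_text = name_match.group(1)
desc_text = desc_match.group(1)

print(repr(name_text))  # content of <name>, captured completely
print(repr(desc_text))  # content of <desc>, cut off before the first <p>
```

The <desc> capture ends at the first <p> because the character class refuses every <, including the ones that open the inline HTML tags.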

How can I create a regexp that matches a <desc> node that includes <p>, </p>, and <br />, and ends before the <country> node?

How about this:

<desc>((?:</?p>|<br />|[^\\<])+)

This allows these three tags to match, and stops at the next < that doesn't belong to one of the three.

By the way, why aren't you allowing backslash as a valid character?
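
As a rough check of the suggested pattern, here is a minimal Python sketch run against the sample from the question. Python's re module, the variable names, the `with_backslash` input, and the relaxed pattern that uses [^<] instead of [^\\<] are all assumptions added for illustration; only the first pattern comes from the answer.

```python
import re

# Sample document from the question, kept on one logical line.
sample = (
    "<cat> <name>daniel <color>white <desc>daniel white cat "
    "<p>he born in july</p><br />he's super cute.<p><br />"
    "he not have siblings. <country>usa </cat>"
)

# Pattern from the answer: accept <p>, </p>, <br />, or any single
# character that is neither a backslash nor "<".
pattern = re.compile(r"<desc>((?:</?p>|<br />|[^\\<])+)")
desc_text = pattern.search(sample).group(1)
print(repr(desc_text))

# Hypothetical input containing a backslash, to illustrate the answer's
# closing remark: [^\\<] also rejects backslashes, so the match stops there.
with_backslash = r"<desc>path c:\temp <country>usa"
stops_early = pattern.search(with_backslash).group(1)

# Assumed variant (not from the answer): exclude only "<".
relaxed = re.compile(r"<desc>((?:</?p>|<br />|[^<])+)")
keeps_backslash = relaxed.search(with_backslash).group(1)

print(repr(stops_early))
print(repr(keeps_backslash))
```

The capture keeps the whitespace before <country>; call .strip() on the result if that is not wanted.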

html xml regex sgml
