xml - xmllint: problems outputting one result per line
I know this post really contains two questions...
First, I want to use xmllint to output the content of the "loc" tags. The sitemap I load has a default namespace declaration, xmlns="...".
In the xmllint shell, I need this:
setrootns
xpath //defaultns:loc
That works, no problem. But I need it in a bash script.
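As far as I know, xmllint has no option to say "go ahead, setrootns", so this does not work:
xmllint --xpath "//loc" sitemaps.xml
# or
xmllint --xpath "//defaultns:loc" sitemaps.xml
(The first finds nothing because loc lives in the default namespace; the second fails because the defaultns prefix is never registered.)
So this is my first question: how can I tell xmllint to load the default namespace?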
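One way around the prefix problem, offered as a sketch rather than the only answer, is to skip prefixes entirely and match on the element's local name with the standard XPath function local-name(). The sample sitemaps.xml and its example.com URLs are hypothetical; whether the results come out concatenated or one per line depends on the installed libxml2 version.

```shell
#!/bin/sh
# Hypothetical sample sitemap with a default namespace, like the question's sitemaps.xml.
cat > sitemaps.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>
EOF

# local-name() ignores the namespace, so no prefix (and no setrootns) is needed.
# Depending on the libxml2 version, the text nodes are printed one per line
# or concatenated without separators.
xmllint --xpath "//*[local-name()='loc']/text()" sitemaps.xml
```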
If I can't, let's take the second solution: I can remove the xmlns attribute, and then there is no namespace to deal with:
xmllint --xpath "//loc" <(sed -E 's/xmlns="[^"]*"//' sitemaps.xml)
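Note that sed has no lazy quantifier, so .*? does not stop at the first closing quote. The sketch below, assuming GNU sed and a hypothetical root element that also declares xmlns:xsi, contrasts that pattern with [^"]*, which cannot run past a quote.

```shell
#!/bin/sh
# Hypothetical root element that declares both a default namespace and xsi.
root='<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">'

# With GNU sed, ".*?" is still greedy: it runs to the LAST quote on the line,
# so the xmlns:xsi declaration is removed along with the default one.
greedy=$(printf '%s\n' "$root" | sed -E 's/xmlns=".*?"//')

# "[^"]*" cannot cross a quote, so only the default declaration is removed.
strict=$(printf '%s\n' "$root" | sed -E 's/ xmlns="[^"]*"//')

printf 'greedy: %s\n' "$greedy"
printf 'strict: %s\n' "$strict"
```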
But now the whole result, all 500 "loc" elements, is concatenated on one line!
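A version-independent way to get one URL per line, sketched under the assumption that the installed xmllint supports --xpath, is to ask for the number of matches with count() and then print each match with string(). The helper name print_locs and the sample file are hypothetical.

```shell
#!/bin/sh
# Hypothetical sample input; point print_locs at the real sitemaps.xml instead.
cat > sitemaps.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>
EOF

# print_locs FILE (hypothetical helper): print the text of every <loc>, one per
# line, whatever separator the installed xmllint uses for node sets.
print_locs() {
    xpath="//*[local-name()='loc']"
    n=$(xmllint --xpath "count($xpath)" "$1")
    i=1
    while [ "$i" -le "$n" ]; do
        # string() returns the text of the i-th match; $( ) drops any trailing newline.
        printf '%s\n' "$(xmllint --xpath "string(($xpath)[$i])" "$1")"
        i=$((i + 1))
    done
}

print_locs sitemaps.xml
```

This re-parses the file once per URL (about 500 times for the sitemap in the question), which keeps the script simple but gets slow on very large sitemaps.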
I also tried:
xmllint --shell sitemaps.xml <<EOF
setrootns
xpath //defaultns:loc/text()
EOF
or
xmllint --shell sitemaps.xml <<EOF
setrootns
cat //defaultns:loc
EOF
The first gives me lines like
465 text content=http://...
with the URL truncated. The second gives me a "------" line every two lines, and "/>" on the last line...
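The --shell route can also be scripted by feeding the commands on stdin and filtering the decoration out afterwards. This sketch assumes the output format described above: a "/ > " prompt before each command and a " -------" line before each node printed by cat. The helper shell_locs and the sample file are hypothetical, and xmllint builds that use readline may echo the commands, so the filter might need adjusting.

```shell
#!/bin/sh
# Hypothetical sample sitemap (same shape as the question's sitemaps.xml).
cat > sitemaps.xml <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>http://example.com/a</loc></url>
  <url><loc>http://example.com/b</loc></url>
</urlset>
EOF

# shell_locs FILE (hypothetical helper): run setrootns + cat in xmllint's shell,
# then strip leading "/ > " prompts, the " -------" separators and empty lines.
shell_locs() {
    xmllint --shell "$1" <<'EOF' | sed -e 's|^\(/ > \)*||' -e '/^ *-------$/d' -e '/^ *$/d'
setrootns
cat //defaultns:loc/text()
EOF
}

shell_locs sitemaps.xml
```

Using cat on the text() nodes, rather than xpath, avoids the truncated debug dump and prints the full URLs without the <loc> tags.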
And I'm beginning to get nervous... :)
Many thanks if you find a solution.
The goal is to have every location, one per line.
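I used something similar, renaming the xmlns attribute so it is no longer a namespace declaration:
clean_xml_message=$(echo "$xml_message" | sed 's/xmlns/ignore/')
and then inserting new lines between adjacent tags:
sed 's/></>\n</g'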
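The steps above can be sketched end to end as follows, assuming GNU sed (for \n in the replacement) and using a hypothetical inline document in place of $xml_message. xmllint reads the document from standard input when given - as the file name.

```shell
#!/bin/sh
# Hypothetical stand-in for $xml_message: a one-line sitemap with a default namespace.
xml_message='<?xml version="1.0"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url><loc>http://example.com/a</loc></url><url><loc>http://example.com/b</loc></url></urlset>'

# Rename the declaration to an ordinary attribute so that plain //loc matches.
# (Narrowed to "xmlns=" so a prefixed declaration such as xmlns:xsi is not hit first.)
clean_xml_message=$(printf '%s\n' "$xml_message" | sed 's/xmlns=/ignore=/')

# "-" makes xmllint read the document from stdin; GNU sed then splits "><"
# into ">" newline "<", giving one <loc> element per line.
locs=$(printf '%s\n' "$clean_xml_message" | xmllint --xpath '//loc' - | sed 's/></>\n</g')

printf '%s\n' "$locs"
```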
I guess you want the URL without <loc></loc>? Then select the loc elements with xmllint, which gives:
<loc>...</loc><loc>...</loc><loc>...</loc>
then put the tags on their own lines: sed 's/<loc>/<loc>\n/g' | sed 's#</loc>#\n</loc>#g'
<loc>
...
</loc><loc>
...
</loc><loc>
...
</loc>
Finally, remove the tag lines: grep -v "<loc>" | grep -v "</loc>"
or, with a single grep that drops every line starting with a tag: grep -v '^<'
That's it. (-v inverts the selection: http://unixhelp.ed.ac.uk/cgi/man-cgi?grep)
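The tag-stripping pipeline can be wrapped in a small function and tried on the concatenated form that older xmllint versions print. The function name strip_loc_tags and the example.com input are hypothetical, and GNU sed is assumed because it accepts \n in the replacement text.

```shell
#!/bin/sh
# strip_loc_tags (hypothetical helper): read "<loc>A</loc><loc>B</loc>" on stdin,
# the concatenated form printed by older xmllint --xpath '//loc', and print one URL per line.
# GNU sed is assumed, because it accepts "\n" in the replacement text.
strip_loc_tags() {
    sed 's/<loc>/<loc>\n/g' |
        sed 's#</loc>#\n</loc>#g' |
        grep -v '^<'   # drop every line that now starts with a tag
}

printf '%s\n' '<loc>http://example.com/a</loc><loc>http://example.com/b</loc>' | strip_loc_tags
```

Because xmllint serializes the elements, characters such as & come out escaped (&amp;) and may need unescaping afterwards.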