Wednesday, 15 July 2015

xml - xmllint: problems outputting results on separate lines




I know this question really contains two questions...

First, I want to use xmllint to output the content of the "loc" tags. The sitemap I load has an xmlns="..." attribute.

In the xmllint shell, I need this:

setrootns
xpath //defaultns:loc

That works, no problem. But I need it in a bash script.

(AFAIK) xmllint has no option that tells it "go ahead, setrootns", so I cannot do this:

xmllint --xpath "//loc" sitemaps.xml
# or
xmllint --xpath "//defaultns:loc" sitemaps.xml

This is the first question: how can I tell xmllint to load the default namespace?
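
One widely used way around the missing namespace option, sketched here rather than offered as the definitive answer, is to avoid prefixes altogether and match elements by local-name() in the XPath expression. The function name loc_text is hypothetical; sitemaps.xml is the file from the question. Whether xmllint separates the matched nodes with newlines depends on the installed libxml2 version, so this only addresses the namespace part.

```shell
# Hypothetical helper: print the text of every element whose local name is
# "loc", whatever namespace it lives in.
loc_text() {
  xmllint --xpath "//*[local-name()='loc']/text()" "$1"
}
```

It would be called as: loc_text sitemaps.xml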

If I can't, let's move on to a second solution:

I can remove the xmlns attribute; then there is no namespace to deal with:

xmllint --xpath "//loc" <(sed -r 's/xmlns="[^"]*"//' sitemaps.xml)
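
A note on the sed pattern, as a sketch: the POSIX regular expressions sed uses (with or without -r) have no lazy quantifiers, so a pattern like .*? is greedy and can swallow everything up to the last double quote on the line. A negated character class is one way to stop at the first closing quote. The function name strip_default_ns is made up for illustration.

```shell
# Hypothetical helper: drop the default-namespace declaration (xmlns="...")
# from XML read on stdin or from the given files.
# [^"]* stops at the first closing quote; prefixed declarations such as
# xmlns:xsi="..." do not match and are left alone.
strip_default_ns() {
  sed -E 's/[[:space:]]+xmlns="[^"]*"//' "$@"
}
```

In bash it could replace the inline sed, for example: xmllint --xpath "//loc" <(strip_default_ns sitemaps.xml)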

But now the whole response, the content of all 500 "loc" elements, is concatenated on one line!

I also tried:

xmllint --shell sitemaps.xml <<eof
setrootns
xpath //defaultns:loc/text()
eof

or:

xmllint --shell sitemaps.xml <<eof
setrootns
cat //defaultns:loc
eof

The first gives me (for example)

465 text content=http://...

with a truncated URL.

The second gives me "------" every two lines, and "/>" on the last line.
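
The decoration in the shell output (the separator lines and the prompt) can be filtered out after the fact. The following is a minimal sketch, not a definitive recipe: it feeds the same commands to xmllint --shell through a here-document and keeps only what sits between <loc> and </loc>. Both function names are hypothetical, and the sketch assumes each loc element is printed on a single line.

```shell
# Hypothetical filter: from the lines on stdin, print only the text found
# between <loc ...> and </loc>; separator and prompt lines are dropped.
only_loc_text() {
  sed -n 's#.*<loc[^>]*>\([^<]*\)</loc>.*#\1#p'
}

# Hypothetical wrapper: run the shell commands from the question on the
# given file and pass the output through the filter.
locs_via_shell() {
  xmllint --shell "$1" <<EOF | only_loc_text
setrootns
cat //defaultns:loc
EOF
}
```

Called as locs_via_shell sitemaps.xml. Entities such as &amp; are left exactly as they appear in the XML, since the filter only treats the output as text.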

And I'm beginning to get nervous... :)

A big thanks if you find a solution.

The goal is to have every location, one per line.

I used something similar:

clean_xml_message=$(echo "$xml_message" | sed 's/xmlns/ignore/')

Then you could try to insert newlines:

sed 's/></>\n</g'
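
Put together, the two sed commands above could be wrapped around xmllint roughly as follows. This is a sketch under assumptions: GNU sed (for \n in the replacement), a sitemap whose only namespace declaration is the default one, and a hypothetical function name. Renaming xmlns to ignore turns the declaration into an ordinary attribute, so a plain //loc matches.

```shell
# Hypothetical helper: print every <loc> element of a sitemap on its own line.
loc_elements() {
  sed 's/xmlns/ignore/' "$1" |    # namespace declaration -> plain attribute
    xmllint --xpath '//loc' - |   # "-" makes xmllint read the XML from stdin
    sed 's/></>\n</g'             # break between adjacent elements (GNU sed)
}
```

Called as loc_elements sitemaps.xml. The last sed has nothing to do if the installed xmllint already prints each node on its own line.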

I guess you want the URLs without <loc></loc>? First select the loc elements with xmllint:

<loc>...</loc><loc>...</loc><loc>...</loc>

Then add newlines: sed 's/<loc>/<loc>\n/g' | sed 's#</loc>#\n</loc>#g'

<loc>
...
</loc><loc>
...
</loc><loc>
...
</loc>

Finally, remove the tag lines with grep -v "<loc>" | grep -v "</loc>", or with a single grep -v "^<". (-v inverts the selection: http://unixhelp.ed.ac.uk/cgi/man-cgi?grep)
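
The three steps above can be sketched as one filter. It assumes GNU sed (for \n in the replacement) and input shaped like the concatenated <loc>...</loc> output; the function name is hypothetical.

```shell
# Hypothetical filter: turn "<loc>A</loc><loc>B</loc>" on stdin into
# one URL per line.
urls_per_line() {
  sed 's/<loc>/<loc>\n/g' |       # newline after every opening tag
    sed 's#</loc>#\n</loc>#g' |   # newline before every closing tag
    grep -v '^<'                  # drop the lines that hold only tags
}
```

The xmllint output would simply be piped into it, for example: xmllint --xpath "//loc" cleaned.xml | urls_per_line, where cleaned.xml stands for the sitemap with its xmlns attribute already neutralised.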

xml bash xml-parsing sitemap xmllint
