Saturday, 15 February 2014

bash - Parsing XML via Command Line




So I have an XML file that I want to parse with a bash script or similar, using xmlstarlet (or an alternative, if people can give me an example).

The basic construction is this:

<character>
  <literal>恵</literal>
  <misc>
    <stroke_count>10</stroke_count>
  </misc>
  <reading_meaning>
    <rmgroup>
      <reading r_type="ja_on">ケイ</reading>
      <reading r_type="ja_on">エ</reading>
      <reading r_type="ja_kun">めぐ.む</reading>
      <reading r_type="ja_kun">めぐ.み</reading>
      <meaning>favor</meaning>
      <meaning>blessing</meaning>
      <meaning>grace</meaning>
      <meaning>kindness</meaning>
    </rmgroup>
  </reading_meaning>
</character>

There are other fields in there, and the meanings and readings can vary in number. I'd like to pull the readings, meanings, stroke count, etc. out and generate an HTML table with bash.

This is a big file with many characters that need looking up. I'd like a script that takes in $1 and uses the values based on the tag. Ideally it'd be:

kanjilookup.sh 恵

and generate an HTML table based on the content.

thoughts? (i'd using programme xpath)

As @thatotherguy suggested, you'll want XSLT instead of bash. You can parse XML with bash, but it's going to get tricky pretty quickly.

Following @thatotherguy's suggestion, say you have an XSLT stylesheet that looks like this:
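To illustrate why plain shell tools get tricky, here is a minimal sketch of a line-based extraction. The function name `extract_meanings` is hypothetical and not part of the original answer, and `grep -o` is assumed to be available (it is in GNU and BSD grep). It works only while each element sits on a single line, and it has no idea which character a meaning belongs to.

```shell
#!/bin/sh
# Hypothetical helper: naive, line-based extraction of <meaning> values from stdin.
extract_meanings() {
    grep -o '<meaning>[^<]*</meaning>' |
        sed -e 's|<meaning>||' -e 's|</meaning>||'
}

# Works when every element is on one line: prints "favor" and "grace".
printf '%s\n' '<meaning>favor</meaning> <meaning>grace</meaning>' | extract_meanings

# Prints nothing once the same element is wrapped across lines,
# although the XML is equivalent as far as a real parser is concerned.
printf '<meaning>\nfavor\n</meaning>\n' | extract_meanings
```

An XML-aware tool avoids both problems because it works on the document tree rather than on lines of text.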

<?xml version="1.0" encoding="iso-8859-1"?>
<!-- kanjilookup.xsl -->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- The character to look up, passed in from the command line. -->
  <xsl:param name="character"/>
  <xsl:output method="html" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- http://stackoverflow.com/questions/9611569/xsl-how-do-you-capitalize-first-letter -->
  <xsl:variable name="vlower" select="'abcdefghijklmnopqrstuvwxyz'"/>
  <xsl:variable name="vupper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>

  <!-- Upper-case the first letter of $string. -->
  <xsl:template name="capitalize">
    <xsl:param name="string"/>
    <xsl:value-of select="concat(translate(substring($string, 1, 1), $vlower, $vupper), substring($string, 2))"/>
  </xsl:template>

  <!-- Stop if no character was given or if it is not in the file. -->
  <xsl:template match="/">
    <xsl:if test="string-length($character) = 0 or not(//literal[. = $character])">
      <xsl:message terminate="yes">err: no input character given.</xsl:message>
    </xsl:if>
    <xsl:apply-templates select="characters/character[literal[. = $character]]"/>
  </xsl:template>

  <xsl:template match="character">
    <xsl:text disable-output-escaping='yes'>&lt;!doctype html>
</xsl:text>
    <html>
      <head/>
      <body>
        <table>
          <tbody>
            <xsl:apply-templates/>
          </tbody>
        </table>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="literal">
    <caption>
      <xsl:value-of select="."/>
    </caption>
  </xsl:template>

  <!-- Label the row with the element name, e.g. stroke_count becomes "Stroke count". -->
  <xsl:template match="stroke_count">
    <tr>
      <td>
        <xsl:call-template name="capitalize">
          <xsl:with-param name="string" select="translate(local-name(), '_', ' ')"/>
        </xsl:call-template>
      </td>
      <td><xsl:value-of select="."/></td>
    </tr>
  </xsl:template>

  <!-- Wrapper elements produce no output of their own. -->
  <xsl:template match="misc | reading_meaning | rmgroup">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="reading | meaning">
    <tr>
      <td>
        <xsl:call-template name="capitalize">
          <xsl:with-param name="string" select="local-name()"/>
        </xsl:call-template>
        <xsl:apply-templates select="@r_type"/>
      </td>
      <td>
        <xsl:value-of select="."/>
      </td>
    </tr>
  </xsl:template>

  <!-- Append the reading type in parentheses, e.g. " (ja_on)". -->
  <xsl:template match="@r_type">
    <xsl:value-of select="concat(' ', '(', ., ')')"/>
  </xsl:template>
</xsl:stylesheet>

Let's say you have a file called characters.xml:

<characters>
  <character>
    <literal>恵</literal>
    <misc>
      <stroke_count>10</stroke_count>
    </misc>
    <reading_meaning>
      <rmgroup>
        <reading r_type="ja_on">ケイ</reading>
        <reading r_type="ja_on">エ</reading>
        <reading r_type="ja_kun">めぐ.む</reading>
        <reading r_type="ja_kun">めぐ.み</reading>
        <meaning>favor</meaning>
        <meaning>blessing</meaning>
        <meaning>grace</meaning>
        <meaning>kindness</meaning>
      </rmgroup>
    </reading_meaning>
  </character>
</characters>

You can run kanjilookup.xsl with xmlstarlet like this:

xml tr kanjilookup.xsl -s character=恵 characters.xml

That'll produce an HTML table that looks like this (after pretty-printing):

<!doctype html>
<html>
  <head>
    <meta http-equiv="content-type" content="text/html; charset=utf-8">
  </head>
  <body>
    <table>
      <tbody>
        <caption>恵</caption>
        <tr>
          <td>Stroke count</td>
          <td>10</td>
        </tr>
        <tr>
          <td>Reading (ja_on)</td>
          <td>ケイ</td>
        </tr>
        <tr>
          <td>Reading (ja_on)</td>
          <td>エ</td>
        </tr>
        <tr>
          <td>Reading (ja_kun)</td>
          <td>めぐ.む</td>
        </tr>
        <tr>
          <td>Reading (ja_kun)</td>
          <td>めぐ.み</td>
        </tr>
        <tr>
          <td>Meaning</td>
          <td>favor</td>
        </tr>
        <tr>
          <td>Meaning</td>
          <td>blessing</td>
        </tr>
        <tr>
          <td>Meaning</td>
          <td>grace</td>
        </tr>
        <tr>
          <td>Meaning</td>
          <td>kindness</td>
        </tr>
      </tbody>
    </table>
  </body>
</html>

You'd have to modify the XSLT stylesheet to suit your needs, of course.
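To get the `kanjilookup.sh 恵` interface from the question, the xmlstarlet call can be wrapped in a small script. This is only a sketch: the function name `kanjilookup` and the `XML_TOOL`, `STYLESHEET` and `DICTIONARY` variables are assumptions made for illustration, and the defaults simply reuse the file names from this answer.

```shell
#!/bin/sh
# kanjilookup.sh: hypothetical wrapper around the xmlstarlet command from this answer.

# Assumed defaults; override them through the environment if your setup differs.
XML_TOOL="${XML_TOOL:-xml}"                  # some systems install it as "xmlstarlet"
STYLESHEET="${STYLESHEET:-kanjilookup.xsl}"
DICTIONARY="${DICTIONARY:-characters.xml}"

kanjilookup() {
    # Expect exactly one non-empty argument: the character to look up.
    if [ "$#" -ne 1 ] || [ -z "$1" ]; then
        echo "usage: kanjilookup.sh CHARACTER" >&2
        return 2
    fi
    # -s passes the value to the stylesheet as a string parameter.
    "$XML_TOOL" tr "$STYLESHEET" -s "character=$1" "$DICTIONARY"
}

# Run the lookup only when arguments were supplied, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    kanjilookup "$@"
fi
```

Something like `./kanjilookup.sh 恵 > table.html` would then write the table to a file. Quoting `"character=$1"` keeps the argument as a single word even if it contains spaces.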

xml bash xpath xmlstarlet
