Wednesday, 15 February 2012

nlp - How do I extract contents from a koRpus object in R? -




I'm using the tm package and want to compute Flesch-Kincaid scores for a document in R. I found that the koRpus package has a lot of metrics, including reading level, and started using that. However, the object it returns seems to be a complicated S4 object that I don't understand how to parse.

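So, I apply it to a corpus:

    library(tm)       # Corpus, DirSource, meta
    library(koRpus)   # tokenize, flesch.kincaid
    library(foreach)  # foreach, %dopar% (runs sequentially unless a parallel backend is registered)

    txt <- system.file("texts", "txt", package = "tm")
    (d <- Corpus(DirSource(txt, encoding = "UTF-8"), readerControl = list(language = "lat")))
    f <- function(x) tokenize(x, format = "obj", lang = 'en')
    g <- function(x) flesch.kincaid(x)
    x <- foreach(i = 1:5) %dopar% g(f(d[[i]]))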

x is then a list of flesch.kincaid results, one for each of the Ovid texts.

    > x[[1]]

    Flesch-Kincaid Grade Level
      Parameters: default
           Grade: 13.62
             Age: 18.62

    Text language: en

How can I return the values grade = 13.62 and age = 18.62? The output of str(x) is so big that it's hard to parse, e.g.:

    > str(x[[1]])
    Formal class 'kRp.readability' [package "koRpus"] with 49 slots
      ..@ hyphen :Formal class 'kRp.hyphen' [package "koRpus"] with 3 slots
      .. .. ..@ lang : chr "en"
      .. .. ..@ desc :List of 5
      .. .. .. ..$ num.syll : num 196
      .. .. .. ..$ syll.distrib : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
      .. .. .. .. ..- attr(*, "dimnames")=List of 2
      .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
      .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
      .. .. .. ..$ syll.uniq.distrib: num [1:6, 1:4] 15 15 61 19.7 19.7 ...
      .. .. .. .. ..- attr(*, "dimnames")=List of 2
      .. .. .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
      .. .. .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
      .. .. .. ..$ avg.syll.word : num 2.18
      .. .. .. ..$ syll.per100 : num 218
      .. .. ..@ hyphen:'data.frame': 90 obs. of 2 variables:
      .. .. .. ..$ syll: num [1:90] 1 1 1 1 2 3 1 2 3 1 ...
      .. .. .. ..$ word: chr [1:90] "si" "quis" "in" "hoc" ...
      ..@ param :List of 1
      .. ..$ Flesch.Kincaid: Named num [1:3] 0.39 11.8 15.59
      .. .. ..- attr(*, "names")= chr [1:3] "asl" "asw" "const"
      ..@ ARI :List of 1
      .. ..$ : logi NA
      ..@ ARI.NRI :List of 1
      .. ..$ : logi NA
      ..@ ARI.simple :List of 1
      .. ..$ : logi NA
      ..@ Bormuth :List of 1
      .. ..$ : logi NA
      ..@ Coleman :List of 1
      .. ..$ : logi NA
      ..@ Coleman.Liau :List of 1
      .. ..$ : logi NA
      ..@ Dale.Chall :List of 1
      .. ..$ : logi NA
      ..@ Dale.Chall.PSK :List of 1
      .. ..$ : logi NA
      ..@ Dale.Chall.old :List of 1
      .. ..$ : logi NA
      ..@ Danielson.Bryan :List of 1
      .. ..$ : logi NA
      ..@ Dickes.Steiwer :List of 1
      .. ..$ : logi NA
      ..@ DRP :List of 1
      .. ..$ : logi NA
      ..@ ELF :List of 1
      .. ..$ : logi NA
      ..@ Flesch :List of 1
      .. ..$ : logi NA
      ..@ Flesch.PSK :List of 1
      .. ..$ : logi NA
      ..@ Flesch.de :List of 1
      .. ..$ : logi NA
      ..@ Flesch.es :List of 1
      .. ..$ : logi NA
      ..@ Flesch.fr :List of 1
      .. ..$ : logi NA
      ..@ Flesch.nl :List of 1
      .. ..$ : logi NA
      ..@ Flesch.Kincaid :List of 3
      .. ..$ flavour: chr "default"
      .. ..$ grade : num 13.6
      .. ..$ age : num 18.6
      ..@ Farr.Jenkins.Paterson :List of 1
      .. ..$ : logi NA
      ..@ Farr.Jenkins.Paterson.PSK:List of 1
      .. ..$ : logi NA
      ..@ FOG :List of 1
      .. ..$ : logi NA
      ..@ FOG.PSK :List of 1
      .. ..$ : logi NA
      ..@ FOG.NRI :List of 1
      .. ..$ : logi NA
      ..@ FORCAST :List of 1
      .. ..$ : logi NA
      ..@ FORCAST.RGL :List of 1
      .. ..$ : logi NA
      ..@ Fucks :List of 1
      .. ..$ : logi NA
      ..@ Harris.Jacobson :List of 1
      .. ..$ : logi NA
      ..@ Linsear.Write :List of 1
      .. ..$ : logi NA
      ..@ LIX :List of 1
      .. ..$ : logi NA
      ..@ RIX :List of 1
      .. ..$ : logi NA
      ..@ SMOG :List of 1
      .. ..$ : logi NA
      ..@ SMOG.de :List of 1
      .. ..$ : logi NA
      ..@ SMOG.C :List of 1
      .. ..$ : logi NA
      ..@ SMOG.simple :List of 1
      .. ..$ : logi NA
      ..@ Spache :List of 1
      .. ..$ : logi NA
      ..@ Spache.old :List of 1
      .. ..$ : logi NA
      ..@ Strain :List of 1
      .. ..$ : logi NA
      ..@ Traenkle.Bailer :List of 1
      .. ..$ : logi NA
      ..@ TRI :List of 1
      .. ..$ : logi NA
      ..@ Wheeler.Smith :List of 1
      .. ..$ : logi NA
      ..@ Wheeler.Smith.de :List of 1
      .. ..$ : logi NA
      ..@ Wiener.STF :List of 1
      .. ..$ : logi NA
      ..@ lang : chr "en"
      ..@ desc :List of 26
      .. ..$ sentences : int 10
      .. ..$ words : int 90
      .. ..$ letters : Named num [1:12] 492 0 8 9 14 18 14 9 10 6 ...
      .. .. ..- attr(*, "names")= chr [1:12] "all" "l1" "l2" "l3" ...
      .. ..$ all.chars : int 692
      .. ..$ syllables : Named num [1:5] 196 25 32 25 8
      .. .. ..- attr(*, "names")= chr [1:5] "all" "s1" "s2" "s3" ...
      .. ..$ lttr.distrib : num [1:6, 1:11] 0 0 90 0 0 ...
      .. .. ..- attr(*, "dimnames")=List of 2
      .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
      .. .. .. ..$ : chr [1:11] "1" "2" "3" "4" ...
      .. ..$ syll.distrib : num [1:6, 1:4] 25 25 65 27.8 27.8 ...
      .. .. ..- attr(*, "dimnames")=List of 2
      .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
      .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
      .. ..$ syll.uniq.distrib : num [1:6, 1:4] 15 15 61 19.7 19.7 ...
      .. .. ..- attr(*, "dimnames")=List of 2
      .. .. .. ..$ : chr [1:6] "num" "cum.sum" "cum.inv" "pct" ...
      .. .. .. ..$ : chr [1:4] "1" "2" "3" "4"
      .. ..$ punct : int 17
      .. ..$ conjunctions : int 0
      .. ..$ prepositions : int 0
      .. ..$ pronouns : int 0
      .. ..$ foreign : int 0
      .. ..$ TTR : num 0.844
      .. ..$ avg.sentc.length : num 9
      .. ..$ avg.word.length : num 5.47
      .. ..$ avg.syll.word : num 2.18
      .. ..$ sntc.per.word : num 0.111
      .. ..$ sntc.per100 : num 11.1
      .. ..$ lett.per100 : num 547
      .. ..$ syll.per100 : num 218
      .. ..$ FOG.hard.words : NULL
      .. ..$ Bormuth.NOL : NULL
      .. ..$ Dale.Chall.NOL : NULL
      .. ..$ Harris.Jacobson.NOL: NULL
      .. ..$ Spache.NOL : NULL
      ..@ TT.res :'data.frame': 107 obs. of 6 variables:
      .. ..$ token : chr [1:107] "si" "quis" "in" "hoc" ...
      .. ..$ tag : chr [1:107] "word.kRp" "word.kRp" "word.kRp" "word.kRp" ...
      .. ..$ lemma : chr [1:107] "" "" "" "" ...
      .. ..$ lttr : num [1:107] 2 4 2 3 5 6 3 5 6 1 ...
      .. ..$ wclass: chr [1:107] "word" "word" "word" "word" ...
      .. ..$ desc : chr [1:107] "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" "Word (kRp internal)" ...

Ideally, I'd like to assign the F-K score to meta(d) in tm.

I'd appreciate learning how to understand the returned object and pull out the values. Also, if there's another, better, or faster way to get the F-K score, I'm all ears!
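As a sanity check on the numbers, the grade and age printed above can apparently be reproduced by hand from values already visible in the str() dump: the sentence, word, and syllable counts under desc, and the three weights (asl, asw, const) under param. The sketch below assumes the usual Flesch-Kincaid form (weighted average sentence length plus weighted average syllables per word, minus a constant) and assumes that age is simply grade + 5, which is what the printed values suggest. The variable names are illustrative only and are not part of koRpus.

```r
# Counts copied from the desc slot in the str() output
sentences <- 10
words     <- 90
syllables <- 196

# Weights copied from the param slot (names: asl, asw, const)
asl_weight <- 0.39
asw_weight <- 11.8
const      <- 15.59

asl <- words / sentences   # average sentence length: 9
asw <- syllables / words   # average syllables per word: about 2.18

# Assumed formula: grade = 0.39 * ASL + 11.8 * ASW - 15.59
grade <- asl_weight * asl + asw_weight * asw - const

# Assumption: age = grade + 5, inferred from the printed grade/age pair
age <- grade + 5

round(c(grade = grade, age = age), 2)
# grade   age
# 13.62 18.62
```

This works from three plain counts, so it does not need the S4 object at all, but it is only as good as the sentence and syllable counts fed into it.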

Similar to @Paul's answer, but as a one-liner:

    sapply(lapply(x, slot, 'Flesch.Kincaid'), '[', c('age', 'grade'))
          [,1]     [,2]     [,3]     [,4]     [,5]
    age   18.61778 17.62351 17.77699 18.29032 18.645
    grade 13.61778 12.62351 12.77699 13.29032 13.645
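To see how the one-liner works without installing koRpus, here is a minimal sketch on a toy S4 class. `ToyReadability` and `x_toy` are hypothetical stand-ins for 'kRp.readability' and the list `x`; only the slot name and the list layout (flavour, grade, age) are taken from the str() output in the question. Slots are reached with `@` or `slot()`, and since the slot holds an ordinary list, `$` and `[` work on it from there.

```r
library(methods)

# Hypothetical stand-in for 'kRp.readability': a single list-valued slot
# plus the language slot is enough to show the access pattern.
setClass("ToyReadability",
         slots = c(Flesch.Kincaid = "list", lang = "character"))

# Two fake results, reusing grade/age values from the output above
x_toy <- list(
  new("ToyReadability", lang = "en",
      Flesch.Kincaid = list(flavour = "default", grade = 13.61778, age = 18.61778)),
  new("ToyReadability", lang = "en",
      Flesch.Kincaid = list(flavour = "default", grade = 12.62351, age = 17.62351))
)

slotNames(x_toy[[1]])                   # list the available slots
x_toy[[1]]@Flesch.Kincaid$grade         # direct slot access with @
slot(x_toy[[1]], "Flesch.Kincaid")$age  # the same access via slot()

# Variant of the one-liner that unlists each result, so the outcome is a
# plain numeric matrix (rows: age, grade; one column per document).
scores <- sapply(x_toy, function(r) {
  unlist(slot(r, "Flesch.Kincaid")[c("age", "grade")])
})
scores
```

The only difference from the one-liner is the `unlist()` call: indexing with `'['` keeps each result as a list, so `sapply` builds a matrix of list cells, whereas unlisting first gives ordinary numbers that are easier to pass on to other functions.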

r nlp s4 tm
