Tuesday, 15 July 2014

nlp - R tm removeWords stopwords is not removing stopwords




I'm using the R tm package, and I find that none of the tm_map functions that remove elements of text are working for me.

By 'working' I mean that, for example, I'll run:

d <- tm_map(d, removeWords, stopwords('english'))

But then when I run

ddtm <- DocumentTermMatrix(d, control = list(weighting = weightTfIdf, minWordLength = 2))
findFreqTerms(ddtm, 10)

I still get:

[1]

...etc., and a bunch of other stopwords.

I see no error indicating what has gone wrong. Does anyone know what the problem is, and how to make the stopword-removal function work correctly, or how to diagnose what is going wrong for me?
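One way to diagnose this kind of problem is to compare the terms of the document-term matrix against the stopword list directly. The following is only a minimal sketch, assuming the `crude` sample corpus that ships with tm stands in for `d`; the name `leftover` is made up for illustration.

```r
library(tm)

# Sample corpus bundled with tm, used here in place of the real data
data("crude")

# Same stopword-removal step as in the question
d <- tm_map(crude, removeWords, stopwords("english"))

# Build a plain document-term matrix with default settings
ddtm <- DocumentTermMatrix(d)

# Terms that are still present although they appear in the stopword list
leftover <- intersect(Terms(ddtm), stopwords("english"))
leftover
```

If `leftover` is not empty, looking at where those words occur in the raw text (capitalised, or attached to punctuation) may hint at which preprocessing step lets them through.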

Update

There is an error from earlier that I didn't catch:

Refreshing GOE props...
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH?
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] Initializing KF...

It is Weka that removes the stopwords in tm, right? Could this be my problem?

Update 2

From this, the error appears to be unrelated. It's about the DB, not stopwords.

Never mind, it is working. I did the following minimal example:

data("crude")
crude[[1]]
j <- Corpus(VectorSource(crude[[1]]))
jj <- tm_map(j, removeWords, stopwords('english'))
jj[[1]]

I had used several tm_map expressions in series. It turned out that the order in which I had removed spaces, punctuation, etc. had concatenated new stopwords back in.
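To illustrate how the order of the cleaning steps can matter, here is a small sketch that applies tm's transformations to a plain character string rather than to a corpus. The sentence and the variable names (`x`, `wrong`, `right`) are made up for the example, and this is not necessarily the exact sequence used above.

```r
library(tm)

x <- "The cat doesn't like the dog, and the dog doesn't care."

# Order A: strip punctuation first, then remove stopwords.
# "doesn't" becomes "doesnt", which is not in the stopword list,
# and the capitalised "The" does not match the lowercase entry "the".
wrong <- removeWords(removePunctuation(x), stopwords("english"))

# Order B: lowercase, remove stopwords while apostrophes are intact,
# then strip punctuation and collapse the leftover whitespace.
right <- stripWhitespace(
  removePunctuation(
    removeWords(tolower(x), stopwords("english"))
  )
)

wrong
right
```

In this sketch, order A leaves "The" and "doesnt" behind, while order B removes them. Inside a corpus pipeline the same idea applies to the order of the `tm_map` calls.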

r nlp stop-words tm
