nlp - R tm removeWords stopwords is not removing stopwords
I'm using the R tm package, and I find that none of the tm_map functions that remove elements of text are working for me.
By 'working' I mean that, for example, I'll run:
d <- tm_map(d, removeWords, stopwords('english'))
But when I run
ddtm <- DocumentTermMatrix(d, control = list(
  weighting = weightTfIdf,
  minWordLength = 2))
findFreqTerms(ddtm, 10)
I still get:
[1]
...etc., and a bunch of other stopwords.
I see no error indicating that anything has gone wrong. Does anyone know what this is, how to do the stopword removal correctly, or how to diagnose what's going wrong for me?
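(For reference, here is a minimal way to check whether removeWords is doing anything at all, using a one-line made-up corpus rather than my real data; note that stopwords('english') is all lowercase, so the text has to be lowercased first:)

library(tm)

# Toy one-document corpus standing in for the real data
d <- VCorpus(VectorSource("This is a test of the stopword removal."))

writeLines(as.character(d[[1]]))              # before removal
d <- tm_map(d, content_transformer(tolower))  # stopwords('english') is lowercase
d <- tm_map(d, removeWords, stopwords("english"))
writeLines(as.character(d[[1]]))              # after: "this", "is", "a", "of", "the" should be gone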
Update
There was an error earlier that I didn't catch:
refreshing GOE props...
---Registering Weka Editors---
Trying to add database driver (JDBC): RmiJdbc.RJDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): jdbc.idbDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.gjt.mm.mysql.Driver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): com.mckoi.JDBCDriver - Warning, not in CLASSPATH?
Trying to add database driver (JDBC): org.hsqldb.jdbcDriver - Warning, not in CLASSPATH?
[KnowledgeFlow] Loading properties and plugins...
[KnowledgeFlow] initializing KF...
It is Weka that removes the stopwords in tm, right? Is this the problem?
Update 2
From this, the error appears to be unrelated. It's about the database drivers, not the stopwords.
Never mind, it is working now. I did the following minimal example:
data("crude") crude[[1]] j <- corpus(vectorsource(crude[[1]])) jj <- tm_map(j, removewords, stopwords('english')) jj[[1]]
I had used several tm_map expressions in series. It turned out that the order in which I had removed spaces, punctuation, etc. had concatenated words together, leaving stopwords stuck inside larger tokens (see the sketch below).
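To illustrate what I think happened (a sketch with made-up text, not my actual pipeline): removeWords matches stopwords at word boundaries, so if punctuation and whitespace removal run first and glue words together, the stopwords buried inside the glued tokens are never matched.

library(tm)

txt <- "The end.The beginning, and the middle."
doc <- VCorpus(VectorSource(txt))

# Problematic order: removing punctuation first glues "end.The" into
# "endThe", so the "the" inside it is no longer a separate word and
# removeWords cannot touch it.
bad <- tm_map(doc, removePunctuation)
bad <- tm_map(bad, content_transformer(tolower))
bad <- tm_map(bad, removeWords, stopwords("english"))
as.character(bad[[1]])   # "endthe" survives, with the stopword glued inside

# Safer order: lowercase and remove stopwords while word boundaries
# are still intact, then strip punctuation and extra whitespace.
good <- tm_map(doc, content_transformer(tolower))
good <- tm_map(good, removeWords, stopwords("english"))
good <- tm_map(good, removePunctuation)
good <- tm_map(good, stripWhitespace)
as.character(good[[1]])  # roughly " end beginning middle"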
r nlp stop-words tm