Monday, 15 September 2014

machine learning - Feature extraction from a single word




Usually one wants to extract features from text using the bag-of-words approach: counting words and calculating different measures, for example tf-idf values, as in this question: How to include words as numerical feature in classification
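For reference, the bag-of-words approach mentioned above can be sketched in a few lines. This is only a minimal illustration, assuming whitespace tokenization and the plain `tf * log(N / df)` weighting; the function name `tf_idf` and the sample documents are made up for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: tf-idf weight} dict per document."""
    # Assumption: lowercase + whitespace split is a good enough tokenizer here.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each word appears.
    df = Counter(word for tokens in tokenized for word in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            # Term frequency times inverse document frequency.
            word: (count / len(tokens)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


# Hypothetical toy documents, only for illustration.
docs = ["milk and cream", "hot and warm", "stone and hard"]
weights = tf_idf(docs)
```

A word such as "and", which occurs in every document, gets weight zero, while a word that occurs in only one document gets a positive weight. Note that the features describe documents, not individual words.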

But my problem is different: I want to extract a feature vector from a single word. I want to know, for example, that potatoes and french fries are close to each other in the vector space, since both are made of potatoes. I want to know that milk and cream are also close, as are hot and warm, stone and hard, and so on.

What is this problem called? Can I learn the similarities and features of words just by looking at a large number of documents?
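To make "close in the vector space" concrete, here is a minimal sketch of one possible way to get a vector for a single word from plain documents: count which words appear near it, then compare two such count vectors with cosine similarity. This technique is an assumption added for illustration and is not something proposed in the post; the function names, the window size and the toy corpus are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus; a real one would be a large number of documents.
corpus = [
    "i drink milk with coffee",
    "i drink cream with coffee",
    "the stone is hard",
]
vectors = cooccurrence_vectors(corpus)
milk_cream = cosine(vectors["milk"], vectors["cream"])
milk_stone = cosine(vectors["milk"], vectors["stone"])
```

In this toy corpus "milk" and "cream" share all their neighbours, so they score higher than "milk" and "stone", which share none. Nothing here depends on English, only on having text in the target language.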

I will not make the implementation in English, so I can't use databases.

Hmm, feature extraction (e.g. tf-idf) on text data is based on statistics. On the other hand, you are looking for sense (semantics). Therefore no method like tf-idf will work for you.

In NLP there exist 3 basic levels:

1. Morphological analyses
2. Syntactic analyses
3. Semantic analyses

(A higher number represents bigger problems :)). Morphology is known for the majority of languages. Syntactic analysis is a bigger problem (it deals with things like what is a verb or a noun in a sentence, ...). Semantic analysis has the most challenges, since it deals with meaning, which is quite hard to represent in machines, has many exceptions, and is language-specific.

As far as I understand, you want to know some relationships between words. This can be done via so-called dependency treebanks (or just treebanks): http://en.wikipedia.org/wiki/treebank . It is a database/graph of sentences where a word can be considered a node and a relationship an arc. There is a good treebank for the Czech language, and for English there are some as well, but for many 'less-covered' languages it can be a problem to find one ...
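The "word as node, relationship as arc" idea can be sketched as a small data structure. This is only an illustration: the class name `DependencyGraph`, its methods and the relation labels are hypothetical and hand-written, not taken from any real treebank or its file format.

```python
from dataclasses import dataclass, field


@dataclass
class DependencyGraph:
    """One sentence: words are nodes, labelled arcs link a head to a dependent."""
    words: list
    arcs: list = field(default_factory=list)  # (head index, dependent index, label)

    def add_arc(self, head, dependent, label):
        self.arcs.append((head, dependent, label))

    def dependents(self, head):
        """Return (word, label) pairs for every arc leaving the given head."""
        return [(self.words[d], label) for h, d, label in self.arcs if h == head]


# Hand-annotated toy sentence; the labels are illustrative assumptions.
graph = DependencyGraph(words=["stone", "is", "hard"])
graph.add_arc(2, 0, "subject")  # "hard" -> "stone"
graph.add_arc(2, 1, "copula")   # "hard" -> "is"
related = graph.dependents(2)
```

Here `related` lists the words attached to "hard" together with the kind of relationship, which is the sort of information one would read off a treebank.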

machine-learning nlp feature-extraction
