Machine learning - Feature extraction from a single word
Usually one wants to extract features from a text using the bag-of-words approach: counting the words and calculating different measures, for example tf-idf values, as in this question: How to include words as numerical feature in classification.
But my problem is different: I want to extract a feature vector from a single word. I want to know, for example, that potatoes and french fries are close to each other in the vector space, since both are made of potatoes. I want to know that milk and cream are also close, as are hot and warm, stone and hard, and so on.
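To make "close to each other in the vector space" concrete: closeness between word vectors is commonly measured with cosine similarity. The following is only a minimal sketch. The three vectors are invented by hand for illustration and are not learned from any data, and the function name `cosine_similarity` is just a label chosen here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional word vectors; the values are made up.
vectors = {
    "potatoes": [0.9, 0.1, 0.0],
    "french fries": [0.8, 0.3, 0.1],
    "stone": [0.0, 0.1, 0.9],
}

close = cosine_similarity(vectors["potatoes"], vectors["french fries"])
far = cosine_similarity(vectors["potatoes"], vectors["stone"])
```

With vectors like these, `close` comes out much larger than `far`. The open question is how to obtain such vectors for real words.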
What is this problem called? Can I learn the similarities and features of words by looking at a large number of documents?
I will not create the implementation in English, so I can't use existing databases.
Hmm, feature extraction (e.g. tf-idf) on text data is based on statistics. You, on the other hand, are looking for sense (semantics). Hence no method such as tf-idf will work for you.
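To illustrate what "based on statistics" means here, this is a rough sketch of one common tf-idf variant (term frequency normalized by document length, idf = log(N / df)). Libraries often use smoothed variants, so the exact numbers will differ. The function name `tf_idf` and the three toy sentences are assumptions made for this example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: tf-idf weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each word occurs.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights

# Toy corpus, invented for illustration.
docs = [
    "the soup is hot".split(),
    "the soup is warm".split(),
    "the stone is hard".split(),
]
weights = tf_idf(docs)
```

Every weight is derived from counts alone. "hot" and "warm" end up as two separate dimensions, and nothing in the computation links them to each other.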
In NLP there are 3 basic levels:
1. morphological analysis
2. syntactic analysis
3. semantic analysis
(a higher number represents bigger problems :)). Morphology is known for the majority of languages. Syntactic analysis is a bigger problem (it deals with things like what is the verb, what is the noun in a sentence, ...). Semantic analysis has the most challenges, since it deals with meaning, which is quite hard to represent in machines, has many exceptions and is language-specific.
As far as I understand, you want to know the relationships between words. This can be done via so-called dependency tree banks (or just treebanks): http://en.wikipedia.org/wiki/treebank . A treebank is a database/graph of sentences where a word can be considered a node and a relationship an arc. There is a good treebank for the Czech language, and for English there are some as well, but for many 'less-covered' languages it can be a problem to find one ...
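The node/arc idea can be sketched roughly as follows. The sentence and its annotation are written by hand for illustration and do not come from any actual treebank. The names `Arc`, `dependents` and `head_of` are hypothetical helpers, and the relation labels are only examples of the kind of labels a treebank might use.

```python
from collections import namedtuple

# One arc of a dependency graph: head word -> dependent word, with a label.
Arc = namedtuple("Arc", ["head", "dependent", "relation"])

# Hand-annotated toy sentence: "She eats hot soup".
arcs = [
    Arc("eats", "She", "nsubj"),
    Arc("eats", "soup", "obj"),
    Arc("soup", "hot", "amod"),
]

def dependents(word, arcs):
    """Words directly attached below `word`, with their relation labels."""
    return [(a.dependent, a.relation) for a in arcs if a.head == word]

def head_of(word, arcs):
    """The word that `word` depends on, or None if it has no head (root)."""
    for a in arcs:
        if a.dependent == word:
            return a.head
    return None
```

A real treebank would identify nodes by token position rather than by the word string, since the same word can occur more than once in a sentence.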
machine-learning nlp feature-extraction