Python - Finding word frequencies of a list of words in a text file
I am trying to speed up a project that counts word frequencies. I have 360+ text files, and I need the total number of words and the number of times each word from a list of words appears. I know how to do this with a single text file.
>>> import os
>>> import re
>>> import nltk
>>> os.chdir(r"c:\users\cameron\desktop\pdf-to-txt")
>>> filename = "1976.03.txt"
>>> textfile = open(filename, "r")
>>> inputstring = textfile.read()
>>> word_list = re.split(r'\s+', inputstring.lower())
>>> print('words in text:', len(word_list))  # prints the number of words in the text file
>>> word_list.count('inflation')  # number of times 'inflation' occurs in the text file
>>> word_list.count('jobs')
>>> word_list.count('output')
It's tedious to get the frequencies of 'inflation', 'jobs', and 'output' individually. Can I put these words into a list and find the frequency of every word in the list at the same time, in Python?
Example: instead of this:
>>> word_list.count('inflation')
3
>>> word_list.count('jobs')
5
>>> word_list.count('output')
1
I want to do this (I know this isn't real code; this is what I'm asking for help on):
>>> list1 = 'inflation', 'jobs', 'output'
>>> word_list.count(list1)
'inflation', 'jobs', 'output'
3, 5, 1
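My list of words is going to have 10-20 terms, so I need to be able to just point Python toward a list of words to get the counts of. It would also be nice if the output could be copied and pasted into an Excel spreadsheet, with the words as columns and the frequencies as rows.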
Example:
inflation, jobs, output
3, 5, 1
And finally, can anyone help automate this for all of the text files? I figure I can just point Python toward the folder and it can do the above word counting from the new list for each of the 360+ text files. It seems easy enough, but I'm a bit stuck. Any help?
An output like this would be fantastic:
filename1
inflation, jobs, output
3, 5, 1
filename2
inflation, jobs, output
7, 2, 4
filename3
inflation, jobs, output
9, 3, 5
Thanks!
collections.Counter() has this covered, if I understand your problem.
The example from the docs would seem to match your problem.
from collections import Counter

# Tally occurrences of words in a list
cnt = Counter()
for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']:
    cnt[word] += 1
print(cnt)

# Find the ten most common words in Hamlet
import re
words = re.findall(r'\w+', open('hamlet.txt').read().lower())
Counter(words).most_common(10)
From the example above you should be able to do:
import re
import collections

words = re.findall(r'\w+', open('1976.03.txt').read().lower())
print(collections.Counter(words))
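If only a handful of words matter, one option is to build the full Counter once and then look up each term in it. A Counter returns 0 for a key it has never seen, so words that are absent from the file don't raise an error. Below is a minimal sketch of that idea; the helper name count_wanted and the sample text are made up for illustration, and the output is tab-separated on the assumption that it will be pasted into a spreadsheet.

```python
import re
from collections import Counter


def count_wanted(text, wanted):
    """Return (total word count, list of counts, one per word in `wanted`).

    `count_wanted` is a hypothetical helper, not part of any library.
    """
    words = re.findall(r'\w+', text.lower())
    counts = Counter(words)
    # A Counter returns 0 for missing keys, so absent words are safe to look up.
    return len(words), [counts[w] for w in wanted]


# Made-up sample text standing in for the contents of a real file.
sample = "Inflation rose. Jobs fell, output fell, and inflation rose again."
wanted = ['inflation', 'jobs', 'output']

total, freqs = count_wanted(sample, wanted)
header = '\t'.join(wanted)
row = '\t'.join(str(n) for n in freqs)

print('words in text:', total)
print(header)
print(row)
```

Note that re.findall(r'\w+', ...) strips punctuation, whereas splitting on whitespace would leave tokens like "fell," that never match a plain "fell", so the two approaches can give different totals.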
EDIT: a naive approach to show one way.
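import re
from collections import Counter

wanted = "fish chips steak".split()  # a list, so `in` matches whole words, not substrings
cnt = Counter()
words = re.findall(r'\w+', open('1976.03.txt').read().lower())
for word in words:
    if word in wanted:
        cnt[word] += 1
print(cnt)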
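For the 360+ files, the same counting can be wrapped in a loop over the folder. The following is a rough sketch rather than a finished tool: count_folder is a hypothetical helper, the column names filename and total_words are my own choice, and it assumes the files are UTF-8 text (undecodable bytes are replaced rather than raising). It writes one CSV row per .txt file, which Excel can open directly.

```python
import csv
import os
import re
from collections import Counter


def count_folder(folder, wanted, out_path):
    """Write one CSV row per .txt file in `folder`.

    Each row holds the file name, the total word count, and one count
    per word in `wanted`. Hypothetical helper for illustration only.
    """
    with open(out_path, 'w', newline='') as out:
        writer = csv.writer(out)
        writer.writerow(['filename', 'total_words'] + list(wanted))
        for name in sorted(os.listdir(folder)):
            # Skip anything that is not a .txt file (including the CSV itself).
            if not name.lower().endswith('.txt'):
                continue
            path = os.path.join(folder, name)
            # Assumption: files are UTF-8; bad bytes are replaced, not fatal.
            with open(path, encoding='utf-8', errors='replace') as f:
                words = re.findall(r'\w+', f.read().lower())
            counts = Counter(words)
            writer.writerow([name, len(words)] + [counts[w] for w in wanted])
```

With the folder from the question, a call might look like count_folder(r"c:\users\cameron\desktop\pdf-to-txt", ['inflation', 'jobs', 'output'], 'counts.csv'), which would write counts.csv to the current working directory.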