Tuesday, 15 July 2014

Python - File output based on the contents of another file




I have an issue with file input and output in Python (it's a continuation of this question: how to extract specific lines of data from a file, which has been solved now).

So I have one big file, danish.train, and eleven small files (called danish.test.part-01 and so on), each of them containing a different selection of the data from the danish.train file. Now, for each of the eleven files, I want to create an accompanying file that complements it. This means that for each small file, I want to create a file that contains the contents of danish.train minus the part that is already in the small file.

What I've come up with so far is this:

    trainfile = open("danish.train")
    for file_number in range(1,12):
        input = open('danish.test.part-%02d' % file_number, 'r')
        for line in trainfile:
            if line not in input:
                with open('danish.train.part-%02d' % file_number, 'a+') as myfile:
                    myfile.write(line)

The problem is that the code only gives output for file_number 1, although I have a loop from 1 to 11. If I change the range, for example to range(2,3), I get an output file danish.train.part-02, but that output contains a copy of the whole danish.train without leaving out the contents of the file danish.test.part-02, as I wanted.

I suspect that these issues may have something to do with me not understanding the with... as operator, but I'm not sure. Any help would be appreciated.

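When you open a file, it returns an iterator through the lines of the file. This is nice, in that it lets you go through the file, one line at a time, without keeping the whole file in memory at once. In your case, it leads to a problem, in that you need to iterate through the file multiple times. Once the first pass has reached the end of danish.train, the iterator is exhausted, so the inner loop has nothing left to read for file numbers 2 to 11. The test "line not in input" has the same problem: it reads through the test file looking for a match, so after the first miss that file is used up too.

Instead, you can read the entire training file into memory, and go through it multiple times: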
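To see the behaviour for yourself, here is a minimal sketch. It assumes a throwaway three-line temporary file standing in for danish.train; the names first_pass, second_pass, found and remaining are only for illustration and are not from the question.

```python
import os
import tempfile

# Hypothetical three-line file standing in for danish.train.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("a\nb\nc\n")
    path = tmp.name

with open(path) as f:
    first_pass = [line for line in f]   # reads every line
    second_pass = [line for line in f]  # iterator is exhausted, nothing left

with open(path) as f:
    # A membership test on a file object also iterates over it,
    # stopping at the first matching line.
    found = "b\n" in f
    remaining = list(f)  # only the lines after the match are left

os.remove(path)

print(first_pass, second_pass, found, remaining)
```

The second pass comes back empty, and the membership test silently moves the file position forward, which is why both loops in the question misbehave.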

    with open("danish.train", 'r') as f:
        train_lines = f.readlines()

    for file_number in range(1, 12):
        with open("danish.test.part-%02d" % file_number, 'r') as f:
            test_lines = set(f)
        with open("danish.train.part-%02d" % file_number, 'w') as g:
            g.writelines(line for line in train_lines if line not in test_lines)

I've simplified the logic a little bit, as well. If you don't care about the order of the lines, you could also consider reading the training lines into a set, and use set operations instead of the generator expression used in the final line.
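A rough sketch of that set-based variant might look like the following. The helper name write_complement and the demo file names are my own, not from the question, and the sorted() call is only there to make the output reproducible. Note that a set also collapses duplicate lines, so this only fits if repeated lines don't matter to you.

```python
import os
import shutil
import tempfile


def write_complement(train_path, test_path, out_path):
    """Write the lines of train_path that do not occur in test_path.

    Hypothetical helper: original line order and duplicates are not preserved.
    """
    with open(train_path) as f:
        train_lines = set(f)
    with open(test_path) as f:
        test_lines = set(f)
    with open(out_path, "w") as g:
        # Set difference replaces the generator expression;
        # sorted() is an assumption to get a stable output order.
        g.writelines(sorted(train_lines - test_lines))


# Tiny demo on throwaway files in a temporary directory.
workdir = tempfile.mkdtemp()
train = os.path.join(workdir, "demo.train")
test = os.path.join(workdir, "demo.test.part-01")
out = os.path.join(workdir, "demo.train.part-01")

with open(train, "w") as f:
    f.write("a\nb\nc\nb\n")
with open(test, "w") as f:
    f.write("b\n")

write_complement(train, test, out)
with open(out) as f:
    result = f.readlines()

shutil.rmtree(workdir)
print(result)
```

For the real data you would call it inside the same range(1, 12) loop, though reading danish.train into a set once outside the loop would avoid rereading it eleven times.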

python file-io
