Tuesday, 15 June 2010

regex - How to remove the filename from the top of some text files




I am trying to use fdupes on Mac OS X to remove duplicate text files from a directory. It has removed a bunch of duplicates.

The problem I'm having is that among the ones remaining, many are still duplicates, except that in one of the files the filename is the first line, followed by an empty line, followed by the text.

So I'd like to find the files that have the filename repeated at the top, and strip it and the next blank line so that fdupes will recognize them as duplicates. That would allow me to use fdupes to process them.

Example:

file001.txt:

test 123 test

file002.001.txt:

file002.001.txt

test 123 test

What's the best way to go about this?

Perhaps this:

perl -0777 -pi -e 's#\Q$ARGV\E$/{2}##' *.txt

$ARGV contains the name of the current file. $/ is the input record separator; you may need to use \n, or whatever your line endings are, in its place. Since $/ contains a slash, I changed the delimiter of s/// to something else, in this case #. The \Q ... \E escape is there so that metacharacters in file names do not screw up the match.

The -0777 switch makes perl read the entire file at once, so you can match multiple lines in one regex.
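A variation on the command above, offered as a sketch rather than a drop-in replacement: because -0777 leaves $/ undefined, this version spells the line ending out as a literal \n, and it anchors the match with \A so that only a name at the very top of the file is removed. The optional \r is an assumption, added to tolerate Windows line endings, and the temporary directory and sample file exist only for the demonstration.

```shell
# Build a throwaway sample: file name, blank line, then the text.
demo=$(mktemp -d)
printf 'file002.001.txt\n\ntest 123 test\n' > "$demo/file002.001.txt"

# No -i here, so the file is left untouched and the result goes to stdout.
# \A anchors at the start of the file; (?:\r?\n){2} matches the end of the
# name line plus the blank line that follows it.
stripped=$(cd "$demo" && perl -0777 -pe 's#\A\Q$ARGV\E(?:\r?\n){2}##' file002.001.txt)
echo "$stripped"
```

Note that $ARGV holds the name exactly as it was given on the command line, so the command is run from inside the directory; a name passed with a leading path would not match the bare file name on the first line.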

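The -i switch performs in-place editing. You can add a suffix to it (for example -i.bak) to keep a backup, though that may not be practical when used on many files. However, I recommend that you not use the -i switch until everything works the way you want, and just print to standard output in the meantime.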

regex perl text duplicates
