regex - How to remove the filename from the top of some text files
I am trying to use fdupes on Mac OS X to remove duplicate text files from a directory. It has removed a bunch of duplicates.
The problem I'm having is that among the ones remaining, many are duplicates, except that in one of the files the filename is the first line, followed by an empty line, followed by the text.
So, I'd like to find the files that have the filename repeated at the top, and strip it and the following blank line so that fdupes will recognize them as duplicates. That would allow me to use fdupes to process them.
example:
file001.txt:
test 123 test
file002.001.txt:
file002.001.txt

test 123 test
What's the best way to go about this?
Perhaps this:
perl -0777 -pi -e 's#\Q$ARGV\E$/{2}##' *.txt
$ARGV contains the file name. $/ is the input record separator; you may need to use \n, or whatever your line endings are, instead. Since $/ contains a slash, I changed the delimiter of s/// to something else, in this case #. The \Q ... \E escape is there so that meta characters in file names (such as the dots in file002.001.txt) do not screw up the match.
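As a hedged illustration of the substitution above, here is a dry-run sketch that recreates the two sample files from the question in a scratch directory and prints the stripped result instead of editing anything. Two details are assumptions that go beyond the answer: the line ending is written as \n rather than $/ (under -0777 Perl leaves $/ undefined, so the literal newline is the safer choice), and \A is added so that only a filename at the very start of the file is removed.

```shell
# Recreate the sample files from the question in a scratch directory.
tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'test 123 test\n' > file001.txt
printf 'file002.001.txt\n\ntest 123 test\n' > file002.001.txt

# No -i switch, so the file on disk is left untouched.
# \A        anchors at the start of the slurped file (assumption, not in the answer)
# \Q...\E   quotes the dots in the file name so they match literally
# \n{2}     the end of the filename line plus the empty line (assumes Unix line endings)
result=$(perl -0777 -pe 's#\A\Q$ARGV\E\n{2}##' file002.001.txt)
printf '%s\n' "$result"
```

If the files use Windows line endings, \n{2} would have to become (?:\r\n){2} or similar.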
The -0777 switch makes perl read the entire file at once, so we can match multiple lines in one regex.
The -i switch is for in-place editing. You may add a suffix to keep a backup, but that may not be practical when used on many files. However, I recommend that you do not use the -i switch until the output looks the way you want, and just print to standard output in the meantime.
regex perl text duplicates