python - Batch (basename) file/folder renaming using an "index"
Renaming files and folders in batch is a question that is often asked, but after some searching I think none is similar to mine.
Background: we send biological samples to a service provider, which returns files with unique names and a table in text format containing, amongst other information, the file name and the sample that originated it:
    head samples.txt
    fq_file                       sample_id      sample_name  library_id     fc_number  track_lanes_pos
    l2369_track-3885_r1.fastq.gz  s1746_b_7_t    b 7 t        l2369_b_7_t    163        6
    l2349_track-3865_r1.fastq.gz  s1726_a_3_t    a 3 t        l2349_a_3_t    163        5
    l2354_track-3870_r1.fastq.gz  s1731_a_gfp_c  a gfp c      l2354_a_gfp_c  163        5
    l2377_track-3893_r1.fastq.gz  s1754_b_7_c    b 7 c        l2377_b_7_c    163        7
    l2362_track-3878_r1.fastq.gz  s1739_b_gfp_t  b gfp t      l2362_b_gfp_t  163        6
The directory structure (for 34 directories):

    l2369_track-3885_
    accepted_hits.bam      deletions.bed   junctions.bed         logs
    accepted_hits.bam.bai  insertions.bed  left_kept_reads.info

    l2349_track-3865_
    accepted_hits.bam      deletions.bed   junctions.bed         logs
    accepted_hits.bam.bai  insertions.bed  left_kept_reads.info
Goal: because the file names are meaningless and hard to interpret, I want to rename the files ending in .bam (keeping the suffix) and the folders to the corresponding sample name, re-ordered in a more suitable manner. The result should look like:

    7_t_b
    7_t_b.bam      deletions.bed   junctions.bed         logs
    7_t_b.bam.bai  insertions.bed  left_kept_reads.info

    3_t_a
    3_t_a.bam              deletions.bed   junctions.bed         logs
    accepted_hits.bam.bai  insertions.bed  left_kept_reads.info
I've hacked together a solution with bash and python (I'm a newbie), but it feels over-engineered. The question is whether there is a simpler or more elegant way of doing this that I've missed. Solutions can be in python, bash, or R, and awk too, since I am trying to learn it. Being a relative beginner does make one complicate things.
This is my solution:
A wrapper that puts everything in place and gives an idea of the workflow:
    #! /bin/bash
    # Select the columns of interest and write them to a file - basenames
    tail -n +2 samples.txt | cut -d$'\t' -f1,3 >> bamfilames.txt
    # Call a little python script that creates new .sh files with the renaming commands
    ./renamebamfiles.py
    # Rename the files
    ./renamebam.sh
    # And the folders
    ./renamebamfolder.sh
renamebamfiles.py:
    #! /usr/bin/env python
    import re

    # Read in the info sample file and create a bash file to rename the tophat output.
    # Renaming is as follows:
    # mv l2377_track-3893_r1_ l2377_track-3893_r1_srsf7_cyto_b
    #
    # Set the input file name
    # (the program must be run from within the directory
    # that contains this info file)
    infilename = 'bamfilames.txt'

    ### Rename the bam files
    # Open the input file for reading
    infile = open(infilename, 'r')
    # Open the output file for writing
    outfilename = 'renamebam.sh'
    outfile = open(outfilename, 'a')  # 'a' appends; use 'w' to overwrite instead
    outfile.write("#! /bin/bash" + "\n")
    outfile.write(" " + "\n")

    # Loop through each line in the file
    for line in infile:
        ## Remove the line ending characters
        line = line.strip('\n')
        ## Separate the line into a list of its tab-delimited components
        elementlist = line.split('\t')
        # Separate the folder string from the experimental name
        fileroot = elementlist[1]
        fileroot = fileroot.split()
        # Create the variable names using a regex
        foldername = re.sub(r'^(.*)(\_)(\w+).*', r'\1\2\3\2', elementlist[0])
        foldername = foldername.strip('\n')
        filename = "%s_%s_%s" % (fileroot[1], fileroot[2], fileroot[0])
        command = "for file in %s/accepted_hits.*; do mv $file ${file/accepted_hits/%s}; done" % (foldername, filename)
        print(command)
        outfile.write(command + "\n")

    # After the loop is completed, close the files
    infile.close()
    outfile.close()

    ### Rename the folders
    # Open the input file for reading
    infile = open(infilename, 'r')
    # Open the output file for writing
    outfilename = 'renamebamfolder.sh'
    outfile = open(outfilename, 'w')
    outfile.write("#! /bin/bash" + "\n")
    outfile.write(" " + "\n")

    # Loop through each line in the file
    for line in infile:
        ## Remove the line ending characters
        line = line.strip('\n')
        ## Separate the line into a list of its tab-delimited components
        elementlist = line.split('\t')
        # Separate the folder string from the experimental name
        fileroot = elementlist[1]
        fileroot = fileroot.split()
        # Create the variable names using a regex
        foldername = re.sub(r'^(.*)(\_)(\w+).*', r'\1\2\3\2', elementlist[0])
        foldername = foldername.strip('\n')
        filename = "%s_%s_%s" % (fileroot[1], fileroot[2], fileroot[0])
        command = "mv %s %s" % (foldername, filename)
        print(command)
        outfile.write(command + "\n")

    # After the loop is completed, close the files
    infile.close()
    outfile.close()
renamebam.sh - created by the previous python script:

    #! /bin/bash

    for file in l2369_track-3885_r1_/accepted_hits.*; do mv $file ${file/accepted_hits/7_t_b}; done
    for file in l2349_track-3865_r1_/accepted_hits.*; do mv $file ${file/accepted_hits/3_t_a}; done
    for file in l2354_track-3870_r1_/accepted_hits.*; do mv $file ${file/accepted_hits/gfp_c_a}; done
    (..)
renamebamfolder.sh is similar:

    mv l2369_track-3885_r1_ 7_t_b
    mv l2349_track-3865_r1_ 3_t_a
    mv l2354_track-3870_r1_ gfp_c_a
    mv l2377_track-3893_r1_ 7_c_b
Since I'm learning, I feel that examples of different ways of doing this, and of thinking about how to do it, would be useful.
One simple way in bash:
    find . -type d -print |
    while IFS= read -r oldpath; do
        parent=$(dirname "$oldpath")
        old=$(basename "$oldpath")
        new=$(awk -v old="$old" '$1~"^"old{print $4"_"$5"_"$3}' samples.txt)
        if [ -n "$new" ]; then
            newpath="${parent}/${new}"
            echo mv "$oldpath" "$newpath"
            echo mv "${newpath}/accepted_hits.bam" "${newpath}/${new}.bam"
        fi
    done
Remove the "echo"s after initial testing to actually run the "mv"s.
If all of the target directories are at one level, as @triplee's reply implies, it's even simpler. cd to their parent directory and do:
    awk 'NR>1{sub(/[^_]+$/,"",$1); print $1" "$4"_"$5"_"$3}' samples.txt |
    while read -r old new; do
        echo mv "$old" "$new"
        echo mv "${new}/accepted_hits.bam" "${new}/${new}.bam"
    done
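Since Python is also acceptable, here is a rough sketch of the same "index" lookup that the awk one-liner builds: strip everything after the last underscore of the file name to get the folder name, and join fields 4, 5 and 3 to get the new name. The function name `build_index` is made up for illustration, and the sketch assumes, like the awk version, that every sample name consists of exactly three whitespace-separated words.

```python
def build_index(table_text):
    """Map each folder name to its new name, e.g. 'l2369_track-3885_' -> '7_t_b'.

    Hypothetical helper: assumes the layout of samples.txt shown in the
    question and three-word sample names (fields 3 to 5 after splitting
    on whitespace, as awk does by default).
    """
    index = {}
    lines = table_text.strip().splitlines()
    for line in lines[1:]:  # skip the header row, like NR>1 in awk
        fields = line.split()
        if len(fields) < 5:
            continue  # ignore blank or malformed rows
        # Keep everything up to and including the last underscore,
        # mirroring sub(/[^_]+$/, "", $1) for names that contain one.
        folder = fields[0].rsplit("_", 1)[0] + "_"
        index[folder] = "_".join([fields[3], fields[4], fields[2]])
    return index


sample = (
    "fq_file\tsample_id\tsample_name\tlibrary_id\tfc_number\ttrack_lanes_pos\n"
    "l2369_track-3885_r1.fastq.gz\ts1746_b_7_t\tb 7 t\tl2369_b_7_t\t163\t6\n"
    "l2362_track-3878_r1.fastq.gz\ts1739_b_gfp_t\tb gfp t\tl2362_b_gfp_t\t163\t6\n"
)
print(build_index(sample))
```

With the whole file you would call it as `build_index(open("samples.txt").read())` and inspect the resulting dictionary before renaming anything.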
In one of your expected outputs you renamed the ".bai" file, in the other you didn't, and you didn't say whether you want it renamed or not. If you do want to rename it, add
echo mv "${new}/accepted_hits.bam.bai" "${new}/${new}.bam.bai"
to whichever solution above you prefer.
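If you would rather stay in Python for the renaming itself, the following is a minimal sketch under the same assumptions, not a tested replacement for the shell loops. It takes a dictionary of old folder name to new name and, like the "echo" versions above, only prints the planned moves unless told otherwise. The names `apply_renames`, `dry_run` and `rename_bai` are hypothetical. Unlike the shell loops, it renames the files inside the old folder first and the folder last, so the printed plan matches the directory as it is on disk.

```python
import os


def apply_renames(parent, index, dry_run=True, rename_bai=False):
    """Rename accepted_hits.bam (and optionally the .bai) and then the folder.

    `index` maps old folder names to new names. With dry_run=True the moves
    are only printed, as with the "echo mv" commands. Returns the list of
    (source, destination) pairs in the order they are applied.
    """
    suffixes = [".bam"] + ([".bam.bai"] if rename_bai else [])
    moves = []
    for old, new in sorted(index.items()):
        old_dir = os.path.join(parent, old)
        if not os.path.isdir(old_dir):
            continue  # no such folder, nothing to do
        for suffix in suffixes:
            src = os.path.join(old_dir, "accepted_hits" + suffix)
            if os.path.isfile(src):
                moves.append((src, os.path.join(old_dir, new + suffix)))
        # The folder itself is renamed after its files
        moves.append((old_dir, os.path.join(parent, new)))
    for src, dst in moves:
        if dry_run:
            print("mv", src, dst)
        else:
            if os.path.exists(dst):
                raise FileExistsError(dst)  # refuse to overwrite anything
            os.rename(src, dst)
    return moves
```

Run it once with the default `dry_run=True`, check the printed commands, and then call it again with `dry_run=False`.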
python r bash awk