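I want to search a txt file for duplicate lines, ignoring [p] and the extension when comparing. Once the equal lines are found, I want to show only the line that does not contain [p], with its extension. I have the following lines in test.txt: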
Peliculas/Desperados (2020)[p].mp4
Peliculas/La Duquesa (2008)[p].mp4
Peliculas/Nueva York Año 2012 (1975).mkv
Peliculas/Acoso en la noche (1980) .mkv
Peliculas/Angustia a Flor de Piel (1982).mkv
Peliculas/Desperados (2020).mkv
Peliculas/Angustia (1947).mkv
Peliculas/Días de radio (1987) BR1080[p].mp4
Peliculas/Mona Lisa (1986) BR1080[p].mp4
Peliculas/La decente (1970) FlixOle WEB-DL 1080p [Buzz][p].mp4
Peliculas/Mona Lisa (1986) BR1080.mkv
In this file, lines 1 and 6, and lines 9 and 11, are the same (ignoring the extension and [p]). Desired output:
Peliculas/Desperados (2020).mkv
Peliculas/Mona Lisa (1986) BR1080.mkv
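I tried the following, but it only shows the duplicated lines with the extension and the [p] pattern removed. I don't know how to get back the original lines; I need the complete lines:
sed 's/\[p\]//' ./test.txt | sed 's/\.[^.]*$//' | sort | uniq -d
Wrong output (the extension is missing):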
Peliculas/Desperados (2020)
Peliculas/Mona Lisa (1986) BR1080
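A likely source of trouble with commands like this: in sed, grep and bash's =~, an unescaped [p] is a bracket expression that matches the single letter p, not the literal text [p] (bash glob patterns such as *[p]* behave the same way). The sketch below uses Python's re module as a stand-in for sed to show the difference on one line from test.txt; count=1 is an assumption chosen to mimic sed's s command without the g flag.

```python
import re

line = "Peliculas/Desperados (2020)[p].mp4"

# Unescaped: "[p]" is a character class, so the first lowercase "p" is removed
unescaped = re.sub(r"[p]", "", line, count=1)

# Escaped: "\[p\]" matches the literal three-character text "[p]"
escaped = re.sub(r"\[p\]", "", line, count=1)

# Strip the extension: a literal dot followed by non-dots up to the end
base = re.sub(r"\.[^.]*$", "", escaped)

print(unescaped)  # Peliculas/Deserados (2020)[p].mp4
print(escaped)    # Peliculas/Desperados (2020).mp4
print(base)       # Peliculas/Desperados (2020)
```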
Since you mentioned bash...
Remove any line containing p:
grep -v p test.txt
home/folder/house from earth.mkv
home/folder3/window 1.avi
Remove any line containing [p]:
grep -vF '[p]' test.txt
home/folder/house from earth.mkv
home/folder3/window 1.avi
home/folder4/little mouse.mpg
Not likely what you need, but just in case: remove [p] from every line, then deduplicate:
sed 's/\[p\]//g' test.txt | sort | uniq
home/folder/house from earth.mkv
home/folder/house from earth.mp4
home/folder2/test.mp4
home/folder3/window 1.avi
home/folder3/window 1.mp4
home/folder4/little mouse.mpg
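The three filters above can also be sketched in Python with plain substring checks, which avoids regex escaping altogether. The sample list is hypothetical and only imitates the paths in the outputs above; note that Python's sorted() orders by code point, which may differ from a locale-aware sort.

```python
# Hypothetical sample lines, for illustration only
lines = [
    "home/folder/house from earth[p].mp4",
    "home/folder/house from earth.mkv",
    "home/folder4/little mouse.mpg",
]

# Drop every line containing the letter "p" (what grep -v p does)
no_letter_p = [line for line in lines if "p" not in line]

# Drop only lines containing the literal text "[p]"
no_tag = [line for line in lines if "[p]" not in line]

# Remove "[p]" everywhere, then sort and deduplicate
deduped = sorted({line.replace("[p]", "") for line in lines})

print(no_letter_p)
print(no_tag)
print(deduped)
```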
If a 2-pass solution (reading test.txt twice) is acceptable, please try:
declare -A ary                         # associate the filename with the base
while IFS= read -r file; do
    if [[ $file != *"[p]"* ]]; then    # the filename does not include "[p]"
        base="${file%.*}"              # remove the extension
        ary[$base]="$file"             # create a map
    fi
done < test.txt
while IFS= read -r base; do
    echo "${ary[$base]}"               # print the full filename for the base
done < <(sed 's/\[p\]//' ./test.txt | sed 's/\.[^.]*$//' | sort | uniq -d)
Output:
Peliculas/Desperados (2020).mkv
Peliculas/Mona Lisa (1986) BR1080.mkv
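- In the 1st pass, it reads the file line by line and, for each filename without [p], builds a map that associates the base name (without extension) with the filename (with extension).
- In the 2nd pass, it replaces each output base name with the corresponding filename.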
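For readers who want the same idea outside bash, here is a hedged Python sketch of the two-pass approach. It works on a list of lines instead of reading test.txt twice, the helper names strip_ext and find_duplicates_two_pass are illustrative, and the sample is a subset of the question's test.txt. Unlike the bash version, which would echo an empty line for a key that only occurs in [p] files, it simply skips such keys.

```python
import re
from collections import Counter


def strip_ext(name):
    # Remove a trailing ".ext": a literal dot followed by non-dots to the end
    return re.sub(r"\.[^.]*$", "", name)


def find_duplicates_two_pass(lines):
    # Pass 1: map each "[p]"-free base name to its full filename
    by_base = {}
    for line in lines:
        if "[p]" not in line:
            by_base[strip_ext(line)] = line
    # Pass 2: count keys with the first "[p]" and the extension removed,
    # then translate each duplicated key back to its full filename
    counts = Counter(strip_ext(line.replace("[p]", "", 1)) for line in lines)
    return [by_base[key] for key in sorted(counts)
            if counts[key] > 1 and key in by_base]  # skip "[p]"-only groups


sample = [
    "Peliculas/Desperados (2020)[p].mp4",
    "Peliculas/La Duquesa (2008)[p].mp4",
    "Peliculas/Desperados (2020).mkv",
    "Peliculas/Mona Lisa (1986) BR1080[p].mp4",
    "Peliculas/Mona Lisa (1986) BR1080.mkv",
]
for name in find_duplicates_two_pass(sample):
    print(name)
```

With this sample it prints the Desperados and Mona Lisa .mkv lines, matching the desired output in the question.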
If you prefer a 1-pass solution (it should be faster, since the file is read only once), try:
declare -A ary      # associate the filename with the base
declare -A count    # count the occurrences of the base
while IFS= read -r file; do
    base="${file%.*}"                              # remove the extension
    if [[ $base =~ (.*)\[p\](.*) ]]; then
        # "$base" contains the substring "[p]": drop it to build the key
        key="${BASH_REMATCH[1]}${BASH_REMATCH[2]}"
        count[$key]=$(( ${count[$key]:-0} + 1 ))   # increment the counter
    else
        count[$base]=$(( ${count[$base]:-0} + 1 )) # increment the counter
        ary[$base]="$file"                         # map the filename
    fi
done < test.txt
for base in "${!ary[@]}"; do    # loop over the keys of ${ary[@]} (order is unspecified)
    if (( ${count[$base]} > 1 )); then
        # it duplicates
        echo "${ary[$base]}"
    fi
done
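A Python rendering of the one-pass idea, again only as a sketch: a collections.Counter plays the role of the count array and a plain dict the role of ary. The function name find_duplicates_one_pass is illustrative, and the sample lines are taken from the question's test.txt.

```python
from collections import Counter


def find_duplicates_one_pass(lines):
    counts = Counter()  # occurrences of each normalized base name
    by_base = {}        # base name -> full filename, "[p]"-free files only
    for line in lines:
        base = line.rsplit(".", 1)[0]  # remove the extension, like ${file%.*}
        if "[p]" in base:
            counts[base.replace("[p]", "", 1)] += 1  # count under the tag-free key
        else:
            counts[base] += 1
            by_base[base] = line
    # Keep "[p]"-free files whose key was seen more than once
    return [name for base, name in by_base.items() if counts[base] > 1]


sample = [
    "Peliculas/Desperados (2020)[p].mp4",
    "Peliculas/Desperados (2020).mkv",
    "Peliculas/La decente (1970) FlixOle WEB-DL 1080p [Buzz][p].mp4",
    "Peliculas/Mona Lisa (1986) BR1080[p].mp4",
    "Peliculas/Mona Lisa (1986) BR1080.mkv",
]
for name in find_duplicates_one_pass(sample):
    print(name)
```

Because Python dicts preserve insertion order, the result follows the order of the input, whereas bash associative arrays are iterated in an unspecified order.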
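In Python, you can use itertools.groupby with a function that generates a key consisting of the filename with any [p] removed and the extension stripped.
For any group of size 2 or more, print every filename that does not contain "[p]".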
import itertools
import re

def make_key(line):
    # drop every "[p]", then strip the extension (a literal dot and what follows)
    return re.sub(r'\.[^.]*$', '', line.replace('[p]', ''))

with open('test.txt') as f:
    lines = [line.strip() for line in f]

# groupby only groups consecutive items, so sort by the same key first
for key, group in itertools.groupby(sorted(lines, key=make_key), make_key):
    files = list(group)
    if len(files) > 1:
        for file in files:
            if '[p]' not in file:
                print(file)
This gives:
home/folder/house from earth.mkv
home/folder3/window 1.avi
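itertools.groupby only merges consecutive items with equal keys, so the input has to be sorted by the same key first; otherwise duplicates that are not adjacent in the file end up in separate groups of size 1. A small demonstration, where make_key mirrors the function in the answer with the dot escaped and the three lines come from the question's test.txt:

```python
import itertools
import re


def make_key(line):
    # Drop "[p]", then strip the extension
    return re.sub(r"\.[^.]*$", "", line.replace("[p]", ""))


lines = [
    "Peliculas/Desperados (2020)[p].mp4",
    "Peliculas/La Duquesa (2008)[p].mp4",
    "Peliculas/Desperados (2020).mkv",
]

# Unsorted: the two Desperados lines are not adjacent, so no group has size > 1
unsorted_sizes = [len(list(g)) for _, g in itertools.groupby(lines, make_key)]

# Sorted by the key: equal keys become adjacent and form one group
sorted_lines = sorted(lines, key=make_key)
sorted_sizes = [len(list(g)) for _, g in itertools.groupby(sorted_lines, make_key)]

print(unsorted_sizes)  # [1, 1, 1]
print(sorted_sizes)    # [2, 1]
```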