Recursively search a directory for each file in the directory on IBM i IFS

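I'm trying to write two (edit: shell) scripts and am having some trouble. I'll explain the purpose, then provide the scripts and the current output.

1: Recursively get a list of every file name in a directory. Then search the contents of all files in that directory for each file name. It should return the path, file name, and line number of every occurrence of a given file name.

2: Recursively get a list of every file name in a directory. Then search the contents of all files in the directory for each file name. It should return the path and file name of every file that is not found in any file in the directory.

I ultimately want to use script 2 to find and delete (actually move to another directory for archiving) unused files in a website. Then I want to use script 1 to review each match and filter out any duplicate file names.

I know I could have script 2 move each file as it runs rather than as a second step, but I want to confirm the script works correctly before it performs any action. I'll modify it once I've confirmed it behaves properly.

I'm currently testing this on an IBM i system in strqsh.

My test folder structure is: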

scriptTest
---subDir1
------file4.txt
------file5.txt
------file6.txt
---subDir2
------file1.txt
------file7.txt
------file8.txt
------file9.txt
---file1.txt
---file2.txt
---file3.txt

Some of these files contain text that includes the names of existing files.

Here is my current script 1:

#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d -exec basename {} \;`
for i in $files
do
    grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;
done

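It currently works, except that it doesn't give the path of the file that contained the match. Doesn't grep return the file path by default?

I'm a little further away with script 2: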

#!/bin/bash
files=`find /www/Test/htdocs/DLTest/scriptTest/ ! -type d`
for i in $files
do
    #split $i on '/' and store into an array
    IFS='/' read -a array <<< "$i"
    #get last element of the array 
    echo "${array[-1]}"
    #perform a grep similar to script 2 and store it into a variable
    filename="grep -rin $i "/www/Test/htdocs/DLTest/scriptTest" >> testReport.txt;"
    #Check if the variable has anything in it
    if [ $filename = "" ]   
            #if not then output $i for the full path of the current needle.
        then echo $i;
    fi
done

I can't figure out how to split the string $i into an array. I keep getting this error on line 6:

001-0059 Syntax error on line 6: token redirection not expected.

I'm going to try this on an actual Linux distribution to see whether I get different results.

Thanks in advance for any insight.

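Introduction

This isn't a complete solution, because I'm not 100% sure I understand what you're trying to do. However, the following contains pieces of a solution that you can stitch together to do what you want.

Create a test harness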

cd /tmp
mkdir -p scriptTest/subDir{1,2}
touch scriptTest/subDir1/file{4,5,6}.txt
touch scriptTest/subDir2/file{1,8}.txt
touch scriptTest/file{1,2,3}.txt

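Finding and removing duplicates

In the most general sense, you can use find's -exec flag or a Bash loop to run grep or another comparison tool over your files. However, if all you're trying to do is remove duplicates, you may be better off using the fdupes or duff utilities to identify (and optionally remove) files with duplicate content.

For example, given that all the .txt files in the test corpus are zero-length duplicates, consider the following duff and fdupes examples.

duff

Duff has more options, but won't delete files for you directly. You will likely need to use a command such as duff -e0 * | xargs -0 rm to remove duplicates. To find duplicates using the default comparison: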

$ duff -r scriptTest/
8 files in cluster 1 (0 bytes, digest da39a3ee5e6b4b0d3255bfef95601890afd80709)
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt

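fdupes

This utility can delete duplicates directly in several ways. One way is to invoke fdupes . --delete --noprompt once you're confident you're ready to proceed. However, to find the list of duplicates: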

$ fdupes -R scriptTest/
scriptTest/subDir1/file4.txt            
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt

Get a list of all files, including non-duplicates

$ find scriptTest -name '*.txt'
scriptTest/file1.txt
scriptTest/file2.txt
scriptTest/file3.txt
scriptTest/subDir1/file4.txt
scriptTest/subDir1/file5.txt
scriptTest/subDir1/file6.txt
scriptTest/subDir2/file1.txt
scriptTest/subDir2/file8.txt

You can then act on each file with find's -exec {} + feature, or simply use a grep that supports the --recursive --files-with-matches flags to find files with matching content.
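As a rough sketch of the find -exec {} + approach, the function below prints the path, line number, and text of every line that mentions a given file name. The name find_references is hypothetical, and the sketch assumes a grep that accepts the POSIX -n, -i, -F, and -e options. Passing /dev/null as an extra operand is a common trick to make grep print the file name even when it is handed a single file.

```shell
#!/bin/sh
# Hypothetical helper: print "path:line:text" for every line under DIR
# that mentions NAME (case-insensitive, literal match).
find_references() {
    dir=$1
    name=$2
    # /dev/null never matches; it only guarantees grep sees at least two
    # file operands, so it always prefixes each match with the file path.
    find "$dir" -type f -exec grep -n -i -F -e "$name" /dev/null {} +
}
```

Calling find_references scriptTest file1.txt would then list each reference to file1.txt anywhere under scriptTest.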

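Passing find results to a Bash loop as an array

Alternatively, if you're sure there are no spaces in your file names, you can use a Bash array to store the files in a variable that you can iterate over in a Bash for loop. For example: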

# The outer parentheses make this an array; word splitting breaks on whitespace
files=( $(find scriptTest -name '*.txt') )
for file in "${files[@]}"; do
  : # do something with each "$file"
done

Loops like this are generally slower, but may give you the extra flexibility you need if you're doing something complex. YMMV.
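If the shell has no arrays or here-strings (the redirection error in the question suggests the `<<<` on line 6 is what was rejected), a plain while read loop may be a workable substitute. The following is only a sketch under that assumption: list_unreferenced is a hypothetical name, file names are assumed to contain no newlines, and the base name is taken with the POSIX ${path##*/} expansion instead of splitting on '/'.

```shell
#!/bin/sh
# Hypothetical helper: list every file under DIR whose base name is not
# mentioned inside any file under DIR.
list_unreferenced() {
    dir=$1
    find "$dir" -type f | while IFS= read -r path; do
        name=${path##*/}    # strip everything up to the last '/'
        # grep -l prints the names of files that contain a match;
        # an empty result means nothing refers to this file name.
        if [ -z "$(find "$dir" -type f -exec grep -l -i -F -e "$name" {} +)" ]; then
            printf '%s\n' "$path"
        fi
    done
}
```

This rescans the whole tree once per file, so it is slow on large directories, but it only prints candidates and moves nothing, which matches the goal of verifying the output before archiving.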
