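I've read many posts on StackExchange but can't find exactly what I need. Note: this is not just about removing duplicates. I need to go through File1.csv and create a new file, Results.csv, containing every line that does not contain any of the lines in File2.txt.
File1.csv contains personal details and an email address, one record per line: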
"mr","Happy","Man","mrhappy@example.com"
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"
File2.txt contains email addresses, one per line:
mrhappy@example.com
mrsomeoneelse@example.com
mrsomeoneelse2@example.com
Expected result: Results.csv should contain:
"mr","Sad","Man","mrsad@example.com"
"mr","Grumpy","Man","mrgrumpy@example.com"
"mr","Strong","Man","mrstrong@example.com"
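Confusingly, my code works fine when File2.txt contains a single line. But when it contains multiple lines, Results.csv contains all the lines from File1.csv (including the ones that should have been removed), each repeated as many times as there are lines in File2.txt.
My code: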
<?php
$to_be_searched = "File1.csv";
$items_to_catch = file("File2.txt");
// create empty array to store lines we want to keep - i.e. lines that dont contain emails we're checking for
$good_lines = array();
// open $to_be_searched
$handle = fopen($to_be_searched, "r");
if ($handle) {
    // go line by line until end of file
    while (($line = fgets($handle)) !== false) {
        // check if line contains any items from $items_to_catch
        foreach($items_to_catch as $key => $value) {
            if(strpos($line, $value) === false) {
                // email wasn't found on the line so we want this line in the results file, therefore add to $good_lines array
                $good_lines[] = $line;
            }
        }
    }
    fclose($handle);
} else {
    echo "Couldn't open " . $to_be_searched;
    exit();
}
// write $array_of_good_lines into new file
$new_file = "Results.csv";
foreach($good_lines as $key => $value) {
    file_put_contents($new_file, $value, FILE_APPEND | LOCK_EX);
}
?>
What am I doing wrong?
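It currently doesn't work because, inside the foreach, you add the line to $good_lines once for every item that is not found in it, so the same line can be added several times.
To fix this, add a flag variable inside the loop.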
while (($line = fgets($handle)) !== false) {
    // Declare our flag variable as false by default
    $found = false;
    // Loop through each item to see if the email has been found
    foreach($items_to_catch as $key => $value) {
        // If the email was found, stop looping in the second file
        if(strpos($line, $value) !== false){
            $found = true;
            break;
        }
    }
    // If the email was not found in the second file, add it to the good_lines array
    if(!$found)
        $good_lines[] = $line;
}
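To see why the original loop misbehaves, here is a small in-memory reproduction. It is only a sketch: the two arrays stand in for the files, and the function names collectWithoutFlag and collectWithFlag are hypothetical, not part of the question's code.

```php
<?php
// Hypothetical stand-ins for the contents of File1.csv and File2.txt.
$lines = [
    '"mr","Happy","Man","mrhappy@example.com"',
    '"mr","Sad","Man","mrsad@example.com"',
];
$emails = ['mrhappy@example.com', 'mrsomeoneelse@example.com', 'mrsomeoneelse2@example.com'];

// Original logic: the line is appended once for every email it does NOT contain.
function collectWithoutFlag(array $lines, array $emails): array
{
    $good = [];
    foreach ($lines as $line) {
        foreach ($emails as $email) {
            if (strpos($line, $email) === false) {
                $good[] = $line;
            }
        }
    }
    return $good;
}

// Flag logic: the line is appended at most once, and only if no email matched.
function collectWithFlag(array $lines, array $emails): array
{
    $good = [];
    foreach ($lines as $line) {
        $found = false;
        foreach ($emails as $email) {
            if (strpos($line, $email) !== false) {
                $found = true;
                break;
            }
        }
        if (!$found) {
            $good[] = $line;
        }
    }
    return $good;
}

$withoutFlag = collectWithoutFlag($lines, $emails);
$withFlag = collectWithFlag($lines, $emails);

echo count($withoutFlag), "\n"; // 5: the "Happy" line twice, the "Sad" line three times
echo count($withFlag), "\n";    // 1: only the "Sad" line
```

In this sketch the emails carry no trailing newline, so the "Happy" line is still matched once by the original logic; it nevertheless ends up in the output twice because of the two emails it does not contain.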
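Update
Besides the loop, there is another problem in how File2.txt is read: file() keeps the newline at the end of each string, so the later strpos comparison never matches. The fix: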
$items_to_catch = file("File2.txt", FILE_IGNORE_NEW_LINES);
Here is the var_dump of $items_to_catch without the flag:
array (size=3)
0 => string 'mrhappy@example.com
' (length=20)
1 => string 'mrsomeoneelse@example.com
' (length=26)
2 => string 'mrsomeoneelse2@example.com
' (length=27)
And here is the var_dump of $items_to_catch with the flag:
array (size=3)
0 => string 'mrhappy@example.com' (length=19)
1 => string 'mrsomeoneelse@example.com' (length=25)
2 => string 'mrsomeoneelse2@example.com' (length=26)
Note the extra character on each email in the first dump: that is the newline.
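The effect of the trailing newline on strpos can be checked in isolation. The following is a minimal sketch that writes a throwaway file with tempnam and reads it back both ways; the temporary file and the variable names are assumptions made for illustration.

```php
<?php
// Write a throwaway file with one email per line (stand-in for File2.txt).
$path = tempnam(sys_get_temp_dir(), 'emails');
file_put_contents($path, "mrhappy@example.com\nmrsomeoneelse@example.com\n");

// A line as fgets() would return it from File1.csv.
$line = '"mr","Happy","Man","mrhappy@example.com"' . "\n";

// Without the flag each element keeps its newline, so the needle is the email
// followed by "\n". In $line the email is followed by a closing quote instead,
// so the needle is never found.
$withNewlines = file($path);
$matchWithNewlines = strpos($line, $withNewlines[0]) !== false;

// With FILE_IGNORE_NEW_LINES the needle is the bare email and it matches.
$withoutNewlines = file($path, FILE_IGNORE_NEW_LINES);
$matchWithoutNewlines = strpos($line, $withoutNewlines[0]) !== false;

unlink($path);

var_dump($matchWithNewlines);    // bool(false)
var_dump($matchWithoutNewlines); // bool(true)
```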
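file() returns each line of the file, including the trailing line terminator. If you inspect $items_to_catch with Symfony's VarDumper component, you will see that it looks like this: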
array:3 [
  0 => "mrhappy@example.com\n"
  1 => "mrsomeoneelse@example.com\n"
  2 => "mrsomeoneelse2@example.com\n"
]
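This is not what you want, because your later comparison does not account for the line terminator. By the way, Symfony's VarDumper component is far better than print_r and var_dump; I strongly recommend adding it to your project.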
So trim off the trailing line terminator with
$items_to_catch = array_map('trim', file('File2.txt'));
A minimal working example:
$excludedLinesWithTheseEmails = array_map('trim', file('File2.txt'));
$out = fopen('Results.csv', 'w') or die('Cannot open Results.csv');
$in = fopen('File1.csv', 'r') or die('Cannot open File1.csv');
while (false !== ($row = fgetcsv($in))) {
    if (! in_array($row[3], $excludedLinesWithTheseEmails)) {
        fputcsv($out, $row);
    }
}
fclose($out);
fclose($in);
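One caveat worth sketching: fgetcsv returns an array containing a single null for a blank line, so $row[3] may not exist on every row. The variant below is only an illustration that skips such rows and uses a strict in_array comparison. The function name filterRowsByEmail and the php://memory streams are assumptions, chosen so the example runs without touching real files.

```php
<?php
// Hypothetical helper: copy rows from $in to $out unless the 4th column is an excluded email.
function filterRowsByEmail($in, $out, array $excludedEmails): int
{
    $written = 0;
    // Delimiter, enclosure and escape are passed explicitly rather than relying on defaults.
    while (false !== ($row = fgetcsv($in, 0, ',', '"', '\\'))) {
        // Blank lines come back as [null]; skip anything without an email column.
        if (!isset($row[3])) {
            continue;
        }
        if (!in_array($row[3], $excludedEmails, true)) {
            fputcsv($out, $row, ',', '"', '\\');
            $written++;
        }
    }
    return $written;
}

// In-memory stand-in for File1.csv, including one blank line.
$in = fopen('php://memory', 'w+');
fwrite($in, "\"mr\",\"Happy\",\"Man\",\"mrhappy@example.com\"\n");
fwrite($in, "\n");
fwrite($in, "\"mr\",\"Sad\",\"Man\",\"mrsad@example.com\"\n");
rewind($in);

// In-memory stand-in for Results.csv.
$out = fopen('php://memory', 'w+');
$written = filterRowsByEmail($in, $out, ['mrhappy@example.com']);

rewind($out);
$result = stream_get_contents($out);
fclose($in);
fclose($out);

echo $written, "\n"; // 1
echo $result;
```

Keep in mind that fputcsv only encloses fields that need it, so the rows it writes are not necessarily quoted the same way as the input rows.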