Java large file sorter



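I'm working on an assignment for my class, and I have a 7 MB file. Phase 1: I add every word in the file to an ArrayList and sort it alphabetically. I then write every 100,000 words to one file, so I end up with 12 files in total, named according to the convention shown in the code below.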

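Phase 2: for every 2 files, I read one line from each and write whichever comes first alphabetically to a new file (basically a merge), until the 2 files have been merged into 1 sorted file. I do this in a loop so that the number of files is halved on each pass, so eventually the whole 7 MB ends up sorted in a single file.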
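To make the halving concrete, here is a rough sketch of how many files should exist after each pass, using the same `n / 2 + n % 2` update as the code below. The class and method names (`PassCount`, `fileCounts`) are made up for illustration and are not part of the assignment code.

```java
import java.util.ArrayList;
import java.util.List;

public class PassCount {
    // Returns the file count before the first pass and after every pass,
    // assuming each pass merges files in pairs and carries an odd one over.
    static List<Integer> fileCounts(int initialFiles) {
        List<Integer> counts = new ArrayList<>();
        int n = initialFiles;
        counts.add(n);
        while (n > 1) {
            n = n / 2 + n % 2; // merged pairs, plus one leftover file if n is odd
            counts.add(n);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(fileCounts(12)); // prints [12, 6, 3, 2, 1]
    }
}
```

With 12 initial files this gives four passes, and the pass that starts with 3 files is the one that has to handle a leftover file with no partner.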

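My problem: in phase 2 I successfully read the phase 1 files, but it looks like my files are just being copied repeatedly into multiple files instead of being sorted and merged. I appreciate any help, thanks.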

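Files: it seems I can't upload .txt files, but the code should work for merging files with any number of lines; only the line-count variable needs to change.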

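Summary: 1 large unsorted file becomes multiple sorted files (i.e. 12); the first sort-and-merge pass turns them into 6 files, the second into 3 files, the third into 2 files, and the fourth merges them back into 1 large file.

Code: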

package Assignment11;

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class FileSorter_1
{
    public static ArrayList<String> storyline = new ArrayList<String>();
    public static int num_lines = 100000; //this number can be changed
    public static int num_files_initial;
    public static int num_files_sec;

    public static void main(String[] args) throws IOException
    {
        phase1();
        phase2();
    }

    public static void phase1() throws IOException
    {
        Scanner story = new Scanner(new File("Aesop_Shakespeare_Shelley_Twain.txt")); //file name
        int f = 0;
        while(story.hasNext())
        {
            int i = 0;
            while(story.hasNext())
            {
                String temp = story.next();
                storyline.add(temp);
                i++;
                if(i > num_lines)
                {
                    break;
                }
            }
            Collections.sort(storyline, String.CASE_INSENSITIVE_ORDER);
            BufferedWriter write2file = new BufferedWriter(new FileWriter("temp_0_" + f + ".txt")); //initialize new file
            for(int x = 0; x<num_lines;x++)
            {
                write2file.write(storyline.get(x));
                write2file.newLine();
            }
            write2file.close();
            f++;
        }
        num_files_initial = f;
    }

    public static void phase2() throws IOException
    {
        int file_n = 1;
        int prev_fn = 0;
        int t = 0;
        int g = 0;
        while(g<5)
        {
            System.out.println(num_files_initial);
            if(t+1 > num_files_initial-1)
            {
                if(num_files_initial % 2 != 0)
                {
                    BufferedWriter w = new BufferedWriter(new FileWriter("temp_"+file_n +"_" + g + ".txt"));
                    Scanner file1 = new Scanner(new File("temp_"+prev_fn +"_" + t + ".txt"));
                    String word1 = file1.next();
                    while(file1.hasNext())
                    {
                        w.write(word1);
                        w.newLine();
                    }
                    g++;
                    break;
                }
                num_files_initial = num_files_initial / 2 + num_files_initial % 2;
                g = 0;
                t = 0;
                file_n++;
                prev_fn++;
            }
            String s1="temp_"+file_n +"_" + g + ".txt";
            String s2="temp_"+prev_fn +"_" + t + ".txt";
            String s3="temp_"+prev_fn +"_" + (t+1) + ".txt";
            System.out.println(s2);
            System.out.println(s3);
            BufferedWriter w = new BufferedWriter(new FileWriter(s1));
            Scanner file1 = new Scanner(new File(s2));
            Scanner file2 = new Scanner(new File(s3));
            String word1 = file1.next();
            String word2 = file2.next();
            System.out.println(num_files_initial);
            //System.out.println(t);
            //System.out.println(g);
            while(file1.hasNext() && file2.hasNext())
            {
                if(word1.compareTo(word2) == 1) //if word 1 comes first = 1
                {
                    w.write(word1);
                    w.newLine();
                    file1.next();
                }
                if(word1.compareTo(word2) == 0) //if word 1 comes second = 0
                {
                    w.write(word2);
                    w.newLine();
                    file2.next();
                }
            }
            while(file1.hasNext())
            {
                w.write(word1);
                w.newLine();
                break;
            }
            while(file2.hasNext())
            {
                w.write(word2);
                w.newLine();
                break;
            }
            g++;
            t+=2;
            w.close();
            file1.close();
            file2.close();
        }

    }
}

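You never clear the existing sorted list after writing its data to a new file, which is why the same data keeps getting copied into each new file. Here are some fixes: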

...
int f = 0;
while(story.hasNext())
{
    // initialize the list here so each chunk starts empty
    storyline = new ArrayList<>();
    int i = 0;
    while(story.hasNext())
    {
        String temp = story.next();
        storyline.add(temp);
        i++;
        if(i > num_lines)
        {
            break;
        }
    }
    Collections.sort(storyline, String.CASE_INSENSITIVE_ORDER);
    BufferedWriter write2file = new BufferedWriter(new FileWriter("temp_0_" + f + ".txt")); //initialize new file
    // use i instead of num_lines, since the last chunk may hold fewer words
    for(int x = 0; x<i;x++)
    {
        write2file.write(storyline.get(x));
        write2file.newLine();
    }
    write2file.close();
    f++;
}
num_files_initial = f;
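
The fix above only covers phase 1. The merge loop in phase 2 has separate problems: `compareTo` returns some negative, zero, or positive number rather than exactly 1 or 0, and the results of `file1.next()` / `file2.next()` are thrown away, so `word1` and `word2` never change. Below is a minimal sketch of a two-way merge, assuming one word per line and the same case-insensitive ordering phase 1 sorts with. The class and method names (`MergeSketch`, `mergeSorted`) are mine, not from your code, so adapt them to your file naming.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.Comparator;

public class MergeSketch {
    // Must match the ordering used when the chunks were sorted in phase 1.
    static final Comparator<String> ORDER = String.CASE_INSENSITIVE_ORDER;

    // Merges two already-sorted files (one word per line) into outFile.
    static void mergeSorted(String inA, String inB, String outFile) throws IOException {
        try (BufferedReader r1 = new BufferedReader(new FileReader(inA));
             BufferedReader r2 = new BufferedReader(new FileReader(inB));
             BufferedWriter w = new BufferedWriter(new FileWriter(outFile))) {
            String w1 = r1.readLine();
            String w2 = r2.readLine();
            // While both files still have a pending word, write the smaller
            // one and advance only the file it came from.
            while (w1 != null && w2 != null) {
                if (ORDER.compare(w1, w2) <= 0) {
                    w.write(w1);
                    w.newLine();
                    w1 = r1.readLine();
                } else {
                    w.write(w2);
                    w.newLine();
                    w2 = r2.readLine();
                }
            }
            // One file is exhausted; copy whatever is left of the other.
            for (; w1 != null; w1 = r1.readLine()) {
                w.write(w1);
                w.newLine();
            }
            for (; w2 != null; w2 = r2.readLine()) {
                w.write(w2);
                w.newLine();
            }
        }
    }
}
```

In your loop this would be called as something like `mergeSorted(s2, s3, s1)`. The try-with-resources block also closes the writer on every path, which the odd-file branch in your `phase2` currently does not do.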

Hope this helps.
