I am trying to read a text file into a linked list and count how many duplicate words the list contains.
Here is my basic code.
public class Node{
private Node next;
private String data;
private int Dup_Counter= 0;
public Node(){
this.next = null;
this.data = data;
this.Dup_Counter = 0;
}
public String fiile_Reader() throws FileNotFoundException {
File file = new File("/Users/djhanz/IdeaProjects/datalab2/pg174.txt"); //reading a plain text file
Scanner scan = new Scanner(file);
String fileContent = ""; // initializing an empty string to hold the scanned file text
while (scan.hasNextLine()) {
fileContent = fileContent.concat(scan.nextLine() + "\n"); // scan each line and append it to the string
}
fileContent = fileContent.replaceAll("\\p{Punct}", ""); // remove all the punctuation characters
fileContent = fileContent.toLowerCase();
return fileContent;
}
public void insert() throws FileNotFoundException{
Node cursor = head;
Single_LL Linked_L = new Single_LL();
String file_content = Linked_L.fiile_Reader();
String[] splitted_File = file_content.split(" ");
for(int i=0 ; i<splitted_File.length; i++){
Linked_L.add(splitted_File[i]);
}
}
public int Word_Counter(String word){
String compare =word;
Node cursor = head;
int counter = 0;
while(cursor!=null){
if (cursor.data.equals(compare)){
counter++;
}
cursor = cursor.next;
}
return counter;
}
public void Dup_Count(){
Node cursor = head.next;
while (cursor != null){
if(head.data == cursor.data){
head.Dup_Counter++;
break;
}
cursor = cursor.next;
System.out.println(cursor.Dup_Counter);
}
head = head.next;
}
public String dup_num(){
Node cursor = head;
String rtn = "";
while (cursor!= null){
if(cursor.Dup_Counter > 20 ){
rtn += cursor.data + " -> ";
}
cursor = cursor.next;
}
return rtn;
}
public static void main(String[] args) throws FileNotFoundException {
Program1 test = new Program1();
String file_content = test.fiile_Reader();
Single_LL Linked_L = new Single_LL();
String[] splitted_File = file_content.split(" ");
int spli_len = splitted_File.length;
for(int i =0; i< spli_len; i++){
Linked_L.add(splitted_File[i]);
}
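My approach is to add a variable called Dup_Counter to the Node class. The function Dup_Count() loops through the linked list and, when it sees a duplicate, updates the node's Dup_Counter variable.
I am trying to find the words that occur more than 20 times, and dup_num() is my attempt at that: it loops through the linked list and, if a node's Dup_Counter is greater than 20, appends the word to a string and returns it. However, Dup_Count() is not actually updating the Dup_Counter value. Insertion works fine, but I can't figure out what is wrong with my Dup_Counter. Can anyone help me fix this bug?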
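One likely reason the counter never changes: `head.data == cursor.data` compares object references, not string contents, and strings produced by `split` are distinct objects even when they hold the same text. On top of that, Dup_Count() only handles the current head, stops at the first match, and then moves `head` forward, which drops the front of the list. Below is a rough sketch of what the linked-list version could look like with those points addressed. `WordList`, `countDuplicates` and `dupCounter` are illustrative names, not the classes from the question, and this is only one possible way to arrange it.

```java
// Minimal stand-in for the list in the question; all names here are assumptions.
final class WordList {
    static final class Node {
        final String data;
        Node next;
        int dupCounter = 0;

        Node(String data) {
            this.data = data;
        }
    }

    Node head;
    private Node tail;

    // Appends a word at the end of the list.
    void add(String word) {
        Node node = new Node(word);
        if (head == null) {
            head = node;
        } else {
            tail.next = node;
        }
        tail = node;
    }

    // For every node, counts how many later nodes hold an equal word.
    // Uses equals() for content comparison and never reassigns head.
    void countDuplicates() {
        for (Node current = head; current != null; current = current.next) {
            for (Node cursor = current.next; cursor != null; cursor = cursor.next) {
                if (current.data.equals(cursor.data)) {
                    current.dupCounter++;
                }
            }
        }
    }
}
```

Note that this walks the rest of the list once per node, so it is quadratic in the number of words, which gets slow on a full book.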
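I would suggest using a Map to simplify your task, as follows.
First, get all the words into a collection, Collection<String> words.
That should be easy to do with the reader code you already have.
Now, to count the occurrences of each word, we can use a Map: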
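As a rough sketch of that first step, the string returned by your reader could be turned into a word list like this. `WordSplitter` and `toWords` are hypothetical names made up for the example, and splitting on `\\s+` (any run of whitespace, including newlines) is my assumption rather than something taken from the question's code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original code.
final class WordSplitter {
    // Strips punctuation, lowercases the text, and splits on runs of whitespace.
    static List<String> toWords(String text) {
        String cleaned = text.replaceAll("\\p{Punct}", "").toLowerCase();
        return Arrays.stream(cleaned.split("\\s+"))
                .filter(word -> !word.isEmpty()) // leading whitespace yields an empty first token
                .collect(Collectors.toList());
    }
}
```

Splitting on whitespace rather than a single space keeps words on adjacent lines from being glued together by the line break.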
Map<String, Long> counter = words.stream()
.collect(Collectors.groupingBy(p -> p, Collectors.counting()));
Since you want all the words that occur more than 20 times, you can evaluate
Set<String> wordsOccuringManyTimes = counter
.entrySet().stream()
.filter(e -> e.getValue() > 20)
.map(Map.Entry::getKey)
.collect(Collectors.toSet());
If you want the total number of duplicates, you can simply evaluate
long duplicateCount = counter.values().stream().mapToLong(x -> x - 1).sum();
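Putting the pieces together, a minimal self-contained sketch might look like the following. The class `WordStats` and its method names are invented for illustration, and the threshold is passed in as a parameter instead of being hard-coded to 20.

```java
import java.util.Collection;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical wrapper around the stream snippets above.
final class WordStats {
    // Maps each distinct word to the number of times it occurs.
    static Map<String, Long> countWords(Collection<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    // Words whose count is strictly greater than the threshold.
    static Set<String> occurringMoreThan(Map<String, Long> counter, long threshold) {
        return counter.entrySet().stream()
                .filter(e -> e.getValue() > threshold)
                .map(Map.Entry::getKey)
                .collect(Collectors.toSet());
    }

    // Sum of (count - 1) over all words, i.e. every occurrence after the first.
    static long duplicateTotal(Map<String, Long> counter) {
        return counter.values().stream().mapToLong(c -> c - 1).sum();
    }
}
```

For the words a, b, a, c, a, b this gives counts of 3, 2 and 1; `occurringMoreThan(counter, 1)` returns a and b, and `duplicateTotal(counter)` returns 3.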