My word frequency counter returns random words



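I am writing a program that takes a text file as input and prints the 10 most frequently used words along with how many times each one is used. However, it currently prints 10 random words in no particular order. Am I missing something?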

    public void insert(E word) {
    if (word.equals("")) {
        return;
    }
    //Adds 2 temporary nodes, and sets first to the first one if first is empty
    Node temp = new Node(word);
    Node temp2;
    if (first == null) {
        first = temp;
    } else{
    for (Node temp6 = first; temp6 != null; temp6 = temp6.next) {
        if (temp6.key.equals(temp.key)) {
            temp6.count++;
            temp2 = temp6;
            Node parent = first;
            Node parent2 = first;
            while (parent != null) {
                if (parent.key.equals(word)) {
                    if (parent == first) {
                        first = first.next;
                    } else {
                        parent2.next = parent.next;
                    }
                }
                parent2 = parent;
                parent = parent.next;
            }
            //replaces first with temp2 if temp2's count is higher than first's
            if (temp2.count > first.count) {
                Node temp3 = first;
                first = temp2;
                first.next = temp3;
            } 
            //Adds 1 to the counter if the word is already in the linkedlist. Moves the node to the correct place and deletes the original node.
            for (Node temp4 = first.next; temp4 != null; temp4 = temp4.next){
                if(temp4.next.count < first.count){
                    Node temp5 = temp4.next;
                    temp4.next = temp2;
                    temp2.next = temp5;
                    break;
                }
            }
            return;
            }
        }
        current.next = temp;
    }
    current = temp;
}

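At first glance, your solution looks more complicated than it needs to be. That may be because what the Node class does requires a more involved approach. However, I would suggest using a Set. You could create a POJO called Word that holds a String word and an Integer count. If the POJO implements Comparable, you can @Override compareTo(Word w) and sort by count. Since a Set does not allow duplicates, for each word you read you can either create a new Word or simply increment the count of the existing Word. Once the whole file has been read, just print the first 10 objects in the list. Here is an example to illustrate my point.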

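class Word implements Comparable<Word> {
    String word;
    Integer count;

    Word(String w, Integer c) {
        this.word = w;
        this.count = c;
    }

    @Override
    public String toString() {
        return word + " appeared " + count + " times.";
    }

    // Orders by count, highest first. Ties are broken alphabetically so that
    // two different words with the same count never compare as equal.
    @Override
    public int compareTo(Word w) {
        int byCount = Integer.compare(w.count, this.count);
        return byCount != 0 ? byCount : this.word.compareTo(w.word);
    }
}

public class TestTreeMap {
    public static void main(String[] args) {
        //Add logic here for reading in from file and ...
    }
}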
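As a rough illustration of the counting and sorting steps described above, here is a minimal sketch. It is not the asker's linked list and not a definitive implementation: the class name `TopWords`, the method `topWords`, and the `word=count` output format are all made up for this example, and it counts words from an in-memory string rather than a file to stay self-contained.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the original answer.
class TopWords {
    // Returns up to `limit` strings of the form "word=count", most frequent first.
    static List<String> topWords(String text, int limit) {
        // Step 1: count every whitespace-separated token.
        Map<String, Integer> counts = new HashMap<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace produces one empty token
            }
            counts.merge(token, 1, Integer::sum);
        }

        // Step 2: sort by count descending, then alphabetically for ties.
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        // Step 3: keep only the first `limit` entries.
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            result.add(entries.get(i).getKey() + "=" + entries.get(i).getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String line : topWords("the cat and the dog and the bird", 10)) {
            System.out.println(line);
        }
    }
}
```

Counting first and sorting once at the end avoids re-ordering the collection on every insertion, which is where the linked-list version in the question gets complicated.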

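Anyway, I hope this answer points you in the right direction. As an aside, I tend to look for the simplest solution, because the cleverer we get, the harder our code is to maintain. Good luck!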

Here is how we can do it using collections:

import java.util.Collections;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class WordCount {
    public static void main(String[] args) {
        //this should change. Used to keep it simple
        String sentence = "Returns a key value mapping associated with the least key greater than or   equal to the given key";
        //split on runs of whitespace so repeated spaces do not yield empty words
        String[] array = sentence.split("\\s+");
        //to store the word and their count as we read them from the file
        SortedMap<String, Integer> ht = new TreeMap<String, Integer>();
        for (String s : array) {
            if (ht.containsKey(s)) {
                int count = ht.get(s);
                ht.put(s, count + 1);
            } else {
                ht.put(s, 1);
            }
        }
        //impose reverse of the natural ordering on this map
        SortedMap<Integer, String> ht1 = new TreeMap<Integer, String>(Collections.reverseOrder());
        for (Map.Entry<String, Integer> entrySet : ht.entrySet()) {
            //setting the values as key in this map
            ht1.put(entrySet.getValue(), entrySet.getKey());
        }
        int firstTen = 0;
        for (Map.Entry<Integer, String> entrySet : ht1.entrySet()) {
            if (firstTen == 10)
                break;
            System.out.println("Word-" + entrySet.getValue() + " number of times-" + entrySet.getKey());
            firstTen++;
        }
    }
}

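There is a problem here: because the count is used as the key of the second map, if two words have the same frequency, only one of them appears in the output.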
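The collision can be seen in isolation with a small sketch. The class name `DuplicateCountDemo` and the sample entries are made up for this illustration; only the count-keyed `TreeMap` with `Collections.reverseOrder()` comes from the code above.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical demo class, not part of the original answer.
class DuplicateCountDemo {
    // Builds a count-keyed map the same way the second loop above does.
    static SortedMap<Integer, String> byCount() {
        SortedMap<Integer, String> byCount = new TreeMap<>(Collections.reverseOrder());
        byCount.put(3, "key");
        byCount.put(1, "Returns");
        byCount.put(1, "mapping"); // same key 1, so this replaces "Returns"
        return byCount;
    }

    public static void main(String[] args) {
        System.out.println(byCount()); // prints {3=key, 1=mapping}
    }
}
```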

So I ended up modifying it again to:

import java.util.HashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class WordCount1 {
    public static void main(String... arg) {
        String sentence = "Returns a key value mapping mapping the mapping key the than or equal to the or key";
        String[] array = sentence.split("\\s+");
        //plain map that holds the counts
        Map<String, Integer> hm = new HashMap<String, Integer>();
        //sorted map whose keys (the words) are ordered by their counts in hm
        ValueComparator vc = new ValueComparator(hm);
        SortedMap<String, Integer> ht = new TreeMap<String, Integer>(vc);
        for (String s : array) {
            if (hm.containsKey(s)) {
                int count = hm.get(s);
                hm.put(s, count + 1);
            } else {
                hm.put(s, 1);
            }
        }
        //copy only after counting is finished, since the comparator reads hm
        ht.putAll(hm);
        int firstTen = 0;
        for (Map.Entry<String, Integer> entrySet : ht.entrySet()) {
            if (firstTen == 10)
                break;
            System.out.println("Word-" + entrySet.getKey() + " number of times-" + entrySet.getValue());
            firstTen++;
        }
    }
}

And here is the ValueComparator, tweaked a little:

import java.util.Comparator;
import java.util.Map;

public class ValueComparator implements Comparator<String> {
    Map<String, Integer> entry;

    public ValueComparator(Map<String, Integer> entry) {
        this.entry = entry;
    }

    @Override
    public int compare(String a, String b) {
        // Higher counts sort first. Ties fall back to comparing the words, so the
        // result is 0 only for the same key and the TreeMap keeps every word.
        int byCount = entry.get(b).compareTo(entry.get(a));
        return byCount != 0 ? byCount : a.compareTo(b);
    }
}

This program is case-sensitive. If you need case-insensitive behavior, just convert each string to lowercase before putting it into the Map.
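A minimal sketch of that lowercase step, assuming whitespace-separated input as in the examples above. The class name `CaseInsensitiveCount` and the method `count` are made up for this illustration, and `Locale.ROOT` is one reasonable choice to keep the conversion independent of the machine's default locale.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of the original answer.
class CaseInsensitiveCount {
    // Counts words after folding them to lowercase, so "The" and "the" share one entry.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace produces one empty token
            }
            counts.merge(token.toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(count("The the THE key Key")); // prints {key=2, the=3}
    }
}
```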
