Print all digrams in a string along with their frequencies



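I'm trying to write a program that reads a string of text and prints every digram in that text along with its frequency. A digram is a sequence of two characters. The program prints the digrams sorted by frequency, in descending order.

Sample input: park car at the parking lot

Corresponding output: ar:3 pa:2 rk:2 at:1 ca:1 he:1 in:1 ki:1 lo:1 ng:1 ot:1 th:1

I have this implementation, but it only counts individual characters in the string. How would I implement this for each digram?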

import java.util.Scanner;
public class Digrams {
  public static void main(String args[]) {
    int ci, i, j, k, l=0;
    String str, str1;
    char c, ch;
    Scanner scan = new Scanner(System.in);
    System.out.print("Enter a String : ");
    str=scan.nextLine();
    i=str.length();
    for(c='A'; c<='z'; c++)
    {
        k=0;
        for(j=0; j<i; j++)
        {
            ch = str.charAt(j);
            if(ch == c)
            {
                k++;
            }
        }
        if(k>0)
        {
            System.out.println("" +c +": " +k);
        }
    }
  }
}

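I know you've already got a perfectly good answer, much better than this one, but I wanted to see whether I could sort the results in descending order without the Collections class. It may help, or give you a new idea.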

import java.util.ArrayList;
import java.util.Scanner;

public class Digrams{
    public static void main(String[] args){
        Scanner in = new Scanner(System.in);
        System.out.println("Insert The Sentence");
        String []sentence =  in.nextLine().split(" "); // split the input according to the spaces and put them in array
        //get all digrams
        ArrayList<String> allDigrams = new ArrayList<String>(); // ArrayList to contain all possible digrams
        for(int i=0; i<sentence.length; i++){ // do that for every word     
            for(int j=0; j<sentence[i].length(); j++){ // cycle through each char at each index in the sentence array
                String oneDigram= "";
                if(j<sentence[i].length()-1){
                    oneDigram += sentence[i].charAt(j); // append the char and the following char
                    oneDigram += sentence[i].charAt(j+1);
                    allDigrams.add(oneDigram); // add the one digram to the ArrayList
                }
            }
        }
        // isolate digrams and get corresponding frequencies
        ArrayList<Integer> frequency = new ArrayList<Integer>(); // for frequencies
        ArrayList<String>  digrams = new ArrayList<String>(); //for digrams
        int freqIndex=0;
        while(allDigrams.size()>0){ 
            frequency.add(freqIndex,0);
            for(int j=0; j<allDigrams.size(); j++){ // compare each UNIQUE digram with the rest of the digrams to find repetition
                if(allDigrams.get(0).equals(allDigrams.get(j))){ // case-sensitive, to match the remove(dig) call below
                    frequency.set(freqIndex, frequency.get(freqIndex)+1); // increment frequency    
                }
            }
            String dig = allDigrams.get(0); // record the digram temporarily
            while(allDigrams.contains(dig)){ // now remove all repetition from the allDigrams ArrayList
                allDigrams.remove(dig);
            }
            digrams.add(dig); // add the UNIQUE digram
            freqIndex++; // move to next index for the following digram 
        }

        // sort result in descending order
        // compare the frequency , if equal -> the first char of digram, if equal -> the second char of digram
        // and move frequencies and digrams at every index in each ArrayList accordingly
        for (int i = 0 ; i < frequency.size(); i++){
            for (int j = 0 ; j < frequency.size() - i - 1; j++){
                int f1 = frequency.get(j), f2 = frequency.get(j+1); // unbox so that == compares values, not Integer references
                if (f1 < f2 || (f1 == f2 && digrams.get(j).compareTo(digrams.get(j+1)) > 0)){ // alphabetical order only breaks ties
                    int swap  = frequency.get(j);
                    String swapS = digrams.get(j);
                    frequency.set(j, frequency.get(j+1));
                    frequency.set(j+1, swap);
                    digrams.set(j, digrams.get(j+1));
                    digrams.set(j+1, swapS);
                }
            }
        }

         //final result
         String sortedResult="";
         for(int i=0; i<frequency.size(); i++){
             sortedResult+=digrams.get(i) + ":" + frequency.get(i) + " ";
         }
         System.out.println(sortedResult);
    }
}

Input

park car at the parking lot

Output

ar:3 pa:2 rk:2 at:1 ca:1 he:1 in:1 ki:1 lo:1 ng:1 ot:1 th:1
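
For comparison, the same ordering (higher frequency first, ties broken alphabetically) can also be reached with a map and a comparator. This is only a sketch and not part of the answer above: the class name `DigramCounter` and the methods `count` and `format` are made up for illustration, and it assumes that words are separated by spaces and that case is significant.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DigramCounter {
    // Counts every pair of adjacent characters inside each space-separated word.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.split(" ")) {
            for (int i = 0; i + 1 < word.length(); i++) {
                counts.merge(word.substring(i, i + 2), 1, Integer::sum);
            }
        }
        return counts;
    }

    // Formats the counts as "xy:n" entries, highest frequency first, ties in alphabetical order.
    static String format(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue()); // descending frequency
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : entries) {
            if (out.length() > 0) out.append(' ');
            out.append(e.getKey()).append(':').append(e.getValue());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(format(count("park car at the parking lot")));
    }
}
```

Counting in a map takes one pass over the input, whereas the list-based version above rescans the remaining digrams for every unique digram.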

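The way to do this is to go through every two-letter combination and count how often each one occurs in the input. You can do that with a nested loop, for example: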

public static void main(String args[]) {
    int ci, i, j, k, l=0;
    String str, str1, result, subString;
    char c1, c2, ch;
    Scanner scan = new Scanner(System.in);
    System.out.print("Enter a String : ");
    str=scan.nextLine();
    i=str.length();
    for(c1='A'; c1<='z'; c1++)
    {
        for(c2='A'; c2<='z'; c2++) {
            result = new String(new char[]{c1, c2});
            k = 0;
            for (j = 0; j < i-1; j++) {
                subString = str.substring(j, j+2);
                if (result.equals(subString)) {
                    k++;
                }
            }
            if (k > 0) {
                System.out.println("" + result + ": " + k);
            }
        }
    }
}

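This also means that you have to compare strings rather than characters. That in turn means using the .equals() method instead of the == operator, because strings are objects in Java.

The result for me is: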

ar: 3 at: 1 ca: 1 he: 1 in: 1 ki: 1 lo: 1 ng: 1 ot: 1 pa: 2 rk: 2 th: 1
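
To illustrate the point about .equals() versus ==, here is a small sketch. The class `StringCompareDemo` and the helper `matchesAt` are hypothetical names used only for this example.

```java
class StringCompareDemo {
    // Returns true when the two-character window starting at index i has the same content as candidate.
    static boolean matchesAt(String text, int i, String candidate) {
        String window = text.substring(i, i + 2); // a String object created at run time
        return window.equals(candidate);          // compares the characters, not the references
    }

    public static void main(String[] args) {
        String window = "park".substring(1, 3);
        // == asks whether both sides are the same object, which is not the question we care about.
        System.out.println("==      : " + (window == "ar"));
        // equals() asks whether both strings contain the same characters.
        System.out.println("equals(): " + window.equals("ar"));
    }
}
```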

Here is how you can do it in a single statement:

Map<String, Long> digramFrequencies = Arrays
    .stream(str
        .replaceAll("(?<!^| ).(?! |$)", "$0$0") // double letters
        .split(" |(?<=\\G..)")) // split into digrams 
    .filter(s -> s.length() > 1) // discard short terms
    .collect(Collectors.groupingBy(s -> s, Collectors.counting()));

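See the live demo.

This works by:

  • doubling every letter that is not at the start or end of a word, e.g. "abc defg" becomes "abbc deeffg"
  • splitting into pairs, restarting the split at the start of each word
  • discarding short terms (e.g. words such as "I" and "a")
  • counting the frequencies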
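
The steps above can be tried out in isolation with a sketch like the following. The class name `RegexDigrams` and the method `count` are assumptions made for this example, and the result is collected into a `TreeMap` (a choice made here, not in the original statement) so that the keys print in alphabetical order.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class RegexDigrams {
    // Same pipeline as the single statement, split into named steps.
    static Map<String, Long> count(String str) {
        // Step 1: double every character that is neither first nor last in a word.
        String doubled = str.replaceAll("(?<!^| ).(?! |$)", "$0$0");
        // Step 2: cut at spaces and after every pair of characters.
        String[] pieces = doubled.split(" |(?<=\\G..)");
        return Arrays.stream(pieces)
                .filter(s -> s.length() > 1) // Step 3: drop empty and one-letter pieces
                .collect(Collectors.groupingBy(s -> s, TreeMap::new, Collectors.counting())); // Step 4
    }

    public static void main(String[] args) {
        System.out.println(count("park car at the parking lot"));
    }
}
```

Note that the resulting map is not ordered by frequency, so a separate sorting step is still needed to get the output the question asks for.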

This should help:

    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.print("Enter a String : ");
        String str = scan.nextLine();
        ArrayList<String> repetition = new ArrayList<String>();
        ArrayList<String> digrams = new ArrayList<String>();
        String digram;
        for(int i = 0; i < str.length() - 1; i++) {
            digram = str.substring(i, i + 2);
            if(repetition.contains(digram) || digram.contains(" ") || digram.length() < 2)
                continue;
            int occurances = 0;
            for(int from = str.indexOf(digram); from >= 0; from = str.indexOf(digram, from + 1))
                occurances++; // count every occurrence, including overlapping ones such as "aa" in "aaa"
            digrams.add(digram + ":" + occurances);
            repetition.add(digram);
        }
        Collections.sort(digrams, (s1, s2) -> Integer.compare(Integer.parseInt(s2.substring(3)), Integer.parseInt(s1.substring(3)))); // highest count first
        System.out.println(digrams);
    }

Let me know if you don't want to use JDK 8.
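
As a rough idea of what a version without JDK 8 lambdas could look like, the sort step can be written with an anonymous `Comparator`. This is a sketch under assumptions: the class `DigramSortLegacy` and the method `sortByCountDescending` are hypothetical names, and every entry is assumed to have the form "xy:n" with a two-character digram.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class DigramSortLegacy {
    // Sorts "xy:n" entries by n, highest first, without lambda syntax.
    static void sortByCountDescending(List<String> entries) {
        Collections.sort(entries, new Comparator<String>() {
            @Override
            public int compare(String s1, String s2) {
                int c1 = Integer.parseInt(s1.substring(3)); // the number after "xy:"
                int c2 = Integer.parseInt(s2.substring(3));
                return Integer.compare(c2, c1);             // reversed arguments give descending order
            }
        });
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<String>(Arrays.asList("at:1", "ar:3", "pa:2"));
        sortByCountDescending(entries);
        System.out.println(entries);
    }
}
```

Because `Collections.sort` is stable, entries with equal counts keep the order in which they were added.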
