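I'm trying to write a program that reads a text string and prints all the digrams in the text together with their frequencies. A digram is a sequence of two characters. The program should print the digrams sorted by frequency, in descending order.
Example input: park car at the parking lot
Corresponding output: ar:3 pa:2 rk:2 at:1 ca:1 he:1 in:1 ki:1 lo:1 ng:1 ot:1 th:1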
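Judging from the expected output, a digram is two adjacent characters inside one word. Pairs that include a space, such as "k " and " c" in "park car", are not counted, which is why ar appears 3 times (park, car, parking). Below is a minimal sketch of just that extraction step. It assumes words are separated by single spaces, and the names WordDigrams and digramsOf are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class WordDigrams {
    // Collects every pair of adjacent characters inside each space-separated word
    // (assumption: pairs never cross a space).
    static List<String> digramsOf(String text) {
        List<String> result = new ArrayList<>();
        for (String word : text.split(" ")) {
            for (int i = 0; i + 1 < word.length(); i++) {
                result.add(word.substring(i, i + 2));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // prints [pa, ar, rk, ca, ar, at, th, he, pa, ar, rk, ki, in, ng, lo, ot]
        System.out.println(digramsOf("park car at the parking lot"));
    }
}
```

Counting and sorting then only have to work on this list.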
I have this implementation, but it only works for the individual characters in the string. How would I do the same for every digram?
import java.util.Scanner;

public class Digrams {
    public static void main(String args[]) {
        int ci, i, j, k, l = 0;
        String str, str1;
        char c, ch;
        Scanner scan = new Scanner(System.in);
        System.out.print("Enter a String : ");
        str = scan.nextLine();
        i = str.length();
        for (c = 'A'; c <= 'z'; c++)
        {
            k = 0;
            for (j = 0; j < i; j++)
            {
                ch = str.charAt(j);
                if (ch == c)
                {
                    k++;
                }
            }
            if (k > 0)
            {
                System.out.println("" + c + ": " + k);
            }
        }
    }
}
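I know you have already got perfect answers, much better than this one, but I wanted to see whether I could sort the results in descending order without the Collections class. Maybe it helps, or gives you a new idea.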
import java.util.ArrayList;
import java.util.Scanner;

public class Digrams {
    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.println("Insert The Sentence");
        String[] sentence = in.nextLine().split(" "); // split the input on spaces into an array of words

        // get all digrams
        ArrayList<String> allDigrams = new ArrayList<String>(); // every digram, including repeats
        for (int i = 0; i < sentence.length; i++) { // for every word
            for (int j = 0; j < sentence[i].length() - 1; j++) { // every char that has a following char
                // append the char and the following char
                allDigrams.add("" + sentence[i].charAt(j) + sentence[i].charAt(j + 1));
            }
        }

        // isolate unique digrams and get their frequencies
        ArrayList<Integer> frequency = new ArrayList<Integer>(); // frequencies
        ArrayList<String> digrams = new ArrayList<String>();     // unique digrams, same index as frequency
        while (allDigrams.size() > 0) {
            String dig = allDigrams.get(0); // the next unique digram
            int count = 0;
            // remove every occurrence, counting as we go; counting and removing use the
            // same equals() comparison, so no digram is counted twice
            while (allDigrams.remove(dig)) {
                count++;
            }
            digrams.add(dig);
            frequency.add(count);
        }

        // bubble sort in descending order of frequency; ties are broken alphabetically
        // (compareTo on two-char strings compares the first char, then the second)
        // frequencies and digrams are swapped together so the indexes stay aligned
        for (int i = 0; i < frequency.size(); i++) {
            for (int j = 0; j < frequency.size() - i - 1; j++) {
                int f1 = frequency.get(j), f2 = frequency.get(j + 1); // unbox: never compare Integer objects with ==
                if (f1 < f2 || (f1 == f2 && digrams.get(j).compareTo(digrams.get(j + 1)) > 0)) {
                    frequency.set(j, f2);
                    frequency.set(j + 1, f1);
                    String swapS = digrams.get(j);
                    digrams.set(j, digrams.get(j + 1));
                    digrams.set(j + 1, swapS);
                }
            }
        }

        // final result
        StringBuilder sortedResult = new StringBuilder();
        for (int i = 0; i < frequency.size(); i++) {
            sortedResult.append(digrams.get(i)).append(":").append(frequency.get(i)).append(" ");
        }
        System.out.println(sortedResult.toString().trim());
    }
}
Input
park car at the parking lot
Output
ar:3 pa:2 rk:2 at:1 ca:1 he:1 in:1 ki:1 lo:1 ng:1 ot:1 th:1
One way to do this is to try every two-letter combination and count how often each one occurs. You can do that with a nested loop, for example:
import java.util.Scanner;

public class Digrams {
    public static void main(String args[]) {
        Scanner scan = new Scanner(System.in);
        System.out.print("Enter a String : ");
        String str = scan.nextLine();
        int i = str.length();
        // try every two-character combination from 'A' to 'z'
        for (char c1 = 'A'; c1 <= 'z'; c1++) {
            for (char c2 = 'A'; c2 <= 'z'; c2++) {
                String result = new String(new char[]{c1, c2});
                int k = 0;
                // compare the candidate with every two-character substring of the input
                for (int j = 0; j < i - 1; j++) {
                    String subString = str.substring(j, j + 2);
                    if (result.equals(subString)) {
                        k++;
                    }
                }
                if (k > 0) {
                    System.out.println(result + ": " + k);
                }
            }
        }
    }
}
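This also means you have to compare strings instead of characters. That in turn means using the .equals() method rather than the == operator, because strings are objects in Java.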
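To see why == fails here: substring returns a new String object, so == compares two different references even when the characters match. equals compares the contents instead. The class and helper names below are illustrative only:

```java
class StringEquality {
    // True if the two characters of str starting at index i spell digram.
    static boolean isDigramAt(String str, int i, String digram) {
        return str.substring(i, i + 2).equals(digram); // compare contents, not references
    }

    public static void main(String[] args) {
        String str = "park";
        String sub = str.substring(0, 2);          // a new String object holding "pa"
        System.out.println(sub.equals("pa"));      // true: same characters
        System.out.println(sub == "pa");           // false: two different objects
        System.out.println(str.charAt(0) == 'p');  // true: char is a primitive, == compares values
        System.out.println(isDigramAt(str, 1, "ar")); // true
    }
}
```

This is also why the character-counting version in the question could use ch == c: char is a primitive type.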
Running the code above, my result is:
ar: 3
at: 1
ca: 1
he: 1
in: 1
ki: 1
lo: 1
ng: 1
ot: 1
pa: 2
rk: 2
th: 1
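This nested loop prints the digrams in alphabetical order rather than by frequency. It also tries every pair of characters from 'A' to 'z' (58 × 58 = 3364 candidates) and rescans the whole string for each one. The sketch below is an alternative and not part of the answer above. It counts each two-character substring in a single pass using a map, and a TreeMap gives the same alphabetical order as the result shown. It assumes that any pair containing a space should be skipped. Unlike the nested loop, it also counts pairs with characters outside 'A'..'z', such as digits. SinglePassDigrams is a hypothetical name:

```java
import java.util.Map;
import java.util.TreeMap;

class SinglePassDigrams {
    // Counts every adjacent pair of characters, skipping pairs that contain a space.
    static Map<String, Integer> count(String str) {
        Map<String, Integer> counts = new TreeMap<>(); // TreeMap keeps keys in alphabetical order
        for (int j = 0; j + 1 < str.length(); j++) {
            String sub = str.substring(j, j + 2);
            if (sub.indexOf(' ') >= 0) {
                continue; // the pair crosses a word boundary
            }
            counts.merge(sub, 1, Integer::sum); // start at 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {ar=3, at=1, ca=1, he=1, in=1, ki=1, lo=1, ng=1, ot=1, pa=2, rk=2, th=1}
        System.out.println(count("park car at the parking lot"));
    }
}
```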
Here is how you can do it in one statement:
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

Map<String, Long> digramFrequencies = Arrays
        .stream(str
                .replaceAll("(?<!^| ).(?! |$)", "$0$0") // double letters
                .split(" |(?<=\\G..)"))                 // split into digrams
        .filter(s -> s.length() > 1)                     // discard short terms
        .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
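This works by:
- doubling every letter that is not at the start or end of a word, e.g. "abc defg" becomes "abbc deeffg"
- splitting into pairs, restarting the split at the start of each word
- discarding short terms (e.g. one-letter words such as "I" and "a")
- counting the frequencies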
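The sketch below shows those steps and also produces the descending order the question asks for. It prints the intermediate strings for "abc defg", then sorts the resulting map by count (descending) and by digram (ascending). Inside a Java string literal the \G anchor has to be written as "\\G". The class and method names are illustrative only:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.stream.Collectors;

class StreamDigrams {
    // Doubles every character that is neither the first nor the last of its word.
    static String doubleInner(String str) {
        return str.replaceAll("(?<!^| ).(?! |$)", "$0$0");
    }

    // Splits on spaces, and after every second character since the previous split (\G).
    static String[] splitPairs(String doubled) {
        return doubled.split(" |(?<=\\G..)");
    }

    // Counts digrams, then joins "xy:n" entries sorted by count (descending), then digram.
    static String sortedFrequencies(String str) {
        Map<String, Long> freq = Arrays.stream(splitPairs(doubleInner(str)))
                .filter(s -> s.length() > 1) // drop one-letter words and empty strings
                .collect(Collectors.groupingBy(s -> s, Collectors.counting()));
        return freq.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.comparingByKey()))
                .map(e -> e.getKey() + ":" + e.getValue())
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        System.out.println(doubleInner("abc defg"));                     // abbc deeffg
        System.out.println(Arrays.toString(splitPairs("abbc deeffg"))); // [ab, bc, de, ef, fg]
        // ar:3 pa:2 rk:2 at:1 ca:1 he:1 in:1 ki:1 lo:1 ng:1 ot:1 th:1
        System.out.println(sortedFrequencies("park car at the parking lot"));
    }
}
```

The groupingBy map on its own has no defined iteration order, so the explicit sort on the entry set is what produces the required ordering.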
This should help:
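import java.util.ArrayList;
import java.util.Collections;
import java.util.Scanner;

public class Digrams {
    public static void main(String[] args) {
        Scanner scan = new Scanner(System.in);
        System.out.print("Enter a String : ");
        String str = scan.nextLine();
        ArrayList<String> repetition = new ArrayList<String>(); // digrams already counted
        ArrayList<String> digrams = new ArrayList<String>();    // "xy:count" entries
        for (int i = 0; i < str.length() - 1; i++) {
            String digram = str.substring(i, i + 2);
            // skip digrams already counted and pairs that span a word boundary
            if (repetition.contains(digram) || digram.contains(" "))
                continue;
            // count occurrences, allowing overlaps (e.g. "aa" occurs twice in "aaa")
            int occurrences = 0;
            for (int from = str.indexOf(digram); from >= 0; from = str.indexOf(digram, from + 1))
                occurrences++;
            digrams.add(digram + ":" + occurrences);
            repetition.add(digram);
        }
        // sort by count (descending), then alphabetically by digram;
        // the count starts at index 3 because each entry is "xy:count"
        Collections.sort(digrams, (s1, s2) -> {
            int c1 = Integer.parseInt(s1.substring(3)), c2 = Integer.parseInt(s2.substring(3));
            return c1 != c2 ? Integer.compare(c2, c1) : s1.substring(0, 2).compareTo(s2.substring(0, 2));
        });
        System.out.println(digrams);
    }
}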
Let me know if you don't want to use JDK 8.
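For reference, a pre-Java 8 version of the sort step could use an anonymous Comparator instead of a lambda. This sketch assumes entries in the "digram:count" format used in the answer. It orders them by count (descending) and then by digram. The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class LegacySort {
    // Extracts the count from an entry such as "ar:3" (lastIndexOf copes with a ':' inside the digram).
    static int countOf(String entry) {
        return Integer.parseInt(entry.substring(entry.lastIndexOf(':') + 1));
    }

    // Sorts "xy:count" entries by count (descending), then by digram, without lambdas.
    static void sortEntries(List<String> entries) {
        Collections.sort(entries, new Comparator<String>() {
            @Override
            public int compare(String s1, String s2) {
                int c1 = countOf(s1);
                int c2 = countOf(s2);
                if (c1 != c2) {
                    return c1 > c2 ? -1 : 1; // higher count first
                }
                return s1.substring(0, 2).compareTo(s2.substring(0, 2));
            }
        });
    }

    public static void main(String[] args) {
        List<String> entries = new ArrayList<String>(Arrays.asList("pa:2", "ar:3", "th:1", "at:1", "rk:2"));
        sortEntries(entries);
        System.out.println(entries); // [ar:3, pa:2, rk:2, at:1, th:1]
    }
}
```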