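I am trying to write a program that counts the number of words, lines, and sentences in a text, as well as the number of articles ("a", "and", "the"). So far I have the words, lines, and sentences working, but I don't know how to count the articles. How can the program tell the difference between "a" and "and"?
Here is my code so far.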
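import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Scanner;
import java.util.StringTokenizer;

// The class declaration was missing from the post; "TextStats" is a placeholder name.
public class TextStats
{
    public static void main(String[] args) throws FileNotFoundException, IOException
    {
        // Backslashes inside a Java string literal must be escaped.
        FileInputStream file = new FileInputStream("C:\\Users\\nlstudent\\Downloads\\text.txt");
        Scanner sfile = new Scanner(new File("C:\\Users\\nlstudent\\Downloads\\text.txt"));
        int ch, sentence = 0, words = 0, chars = 0, lines = 0;
        // First pass: count sentence terminators byte by byte.
        while ((ch = file.read()) != -1)
        {
            if (ch == '?' || ch == '!' || ch == '.')
                sentence++;
        }
        file.close();
        // Second pass: count lines, characters, and words line by line.
        while (sfile.hasNextLine()) {
            lines++;
            String line = sfile.nextLine();
            chars += line.length();
            words += new StringTokenizer(line, " ,").countTokens();
        }
        sfile.close();
        System.out.println("Number of words: " + words);
        System.out.println("Number of sentences: " + sentence);
        System.out.println("Number of lines: " + lines);
        System.out.println("Number of characters: " + chars);
    }
}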
How can the program tell the difference between "a" and "and"?
You can use a regex for this:
String input = "A and Andy then the are a";
Matcher m = Pattern.compile("(?i)\\b((a)|(an)|(and)|(the))\\b").matcher(input);
int count = 0;
while(m.find()){
count++;
}
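// count == 4
"\b" is a word boundary, "|" is OR, and "(?i)" is the ignore-case flag. Inside a Java string literal the backslash has to be doubled, so the boundary is written "\\b". The full list of supported patterns is in the java.util.regex.Pattern documentation, and it is probably worth learning regex in general.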
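The tokenizer splits each line into tokens. You can check each token (a whole word) to see whether it matches a string you are looking for. Here is an example that counts "a", "and", "for", and "the".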
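// Replaces the second while loop in the question; sfile, lines, chars, and words come from there.
int a = 0, and = 0, the = 0, forCount = 0;
while (sfile.hasNextLine()) {
    lines++;
    String line = sfile.nextLine();
    chars += line.length();
    StringTokenizer tokenizer = new StringTokenizer(line, " ,");
    words += tokenizer.countTokens();
    while (tokenizer.hasMoreTokens()) {
        // nextToken() returns a String, so no cast is needed.
        String element = tokenizer.nextToken();
        // equals() compares the whole token, so "and" never counts as "a".
        // The comparison is case-sensitive: "The" is not counted as "the".
        if ("a".equals(element)) {
            a++;
        } else if ("and".equals(element)) {
            and++;
        } else if ("for".equals(element)) {
            forCount++;
        } else if ("the".equals(element)) {
            the++;
        }
    }
}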
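One thing to watch with the delimiters " ," is that a token such as "the." or "The" will not equal "the". A possible variation, sketched here under the assumption that sentence punctuation should be ignored and case should not matter, adds more delimiters and uses equalsIgnoreCase. The class TokenArticleCount and the method countWord are hypothetical names used only for this example.

```java
import java.util.StringTokenizer;

// Hypothetical helper class, named here only for the example.
class TokenArticleCount {
    // Assumed delimiter set: space, tab, and common punctuation.
    private static final String DELIMITERS = " \t,.!?;:";

    // Counts whole-token, case-insensitive occurrences of target in one line.
    static int countWord(String line, String target) {
        StringTokenizer tokenizer = new StringTokenizer(line, DELIMITERS);
        int count = 0;
        while (tokenizer.hasMoreTokens()) {
            if (target.equalsIgnoreCase(tokenizer.nextToken())) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "The cat and the dog, and a bird.";
        System.out.println("the: " + countWord(line, "the")); // the: 2
        System.out.println("and: " + countWord(line, "and")); // and: 2
        System.out.println("a: " + countWord(line, "a"));     // a: 1
    }
}
```

Tokenizing once per target word keeps the sketch short; for a large file it would be cheaper to tokenize each line once and update all counters in the same loop, as in the answer above.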