Can't quite get a lexical analyzer to work in my Java program



I'm trying to get this Java code to lexically analyze the phrase "(sum + 47) / total" and print it out as:

Next token is: 25 Next lexeme is (
Next token is: 11 Next lexeme is sum 
Next token is: 21 Next lexeme is +
Next token is: 10 Next lexeme is 47
Next token is: 26 Next lexeme is )
Next token is: 24 Next lexeme is /
Next token is: 11 Next lexeme is total
Next token is: -1 Next lexeme is EOF

However, it comes out like this:

Next token is: 25 Next lexeme is (
Next token is: 11 Next lexeme is um 
Next token is: 21 Next lexeme is +
Next token is: 10 Next lexeme is 47
Next token is: 24 Next lexeme is /
Next token is: 11 Next lexeme is total

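I know I messed something up with the EOF not showing up, but I can't figure out why it cuts off the "s" in sum and the ")" after 47. Here is my code for reference. Please let me know if there is anything I need to change about this post, as it is my first one.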

import java.io.*;
import java.util.*;

public class Main
{
    private static final int LETTER=0;
    private static final int DIGIT=1;
    private static final int UNKNOWN=99;
    private static final int EOF=-1;
    private static final int INT_LIT=10;
    private static final int IDENT=11;
    private static final int ASSIGN_OP=20;
    private static final int ADD_OP=21;
    private static final int SUB_OP=22;
    private static final int MULT_OP=23;
    private static final int DIV_OP=24;
    private static final int LEFT_PAREN=25;
    private static final int RIGHT_PAREN=26;

    private static int charClass;
    private static char lexeme[];
    private static char nextChar;
    private static int lexLen;
    private static int token;
    private static int nextToken;
    private static File file;
    private static FileInputStream fis;

    public static int lookup(char ch)
    {
        switch (ch)
        {
            case '(':
                addChar();
                nextToken = LEFT_PAREN;
                break;
            case ')':
                addChar();
                nextToken = RIGHT_PAREN;
                break;
            case '+':
                addChar();
                nextToken = ADD_OP;
                break;
            case '-':
                addChar();
                nextToken = SUB_OP;
                break;
            case '*':
                addChar();
                nextToken = MULT_OP;
                break;
            case '/':
                addChar();
                nextToken = DIV_OP;
                break;
            default:
                addChar();
                nextToken = EOF;
                break;
        }
        return nextToken;
    }

    public static void addChar()
    {
        if (lexLen <= 98)
        {
            lexeme[lexLen++] = nextChar;
            lexeme[lexLen] = 0;
        }
        else
            System.out.println("Error - lexeme is too long\n");
    }

    public static void getChar()
    {
        try
        {
            if(fis.available()>0)
            {
                nextChar=(char)fis.read();
                if(Character.isLetter(nextChar))
                    charClass=LETTER;
                else if(Character.isDigit(nextChar))
                    charClass=DIGIT;
                else
                    charClass=UNKNOWN;
            }
            else
                charClass=EOF;
        }
        catch(IOException e)
        {
            e.printStackTrace();
        }
    }

    public static void getNonBlank()
    {
        while(Character.isSpaceChar(nextChar))
            getChar();
    }

    public static int lex()
    {
        lexLen = 0;
        getNonBlank();
        switch (charClass)
        {
            /* parse identifiers */
            case LETTER:
                addChar();
                getChar();
                while (charClass == LETTER || charClass == DIGIT)
                {
                    addChar();
                    getChar();
                }
                nextToken = IDENT;
                break;
            /* parse integer literals and integers */
            case DIGIT:
                addChar();
                getChar();
                while(charClass == DIGIT)
                {
                    addChar();
                    getChar();
                }
                nextToken = INT_LIT;
                break;
            /* parentheses and operators */
            case UNKNOWN:
                lookup(nextChar);
                getChar();
                break;
            /* EOF */
            case EOF:
                nextToken = EOF;
                break;
        } /* end of switch */
        System.out.print("Next token is :"+nextToken+" Next lexeme is :");
        for(int i=0;i<lexLen;i++)
            System.out.print(lexeme[i]);
        System.out.println();
        return nextToken;
    }

    public static void main(String args[])
    {
        lexLen=0;
        lexeme=new char[100];
        for(int i=0;i<100;i++)
            lexeme[i]='0';
        file = new File("input1.txt");
        if (!file.exists())
        {
            System.out.println( "input1.txt does not exist.");
            return;
        }
        if (!(file.isFile() && file.canRead()))
        {
            System.out.println(file.getName() + " cannot be read.");
            return;
        }
        try
        {
            fis = new FileInputStream(file);
            char current;
            while (fis.available() > 0)
            {
                getChar();
                //   System.out.println(nextChar+" "+charClass);
                lex();
            }
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
    }
}

Both the dropped-character bug and the missing-EOF bug occur in this loop:

while (fis.available() > 0)
{
    getChar();
    lex();
}

You should be able to work out the problem by executing the loop on paper with a simple input. (For example, try () followed by end of file.)
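If tracing on paper is awkward, the same experiment can be mechanized. The following is only a simplified model, not the code from the question: it reads from a String instead of a FileInputStream, treats every character as a one-character token, and the names LoopTrace and runQuestionLoop are made up for illustration. What it keeps is the shape of the driver loop (getChar, then lex) and the fact that lex reads one character of lookahead before returning.

```java
import java.util.ArrayList;
import java.util.List;

class LoopTrace {
    private static String input;
    private static int pos;
    private static char nextChar;
    private static boolean atEof;

    // Reads the next character, or records end of input if none is left.
    static void getChar() {
        if (pos < input.length()) {
            nextChar = input.charAt(pos++);
        } else {
            atEof = true;
        }
    }

    // Simplified lex(): the current character is the whole token,
    // then one more character is read as lookahead, as in the question.
    static String lex() {
        if (atEof) {
            return "EOF";
        }
        String lexeme = String.valueOf(nextChar);
        getChar();
        return lexeme;
    }

    // Same shape as the loop in the question; pos < input.length()
    // stands in for fis.available() > 0.
    static List<String> runQuestionLoop(String text) {
        input = text;
        pos = 0;
        atEof = false;
        List<String> tokens = new ArrayList<>();
        while (pos < input.length()) {
            getChar();
            tokens.add(lex());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(runQuestionLoop("()")); // prints [(]
    }
}
```

For the input () the model reports only the opening parenthesis: the closing one is read but never turned into a token, and EOF is never reported.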

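The key to both problems is the contract of lex, that is, the specification of what the world should look like before and after it executes. It consists of: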

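  • precondition (must be true when lex is called): nextChar is the next available input character and charClass is its class
  • postcondition (lex guarantees this will be true after the call): nextChar is the next available input character and charClass is its class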

Note that these are identical, which is not uncommon. A condition like this is often called an invariant.

The contract of getChar, on the other hand, is:

  • precondition: the values of nextChar and charClass are no longer needed
  • postcondition: nextChar is the next available input character and charClass is its class

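It is always a good habit to explicitly document the contract of every function you write. Doing so helps you spot problems. In particular, given the postcondition of lex and the precondition of getChar (which is called at the start of the next loop iteration), what can you say?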

If you add conditions for the end-of-file indicator to the model above, you will probably see that bug as well.
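One way to put these contracts into practice is sketched below. It is a minimal sketch under stated assumptions, not the only possible fix: it reads from a Reader (a StringReader here) rather than the question's FileInputStream, keeps nextChar as an int so that -1 can stand for end of input, skips blanks with Character.isWhitespace, and returns UNKNOWN rather than EOF for unrecognized characters. The class name InvariantLexer and the helper tokenize are invented for the example. The point to look at is the driver: the first character is read exactly once to establish the invariant, after which only lex is called, until it reports EOF.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class InvariantLexer {
    static final int LETTER = 0, DIGIT = 1, UNKNOWN = 99, EOF = -1;
    static final int INT_LIT = 10, IDENT = 11;
    static final int ADD_OP = 21, SUB_OP = 22, MULT_OP = 23, DIV_OP = 24;
    static final int LEFT_PAREN = 25, RIGHT_PAREN = 26;

    private final Reader in;
    private int nextChar;   // int, so -1 can mean "end of input"
    private int charClass;
    private final StringBuilder lexeme = new StringBuilder();

    // Establishes the invariant: nextChar/charClass describe the first character.
    InvariantLexer(Reader in) throws IOException {
        this.in = in;
        getChar();
    }

    // Pre: the old nextChar/charClass are no longer needed.
    // Post: nextChar is the next input character (or -1), charClass is its class.
    private void getChar() throws IOException {
        nextChar = in.read();
        if (nextChar == -1) charClass = EOF;
        else if (Character.isLetter(nextChar)) charClass = LETTER;
        else if (Character.isDigit(nextChar)) charClass = DIGIT;
        else charClass = UNKNOWN;
    }

    // Assumption of this sketch: unrecognized characters map to UNKNOWN.
    private static int lookup(char ch) {
        switch (ch) {
            case '(': return LEFT_PAREN;
            case ')': return RIGHT_PAREN;
            case '+': return ADD_OP;
            case '-': return SUB_OP;
            case '*': return MULT_OP;
            case '/': return DIV_OP;
            default:  return UNKNOWN;
        }
    }

    // Pre and post: nextChar is the next unconsumed character, charClass its class.
    int lex() throws IOException {
        lexeme.setLength(0);
        while (charClass != EOF && Character.isWhitespace(nextChar)) getChar();
        int token;
        switch (charClass) {
            case LETTER:
                do { lexeme.append((char) nextChar); getChar(); }
                while (charClass == LETTER || charClass == DIGIT);
                token = IDENT;
                break;
            case DIGIT:
                do { lexeme.append((char) nextChar); getChar(); }
                while (charClass == DIGIT);
                token = INT_LIT;
                break;
            case UNKNOWN:
                token = lookup((char) nextChar);
                lexeme.append((char) nextChar);
                getChar();
                break;
            default: // EOF
                token = EOF;
                lexeme.append("EOF");
        }
        return token;
    }

    // Driver: no getChar() here; lex() is called until it reports EOF.
    static List<String> tokenize(String text) {
        try {
            InvariantLexer lexer = new InvariantLexer(new StringReader(text));
            List<String> out = new ArrayList<>();
            int token;
            do {
                token = lexer.lex();
                out.add(token + " " + lexer.lexeme);
            } while (token != EOF);
            return out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        for (String line : tokenize("(sum + 47) / total")) {
            System.out.println(line);
        }
    }
}
```

Because the loop ends on the token value instead of on available(), the final EOF token is produced by lex itself, and no character is read without being handed to lex.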
