Error in my parser: I think my input file path is wrong, but I'm not sure where



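So basically this is a parser / cosine similarity matrix calculator, but I keep getting an error when I run it. I think I have the correct input path for reading the text file, but it still fails.

This is my main class: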

    import java.io.FileNotFoundException;
    import java.io.IOException;
    public class TfIdfMain {
        public static void main(String args[]) throws FileNotFoundException, IOException {
            DocumentParser dp = new DocumentParser();
            dp.parseFiles("C:/Users/dachen/Documents/doc1.txt"); // give the location of source file
            dp.tfIdfCalculator(); // calculates tfidf
            dp.getCosineSimilarity(); // calculates cosine similarity
        }
    }

My parser class:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
public class DocumentParser {
    //This variable will hold all terms of each document in an array.
    private List<String[]> termsDocsArray = new ArrayList<String[]>();
    private List<String> allTerms = new ArrayList<String>(); //to hold all terms
    private List<double[]> tfidfDocsVector = new ArrayList<double[]>();
    /**
     * Method to read files and store in array.
     */
    public void parseFiles(String filePath) throws FileNotFoundException, IOException {
        File[] allfiles = new File(filePath).listFiles();
        BufferedReader in = null;
        for (File f : allfiles) {
            if (f.getName().endsWith(".txt")) {
                in = new BufferedReader(new FileReader(f));
                StringBuilder sb = new StringBuilder();
                String s = null;
                while ((s = in.readLine()) != null) {
                    sb.append(s);
                }
                String[] tokenizedTerms = sb.toString().replaceAll("[\\W&&[^\\s]]", "").split("\\W+");   //to get individual terms
                for (String term : tokenizedTerms) {
                    if (!allTerms.contains(term)) {  //avoid duplicate entry
                        allTerms.add(term);
                    }
                }
                termsDocsArray.add(tokenizedTerms);
            }
        }
    }
    /**
     * Method to create termVector according to its tfidf score.
     */
    public void tfIdfCalculator() {
        double tf; //term frequency
        double idf; //inverse document frequency
        double tfidf; //term frequency inverse document frequency
        for (String[] docTermsArray : termsDocsArray) {
            double[] tfidfvectors = new double[allTerms.size()];
            int count = 0;
            for (String terms : allTerms) {
                tf = new TfIdf().tfCalculator(docTermsArray, terms);
                idf = new TfIdf().idfCalculator(termsDocsArray, terms);
                tfidf = tf * idf;
                tfidfvectors[count] = tfidf;
                count++;
            }
            tfidfDocsVector.add(tfidfvectors);  //storing document vectors;            
        }
    }
    /**
     * Method to calculate cosine similarity between all the documents.
     */
    public void getCosineSimilarity() {
        for (int i = 0; i < tfidfDocsVector.size(); i++) {
            for (int j = 0; j < tfidfDocsVector.size(); j++) {
                System.out.println("between " + i + " and " + j + "  =  "
                                   + new CosineSimilarity().cosineSimilarity
                                       (
                                         tfidfDocsVector.get(i), 
                                         tfidfDocsVector.get(j)
                                       )
                                  );
            }
        }
    }
}

This is my error:

Exception in thread "main" java.lang.NullPointerException
    at DocumentParser.parseFiles(DocumentParser.java:22)
    at TfIdfMain.main(TfIdfMain.java:7)

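Is the path to the text file in my Documents folder wrong?

Windows file paths conventionally use \ rather than /, although Java's File class accepts either on Windows, so the separator is not what causes the exception. If you do use backslashes, each one must be escaped as \\ inside a Java string literal. The real error is that the code does not expect the full path to a file, only the path to the directory. So instead of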

dp.parseFiles("C:/Users/dachen/Documents/doc1.txt");

it should be

dp.parseFiles("C:\\Users\\dachen\\Documents");

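The documentation for listFiles() states:

If this abstract pathname does not denote a directory, then this method returns null.

The path you are passing is not a directory, so listFiles() returns null and the for-each loop over allfiles throws the NullPointerException.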
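
To make this failure easier to diagnose, the null result can be checked before looping. Below is a minimal sketch, not the original parser: the class name TxtFileLister and the method listTxtFiles are hypothetical names introduced only for illustration, and throwing IllegalArgumentException is just one possible way to report the problem.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the original code.
class TxtFileLister {
    /**
     * Returns the .txt files directly inside dirPath.
     * Throws IllegalArgumentException with a readable message instead of
     * letting a null array from listFiles() cause a NullPointerException.
     */
    static List<File> listTxtFiles(String dirPath) {
        File dir = new File(dirPath);
        // listFiles() returns null when dirPath is not a directory or an I/O error occurs
        File[] entries = dir.listFiles();
        if (entries == null) {
            throw new IllegalArgumentException("Not a readable directory: " + dirPath);
        }
        List<File> txtFiles = new ArrayList<>();
        for (File f : entries) {
            if (f.isFile() && f.getName().endsWith(".txt")) {
                txtFiles.add(f);
            }
        }
        return txtFiles;
    }

    public static void main(String[] args) {
        // Uses the first argument as the directory, or the current directory by default
        String dirPath = args.length > 0 ? args[0] : ".";
        for (File f : listTxtFiles(dirPath)) {
            System.out.println(f.getPath());
        }
    }
}
```

With a check like this, passing the path of doc1.txt itself would produce a message naming the offending path rather than a bare NullPointerException inside parseFiles.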
