How to load a tsv file for MALLET using FileInputStream in Java



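I want to load a flat text file, passed in as 'TMFlatFile' (the .tsv file format used by MALLET), into the fileReader variable. I have written the RunTopicModelling() method, but there is a problem with the try/catch block. I have created the File and FileInputStream objects, but I don't know how to load the file into fileReader correctly.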

I get this error: "The method read(CharBuffer) in the type InputStreamReader is not applicable for the arguments (int)".

public class TopicModelling {

    private void StartTopicModellingProcess(String filePath) {
        JSONIOHelper jsonIO = new JSONIOHelper();
        jsonIO.LoadJSON(filePath);
        ConcurrentHashMap<String, String> lemmas = jsonIO.GetDocumentsFromJSONStructure();

        SaveLemmaDataToFile("topicdata.txt", lemmas);
    }

    private void SaveLemmaDataToFile(String TMFlatFile, ConcurrentHashMap<String, String> lemmas) {
        for (Entry<String, String> entry : lemmas.entrySet()) {
            try (FileWriter writer = new FileWriter(TMFlatFile)) {
                writer.write(entry.getKey() + "\ten\t" + entry.getValue() + "\r\n");
            } catch (Exception e)
            {
                System.out.println("Saving to flat text file failed...");
            }
        }
    }
    private void RunTopicModelling(String TMFlatFile, int numTopics, int numThreads, int numIterations) {
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
        // Pipes: tokenise, map to features
        pipeList.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        pipeList.add(new TokenSequence2FeatureSequence());

        InstanceList instances = new InstanceList(new SerialPipes(pipeList));


        InputStreamReader fileReader = null;
        // loads the file passed in via the TMFlatFile variable into the fileReader variable - this block I have a problem with
        try {

            File inFile = new File(TMFlatFile);
            FileInputStream fis = new FileInputStream(inFile);

            int line;

            while ((line = fis.read()) != -1) {
            }
            fileReader.read(line);

            }
            fis.close();
        }catch(
        Exception e)
        {
            System.out.println("File Load Failed");
            System.exit(1);
        }
        // linking data to the pipeline
        instances.addThruPipe(new CsvIterator(fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"), 3, 2, 1));

    }

Can someone tell me the correct way to do this?

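It's hard to say what the immediate problem is, because the code sample appears to be missing important parts and would not compile as written (for example, the `Exception e)` fragment and the unquoted regex).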

The data import developer's guide at https://mimno.github.io/Mallet/import-devel has example code, which should be a good starting point.
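For the specific compiler error, a minimal sketch may help, assuming the goal is only to hand MALLET a character reader over the file. InputStreamReader has no read(int) overload, and in the posted code fileReader is still null when it is used, so one option is to build the reader directly from the FileInputStream and let the consumer do the reading. The class and method names below (TsvReaderSketch, openReader, readDataFields, writeSampleFile) are hypothetical, UTF-8 is an assumed encoding, and the line loop only stands in for what CsvIterator would do with the reader; MALLET itself is not called here.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TsvReaderSketch {

    // Same line pattern as in the question: group 1 = name, 2 = label, 3 = data.
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Wrap the byte stream in a character reader without consuming it first.
    // UTF-8 is an assumption; use whatever encoding the file was written with.
    static InputStreamReader openReader(String path) {
        try {
            return new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Stand-in for a line-oriented consumer of the reader:
    // reads every line and collects its data field (group 3).
    static List<String> readDataFields(String path) {
        List<String> fields = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(openReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                Matcher m = LINE.matcher(line);
                if (m.matches()) {
                    fields.add(m.group(3));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return fields;
    }

    // Hypothetical helper: writes a two-line name<TAB>label<TAB>data file
    // to a temporary location and returns its path.
    static String writeSampleFile() {
        try {
            Path file = Files.createTempFile("topicdata", ".txt");
            Files.write(file,
                    Arrays.asList("doc1\ten\tfirst lemma list", "doc2\ten\tsecond lemma list"),
                    StandardCharsets.UTF_8);
            return file.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String path = writeSampleFile();
        // prints [first lemma list, second lemma list]
        System.out.println(readDataFields(path));
    }
}
```

In the question's method, the reader returned by something like openReader(TMFlatFile) would be passed as the first argument to new CsvIterator(...), with no manual fis.read() loop beforehand, since draining the stream first leaves nothing for the iterator to read. Separately, SaveLemmaDataToFile as posted opens a new FileWriter inside the loop, which truncates the file on every iteration, so only the last entry would survive; opening the writer once before the loop avoids that.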
