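I want to load a flat text file, passed in as 'TMFlatFile' (the .tsv file format used in MALLET), into the fileReader variable. I have created the RunTopicModelling() method, but I have a problem with the try/catch block. I have created File and FileInputStream objects, but I don't know how to load the file into fileReader correctly.
I get this error: "The method read(CharBuffer) in the type InputStreamReader is not applicable for the arguments (int)".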
public class TopicModelling {
    private void StartTopicModellingProcess(String filePath) {
        JSONIOHelper jsonIO = new JSONIOHelper();
        jsonIO.LoadJSON(filePath);
        ConcurrentHashMap<String, String> lemmas = jsonIO.GetDocumentsFromJSONStructure();
        SaveLemmaDataToFile("topicdata.txt", lemmas);
    }
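    private void SaveLemmaDataToFile(String TMFlatFile, ConcurrentHashMap<String, String> lemmas) {
        // Open the writer once, outside the loop, so that every entry is written
        // instead of the file being overwritten on each iteration.
        try (FileWriter writer = new FileWriter(TMFlatFile)) {
            for (Entry<String, String> entry : lemmas.entrySet()) {
                // One document per line: name <TAB> label <TAB> text
                writer.write(entry.getKey() + "\ten\t" + entry.getValue() + "\r\n");
            }
        } catch (Exception e) {
            System.out.println("Saving to flat text file failed...");
        }
    }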
    private void RunTopicModelling(String TMFlatFile, int numTopics, int numThreads, int numIterations) {
        ArrayList<Pipe> pipeList = new ArrayList<Pipe>();
        // Pipes: tokenise, map to features
        pipeList.add(new CharSequence2TokenSequence(Pattern.compile("\\p{L}[\\p{L}\\p{P}]+\\p{L}")));
        pipeList.add(new TokenSequence2FeatureSequence());
        InstanceList instances = new InstanceList(new SerialPipes(pipeList));
        InputStreamReader fileReader = null;
        // loads the file passed in via the TMFlatFile variable into the fileReader variable - this is the block I have a problem with
        try {
            File inFile = new File(TMFlatFile);
            FileInputStream fis = new FileInputStream(inFile);
            int line;
            while ((line = fis.read()) != -1) {
                fileReader.read(line); // the reported compile error points here
            }
            fis.close();
        } catch (Exception e) {
            System.out.println("File Load Failed");
            System.exit(1);
        }
        // linking data to the pipeline
        instances.addThruPipe(new CsvIterator(fileReader, Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$"), 3, 2, 1));
    }
Can someone tell me the correct way to do this?
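It's hard to say what the immediate problem is, because the code sample provided appears to be missing important parts and would not compile as written (for example, the `Exception e)` fragment and the unquoted regex).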
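On the reported compile error specifically: `InputStreamReader` has no `read(int)` overload, and a reader is not filled by pushing bytes into it. It is constructed around the input stream and then read from. A minimal sketch using only `java.io` is shown here; the class name `ReaderSketch`, the helper `openReader`, and the UTF-8 encoding are assumptions made for illustration, not part of the posted code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ReaderSketch {

    // Hypothetical helper: wraps the file's byte stream in a character reader.
    // UTF-8 is an assumption; use whatever encoding the flat file was written in.
    static InputStreamReader openReader(String path) throws IOException {
        return new InputStreamReader(new FileInputStream(new File(path)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a one-line sample file in the name<TAB>label<TAB>text layout.
        Path tmp = Files.createTempFile("topicdata", ".txt");
        Files.write(tmp, "doc1\ten\tsome lemma text\r\n".getBytes(StandardCharsets.UTF_8));

        // Read it back through the reader; try-with-resources closes the stream.
        try (BufferedReader in = new BufferedReader(openReader(tmp.toString()))) {
            System.out.println(in.readLine());
        }
        Files.delete(tmp);
    }
}
```

In the question's method, a reader built this way could presumably be assigned to `fileReader` and passed to `CsvIterator` directly, with no read loop and without closing the stream before the iterator has consumed it.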
The data import developer's guide at https://mimno.github.io/Mallet/import-devel has sample code, which should be a good starting point.
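To see what the line pattern in the question does with a flat-file line, it can be tried on its own with `java.util.regex`. The sketch below is not MALLET code: the class `FlatFileLineSketch` and the method `parse` are hypothetical names, and it only shows which capture groups end up holding the name, label, and text that the arguments `3, 2, 1` refer to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlatFileLineSketch {

    // Same pattern as in the question, with backslashes escaped for a Java string literal.
    static final Pattern LINE = Pattern.compile("^(\\S*)[\\s,]*(\\S*)[\\s,]*(.*)$");

    // Hypothetical helper: returns {name, label, data} for one line of the flat file.
    static String[] parse(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unparseable line: " + line);
        }
        // Group 1 is the name, group 2 the label, group 3 the remaining text.
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("doc1\ten\tsome lemma text");
        // prints: name=doc1 label=en data=some lemma text
        System.out.println("name=" + parts[0] + " label=" + parts[1] + " data=" + parts[2]);
    }
}
```

Running a few real lines of the flat file through a check like this is a quick way to confirm the file is in the expected layout before handing it to the import pipeline.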