Indexing a TXT file in Lucene

I want to create a small search engine for tweets. I have a TXT file with 20000 tweets. The file format looks like this:

tommyfrench1
851
85170333395811123
Lurgan, Moira, Armagh. Derry
Double the fun in store this week as we are first goalscorer on the four Champions League matches. Championsleague

im_aarkay
175
851703414300037122
paris
@championsleague @as_monaco @as_monaco_en nopes, it was City that got beaten outta the Champions League..
.
etc.

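The first line is the username, then the number of followers, then the id and the location, and finally the text (the tweet).

I think of each tweet as a document. So I should end up with 20000 documents, and each document should have 5 fields (username, followers, id, etc.).

How do I build the index?

I have looked at some tutorials but did not find anything similar.

Edit: this is my code.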

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.nio.file.Paths;
import java.text.ParseException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;
public class MyProgram {
    public static void main(String[] args) throws IOException, ParseException {
        FileReader fileReader = new FileReader(new File("myfile.txt"));
        BufferedReader br = new BufferedReader(fileReader);
        String line = null;
        String indexPath = "C:\\Desktop\\myfolder";
        Directory dir = FSDirectory.open(Paths.get(indexPath));
        Analyzer analyzer = new StandardAnalyzer();
        IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
        IndexWriter writer = new IndexWriter(dir, iwc);

        while ((line = br.readLine()) != null) {
            // reading lines until the end of the file
            Document doc = new Document();
            String username = br.readLine();
            doc.add(new Field("username", username, Field.Store.YES, Field.Index.ANALYZED));  // adding title field
            String followers = br.readLine();
            doc.add(new Field("followers", followers, Field.Store.YES, Field.Index.ANALYZED));
            String id = br.readLine();
            doc.add(new Field("id", id, Field.Store.YES, Field.Index.ANALYZED));
            String location = br.readLine();
            doc.add(new Field("location", location, Field.Store.YES, Field.Index.ANALYZED));
            String text = br.readLine();
            doc.add(new Field("text", text, Field.Store.YES, Field.Index.ANALYZED));
            writer.addDocument(doc);  // writing new document to the index

            br.readLine();
         }
    }
}

I'm getting the following error: Index cannot be resolved or is not a field.

How can I fix it?

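It is hard to tell from your question that you are actually facing a compile-time error and not a runtime error.

I had to copy your code to see that it is a compile-time error on the Field.Index.ANALYZED argument of the Field constructor.

Refer to the documentation: there is no such constructor in 6.5.0.

This is one of the reasons people use higher-level tools built on top of Lucene, because changes like this keep happening in the low-level Lucene API.

Anyway, the documentation above also tells you:

"Expert: directly create a field for a document. Most users should use one of the sugar subclasses:"

For your case, TextField and StringField are the relevant classes. There are subtle differences between the two.

So I would use a constructor like new StringField(fieldName, fieldValue, Store.YES) instead of working with Field directly.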
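To make the difference concrete: a StringField is indexed as a single token, which suits exact-match values such as an id, while a TextField is run through the analyzer, which suits free text such as the tweet itself. The following is only a sketch of how the five tweet fields might be mapped. The class and method names (TweetDocuments, toDocument) are made up for illustration, and the choice of which field gets which type is an assumption, not something the question specifies.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

// Hypothetical helper: builds one Lucene Document per tweet.
class TweetDocuments {

    static Document toDocument(String username, String followers, String id,
                               String location, String text) {
        Document doc = new Document();
        // StringField: the whole value is indexed as one token (exact match only).
        doc.add(new StringField("username", username, Field.Store.YES));
        doc.add(new StringField("followers", followers, Field.Store.YES));
        doc.add(new StringField("id", id, Field.Store.YES));
        // TextField: the value is tokenized by the analyzer (full-text search).
        doc.add(new TextField("location", location, Field.Store.YES));
        doc.add(new TextField("text", text, Field.Store.YES));
        return doc;
    }

    public static void main(String[] args) {
        Document doc = toDocument("im_aarkay", "175", "851703414300037122",
                "paris", "it was City that got beaten");
        // Stored fields can be read back from the Document by name.
        System.out.println(doc.get("username"));
    }
}
```

With this mapping, a search on "id" has to match the full value, whereas a search on "text" can match individual words.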

You can also use Field directly, as in new Field(fieldName, fieldValue, fieldType), where fieldType is a FieldType.

You can initialize a FieldType like FieldType txtFieldType = new FieldType(TextField.TYPE_STORED) or FieldType strFieldType = new FieldType(StringField.TYPE_STORED), etc.
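Copying a predefined type this way is mainly useful when you want to adjust something before using it. As a rough sketch, the example below copies TextField.TYPE_STORED and additionally turns on term vectors. The class and method names are invented for illustration, and enabling term vectors is just an arbitrary example of a customization, not something your case requires.

```java
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.TextField;

// Hypothetical example of customizing a copy of a predefined FieldType.
class CustomFieldTypeExample {

    static Field textWithTermVectors(String name, String value) {
        // The copy constructor leaves the shared TYPE_STORED constant untouched.
        FieldType type = new FieldType(TextField.TYPE_STORED);
        type.setStoreTermVectors(true); // example tweak, not needed for basic search
        type.freeze();                  // prevent further modification of this type
        return new Field(name, value, type);
    }

    public static void main(String[] args) {
        Field f = textWithTermVectors("text", "some tweet text");
        System.out.println(f.fieldType().storeTermVectors());
    }
}
```

If you do not need to change anything, the sugar subclasses are shorter and do the same job.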

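All in all, the way a Field is created in Lucene has changed in recent versions, so create your Field instances according to the documentation for the Lucene version you are using.

For example: doc.add(new Field("username", username, new FieldType(TextField.TYPE_STORED))), etc.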
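Separately from the compile error, note that the loop in your code calls br.readLine() in the while condition and then again for the username, so the line read by the condition is never used and every field ends up shifted by one line. Below is a minimal sketch of reading the file into records first, using only the JDK. It assumes records are separated by blank lines and that each tweet's text fits on a single line. TweetParser and Tweet are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical parser for the five-lines-per-tweet file format.
class TweetParser {

    /** One record: username, followers, id, location, text. */
    static final class Tweet {
        final String username, followers, id, location, text;

        Tweet(String[] f) {
            username = f[0]; followers = f[1]; id = f[2]; location = f[3]; text = f[4];
        }
    }

    static List<Tweet> parse(BufferedReader br) throws IOException {
        List<Tweet> tweets = new ArrayList<>();
        String line;
        while ((line = br.readLine()) != null) {
            if (line.trim().isEmpty()) {
                continue; // blank separator between records
            }
            String[] fields = new String[5];
            fields[0] = line; // the line read by the loop condition IS the username
            for (int i = 1; i < fields.length; i++) {
                fields[i] = br.readLine();
                if (fields[i] == null) {
                    return tweets; // file ended mid-record: drop the incomplete record
                }
            }
            tweets.add(new Tweet(fields));
        }
        return tweets;
    }

    public static void main(String[] args) throws IOException {
        String sample = "im_aarkay\n175\n851703414300037122\nparis\nsome text\n";
        List<Tweet> tweets = parse(new BufferedReader(new StringReader(sample)));
        System.out.println(tweets.size() + " tweet(s) parsed");
    }
}
```

Each parsed record can then be turned into a Document and passed to writer.addDocument(doc). Also remember to close the IndexWriter when you are done (your code never does), otherwise the documents may not be committed to the index.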
