Using a HashMap to search a word list in Java

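I have a word list with more than 50,000 words. As you can see, I read the words and add them to an ArrayList, but afterwards, when I want to look a word up, it is slow. That is why I thought of a HashMap. I want to read my words and, when I receive a word from the user, check whether it is in the HashMap. Even after some research I could not find exactly how to do this. How can I do it?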

public ArrayList<String> wordReader() throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    ArrayList<String> words = new ArrayList<String>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        words.add(data);
    }
    scanner.close();
    return words;
}

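If I understand your question correctly, you are running into performance problems when iterating over an ArrayList of 50,000 words to check whether a particular word is in the list.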

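That is because finding an element in an unsorted List has O(n) complexity. You can improve on this with a sorted data structure such as a balanced BST (binary search tree), which brings the search operation down to O(log n).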
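As a rough illustration of the sorted alternative, the sketch below uses `java.util.TreeSet` (a red-black tree in the JDK) and `Collections.binarySearch` on a sorted `ArrayList`. Both do O(log n) lookups. The class name `SortedLookupSketch` and the sample words are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

class SortedLookupSketch {
    // Lookup in a balanced tree: O(log n) comparisons.
    static boolean inTree(TreeSet<String> tree, String word) {
        return tree.contains(word);
    }

    // Binary search over an already sorted list: also O(log n).
    static boolean inSortedList(List<String> sorted, String word) {
        return Collections.binarySearch(sorted, word) >= 0;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList("pear", "apple", "fig"));
        TreeSet<String> tree = new TreeSet<>(words);
        Collections.sort(words); // binarySearch requires sorted input
        System.out.println(inTree(tree, "fig"));
        System.out.println(inSortedList(words, "kiwi"));
    }
}
```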

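Your idea of using a Map is also entirely feasible, since a HashMap gives add and get operations a complexity between O(1) (with a theoretically perfect hash function and no collisions between keys) and O(n) (with a bad hash function and a high chance of collisions). In addition, since Java 8 the HashMap implementation includes an optimization: under heavy collisions, when many elements end up in the same bucket, that bucket is stored as a balanced tree rather than a list, which gives O(log n) worst-case complexity.

https://www.logicbig.com/tutorials/core-java-tutorial/java-collections/java-map-cheatsheet.html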
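To make the collision case concrete, here is a minimal sketch with a deliberately bad key type. `BadKey` and `CollisionSketch` are hypothetical names invented for this example. Every key hashes to the same bucket, so lookups still return the right answer and only their cost changes.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionSketch {
    // Hypothetical key type whose hashCode sends every instance to the same bucket.
    static final class BadKey implements Comparable<BadKey> {
        final String word;

        BadKey(String word) {
            this.word = word;
        }

        @Override
        public int hashCode() {
            return 42; // worst case: constant hash
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).word.equals(word);
        }

        // Comparable lets a tree-shaped bucket order the colliding keys.
        @Override
        public int compareTo(BadKey other) {
            return word.compareTo(other.word);
        }
    }

    static Map<BadKey, Boolean> build(int n) {
        Map<BadKey, Boolean> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey("word" + i), Boolean.TRUE);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Boolean> map = build(1000);
        System.out.println(map.containsKey(new BadKey("word999")));
        System.out.println(map.containsKey(new BadKey("missing")));
    }
}
```

String already has a well-distributed hashCode, so a plain word list should not get anywhere near this worst case.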

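However, for what I assume is a dictionary (distinct words only), a HashMap may be unnecessary, because you would be using each word as both key and value. Instead of a HashMap you could use a Set, as others have pointed out, or more specifically a HashSet. In fact, HashSet is implemented on top of a HashMap instance under the hood, which gives you all the performance characteristics discussed above (which is why I wrote that preamble).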

https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HashSet.html
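The difference between the two options can be sketched side by side. The class and method names here (`MapVersusSetSketch`, `asMap`, `asSet`) are invented for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class MapVersusSetSketch {
    // Map version: the word has to be supplied as both key and value.
    static Map<String, String> asMap(List<String> words) {
        Map<String, String> map = new HashMap<>();
        for (String w : words) {
            map.put(w, w);
        }
        return map;
    }

    // Set version: the word is the only thing you supply.
    static Set<String> asSet(List<String> words) {
        return new HashSet<>(words);
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "pear", "fig");
        System.out.println(asMap(words).containsKey("pear"));
        System.out.println(asSet(words).contains("pear"));
    }
}
```

Both lookups answer the same question. The Set version simply says what you mean.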

Your implementation could look like this:

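import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

// Reads one word per line from the file at the given path into a HashSet.
public Set<String> wordReader(String path) throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    Set<String> words = new HashSet<>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        words.add(data);
    }
    scanner.close();
    return words;
}

// Exact-match membership test against the loaded set.
public boolean isWordContained(Set<String> set, String word) {
    return set.contains(word);
}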

Since you will be checking whether the input word exists in the word list read from the file, you can use a HashSet<String> instead of an ArrayList<String>.

Your method becomes:

public HashSet<String> wordReader() throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    HashSet<String> words = new HashSet<String>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        words.add(data);
    }
    scanner.close();
    return words;
}

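Now, after you read the input word, you can check whether it is in the HashSet. This is a much faster operation, because the lookup takes constant time on average.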

public boolean isWordPresent(String word, HashSet<String> words) {
    return words.contains(word);
}
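Putting the two pieces together, a minimal end-to-end sketch might look like the following. It assumes the word file path arrives as the first command-line argument and that the file holds one word per line. `WordCheckSketch`, `readWords`, and `loadWords` are names made up for this example.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class WordCheckSketch {
    // Collects every line the scanner yields into a HashSet.
    static Set<String> readWords(Scanner scanner) {
        Set<String> words = new HashSet<>();
        while (scanner.hasNextLine()) {
            words.add(scanner.nextLine());
        }
        return words;
    }

    // Opens the file, reads it, and closes the scanner even if reading fails.
    static Set<String> loadWords(String path) throws FileNotFoundException {
        try (Scanner file = new Scanner(new File(path))) {
            return readWords(file);
        }
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumption: args[0] is the path of the word file.
        Set<String> words = loadWords(args[0]);
        Scanner input = new Scanner(System.in);
        System.out.print("Word: ");
        if (input.hasNextLine()) {
            String word = input.nextLine();
            System.out.println(words.contains(word) ? "found" : "not found");
        }
    }
}
```

Note that `contains` is an exact, case-sensitive match. Whether to trim or lower-case the words first is a decision the code above leaves open.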

As a side note, HashSet uses a HashMap internally to perform its operations.

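I would use a Set rather than a List, because a set automatically ignores duplicates as you add them. The add method returns true and inserts the element if it was not already present; otherwise it returns false.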

public Set<String> wordReader() throws FileNotFoundException {
    File txt = new File(path);
    Scanner scanner = new Scanner(txt);
    Set<String> words = new HashSet<>();
    while (scanner.hasNextLine()) {
        String data = scanner.nextLine();
        if (!words.add(data)) {
            // already present - do something
        }
    }

    scanner.close();
    return words;
}
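One possible way to fill in that "do something" branch is to collect the repeated lines. This is only a sketch, and `DuplicateSketch` and `findDuplicates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateSketch {
    // Returns every line that had already been seen earlier in the input.
    static List<String> findDuplicates(List<String> lines) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String line : lines) {
            if (!seen.add(line)) { // add returns false when the line is already in the set
                duplicates.add(line);
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        System.out.println(findDuplicates(Arrays.asList("a", "b", "a", "a")));
    }
}
```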

Because sets are not ordered, they are not random-access collections. If you need that, you can copy the set into a list like this:

Set<String> words = wordReader();
List<String> wordList = new ArrayList<>(words);

Now you can retrieve the words by index.
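One caveat: a HashSet has no defined iteration order, so the index a word ends up at is arbitrary. If stable indices matter, one option (an assumption on my part, not something the question asks for) is to sort the copy. `IndexedWordsSketch` and `toSortedList` are made-up names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class IndexedWordsSketch {
    // Copies the set into a list and sorts it so indices are predictable.
    static List<String> toSortedList(Set<String> words) {
        List<String> list = new ArrayList<>(words);
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<>(Arrays.asList("pear", "apple", "fig"));
        List<String> wordList = toSortedList(words);
        System.out.println(wordList.get(0));
    }
}
```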

You may also want to make your method more general by passing the file name as a parameter.
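A sketch of that more general version, with the path as a parameter and try-with-resources so the Scanner is closed even if reading fails. The class name `WordFileSketch` is hypothetical.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.HashSet;
import java.util.Scanner;
import java.util.Set;

class WordFileSketch {
    // Reads one word per line from the given file into a HashSet.
    static Set<String> wordReader(String path) throws FileNotFoundException {
        Set<String> words = new HashSet<>();
        try (Scanner scanner = new Scanner(new File(path))) {
            while (scanner.hasNextLine()) {
                words.add(scanner.nextLine());
            }
        }
        return words;
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Assumption: the file path is given as the first command-line argument.
        Set<String> words = wordReader(args[0]);
        System.out.println(words.size() + " distinct words loaded");
    }
}
```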
