Why does the result vary when counting word frequencies in a large file with multiple threads?



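My goal is to count the frequency of each word while reading a large file with multiple threads. I implement the Runnable interface for the multithreading. When I run the program, however, I do not get the correct answer every time: sometimes the output is correct and sometimes it is not. If I use the Callable interface instead of Runnable, the program runs correctly without any errors.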

This is the main class:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WordFrequencyRunnableTest {

    public static void main(String[] args) throws IOException {
        long startTime = System.currentTimeMillis();
        String filePath = "C:/Users/Mukesh Kumar/Desktop/data.txt";
        WordFrequencyRunnableTest runnableTest = new WordFrequencyRunnableTest();
        Map<String, Integer> wordFrequencies = runnableTest.parseLines(filePath);
        runnableTest.printResult(wordFrequencies);
        long elapsedTime = System.currentTimeMillis() - startTime;
        System.out.println("Total execution time in millis: " + elapsedTime);
    }

    public Map<String, Integer> parseLines(String filePath) throws IOException {
        Map<String, Integer> wordFrequencies = new HashMap<>();
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath))) {
            String eachLine = bufferedReader.readLine();
            while (eachLine != null) {
                List<String> linesForEachThread = new ArrayList<>();
                while (linesForEachThread.size() != 100 && eachLine != null) {
                    linesForEachThread.add(eachLine);
                    eachLine = bufferedReader.readLine();
                }
                WordFrequencyUsingRunnable task = new WordFrequencyUsingRunnable(linesForEachThread, wordFrequencies);
                Thread thread = new Thread(task);
                thread.start();
            }
        }
        return wordFrequencies;
    }

    public void printResult(Map<String, Integer> wordFrequencies) {
        wordFrequencies.forEach((key, value) -> System.out.println(key + " " + value));
    }
}

This is the logic class:

import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WordFrequencyUsingRunnable implements Runnable {

    private final List<String> linesForEachThread;
    private final Map<String, Integer> wordFrequencies;

    public WordFrequencyUsingRunnable(List<String> linesForEachThread, Map<String, Integer> wordFrequencies) {
        this.linesForEachThread = linesForEachThread;
        this.wordFrequencies = wordFrequencies;
    }

    @Override
    public void run() {
        List<String> currentThreadLines = new ArrayList<>(linesForEachThread);
        for (String eachLine : currentThreadLines) {
            String[] eachLineWords = eachLine.toLowerCase().split("([,.\\s]+)");
            synchronized (wordFrequencies) {
                for (String eachWord : eachLineWords) {
                    if (wordFrequencies.containsKey(eachWord)) {
                        wordFrequencies.replace(eachWord, wordFrequencies.get(eachWord) + 1);
                    }
                    wordFrequencies.putIfAbsent(eachWord, 1);
                }
            }
        }
    }
}

I hope to get a helpful response. Thanks in advance for your help.

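You should wait for all threads to finish before printing the result. parseLines only starts the threads and returns immediately, so main may print the map while some threads are still updating it.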

public class WordFrequencyRunnableTest {

    // Every started worker thread, so that main can join them later.
    private final List<Thread> threads = new ArrayList<>();

    // join() throws InterruptedException, so main has to declare it.
    public static void main(String[] args) throws IOException, InterruptedException {
        ...
        ...
        Map<String, Integer> wordFrequencies = runnableTest.parseLines(filePath);
        // threads is an instance field, so it is accessed through runnableTest.
        for (Thread thread : runnableTest.threads) {
            thread.join(); // Block until this worker has finished.
        }
        runnableTest.printResult(wordFrequencies);
        ...
        ...
    }

    public Map<String, Integer> parseLines(String filePath) throws IOException {
        Map<String, Integer> wordFrequencies = new HashMap<>();
        try (BufferedReader bufferedReader = new BufferedReader(new FileReader(filePath))) {
            String eachLine = bufferedReader.readLine();
            while (eachLine != null) {
                List<String> linesForEachThread = new ArrayList<>();
                while (linesForEachThread.size() != 100 && eachLine != null) {
                    linesForEachThread.add(eachLine);
                    eachLine = bufferedReader.readLine();
                }
                WordFrequencyUsingRunnable task = new WordFrequencyUsingRunnable(linesForEachThread, wordFrequencies);
                Thread thread = new Thread(task);
                thread.start();
                threads.add(thread); // Add thread to the list.
            }
        }
        return wordFrequencies;
    }
}
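
The question mentions that a Callable version always produced the right answer. That code is not shown, so the following is only a guess at why: a Callable is usually submitted to an ExecutorService, and calling Future.get() blocks until the task has finished, which waits for the workers in the same way join() does. A minimal sketch under that assumption; the class name, the method names, and the pool size of 4 are all made up for illustration, and List.of needs Java 9 or later.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical class: the asker's real Callable code is not shown in the question.
class CallableWordCountSketch {

    // Counts the words of one chunk into a private map, so no locking is needed.
    static Map<String, Integer> countChunk(List<String> chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : chunk) {
            for (String word : line.toLowerCase().split("[,.\\s]+")) {
                if (!word.isEmpty()) { // split() can yield "" at the start of a line
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    static Map<String, Integer> countWords(List<String> lines, int chunkSize) {
        ExecutorService pool = Executors.newFixedThreadPool(4); // arbitrary pool size
        try {
            List<Future<Map<String, Integer>>> futures = new ArrayList<>();
            for (int start = 0; start < lines.size(); start += chunkSize) {
                List<String> chunk = lines.subList(start, Math.min(start + chunkSize, lines.size()));
                Callable<Map<String, Integer>> task = () -> countChunk(chunk);
                futures.add(pool.submit(task));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> future : futures) {
                // get() blocks until this task is done, so nothing is read too early.
                future.get().forEach((word, count) -> total.merge(word, count, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting for tasks", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("A counting task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "The lazy dog, the end.");
        countWords(lines, 1).forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```

Because each task returns its own map and only the calling thread merges them, this variant needs neither a synchronized block nor a shared map.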

PS: You can use ConcurrentHashMap<String, AtomicInteger> to avoid synchronizing access to the HashMap. That way the program should run faster.
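
As a rough illustration of that suggestion, here is a minimal sketch, not a drop-in replacement for the classes above. The class and method names are hypothetical, it takes lines from a list rather than from a file to stay self-contained, and List.of needs Java 9 or later. It relies on computeIfAbsent being atomic on ConcurrentHashMap and on AtomicInteger.incrementAndGet being atomic, so no synchronized block is required.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical class name; not part of the original question or answer.
class ConcurrentWordCountSketch {

    // Counts words using one thread per chunk of chunkSize lines.
    static Map<String, AtomicInteger> countWords(List<String> lines, int chunkSize) {
        Map<String, AtomicInteger> wordFrequencies = new ConcurrentHashMap<>();
        List<Thread> threads = new ArrayList<>();

        for (int start = 0; start < lines.size(); start += chunkSize) {
            List<String> chunk = lines.subList(start, Math.min(start + chunkSize, lines.size()));
            Thread thread = new Thread(() -> {
                for (String line : chunk) {
                    for (String word : line.toLowerCase().split("[,.\\s]+")) {
                        if (word.isEmpty()) {
                            continue; // split() can yield "" at the start of a line
                        }
                        // Only one counter is ever created per word, and the
                        // increment itself is atomic, so no lock is needed.
                        wordFrequencies.computeIfAbsent(word, key -> new AtomicInteger())
                                       .incrementAndGet();
                    }
                }
            });
            thread.start();
            threads.add(thread);
        }

        // Still required: wait for every worker before reading the map.
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("Interrupted while waiting for workers", e);
            }
        }
        return wordFrequencies;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("the quick brown fox", "The lazy dog, the end.");
        Map<String, AtomicInteger> result = countWords(lines, 1);
        // TreeMap is used only to print the words in a stable order.
        new TreeMap<>(result).forEach((word, count) -> System.out.println(word + " " + count));
    }
}
```

Note that a concurrent map only removes the need for locking. The threads still have to be joined before the result is printed.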
