Thread safety of a Java HashSet in an Apache Tomcat servlet



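I am trying to write a multithreaded Apache Tomcat servlet that receives a large amount of text in the body of each POST it handles and, on a GET request, returns the number of unique words received so far. I have already implemented this with Qt and the QtWebApp library, but I can't seem to get it working in Java. I'm not sure where the problem lies, but it may be the overall thread safety of the application (or the way the words are split and stored). The returned word count is always too high: roughly 2,000-4,000 above the actual count, which is around 70,000-140,000 (I do have the expected results for these test cases). My code is as follows: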

@WebServlet(name = "data", urlPatterns = {"/myserver/", "/myserver/data", "/myserver/count"})
public class data extends HttpServlet {
    HashSet<String> slova = new HashSet<>();
    public final Lock lock = new ReentrantLock();

    protected void processRequest(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html;charset=UTF-8");

        try (PrintWriter out = response.getWriter()) {

            if ("POST".equals(request.getMethod()) && "/osp/myserver/data".equals(request.getRequestURI())) {
                // Read the gzip-compressed POST body line by line.
                InputStream body = request.getInputStream();
                GZIPInputStream gstream = new GZIPInputStream(body);
                BufferedReader buffreader = new BufferedReader(new InputStreamReader(gstream, "UTF8"));

                String vse = "";
                StringBuffer sbuffer = new StringBuffer();
                while ((vse = buffreader.readLine()) != null)
                {
                    sbuffer.append(vse);
                }
                String text = sbuffer.toString();
                System.out.println(text);

                // Split the collected text on whitespace and store the words.
                String[] words = text.split("\\s+");
                lock.lock();
                for (int i = 0; i < words.length; i++) {
                    slova.add(words[i]);
                }
                lock.unlock();
            }
            if ("GET".equals(request.getMethod()) && "/osp/myserver/count".equals(request.getRequestURI())) {
                // Report the number of unique words, then reset the set.
                out.println(slova.size());
                slova.clear();
            }
        }
    }
}

Any idea what is causing this? Any feedback would be appreciated. I can post the working Qt source on request.

The string splitting that JB Nizet pointed out was the root of the problem.
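
For anyone hitting the same symptom, here is a minimal sketch of one way the splitting could be handled. It rests on an assumption about what the splitting issue is: `BufferedReader.readLine()` strips the line terminator, so appending every line to a single buffer joins the last word of one line to the first word of the next, and that produces extra "unique" words. The sketch splits each line separately. It also takes the lock when reading and clearing the set, which the GET branch in the question does not do. The class name `UniqueWordCounter` and its methods `addAll` and `countAndReset` are hypothetical and not part of the original servlet.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical helper (not from the original post): collects unique words
// from several request bodies and can be shared between servlet threads.
public class UniqueWordCounter {
    private final Set<String> words = new HashSet<>();
    private final Lock lock = new ReentrantLock();

    // Reads the body line by line and splits each line on its own, so words
    // at a line boundary are never glued together.
    public void addAll(BufferedReader reader) throws IOException {
        Set<String> batch = new HashSet<>();
        String line;
        while ((line = reader.readLine()) != null) {
            for (String word : line.trim().split("\\s+")) {
                // Splitting an empty line yields one empty string; skip it.
                if (!word.isEmpty()) {
                    batch.add(word);
                }
            }
        }
        // Only the merge into the shared set needs the lock.
        lock.lock();
        try {
            words.addAll(batch);
        } finally {
            lock.unlock();
        }
    }

    // Returns the current number of unique words and clears the set,
    // both under the same lock so no concurrent POST is half-counted.
    public int countAndReset() {
        lock.lock();
        try {
            int count = words.size();
            words.clear();
            return count;
        } finally {
            lock.unlock();
        }
    }
}
```

In the servlet, the POST branch would pass `buffreader` to `addAll`, and the GET branch would print the result of `countAndReset()`. Building a per-request batch first keeps the locked section short. Using `try`/`finally` ensures the lock is released even if an exception is thrown.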
