Java TXT parser: problem with new lines



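Good morning. I'm having trouble with a parser that uses the split method. The goal is to read in a TXT file, extract the "should" statements, and then write a new TXT file containing those statements. It works when the text is on one continuous line. If the TXT file contains line breaks, the output file is rewritten with only the last line. Could it be the structure of my loops? Also, any suggestions for saving the new file in the directory the original was opened from? Thank you.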

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;
/* This program launches a file explorer.
The user then chooses a .txt file to be parsed.
A new file is created, labeled "Parsed_(Document Name)". */
public class Parser {
    @SuppressWarnings("resource")
    public static void main(String[] args) {
        JFileChooser chooser = new JFileChooser();
        Scanner userFile = new Scanner(System.in);
        int returnVal = chooser.showOpenDialog(null);
        if (returnVal == JFileChooser.APPROVE_OPTION) {
            try {
                System.out.println("You chose to open this file: " + chooser.getSelectedFile().getName() + "\n");
                File file = new File(chooser.getSelectedFile().getName());
                String newFile = ("Parsed_" + file);
                userFile = new Scanner(file);
                while (userFile.hasNextLine()) {
                    String document = userFile.nextLine();
                    // Line breaks used by Parser
                    String[] sentences = document.split("\\.|\\?|\\!|\\r");
                    List<String> ShouldArray = new ArrayList<String>();
                    for (String shouldStatements : sentences) {
                        if (shouldStatements.contains("Should") || shouldStatements.contains("should"))
                            ShouldArray.add(shouldStatements);
                    }
                    FileWriter writer = new FileWriter(newFile);
                    BufferedWriter bw = new BufferedWriter(writer);
                    for (String shallStatements : ShouldArray) {
                        System.out.println(shallStatements);
                        bw.append(shallStatements);
                        bw.newLine();
                    }
                    System.out.println("\nParsed Document Created: " + newFile);
                    JOptionPane.showMessageDialog(null, "Parsed Document Created: " + newFile);
                    bw.close();
                    writer.close();
                }
                userFile.close();
            } catch (Exception ex) {
                ex.printStackTrace();
            }
        }
    }
}

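Test file 1 (works!)

Hello all. Here is a packing list. You Should have a toothbrush. You Should have a Phone charger. And you definitely should have your wallet!

Test file 1 output:

You Should have a toothbrush You Should have a Phone charger And you definitely should have your wallet

Test file 2 (prints only the last line)

Hello all. Here is a packing list. You Should have a toothbrush. You Should have a Phone charger. Here is some random text to show the parser will not include this.
You definitely should have your wallet!

Test file 2 output:

You definitely should have your wallet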

You need to create the result list outside the loop:

// Moved here, before the loop
List<String> ShouldArray = new ArrayList<String>();
while (userFile.hasNextLine()) {
    String document = userFile.nextLine();
    // Sentence delimiters used by the parser
    String[] sentences = document.split("\\.|\\?|\\!|\\r");
    // (declaration removed from here)
    for (String shouldStatements : sentences) {
        if (shouldStatements.contains("Should") || shouldStatements.contains("should"))
            ShouldArray.add(shouldStatements);
    }
    ......

Otherwise you only collect the results of the last loop iteration.

Basically, this is what your code is doing:

cut up file in lines
take each line
    take next line
     make a result board.
     write results on board
    take next line
     erase board
     write results on board
    take next line
     erase board
     write results on board

In the end, the board holds only the results from the last line.
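
To make the "board" idea concrete, here is a minimal sketch that keeps one list alive across all lines. It works on in-memory strings instead of a file so it can be run directly; the class name `ShouldCollectorDemo`, the method name, and the sample input are assumptions made for illustration, not part of the original program.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: shows the "one board for all lines" idea in isolation.
class ShouldCollectorDemo {

    // Returns every sentence that contains "Should" or "should", across all lines.
    static List<String> collectShouldSentences(List<String> lines) {
        // Created once, before the loop, so matches from earlier lines are kept.
        List<String> shouldSentences = new ArrayList<>();
        for (String line : lines) {
            // Same delimiters as the question's parser.
            for (String sentence : line.split("\\.|\\?|\\!|\\r")) {
                if (sentence.contains("Should") || sentence.contains("should")) {
                    shouldSentences.add(sentence.trim());
                }
            }
        }
        return shouldSentences;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Hello all. You Should have a toothbrush.",
                "You definitely should have your wallet!");
        // Prints one matching sentence per input line, two in total.
        collectShouldSentences(lines).forEach(System.out::println);
    }
}
```

In the real program the same list would be filled inside the `while (userFile.hasNextLine())` loop and written to the file once, after the loop has finished.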

You are overwriting your ArrayList inside the loop, but you don't actually need it at all:

File file = chooser.getSelectedFile();
System.out.println("You chose to open this file: " + file.getName() + "\n");
String newFile = "Parsed_" + file.getName();
// Open all closeable objects using try-with-resources
try (Scanner userFile = new Scanner(file);
        BufferedWriter bw = new BufferedWriter(new FileWriter(newFile))) {
    while (userFile.hasNextLine()) {
        String document = userFile.nextLine();
        // Sentence delimiters used by the parser
        String[] sentences = document.split("\\.|\\?|\\!|\\r");
        for (String s : sentences) {
            if (s.contains("Should") || s.contains("should")) {
                // Write each match immediately; no intermediate list is needed
                System.out.println(s);
                bw.append(s);
                bw.newLine();
            }
        }
    }
    // Report once, after every line has been processed
    System.out.println("\nParsed Document Created: " + newFile);
    JOptionPane.showMessageDialog(null, "Parsed Document Created: " + newFile);
    // bw.close(); // not needed anymore
} catch (IOException ex) { // requires: import java.io.IOException;
    ex.printStackTrace();
}

I have refactored the code and removed the "ShouldArray", which is not needed.
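
On the second part of the question (saving the new file in the directory the original was opened from): `chooser.getSelectedFile()` returns a `File` that carries the full path, so the output can be built next to it. The sketch below is only one possible approach; the class name `OutputLocationDemo`, the helper `parsedFileFor`, and the sample path are assumptions made for illustration.

```java
import java.io.File;

// Hypothetical demo class: derives the output file from the selected input file.
class OutputLocationDemo {

    // Builds "Parsed_<name>" in the same directory as the selected file.
    static File parsedFileFor(File selected) {
        // getParentFile() is null for a bare file name; File(File, String)
        // then behaves like new File(String), i.e. relative to the working directory.
        return new File(selected.getParentFile(), "Parsed_" + selected.getName());
    }

    public static void main(String[] args) {
        // Stand-in for chooser.getSelectedFile(); the path is made up.
        File selected = new File("docs", "packing.txt");
        System.out.println(parsedFileFor(selected).getPath());
    }
}
```

The resulting `File` can be passed straight to `new FileWriter(...)`. Note that building the input with `new File(chooser.getSelectedFile().getName())`, as in the question, drops the directory part, so it only works when the chosen file happens to be in the working directory.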

pseudocode

While there are lines to read in the In file
    Read each line
    Split each line into Array of sentences
    Loop through each sentence
        If the sentence contains Should or should Then
          Write sentence to Out file
        End If
    End Loop
End While
Close Out file
Close In file

The following code works for both kinds of input.

Multi-line:

Hello all. Here is a a packing list.
You Should have a toothbrush. You Should have a Phone charger.
Here is some random text to show the parser will not include this.
You definitely should have your wallet!

Single line:

Hello all. Here is a a packing list. You Should have a toothbrush. You should have a Phone charger. And you definitely should have your wallet!

import java.util.Scanner;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.io.File;
public class ShouldStringsParser {
    public ShouldStringsParser(String inFile, String outFile) throws IOException {
        File file = new File(inFile);
        FileWriter writer = new FileWriter(outFile);
        BufferedWriter bw = new BufferedWriter(writer);
        Scanner userFile;
        userFile = new Scanner(file);
        String[] sentences;
        while (userFile.hasNextLine()) {
            String line = userFile.nextLine();
            System.out.println(line);
            sentences = line.split("\\.|\\?|\\!|\\r");
            for (String shouldStatements : sentences) {
                if (shouldStatements.contains("Should") || shouldStatements.contains("should")) {
                    System.out.println(">>>" + shouldStatements);
                    bw.append(shouldStatements);
                    bw.newLine();
                }
            }
        }
        bw.close();
        writer.close();
        userFile.close();
    }
    public static void main(String[] args) {
        try {
            new ShouldStringsParser("inDataMultiLine.txt", "outDataMultiLine.txt");
            new ShouldStringsParser("inDataSingleLine.txt", "outDataSingleLine.txt");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
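
A side note on the split pattern: `Scanner.nextLine()` returns the line without its line terminator, so the `\\r` alternative should never match here, and the remaining delimiters can be written as a character class. Splitting also leaves the space that followed each delimiter at the start of the next sentence. The sketch below shows one way to handle both; the class name `SentenceSplitDemo` and the helper `splitSentences` are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: splits one line into trimmed sentences.
class SentenceSplitDemo {

    // Splits on '.', '?' or '!' and drops surrounding whitespace and empty pieces.
    static List<String> splitSentences(String line) {
        List<String> result = new ArrayList<>();
        for (String part : line.split("[.?!]")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints: [Hello all, You Should have a toothbrush, Ready]
        System.out.println(splitSentences("Hello all. You Should have a toothbrush! Ready?"));
    }
}
```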
