Need to find and remove duplicates from a text file, comparing the 1st and 5th strings of each line



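As part of a project I'm working on, I want to clean up a file I generate that contains duplicate line entries. These duplicates often don't occur near each other, however. I came up with a way to do this in Java (basically finding the duplicate strings in the file: I store the two strings in two ArrayLists and iterate over them), but it doesn't work, because the nested for loops hit the condition many times.

I need an integrated solution for this, though. Preferably in Java. Any ideas?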

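    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.StringTokenizer;

    public class duplicates {
        static BufferedReader reader = null;
        static BufferedWriter writer = null;
        static String currentLine;
        public static void main(String[] args) throws IOException {
            int count=0,linecount=0;
            String fe = null,fie = null,pe=null;
            File file = new File("E:\\Book.txt");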
            ArrayList<String> list1=new ArrayList<String>();
            ArrayList<String> list2=new ArrayList<String>();
            reader = new BufferedReader(new FileReader(file));
            while((currentLine = reader.readLine()) != null)
            {
                StringTokenizer st = new StringTokenizer(currentLine,"/");  //splits data into strings
                while (st.hasMoreElements()) {
                    count++;
                    fe=(String) st.nextElement();
                    //System.out.print(fe+"/// ");
                    //System.out.println("count="+count);
                    if(count==1){                                            //stores 1st string 
                        pe=fe;
                        //  System.out.println("first element "+fe);
                    }
                    else if(count==5){
                        fie=fe;                                              //stores 5th string
                        //  System.out.println("fifth element "+fie);
                    }
                }
                count=0;
                if(linecount>0){
                    for(String s1:list1)
                    {
                        for(String s2:list2){
                            if(pe.equals(s1)&&fie.equals(s2)){                              //checking condition
                                System.out.println("duplicate found");
                                //System.out.println(s1+ "   "+s2);
                            }        
                        }
                    }
                }                     
                list1.add(pe);
                list2.add(fie);
                linecount++;
            }
        }
    }
Input:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/book1/_cwc/B737/customer/Special_Reports/
/jangeer/_cwc/Crj_200/customer/plots/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
Expected output:
/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/

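Use a Set<String> instead of an ArrayList<String>.

A set does not allow duplicates, so if you simply add every line to the set and then read the lines back out, you get all the distinct strings.

Performance-wise, it is also faster than the nested for loops, since a hash-based set checks membership in roughly constant time instead of scanning every earlier line.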

    // requires: import java.util.ArrayList; import java.util.Arrays;
    public static void removeDups() {
        String[] input = new String[] { // Let's say you have read the whole file into this string array
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
        };
        ArrayList<String> outPut = new ArrayList<>(); // list that stores the output, i.e. the distinct lines
        Arrays.stream(input).distinct().forEach(x -> outPut.add(x)); // with Java 8 streams, distinct() drops repeated lines and keeps the order
        outPut.forEach(System.out::println); // printed here only as an example; write the output back to the file with your own implementation
    }

The output when I run this method is:

/book1/_cwc/B737/customer/Special_Reports/
/Airbook/_cwc/A330-200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/plots/
/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/
/jangeer/_cwc/Crj_200/customer/02_Watchlists/
/jangeer/_cwc/Crj_200/customer/01_Highlights/
/jangeer/_cwc/ERJ170/customer/01_Highlights/
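
Note that the ERJ170 line is still present in this output, because the comparison is on whole lines, while the expected output in the question drops it (its 1st and 5th segments, jangeer and 01_Highlights, match an earlier line). If only those two segments should be compared, as the title asks, a minimal sketch could look like the one below. The class name KeyedDedup, the methods keyOf and dedupeByFirstAndFifth, and the choice to key lines with fewer than five segments on the whole line are assumptions made for illustration, not something taken from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

class KeyedDedup {

    // Builds the comparison key from the 1st and 5th "/"-separated segments.
    // Assumption: a line with fewer than five segments is keyed on the whole line.
    static List<String> keyOf(String line) {
        StringTokenizer st = new StringTokenizer(line, "/");
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        if (tokens.size() < 5) {
            return Collections.singletonList(line);
        }
        return Arrays.asList(tokens.get(0), tokens.get(4));
    }

    // Keeps the first line seen for each key and preserves the input order.
    static List<String> dedupeByFirstAndFifth(List<String> lines) {
        Set<List<String>> seen = new HashSet<>();
        List<String> result = new ArrayList<>();
        for (String line : lines) {
            // Set.add returns false when an equal key is already present.
            if (seen.add(keyOf(line))) {
                result.add(line);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/");
        for (String line : dedupeByFirstAndFifth(input)) {
            System.out.println(line);
        }
    }
}
```

A single pass with a set of keys replaces the two parallel lists and the nested loops from the question, so each line is checked once against everything seen so far.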

Edit

Non-Java 8 answer

    // requires: import java.util.Arrays; import java.util.LinkedHashSet;
    public static void removeDups() {
        String[] input = new String[] {
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/book1/_cwc/B737/customer/Special_Reports/",
                "/jangeer/_cwc/Crj_200/customer/plots/",
                "/Airbook/_cwc/A330-200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/06_Performance_Summaries/",
                "/jangeer/_cwc/Crj_200/customer/02_Watchlists/",
                "/jangeer/_cwc/Crj_200/customer/01_Highlights/",
                "/jangeer/_cwc/ERJ170/customer/01_Highlights/"
        };
        // output is your set of unique strings, in their original order
        LinkedHashSet<String> output = new LinkedHashSet<String>(Arrays.asList(input));
        for (String line : output) {
            System.out.println(line); // printed here only as an example
        }
    }
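
The examples here work on a hard-coded array. To apply the same idea to the actual file, one option is to read the lines with java.nio.file.Files, pass them through a LinkedHashSet, and write the result to a new file. This is only a sketch under some assumptions: the class FileDedup, the method dedupeFile, and the command-line arguments are hypothetical, and it assumes Java 7 or later, UTF-8 text, and a file small enough to fit in memory.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class FileDedup {

    // Reads every line of source, drops exact repeats, and writes the rest to target.
    static void dedupeFile(Path source, Path target) throws IOException {
        List<String> lines = Files.readAllLines(source, StandardCharsets.UTF_8);
        // LinkedHashSet ignores repeated lines and keeps first-seen order.
        Set<String> unique = new LinkedHashSet<>(lines);
        Files.write(target, unique, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("usage: java FileDedup <input file> <output file>");
            return;
        }
        dedupeFile(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

Writing to a separate target file keeps the original intact until the result has been checked.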
