Splitting a file's contents on blank lines



I have a file that contains:

( (CODE <begin_A_defense_of_Michael_Moore>))
( (NP (NP (NP (DT A) (NN defense))
      (PP (IN of)
          (NP (NP (NNP Michael) (NNP Moore))
          (CC and)
          (" ")
          (S-NOM-TTL (NP-SBJ (-NONE- *PRO*))
                 (VP (VBG Bowling)
                 (PP-PRP (IN for)
                     (NP (NNP Columbine))))))))
      (" ")
      (CODE -LRB-)
      (PRN (NP (NN Op-Ed)))
      (CODE -RRB-)
      (PP (IN By)
      (NP (NNP Eloquence)))))
( (FRAG (NP (NNP Wed))
    (NP (NML (NNP Aug))
        (JJ 13th)
        (, ,)
        (NN 2003))
    (PP-TMP (IN at)
        (NP (CD 09:00:09)
            (FW AM) (FW EST)))))
( (S (NP-SBJ (DT This))
     (VP (VBZ is)
     (NP-PRD (NP (DT an) (JJ open) (NN letter))
         (PP (IN to)
             (NP (NP (NNP David) (NNP Hardy))
             (, ,)
             (NP (NP (NN author))
                 (PP (IN of)
                 (NP (NP-TTL (S-NOM-TTL (NP-SBJ (-NONE- *PRO*))
                            (VP (VB Bowling)
                                (PP-PRP (IN for)
                                    (NP (NNP Columbine)))))
                         (: :)
                         (NP (NN Documentary) (CC or) (NN Fiction)))
                     (, ?)
                     (, ,)
                     (RRC (ADVP (RB probably))
                      (NP-PRD (NP (DT the)
                              (ADJP (RBS most) (JJ comprehensive)))
                          (PP (IN among)
                              (NP (NP (JJ many) (NNS rebuttals))
                              (PP (IN of)
                                  (NP (DT the)
                                  (ADJP (NNP Oscar) (HYPH -) (VBG winning))
                                  (NN documentary))))))))))))))
     (. .)))
( (S (NP-SBJ (NNS Critics))
     (VP (VBP have)
     (ADVP-TMP (RB now))
     (VP (VBN gone)
         (ADVP (ADVP (RB so) (RB far))
           (SBAR (IN as)
             (S (NP-SBJ (-NONE- *PRO*))
                (VP (TO to)
                (VP (VB call)
                    (PP-CLR (IN for)
                        (NP (NP (DT the) (NN revocation))
                        (PP (IN of)
                            (NP (DT the) (NN award))))))))))))
     (. .)))
( (S (NP-SBJ (PRP$ Their) (NNS chances))
     (VP (VBP are)
     (ADJP-PRD (JJ small))
     (, ,)
     (ADVP (RB however))
     (, ,)
     (SBAR-PRP (IN as)
           (S (NP-SBJ (PRP$ their) (NNS arguments))
              (VP (VP (VBP rely)
                  (PP-CLR=1 (IN on)
                    (NP (NN polemic) (, ,) (NN exaggeration) (CC and) (NN misrepresentation))))
              (: --)
              (VP (PP (IN in)
                  (NP (JJ other) (NNS words)))
                  (, ,)
                  (PP-CLR=1 (IN on)
                    (NP (NP (DT the) (JJ same) (NNS techniques))
                        (SBAR (WHNP-2 (WP which))
                          (S (NP-SBJ (PRP they))
                             (VP (VBP accuse)
                             (NP (NNP Moore))
                             (PP-CLR (IN of)
                                 (S-NOM (NP-SBJ (-NONE- *PRO*))
                                    (VP (VBG using)
                                        (NP (-NONE- *T*-2)))))))))))))))
     (. .)))

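I need to process each individual parse separately. I think the best way is to split this file on blank lines (does anyone have another approach?). Does anyone have any ideas on how to do this? I am using PHP. The file comes from the MASC corpus.

Thanks.

I actually ended up doing it the following way: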

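// Read the file into an array of lines.
$newfile = file("textfile.txt");
$temp_str = '';
$parses = array();
foreach ($newfile as $line) {
    $temp = trim($line);
    if (strlen($temp) > 0) {
        // Still inside a parse: append the line, separated by a single space.
        $temp_str .= ($temp_str === '' ? '' : ' ') . $temp;
    } elseif ($temp_str !== '') {
        // A blank line ends the current parse; repeated blank lines are skipped.
        $parses[] = $temp_str;
        $temp_str = '';
    }
}
// Keep the last parse when the file does not end with a blank line.
if ($temp_str !== '') {
    $parses[] = $temp_str;
}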
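
Since the question also asks about other approaches, here is a minimal sketch of one alternative: split the whole text on blank lines with a regular expression instead of looping line by line. It assumes the parses really are separated by one or more blank lines in the file. The function name split_parses and the sample string are made up for illustration; they are not part of the corpus or of any library.

```php
<?php
// Hypothetical helper: return one flattened string per parse, assuming
// parses are separated by one or more blank (whitespace-only) lines.
function split_parses(string $text): array
{
    // Split where a line break is followed by at least one blank line.
    $blocks = preg_split('/\R(?:[ \t]*\R)+/', trim($text));

    $parses = [];
    foreach ($blocks as $block) {
        // Collapse line breaks and indentation into single spaces.
        $flat = trim(preg_replace('/\s+/', ' ', $block));
        if ($flat !== '') {
            $parses[] = $flat;
        }
    }
    return $parses;
}

// Tiny made-up sample, not taken from the corpus file.
$sample = "( (NP (DT A)\n      (NN defense)))\n\n( (S (NP-SBJ (DT This))\n     (. .)))\n";
print_r(split_parses($sample));
```

For a real file, something like split_parses(file_get_contents("textfile.txt")) should work the same way, at the cost of reading the whole file into memory at once. If the file turns out to have no blank lines between parses, this will return everything as a single entry, so it is worth checking the count of the result.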
