Find the count of each line in another file


How can I get the number of times a particular line of one file is present in another file?

I have two files. The file rule.txt contains:

    NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
    VGF--->V_VM_VF

The other file, full.txt, contains 1000 rules of this type. I want to count how many times each rule in rule.txt occurs in full.txt, and print each line together with its count. The count is needed to calculate the probability of each rule. rule.txt contains the CFG rules of each sentence.

    fc= codecs.open('full.txt', encoding='utf-8') 
    with open('rule.txt', 'r') as fh:
        for line in fh.readlines():
          if(line in fc.readlines()):
                print line
                count=count+1
    print count

I have this code, but it is not working. I need to calculate the probability of each rule in rule.txt by checking it against full.txt, and for that I need the count of each rule individually. Can you please help me count the number of times each rule occurs in full.txt?

I assume your files are not huge and that you have enough memory:

This is file 1:

NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
VGF--->V_VM_VF
KGF--->V_VM_VF P_NSF SSF
VGF--->V_VM_VF KLF NFG_JP

This is file 2:

NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
VGF--->V_VM_VF
VGF--->V_VM_VF
VGF--->V_VM_VF
KGF--->V_VM_VF P_NSF SSF
KGF--->V_VM_VF P_NSF SSF
VGF--->V_VM_VF
VGF--->V_VM_VF
KGF--->V_VM_VF P_NSF SSF
KGF--->V_VM_VF P_NSF SSF
VGF--->V_VM_VF KLF NFG_JP
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
VGF--->V_VM_VF
VGF--->V_VM_VF KLF NFG_JP
VGF--->V_VM_VF KLF NFG_JP
VGF--->V_VM_VF
VGF--->V_VM_VF KLF NFG_JP
VGF--->V_VM_VF KLF NFG_JP
VGF--->V_VM_VF KLF NFG_JP
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU

Here is the code:

  #!/usr/bin/python
  # Collect the distinct rules (stripped of whitespace) from file 1.
  with open('txt1', 'r') as f1:
      lines1 = set(x.strip() for x in f1)
  # Count how many times each stripped line occurs in file 2.
  line_dict = dict()
  with open('txt2', 'r') as f2:
      for line in f2:
          line = line.strip()
          line_dict[line] = line_dict.get(line, 0) + 1
  # Report the count for each rule; rules absent from file 2 get 0.
  for line in lines1:
      print('%s : %d' % (line, line_dict.get(line, 0)))

Output:

VGF--->V_VM_VF : 7
VGF--->V_VM_VF KLF NFG_JP : 6
KGF--->V_VM_VF P_NSF SSF : 4
NP--->N_NNP N_NN_S_NU N_NNP N_NNP N_NN_O_NU : 8
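
Since the counts are meant for a probability calculation, the same idea can be sketched with `collections.Counter` plus a relative-frequency step. This is only a sketch: the helper names `count_rules` and `relative_frequencies` are hypothetical, and dividing by the total number of lines in the second file is an assumption. For a probabilistic CFG you would more likely divide by the number of rules that share the same left-hand side.

```python
from collections import Counter


def count_rules(rule_lines, corpus_lines):
    """Return {rule: occurrences in corpus_lines} for each distinct rule."""
    # Count every non-empty corpus line, ignoring surrounding whitespace.
    corpus_counts = Counter(
        line.strip() for line in corpus_lines if line.strip()
    )
    rules = [line.strip() for line in rule_lines if line.strip()]
    # A Counter returns 0 for rules that never appear in the corpus.
    return {rule: corpus_counts[rule] for rule in rules}


def relative_frequencies(counts, total):
    """Divide each count by total (assumed: number of corpus lines)."""
    return {
        rule: (float(n) / total if total else 0.0)
        for rule, n in counts.items()
    }


if __name__ == '__main__':
    # Small in-memory stand-ins for the two files; with real files,
    # pass the open file objects instead of these lists.
    rules = ['VGF--->V_VM_VF\n', 'KGF--->V_VM_VF P_NSF SSF\n']
    corpus = ['VGF--->V_VM_VF\n', 'VGF--->V_VM_VF\n',
              'KGF--->V_VM_VF P_NSF SSF\n', 'NP--->N_NNP\n']
    counts = count_rules(rules, corpus)
    probs = relative_frequencies(counts, len(corpus))
    for rule in counts:
        print('%s : %d : %.2f' % (rule, counts[rule], probs[rule]))
```

Both helpers accept any iterable of lines, so an open file object can be passed directly and the second file is read line by line rather than loaded with `readlines()`.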
