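I'm using the open-source CEDICT Chinese-English dictionary in my project. I keep it in a PostgreSQL database, modeled with ActiveRecord, and the model has no associations with any other model.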
When I run DictionaryEntry.count(:id, :group => :simplified, :having => 'count(id) > 1')
it shows 2141 duplicated entries. In these entries the simplified characters are identical, but the definitions differ.
For example, 放之四海而皆准 has two entries, each with a different definition:
irb(main):009:0> DictionaryEntry.find_all_by_simplified("放之四海而皆准")
DictionaryEntry Load (193.6ms) SELECT "dictionary_entries".* FROM "dictionary_entries" WHERE "dictionary_entries"."simplified" = '放之四海而皆准'
[
    [0] #<DictionaryEntry:0x007feb8750f9e0> {
                 :id => 42164,
        :traditional => "放之四海而皆准",
         :simplified => "放之四海而皆准",
             :pinyin => "fang4 zhi1 si4 hai3 er2 jie1 zhun3",
         :definition => "appropriate to any place and any time -idiom; universally applicable/a panacea",
         :created_at => Sat, 22 Dec 2012 03:07:44 UTC +00:00,
         :updated_at => Sat, 22 Dec 2012 03:07:44 UTC +00:00
    },
    [1] #<DictionaryEntry:0x007feb8750f378> {
                 :id => 42165,
        :traditional => "放之四海而皆準",
         :simplified => "放之四海而皆准",
             :pinyin => "fang4 zhi1 si4 hai3 er2 jie1 zhun3",
         :definition => "applicable anywhere -idiom",
         :created_at => Sat, 22 Dec 2012 03:07:44 UTC +00:00,
         :updated_at => Sat, 22 Dec 2012 03:07:44 UTC +00:00
    }
]
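I'd like to merge these two entries so that when I run DictionaryEntry.find_all_by_simplified("放之四海而皆准")
it returns a single object, with the deleted object's definition appended to the end after a /
like this: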
[
    [0] #<DictionaryEntry:0x007feb8750f9e0> {
                 :id => 42164,
        :traditional => "放之四海而皆准",
         :simplified => "放之四海而皆准",
             :pinyin => "fang4 zhi1 si4 hai3 er2 jie1 zhun3",
         :definition => "appropriate to any place and any time -idiom; universally applicable/a panacea/applicable anywhere -idiom",
         :created_at => Sat, 22 Dec 2012 03:07:44 UTC +00:00,
         :updated_at => Sat, 22 Dec 2012 03:07:44 UTC +00:00
    }
]
(I might also want to merge the pinyin if it happens to differ. Not sure yet... hmm... yes, I probably need to do that.)
Typing the following into the Rails console seemed to work for me:
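# Hash of simplified => count, for every simplified form with more than one row
dups = DictionaryEntry.count(:id, :group => :simplified, :having => 'count(id) > 1')
dups.each do |d|
  # d is a [simplified, count] pair
  twins = DictionaryEntry.find_all_by_simplified(d[0])
  # Merge only when the pinyin matches; only the first two rows are looked at
  if twins[0].pinyin == twins[1].pinyin
    twins[1].definition = twins[1].definition + "/" + twins[0].definition
    twins[1].save
    twins[0].destroy
  end
end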
I had to run it a few times.
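The repeated runs are probably needed because each pass only looks at twins[0] and twins[1], so a simplified form with three or more rows takes more than one pass. Here is a rough sketch of the merge rule in plain Ruby, outside of ActiveRecord, that handles a whole group in one pass. The Entry struct and merge_duplicates are made-up names for illustration, not part of the model above. It also assumes the row with the lowest id should survive (as in the desired output) and that rows with different pinyin stay separate.

```ruby
# Plain-Ruby stand-in for a dictionary row (hypothetical, not the ActiveRecord model).
Entry = Struct.new(:id, :traditional, :simplified, :pinyin, :definition)

# Merge rows that share both the simplified form and the pinyin.
# Returns [kept_entries, ids_to_delete].
# Assumption: the lowest id in each group is the one to keep.
def merge_duplicates(entries)
  kept = []
  removed_ids = []
  entries.group_by { |e| [e.simplified, e.pinyin] }.each_value do |group|
    sorted = group.sort_by(&:id)
    survivor, *rest = sorted
    merged = survivor.dup
    # Join all definitions with "/", dropping exact repeats.
    merged.definition = sorted.map(&:definition).uniq.join("/")
    kept << merged
    removed_ids.concat(rest.map(&:id))
  end
  [kept, removed_ids]
end

entries = [
  Entry.new(42164, "放之四海而皆准", "放之四海而皆准",
            "fang4 zhi1 si4 hai3 er2 jie1 zhun3",
            "appropriate to any place and any time -idiom; universally applicable/a panacea"),
  Entry.new(42165, "放之四海而皆準", "放之四海而皆准",
            "fang4 zhi1 si4 hai3 er2 jie1 zhun3",
            "applicable anywhere -idiom")
]

kept, removed_ids = merge_duplicates(entries)
puts kept.first.definition
p removed_ids
```

Against the real table, the same idea would mean updating the surviving row's definition and destroying the rows in removed_ids, ideally inside a transaction. Note that the survivor keeps its own traditional form, so the 準 variant from the removed row is lost unless it is stored somewhere else.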