Ruby regular expression to match substrings between a character pattern and a newline



I have data formatted like this, as a single string:

"1. Enloe Medical Center - 2,000 
2. CSU Chico - 1,805 
3. Walmart Distribution Center - 1,350 
4. Pacific Coast Producers (Agribusiness) - 1,200 
5. Marysville School District - 1,000 
6. Feather River Hospital - 865 
7. Sunsweet Growers (Agriculture) - 600 
8. YRC (Freight Services) - 500 
9. Sierra Pacific Industries (Lumber Products) - 500 
10. Colusa Casino Resort - 500"

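In a Ruby application, I want to create two arrays: one holding the substring between each numbered list marker and the dash, and one holding the number between the dash and the newline (as an integer), like this: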

labels = ["Enloe Medical Center","CSU Chico","Walmart Distribution Center","Pacific Coast Producers (Agribusiness)","Marysville School District","Feather River Hospital","Sunsweet Growers (Agriculture)","YRC (Freight Services)","Sierra Pacific Industries (Lumber Products)","Colusa Casino Resort"]
numbers = [2000, 1805, 1350, 1200, 1000, 865, 600, 500, 500, 500]

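I am not very good with regular expressions. I know how to do substitutions and matches, but I am not sure where to start here. Can anyone help?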

labels, numbers = string.scan(/^\s*\d+\.\s+(.+)\s+-\s+([\d,]+)\s*$/).transpose
numbers.map!{|s| s.gsub(",", "").to_i}
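
To make the scan-and-transpose approach above easier to try out, here is a minimal sketch run against a shortened copy of the question's data. The heredoc contents are an abbreviated sample chosen for illustration, and `String#delete` is swapped in for `gsub` as an equivalent way to drop the commas.

```ruby
# Abbreviated sample of the question's data (an assumption made for brevity).
string = <<~DATA
  1. Enloe Medical Center - 2,000
  2. CSU Chico - 1,805
  4. Pacific Coast Producers (Agribusiness) - 1,200
  6. Feather River Hospital - 865
  10. Colusa Casino Resort - 500
DATA

# Each match yields a [label, number] pair; transpose turns the list of
# pairs into one array of labels and one array of number strings.
labels, numbers = string.scan(/^\s*\d+\.\s+(.+)\s+-\s+([\d,]+)\s*$/).transpose

# Remove the thousands separators, then convert each string to an Integer.
numbers.map! { |s| s.delete(",").to_i }

p labels
p numbers  # => [2000, 1805, 1200, 865, 500]
```

The greedy `(.+)` backtracks to the last ` - ` on each line, which is why labels containing parentheses are captured whole.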

One thing is easy:

/pat/m - treats a newline as a character matched by .
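
A small sketch of what the `m` flag changes, using a made-up two-line string. In Ruby, `^` and `$` already match at line boundaries without any flag, so `m` only affects whether `.` can match a newline.

```ruby
text = "a\nb"  # hypothetical two-line input

# Without /m, the dot refuses to match the newline between "a" and "b".
without_m = text.match?(/a.b/)

# With /m, the dot matches the newline as well.
with_m = text.match?(/a.b/m)

p without_m  # => false
p with_m     # => true
```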

The other thing is grouping (see the example in part 2).

You write the regular expression for one line, and it works for the whole string:

r1 = /\d+,\d+\s*$/m
str.scan r1
["2,000 ", "1,805 ", "1,350 ", "1,200 ", "1,000 "]

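$ - matches the end of a line
\d - a digit
+ - one or more times
\s - whitespace (with *, zero or more times)

Since you already know how to do substitution, I have not converted the results to numbers.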
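
The pattern above requires a comma, which is why the output lists only the five values of 1,000 or more. As a hedged sketch, the variant below (an assumption, not part of the original answer) captures whatever digits and commas follow the dash and converts them to integers. The three-line `str` is a shortened sample.

```ruby
# Shortened sample with the same trailing spaces as the question's data.
str = "1. Enloe Medical Center - 2,000 \n6. Feather River Hospital - 865 \n10. Colusa Casino Resort - 500"

# The comma-only pattern finds just the first value.
with_comma = str.scan(/\d+,\d+\s*$/)

# Assumed variant: capture digits and commas after the dash, then convert.
all_numbers = str.scan(/-\s+([\d,]+)\s*$/).flatten.map { |s| s.delete(",").to_i }

p with_comma   # => ["2,000 "]
p all_numbers  # => [2000, 865, 500]
```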

r2 = /\d+\.\s*([\w\s]+)\s*-/m
str.scan(r2).flatten

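\d+ - matches a digit, one or more times
\. - matches a literal dot; you have to escape it, because an unescaped . matches any character
\s* - whitespace, zero or more times
[\w\s]+ - any word character or whitespace, one or more times
() - grouping; it is an easy way to say "I want this part, surrounded by that". More here: Ruby regex - capturing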
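
One caveat worth illustrating: the `[\w\s]+` class cannot match parentheses, so labels such as "Pacific Coast Producers (Agribusiness)" are skipped, and the class also swallows the space before the dash. The sketch below contrasts it with a lazy `(.+?)` variant, which is an assumption added here rather than part of the original answer. The names `strict` and `lenient` are illustrative only.

```ruby
# Shortened sample of the question's data.
str = "1. Enloe Medical Center - 2,000 \n4. Pacific Coast Producers (Agribusiness) - 1,200 \n10. Colusa Casino Resort - 500"

# Word characters and whitespace only: line 4 fails at "(" and the
# captures keep a trailing space.
strict = str.scan(/\d+\.\s*([\w\s]+)\s*-/).flatten

# Assumed variant: a lazy .+? accepts any characters and stops at the
# first " - " separator, leaving no trailing space in the capture.
lenient = str.scan(/^\d+\.\s+(.+?)\s+-\s/).flatten

p strict   # => ["Enloe Medical Center ", "Colusa Casino Resort "]
p lenient  # => ["Enloe Medical Center", "Pacific Coast Producers (Agribusiness)", "Colusa Casino Resort"]
```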

s = "1. Enloe Medical Center - 2,000 
 2. CSU Chico - 1,805 
 3. Walmart Distribution Center - 1,350 
 4. Pacific Coast Producers (Agribusiness) - 1,200 
 5. Marysville School District - 1,000 
 6. Feather River Hospital - 865 
 7. Sunsweet Growers (Agriculture) - 600 
 8. YRC (Freight Services) - 500 
 9. Sierra Pacific Industries (Lumber Products) - 500 
10. Colusa Casino Resort - 500"
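arr1 = s.each_line.map { |x|
  # Take everything after "- " and strip the non-digits (the result is still a String).
  x.match(/- (.*)/)[1].gsub(/[^0-9]/, '')
}
arr2 = s.each_line.map { |x|
  # Capture the label between the "N. " marker and " - ".
  x.match(/\d\. (.*) - (.*)/)[1]
}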
puts arr1
puts arr2
str = %{1. Enloe Medical Center - 2,000
2. CSU Chico - 1,805
3. Walmart Distribution Center - 1,350
4. Pacific Coast Producers (Agribusiness) - 1,200
5. Marysville School District - 1,000
6. Feather River Hospital - 865
7. Sunsweet Growers (Agriculture) - 600
8. YRC (Freight Services) - 500
9. Sierra Pacific Industries (Lumber Products) - 500
10. Colusa Casino Resort - 500}
numbers = str.scan(/- (\d.*)$/).flatten.map{|s| s.gsub(",", "").to_i} # => [2000, 1805, 1350, 1200, 1000, 865, 600, 500, 500, 500] # !> assigned but unused variable - numbers
labels = str.scan(/\d+\.\s(.*)\s-/).flatten # => ["Enloe Medical Center", "CSU Chico", "Walmart Distribution Center", "Pacific Coast Producers (Agribusiness)", "Marysville School District", "Feather River Hospital", "Sunsweet Growers (Agriculture)", "YRC (Freight Services)", "Sierra Pacific Industries (Lumber Products)", "Colusa Casino Resort"] # !> assigned but unused variable - labels

You can do it like this:

rawlines = <<EOF
1. Enloe Medical Center - 2,000 
2. CSU Chico - 1,805 
3. Walmart Distribution Center - 1,350 
4. Pacific Coast Producers (Agribusiness) - 1,200 
5. Marysville School District - 1,000 
6. Feather River Hospital - 865 
7. Sunsweet Growers (Agriculture) - 600 
8. YRC (Freight Services) - 500 
9. Sierra Pacific Industries (Lumber Products) - 500 
10. Colusa Casino Resort - 500
EOF
labels = []
numbers = []
rawlines.scan(/^[0-9]+\. ([^-]+) - ([1-9][0-9]{0,2}(?>,[0-9]{3})*)/) do |label, number|
  labels << label
  numbers << number.gsub(",", "").to_i  # the question asks for integers
end
puts labels
puts numbers

Note that this part of the pattern, ([1-9][0-9]{0,2}(?>,[0-9]{3})*), can be replaced with ([0-9,]+)
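
The two number patterns are interchangeable for this data, but they are not equivalent in general. The sketch below anchors each one with `\A` and `\z` and tests a few sample strings. The samples, the anchors, and the names `strict` and `lenient` are assumptions added for illustration.

```ruby
# The strict pattern enforces well-formed thousands groups; the lenient
# one accepts any run of digits and commas.
strict  = /\A[1-9][0-9]{0,2}(?>,[0-9]{3})*\z/
lenient = /\A[0-9,]+\z/

samples = ["2,000", "865", "1,805", "12,34", ",,"]

strict_hits  = samples.select { |s| strict.match?(s) }
lenient_hits = samples.select { |s| lenient.match?(s) }

p strict_hits   # => ["2,000", "865", "1,805"]
p lenient_hits  # => ["2,000", "865", "1,805", "12,34", ",,"]
```

The shorter pattern is fine when the input is known to be clean, as it is here. The longer one is the safer choice if malformed values such as "12,34" should be rejected.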
