使用正则表达式从SQL中提取国家数据



我得到了插入国家的SQL,我只是想保存一些列而不是SQL查询中的所有列。

我已经使用正则表达式清理了数据,但只面临一个小问题。如果值在同一行开始,正则表达式不匹配,我使用regex101.com来替换

在这里试试。


我的Regex

((([0-9]),(['a-zA-Z ]*),(['a-zA-Z ]*)).*

示例数据

(1,'Afghanistan','AFG','004','AF','93','Kabul','AFN','Afghan afghani','؋','.af','افغانستان','Asia','Southern Asia','[{"zoneName":"Asia/Kabul","gmtOffset":16200,"gmtOffsetName":"UTC+04:30","abbreviation":"AFT","tzName":"Afghanistan Time"}]','{"kr":"아프가니스탄","br":"Afeganistão","pt":"Afeganistão","nl":"Afghanistan","hr":"Afganistan","fa":"افغانستان","de":"Afghanistan","es":"Afganistán","fr":"Afghanistan","ja":"アフガニスタン","it":"Afghanistan","cn":"阿富汗","tr":"Afganistan"}',33.00000000,65.00000000,'🇦🇫','U+1F1E6 U+1F1EB','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q889'),
(2,'Aland Islands','ALA','248','AX','+358-18','Mariehamn','EUR','Euro','€','.ax','Åland','Europe','Northern Europe','[{"zoneName":"Europe/Mariehamn","gmtOffset":7200,"gmtOffsetName":"UTC+02:00","abbreviation":"EET","tzName":"Eastern European Time"}]','{"kr":"올란드 제도","br":"Ilhas de Aland","pt":"Ilhas de Aland","nl":"Ålandeilanden","hr":"Ålandski otoci","fa":"جزایر الند","de":"Åland","es":"Alandia","fr":"Åland","ja":"オーランド諸島","it":"Isole Aland","cn":"奥兰群岛","tr":"Åland Adalari"}',60.11666700,19.90000000,'🇦🇽','U+1F1E6 U+1F1FD','2018-07-21 07:11:03','2022-05-21 21:06:00',1,NULL),
(3,'Albania','ALB','008','AL','355','Tirana','ALL','Albanian lek','Lek','.al','Shqipëria','Europe','Southern Europe','[{"zoneName":"Europe/Tirane","gmtOffset":3600,"gmtOffsetName":"UTC+01:00","abbreviation":"CET","tzName":"Central European Time"}]','{"kr":"알바니아","br":"Albânia","pt":"Albânia","nl":"Albanië","hr":"Albanija","fa":"آلبانی","de":"Albanien","es":"Albania","fr":"Albanie","ja":"アルバニア","it":"Albania","cn":"阿尔巴尼亚","tr":"Arnavutluk"}',41.00000000,20.00000000,'🇦🇱','U+1F1E6 U+1F1F1','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q222'),(4,'Algeria','DZA','012','DZ','213','Algiers','DZD','Algerian dinar','دج','.dz','الجزائر','Africa','Northern Africa','[{"zoneName":"Africa/Algiers","gmtOffset":3600,"gmtOffsetName":"UTC+01:00","abbreviation":"CET","tzName":"Central European Time"}]','{"kr":"알제리","br":"Argélia","pt":"Argélia","nl":"Algerije","hr":"Alžir","fa":"الجزایر","de":"Algerien","es":"Argelia","fr":"Algérie","ja":"アルジェリア","it":"Algeria","cn":"阿尔及利亚","tr":"Cezayir"}',28.00000000,3.00000000,'🇩🇿','U+1F1E9 U+1F1FF','2018-07-21 07:11:03','2022-05-21 21:06:00',1,'Q262'),(5,'American Samoa','ASM','016','AS','+1-684','Pago Pago','USD','US Dollar','$','.as','American Samoa','Oceania','Polynesia','[{"zoneName":"Pacific/Pago_Pago","gmtOffset":-39600,"gmtOffsetName":"UTC-11:00","abbreviation":"SST","tzName":"Samoa Standard Time"}]','{"kr":"아메리칸사모아","br":"Samoa Americana","pt":"Samoa Americana","nl":"Amerikaans Samoa","hr":"Američka Samoa","fa":"ساموآی آمریکا","de":"Amerikanisch-Samoa","es":"Samoa Americana","fr":"Samoa américaines","ja":"アメリカ領サモア","it":"Samoa Americane","cn":"美属萨摩亚","tr":"Amerikan Samoasi"}',-14.33333333,-170.00000000,'🇦🇸','U+1F1E6 U+1F1F8','2018-07-21 07:11:03','2022-05-21 21:06:00',1,NULL),
<<p>

替换数据/strong>$1, 1, NOW(), NOW()),

预期输出

('Afghanistan','AFG', 1, NOW(), NOW()),
('Aland Islands','ALA', 1, NOW(), NOW()),
('Albania','ALB', 1, NOW(), NOW()),

我在您的regex尝试中注意到的模式分为匹配两组:

  • 第1组(已替换),包含圆括号、第一个数字和逗号
  • 第2组(保留),包含逗号、引号和另一组数字之前的任何字符,
  • 第3组(已替换),包含字符串的其余部分,除最后一行以);
  • 结尾外,每一行以),结尾

此描述转换为正则表达式模式如下:

((d+,)(.*?(?=,'d))(.*?(?:)([,;])))

组1Regex解释(d+,):

  • (: an左括号
  • d+:任意数字组合
  • ,:逗号

组2Regex解释(.*?(?=,'d)):

  • .*?:任意字符的组合(lazy - least possible)
  • (?=,'d):逗号前,后跟引号和数字(不包含在匹配中)

集团3Regex解释(.*?(?:)([,;]))):

  • .*?:任意字符的组合(lazy - least possible)
  • (?:)([,;])):在括号前,后面跟着逗号或冒号

($2, 1, NOW(), NOW())$4n

替换你的正则表达式匹配就足够了点击这里查看演示。

最新更新