How to extract text between patterns in a URL with awk/sed/python



I want to extract the plugin names and theme names from the URLs below:

http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1
http://example.com/wp-content/plugins/recent-tweets-widget/tp_twitter_plugin.css?ver=1.0
http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2
http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css
http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2

I tried awk and sed, but could not get the desired result.

sed

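Use this sed command:

 sed -E 's/.*(plugin|theme)s\/([^/]*)\/.*/\2/'

The -E option enables extended regular expressions, so the group parentheses and the alternation bar need no backslashes. The expression looks for plugins or themes followed by a slash (/). It then consumes a run of non-slash characters ([^/]*) followed by a slash. That run is captured in the second group ( ) and reinserted in the replacement as \2. Because the leading .* is greedy, the last such occurrence on the line is the one that matches. Lines that do not match are printed unchanged.

Example usage: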

$ cat file 
http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1
http://example.com/wp-content/plugins/recent-tweets-widget/tp_twitter_plugin.css?ver=1.0
http://example.com/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2
http://example.com/wp-content/plugins/js_composer/assets/css/vc-ie8.css
http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2
$ sed -E 's/.*(plugin|theme)s\/([^/]*)\/.*/\2/' file
contact-form-7
recent-tweets-widget
revslider
js_composer
themeforest-9412083-specular-responsive-multipurpose-business-theme
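
For comparison, here is a minimal sketch of the same pattern using Python's re module. The function name extract_name and the PATTERN constant are made up for this illustration. Unlike sed, it returns None for lines that do not match rather than passing them through.

```python
import re

# Same idea as the sed expression: find "plugins/" or "themes/" and
# capture the run of non-slash characters that follows, up to the next slash.
PATTERN = re.compile(r".*(plugin|theme)s/([^/]*)/")


def extract_name(url):
    """Return the plugin or theme directory name, or None if nothing matches."""
    match = PATTERN.match(url)
    return match.group(2) if match else None


urls = [
    "http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1",
    "http://example.com/wp-content/themes/themeforest-9412083-specular-responsive-multipurpose-business-theme/specular/style.css?ver=4.2.2",
]
for url in urls:
    print(extract_name(url))
```

Group 1 holds plugin or theme, so match.group(1) could be used if you also need to know which kind of component was found.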

awk

This is actually easier with awk: just set the field separator to a slash and print the sixth field.

awk -F '/' '{ print $6 }' file

This produces the same result as the sed command above.
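
To see why it is the sixth field, here is a small sketch that splits one of the URLs on slashes and numbers the pieces from 1, the way awk does. The variable names are only for illustration.

```python
url = "http://example.com/wp-content/plugins/contact-form-7/includes/css/styles.css?ver=4.2.1"

# awk numbers fields from 1, so enumerate from 1 to mirror $1, $2, ...
fields = url.split("/")
for number, field in enumerate(fields, start=1):
    print(f"${number} = {field!r}")
```

The empty string between the two slashes of http:// counts as field 2, which is why the name lands in field 6. It also means the approach assumes every URL has the same number of path segments before plugins or themes.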

A very simple Python approach

# Python 3: read the URLs line by line and print the sixth '/'-separated field
with open('urls.txt') as f:
    for url in f:
        print(url.split('/')[5])
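
If the URLs might not all have the same layout, a slightly more defensive sketch is to look for the plugins or themes path segment instead of counting on a fixed position. The function name find_component and the extra "blog" prefix in the example URL are assumptions for this illustration and are not part of the question.

```python
from urllib.parse import urlsplit


def find_component(url):
    """Return (kind, name) for a plugin or theme URL, or None if neither is found."""
    # urlsplit drops the scheme, host, and query string, leaving only the path.
    parts = urlsplit(url.strip()).path.split("/")
    for kind in ("plugins", "themes"):
        if kind in parts:
            index = parts.index(kind)
            # The segment right after "plugins" or "themes" is the name.
            if index + 1 < len(parts) and parts[index + 1]:
                return kind, parts[index + 1]
    return None


# Hypothetical URL with an extra "blog" segment; a fixed field index would miss it.
print(find_component(
    "http://example.com/blog/wp-content/plugins/revslider/rs-plugin/css/settings.css?rev=4.6.0&ver=4.2.2"
))
```

This trades the one-liner for a few more lines, but it keeps working when the site lives in a subdirectory.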
