I am writing a bash script that collects all the links from a Google results page. I fetch the page with the w3m utility and the following script:
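#!/bin/bash
# Performs a Google search for the word given as the first argument.
word=$1
touch .google
# Quote the variable so the test also works for an empty or multi-word value.
if [ -z "$word" ]
then
    echo "Search word missing!" >&2
    echo "Aborting..." >&2
    exit 1
fi
a="www.google.com/search?q="
search=$a$word
# Save the page as rendered by w3m to .google
w3m -no-cookie "$search" > .google
sleep 1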
Next, I need to extract all the websites from this page. My idea is to take every string that starts with www. and ends with /:
grep -wo "www[^/]*" .google > .temp
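The problem with this approach is that I miss many links that do not start with www, and if a site does not end with /, I risk breaking everything.
Is there a better way to get the URLs out of this response?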
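Link extraction is a hard problem. However, the lynx program has a handy -dump option that lets you skip most (or all) of the HTML parsing.
In particular, note the References section at the bottom. You can take the output starting from that line and strip the leading list numbers: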
$ lynx -dump 'http://www.seomoz.org/'
#[1]RSS 2.0 [2]publisher
[3]SEOmoz
* [4]Log in
* [5]Sign up
* [6]Help
+ [7]Help Resources
+ [8]Support Forums
+ [9]Request a Feature
+ [10]Contact Us
* [11]Features
* [12]Pricing & Plans
* [13]Community
+ [14]SEO Blog
+ [15]YOUmoz User Blog
+ [16]Top Users
+ [17]Events
+ [18]Recommended Companies
* [19]Resources
+ [20]Learn SEO
+ [21]SEO Tools
+ [22]PRO Q&A Forum
+ [23]Mozscape API
* [24]Blog
+ [25]SEO Blog
+ [26]YOUmoz User Blog
* [27]About
+ [28]Our TAGFEE Mission
+ [29]Meet the Mozzers
+ [30]Contact Us
+ [31]Join Our Team
+ [32]Press & Awards
+ [33]Events
Search SEOmoz
____________________ Search
SEO & Social Monitoring
Made Simple.
SEOmoz PRO combines SEO management, social media monitoring, actionable
recommendations, and so much more in one easy-to-use platform. Try it
free for 30 days.
[34]Try it for Free!
[35]Take a tour of SEOmoz PRO
or see [36]plans & pricing
* Campaign Overview
* Social Dashboard
* Crawl Diagnostics
* Dashboard
* Google Analytics
* Link Analysis
Loved By...
* Zillow
* Disney
* Overstock
* Best Buy
* Yelp
* Sun Microsystems
Roger Mozbot
Be My Buddy...
* [37]RSS
* [38]Twitter
* [39]Facebook
* [40]Google+
Effectively Manage Your SEO and Monitor Your Social Media
[41]Link Analysis
Analyze links and track key performance metrics in an efficient
all-in-one dashboard.
[42]Identify SEO Issues
Identify critical SEO issues and get actionable recommendations.
[43]Monitor Changes
Automatically monitor changes to your rankings and take control of your
organic traffic.
Avinash Kaushik
"SEOmoz tools provide best of class data. Their tools are a
must-have for marketers looking to optimize their organic search
results."
Avinash Kaushik,
Author, Web Analytics 1.0: An Hour A Day
Patrick Altoft
"SEOmoz has enabled us to scale our link-building process quickly
without compromising on quality."
Patrick Altoft,
CEO, Branded3
Latest from the SEOmoz Blog
__________________________________________________________________
[44]jennita
[45]Winners of #MozCation 2012
Posted by [46]jennita on 08/04/2012
Whoa. Ever have one of those times where your expectations are
completely blown out of the water? Well that's what happened during
this year's nomination for a MozCation. Wait, wait, wait, before I get
too far ahead of myself, I...
[47]Read Full Entry
13
2
[48]13 Comments
__________________________________________________________________
Latest from the Community YouMoz Blog
__________________________________________________________________
[49]larry.kim
[50]Does SEO Even Work for Small Businesses?
Posted by [51]larry.kim on 08/03/2012
Clicks on paid search listings beat out organic listings by nearly a
2:1 margin for keywords with high commercial intent in the US. Is SEO
still a viable marketing tactic for the average small business owner?
[52]Read Full Entry
17
3
[53]28 Comments
__________________________________________________________________
Voted Best SEO Tool 2010!
[54]Try it for Free!
Looking for SEO consulting?
SEOmoz doesn't provide consulting, but our friends at [55]Distilled
still do. Rock on!
Copyright ? 1996-2012 SEOmoz. All Rights Reserved.
Product and Tools
* [56]SEOmoz PRO
* [57]Pricing and Plans
* [58]Open Site Explorer
* [59]SEO Toolbar
* [60]Mozscape API
* [61]More SEO Tools
Company
* [62]About
* [63]SEO Blog
* [64]YOUmoz Blog
* [65]Affiliate Program
* [66]Terms & Privacy Policy
* [67]PRO Perks
Popular Content
* [68]Link Building
* [69]Reputation Management
* [70]Analytics
* [71]Social Media
* [72]Content & Blogging
* [73]See All Categories
Stay in Touch
*
+ [74]RSS
+ [75]Twitter
+ [76]Facebook
+ [77]LinkedIn
*
SEOmoz
119 Pine St. Suite 400
Seattle, WA 98101
206.632.3171
* [78]Contact Us
* [79]Sitemap
References
1. http://feeds.feedburner.com/seomoz
2. https://plus.google.com/112544075040456048636
3. http://www.seomoz.org/
4. https://www.seomoz.org/users/login
5. https://www.seomoz.org/users/register
6. http://www.seomoz.org/
7. http://www.seomoz.org/help
8. http://www.seomoz.org/q
9. http://seomoz.zendesk.com/forums/293194-seomoz-PRO-feature-requests
10. http://www.seomoz.org/about/contact
11. http://www.seomoz.org/features
12. http://www.seomoz.org/plans
13. http://www.seomoz.org/community
14. http://www.seomoz.org/blog
15. http://www.seomoz.org/ugc
16. http://www.seomoz.org/users
17. http://www.seomoz.org/about/events
18. http://www.seomoz.org/article/recommended
19. http://www.seomoz.org/resources
20. http://www.seomoz.org/learn-seo
21. http://www.seomoz.org/tools
22. http://www.seomoz.org/q
23. http://www.seomoz.org/api
24. http://www.seomoz.org/blog
25. http://www.seomoz.org/blog
26. http://www.seomoz.org/ugc
27. http://www.seomoz.org/about
28. http://www.seomoz.org/about/mission
29. http://www.seomoz.org/about/team
30. http://www.seomoz.org/about/contact
31. http://www.seomoz.org/about/jobs
32. http://www.seomoz.org/about/press
33. http://www.seomoz.org/about/seo-events
34. http://www.seomoz.org/cart/freetrial?pg=home
35. http://www.seomoz.org/features
36. http://www.seomoz.org/plans
37. http://feeds.feedburner.com/seomoz
38. http://twitter.com/seomoz
39. http://www.facebook.com/SEOmoz
40. https://plus.google.com/112544075040456048636?prsrc=3
41. http://www.seomoz.org/features
42. http://www.seomoz.org/features
43. http://www.seomoz.org/features
44. http://www.seomoz.org/users/profile/81197
45. http://www.seomoz.org/blog/winners-mozcation-2012
46. http://www.seomoz.org/users/profile/81197
47. http://www.seomoz.org/blog/winners-mozcation-2012
48. http://www.seomoz.org/blog/winners-mozcation-2012#comments
49. http://www.seomoz.org/users/profile/402613
50. http://www.seomoz.org/ugc/does-seo-even-work-for-small-businesses
51. http://www.seomoz.org/users/profile/402613
52. http://www.seomoz.org/ugc/does-seo-even-work-for-small-businesses
53. http://www.seomoz.org/ugc/does-seo-even-work-for-small-businesses#comments
54. http://www.seomoz.org/cart/freetrial?pg=features
55. http://www.seomoz.org/dp/distilled
56. http://www.seomoz.org/features
57. http://www.seomoz.org/plans
58. http://www.opensiteexplorer.org/
59. http://www.seomoz.org/seo-toolbar
60. http://www.seomoz.org/api
61. http://www.seomoz.org/tools
62. http://www.seomoz.org/about
63. http://www.seomoz.org/blog
64. http://www.seomoz.org/ugc
65. http://www.seomoz.org/dp/seomoz-pro-affiliate-program
66. http://www.seomoz.org/terms-and-privacy
67. http://www.seomoz.org/pro-perks
68. http://www.seomoz.org/blog/category/4
69. http://www.seomoz.org/blog/category/19
70. http://www.seomoz.org/blog/category/8
71. http://www.seomoz.org/blog/category/18
72. http://www.seomoz.org/blog/category/1
73. http://www.seomoz.org/blog
74. http://feeds.feedburner.com/seomoz
75. http://twitter.com/seomoz
76. http://www.facebook.com/SEOmoz
77. http://www.linkedin.com/groups?about=&gid=2976409&trk=anet_ug_grppro
78. http://www.seomoz.org/about/contact
79. http://www.seomoz.org/sitemap
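As a rough sketch of that post-processing step, the helper below reads `lynx -dump` output on stdin and prints only the URLs listed under References. The function name `extract_refs` is made up for illustration, and the sketch assumes the header line consists of the single word "References" and that each entry has the form "N. URL", as in the dump above.

```shell
#!/bin/bash
# Hypothetical helper (not part of lynx): print the URLs found in the
# "References" section of `lynx -dump` output read from stdin.
extract_refs() {
    # From the "References" header to the end of input, strip the leading
    # "  12. " numbering and print only the lines where that prefix was found.
    sed -n '/^[[:space:]]*References[[:space:]]*$/,$ s/^[[:space:]]*[0-9][0-9]*\.[[:space:]]*//p'
}
```

It could be used as `lynx -dump 'http://www.seomoz.org/' | extract_refs | sort -u`. A page whose body happens to contain a line reading just "References" would start the range too early, so treat this as a starting point. Depending on your lynx version, the `-listonly` option combined with `-dump` may already limit the output to the link list.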
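You may want to grep for <a href=" and take the value up to the next quotation mark, then filter out all the javascript content. This solution is probably not foolproof either, though.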
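A minimal sketch of that idea, assuming the input is raw HTML (not the rendered text that w3m prints by default) and that the attributes are written in lowercase with double quotes. The function name `extract_hrefs` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the href value of every <a ...> tag in the
# HTML read from stdin, skipping javascript: pseudo-links.
extract_hrefs() {
    # 1. Keep only the part of each anchor tag up to the end of its href value.
    # 2. Reduce that fragment to the text between the quotes.
    # 3. Drop entries that are javascript: links rather than real URLs.
    grep -o '<a [^>]*href="[^"]*"' \
        | sed 's/.*href="\([^"]*\)"$/\1/' \
        | grep -v -i '^javascript:'
}
```

For example, `curl -s 'http://www.seomoz.org/' | extract_hrefs` would list the links on that page. Single-quoted or unquoted attributes, uppercase tags, and tags split across lines are all missed, which is why a regex-based approach like this is not foolproof.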