How to use the "sed" shell command to obfuscate the information that follows specific characters



I'm trying to write a "sed" or "grep" shell command that obfuscates the information after "Scraped from" by replacing it with a "*".

For example, the sample file contains:

2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened
2016-12-09 18:57:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-09 18:57:32 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'tags': ['change', 'deep-thoughts', 'thinking', 'world'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'tags': ['abilities', 'choices'], 'author': 'J.K. Rowling'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'tags': ['aliteracy', 'books', 'classic', 'humor'], 'author': 'Jane Austen'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'tags': ['be-yourself', 'inspirational'], 'author': 'Marilyn Monroe'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'tags': ['adulthood', 'success', 'value'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>

The output should be:

2016-12-09 18:57:32 [scrapy.core.engine] INFO: Spider opened
2016-12-09 18:57:32 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2016-12-09 18:57:32 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“The world as we have created it is a process of our thinking. It cannot be changed without changing our thinking.”', 'tags': ['change', 'deep-thoughts', 'thinking', 'world'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“It is our choices, Harry, that show what we truly are, far more than our abilities.”', 'tags': ['abilities', 'choices'], 'author': 'J.K. Rowling'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“There are only two ways to live your life. One is as though nothing is a miracle. The other is as though everything is a miracle.”', 'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“The person, be it gentleman or lady, who has not pleasure in a good novel, must be intolerably stupid.”', 'tags': ['aliteracy', 'books', 'classic', 'humor'], 'author': 'Jane Austen'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': "“Imperfection is beauty, madness is genius and it's better to be absolutely ridiculous than absolutely boring.”", 'tags': ['be-yourself', 'inspirational'], 'author': 'Marilyn Monroe'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *
{'text': '“Try not to become a man of success. Rather become a man of value.”', 'tags': ['adulthood', 'success', 'value'], 'author': 'Albert Einstein'}
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from *

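I know you can use sed's 's/bla/bla/g' to make substitutions, but in my case I need to replace the information that follows a certain string, and I don't know how to do that.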

To obfuscate the information after "Scraped from" with a "*", just replace everything that follows "Scraped from" with a single *:

sed 's/Scraped from .*/Scraped from */'
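
Before running the command on a real log, you can test it by piping a couple of lines from the sample above through it. This is a minimal sketch that assumes a POSIX `sh` and any standard `sed`. The `log` and `masked` variable names are only illustrative.

```shell
#!/bin/sh
# Two lines copied from the sample log in the question.
log='2016-12-09 18:57:32 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/> (referer: None)
2016-12-09 18:57:32 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/>'

# ".*" is greedy, so it consumes everything after "Scraped from " up to the end
# of the line; lines that do not contain "Scraped from " pass through unchanged.
masked=$(printf '%s\n' "$log" | sed 's/Scraped from .*/Scraped from */')
printf '%s\n' "$masked"
```

Only the second line changes, and it now ends in `Scraped from *`. The pattern is anchored on the literal text `Scraped from `, so the JSON item lines between log entries are not touched.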

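Here is a solution that keeps the punctuation (or lack of punctuation) after the keyword from. It also assumes you only want this change after the keyword Scraped from, not after "any" from.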

sed -E 's/(Scraped from[:=]?).*/\1 */g' sample_file
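
You can test the capture-group version the same way. The three `Scraped from` variants below are made-up inputs that copy the log format. They show that `\1` writes back whatever punctuation was captured (none, `:` or `=`). This sketch assumes a `sed` that accepts `-E` for extended regular expressions, as GNU sed and BSD/macOS sed do.

```shell
#!/bin/sh
# Hypothetical input: the same message with "from", "from:" and "from=" variants,
# plus a line that must stay untouched (lowercase "scraped", no "from").
log='DEBUG: Scraped from <200 http://quotes.toscrape.com/>
DEBUG: Scraped from: <200 http://quotes.toscrape.com/>
DEBUG: Scraped from= <200 http://quotes.toscrape.com/>
INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)'

# With -E, (...) is a capture group without backslashes. [:=]? optionally
# captures one ":" or "=" right after "from"; \1 writes the captured prefix
# back and the rest of the line is replaced by " *".
masked=$(printf '%s\n' "$log" | sed -E 's/(Scraped from[:=]?).*/\1 */g')
printf '%s\n' "$masked"
```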

Handling two from clauses on the same line is a bit more complicated. Here is one way to do it.

Sample file (simplified):

cat sample_file
2016-12-09 [scrapy.core.engine] INFO: Spider opened
2016-12-09 [scrapy.logstats] INFO: Scraped 0 items (at 0 items/min)
2016-12-09 [scrapy.ext.telnet] DEBUG: Telnet listening on 127.0.0.1:6023
2016-12-09 [scrapy] DEBUG: Crawled (200) <GET http://quotes.com/> (ref: None)
2016-12-09 [scrapy] DEBUG: Scraped from= <200 http://quotes.toscrape.com/>
2016-12-09 [scrapy] DEBUG: Scraped from <200 http://quotes.toscrape.com/>
2016-12-09 [scrapy] DEBUG: Scraped from: <200 http://first/> and from: me.org
2016-12-09 [scrapy.core.scraper] DEBUG: Scraped from <3 http://toscrape.com/>

Solution and output:

sed -E 's/(Scraped from[:=]?) .*and from/\1 * and from/;
s/(Scraped( from[:=]? \* and)? from[:=]?).*$/\1 */' sample_file
2016-12-09 [scrapy.core.engine] INFO: Spider opened
2016-12-09 [scrapy.logstats] INFO: Scraped 0 items (at 0 items/min)
2016-12-09 [scrapy.ext.telnet] DEBUG: Telnet listening on 127.0.0.1:6023
2016-12-09 [scrapy] DEBUG: Crawled (200) <GET http://quotes.com/> (ref: None)
2016-12-09 [scrapy] DEBUG: Scraped from= *
2016-12-09 [scrapy] DEBUG: Scraped from *
2016-12-09 [scrapy] DEBUG: Scraped from: * and from: *
2016-12-09 [scrapy.core.scraper] DEBUG: Scraped from *
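
You can also wrap the two-step script in a small shell function and reuse it on any input stream. This is only a sketch. `mask_scraped_from` is a hypothetical name, and like the answer above, the script assumes a line has at most two `from` clauses. The script needs its backslashes (`\1`, `\*`) written out. The first expression masks the value between `Scraped from` and `and from`. The second masks whatever follows the last `from`, and its optional group matches the `* and` left by the first step, so that the first clause is kept in `\1` rather than being swallowed by `.*`.

```shell
#!/bin/sh
# Hypothetical helper: the two-step sed script, one -e expression per step.
mask_scraped_from() {
  # Step 1: "Scraped from[:=]? <value> and from" -> "Scraped from[:=]? * and from"
  # Step 2: mask everything after the last "from[:=]?"; "\*" matches the literal
  #         star inserted by step 1, so the first clause is kept inside \1.
  sed -E \
    -e 's/(Scraped from[:=]?) .*and from/\1 * and from/' \
    -e 's/(Scraped( from[:=]? \* and)? from[:=]?).*$/\1 */'
}

# Two lines from the simplified sample file: one with two "from" clauses, one with one.
input='2016-12-09 [scrapy] DEBUG: Scraped from: <200 http://first/> and from: me.org
2016-12-09 [scrapy] DEBUG: Scraped from <200 http://quotes.toscrape.com/>'

masked=$(printf '%s\n' "$input" | mask_scraped_from)
printf '%s\n' "$masked"
```

Both lines should come out as in the output listed above: `Scraped from: * and from: *` and `Scraped from *`.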
