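I am learning how to parse data and am trying to build templates that I can reuse later by changing only the arguments of the loops, functions, and methods the code needs.
So I searched the Twitter API for tweets related to a hashtag and got back a list of nested dictionaries. I then saved the scraped data to a txt file and tried to clean the text and convert it into a table or rows. My problem when trying to create the table is locating the headers: the first line of the txt file contains all the headers I need, but each header has a value next to it, and some of the values are key-value pairs inside a dictionary. Most tutorials use sample files whose first line is a clean header row with nothing in between. This is more complicated, I think, and I would be happy to learn how to do it.
Here is the data; sorry if it is messy. I cleaned it up in Notepad by starting each 'domain' record on a new line (I don't know how to do that in Python; it would be a plus to know). It starts with a square bracket, which indicates a list, and each item in the list has 2 key-value pairs whose values are dictionaries containing 2-3 kv pairs each.
All I need to do is turn the keys from the first line into headers, since the keys are the same on every line of the txt file, and then build a table from the headers and values.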
[{'domain': {'id': '46', 'name': 'Business Taxonomy', 'description': 'Categories within Brand Verticals that narrow down the scope of Brands'}, 'entity': {'id': '1557696848252391426', 'name': 'Financial Services Business', 'description': 'Brands, companies, advertisers and every non-person handle with the profit intent related to Banks, Credit cards, Insurance, Investments, Stocks '}},
{'domain': {'id': '46', 'name': 'Business Taxonomy', 'description': 'Categories within Brand Verticals that narrow down the scope of Brands'}, 'entity': {'id': '1557697333571112960', 'name': 'Technology Business', 'description': 'Brands, companies, advertisers and every non-person handle with the profit intent related to softwares, apps, communication equipments, hardwares'}},
{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '1007360414114435072', 'name': 'Bitcoin cryptocurrency', 'description': 'Bitcoin Cryptocurrency'}},
{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '1007361429752594432', 'name': 'Ethereum cryptocurrency', 'description': 'Ethereum Cryptocurrency'}},
{'domain': {'id': '47', 'name': 'Brand', 'description': 'Brands and Companies'}, 'entity': {'id': '1372588659346612225', 'name': 'Binance'}},
{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '857879456773357569', 'name': 'Technology', 'description': 'Technology'}},
{'domain': {'id': '66', 'name': 'Interests and Hobbies Category', 'description': 'A grouping of interests and hobbies entities, like Novelty Food or Destinations'}, 'entity': {'id': '913142676819648512', 'name': 'Cryptocurrencies', 'description': 'Cryptocurrency'}},
{'domain': {'id': '30', 'name': 'Entities [Entity Service]', 'description': 'Entity Service top level domain, every item that is in Entity Service should be in this domain'}, 'entity': {'id': '1001503516555337728', 'name': 'Blockchain', 'description': 'Blockchain'}},
{'domain': {'id': '66', 'name': 'Interests and Hobbies Category', 'description': 'A grouping of interests and hobbies entities, like Novelty Food or Destinations'}, 'entity': {'id': '1369311988040355840', 'name': 'NFTs', 'description': 'Non-fungible tokens'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '781974596148793345', 'name': 'Business & finance'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '781974596794716162', 'name': 'Financial services'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '847894353708068864', 'name': 'Investing', 'description': 'Investing'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '848920371311001600', 'name': 'Technology', 'description': 'Technology and computing'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '913142676819648512', 'name': 'Cryptocurrencies', 'description': 'Cryptocurrency'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1007360414114435072', 'name': 'Bitcoin cryptocurrency', 'description': 'Bitcoin Cryptocurrency'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1007361429752594432', 'name': 'Ethereum cryptocurrency', 'description': 'Ethereum Cryptocurrency'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1369311988040355840', 'name': 'NFTs', 'description': 'Non-fungible tokens'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1390680741206368263', 'name': 'Cryptocurrency exchanges'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1478776259068907541', 'name': 'Cryptotokens'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1484181943616884743', 'name': 'Cryptocoins'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1486271512655003652', 'name': 'Web3'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1491481998862348291', 'name': 'Digital asset industry'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1492162686204854274', 'name': 'Digital assets & cryptocurrency', 'description': 'Cryptocurrency'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1521397643909365760', 'name': 'NFT development'}},
{'domain': {'id': '131', 'name': 'Unified Twitter Taxonomy', 'description': 'A taxonomy of user interests. '}, 'entity': {'id': '1536439027636678656', 'name': 'Decentralized finance'}},
{'domain': {'id': '174', 'name': 'Digital Assets & Crypto', 'description': 'For cryptocurrency entities'}, 'entity': {'id': '1007360414114435072', 'name': 'Bitcoin cryptocurrency', 'description': 'Bitcoin Cryptocurrency'}},
{'domain': {'id': '174', 'name': 'Digital Assets & Crypto', 'description': 'For cryptocurrency entities'}, 'entity': {'id': '1007361429752594432', 'name': 'Ethereum cryptocurrency', 'description': 'Ethereum Cryptocurrency'}},
{'domain': {'id': '174', 'name': 'Digital Assets & Crypto', 'description': 'For cryptocurrency entities'}, 'entity': {'id': '1478776259068907541', 'name': 'Cryptotokens'}}]
I tried this code, but the headers cannot be located this way.
import json
import re
import os
from tabulate import tabulate

file = open('binance_hash_tweets_micro.txt', 'r+')
read = file.readlines()
file.close()

modified = []  # empty list that collects the lines of the file
for row in read:
    modified.append(row)
print(modified)

header = modified.pop(0)  # first line of the file, still one plain string

def fixed_length(text, length):
    # truncate or pad text with spaces to exactly `length` characters
    if len(text) > length:
        text = text[:length]
    elif len(text) < length:
        text = (text + " " * length)[:length]
    return text

for column in header:  # iterates over the single characters of that string
    print(fixed_length(column, 20), end=" ")
print()
If anyone can help, I would appreciate it. :)
You don't need to parse it yourself; use ast.literal_eval() to parse it:
import ast

with open('binance_hash_tweets_micro.txt', 'r') as f:
    binance_list = ast.literal_eval(f.read())

# build the header from the nested keys of the first record
first = binance_list[0]
header = ['domain_' + key for key in first['domain']] + ['entity_' + key for key in first['entity']]
print(header)
This prints:
['domain_id', 'domain_name', 'domain_description', 'entity_id', 'entity_name', 'entity_description']
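To go from the header to the full table, one possible continuation is sketched below. It is not part of the code above: the helper names flatten, build_table and format_table are made up for illustration, and the two sample records are copied from the question's data rather than read from the file. The header is collected from every record instead of only the first one, because some 'entity' dictionaries in the data have no 'description' key.

```python
import ast

# Two records copied from the question's data; the second one has no
# 'description' under 'entity' (like the Binance row).
sample_text = """[{'domain': {'id': '174', 'name': 'Digital Assets & Crypto', 'description': 'For cryptocurrency entities'}, 'entity': {'id': '1007360414114435072', 'name': 'Bitcoin cryptocurrency', 'description': 'Bitcoin Cryptocurrency'}},
{'domain': {'id': '47', 'name': 'Brand', 'description': 'Brands and Companies'}, 'entity': {'id': '1372588659346612225', 'name': 'Binance'}}]"""


def flatten(record):
    """Turn {'domain': {...}, 'entity': {...}} into {'domain_id': ..., ...}."""
    return {
        f"{outer}_{inner}": value
        for outer, nested in record.items()
        for inner, value in nested.items()
    }


def build_table(records):
    """Return (header, rows); keys missing from a record become ''."""
    flat = [flatten(record) for record in records]
    header = []
    for row in flat:
        for key in row:
            if key not in header:  # keep first-seen column order
                header.append(key)
    rows = [[row.get(col, "") for col in header] for row in flat]
    return header, rows


def format_table(header, rows, width=20):
    """Render header and rows as fixed-width, space-separated columns."""
    lines = []
    for cells in [header] + rows:
        lines.append(" ".join(str(cell)[:width].ljust(width) for cell in cells))
    return "\n".join(lines)


records = ast.literal_eval(sample_text)
header, rows = build_table(records)
print(format_table(header, rows))
```

With the real file, records would come from ast.literal_eval(f.read()) as in the code above. If the tabulate package imported in the question is installed, tabulate(rows, headers=header) should give a similar grid without the hand-written formatting.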