How to keep only ASCII and drop non-ASCII characters, nbsp, etc. when calling json.dumps

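I read a CSV file with csv.reader and then use dictionaries to convert it into a JSON file.
In doing so, I want only letters and digits, with no non-ASCII characters or nbsp. This is what I am trying: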

import csv
import json

with open('/file', 'rb') as file_Read:
    reader = csv.reader(file_Read)
    lis = []
    di = {}
    for r in reader:
        di = {r[0].strip(): [some_val]}
        lis.append(di)
with open('/file1', 'wb') as file_Dumped:
    list_to_be_written = json.dumps(lis)
    file_Dumped.write(list_to_be_written)

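When I read the file back, the output contains sequences such as \xa0\xa0\xa0\xa0 along with the keys.
Ex - {"name \xa0\xa0\xa0\xa0":[9]}
If I use json.dumps(lis, ensure_ascii=False) instead, I see blank space around the keys.
Ex - {"name ":[9]}
How can I remove everything except letters and digits entirely?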

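If the whitespace is only at the ends of the line, you can use .strip(). If you need to keep the spaces between ASCII characters, you can use something like the following, which removes double spaces first: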

my_string.replace('  ', '').strip()
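
In Python 3, str.split() with no arguments splits on any Unicode whitespace, including the non-breaking space (\xa0), so joining the pieces back together is another way to normalize spacing. A minimal sketch, assuming Python 3; normalize_spaces is a hypothetical helper name, not something from the question:

```python
def normalize_spaces(text):
    # split() with no arguments treats NBSP (\xa0) as whitespace in Python 3,
    # so runs of spaces and NBSPs collapse into single ASCII spaces.
    return ' '.join(text.split())

print(normalize_spaces('first  name \xa0\xa0\xa0\xa0'))
```

This keeps a single space between words, which may or may not be what you want for dictionary keys.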

To remove non-ASCII characters, try the following:

my_string = 'name  \xa0\xa0\xa0\xa0'
# In Python 3, encode() returns bytes, so decode back to str before stripping
my_string.encode('ascii', 'ignore').decode('ascii').strip()
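
Note that encode('ascii', 'ignore') still keeps ASCII spaces and punctuation. To keep only letters and digits, as the question asks, a regular expression is one option. The following is a sketch assuming Python 3; clean_key is a hypothetical helper, and the sample rows stand in for whatever csv.reader yields:

```python
import json
import re

def clean_key(text):
    # Remove every character that is not an ASCII letter or digit.
    return re.sub(r'[^A-Za-z0-9]', '', text)

rows = [['name \xa0\xa0\xa0\xa0', 9]]  # stand-in for rows read from the CSV file
lis = [{clean_key(r[0]): [r[1]]} for r in rows]
dumped = json.dumps(lis)
print(dumped)
```

Be aware that this also removes spaces inside a key, so "first name" would become "firstname".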

You can try this:

import pandas as pd
import json
# Read the csv file using pandas
df = pd.read_csv("YourInputCSVFile")
# Convert all column types to str in order to remove non-ascii characters
df = df.astype(str)
# Iterate over all columns in order to remove non-ascii characters
for column in df:
    df[column] = df[column].apply(lambda x: ''.join([" " if ord(i) < 32 or ord(i) > 126 else i for i in x]))
# Convert the dataframe to dictionary for json conversion
df_dict = df.to_dict()
# Save the dictionary contents to a json file
with open('data.json', 'w') as fp:
    json.dump(df_dict, fp)
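
If pandas is not available, the same printable-ASCII filter can probably be applied with the standard library alone. This is only a sketch assuming Python 3; to_printable_ascii is a hypothetical helper, and the io.StringIO object stands in for a real file opened with open(path, newline='', encoding='utf-8'):

```python
import csv
import io
import json

def to_printable_ascii(text):
    # Same rule as the lambda above: anything outside codes 32..126 becomes a space.
    return ''.join(ch if 32 <= ord(ch) <= 126 else ' ' for ch in text)

# Stand-in for an open CSV file containing non-breaking spaces.
sample = io.StringIO('name\xa0\xa0,value\nfoo\xa0,9\n')
reader = csv.reader(sample)
# Replace non-printable characters, then strip the resulting edge spaces.
cleaned = [[to_printable_ascii(cell).strip() for cell in row] for row in reader]
result = json.dumps(cleaned)
print(result)
```

Because every remaining character is plain ASCII, json.dumps produces no \u escapes for these values.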
