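I'm writing a script that makes API calls to a vendor. The initial call returns a JSON list of URIs to fetch data from. When I connect to one of those URIs and retrieve the data, what comes back is not JSON but comma-separated text. I can write it to a CSV file without any problem.

What I want to do is write it directly to my database, and that is where the problem lies. Rows are delimited by \n, fields are separated by commas, and fields are sometimes enclosed in double quotes and sometimes not. To complicate matters, some of the double-quoted fields contain commas.

I need to be able to grab the headers (which I've figured out) so that I can use them as field names when writing to the database (the vendor likes to change the order and occasionally leaves fields out). I can't just dump the data into a table, because there may be new, missing, or out-of-order fields. I've tried a number of methods, but none of them split this string correctly.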
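One possible way to cope with headers that move around, sketched under assumptions: let csv.DictReader key every row by the header line, then insert only the columns the target table actually has. The function name load_csv_text, the table name in the usage note, and the choice of sqlite3 are illustrative assumptions and not details from the post; another DB-API driver could stand in with its own placeholder style.

```python
import csv
import io
import sqlite3


def load_csv_text(conn, table, text):
    """Insert CSV rows into `table`, matching columns by header name.

    `table` is assumed to be a trusted, hard-coded name (hypothetical here).
    Columns present in the CSV but not in the table are ignored; columns
    missing from the CSV are simply left out of the INSERT.
    """
    # Ask SQLite which columns the table really has (field 1 is the name).
    known = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]

    # skipinitialspace=True tolerates ', "quoted"' style separators.
    reader = csv.DictReader(io.StringIO(text, newline=""), skipinitialspace=True)

    inserted = 0
    for row in reader:
        # Keep only header names that are real columns, in table order.
        cols = [c for c in known if c in row]
        if not cols:
            continue
        col_list = ", ".join(f'"{c}"' for c in cols)
        placeholders = ", ".join("?" for _ in cols)
        conn.execute(
            f'INSERT INTO "{table}" ({col_list}) VALUES ({placeholders})',
            [row[c] for c in cols],
        )
        inserted += 1
    conn.commit()
    return inserted
```

Called as load_csv_text(conn, "vendor_data", data), this would tolerate reordered, extra, or missing vendor fields, because values are bound by column name rather than by position.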

Here's an example of one row from the dataset: "July Test", "", 'nothing to see here', "1043 E Main, Dallas, TX 40565", more random crap

What I need is: "July Test", "", "nothing to see here", "1043 E Main, Dallas, TX 40565", "more random crap"
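As a rough sketch of how the sample row above could be turned into the desired values with the standard csv module: skipinitialspace=True makes the parser ignore the space after each comma, so the double quote that follows is recognised as an opening quote. The helper name parse_line is hypothetical, and stripping a surrounding pair of single quotes is an assumption about what the vendor data needs.

```python
import csv


def parse_line(line):
    """Split one vendor line into clean field values (hypothetical helper)."""
    # csv.reader expects an iterable of lines, so wrap the single line in a list.
    fields = next(csv.reader([line], skipinitialspace=True))

    cleaned = []
    for field in fields:
        field = field.strip()
        # Double quotes are removed by the csv module itself; single quotes
        # are not a quote character for it, so drop a surrounding pair here.
        if len(field) >= 2 and field[0] == field[-1] == "'":
            field = field[1:-1]
        cleaned.append(field)
    return cleaned
```

One limitation of this sketch: a single-quoted field that itself contains a comma would still be split, because only the double quote is treated as a quoting character.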

Here is my HTTP call and how I process what comes back. Maybe I should be doing this differently? I've commented out everything I tried that failed.

# Get the URL of the latest file, open the connection, and export the data
from urllib.request import Request, urlopen

site = str(x["full_csv_url"])
hdr = {'User-Agent': 'Mozilla/5.0'}
req = Request(site, headers=hdr)
req.add_header('Authorization', token)
with urlopen(req) as resp:
    data = resp.read().decode('utf-8')

import csv

try:
    #for i in data.split('\n'):
    #    list = print([i])
    list_of_lines = data.splitlines(True)

    new_split_data = []

    for i in range(1, 2):   #nlines
        ith_line = str(list_of_lines[i])
        ith_line = ith_line.replace("\n", "")
        ith_line = ith_line.replace("\r", "")

        # Attempt: split a python-tokenizable expression on comma operators
        #compos = [-1] # compos stores the positions of the relevant commas in the argument string
        #compos.extend(t[2][1] for t in generate_tokens(StringIO(ith_line).readline) if t[1] == ',')
        #compos.append(len(ith_line))
        #new_ith_line = [ ith_line[compos[i]+1:compos[i+1]] for i in xrange(len(compos)-1)]

        #for i in new_ith_line:
        #    print[i]
        print(ith_line)
        print("New Line")
        print("New Line")
        #new_ith_line = re.split(r', (?=(?:"[^"]*?(?: [^"]*)*))|, (?=[^",]+(?:,|$))', ith_line)
        new_ith_line = list(csv.reader(ith_line, delimiter=','))
        #new_ith_line = re.split(r',(?=")', ith_line)
        #new_ith_line = new_ith_line.replace("'\"", "'")
        #new_ith_line = new_ith_line.replace("\"'", "'")
        print(new_ith_line)
        ## Didn't work: split fields with commas between double quotes
        ##newstr = ith_line.split(",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)")

        # Didn't work, only returned the first 2 columns
        #print(pp.commaSeparatedList.parseString(ith_line).asList())

        # Didn't work, returned an error
        #newStr = [ '"{}"'.format(x) for x in list(csv.reader([ith_line], delimiter=',', quotechar='"'))[0] ]
        #print(newStr)

        #print(ith_line)
        #each_line = data.body.getText().partition("\n")[i]
# ... except clause omitted
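A note on the active csv.reader attempt above: csv.reader iterates over whatever it is given, and iterating over a plain string yields one character at a time, so each character ends up parsed as its own line. A minimal sketch of an alternative, assuming the response body is UTF-8 CSV with a header line, is to hand the response object itself to the csv module. The function name rows_from_response is hypothetical.

```python
import csv
import io


def rows_from_response(resp, encoding="utf-8"):
    """Return (header, rows) parsed from a binary file-like HTTP response.

    `resp` can be the object returned by urllib.request.urlopen, or any
    other binary stream such as io.BytesIO.
    """
    # Decode lazily; newline="" leaves line-ending handling to the csv module.
    text = io.TextIOWrapper(resp, encoding=encoding, newline="")
    reader = csv.reader(text, skipinitialspace=True)

    header = next(reader, None)
    if header is None:
        return [], []

    # Skip completely blank lines, which csv.reader reports as empty lists.
    rows = [row for row in reader if row]
    return header, rows
```

With this shape there is no need to split the payload into lines or strip \n and \r by hand, since the csv module takes care of line endings and quoted commas.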

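I managed to find a regex that, with a small tweak, fits my situation.

This code: new_list = re.findall(r'(?:[^,"]|"(?:\\.|[^"])*")+', list)

gave me: "July Test", "", "nothing to see here", "1043 E Main, Dallas, TX 40565", "more random crap"

I was then able to build a list and load it into the database.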
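For reference, here is a small sketch of the regex approach with a cleanup step added. The matches returned by re.findall keep their surrounding quotes and any leading space, so the sketch trims both. The names FIELD_RE and split_fields are hypothetical, and the quote-stripping rule is an assumption about the data.

```python
import re

# Same pattern as above: a run of characters that are neither a comma nor a
# double quote, or a complete double-quoted string (which may contain commas).
FIELD_RE = re.compile(r'(?:[^,"]|"(?:\\.|[^"])*")+')


def split_fields(line):
    """Split a line with the regex, then trim spaces and matching quotes."""
    fields = []
    for raw in FIELD_RE.findall(line):
        value = raw.strip()
        # Drop one surrounding pair of identical quotes (double or single).
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        fields.append(value)
    return fields
```

One caveat with this pattern: a field that is completely empty with no quotes, as in a,,b, produces no match at all, so later values shift left. If the vendor ever sends such rows, the csv module is probably the safer choice.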
