An efficient way to iterate through a CSV file and make an API call for each row in Python



I've created a script that reads a CSV file and fires an API call for each row. It works, but my concern is whether I'll run into memory problems if the file has more than 1M rows.

import json
import requests
import csv
import time
"""
PURPOSE:
This is a script designed to:
1. read through a CSV file
2. loop through each row of the CSV file
3. build and trigger an API request to the registerDeviceToken endpoint
   using the contents of each row of the CSV file
INSTRUCTIONS:
1. Create a CSV file with columns in the following order (left to right):
   1. email
   2. applicationName (i.e. your bundle ID or package name)
   3. platform (i.e. APNS, APNS_SANDBOX, GCM)
   4. token (i.e. device token)
2. Save the CSV file and make note of the full 'filepath'
3. Define the required constant variables below
4. Run the python script
Note: If your CSV file does not contain column headings, then set
contains_headers to 'False'
"""

# Define constant variables
api_endpoint = '<Insert API Endpoint>'
# Update per user specifications
file_location = '/Users/bob/Development/Python/token.csv'  # Add location of CSV file
api_key = '<Insert API Key: Type server-side>'  # Add your API key
contains_headers = True  # Set to True if file contains column headers

def main():
    # Open and read CSV file
    with open(file_location) as x:
        reader = csv.reader(x)
        if contains_headers:
            next(reader)  # Skip the first row if file contains column headers
        counter = 0  # This counter is used to monitor the rate limit

        # Loop through each row
        for row in reader:

            jsonBody = {}
            device = {}
            # Create JSON body for API request
            device['applicationName'] = row[1]
            device['platform'] = row[2]
            device['token'] = row[3]
            device['dataFields'] = {'endpointEnabled': True}
            jsonBody['email'] = row[0]
            jsonBody['device'] = device

            # Create API request
            destinationHeaders = {
                'api_key': api_key,
                'Content-Type': 'application/json'
            }
            r = requests.post(api_endpoint, headers=destinationHeaders, json=jsonBody)
            print(r)
            data = json.loads(r.text)

            # Print successes/errors to console
            msg = 'user %s token %s' % (row[0], row[3])
            if r.status_code == 200:
                try:
                    msg = 'Success - %s. %s' % (msg, data['msg'])
                except Exception:
                    continue
            else:
                msg = 'Failure - %s. Code: %s, Details: %s' % (msg, r.status_code, data['msg'])
            print(msg)
            # Add delay to avoid rate limit
            counter = counter + 1
            if counter == 400:
                time.sleep(2)
                counter = 0

if __name__ == '__main__':
    main()
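
For what it's worth, csv.reader pulls rows from the open file handle lazily, one at a time, so the loop above already runs in roughly constant memory regardless of row count. Here is a minimal sketch of the same streaming read using csv.DictReader instead, assuming the header row names the columns email, applicationName, platform, and token as described in the INSTRUCTIONS above:

import csv

with open(file_location, newline='') as f:
    # DictReader also reads one row at a time; each row is a dict
    # keyed by the header names, so no positional indexing is needed
    for row in csv.DictReader(f):
        print(row['email'], row['applicationName'], row['platform'], row['token'])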

I've read about using Pandas with chunking as an option, but working with a DataFrame isn't intuitive to me, and I don't know how to parse each row of a chunk the way the example above does. A couple of questions:

  1. Will my current approach run into memory issues if the file has more than a million rows? Each CSV should only have 4 columns, if that helps.
  2. Would Pandas chunking be more efficient? If so, how do I iterate over each row of a "csv chunk" to build my API request, just like the example above?

In my poor attempt at chunking the file, printing "row" in this code:

for chunk in pd.read_csv(file_location, chunksize=chunk_size):
    for row in chunk:
        print(row)

returns

email
device
applicationName
platform
token

So I'm quite confused. Thanks in advance for the help.
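
A likely explanation for that output: iterating directly over a pandas DataFrame iterates its column labels, not its rows, which is why the column names print out. A hedged sketch of iterating the rows of each chunk with itertuples(), assuming the CSV's header row matches the column layout described above (chunk_size is an illustrative value):

import pandas as pd

chunk_size = 1000  # illustrative value; tune to taste
for chunk in pd.read_csv(file_location, chunksize=chunk_size):
    # itertuples() yields one namedtuple per row, with fields
    # named after the column headers
    for row in chunk.itertuples(index=False):
        device = {
            'applicationName': row.applicationName,
            'platform': row.platform,
            'token': row.token,
            'dataFields': {'endpointEnabled': True},
        }
        jsonBody = {'email': row.email, 'device': device}
        # ... same requests.post call as in the script above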

Look into Python generators. A generator is a kind of iterator that does not store all of its values in memory:

def read_file_generator(file_name):
    with open(file_name) as csv_file:
        for row in csv_file:
            yield row

def main():
    for row in read_file_generator("my_file.csv"):
        print(row)

if __name__ == '__main__':
    main()
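
A hedged refinement of the same idea: since csv.reader is itself lazy, the generator can yield parsed field lists instead of raw text lines while keeping the constant-memory behavior (read_csv_generator is an illustrative name, not part of the answer above):

import csv

def read_csv_generator(file_name):
    # Both the file handle and csv.reader advance one row at a time,
    # so only the current row is ever held in memory
    with open(file_name, newline='') as csv_file:
        yield from csv.reader(csv_file)

def main():
    for row in read_csv_generator("my_file.csv"):
        print(row)  # row is a list of fields, e.g. [email, applicationName, platform, token]

if __name__ == '__main__':
    main()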