How to automatically generate lines of code in a Python script

I have a Python file named test.py. In this file I run some PySpark commands.

#!/usr/bin/env python
import sys
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext
conf = SparkConf()
sc = SparkContext(conf=conf)
sqlContext = HiveContext(sc)
# create a data frame from hive tables
df=sqlContext.table("testing.test")
# register the data frame as temp table
df.registerTempTable('mytempTable')
# find number of records in data frame
records = df.count()
print("records='%s'" % records)

if records < 1000000:
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable".format(hivedb,table))
else:
    sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable where id <= 1000000".format(hivedb,table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 1000000 and id <= 2000000".format(hivedb,table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 2000000 and id <= 3000000".format(hivedb,table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 3000000 and id <= 4000000".format(hivedb,table))
    sqlContext.sql("insert into table {}.{} select * from mytempTable where id > 4000000 and id <= 5000000".format(hivedb,table))
    # ... and so on till the last million of records

The statements after `else` in the if-else block are ones I wrote out by hand.

I want the script to generate this part of the code automatically.

How can I generate similar lines of code up to the last million of `records`?

You can use a simple loop. Your manual version works in steps of one million, so step the loop by 1,000,000 and stop at `records` (the original answer stepped by 100,000 and stopped at a fixed 1,000,000, which does not match the question):

# the create-table statement covers the first million
sqlContext.sql("create table {}.{} stored as parquet as select * from mytempTable where id <= 1000000".format(hivedb, table))

fmt = "insert into table {hivedb}.{table} select * from mytempTable where id > {low} and id <= {hi}"
for low in range(1000000, records, 1000000):
    stmt = fmt.format(low=low, hi=low + 1000000, hivedb=hivedb, table=table)
    sqlContext.sql(stmt)
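The statement generation itself can be checked without a Spark cluster: build the SQL strings first, print (or inspect) them, and only then feed them to `sqlContext.sql`. A minimal sketch, assuming a helper name `build_statements` of my own choosing:

```python
def build_statements(hivedb, table, records, chunk=1000000):
    """Return the CREATE TABLE statement for the first chunk of ids,
    followed by one INSERT per remaining chunk up to `records`."""
    stmts = [
        "create table {}.{} stored as parquet as select * from mytempTable "
        "where id <= {}".format(hivedb, table, chunk)
    ]
    for low in range(chunk, records, chunk):
        stmts.append(
            "insert into table {}.{} select * from mytempTable "
            "where id > {} and id <= {}".format(hivedb, table, low, low + chunk)
        )
    return stmts

# Inspect what would be executed for 3.5 million records:
for s in build_statements("testing", "test_copy", 3500000):
    print(s)
```

Note that when `records` is below one million the loop body never runs, so only the CREATE TABLE statement is emitted, matching the `if` branch of the original script. In the real script you would replace the `print` with `sqlContext.sql(s)`.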