How do I create documents and a collection in MongoDB to configure Python code, and then fetch the attribute name, data type, and function to call from MongoDB?
Sample MongoDB collection:
db.attributes.insertMany([
  { attributes_names: "email", attributes_datype: "string", attributes_isNull: "false", attributes_std_function: "email_valid" },
  { attributes_names: "address", attributes_datype: "string", attributes_isNull: "false", attributes_std_function: "address_valid" }
]);
Python script and function:
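from pyspark.sql.functions import expr, lower, regexp_replace

def email_valid(df):
    # Lowercase the first column and remove whatever the pattern matches
    df1 = df.withColumn(df.columns[0], regexp_replace(lower(df.columns[0]), "^a-zA-Z0-9@._-| ", ""))
    # Extract every email-like substring from the emails column
    extract_expr = expr(
        r"regexp_extract_all(emails, '(\w+([\.-]?\w+)*@[A-Za-z-.]+([\.-]?\w+)*(\.\w{2,3})+)', 0)")
    # Parentheses let the method chain span two lines
    df2 = (df1.withColumn(df.columns[0], extract_expr)
           .select(df.columns[0]))
    return df2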
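One way to sanity-check the email pattern before running it in Spark is with Python's built-in re module. This is only a minimal sketch, not part of the original script: the helper name extract_emails and the sample strings are hypothetical, and the pattern assumes the character class after the @ sign was meant to be [A-Za-z-.].

```python
import re

# Assumed reading of the pattern used in email_valid (hypothetical constant name)
EMAIL_PATTERN = re.compile(
    r"(\w+([\.-]?\w+)*@[A-Za-z-.]+([\.-]?\w+)*(\.\w{2,3})+)"
)


def extract_emails(text):
    """Return every full match of EMAIL_PATTERN in the lowercased text."""
    return [m.group(0) for m in EMAIL_PATTERN.finditer(text.lower())]


if __name__ == "__main__":
    # Hypothetical sample input
    print(extract_emails("Contact John.Doe@Example.com or sales@shop.org"))
```

Python's re engine and Spark's Java regex engine are close but not identical, so this check may not catch every difference.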
How can I fetch all of the MongoDB values in a Python script and call the function that matches each attribute?
Creating the MongoDB collection from a Python script:
import pymongo
# connect to your mongodb client
client = pymongo.MongoClient(connection_url)
# connect to the database
db = client[database_name]
# get the collection
mycol = db[collection_name]
from bson import ObjectId
from random_object_id import generate
# create a sample dictionary for the collection data
mydict = {
    "_id": ObjectId(generate()),
    "attributes_names": "email",
    "attributes_datype": "string",
    "attributes_isNull": "false",
    "attributes_std_function": "email_valid",
}
# insert the dictionary into the collection
mycol.insert_one(mydict)
To insert multiple documents into MongoDB, use insert_many() instead of insert_one() and pass it a list of dictionaries. The list would look like this:
mydict = [
    {
        "_id": ObjectId(generate()),
        "attributes_names": "email",
        "attributes_datype": "string",
        "attributes_isNull": "false",
        "attributes_std_function": "email_valid",
    },
    {
        "_id": ObjectId(generate()),
        "attributes_names": "address",
        "attributes_datype": "string",
        "attributes_isNull": "false",
        "attributes_std_function": "address_valid",
    },
]
# insert every dictionary in the list with a single call
mycol.insert_many(mydict)
Fetching all of the data from the MongoDB collection into the Python script:
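# find() returns a cursor; materialize it as a list of dictionaries
data = list(mycol.find())
import pandas as pd
# keep the DataFrame under its own name so that data stays a list
df = pd.json_normalize(data)
Then access the data as you would the elements of a list of dictionaries:
value = data[0]["attributes_names"]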
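To call a function based on the name stored in attributes_std_function, one common approach is a dictionary that maps each stored name to a Python callable. The sketch below is an assumption about how this could be wired together, not a definitive implementation: FUNCTION_REGISTRY and run_checks are hypothetical names, the two validators are simplified stand-ins that work on plain strings instead of Spark DataFrames, and the data list is hard-coded in place of the result of list(mycol.find()).

```python
def email_valid(value):
    # Simplified stand-in; the real function would hold the Spark logic
    return "@" in value


def address_valid(value):
    # Simplified stand-in: accept any non-blank address
    return bool(value.strip())


# Maps the string stored in MongoDB to the function it names
FUNCTION_REGISTRY = {
    "email_valid": email_valid,
    "address_valid": address_valid,
}

# Stand-in for: data = list(mycol.find())
data = [
    {"attributes_names": "email", "attributes_datype": "string",
     "attributes_isNull": "false", "attributes_std_function": "email_valid"},
    {"attributes_names": "address", "attributes_datype": "string",
     "attributes_isNull": "false", "attributes_std_function": "address_valid"},
]


def run_checks(record, config):
    """Apply the configured function to each attribute present in record."""
    results = {}
    for doc in config:
        name = doc["attributes_names"]
        func = FUNCTION_REGISTRY.get(doc["attributes_std_function"])
        if func is None or name not in record:
            continue  # skip unknown functions and missing attributes
        results[name] = func(record[name])
    return results


results = run_checks({"email": "a@b.com", "address": "  "}, data)
```

An explicit registry is used here instead of globals() or eval() so that only functions listed on purpose can be triggered by values stored in the database.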