Converting to Pandas DataFrame



I have a row that looks like this:

{0: '{"Paradigms":["Agile Software Development","Scrum","DevOps","Serverless Architecture"],"Platforms":["Kubernetes","Linux","Windows","Eclipse","PagerDuty","Apache2","Docker","AWS EC2","Amazon Web Services (AWS)","Sysdig","Apache Kafka","AWS Lambda","Azure","OpenStack"],"Storage":["AWS S3","MongoDB","Cassandra","MySQL","PostgreSQL","AWS DynamoDB","Spring Data MongoDB","AWS RDS","MySQL/MariaDB","Datadog","Memcached"],"Languages":["Java","PHP","SQL","Bash","Perl","JavaScript","Python","C#","Go"],"Frameworks":["Ruby on Rails (RoR)","AWS HA",".NET","Serverless Framework","Selenium","CodeIgniter","Express.js"],"Other":["Cisco","Content Delivery Networks (CDN)","Kubernetes Operations (Kops)","Prometheus","VMware ESXi","Bash Scripting","Scrum Master","Infrastructure as Code","Performance Tuning","Serverless","System Administration","Linux System Administration","Code Review"],"Libraries/APIs":["Node.js","Jenkins Pipeline","jQuery","React","Selenium Grid"],"Tools":["Jenkins","Bitbucket","GitHub","AWS ECS","AWS IAM","Amazon CloudFront CDN","Terraform","AWS CloudFormation","Git Flow","Artifactory","Nginx","Grafana","Zabbix","Docker Compose","AWS CLI","AWS ECR","Chef","Jira","Git","Postfix","MongoDB Shell","Wowza","Amazon SQS","AWS SES","Subversion (SVN)","TeamCity","Microsoft Visual Studio","Google Kubernetes Engine (GKE)","VMware ESX","Fluentd","Sumo Logic","Slack","Apache ZooKeeper","AWS Fargate","Ansible","ELK (Elastic Stack)","Microsoft Team Foundation Server","Azure Kubernetes Service (AKS)"]}',
1: '{"Platforms":["Debian Linux","Windows","Linux","NetBeans"],"Storage":["MySQL","Morphia","MongoDB","Oracle SQL","PostgreSQL","IBM DB2"],"Languages":["HTML5","CSS","Java","JavaScript","C++","Less","XPath","PHP","R","XSLT","XUL"],"Frameworks":["GWT","JUnit","Hibernate","AngularJS","JavaServer Pages (JSP)","Spring","JNI","Selenium","ASP.NET","Apache Velocity"],"Other":["Ajax","COM"],"Libraries/APIs":["HTML5 Canvas","Digester","JAXB","Java Servlets","Node.js","Jackson","JDBC","Standard Template Library (STL)","FFTW","ODBC","OpenGL","XStream"],"Tools":["Subversion (SVN)","Apache Ant","Mime4J","YourKit","IntelliJ IDEA","Apache Tomcat","Git","GCC","Cygwin","Maven","Eclipse IDE","UJAC","Flash","Mathematica","Perforce","CVS","GDB","Grunt","JDeveloper"]}',
2: '{"Platforms":["Firebase","XAMPP"],"Storage":["JSON"],"Languages":["CSS","Sass","JavaScript","TypeScript","HTML5","CSS3"],"Frameworks":["Angular","Bootstrap 3+","Jasmine"],"Libraries/APIs":["jQuery","Pure CSS"],"Tools":["Git","NPM","GitHub","Atom","Angular CLI","Photoshop CS5","Karma"]}',
3: '{"Paradigms":["Agile","CQRS","Azure DevOps"],"Platforms":["Debian Linux","Windows","Azure","Red Hat Linux","Visual Studio Code","Docker"],"Storage":["PostgreSQL","SQL Server 2016"],"Languages":["Python","JavaScript","C#","SQL","Java","C","C++","Bash","HTML"],"Frameworks":["AngularJS",".NET","Qt","Ruby on Rails (RoR)","Hibernate","Spring",".NET Core"],"Other":["IIS","Google Material Design","EDA","Sagas","Visual Studio Team Services (VSTS)"],"Libraries/APIs":["jQuery","Ruby on Rails API"],"Tools":["Microsoft Visual Studio","Qt Creator","Mercurial","Git","Jira","Terraform","Jenkins","Atom","Vim Text Editor","Eclipse IDE","Maven","SonarQube","Azure Kubernetes Service (AKS)"]}',

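How can I convert it so that each key (Paradigms, Platforms, etc.) becomes a column in a pandas DataFrame? I've already tried a few things...

I'm stuck, so any help would be much appreciated! :(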

The expected output looks like this, but with additional columns after Paradigms for the remaining keys (Storage, Tools, Languages, etc.):

Paradigms Platforms Storage Languages ...
0        0        0        0
1        1        1        1
2        2        2        2
3        3        3        3
4        4        4        4

This is the value of talentpool_subset['skills'][0] (the same as talentpool_df, just a subset of the larger DataFrame):

'{"Paradigms":["Agile Software Development","Scrum","DevOps","Serverless Architecture"],"Platforms":["Kubernetes","Linux","Windows","Eclipse","PagerDuty","Apache2","Docker","AWS EC2","Amazon Web Services (AWS)","Sysdig","Apache Kafka","AWS Lambda","Azure","OpenStack"],"Storage":["AWS S3","MongoDB","Cassandra","MySQL","PostgreSQL","AWS DynamoDB","Spring Data MongoDB","AWS RDS","MySQL/MariaDB","Datadog","Memcached"],"Languages":["Java","PHP","SQL","Bash","Perl","JavaScript","Python","C#","Go"],"Frameworks":["Ruby on Rails (RoR)","AWS HA",".NET","Serverless Framework","Selenium","CodeIgniter","Express.js"],"Other":["Cisco","Content Delivery Networks (CDN)","Kubernetes Operations (Kops)","Prometheus","VMware ESXi","Bash Scripting","Scrum Master","Infrastructure as Code","Performance Tuning","Serverless","System Administration","Linux System Administration","Code Review"],"Libraries/APIs":["Node.js","Jenkins Pipeline","jQuery","React","Selenium Grid"],"Tools":["Jenkins","Bitbucket","GitHub","AWS ECS","AWS IAM","Amazon CloudFront CDN","Terraform","AWS CloudFormation","Git Flow","Artifactory","Nginx","Grafana","Zabbix","Docker Compose","AWS CLI","AWS ECR","Chef","Jira","Git","Postfix","MongoDB Shell","Wowza","Amazon SQS","AWS SES","Subversion (SVN)","TeamCity","Microsoft Visual Studio","Google Kubernetes Engine (GKE)","VMware ESX","Fluentd","Sumo Logic","Slack","Apache ZooKeeper","AWS Fargate","Ansible","ELK (Elastic Stack)","Microsoft Team Foundation Server","Azure Kubernetes Service (AKS)"]}'

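Since I can't comment yet, here is my take. As I understand it, "Paradigms, Platforms, etc." will be the column names, and the values will become the rows.

Assign the rows above to a dictionary (called rows here, to avoid shadowing the built-in dict). Collect its values into a list of parsed JSON objects, then create the DataFrame from that list.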

import json
import pandas as pd

# rows is the dictionary from the question: row index -> JSON string
data = [json.loads(v) for v in rows.values()]
df = pd.DataFrame(data)
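If the JSON strings are still sitting in a DataFrame column such as talentpool_subset['skills'], the same idea can be applied to the column directly. The following is only a minimal sketch: the `skills` Series is a shortened, made-up stand-in for the question's data, and the variable names are assumptions rather than anything from the original code.

```python
import json

import pandas as pd

# Shortened stand-in for talentpool_subset['skills'] (assumed shape):
# each value is a JSON string mapping a skill category to a list of skills.
skills = pd.Series({
    0: '{"Paradigms":["Agile","Scrum"],"Platforms":["Linux","Docker"],"Storage":["MySQL"]}',
    1: '{"Platforms":["Windows"],"Storage":["MongoDB","PostgreSQL"]}',
})

# Parse each JSON string into a dict; pandas then turns the keys into columns.
parsed = [json.loads(s) for s in skills]

# Reuse the original index so the rows still line up with the source DataFrame.
df = pd.DataFrame(parsed, index=skills.index)

print(df)
```

Each cell holds the list of skills for that category, and a row that lacks a key (row 1 has no "Paradigms" here) gets NaN in that column.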

Hope that helps.
