I have the following Excel sheet, which can be downloaded here.
With pandas, I read it like this:
import pandas as pd
infile = "sample1_neu_input_deconv.xlsx"
outdf = pd.read_excel(infile)
outdf.head()
It looks like this:
In [8]: outdf.head()
Out[8]:
     ID_REF  Gene.Symbol  GSM1711905  GSM1711906  GSM1711907
0  10344620      Gm10568      78.496      70.582      78.496
1  10344622      Gm10568      87.940      85.746      94.670
2  10344624       Lypla1     324.306     450.037     231.723
3  10344633        Tcea1     361.733     758.949     917.704
4  10344637      Atp6v1h     236.272     275.910     453.972
What I want to do now is strip whitespace from the Gene.Symbol column and convert it to uppercase, using the following command:
outdf["Gene.Symbol"].map(str.strip).map(str.upper)
But it gives me the following error:
TypeError: descriptor 'strip' requires a 'str' object but received a 'unicode'
What is the correct way to do this?
You can chain successive vectorized str calls to achieve what you want:
In [4]:
outdf['Gene.Symbol'] = outdf['Gene.Symbol'].str.strip().str.upper()
outdf['Gene.Symbol']
Out[4]:
0 GM10568
1 GM10568
2 LYPLA1
3 TCEA1
4 ATP6V1H
5 OPRK1
6 RB1CC1
7 FAM150A
8 ST18
9 PCMTD1
10 RRS1
11 ADHFE1
12 3110035E14RIK
13 SGK3
14 6030422M02RIK
15 CSPP1
16 CSPP1
17 CSPP1
18 CSPP1
19 CSPP1
20 CSPP1
21 CSPP1
22 CSPP1
23 CSPP1
24 CSPP1
25 CSPP1
26 CSPP1
27 CSPP1
28 CSPP1
29 PREX2
...
24649 LOC380994
24650 LOC100504530
24651 SSTY2
24652 LOC665698
24653 LOC380994
24654 SSTY2
24655 LOC100039147
24656 LOC665746
24657 SSTY2
24658 LOC665128
24659 SSTY2
24660 RBM31Y
24661 LOC100039753
24662 SSTY1
24663 SSTY1
24664 SSTY1
24665 LOC380994
24666 LOC100504530
24667 LOC100039753
24668 SRSY
24669 SLY
24670 LOC100504530
24671 SLY
24672 LOC100039753
24673 SSTY2
24674 LOC100042196
24675 LOC380994
24676 LOC100040235
24677 LOC100041704
24678 SSTY2
Name: Gene.Symbol, dtype: object
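
For anyone who wants to try this without the Excel file, here is a minimal sketch on a small hand-made frame. The padded, mixed-case values and the missing entry are made up for illustration and are not taken from the real sheet. It also shows one reason to prefer the .str accessor over map: missing values pass through as missing instead of raising.

```python
import pandas as pd

# Hypothetical sample data (not from the real sheet): padded, mixed-case
# gene symbols plus one missing value.
df = pd.DataFrame({"Gene.Symbol": [" Gm10568", "Lypla1 ", None, "Tcea1"]})

# Chain the vectorized string methods: strip whitespace, then uppercase.
# Missing entries are skipped and stay missing.
df["Gene.Symbol"] = df["Gene.Symbol"].str.strip().str.upper()

result = df["Gene.Symbol"].tolist()
print(result)
```

The original error appears to come from Python 2, where str.strip is bound to the byte-string type and rejects the unicode values that read_excel returns. Something like .map(lambda s: s.strip().upper()) would sidestep that, because the method is looked up on each value's own type, but it would fail on missing values, so the .str chain is the safer choice.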