Reading files with special characters and writing them to HTML

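I have a Python script that reads the names of PDF files and writes them to an HTML file, with a link to each PDF. Everything works fine unless a name contains special characters.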

I have read many other answers on SE.

import os

f = open("jobs/index.html", "w")
#html divs go here
for root, dirs, files in os.walk('jobs/'):
    files.sort()
    for name in files:
        if ((name!="index.html")&(name!=".htaccess")):
            f.write("<a href='"+name+"'>"+name.rstrip(".pdf")+"</a>\n<br><br>\n")
            print name.rstrip(".pdf")

This returns:
Caba n-Sanchez, Jane.pdf
Smith, John.pdf

This, of course, breaks both the text and the link to that PDF.

How do I properly encode the file or the "name" variable so that special characters are written correctly?
I.e., Cabán-Sanchez, Jane.pdf
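Separately from the encoding problem, the name.rstrip(".pdf") call in the question has a bug of its own: str.rstrip treats its argument as a set of characters and strips any trailing run of '.', 'p', 'd' or 'f', not the literal ".pdf" suffix. A minimal Python 3 sketch using os.path.splitext instead; the helper name display_name is hypothetical:

```python
import os.path


def display_name(filename):
    """Return the file name without its final extension (hypothetical helper)."""
    # splitext splits off only the last extension, e.g. ".pdf"
    return os.path.splitext(filename)[0]


# rstrip(".pdf") keeps stripping while the last character is '.', 'p', 'd' or 'f'
print("Jones, Jeff.pdf".rstrip(".pdf"))          # Jones, Je
print(display_name("Jones, Jeff.pdf"))           # Jones, Jeff
print(display_name("Cabán-Sanchez, Jane.pdf"))   # Cabán-Sanchez, Jane
```

On Python 3.9 and later, str.removesuffix(".pdf") is another option when the extension is known in advance.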

You are trying to write a Unicode character (in this case á) to an HTML file, so you should specify the HTML meta charset:

<meta charset="UTF-8">
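To see why the declaration matters: encoded as UTF-8, á is stored as the two bytes C3 A1. A browser that does not know the page is UTF-8 and falls back to a legacy encoding such as Windows-1252 decodes those bytes as two separate characters. A small Python 3 illustration (cp1252 is only an example of a fallback a browser might pick):

```python
# The UTF-8 encoding of "á" is two bytes: C3 A1.
encoded = "Cabán".encode("utf-8")
print(encoded)                    # b'Cab\xc3\xa1n'

# Decoding those bytes with the wrong codec produces mojibake.
print(encoded.decode("cp1252"))   # CabÃ¡n

# Decoding with the declared charset recovers the original text.
print(encoded.decode("utf-8"))    # Cabán
```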

The rest works fine on my machine:

andraantariksa@LaptopnyaAndra:~$ cd Desktop/
andraantariksa@LaptopnyaAndra:~/Desktop$ mkdir jobs
andraantariksa@LaptopnyaAndra:~/Desktop$ cd jobs/
andraantariksa@LaptopnyaAndra:~/Desktop/jobs$ touch "Cabán-Sanchez, Jane.pdf"
andraantariksa@LaptopnyaAndra:~/Desktop/jobs$ ls
'Cabán-Sanchez, Jane.pdf'
andraantariksa@LaptopnyaAndra:~/Desktop/jobs$ cd ../
andraantariksa@LaptopnyaAndra:~/Desktop$ python
Python 2.7.15+ (default, Nov 27 2018, 23:36:35) 
[GCC 7.3.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> f = open("jobs/index.html", "w")
>>> #html divs go here
... for root, dirs, files in os.walk('jobs/'):
...     files.sort()
...     for name in files:
...         if ((name!="index.html")&(name!=".htaccess")):
...             f.write("<a href='"+name+"'>"+name.rstrip(".pdf")+"</a>\n<br><br>\n")
...             print name.rstrip(".pdf")
... 
Cabán-Sanchez, Jane
andraantariksa@LaptopnyaAndra:~/Desktop$ cat jobs/index.html 
<a href='Cabán-Sanchez, Jane.pdf'>Cabán-Sanchez, Jane</a>
<br><br>
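The transcript inserts the raw file name into a single-quoted href, so a name containing an apostrophe or an ampersand would still produce broken HTML, and spaces or commas are not valid unescaped in a URL. A Python 3 sketch that percent-encodes the link target with urllib.parse.quote and escapes the visible text with html.escape; the function name make_link and the file name "O'Brien, Pat.pdf" are made up for this example:

```python
import html
import os.path
from urllib.parse import quote


def make_link(filename):
    """Build one <a> element for a file name (hypothetical helper)."""
    # quote() percent-encodes spaces, commas, quotes and non-ASCII
    # characters (as UTF-8), giving a valid relative URL.
    href = quote(filename)
    # html.escape() replaces &, <, >, " and ' with character references.
    text = html.escape(os.path.splitext(filename)[0])
    return '<a href="{}">{}</a>\n<br><br>\n'.format(href, text)


print(make_link("Cabán-Sanchez, Jane.pdf"))
print(make_link("O'Brien, Pat.pdf"))
```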

I'm not used to Python 2.7, but this should work:

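from __future__ import print_function
import os
from io import open

with open("jobs/index.html", "w", encoding='utf-8') as f:
    # os.walk(u'...') yields unicode names; io.open's text file only accepts unicode
    for root, dirs, files in os.walk(u'jobs/'):
        files.sort()
        for name in files:
            if name not in ("index.html", ".htaccess"):
                title = os.path.splitext(name)[0]  # drop the ".pdf" extension
                f.write(u"<a href='{}'>{}</a>\n<br><br>\n".format(name, title))
                print(title)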
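On Python 3 the built-in open() already accepts encoding=, and os.listdir()/os.walk() return str (Unicode) names, so neither io.open nor u"" prefixes are needed. A minimal sketch assuming Python 3 and the same jobs/ layout; the function name write_index is hypothetical, and unlike the original loop it lists only the files directly inside the directory (os.walk would also yield files from subfolders, whose href would then lack the subfolder path):

```python
import os

SKIP = {"index.html", ".htaccess"}


def write_index(directory):
    """Write directory/index.html linking each file in it (hypothetical helper)."""
    names = sorted(
        name for name in os.listdir(directory)
        if name not in SKIP and os.path.isfile(os.path.join(directory, name))
    )
    index_path = os.path.join(directory, "index.html")
    # encoding="utf-8" makes the bytes on disk match <meta charset="UTF-8">.
    with open(index_path, "w", encoding="utf-8") as f:
        f.write('<!DOCTYPE html>\n<html><head><meta charset="UTF-8"></head><body>\n')
        for name in names:
            title = os.path.splitext(name)[0]
            f.write("<a href='{}'>{}</a>\n<br><br>\n".format(name, title))
            print(title)
        f.write("</body></html>\n")
    return names
```

Usage: write_index("jobs"). File names containing quotes or & would still need escaping before being placed in the HTML.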

You should also declare your source encoding by adding these lines at the top of the module:

#!/usr/bin/env python
# -*- coding: utf-8 -*-
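The coding line (PEP 263) tells the interpreter how to decode the bytes of the script itself, so it matters for non-ASCII string literals written in the source; it does not change the names returned by os.walk or the encoding of the HTML file. Python 2 assumes ASCII source without it, while Python 3 assumes UTF-8. A small Python 3 illustration that compiles the same source bytes with and without a Latin-1 declaration:

```python
# Source code as raw bytes: the literal holds byte 0xE1, which is "á" in Latin-1.
source = b"# -*- coding: latin-1 -*-\ns = '\xe1'\n"

namespace = {}
exec(compile(source, "<demo>", "exec"), namespace)
print(namespace["s"])  # á

# Without the declaration, Python 3 decodes the bytes as UTF-8, where a
# lone 0xE1 byte is invalid, so compilation fails.
try:
    compile(source.split(b"\n", 1)[1], "<demo>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc)
```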

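Finally, you can try explicitly declaring the string as Unicode by adding the u prefix (u"") to the string literal in your f.write line, for example: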

f.write(u"...")
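In Python 2, a file opened with io.open(..., encoding='utf-8') is a text stream whose write() accepts only unicode, which is why the u"" prefix matters there. Python 3 keeps the same split between text (str) and bytes, while u"" is still accepted but is simply the ordinary str type (PEP 414). A Python 3 sketch using an in-memory io.TextIOWrapper as a stand-in for a file opened in text mode:

```python
import io

raw = io.BytesIO()
# Behaves like open(path, "w", encoding="utf-8"), but writes to memory.
text_file = io.TextIOWrapper(raw, encoding="utf-8")

text_file.write(u"Cabán")      # u"..." and "..." are the same str type in Python 3
try:
    text_file.write(b"Smith")  # bytes are rejected by a text stream
except TypeError as exc:
    print("TypeError:", exc)

text_file.flush()
print(raw.getvalue())          # b'Cab\xc3\xa1n'
```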

Why io.open: backporting Python 3's open(encoding="utf-8") to Python 2
Why use the with keyword when you can:
