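In Python 3, text is represented by the str type, which holds Unicode characters, and binary data is represented by the bytes type. Encoding converts a string into bytes, and decoding converts bytes back into a string:
encode: str --> bytes
decode: bytes --> str
Example (Python 3.0+):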
# Named "text" rather than "str" so the built-in str type is not shadowed
text = "我是Python3"
text_utf8 = text.encode('utf-8')
text_gbk = text.encode('GBK')
print(text)
print("UTF-8 encoded:", text_utf8)
print("GBK encoded:", text_gbk)
print("UTF-8 decoded:", text_utf8.decode('utf-8'))
print("GBK decoded:", text_gbk.decode('GBK'))
The output is:
我是Python3
UTF-8 encoded: b'\xe6\x88\x91\xe6\x98\xafPython3'
GBK encoded: b'\xce\xd2\xca\xc7Python3'
UTF-8 decoded: 我是Python3
GBK decoded: 我是Python3
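One way to see the difference between str and bytes is to compare lengths: len() counts characters for a str and bytes for a bytes object. The following is a minimal sketch using the same sample string as the example above; the variable names are illustrative only.

```python
sample = "我是Python3"

utf8_bytes = sample.encode("utf-8")
gbk_bytes = sample.encode("gbk")

# A str is measured in characters: 2 Chinese characters + 7 ASCII characters.
char_count = len(sample)      # 9

# A bytes object is measured in bytes. UTF-8 uses 3 bytes for each of these
# Chinese characters, GBK uses 2; ASCII characters take 1 byte in both.
utf8_len = len(utf8_bytes)    # 2*3 + 7 = 13
gbk_len = len(gbk_bytes)      # 2*2 + 7 = 11

print(char_count, utf8_len, gbk_len)
```

The same text therefore has one character count but different byte counts depending on the encoding chosen.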
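Analysis:
1. A Python 3 str holds Unicode text. UTF-8 is not an extension of Unicode; it is one of several encodings (GBK and GB2312 are others) that map characters to bytes.
2. encode: specify the encoding to convert to. decode: specify the encoding the bytes are currently in.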
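The two points above can be sketched in a few lines. This is only an illustration, not part of the original example: it shows that one character has a single Unicode code point but different byte sequences per encoding, and that decoding with the wrong encoding fails. The variable names are assumptions made for this sketch.

```python
ch = "我"

# Point 1: one Unicode code point, different bytes in different encodings.
code_point = hex(ord(ch))          # '0x6211'
as_utf8 = ch.encode("utf-8")       # b'\xe6\x88\x91'
as_gbk = ch.encode("gbk")          # b'\xce\xd2'

# Point 2: decode must name the encoding the bytes are actually in.
gbk_bytes = "我是Python3".encode("gbk")
try:
    gbk_bytes.decode("utf-8")      # wrong encoding for these bytes
    wrong_codec_failed = False
except UnicodeDecodeError:
    wrong_codec_failed = True

# errors="replace" avoids the exception but loses the original characters,
# substituting U+FFFD for the bytes that could not be decoded.
lossy = gbk_bytes.decode("utf-8", errors="replace")

# Decoding with the matching encoding recovers the text exactly.
roundtrip = gbk_bytes.decode("gbk")
```

Using errors="replace" is a trade-off: it keeps the program running but cannot recover the original characters, so naming the correct encoding is preferable when it is known.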
# -*- coding: gb2312 -*-
# The line above is optional and can be removed. If it is kept, the file must actually be saved as GB2312.
import sys
print(sys.getdefaultencoding())
msg = "我爱北京天安门"
# Python 2 style, not valid in Python 3 because str has no decode():
# msg_gb2312 = msg.decode("utf-8").encode("gb2312")
msg_gb2312 = msg.encode("gb2312")  # str is already Unicode, so no decode is needed first
gb2312_to_unicode = msg_gb2312.decode("gb2312")
gb2312_to_utf8 = msg_gb2312.decode("gb2312").encode("utf-8")
print(msg)
print(msg_gb2312)
print(gb2312_to_unicode)
print(gb2312_to_utf8)
Output:
utf-8
我爱北京天安门
b'\xce\xd2\xb0\xae\xb1\xb1\xbe\xa9\xcc\xec\xb0\xb2\xc3\xc5'
我爱北京天安门
b'\xe6\x88\x91\xe7\x88\xb1\xe5\x8c\x97\xe4\xba\xac\xe5\xa4\xa9\xe5\xae\x89\xe9\x97\xa8'
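The GB2312 to UTF-8 conversion above follows a general pattern: decode the bytes to str with their current encoding, then encode the str to the target encoding. A minimal sketch of that pattern as a reusable function follows; the name transcode and its parameters are hypothetical and not part of any library.

```python
def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    """Convert bytes from one encoding to another by going through str."""
    text = data.decode(source_encoding)   # bytes --> str
    return text.encode(target_encoding)   # str --> bytes


gb2312_bytes = "我爱北京天安门".encode("gb2312")
utf8_bytes = transcode(gb2312_bytes, "gb2312", "utf-8")

print(utf8_bytes.decode("utf-8"))
```

Bytes cannot be converted directly from one encoding to another; the intermediate str step is what makes the conversion well defined.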