Python组合数据类型（二）

今天继续学习一下Python组合数据类型。

字典类型及操作

字典类型定义

映射：是一种键（索引）和值（数据）的对应

字典的建立

赋值创建字典
d={“key1”:”value1”,”key2”:”value2”}
工厂函数
d=dict(user1=”123”,user2=”234”,user3=”345”)
内建方法:fromkeys
d={}.fromkeys((‘username’,’password’),())
字典中的key有相同的value值,默认为None

基本字典操作方法

len(d)返回d中的键-值对的数量
d[k]返回关联到k上的值
d[k]=v将值v关联到键k上
del d[k]删除键为k的项
k in d检查d中是否含有键为k项

字典方法

详见python基础教程P59.

jieba库的使用

jieba库的基本介绍

概述：jieba库是优秀的中文分词第三方库

中文文本需要通过分词获得单个的词语
jieba是优秀的中文分词第三方库，需要额外安装
jieba库提供三种分词模式，最简单的只需掌握一个函数

jieba库的使用

精确模式：把文本精确的分开，不存在冗余单词
全模式：把文本所有可能词语都扫描出来，有冗余
搜索引擎模式：在精确模式基础上，对长词再次切分

实例：文本词频统计

英文文本词频统计

def getTxt():
	txt = open("hamlet.txt","r").read()
	txt = txt.lower()
	for ch in '~!@#$%^&*()_+{}[]|\:;"<>,.?/-=`'
		txt = txt.replace(ch," ")
	return txt

hamletTxt = getTxt()
words = hamletTxt.split()
counts = {}
for word in words:
	counts[word] = counts.get(word,0)+1

items = list(counts.items())
items.sort(key=lambda x:x[1], reverse=True)
for i in range(10)
	word,count = items[i]
	print("{0:<10}{1:>5}".format(word,count))

中文文本词频统计

import jieba
txt = open("threekingdoms.txt","r",encoding="utf-8").read()
words = jieba.lcut(txt)
counts = {}
for word in words:
	if len(word) == 1:
		continue
	else:
		counts[word] = counts.get(word,0)+1

items = list(counts.items())
items.sort(key=lambda x:x[1], reverse=True)
for i in range(10)
	word,count = items[i]
	print("{0:<10}{1:>5}".format(word,count))

#python学习

Python组合数据类型（二）

https://chujian521.github.io/blog/2018/08/19/Python组合数据类型（二）/

作者

Encounter

发布于

2018年8月19日

许可协议

Encounter 上一篇

Python组合数据类型下一篇