Simple word frequency count
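text = "http requset highclient springboot requset"
data = text.lower().split()

# Count how many times each word appears.
words = {}
for word in data:
    if word not in words:
        words[word] = 1
    else:
        words[word] = words[word] + 1

# sorted() compares the (word, count) tuples, so with reverse=True the
# result is in reverse alphabetical order by word, not ordered by count.
result = sorted(words.items(), reverse=True)
print(result)

Output:
[('springboot', 1), ('requset', 2), ('http', 1), ('highclient', 1)]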
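The result above is ordered by word rather than by frequency. If ordering by count is what is wanted, one possible sketch is to pass a `key` to `sorted()`. The variable name `by_count` is introduced here for illustration only and is not part of the original post.

```python
text = "http requset highclient springboot requset"

# Build the same word -> count mapping, using dict.get to shorten the loop.
words = {}
for word in text.lower().split():
    words[word] = words.get(word, 0) + 1

# Sort by the count (second element of each pair), highest first.
# sorted() is stable, so words with equal counts keep their first-seen order.
by_count = sorted(words.items(), key=lambda item: item[1], reverse=True)
print(by_count)
```

With this ordering the repeated word comes first, followed by the words that appear once.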
Word frequency count for an English book (Walden)
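import string

path = 'D:/python3/Walden.txt'

# Read the whole book, split it on whitespace, then strip surrounding
# punctuation from each token and lowercase it.
with open(path, 'r', encoding='utf-8') as text:
    words = [raw_word.strip(string.punctuation).lower() for raw_word in text.read().split()]

# Count each distinct word once.
words_index = set(words)
counts_dict = {index: words.count(index) for index in words_index}

# Print the words from most to least frequent.
for word in sorted(counts_dict, key=lambda x: counts_dict[x], reverse=True):
    print('{} -- {} times'.format(word, counts_dict[word]))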
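Calling `words.count()` once per distinct word rescans the whole list each time, which can get slow on a full book. A minimal alternative sketch, assuming the standard library's `collections.Counter` is acceptable, counts everything in one pass. The helper name `count_words` and the `sample` string are made up for illustration; the sample stands in for the contents of the book file.

```python
import string
from collections import Counter


def count_words(text):
    """Return a Counter of lowercased words with surrounding punctuation stripped."""
    cleaned = (raw.strip(string.punctuation).lower() for raw in text.split())
    # Tokens made only of punctuation become empty strings; skip them.
    return Counter(word for word in cleaned if word)


# Made-up sample text used in place of the book file.
sample = "The pond, the woods; the pond -- and the sky."
counts = count_words(sample)

# most_common() returns (word, count) pairs from most to least frequent.
for word, n in counts.most_common(2):
    print('{} -- {} times'.format(word, n))
```

To run it on the book itself, the string returned by `text.read()` could be passed to `count_words` instead of `sample`.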
Reposted from: http://phcuz.baihongyu.com/