Finding Tokai On Air's (東海オンエア) "smelly" videos with Python

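I searched for Tokai On Air's "smelly" videos using the YouTube Data API and a scoring dictionary. The flow is as follows.
① Fetch the titles of the videos on the Tokai On Air channel with the YouTube Data API
　↓
② Split each video title into words
　↓
③ Score the video titles with the dictionary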

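The details of each step are written up in the following articles.
① Fetch the titles of the videos on the Tokai On Air channel with the YouTube Data API
chindafalldesu.hatenablog.com

② Split each video title into words
chindafalldesu.hatenablog.com

③ Score the video titles with the dictionary
chindafalldesu.hatenablog.com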
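
As a rough illustration of step ①, the sketch below follows `nextPageToken` from one page of search results to the next. The helper name `iter_titles` and the `fetch_page` callback are assumptions made for this example, and the two fake pages only imitate the shape of a search response so that the code runs without an API key.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_titles(fetch_page: Callable[[Optional[str]], Dict],
                max_pages: int = 5) -> Iterator[str]:
    """Yield video titles page by page, following nextPageToken."""
    token = None
    for _ in range(max_pages):
        page = fetch_page(token)
        for item in page.get("items", []):
            # Search results may include channels or playlists,
            # which carry no videoId; skip those.
            if "videoId" in item.get("id", {}):
                yield item["snippet"]["title"]
        token = page.get("nextPageToken")
        if token is None:
            # No further pages
            break


# Stand-in for the real HTTP call: two fake pages (hypothetical data)
FAKE_PAGES = {
    None: {
        "items": [{"id": {"videoId": "a1"}, "snippet": {"title": "video A"}}],
        "nextPageToken": "p2",
    },
    "p2": {
        "items": [
            {"id": {"channelId": "c1"}, "snippet": {"title": "a channel"}},
            {"id": {"videoId": "b2"}, "snippet": {"title": "video B"}},
        ],
    },
}

titles = list(iter_titles(lambda token: FAKE_PAGES[token]))
print(titles)
```

With the real API, `fetch_page` would presumably wrap a `requests.get(...)` call that passes the token as the `pageToken` parameter and returns `response.json()`.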

To find the "smelly" videos, I use the following dictionary, which maps each word to a score.

word2score={'臭い':5,'臭':5,'閲覧':1,'汚い':3,'不潔':3,'不衛生':3,'劣悪':1,'便所':1, 'ごみ':1, 'うんこ':3,'精子':1,'珍':1,'棒':1,'酒':1,'酔':1,'酔う':1, '酔い':1, '泥酔':1 }
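
A minimal sketch of how such a dictionary turns a title into a score: every word found in the dictionary adds its value, and every other word adds nothing. The function name `score_title` and the hand-written token list are assumptions for illustration; in the actual script the tokens come from MeCab.

```python
# A small subset of the dictionary above
word2score = {'臭い': 5, '臭': 5, '汚い': 3, 'うんこ': 3, '酒': 1}


def score_title(words, table):
    """Sum the scores of all words that appear in the table."""
    return sum(table.get(word, 0) for word in words)


# Hand-written tokens (base forms), standing in for MeCab output
tokens = ['臭い', 'て', 'おいしい', '料理', '選手権']
print(score_title(tokens, word2score))
```

A word that appears several times in a title is counted each time, so repeated matches raise the score.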

(Source code)

import MeCab
import requests
import time

channelid='UCutJqz56653xV2wwSvut_hQ'
apikey='(your YouTube Data API key)'
url = 'https://www.googleapis.com/youtube/v3/search'
params = {'key': apikey, 'channelId': channelid, 'part': 'snippet,id', 'order': 'date', 'maxResults': 50}
sentences=[]
vid=[]

# Fetch up to 5 pages of 50 results each (250 results at most)
for page in range(5):
    response = requests.get(url, params=params)
    response.raise_for_status()
    result = response.json()
    for item in result.get('items', []):
        sentences.append(item['snippet']['title'])
        # Search results can include channels or playlists, which have no videoId
        video_id = item['id'].get('videoId')
        if video_id:
            vid.append('[https://www.youtube.com/watch?v='+video_id+':embed:cite]')
        else:
            vid.append('NULL')
    # Stop when there are no more pages
    token = result.get('nextPageToken')
    if token is None:
        break
    params['pageToken'] = token
    time.sleep(1)

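# ChaSen-style output: surface form, reading, base form, part of speech, ... (tab-separated)
t=MeCab.Tagger('-Ochasen')
words_list=[]

for sentence in sentences:
    words=[]
    parsed=t.parse(sentence)
    # The last line is the "EOS" marker, so skip it
    for line in parsed.splitlines()[:-1]:
        # Use the base form (third column) so that inflected words match the dictionary
        words.append(line.split('\t')[2])
    words_list.append(words)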

word2score={'臭い':5,'臭':5,'閲覧':1,'汚い':3,'不潔':3,'不衛生':3,'劣悪':1,'便所':1, 'ごみ':1, 'うんこ':3,'精子':1,'珍':1,'棒':1,'酒':1,'酔':1,'酔う':1, '酔い':1, '泥酔':1 }

scores=[]
for words in words_list:
    score=0
    for word in words:
        if word in word2score:
            score+=word2score[word]
    scores.append(score)

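# Pair each title with its score; a list of tuples keeps videos whose titles are identical
data=list(zip(sentences,scores))
data_sorted = sorted(data,key=lambda x:x[1], reverse=True)
# Print the top 5 (or fewer, if fewer titles were fetched)
for row in data_sorted[:5]:
    print(row)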

# To print Hatena Blog embed links instead of titles:
# data=list(zip(vid,scores))
# data_sorted = sorted(data,key=lambda x:x[1], reverse=True)
# for row in data_sorted[:5]:
#     print(row[0])

(Output 1)

> python .\words10.py
('【いい匂いはNG】｢臭くておいしい料理｣選手権！', 5)
('【酔酔酔】当てるまで終われない利き日本酒', 4)
('「ネコのうんこコーヒー」てつやネコのでも美味しいんじゃね！？', 3)
('精子観察キットで東海オンエアの精子を測定したらまさかの...', 2)
('【閲覧注意】この動画を見ると頭がおかしくなります。', 1)

(Output 2) Edited so that the videos can be watched directly on Hatena Blog
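
The embed links are built with the same bracket notation that the source code above uses for `vid`. As a small sketch, the helper below wraps a video ID in that notation; the function name `to_hatena_embed` and the video IDs are hypothetical placeholders, not real results.

```python
def to_hatena_embed(video_id):
    """Wrap a YouTube video ID in the embed notation used in the script."""
    return f"[https://www.youtube.com/watch?v={video_id}:embed:cite]"


# Hypothetical (video ID, score) pairs, already sorted by score
ranked = [("abc123", 5), ("def456", 3)]

embed_lines = [to_hatena_embed(video_id) for video_id, _ in ranked]
for line in embed_lines:
    print(line)
```

Pasting the printed lines into the post body is what lets the videos play inline.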