ONLY DO WHAT ONLY YOU CAN DO

If you fall, stand up. Once you're up, walk.

Analyzing the "2014 FIFA World Cup Brasil" with t-SNE

I analyzed team statistics from the "Copa do Mundo de Futebol FIFA Brasil 2014" using t-SNE (t-distributed stochastic neighbor embedding).

I used the following article as a reference:
puyokw.hatenablog.com

I got the data from here:
https://www.whoscored.com/Statistics
and saved it to a tab-delimited file like this:

Team	Rank	Rating	Shotsconceded	Shots	Tackles	CaughtOffside	Blocks	Interception	Clearances	Save	Goals	Dribbles	PossessionLoss	AerialWon	AerialLost	Passes	KeyPasses	Assists	Fouls	Fouled
Algeria	10	6.92	15.5	9	20.8	1.5	10.8	16.3	29.5	4.8	1.5	6.5	19.8	14.8	22	326	6.5	1.3	17.3	13.5
Argentina	4	7.12	11.1	15.4	19	1.4	12.5	14.1	26.7	2.4	0.9	12.3	24	10.4	9.7	461.3	10.4	0.3	11	16.6
Australia	30	6.39	11.3	9	12.3	1	12	15.3	21	2.7	1	10.3	24.4	12.7	10.7	401.7	7.3	0.7	16.7	13
... (omitted) ...
Switzerland	1	7.14	18.3	16.3	20	1.3	15.8	12	24.5	4.3	1.8	6.5	23.6	11	10.3	392	13	1.5	16.5	15.8
Uruguay	26	6.63	10.8	11.8	18.5	2.8	12.3	15.8	24.3	2.3	1	5.5	23.3	18.3	17.5	346	7.8	0.5	18.3	15.8
USA	11	6.91	23.5	11	20	1	17.1	13.5	38.5	5.8	1	9	17.5	15	16.3	385.3	5.8	0.8	12.3	14
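As a quick check of the file format, the sketch below feeds the six sample rows shown above (column names tidied; the omitted rows are left out) to read.table(). Because the header line has the same number of fields as each data row, read.table() keeps Team as an ordinary first column instead of using it as row names, which is why row names have to be assigned separately. Passing row.names = 1 is an alternative. The object names rows, txt, d_sample and d_named are only for this illustration.

```r
# Header and the six sample rows from above, written with single spaces
rows <- c(
  "Team Rank Rating Shotsconceded Shots Tackles CaughtOffside Blocks Interception Clearances Save Goals Dribbles PossessionLoss AerialWon AerialLost Passes KeyPasses Assists Fouls Fouled",
  "Algeria 10 6.92 15.5 9 20.8 1.5 10.8 16.3 29.5 4.8 1.5 6.5 19.8 14.8 22 326 6.5 1.3 17.3 13.5",
  "Argentina 4 7.12 11.1 15.4 19 1.4 12.5 14.1 26.7 2.4 0.9 12.3 24 10.4 9.7 461.3 10.4 0.3 11 16.6",
  "Australia 30 6.39 11.3 9 12.3 1 12 15.3 21 2.7 1 10.3 24.4 12.7 10.7 401.7 7.3 0.7 16.7 13",
  "Switzerland 1 7.14 18.3 16.3 20 1.3 15.8 12 24.5 4.3 1.8 6.5 23.6 11 10.3 392 13 1.5 16.5 15.8",
  "Uruguay 26 6.63 10.8 11.8 18.5 2.8 12.3 15.8 24.3 2.3 1 5.5 23.3 18.3 17.5 346 7.8 0.5 18.3 15.8",
  "USA 11 6.91 23.5 11 20 1 17.1 13.5 38.5 5.8 1 9 17.5 15 16.3 385.3 5.8 0.8 12.3 14"
)
# Turn the spaces into tabs so the text matches the tab-delimited file
txt <- gsub(" ", "\t", rows, fixed = TRUE)

# Header and data rows have the same field count, so Team stays an ordinary column
d_sample <- read.table(text = txt, header = TRUE, sep = "\t")
dim(d_sample)   # 6 21

# Alternative: take the row names straight from the first (Team) column
d_named <- read.table(text = txt, header = TRUE, sep = "\t", row.names = 1)
rownames(d_named)
```

With row.names = 1 all remaining columns are numeric, so the team names no longer need to be typed in by hand, although the underscore spellings used below would then come from the file itself.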

Load it into R:

# The file is tab-delimited, so split on tabs only
d <- read.table("WorldCup2014TeamStatistics.txt", header=TRUE, sep="\t")

rownames(d) <- c("Algeria",
"Argentina",
"Australia",
"Belgium",
"Bosnia_and_Herzegovina",
"Brazil",
"Cameroon",
"Chile",
"Colombia",
"Costa_Rica",
"Croatia",
"Ecuador",
"England",
"France",
"Germany",
"Ghana",
"Greece",
"Honduras",
"Iran",
"Italy",
"Ivory_Coast",
"Japan",
"Mexico",
"Netherlands",
"Nigeria",
"Portugal",
"Russia",
"South_Korea",
"Spain",
"Switzerland",
"Uruguay",
"USA")
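The t-SNE code that follows runs on d_scale, presumably the statistics standardized column by column. The reason this matters: the raw columns live on very different scales (Passes is in the hundreds, Assists is around 1), so Euclidean distances, from which t-SNE builds its input similarities, would be driven almost entirely by Passes. A minimal sketch using three columns of the six sample rows above; stats and stats_scaled are illustrative names, not from the original post.

```r
# Three columns of the six sample rows above, as listed in the table
stats <- data.frame(
  Passes  = c(326, 461.3, 401.7, 392, 346, 385.3),
  Assists = c(1.3, 0.3, 0.7, 1.5, 0.5, 0.8),
  Goals   = c(1.5, 0.9, 1, 1.8, 1, 1),
  row.names = c("Algeria", "Argentina", "Australia", "Switzerland", "Uruguay", "USA")
)

# Raw distances: the Passes gap dominates everything else
raw_d <- as.matrix(dist(stats))
share <- (stats["Algeria", "Passes"] - stats["Argentina", "Passes"])^2 /
  raw_d["Algeria", "Argentina"]^2
share   # about 0.9999: Passes alone makes up almost all of the squared distance

# z-score every column (mean 0, sd 1), which is what scale() does by default
stats_scaled <- scale(stats)
apply(stats_scaled, 2, sd)   # each column: 1
```

After scaling, a one-unit difference means "one standard deviation" in every column, so Assists and Goals can influence the map as much as Passes does.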

t-SNE (t-distributed stochastic neighbor embedding)
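As background for the plots below: t-SNE turns pairwise distances in the original space into similarities p_ij (Gaussian kernels whose widths are tuned to the chosen perplexity), defines low-dimensional similarities q_ij = (1 + |y_i - y_j|^2)^-1 / sum over k != l of (1 + |y_k - y_l|^2)^-1, and moves the 2-D points y_i to minimize KL(P || Q) = sum of p_ij log(p_ij / q_ij). The heavy Student-t tail is what lets moderately dissimilar teams drift apart on the map. A small base-R sketch of q_ij and the cost for a given embedding; tsne_q() and tsne_cost() are hypothetical helpers for illustration, since tsne() does all of this internally.

```r
# Student-t (1 degree of freedom) similarities q_ij for an n x 2 embedding Y
tsne_q <- function(Y) {
  D2 <- as.matrix(dist(Y))^2   # squared Euclidean distances between map points
  W <- 1 / (1 + D2)            # heavy-tailed kernel
  diag(W) <- 0                 # q_ii is defined as 0
  W / sum(W)                   # normalize over all pairs i != j
}

# KL(P || Q) cost; P must be symmetric, zero on the diagonal, and sum to 1
tsne_cost <- function(P, Q) {
  nz <- P > 0                  # terms with p_ij = 0 contribute nothing
  sum(P[nz] * log(P[nz] / Q[nz]))
}

# Example: three points on a right angle
Y <- matrix(c(0, 0, 1, 0, 0, 1), ncol = 2, byrow = TRUE)
Q <- tsne_q(Y)
sum(Q)             # 1
tsne_cost(Q, Q)    # 0: a perfect match costs nothing
```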

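# Prepare the inputs and colors
# (Assumption: d_scale and d_merge are not built earlier in this post. Here d_scale
#  is the z-scored statistics without the Team column, and d_merge$cluster is a
#  6-group hierarchical clustering, matching the six colors of rainbow(6).)
d_scale <- scale(d[, -1])
d_merge <- data.frame(label = rownames(d),
                      cluster = cutree(hclust(dist(d_scale)), k = 6))
colors = rainbow(6)
names(colors) = unique(d_merge$cluster)

set.seed(1)

# Dimensionality reduction (t-SNE)
library(tsne)
d_scale.tsne = tsne(d_scale, max_iter=100)
plot(d_scale.tsne, type="n", main="tsne")
text(d_scale.tsne, labels=d_merge$label, col=colors[d_merge$cluster])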

[Figure: t-SNE map of the 32 teams, max_iter=100]

d_scale.tsne = tsne(d_scale, max_iter=500)
plot(d_scale.tsne, type="n", main="tsne")
text(d_scale.tsne, labels=d_merge$label, col=colors[d_merge$cluster])

[Figure: t-SNE map of the 32 teams, max_iter=500]

d_scale.tsne = tsne(d_scale, max_iter=1000)
plot(d_scale.tsne, type="n", main="tsne")
text(d_scale.tsne, labels=d_merge$label, col=colors[d_merge$cluster])

[Figure: t-SNE map of the 32 teams, max_iter=1000]
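The three maps come from three separate tsne() calls. Because set.seed(1) is called only once, each call also starts from a different random initial layout, so the maps differ in more than the iteration count. To compare runs beyond eyeballing, one option is to measure how many of each team's nearest neighbours in the standardized statistics survive in the 2-D map. A minimal base-R sketch; knn_overlap() is a hypothetical helper, not part of the tsne package.

```r
# Hypothetical helper (not part of the tsne package): average fraction of each
# point's k nearest neighbours in X that are also among its k nearest
# neighbours in the embedding Y. 1 means local neighbourhoods are fully kept.
knn_overlap <- function(X, Y, k = 5) {
  stopifnot(nrow(X) == nrow(Y), k < nrow(X))
  neighbours <- function(M) {
    D <- as.matrix(dist(M))
    diag(D) <- Inf   # a point is not its own neighbour
    lapply(seq_len(nrow(D)), function(i) order(D[i, ])[seq_len(k)])
  }
  A <- neighbours(X)
  B <- neighbours(Y)
  mean(mapply(function(a, b) length(intersect(a, b)) / k, A, B))
}

# Toy check: a uniformly rescaled copy keeps every neighbourhood
X <- cbind(1:10, (1:10)^2)
knn_overlap(X, 3 * X, k = 3)   # 1
```

With the objects from this post it would be called as knn_overlap(d_scale, d_scale.tsne, k = 5) right after each run; values closer to 1 mean the map keeps each team's statistical neighbours together.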