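A single numeric column can be standardized easily with the scale function. When a data frame also contains categorical columns and we want to standardize all of its numeric columns at once, we can use the mutate_if function from the dplyr package. For example, for a data frame df, this can be done with df %>% mutate_if(is.numeric, scale).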
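As a quick illustration of what scale computes for one column, the sketch below compares it with a manual z-score: subtract the mean, then divide by the sample standard deviation. The vector v and the names z_manual and z_scale are made up for this example and are not part of the data used in the rest of the article.

```r
# Hypothetical example vector (not from the article's data)
v <- c(2, 4, 6, 8)

# Manual z-score: centre on the mean, divide by the sample standard deviation
z_manual <- (v - mean(v)) / sd(v)

# scale() returns a one-column matrix, so convert it to a plain vector to compare
z_scale <- as.numeric(scale(v))

# Compare the two results, allowing for floating-point tolerance
all.equal(z_manual, z_scale)
```

With its default arguments, scale both centres and rescales, which is why the two results should agree.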
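Consider the following data frame (the values are randomly generated, so a fresh run will produce different numbers):
> x1 <- sample(letters[1:4], 20, replace = TRUE)
> x2 <- rpois(20, 2)
> df1 <- data.frame(x1, x2)
> df1
Output
   x1 x2
1   c  4
2   c  1
3   a  4
4   a  1
5   b  0
6   c  4
7   c  2
8   a  1
9   c  2
10  d  2
11  b  0
12  b  3
13  c  0
14  d  1
15  a  2
16  d  1
17  a  2
18  d  2
19  c  1
20  a  3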
Load the dplyr package and standardize the numeric column in df1:
> library(dplyr)
> df1 %>% mutate_if(is.numeric, scale)
Output
   x1         x2
1   c  1.7168098
2   c -0.6242945
3   a  1.7168098
4   a -0.6242945
5   b -1.4046626
6   c  1.7168098
7   c  0.1560736
8   a -0.6242945
9   c  0.1560736
10  d  0.1560736
11  b -1.4046626
12  b  0.9364417
13  c -1.4046626
14  d -0.6242945
15  a  0.1560736
16  d -0.6242945
17  a  0.1560736
18  d  0.1560736
19  c -0.6242945
20  a  0.9364417
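In dplyr 1.0.0 and later, mutate_if is superseded by across(), although it still works. The following is a minimal sketch of the same idea written with across(), assuming dplyr 1.0.0 or newer is installed. The data frame df_demo and its columns g and n are hypothetical. Wrapping scale in as.numeric is a design choice made here, not something the article does: it stores the result as a plain numeric vector rather than the one-column matrix that scale returns.

```r
library(dplyr)

# Hypothetical data frame: one categorical column, one numeric column
df_demo <- data.frame(
  g = c("a", "b", "a", "b"),
  n = c(1, 2, 3, 4)
)

# across(where(is.numeric), ...) applies the function to numeric columns only;
# as.numeric() drops the matrix structure that scale() returns
df_std <- df_demo %>%
  mutate(across(where(is.numeric), ~ as.numeric(scale(.x))))

# Inspect the column types of the result
str(df_std)
```

The categorical column g is left untouched, just as with mutate_if(is.numeric, scale).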
As a second example, consider a data frame with one categorical column and two numeric columns:
> y1 <- sample(c("S1", "S2", "S3"), 20, replace = TRUE)
> y2 <- rnorm(20, 34, 2.3)
> y3 <- rnorm(20, 500, 47.1)
> df2 <- data.frame(y1, y2, y3)
> df2
Output
   y1       y2       y3
1  S2 33.67237 511.9535
2  S2 30.47941 509.6286
3  S3 35.19967 605.8329
4  S2 27.82392 590.1114
5  S2 33.91328 485.1736
6  S1 38.26157 449.6714
7  S3 32.46148 495.2131
8  S3 32.06987 477.6192
9  S2 33.32162 448.6335
10 S2 37.55487 544.3631
11 S2 34.84706 462.9035
12 S1 34.59332 532.0554
13 S2 32.36337 501.9207
14 S2 32.26520 516.7858
15 S3 33.62168 530.5313
16 S3 33.06213 515.0878
17 S1 35.09752 454.7614
18 S3 31.79898 499.8527
19 S1 32.85342 509.8768
20 S3 33.72336 503.8084
Standardize the numeric columns in df2:
> df2 %>% mutate_if(is.numeric, scale)
Output
   y1          y2          y3
1  S2  0.09796633  0.11297890
2  S2 -1.30368623  0.05666468
3  S3  0.76842187  2.38692048
4  S2 -2.46939699  2.00611458
5  S2  0.20372057 -0.53568372
6  S1  2.11253906 -1.39561547
7  S3 -0.43359265 -0.29250727
8  S3 -0.60550146 -0.71866529
9  S2 -0.05600808 -1.42075459
10 S2  1.80231017  0.89800290
11 S2  0.61363310 -1.07510811
12 S1  0.50224659  0.59988493
13 S2 -0.47666141 -0.13003510
14 S2 -0.51975777  0.23002594
15 S3  0.07571152  0.56296787
16 S3 -0.16991946  0.18889687
17 S1  0.72358127 -1.27232444
18 S3 -0.72441871 -0.18012673
19 S1 -0.26153720  0.06267550
20 S3  0.12034948 -0.08431193
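One way to sanity-check a result like this is to confirm that each standardized column has a mean of approximately 0 and a standard deviation of 1. The sketch below rebuilds a data frame of the same shape under a fixed seed. The seed and the names df_check and std are assumptions made for reproducibility, so the values will not match the output shown above.

```r
library(dplyr)

set.seed(1)  # arbitrary seed, used only so this sketch is reproducible
df_check <- data.frame(
  y1 = sample(c("S1", "S2", "S3"), 20, replace = TRUE),
  y2 = rnorm(20, 34, 2.3),
  y3 = rnorm(20, 500, 47.1)
)

# Standardize the numeric columns, leaving the categorical column alone
std <- df_check %>% mutate_if(is.numeric, scale)

# Each pair is (mean, standard deviation) of a standardized column;
# the mean should be zero up to floating-point error and the sd should be one
c(mean(std$y2), sd(std$y2))
c(mean(std$y3), sd(std$y3))
```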