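Mean imputation is usually the first technique people learn for estimating missing values: if a column has some missing values, each of them is replaced with the mean of the column's remaining values. In R, this is done by selecting the column's missing entries with is.na() and assigning them the column mean, computed with na.rm = TRUE so that the NA values are ignored.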
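As a minimal sketch of why na.rm = TRUE matters, consider a made-up vector v (it is not part of the example that follows). Without the argument, mean() returns NA as soon as the input contains a missing value.

```r
# Hypothetical vector used only for illustration
v <- c(2, NA, 4, 6)

mean(v)                 # NA, because one element is missing
mean(v, na.rm = TRUE)   # 4, the mean of 2, 4 and 6

# Replace the missing entry with the mean of the observed values
v[is.na(v)] <- mean(v, na.rm = TRUE)
v                       # 2 4 4 6
```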
Consider the following data frame:
# Fix the seed so the random draws are reproducible
set.seed(121)
x <- sample(c(0:2, NA), 20, replace = TRUE)
y <- sample(c(0:10, NA), 20, replace = TRUE)
z <- sample(c(rnorm(2, 1, 0.40), NA), 20, replace = TRUE)
df <- data.frame(x, y, z)
df
Output
    x  y        z
1  NA  1 1.525471
2  NA 10 1.525471
3  NA  0       NA
4   2  1       NA
5  NA  3       NA
6   0  4 1.525471
7   2  9       NA
8   0  5       NA
9   2  7       NA
10  2  6 1.296308
11  2  1 1.296308
12  0 NA 1.525471
13 NA  8 1.296308
14  0  5       NA
15  1  7 1.296308
16 NA  1 1.525471
17  0  1       NA
18 NA  5 1.525471
19  0  8 1.296308
20  1  1 1.296308
Replacing the NAs in column x with the mean of the remaining values:
# Assign the mean of the observed x values to the missing entries
df$x[is.na(df$x)] <- mean(df$x, na.rm = TRUE)
df
Output
           x  y        z
1  0.9230769  1 1.525471
2  0.9230769 10 1.525471
3  0.9230769  0       NA
4  2.0000000  1       NA
5  0.9230769  3       NA
6  0.0000000  4 1.525471
7  2.0000000  9       NA
8  0.0000000  5       NA
9  2.0000000  7       NA
10 2.0000000  6 1.296308
11 2.0000000  1 1.296308
12 0.0000000 NA 1.525471
13 0.9230769  8 1.296308
14 0.0000000  5       NA
15 1.0000000  7 1.296308
16 0.9230769  1 1.525471
17 0.0000000  1       NA
18 0.9230769  5 1.525471
19 0.0000000  8 1.296308
20 1.0000000  1 1.296308
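The whole numbers in x now print as 2.0000000 rather than 2. This is because x started as an integer column, and assigning a fractional mean into it makes R convert the whole column to double. The sketch below shows the same effect on a small made-up data frame d, which is an assumption for illustration and not part of the example above.

```r
# Hypothetical integer column with one missing value
d <- data.frame(a = c(1L, NA, 2L))
class(d$a)   # "integer"

# The mean of 1 and 2 is 1.5, which cannot be stored as an integer
d$a[is.na(d$a)] <- mean(d$a, na.rm = TRUE)
class(d$a)   # "numeric"
d$a          # values are now 1.0, 1.5 and 2.0
```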
Replacing the NAs in column y with the mean of the remaining values:
# Assign the mean of the observed y values to the missing entries
df$y[is.na(df$y)] <- mean(df$y, na.rm = TRUE)
df
Output
           x         y        z
1  0.9230769  1.000000 1.525471
2  0.9230769 10.000000 1.525471
3  0.9230769  0.000000       NA
4  2.0000000  1.000000       NA
5  0.9230769  3.000000       NA
6  0.0000000  4.000000 1.525471
7  2.0000000  9.000000       NA
8  0.0000000  5.000000       NA
9  2.0000000  7.000000       NA
10 2.0000000  6.000000 1.296308
11 2.0000000  1.000000 1.296308
12 0.0000000  4.368421 1.525471
13 0.9230769  8.000000 1.296308
14 0.0000000  5.000000       NA
15 1.0000000  7.000000 1.296308
16 0.9230769  1.000000 1.525471
17 0.0000000  1.000000       NA
18 0.9230769  5.000000 1.525471
19 0.0000000  8.000000 1.296308
20 1.0000000  1.000000 1.296308
Replacing the NAs in column z with the mean of the remaining values:
# Assign the mean of the observed z values to the missing entries
df$z[is.na(df$z)] <- mean(df$z, na.rm = TRUE)
df
Output
           x         y        z
1  0.9230769  1.000000 1.525471
2  0.9230769 10.000000 1.525471
3  0.9230769  0.000000 1.410890
4  2.0000000  1.000000 1.410890
5  0.9230769  3.000000 1.410890
6  0.0000000  4.000000 1.525471
7  2.0000000  9.000000 1.410890
8  0.0000000  5.000000 1.410890
9  2.0000000  7.000000 1.410890
10 2.0000000  6.000000 1.296308
11 2.0000000  1.000000 1.296308
12 0.0000000  4.368421 1.525471
13 0.9230769  8.000000 1.296308
14 0.0000000  5.000000 1.410890
15 1.0000000  7.000000 1.296308
16 0.9230769  1.000000 1.525471
17 0.0000000  1.000000 1.410890
18 0.9230769  5.000000 1.525471
19 0.0000000  8.000000 1.296308
20 1.0000000  1.000000 1.296308
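The same assignment was repeated once per column above. One possible way to handle every numeric column in a single pass is sketched below. The helper impute_mean and the data frame d are hypothetical names introduced only for this illustration.

```r
# Hypothetical data frame, separate from the df used above
d <- data.frame(
  a = c(1, NA, 3),
  b = c(NA, 10, 20),
  label = c("u", "v", NA),
  stringsAsFactors = FALSE
)

# Replace NAs in a numeric vector with the mean of its observed values
impute_mean <- function(col) {
  col[is.na(col)] <- mean(col, na.rm = TRUE)
  col
}

# Apply the helper only to numeric columns; other columns are left untouched
num_cols <- vapply(d, is.numeric, logical(1))
d[num_cols] <- lapply(d[num_cols], impute_mean)
d
```

In this sketch the NA in a becomes 2 and the NA in b becomes 15, while the NA in the character column label stays as it is. If a numeric column contains only NAs, mean(col, na.rm = TRUE) returns NaN, so such columns would need separate handling.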