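In data analysis it is very common to find NA values in a data frame, but not every NA causes a problem: if the column that contains them is not used in the analysis, they can be left alone. For the columns that are useful, we can replace the NA values with 0 or with some other value.
Consider the following data frame: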
> set.seed(99)
> x1 <- sample(c(5, 10, 15, NA), 20, replace = TRUE)
> x2 <- sample(c(1, 2, 3, NA), 20, replace = TRUE)
> x3 <- sample(c(20, 21, 22, 23, 24, 25, NA), 20, replace = TRUE)
> x4 <- sample(c(letters[1:10], NA), 20, replace = TRUE)
> x5 <- sample(c(1:10, NA), 20, replace = TRUE)
> df <- data.frame(x1, x2, x3, x4, x5)
> df
   x1 x2 x3   x4 x5
1  NA NA 25 <NA> NA
2   5  2 24    f  2
3  NA  2 25    i  7
4  10 NA 23    i 10
5  10  1 21    c  3
6   5 NA NA    h NA
7  15  2 20    g 10
8  10 NA 25    d 10
9  10  2 23    c  5
10 10  1 NA    f  8
11 NA  3 25 <NA>  5
12 10  2 NA    h  4
13 NA  3 25    g  1
14  5  2 NA    c  8
15 NA  2 NA <NA>  3
16 NA NA 23    f  7
17 15  1 24 <NA>  9
18 NA NA NA    b  3
19  5  3 NA    d  3
20 10  2 20    g  8
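Before replacing anything, it can help to see how many NA values each column holds. The snippet below is a minimal sketch rather than part of the original walkthrough: `demo` is a hypothetical data frame holding the first five rows printed above, typed in by hand so the result does not depend on the random number generator, and `na_counts` is likewise an illustrative name.

```r
# Hypothetical stand-in: the first five rows of the data frame shown above.
demo <- data.frame(
  x1 = c(NA, 5, NA, 10, 10),
  x2 = c(NA, 2, 2, NA, 1),
  x3 = c(25, 24, 25, 23, 21),
  x4 = c(NA, "f", "i", "i", "c"),
  x5 = c(NA, 2, 7, 10, 3)
)

# is.na() on a data frame returns a logical matrix;
# colSums() then counts the TRUE entries in each column.
na_counts <- colSums(is.na(demo))
print(na_counts)
# x1 = 2, x2 = 2, x3 = 0, x4 = 1, x5 = 1
```

Columns with a count of zero need no replacement at all, which narrows down the column names to pass to the assignments that follow.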
Changing NA values to zero in consecutive columns:
> df[, c("x1", "x2")][is.na(df[, c("x1", "x2")])] <- 0
> df
   x1 x2 x3   x4 x5
1   0  0 25 <NA> NA
2   5  2 24    f  2
3   0  2 25    i  7
4  10  0 23    i 10
5  10  1 21    c  3
6   5  0 NA    h NA
7  15  2 20    g 10
8  10  0 25    d 10
9  10  2 23    c  5
10 10  1 NA    f  8
11  0  3 25 <NA>  5
12 10  2 NA    h  4
13  0  3 25    g  1
14  5  2 NA    c  8
15  0  2 NA <NA>  3
16  0  0 23    f  7
17 15  1 24 <NA>  9
18  0  0 NA    b  3
19  5  3 NA    d  3
20 10  2 20    g  8
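When the same replacement has to be repeated for several sets of columns, it can be wrapped in a small function. This is a sketch under stated assumptions, not the method used above: `replace_na_in`, `demo`, and `filled` are hypothetical names, and `demo` holds the first five rows of the original data frame, typed in by hand.

```r
# Hypothetical helper: replace NA with `value` in the named columns only.
replace_na_in <- function(data, cols, value = 0) {
  data[cols] <- lapply(data[cols], function(column) {
    column[is.na(column)] <- value
    column
  })
  data
}

# Hypothetical stand-in: the first five rows of the original data frame.
demo <- data.frame(
  x1 = c(NA, 5, NA, 10, 10),
  x2 = c(NA, 2, 2, NA, 1),
  x3 = c(25, 24, 25, 23, 21),
  x4 = c(NA, "f", "i", "i", "c"),
  x5 = c(NA, 2, 7, 10, 3)
)

# Only x1 and x2 are touched; the NA values in x4 and x5 stay as they are.
filled <- replace_na_in(demo, c("x1", "x2"))
print(filled)
```

Because the function returns a modified copy, `demo` itself is left unchanged, which makes it easy to compare the data before and after the replacement.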
Changing NA values to zero in non-consecutive columns:
> df[, c("x3", "x5")][is.na(df[, c("x3", "x5")])] <- 0
> df
   x1 x2 x3   x4 x5
1   0  0 25 <NA>  0
2   5  2 24    f  2
3   0  2 25    i  7
4  10  0 23    i 10
5  10  1 21    c  3
6   5  0  0    h  0
7  15  2 20    g 10
8  10  0 25    d 10
9  10  2 23    c  5
10 10  1  0    f  8
11  0  3 25 <NA>  5
12 10  2  0    h  4
13  0  3 25    g  1
14  5  2  0    c  8
15  0  2  0 <NA>  3
16  0  0 23    f  7
17 15  1 24 <NA>  9
18  0  0  0    b  3
19  5  3  0    d  3
20 10  2 20    g  8
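The introduction notes that NA values can also be replaced with a value other than 0. As one possible illustration, the sketch below fills each selected column with that column's own mean. The choice of the mean is an assumption made for this example only, and `demo` is a hypothetical data frame holding rows 5 to 8 of the original data, typed in by hand.

```r
# Hypothetical stand-in: rows 5 to 8 of the original data frame.
demo <- data.frame(
  x1 = c(10, 5, 15, 10),
  x2 = c(1, NA, 2, NA),
  x3 = c(21, NA, 20, 25),
  x4 = c("c", "h", "g", "d"),
  x5 = c(3, NA, 10, 10)
)

# For each selected column, compute the mean of the non-missing values
# and write it into the positions that are NA.
for (col in c("x3", "x5")) {
  col_mean <- mean(demo[[col]], na.rm = TRUE)
  demo[[col]][is.na(demo[[col]])] <- col_mean
}

print(demo)
# The NA in x3 becomes 22, the mean of 21, 20 and 25.
```

The NA values in `x2` are left alone because that column is not listed in the loop. This approach only makes sense for numeric columns; a character column such as `x4` has no mean.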