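Sometimes a data set contains information we do not need: a single case, several cases, an entire variable, or anything else that does not serve the goal of the analysis. To remove such rows from an R data frame with the dplyr package, we can use the anti_join function. anti_join(x, y) returns every row of x that has no matching row in y, so passing a subset of a data frame as y drops that subset from the result. The match is made on values, not on row positions.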
Consider the following data frame:
> set.seed(2514)
> x1<-rnorm(20,5)
> x2<-rnorm(20,5,0.05)
> df1<-data.frame(x1,x2)
> df1
Output
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 3.814416 4.990552
12 3.359167 4.891964
13 5.304671 4.950883
14 4.768564 4.953290
15 3.842797 4.950219
16 5.270018 4.995953
17 6.344269 5.008545
18 5.366249 4.905290
19 5.547608 5.098554
20 5.266844 5.003416
Load the dplyr package:
> library(dplyr)
Remove rows 1 to 5 from df1:
> anti_join(df1,df1[1:5,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.950218 5.038626
2  4.903268 5.010087
3  7.462286 4.974513
4  5.056762 5.097812
5  6.031768 5.002989
6  3.814416 4.990552
7  3.359167 4.891964
8  5.304671 4.950883
9  4.768564 4.953290
10 3.842797 4.950219
11 5.270018 4.995953
12 6.344269 5.008545
13 5.366249 4.905290
14 5.547608 5.098554
15 5.266844 5.003416
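The call above prints a "Joining, by = ..." message because no join columns were named. The sketch below is one possible way to make the call explicit and to cross-check it against position-based removal. The names kept, kept_slice and kept_base are illustrative only and do not appear in the original example.

```r
library(dplyr)

# Rebuild the example data frame from the text.
set.seed(2514)
x1 <- rnorm(20, 5)
x2 <- rnorm(20, 5, 0.05)
df1 <- data.frame(x1, x2)

# Naming the join columns with `by` avoids the "Joining, by = ..." message.
kept <- anti_join(df1, df1[1:5, ], by = c("x1", "x2"))

# Position-based alternatives that do not depend on matching values.
kept_slice <- slice(df1, -(1:5))   # dplyr
kept_base  <- df1[-(1:5), ]        # base R (keeps the original row names)

# All three approaches keep the same 15 rows in the same order.
nrow(kept)
identical(kept$x1, kept_slice$x1)
identical(kept$x2, kept_base$x2)
```

When the rows to drop are already known by position, slice() or negative indexing is usually the more direct choice. anti_join is most useful when the rows to drop are given as another data frame.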
Remove rows 11 to 18 from df1:
> anti_join(df1,df1[11:18,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 5.547608 5.098554
12 5.266844 5.003416
Remove rows 6 to 12 from df1:
> anti_join(df1,df1[6:12,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.304671 4.950883
7  4.768564 4.953290
8  3.842797 4.950219
9  5.270018 4.995953
10 6.344269 5.008545
11 5.366249 4.905290
12 5.547608 5.098554
13 5.266844 5.003416
Remove rows 15 to 20 from df1:
> anti_join(df1,df1[15:20,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
11 3.814416 4.990552
12 3.359167 4.891964
13 5.304671 4.950883
14 4.768564 4.953290
Remove rows 5 to 18 from df1:
> anti_join(df1,df1[5:18,])
Joining, by = c("x1", "x2")
        x1       x2
1 5.567262 4.998607
2 5.343063 4.931962
3 2.211267 5.034461
4 5.092191 5.075641
5 5.547608 5.098554
6 5.266844 5.003416
Remove rows 11 to 20 from df1:
> anti_join(df1,df1[11:20,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  5.343063 4.931962
3  2.211267 5.034461
4  5.092191 5.075641
5  3.883282 4.997900
6  5.950218 5.038626
7  4.903268 5.010087
8  7.462286 4.974513
9  5.056762 5.097812
10 6.031768 5.002989
Remove rows 1 to 10 from df1:
> anti_join(df1,df1[1:10,])
Joining, by = c("x1", "x2")
         x1       x2
1  3.814416 4.990552
2  3.359167 4.891964
3  5.304671 4.950883
4  4.768564 4.953290
5  3.842797 4.950219
6  5.270018 4.995953
7  6.344269 5.008545
8  5.366249 4.905290
9  5.547608 5.098554
10 5.266844 5.003416
Remove rows 2 to 11 from df1:
> anti_join(df1,df1[2:11,])
Joining, by = c("x1", "x2")
         x1       x2
1  5.567262 4.998607
2  3.359167 4.891964
3  5.304671 4.950883
4  4.768564 4.953290
5  3.842797 4.950219
6  5.270018 4.995953
7  6.344269 5.008545
8  5.366249 4.905290
9  5.547608 5.098554
10 5.266844 5.003416
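All of the examples above work because every row of df1 is unique. Since anti_join matches on values, a data frame with repeated rows can lose more rows than intended. The sketch below illustrates this with a small, made-up data frame; dup, by_value and by_position are hypothetical names used only for this illustration.

```r
library(dplyr)

# Hypothetical data frame in which rows 1 and 3 hold identical values.
dup <- data.frame(id = c(1, 2, 1, 3), value = c("a", "b", "a", "c"))

# Intent: drop only row 1. anti_join matches on values,
# so the identical row 3 is removed as well.
by_value <- anti_join(dup, dup[1, ], by = c("id", "value"))

# Dropping by position removes exactly one row.
by_position <- slice(dup, -1)

nrow(by_value)     # 2 rows remain
nrow(by_position)  # 3 rows remain
```

If duplicates are possible and only specific positions should go, position-based removal is the safer option. If all copies of a given record should go, anti_join does that in one step.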