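During an analysis we may come across data we do not need and want to remove. That data can be a single row or several rows. For example, a row may be unwanted because it contains a value greater than, less than, or equal to some threshold, so we delete it. In R, we do this by subsetting the data frame with single square brackets and a negative row index.
Consider the following data frame: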
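The threshold case mentioned above can be sketched as follows. This is a minimal illustration only: the data frame `scores`, its columns `id` and `value`, and the cutoff `threshold` are hypothetical names, not part of the example used in the rest of this article.

```r
# Hypothetical toy data; `scores`, `id`, `value` and `threshold` are illustrative names.
scores <- data.frame(id = 1:5, value = c(0.4, -1.2, 2.5, 0.0, 1.1))
threshold <- 1

# Keep only the rows whose value does not exceed the threshold,
# which deletes every row where value > threshold.
kept <- scores[scores$value <= threshold, ]

# The same deletion, written as "negate the rows we want to drop".
kept2 <- scores[!(scores$value > threshold), ]

print(kept)
```

Both forms use a logical vector inside the single square brackets, so the rows are chosen by their contents rather than by their position.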
> set.seed(99)
> x1<-rnorm(20)
> x2<-rnorm(20,0.1)
> x3<-rnorm(20,0.2)
> x4<-rnorm(20,0.5)
> x5<-rnorm(20,1)
> df<-data.frame(x1,x2,x3,x4,x5)
> df
              x1          x2          x3          x4          x5
1   0.2139625022  1.19892152  0.33297863  0.33708211  1.03661152
2   0.4796581346  0.85251346 -1.47926432  0.38578484  1.28852606
3   0.0878287050  0.04058331 -0.07847958  0.05534064 -0.10597134
4   0.4438585075 -0.24456879 -1.35241100  0.75695917  1.89223849
5  -0.3628379205  0.32266830 -1.17969925 -0.60013713  2.18146915
6   0.1226740295  0.65178634 -1.15705659 -0.83657589  1.35116793
7  -0.8638451881  0.78364282 -0.72113718  0.70489861  1.06300672
8   0.4896242667 -0.44587940 -0.66681774  0.53528735  2.39426172
9  -0.3641169125 -1.26743616  1.85664439  0.06108749  0.98749208
10 -1.2942420067  1.50005184  0.04492028  0.90040586  1.67807643
11 -0.7457690454  1.47305395 -1.37655243  1.08517131  0.94385342
12  0.9215503620  0.55025656  0.82408260  0.98212854  1.13599383
13  0.7500543504 -0.04629386  0.53022068 -0.30483385  2.86457602
14 -2.5085540159  0.22809724 -0.19812226  0.80307719  2.14870835
15 -3.0409340953 -2.19472095 -0.88139693 -0.32617573  0.06001394
16  0.0002658005 -1.26656892  0.12307794  0.64142892  0.93811373
17 -0.3940189942 -0.09747955 -0.32553662  1.24035721  0.62390950
18 -1.7450276608  0.16808578  0.59128965  1.88504655  1.20968885
19  0.4986314508  0.19050341 -0.48045326 -0.13357748  1.70545858
20  0.2709537888  0.42275997 -0.54869693  0.73858864  1.65208847
Suppose we want to delete row 1. We can do that as follows:
> df = df[-1,]
> df
              x1          x2          x3          x4          x5
2   0.4796581346  0.85251346 -1.47926432  0.38578484  1.28852606
3   0.0878287050  0.04058331 -0.07847958  0.05534064 -0.10597134
4   0.4438585075 -0.24456879 -1.35241100  0.75695917  1.89223849
5  -0.3628379205  0.32266830 -1.17969925 -0.60013713  2.18146915
6   0.1226740295  0.65178634 -1.15705659 -0.83657589  1.35116793
7  -0.8638451881  0.78364282 -0.72113718  0.70489861  1.06300672
8   0.4896242667 -0.44587940 -0.66681774  0.53528735  2.39426172
9  -0.3641169125 -1.26743616  1.85664439  0.06108749  0.98749208
10 -1.2942420067  1.50005184  0.04492028  0.90040586  1.67807643
11 -0.7457690454  1.47305395 -1.37655243  1.08517131  0.94385342
12  0.9215503620  0.55025656  0.82408260  0.98212854  1.13599383
13  0.7500543504 -0.04629386  0.53022068 -0.30483385  2.86457602
14 -2.5085540159  0.22809724 -0.19812226  0.80307719  2.14870835
15 -3.0409340953 -2.19472095 -0.88139693 -0.32617573  0.06001394
16  0.0002658005 -1.26656892  0.12307794  0.64142892  0.93811373
17 -0.3940189942 -0.09747955 -0.32553662  1.24035721  0.62390950
18 -1.7450276608  0.16808578  0.59128965  1.88504655  1.20968885
19  0.4986314508  0.19050341 -0.48045326 -0.13357748  1.70545858
20  0.2709537888  0.42275997 -0.54869693  0.73858864  1.65208847
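One caveat with the `df[-1,]` form is worth a short sketch. When a data frame has only one column, row subsetting with an empty column slot simplifies the result to a plain vector unless `drop = FALSE` is supplied. The data frame `one_col` below is a hypothetical example, not the `df` used in this article.

```r
# Hypothetical single-column data frame, used only to illustrate the caveat.
one_col <- data.frame(x = c(10, 20, 30))

# With one column, the result of deleting a row is simplified to a vector.
as_vector <- one_col[-1, ]

# Supplying drop = FALSE keeps the result as a data frame.
still_df <- one_col[-1, , drop = FALSE]

print(class(as_vector))
print(class(still_df))
```

For a data frame with several columns, such as `df` above, the result stays a data frame either way.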
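Consecutive rows can be deleted as follows. Because row 1 has already been removed, positions 1 and 2 of the current df are the rows labelled 2 and 3: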
> df = df[-c(1:2),]
> df
              x1          x2          x3          x4          x5
4   0.4438585075 -0.24456879 -1.35241100  0.75695917  1.89223849
5  -0.3628379205  0.32266830 -1.17969925 -0.60013713  2.18146915
6   0.1226740295  0.65178634 -1.15705659 -0.83657589  1.35116793
7  -0.8638451881  0.78364282 -0.72113718  0.70489861  1.06300672
8   0.4896242667 -0.44587940 -0.66681774  0.53528735  2.39426172
9  -0.3641169125 -1.26743616  1.85664439  0.06108749  0.98749208
10 -1.2942420067  1.50005184  0.04492028  0.90040586  1.67807643
11 -0.7457690454  1.47305395 -1.37655243  1.08517131  0.94385342
12  0.9215503620  0.55025656  0.82408260  0.98212854  1.13599383
13  0.7500543504 -0.04629386  0.53022068 -0.30483385  2.86457602
14 -2.5085540159  0.22809724 -0.19812226  0.80307719  2.14870835
15 -3.0409340953 -2.19472095 -0.88139693 -0.32617573  0.06001394
16  0.0002658005 -1.26656892  0.12307794  0.64142892  0.93811373
17 -0.3940189942 -0.09747955 -0.32553662  1.24035721  0.62390950
18 -1.7450276608  0.16808578  0.59128965  1.88504655  1.20968885
19  0.4986314508  0.19050341 -0.48045326 -0.13357748  1.70545858
20  0.2709537888  0.42275997 -0.54869693  0.73858864  1.65208847
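The range form `-c(1:2)` can also be written in a couple of equivalent ways, and there is one easy mistake to avoid. The sketch below uses a hypothetical data frame `demo`. Note that `-1:2` without parentheses means the sequence -1, 0, 1, 2, which mixes negative and positive indices and fails.

```r
# Hypothetical toy data frame; `demo` is an illustrative name.
demo <- data.frame(a = 1:5, b = letters[1:5])

# Three ways to drop the first two rows.
drop_first_two <- demo[-(1:2), ]
same_with_c    <- demo[-c(1:2), ]
via_tail       <- tail(demo, -2)   # a negative n drops that many leading rows

# Pitfall: -1:2 is (-1):2, so the index mixes signs and R raises an error.
mixed <- tryCatch(demo[-1:2, ], error = function(e) "error: mixed signs")

print(drop_first_two)
print(mixed)
```

Wrapping the range in parentheses or `c()` before negating it, as the article does, avoids the precedence problem.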
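Next, suppose we want to delete the first and third rows of the current df. Negative indices refer to positions, not to row labels, so this removes the rows labelled 4 and 6: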
> df = df[-c(1,3),]
> df
              x1          x2          x3          x4          x5
5  -0.3628379205  0.32266830 -1.17969925 -0.60013713  2.18146915
7  -0.8638451881  0.78364282 -0.72113718  0.70489861  1.06300672
8   0.4896242667 -0.44587940 -0.66681774  0.53528735  2.39426172
9  -0.3641169125 -1.26743616  1.85664439  0.06108749  0.98749208
10 -1.2942420067  1.50005184  0.04492028  0.90040586  1.67807643
11 -0.7457690454  1.47305395 -1.37655243  1.08517131  0.94385342
12  0.9215503620  0.55025656  0.82408260  0.98212854  1.13599383
13  0.7500543504 -0.04629386  0.53022068 -0.30483385  2.86457602
14 -2.5085540159  0.22809724 -0.19812226  0.80307719  2.14870835
15 -3.0409340953 -2.19472095 -0.88139693 -0.32617573  0.06001394
16  0.0002658005 -1.26656892  0.12307794  0.64142892  0.93811373
17 -0.3940189942 -0.09747955 -0.32553662  1.24035721  0.62390950
18 -1.7450276608  0.16808578  0.59128965  1.88504655  1.20968885
19  0.4986314508  0.19050341 -0.48045326 -0.13357748  1.70545858
20  0.2709537888  0.42275997 -0.54869693  0.73858864  1.65208847
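Since the row labels no longer match the row positions after repeated deletions, it can be clearer to delete by label instead. The following is a minimal sketch on a hypothetical data frame `demo`. It matches labels with `rownames()` and `%in%`, then optionally resets the labels by assigning `NULL` to `rownames()`.

```r
# Hypothetical toy data frame; `demo` is an illustrative name.
demo <- data.frame(x = c(11, 12, 13, 14, 15, 16), y = letters[1:6])

# After dropping the first three rows, the remaining labels are "4", "5", "6".
demo <- demo[-(1:3), ]

# Positional deletion: positions 1 and 3 are the rows labelled 4 and 6.
by_position <- demo[-c(1, 3), ]

# Label-based deletion: state the labels to drop explicitly.
by_label <- demo[!(rownames(demo) %in% c("4", "6")), ]

# Optionally renumber the remaining rows from 1.
renumbered <- by_label
rownames(renumbered) <- NULL

print(by_label)
print(renumbered)
```

Both deletions leave only the row labelled 5. The label-based form keeps working if earlier deletions change, because it does not depend on the current positions.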