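When the assumptions of a parametric analysis are not met, we turn to nonparametric analysis. Because the data are not normally distributed, nonparametric methods usually work with the median rather than the mean. To find the median by group when the data are stored in a data.table object, apply median to each column with the lapply function inside the data.table call and group with by, as the following examples show.
Load the data.table package: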
> library(data.table)
Consider the following data.table object:
> Group<-sample(LETTERS[1:4],20,replace=TRUE)
> x1<-rnorm(20,1,0.87)
> x2<-rnorm(20,5,1.2)
> x3<-rnorm(20,500,20)
> x4<-rnorm(20,50,1.14)
> dt1<-data.table(Group,x1,x2,x3,x4)
> dt1
Output (the data are generated randomly and no seed is set, so your values will differ):
    Group           x1       x2       x3       x4
 1:     B  0.515370827 6.174187 542.9350 50.28300
 2:     B  0.522858146 6.976872 510.5568 49.71331
 3:     A  1.055456751 3.192242 476.7693 48.88280
 4:     A -0.024912175 2.847402 506.5335 50.67151
 5:     C -0.196164614 3.328402 508.6321 48.39842
 6:     C  1.290014270 5.556677 524.5811 48.27884
 7:     D  1.486977865 5.897758 486.5484 49.51944
 8:     D -0.007248341 6.468281 532.3197 51.45941
 9:     D  2.182819501 5.394480 442.8788 49.58497
10:     B  2.211356101 6.443493 488.6105 49.02810
11:     D -0.419805499 3.586357 485.3483 49.87930
12:     B  1.865157121 6.099377 533.5723 51.51517
13:     D  2.389899358 4.531113 507.7677 49.68121
14:     C  0.411933014 4.602449 492.0163 50.05786
15:     B  1.439917480 4.031037 475.5113 49.90952
16:     A  1.749343791 5.170324 513.3880 50.25203
17:     D  1.648629013 5.439521 519.4953 50.00103
18:     A  1.825107893 2.489396 482.8070 49.83169
19:     B  0.757930091 4.975242 501.2664 49.70943
20:     D  1.989164222 3.915599 491.8682 50.91287
Find the group-wise median of every column in dt1:
> dt1[,lapply(.SD,median),by=Group]
Output
   Group       x1       x2       x3       x4
1:     B 1.098924 6.136782 505.9116 49.81141
2:     A 1.402400 3.019822 494.6703 50.04186
3:     C 0.411933 4.602449 508.6321 48.39842
4:     D 1.648629 5.394480 491.8682 49.87930
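Because dt1 is random, the result above is hard to check by eye. The following is a minimal sketch on a small, hand-checkable table; the table dt and its values are made up for illustration and are not part of the example above. Inside the data.table call, .SD stands for the non-grouping columns of the current group, and lapply applies median to each of them.

```r
library(data.table)

# Hypothetical data, chosen so the medians are easy to verify by hand
dt <- data.table(
  Group = c("A", "A", "A", "B", "B"),
  x1    = c(1, 3, 10, 2, 4),
  x2    = c(5, 7, 9, 6, 8)
)

# For each Group, .SD contains x1 and x2 for that group only;
# lapply() runs median() on each of those columns
med <- dt[, lapply(.SD, median), by = Group]
print(med)

# Cross-check one cell with a plain subset-and-median
manual_A_x1 <- median(dt[Group == "A", x1])
print(manual_A_x1)
```

For group A the x1 values are 1, 3 and 10, so the median is 3; for group B the values 2 and 4 average to 3. The manual cross-check should agree with the corresponding cell of med.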
Let's look at another example:
> Class<-sample(c("First","Second","Third"),20,replace=TRUE)
> Payment<-sample(1:10,20,replace=TRUE)
> dt2<-data.table(Class,Payment)
> dt2
Output
     Class Payment
 1:  First       5
 2:  First       4
 3:  First       3
 4: Second       5
 5:  First       1
 6:  Third       8
 7:  First       3
 8: Second       7
 9: Second       6
10: Second      10
11:  First       4
12: Second       2
13: Second       2
14:  First      10
15:  First       1
16:  Third       3
17:  Third       1
18: Second       5
19:  Third       4
20: Second      10
Find the median of Payment for each Class in dt2:
> dt2[,lapply(.SD,median),by=Class]
Output
    Class Payment
1:  First     3.5
2: Second     5.5
3:  Third     3.5
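Both tables above contain only numeric columns besides the grouping column. If a table also holds a non-numeric column, the median is not meaningful for it, and one possible approach is to restrict .SD with the .SDcols argument. The sketch below assumes a made-up table dt with a hypothetical text column Label; neither is part of the examples above.

```r
library(data.table)

# Hypothetical table with one text column (Label) next to the numeric ones
dt <- data.table(
  Class   = c("First", "First", "Second", "Second", "Second"),
  Label   = c("a", "b", "c", "d", "e"),
  Payment = c(2, 4, 1, 5, 9),
  Score   = c(10, 20, 30, 40, 50)
)

# Collect the names of the numeric columns only
num_cols <- names(dt)[sapply(dt, is.numeric)]

# .SDcols limits .SD to those columns, so Label is left out of the result
med <- dt[, lapply(.SD, median), by = Class, .SDcols = num_cols]
print(med)
```

The result should have one row per Class with the columns Payment and Score only. Passing the column names explicitly, for example .SDcols = c("Payment", "Score"), works the same way.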