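When a data frame has a factor column that groups the values of a numeric column, we may want to find the maximum of the numeric column for each factor level, which makes it easy to compare the levels. To return that maximum together with all the other columns of the data frame, combine the aggregate function with the merge function.
Consider the following data frame: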
set.seed(78)
Group <- sample(LETTERS[1:5], 20, replace = TRUE)
Rank <- sample(1:10, 20, replace = TRUE)
Score <- sample(1:100, 20)
df1 <- data.frame(Group, Rank, Score)
df1
Output
   Group Rank Score
1      D    2     5
2      E    4    67
3      D    4    59
4      D    5    40
5      E    6     4
6      C   10    70
7      B   10    61
8      B    4    72
9      A    4    29
10     C    5    89
11     E    1    99
12     C    1    37
13     B    7    83
14     D    4    50
15     B    1    48
16     D   10     9
17     B    1    36
18     D    3    46
19     A    3    34
20     B   10    71
Find the maximum Score for each level of the Group factor and return all columns:
merge(aggregate(Score~Group,df1,max),df1,by=c("Group","Score"))
Output
  Group Score Rank
1     A    34    3
2     B    83    7
3     C    89    5
4     D    59    4
5     E    99    1
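To see why this works, it can help to run the two steps separately. The following is a minimal sketch on a small hand-written data frame; the names `toy`, `group_max`, and `result` and all the values are made up for illustration and are not part of the example above. `aggregate()` produces one row per group holding the maximum, and `merge()` then keeps only the rows of the original data whose group and score both match.

```r
# Hypothetical toy data, made up for illustration
toy <- data.frame(
  Group = c("A", "A", "B", "B", "B"),
  Rank  = c(1, 2, 3, 4, 5),
  Score = c(10, 30, 25, 5, 15)
)

# Step 1: one row per group holding the maximum Score
group_max <- aggregate(Score ~ Group, data = toy, FUN = max)

# Step 2: keep the rows of toy whose (Group, Score) pair appears in group_max
result <- merge(group_max, toy, by = c("Group", "Score"))
result
```

Here `result` should contain two rows: group A with Score 30 and Rank 2, and group B with Score 25 and Rank 3. Note that the merged columns come first, so the column order is Group, Score, Rank rather than the order in `toy`.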
Let us look at another example:
Class <- sample(c("First", "Second", "Third"), 20, replace = TRUE)
Gender <- sample(c("Male", "Female"), 20, replace = TRUE)
Years <- sample(1:5, 20, replace = TRUE)
df2 <- data.frame(Class, Gender, Years)
df2
Output
    Class Gender Years
1   Third Female     5
2   First Female     4
3   Third Female     5
4   Third   Male     4
5  Second   Male     4
6   Third Female     3
7   First   Male     1
8   Third   Male     2
9   First Female     5
10 Second   Male     5
11 Second   Male     5
12  Third Female     3
13  Third Female     1
14 Second   Male     4
15  First   Male     2
16 Second Female     1
17  First Female     5
18  Third Female     5
19 Second Female     4
20  First   Male     3
Find the maximum Years for each level of the Class factor and return all columns:
merge(aggregate(Years~Class,df2,max),df2,by=c("Class","Years"))
Output
   Class Years Gender
1  First     5 Female
2  First     5 Female
3 Second     5   Male
4 Second     5   Male
5  Third     5 Female
6  Third     5 Female
7  Third     5 Female
Several rows tie for the maximum within each class, so the merge returns all of them.
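If only one row per factor level is wanted, one possible approach is to drop the repeated levels after the merge with `duplicated()`. This is a sketch under assumptions, not part of the method above: the data frame `toy2` and the names `all_max` and `one_per_class` are hypothetical, and the tied row that survives is simply the first one in the merged order.

```r
# Hypothetical toy data with ties for the maximum in both classes
toy2 <- data.frame(
  Class  = c("First", "First", "Second", "Second", "Second"),
  Gender = c("Male", "Female", "Male", "Female", "Male"),
  Years  = c(5, 5, 3, 4, 4)
)

# Same aggregate + merge pattern as above: returns every tied row
all_max <- merge(aggregate(Years ~ Class, data = toy2, FUN = max),
                 toy2, by = c("Class", "Years"))

# Keep only the first row for each Class
one_per_class <- all_max[!duplicated(all_max$Class), ]
one_per_class
```

In this sketch `all_max` has four rows (two tied rows per class) and `one_per_class` has two. If a specific tied row should win, sort `all_max` by the tie-breaking column before applying `duplicated()`.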