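Sometimes a factor has levels that can be combined, or we want to group several levels into a single level. This is typically done when a particular factor level has only one observation, or when there is a theoretical reason to merge levels. For example, if we have a data frame called df with a factor column x that has four categories A, B, C and D, these can be grouped into two levels, A and B, as follows: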
df$x[df$x %in% c("A", "B")] <- "A"
df$x[df$x %in% c("C", "D")] <- "B"
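If x is stored as a true factor, the assignment above still leaves the emptied levels C and D attached to the column. One alternative is to rename the levels themselves. The following is a minimal sketch under that assumption; the data frame contents are made up purely for illustration.

```r
# Hypothetical data frame whose column x is a genuine factor
df <- data.frame(x = factor(c("A", "B", "C", "D", "A", "C")))

# Each list name is a new level; its value lists the old levels it absorbs
levels(df$x) <- list(A = c("A", "B"), B = c("C", "D"))

print(levels(df$x))  # only the merged levels remain
print(table(df$x))   # counts per merged level
```

Because the list form maps all old levels in one step, the merged factor carries only the two new levels and no unused ones.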
Consider the following data frame:
factor <- sample(LETTERS[1:4], 20, replace = TRUE)
response <- rpois(20, 5)
df1 <- data.frame(factor, response)
df1
Output
   factor response
1       A        5
2       C        7
3       D        5
4       C       13
5       C        5
6       C        4
7       B        4
8       B       10
9       C        4
10      D        6
11      B        5
12      B        3
13      A        7
14      A        2
15      A        2
16      D        3
17      B        1
18      C        5
19      D        6
20      D        4
Recoding the factor levels in the factor column of df1:
df1$factor[df1$factor %in% c("A", "B")] <- "A"
df1$factor[df1$factor %in% c("C", "D")] <- "B"
df1
Output
   factor response
1       A        5
2       B        7
3       B        5
4       B       13
5       B        5
6       B        4
7       A        4
8       A       10
9       B        4
10      B        6
11      A        5
12      A        3
13      A        7
14      A        2
15      A        2
16      B        3
17      A        1
18      B        5
19      B        6
20      B        4
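Because the new label B is also one of the original levels, the two assignments for df1 only give the intended result in the order shown; run in reverse, the rows just recoded to B would be picked up again by the A/B assignment. A hedged sketch of an order-independent variant follows. The seed, the helper variable old, and the explicit stringsAsFactors = FALSE are assumptions added for illustration, so the values will not match the output above.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

df1 <- data.frame(
  factor = sample(LETTERS[1:4], 20, replace = TRUE),
  response = rpois(20, 5),
  stringsAsFactors = FALSE  # keep the column as character in any R version
)

# Test membership against an untouched copy, so the order of the
# two assignments no longer matters
old <- df1$factor
df1$factor[old %in% c("C", "D")] <- "B"
df1$factor[old %in% c("A", "B")] <- "A"

# Cross-tabulate old against new labels as a sanity check
print(table(old, recoded = df1$factor))
```

In the cross-tabulation, every original A or B row should fall in the recoded A column and every original C or D row in the recoded B column.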
Consider another example:
grp <- sample(c("G1", "G2", "G3"), 20, replace = TRUE)
Y <- rnorm(20)
df2 <- data.frame(grp, Y)
df2
Output
   grp           Y
1   G3 -0.39900138
2   G3  1.04085657
3   G1  1.46432790
4   G3 -0.90843955
5   G1 -0.15202516
6   G2  1.15456629
7   G2  1.24002828
8   G2 -1.10731484
9   G2  0.27423208
10  G3  1.06444903
11  G2 -0.21824650
12  G1  0.25843090
13  G1  0.07686889
14  G3 -0.21955611
15  G3 -0.05359245
16  G2  0.54630987
17  G3 -0.09808820
18  G1 -0.65171471
19  G2 -0.62371231
20  G2 -0.03319190
Recoding the factor levels in the grp column of df2:
df2$grp[df2$grp %in% c("G1", "G2")] <- "Control"
df2
Output
       grp           Y
1       G3 -0.39900138
2       G3  1.04085657
3  Control  1.46432790
4       G3 -0.90843955
5  Control -0.15202516
6  Control  1.15456629
7  Control  1.24002828
8  Control -1.10731484
9  Control  0.27423208
10      G3  1.06444903
11 Control -0.21824650
12 Control  0.25843090
13 Control  0.07686889
14      G3 -0.21955611
15      G3 -0.05359245
16 Control  0.54630987
17      G3 -0.09808820
18 Control -0.65171471
19 Control -0.62371231
20 Control -0.03319190
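The Control assignment works here because grp is a character column, which is what data.frame() creates by default since R 4.0.0. If grp were a factor, assigning a label that is not yet one of its levels would produce NA values with a warning. The sketch below illustrates that pitfall and one way around it; the short vector and the name direct are hypothetical and are not part of df2.

```r
# Hypothetical factor with the same three groups as df2$grp
grp <- factor(c("G3", "G1", "G2", "G3", "G2"))

# Direct assignment of an unknown label: R inserts NA and warns
direct <- grp
suppressWarnings(direct[direct %in% c("G1", "G2")] <- "Control")
print(anyNA(direct))

# Renaming the levels instead merges G1 and G2 into one new level
levels(grp)[levels(grp) %in% c("G1", "G2")] <- "Control"
print(levels(grp))
print(as.character(grp))
```

Giving two levels the same name is what triggers the merge, so no separate step is needed to drop the old G1 and G2 levels.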