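In R, a column that holds string values is usually stored as either a character vector or a factor. For example, a "Group" column with four unique values (A, B, C and D) can be a character column or a factor with four levels. To keep only the rows whose value in such a column belongs to a chosen set, we can use the subset function together with the %in% operator. Consider the following examples.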
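Before the main examples, here is a minimal sketch of the two storage types. The vector names grp_chr and grp_fct and their values are hypothetical and only serve as illustration. It suggests that %in% compares by label, so the same filter condition should work for either type.

```r
# Hypothetical example values; grp_chr and grp_fct are illustrative names
grp_chr <- c("A", "B", "C", "D", "A", "C")
grp_fct <- factor(grp_chr)

is.character(grp_chr)  # TRUE
is.factor(grp_fct)     # TRUE
levels(grp_fct)        # "A" "B" "C" "D"

# %in% matches on the labels, so both types give the same logical vector
same_mask <- identical(grp_chr %in% c("A", "C"), grp_fct %in% c("A", "C"))
same_mask              # TRUE
```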
Consider the following data frame:
set.seed(888)
Grp <- sample(c("A", "B", "C"), 20, replace = TRUE)
Age <- sample(21:50, 20)
df1 <- data.frame(Grp, Age)
df1
Output
   Grp Age
1    A  35
2    C  40
3    C  48
4    C  46
5    C  36
6    C  33
7    B  47
8    A  45
9    B  43
10   B  37
11   B  30
12   A  24
13   C  39
14   C  50
15   C  25
16   A  34
17   B  49
18   A  44
19   C  38
20   B  26
str(df1)
'data.frame': 20 obs. of  2 variables:
 $ Grp: chr  "A" "C" "C" "C" ...
 $ Age: int  35 40 48 46 36 33 47 45 43 37 ...
Taking the subset of df1 whose Grp value is A or C:
subset(df1, Grp %in% c("A","C"))
Output
   Grp Age
1    A  35
2    C  40
3    C  48
4    C  46
5    C  36
6    C  33
8    A  45
12   A  24
13   C  39
14   C  50
15   C  25
16   A  34
18   A  44
19   C  38
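The same kind of selection can also be written with logical indexing instead of subset(). The sketch below uses a small hand-made data frame (the name toy and its values are hypothetical, not the df1 above) and also illustrates why %in% is preferred over == when several values are allowed: == recycles the right-hand vector and compares position by position.

```r
# Small hand-made data frame (hypothetical values, not the df1 above)
toy <- data.frame(Grp = c("A", "B", "C", "A", "C", "B"),
                  Age = c(35, 47, 40, 45, 48, 43))

# Logical indexing, intended to match subset(toy, Grp %in% c("A", "C"))
by_index  <- toy[toy$Grp %in% c("A", "C"), ]
by_subset <- subset(toy, Grp %in% c("A", "C"))
all.equal(by_index, by_subset)

# Pitfall: == recycles c("A", "C") along the column and compares
# position by position, so matching rows are silently missed
wrong <- toy[toy$Grp == c("A", "C"), ]
nrow(by_index)  # 4
nrow(wrong)     # 1
```

Because the column length here is a multiple of the comparison vector's length, R recycles without any warning, which makes this mistake easy to overlook.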
Let's look at another example:
Class <- sample(c("First", "Second", "Third", "Fourth"), 20, replace = TRUE)
Score <- sample(1:10, 20, replace = TRUE)
df2 <- data.frame(Class, Score)
df2
Output
    Class Score
1   First    10
2   First     3
3   First     1
4   First     7
5   First     1
6   Third     4
7   First     3
8   First     3
9  Second     2
10  First     8
11 Fourth     1
12  Third     6
13  First     6
14 Second     1
15  First     8
16 Fourth     4
17  Third     7
18 Fourth     4
19  Third     7
20 Fourth     1
str(df2)
'data.frame': 20 obs. of  2 variables:
 $ Class: chr  "First" "Third" "Second" "First" ...
 $ Score: int  1 4 9 8 9 10 2 8 5 8 ...
Note that the str output and the subset shown next come from a different random draw than the printed df2, so their individual values do not match that printout.
Taking the subset of df2 whose Class value is First or Fourth:
subset(df2, Class %in% c("First","Fourth"))
Output
    Class Score
1   First     1
4   First     8
5   First     9
6  Fourth    10
7  Fourth     2
9  Fourth     5
10 Fourth     8
11 Fourth     8
13 Fourth     7
14 Fourth    10
15  First     7
16 Fourth    10
17 Fourth     4
19  First     2
20  First    10
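Both examples use character columns. When the column is a factor instead, the same subset call applies, but the levels that no longer occur stay attached to the result. The following is a minimal sketch with hypothetical data (the name toy and its values are assumptions for illustration) showing how droplevels() can remove them.

```r
# Hypothetical data; the Class column is explicitly created as a factor
toy <- data.frame(
  Class = factor(c("First", "Second", "Third", "Fourth", "First", "Third")),
  Score = c(10, 3, 1, 7, 4, 6)
)

picked <- subset(toy, Class %in% c("First", "Fourth"))

# The unused levels "Second" and "Third" are still attached to the column
levels(picked$Class)  # "First" "Fourth" "Second" "Third"

# droplevels() discards levels that no longer appear in the data
picked <- droplevels(picked)
levels(picked$Class)  # "First" "Fourth"
```

Dropping unused levels matters mainly for later steps such as table() or plotting, where empty levels would otherwise still show up.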