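Most of the time, the data we receive is not in the format we want, so we need to change it to suit our needs. When the levels of a categorical variable are represented by words rather than numbers, we can convert those levels to lower or upper case. Sometimes this is done only to make the information look more user-friendly. Often the values arrive in lower case, and we can convert them to upper case by applying the toupper function to every column with sapply.
Consider the following data frame: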
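Before working with a whole data frame, it may help to see the case conversion on a plain vector and on factor levels. The following is a minimal sketch; the names country, upper, lower and f are hypothetical and used only for illustration.

```r
# Hypothetical example vector; the values are illustrative only.
country <- c("india", "china", "usa")

# toupper() and tolower() work element-wise on character vectors.
upper <- toupper(country)
lower <- tolower(upper)

# For a factor, the levels themselves can be converted in place.
f <- factor(country)
levels(f) <- toupper(levels(f))

print(upper)
print(lower)
print(f)
```

Changing the levels of a factor, rather than converting the factor to text, keeps the variable categorical.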
> x1 <- letters[1:20]
> x2 <- 20:1
> x3 <- rep(c("india", "china", "usa", "saudi arabia", "jordan"), times = 4)
> df <- data.frame(x1, x2, x3)
> df
   x1 x2           x3
1   a 20        india
2   b 19        china
3   c 18          usa
4   d 17 saudi arabia
5   e 16       jordan
6   f 15        india
7   g 14        china
8   h 13          usa
9   i 12 saudi arabia
10  j 11       jordan
11  k 10        india
12  l  9        china
13  m  8          usa
14  n  7 saudi arabia
15  o  6       jordan
16  p  5        india
17  q  4        china
18  r  3          usa
19  s  2 saudi arabia
20  t  1       jordan
> df_new <- as.data.frame(sapply(df, toupper))
> df_new
   x1 x2           x3
1   A 20        INDIA
2   B 19        CHINA
3   C 18          USA
4   D 17 SAUDI ARABIA
5   E 16       JORDAN
6   F 15        INDIA
7   G 14        CHINA
8   H 13          USA
9   I 12 SAUDI ARABIA
10  J 11       JORDAN
11  K 10        INDIA
12  L  9        CHINA
13  M  8          USA
14  N  7 SAUDI ARABIA
15  O  6       JORDAN
16  P  5        INDIA
17  Q  4        CHINA
18  R  3          USA
19  S  2 SAUDI ARABIA
20  T  1       JORDAN
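One thing to keep in mind with the approach above: sapply with toupper returns a character matrix, so the numeric column x2 no longer stores numbers afterwards, even though it prints the same. If the numeric columns should keep their type, one possible variant is to convert only the text columns. This is a sketch under that assumption, not the method used above; the names is_text and df_upper are hypothetical.

```r
# Rebuild the data frame from the example above.
# stringsAsFactors = FALSE keeps text columns as character in older R versions.
df <- data.frame(
  x1 = letters[1:20],
  x2 = 20:1,
  x3 = rep(c("india", "china", "usa", "saudi arabia", "jordan"), times = 4),
  stringsAsFactors = FALSE
)

# Flag the character columns and convert only those.
is_text <- sapply(df, is.character)
df_upper <- df                                  # hypothetical name for the result
df_upper[is_text] <- lapply(df[is_text], toupper)

# x2 is still an integer column; x1 and x3 are now upper case.
str(df_upper)
```

Using lapply here, rather than sapply, returns a list of columns, which can be assigned back into the data frame without collapsing everything into one matrix.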
Let's look at another example, in which the values of the second variable start with a capital letter:
> y1 <- letters[26:7]
> y2 <- rep(c("Statistics", "Biology", "Psychology", "Marketing", "Physics"), each = 4)
> y3 <- rep(c(2, 4, 6, 8), times = 5)
> df_y <- data.frame(y1, y2, y3)
> df_y
   y1         y2 y3
1   z Statistics  2
2   y Statistics  4
3   x Statistics  6
4   w Statistics  8
5   v    Biology  2
6   u    Biology  4
7   t    Biology  6
8   s    Biology  8
9   r Psychology  2
10  q Psychology  4
11  p Psychology  6
12  o Psychology  8
13  n  Marketing  2
14  m  Marketing  4
15  l  Marketing  6
16  k  Marketing  8
17  j    Physics  2
18  i    Physics  4
19  h    Physics  6
20  g    Physics  8
> df_y_new <- as.data.frame(sapply(df_y, toupper))
> df_y_new
   y1         y2 y3
1   Z STATISTICS  2
2   Y STATISTICS  4
3   X STATISTICS  6
4   W STATISTICS  8
5   V    BIOLOGY  2
6   U    BIOLOGY  4
7   T    BIOLOGY  6
8   S    BIOLOGY  8
9   R PSYCHOLOGY  2
10  Q PSYCHOLOGY  4
11  P PSYCHOLOGY  6
12  O PSYCHOLOGY  8
13  N  MARKETING  2
14  M  MARKETING  4
15  L  MARKETING  6
16  K  MARKETING  8
17  J    PHYSICS  2
18  I    PHYSICS  4
19  H    PHYSICS  6
20  G    PHYSICS  8
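If only one column needs a different case, it is probably simpler to convert that column directly instead of the whole data frame. The sketch below assumes that is the goal; the column names y2_upper and y2_lower are hypothetical and added only so the original y2 stays available for comparison.

```r
# Rebuild the data frame from the example above.
df_y <- data.frame(
  y1 = letters[26:7],
  y2 = rep(c("Statistics", "Biology", "Psychology", "Marketing", "Physics"), each = 4),
  y3 = rep(c(2, 4, 6, 8), times = 5),
  stringsAsFactors = FALSE
)

# Convert a single column; y1 and y3 are left untouched.
df_y$y2_upper <- toupper(df_y$y2)   # hypothetical new column
df_y$y2_lower <- tolower(df_y$y2)   # hypothetical new column

head(df_y, 3)
```

Because y3 is never passed through toupper, it remains numeric and can still be used in calculations.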