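Suppose an R data frame has a string column whose values mix letters and digits, and we want to count how many letters each row contains, ignoring the digits. We can use the nchar function together with the gsub function: gsub removes every character that is not a letter, and nchar then counts the characters that remain, as shown in the examples below.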
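Because R's regular expressions are case-sensitive, the pattern must match the case of the letters in the data: [^A-Z] removes everything except uppercase letters, and [^a-z] removes everything except lowercase letters.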
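To see why the case of the pattern matters, the short sketch below applies both patterns, plus the POSIX class [:alpha:] (letters of either case), to a few made-up mixed-case strings. The vector s and the variable names are hypothetical and are not part of the original examples.

```r
# Hypothetical mixed-case strings, used only for illustration
s <- c("Ab12cD", "xyz9", "ABC")

upper_only <- nchar(gsub("[^A-Z]", "", s))        # counts uppercase letters only
lower_only <- nchar(gsub("[^a-z]", "", s))        # counts lowercase letters only
any_case   <- nchar(gsub("[^[:alpha:]]", "", s))  # counts letters of either case

print(data.frame(s, upper_only, lower_only, any_case))
```

For "Ab12cD", [^A-Z] leaves 2 letters, [^a-z] leaves 2, and [^[:alpha:]] leaves all 4. R's own regex documentation notes that character ranges such as A-Z are locale-dependent, so [:upper:] and [:lower:] may be the more portable choice if the data come from different locales.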
The following snippet creates a sample data frame:
```r
x <- c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O",
       "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU",
       "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1 <- data.frame(x)
df1
```
This creates the following data frame:
```
              x
1          A01K
2         140AL
3         A142R
4        A255SW
5       A2474EZ
6        CA214N
7          C14O
8         CGSLT
9        DC23QW
10   D2411RWEDE
11     FL233EGV
12 G36521VCLPBA
13       G54TRU
14       H214FI
15        245IA
16       ID3699
17         IL01
18       IFDFDN
19    K2254FDES
20   KY244RLPKJ
```
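To count the letters in each row of column x, which are all uppercase, add the No_of_Chars line to the snippet above. The complete program is: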
```r
x <- c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O",
       "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU",
       "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1 <- data.frame(x)
# Remove every character that is not an uppercase letter, then count what is left
df1$No_of_Chars <- nchar(gsub("[^A-Z]", "", df1$x))
df1
```
If you execute all of the above snippets as a single program, it produces the following output:
```
              x No_of_Chars
1          A01K           2
2         140AL           2
3         A142R           2
4        A255SW           3
5       A2474EZ           3
6        CA214N           3
7          C14O           2
8         CGSLT           5
9        DC23QW           4
10   D2411RWEDE           6
11     FL233EGV           5
12 G36521VCLPBA           7
13       G54TRU           4
14       H214FI           3
15        245IA           2
16       ID3699           2
17         IL01           2
18       IFDFDN           6
19    K2254FDES           5
20   KY244RLPKJ           7
```
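Because every value in column x consists only of uppercase letters and digits, the letter counts can be cross-checked against the digits: letters plus digits should equal the full string length reported by nchar. The sketch below runs that check on the first five values of x; the names n_letters and n_digits are illustrative, not from the original.

```r
# First five values of column x from the example above
x <- c("A01K", "140AL", "A142R", "A255SW", "A2474EZ")

n_letters <- nchar(gsub("[^A-Z]", "", x))  # keep only uppercase letters
n_digits  <- nchar(gsub("[^0-9]", "", x))  # keep only digits

# Every character here is either a letter or a digit, so the parts add up to the whole
stopifnot(all(n_letters + n_digits == nchar(x)))
print(data.frame(x, n_letters, n_digits, total = nchar(x)))
```

If a column could also contain spaces or punctuation, this check would fail, which is a quick way to spot values the [^A-Z] pattern silently discards.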
The following snippet creates a second sample data frame, this time with lowercase letters:
```r
y <- c("ala5412bama", "ala1475ska", "american11022samoa", "arizona3652",
       "arkan1475sas", "califor2365nia", "co1475lorado", "0014connecticut",
       "dela25366ware", "district257of22columbia", "florid02535a", "57412georgia",
       "gu25987am", "hawaii36250", "20057idaho", "i369852llinois",
       "indiana0146563", "3255iowa", "kansas3682701", "kentucky2574")
df2 <- data.frame(y)
df2
```
This creates the following data frame:
```
                         y
1             ala5412bama
2              ala1475ska
3      american11022samoa
4             arizona3652
5            arkan1475sas
6          califor2365nia
7            co1475lorado
8         0014connecticut
9           dela25366ware
10 district257of22columbia
11           florid02535a
12           57412georgia
13              gu25987am
14            hawaii36250
15             20057idaho
16         i369852llinois
17         indiana0146563
18               3255iowa
19          kansas3682701
20           kentucky2574
```
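To count the letters in each row of column y, which are all lowercase, use the pattern [^a-z] instead and add the No_of_Chars line to the snippet above. The complete program is: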
```r
y <- c("ala5412bama", "ala1475ska", "american11022samoa", "arizona3652",
       "arkan1475sas", "califor2365nia", "co1475lorado", "0014connecticut",
       "dela25366ware", "district257of22columbia", "florid02535a", "57412georgia",
       "gu25987am", "hawaii36250", "20057idaho", "i369852llinois",
       "indiana0146563", "3255iowa", "kansas3682701", "kentucky2574")
df2 <- data.frame(y)
# Remove every character that is not a lowercase letter, then count what is left
df2$No_of_Chars <- nchar(gsub("[^a-z]", "", df2$y))
df2
```
If you execute all of the above snippets as a single program, it produces the following output:
```
                         y No_of_Chars
1             ala5412bama           7
2              ala1475ska           6
3      american11022samoa          13
4             arizona3652           7
5            arkan1475sas           8
6          califor2365nia          10
7            co1475lorado           8
8         0014connecticut          11
9           dela25366ware           8
10 district257of22columbia          18
11           florid02535a           7
12           57412georgia           7
13              gu25987am           4
14            hawaii36250           6
15             20057idaho           5
16         i369852llinois           8
17         indiana0146563           7
18               3255iowa           4
19          kansas3682701           6
20           kentucky2574           8
```
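The same counts can also be obtained by counting regular-expression matches instead of deleting characters. The sketch below is an alternative to the gsub approach, not the method used above: base R's gregexpr locates every lowercase letter, regmatches extracts the matches as a list, and lengths counts them per string. It is shown on three of the y values; the variable n_lower is illustrative.

```r
# Three values taken from column y above
y <- c("ala5412bama", "district257of22columbia", "3255iowa")

# gregexpr() finds all lowercase letters in each string,
# regmatches() returns them as a list of character vectors,
# and lengths() counts the matches in each list element
n_lower <- lengths(regmatches(y, gregexpr("[a-z]", y)))
print(n_lower)
```

This gives 7, 18 and 4, matching the No_of_Chars values in rows 1, 10 and 18. A string with no lowercase letters yields an empty match vector and therefore a count of 0.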