How to find the number of characters in each row of a string column in R?

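Suppose an R data frame has a string column whose values mix letters and digits, and we want the number of letters in each row of that column. The nchar function can be combined with the gsub function for this: gsub removes every character that is not a letter, and nchar counts what remains, as the examples below show.

Because R is case sensitive, the pattern passed to gsub must match the case of the letters being counted: [^A-Z] keeps only uppercase letters, and [^a-z] keeps only lowercase letters.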
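To make the role of each function concrete, the following minimal sketch compares nchar on its own with nchar applied after gsub. The vector s and its values are hypothetical and chosen only for illustration, and the character ranges are assumed to behave as they do for plain ASCII input.

```r
# Hypothetical sample values, used only for illustration
s <- c("A01K", "ab12CD", "xyz")

# nchar alone counts every character, digits included
total_chars <- nchar(s)

# gsub deletes whatever the negated class matches; nchar counts the rest
upper_only <- nchar(gsub("[^A-Z]", "", s))     # uppercase letters only
lower_only <- nchar(gsub("[^a-z]", "", s))     # lowercase letters only
any_letter <- nchar(gsub("[^A-Za-z]", "", s))  # letters of either case

data.frame(s, total_chars, upper_only, lower_only, any_letter)
```

Choosing the wrong case in the pattern does not raise an error; it simply returns a count of 0 for rows that contain only letters of the other case.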

Example 1

The following snippet creates a sample data frame:

x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
df1

The following data frame is created:

     x
1  A01K
2  140AL
3  A142R
4  A255SW
5  A2474EZ
6  CA214N
7  C14O
8  CGSLT
9  DC23QW
10 D2411RWEDE
11 FL233EGV
12 G36521VCLPBA
13 G54TRU
14 H214FI
15 245IA
16 ID3699
17 IL01
18 IFDFDN
19 K2254FDES
20 KY244RLPKJ

To count the uppercase letters in each row of column x, add the following code to the above snippet:

x<-c("A01K", "140AL", "A142R", "A255SW", "A2474EZ", "CA214N", "C14O", "CGSLT", "DC23QW", "D2411RWEDE", "FL233EGV", "G36521VCLPBA", "G54TRU", "H214FI", "245IA", "ID3699", "IL01", "IFDFDN", "K2254FDES", "KY244RLPKJ")
df1<-data.frame(x)
# Remove every character that is not an uppercase letter, then count what is left
df1$No_of_Chars<-nchar(gsub("[^A-Z]","",df1$x))
df1
Output

If you execute all the above given snippets as a single program, it generates the following output:

    x    No_of_Chars
1  A01K         2
2  140AL        2
3  A142R        2
4  A255SW       3
5  A2474EZ      3
6  CA214N       3
7  C14O         2
8  CGSLT        5
9  DC23QW       4
10 D2411RWEDE   6
11 FL233EGV     5
12 G36521VCLPBA 7
13 G54TRU       4
14 H214FI       3
15 245IA        2
16 ID3699       2
17 IL01         2
18 IFDFDN       6
19 K2254FDES    5
20 KY244RLPKJ   7
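The same pattern can count digits instead of letters by changing the character class. The sketch below is a variation under stated assumptions rather than part of the example above: it reuses three values from column x, and the column names No_of_Digits and Total are made up here for illustration.

```r
# Three values borrowed from column x in Example 1
x <- c("A01K", "140AL", "CGSLT")
df <- data.frame(x, stringsAsFactors = FALSE)

df$No_of_Chars  <- nchar(gsub("[^A-Z]", "", df$x))  # uppercase letters
df$No_of_Digits <- nchar(gsub("[^0-9]", "", df$x))  # digits (hypothetical column)
df$Total        <- nchar(df$x)                      # all characters (hypothetical column)

# These values contain only uppercase letters and digits,
# so the two partial counts add up to the total length
all(df$No_of_Chars + df$No_of_Digits == df$Total)
df
```

A check like the last comparison is a quick way to spot rows that contain characters outside the classes being counted, such as spaces or punctuation.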

Example 2

The following snippet creates a sample data frame:

y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
df2

The following data frame is created:

      y
1  ala5412bama
2  ala1475ska
3  american11022samoa
4  arizona3652
5  arkan1475sas
6  califor2365nia
7  co1475lorado
8  0014connecticut
9  dela25366ware
10 district257of22columbia
11 florid02535a
12 57412georgia
13 gu25987am
14 hawaii36250
15 20057idaho
16 i369852llinois
17 indiana0146563
18 3255iowa
19 kansas3682701
20 kentucky2574

To count the lowercase letters in each row of column y, add the following code to the above snippet:

y<-c("ala5412bama","ala1475ska","american11022samoa","arizona3652","arkan1475sas","califor2365nia","co1475lorado","0014connecticut","dela25366ware","district257of22columbia","florid02535a","57412georgia","gu25987am","hawaii36250","20057idaho","i369852llinois","indiana0146563","3255iowa","kansas3682701","kentucky2574")
df2<-data.frame(y)
# Remove every character that is not a lowercase letter, then count what is left
df2$No_of_Chars<-nchar(gsub("[^a-z]","",df2$y))
df2
Output

If you execute all the above given snippets as a single program, it generates the following output:

          y          No_of_Chars
1  ala5412bama              7
2  ala1475ska               6
3  american11022samoa      13
4  arizona3652              7
5  arkan1475sas             8
6  califor2365nia          10
7  co1475lorado             8
8  0014connecticut         11
9  dela25366ware            8
10 district257of22columbia 18
11 florid02535a             7
12 57412georgia             7
13 gu25987am                4
14 hawaii36250              6
15 20057idaho               5
16 i369852llinois           8
17 indiana0146563           7
18 3255iowa                 4
19 kansas3682701            6
20 kentucky2574             8
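Character ranges such as [a-z] can be interpreted differently depending on the locale, and R's regular expression documentation advises avoiding them in favour of POSIX character classes. As a hedged sketch of that alternative, the code below counts letters with [:alpha:], [:upper:] and [:lower:]. The first value comes from column y; the second is a hypothetical mixed-case string added only to show the difference between the classes.

```r
# First value comes from column y; the second is a hypothetical mixed-case string
v <- c("ala5412bama", "Ala1475SKA")

letters_any   <- nchar(gsub("[^[:alpha:]]", "", v))  # letters of either case
letters_upper <- nchar(gsub("[^[:upper:]]", "", v))  # uppercase letters
letters_lower <- nchar(gsub("[^[:lower:]]", "", v))  # lowercase letters

data.frame(v, letters_any, letters_upper, letters_lower)
```

When the case of the letters does not matter, [:alpha:] avoids having to choose between the two ranges used in the examples.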
