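When data is imported from an external source, the column names often arrive with underscore-separated parts, either because the import step produces them or because the original data was already named that way. To make the headers shorter and cleaner, we can strip the underscores, which is easy to do with the gsub function.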
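As a minimal sketch of the idea, the snippet below applies gsub to a plain character vector before any data frame is involved. The names in col_names are hypothetical and chosen only for illustration; fixed = TRUE is used so that the underscore is matched literally rather than as a regular expression.

```r
# Hypothetical column names, as they might arrive from an imported file
col_names <- c("x_1", "total_sales_2020", "id")

# gsub replaces every occurrence of the pattern, not just the first one
clean_names <- gsub("_", "", col_names, fixed = TRUE)

print(clean_names)
# [1] "x1"             "totalsales2020" "id"
```

Note that sub would remove only the first underscore in each name, which is why gsub is the better fit when a name may contain several.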
Consider the following data frame:
x_1 <- sample(1:10, 20, replace = TRUE)
x_2 <- sample(1:10, 20, replace = TRUE)
x_3 <- sample(1:10, 20, replace = TRUE)
x_4 <- sample(1:10, 20, replace = TRUE)
x_5 <- sample(1:10, 20, replace = TRUE)
df1 <- data.frame(x_1, x_2, x_3, x_4, x_5)
df1
Output
   x_1 x_2 x_3 x_4 x_5
1   10   4   6   5  10
2    6  10   2   1   4
3    9   9   6   1   4
4    6   1   5   5   8
5    7   7   4   7   4
6    1   5   2   1   8
7    8   5   5   2   9
8    8   4   1   9   8
9    8   1   7   4   3
10   5   9   3  10   3
11   2   7   5   6   9
12  10   1   4   1   5
13   8  10  10   1   2
14   3  10   5   7   6
15   5   6   9   1  10
16   3   8   6   4   7
17   8   9   5   7   2
18   6  10   5   6   8
19   1   8   3   2   9
20   8   1   5  10   5
Removing the underscores from the column names:
# The underscore is not a regex metacharacter, so no escaping is needed
names(df1) <- gsub("_", "", names(df1), fixed = TRUE)
df1
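Output (the values differ from the first printout, presumably because the random sample was regenerated between runs; renaming columns does not change the data itself)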
   x1 x2 x3 x4 x5
1   6  8  2  9  6
2   1  9  3  4 10
3   2  1  8 10 10
4   4 10  3  6  1
5  10  6  6  6  5
6   9  4  6  6  2
7   3  9 10  5  9
8   8  1  5  3  8
9   4  9  2  5  6
10  9  3  3  5  4
11  7  1  4  6  3
12 10  6  3  3  1
13  7  6 10 10  8
14  9  6  4  1  1
15  7  5 10  2  1
16  1  3  7  4  8
17  2  1  7  2  8
18  1 10  8  2  3
19  8  7  6  6 10
20  3  8  9  8  3
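Because sample draws new random numbers on every run, the printouts before and after renaming will only match if the random seed is fixed. The following sketch, which assumes an arbitrary seed of 42 and a smaller hypothetical data frame named df, shows one way to confirm that only the names change and the values stay the same.

```r
set.seed(42)  # arbitrary seed, chosen only to make the example reproducible

# Small hypothetical data frame with underscore-separated column names
df <- data.frame(
  x_1 = sample(1:10, 5, replace = TRUE),
  x_2 = sample(1:10, 5, replace = TRUE)
)

before <- df  # keep a copy to compare against

# Strip the underscores from the column names only
names(df) <- gsub("_", "", names(df), fixed = TRUE)

print(names(df))
# [1] "x1" "x2"

# The column contents are untouched by the rename
values_unchanged <- identical(df$x1, before$x_1) && identical(df$x2, before$x_2)
print(values_unchanged)
# [1] TRUE
```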
Let us look at another example:
y_1 <- rnorm(20)
y_2 <- rnorm(20, 2, 1)
y_3 <- rnorm(20, 2, 0.5)
y_4 <- rnorm(20, 2, 0.0003)
y_5 <- rnorm(20, 10, 1)
df2 <- data.frame(y_1, y_2, y_3, y_4, y_5)
df2
Output
            y_1       y_2      y_3      y_4       y_5
1   0.514450792 2.4374182 3.230083 1.999826 12.625661
2  -0.312792686 0.8350701 2.769788 1.999740  8.699441
3  -0.710758168 2.7832089 1.971917 2.000519  8.430542
4  -0.060647019 1.4626953 1.971298 2.000600  9.568890
5   2.363567996 0.8239008 2.626454 2.000266 10.038633
6   1.227010669 2.6716199 1.844929 1.999768  7.838243
7  -0.994717233 1.1798125 2.084188 1.999643 11.254072
8   2.584374114 1.6053897 2.453163 2.000089 11.256447
9   0.863363636 1.0685646 1.457286 2.000659 11.001834
10 -0.190736476 1.4468239 1.829696 2.000229 10.425032
11  0.716178594 2.7498080 2.406190 1.999487  9.906237
12 -1.670744103 1.1184815 2.206973 2.000288  8.993506
13  1.011970392 2.7794836 2.560877 2.000160 12.564313
14 -0.099591556 1.5176429 1.841669 2.000175 12.050816
15  3.230713917 1.8450534 2.065576 2.000189  9.243683
16  0.734370382 0.8649671 1.550325 2.000698 10.320533
17  1.156661539 3.8099910 2.842250 1.999826 10.134682
18 -0.496844480 2.0082680 1.456640 2.000119 10.498172
19 -0.001995988 1.7054230 2.702496 1.999963  8.572382
20 -0.190562902 2.6200714 1.822893 1.999612  9.683227
Removing the underscores from the column names:
# The underscore is not a regex metacharacter, so no escaping is needed
names(df2) <- gsub("_", "", names(df2), fixed = TRUE)
df2
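Output (the values differ from the printout above, presumably because the data were regenerated between runs)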
            y1        y2        y3       y4        y5
1   0.35283126 2.7403674 1.5855939 1.999599 10.615962
2   2.04048363 1.7570445 1.9365559 1.999934 10.734033
3  -0.99194313 1.9299296 3.4318183 2.000200  8.821012
4   0.03923376 2.8984508 1.3765896 1.999948  8.371278
5   0.48921437 1.7272755 2.0049735 1.999814 10.769563
6  -1.52296501 1.1843431 1.3387394 1.999670 10.984169
7  -0.43659539 3.0847073 2.0724138 2.000099 10.163438
8  -1.07562516 2.4046583 2.3631921 1.999976  8.119308
9   0.25897051 4.0599361 2.5180669 2.000179  8.780155
10  0.90011031 0.5844179 3.0924616 2.000156 10.945022
11 -1.01455924 1.3601391 1.3491111 2.000197 11.172243
12 -1.21902395 1.5613617 1.6721161 2.000014  9.752595
13  1.10335026 3.0485505 2.5479672 2.000200 10.851384
14  1.66150031 0.9157312 2.0733168 2.000298 10.045139
15 -2.88733135 1.6426962 1.4906487 1.999932 10.596103
16 -0.20689147 1.7962494 0.9636048 1.999893 10.489436
17 -0.66668766 2.0058826 1.7932363 2.000102 10.702172
18 -0.32072057 2.8834813 2.1764040 2.000017 10.699573
19 -0.29862766 4.6416591 2.8638125 1.999819 10.211451
20 -0.47632229 1.2781510 2.8128627 1.999981  9.046588
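One thing to watch for when stripping underscores from real imported data is that two different names can collapse into the same one. The sketch below uses hypothetical names (raw_names is not from the examples above) and shows one possible way to detect the collision with anyDuplicated and resolve it with make.unique; whether automatic suffixes are acceptable depends on the data set.

```r
# Hypothetical names where removing "_" produces a clash: "a_b" and "ab"
raw_names <- c("a_b", "ab", "c_1")

clean <- gsub("_", "", raw_names, fixed = TRUE)

# anyDuplicated returns the index of the first duplicate, or 0 if there is none
has_clash <- anyDuplicated(clean) > 0
print(has_clash)
# [1] TRUE

# make.unique appends a numeric suffix to repeated entries
unique_names <- make.unique(clean)
print(unique_names)
# [1] "ab"   "ab.1" "c1"
```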