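A correlation matrix shows the direction and strength of the linear relationship between every pair of variables at once. This makes it easier to judge which variables are worth including in a linear model and which might be dropped, for example because they are nearly unrelated to the response or strongly correlated with another predictor. To get the correlation matrix, pass the data frame to the cor function. By default it computes Pearson correlations, and every column of the data frame must be numeric.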
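As a small illustration of direction and strength, here is a toy data frame built for this example (it is not part of the original tutorial). Column y rises exactly in step with x, and z falls exactly as x rises. The character column label would make cor(d) stop with an error, so the sketch keeps only the numeric columns first.

```r
# Toy data frame: y rises exactly with x, z falls exactly as x rises,
# and 'label' is a character column that cor() cannot use.
d <- data.frame(
  x     = 1:5,
  y     = 2 * (1:5) + 1,
  z     = 10 - (1:5),
  label = letters[1:5]
)

# cor() on a data frame needs every column to be numeric, so keep only those.
num <- d[sapply(d, is.numeric)]
m <- cor(num)

# m["x", "y"] is 1 (perfect positive) and m["x", "z"] is -1 (perfect negative),
# up to floating-point rounding.
round(m, 2)
```

The sign of each entry gives the direction of the relationship, and its absolute value (at most 1) gives the strength.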
Consider the following data frame of continuous variables:
> set.seed(9)
> x1<-rnorm(20)
> x2<-rnorm(20,0.2)
> x3<-rnorm(20,0.5)
> x4<-rnorm(20,0.8)
> x5<-rnorm(20,1)
> df<-data.frame(x1,x2,x3,x4,x5)
> df
            x1          x2          x3           x4          x5
1  -0.76679604  1.95699294 -0.30845634  1.081222227  1.11407587
2  -0.81645834  0.38225214 -1.51938169 -0.402708626 -0.05365988
3  -0.14153519 -0.06688875 -0.23872407  1.265163691  1.15599915
4  -0.27760503  1.12642163  0.88288656  1.152016386  2.30039421
5   0.43630690 -0.49333188  2.23086367  0.210143783 -0.15588645
6  -1.18687252  2.88199007  0.29691805 -0.053599959  1.21604185
7   1.19198691  0.42252448 -0.49639735  0.553267880  1.80447819
8  -0.01819034 -0.50667241 -0.80653629  2.339338571  0.26788427
9  -0.24808460  0.61721325 -0.49783160  1.346077684 -0.61809812
10 -0.36293689  0.56955678 -0.06502873  2.364961851  1.83906927
11  1.27757055 -0.71376435  2.25205784  1.049670178  0.64856205
12 -0.46889715 -0.11691475 -0.04777135 -1.162418630  0.28371561
13  0.07105410  1.24905921 -0.35852571 -0.009060223  0.05970815
14 -0.26603845  0.36811181  0.54929453  0.301314912  1.73016571
15  1.84525720  0.23144021  0.29995552  1.105121769  0.56212952
16 -0.83944966 -0.81033054 -0.60395445  0.510792758  0.75061790
17 -0.07744806  0.58275153  0.74058804  2.257714201  0.32792906
18 -2.61770553 -0.61969653  0.88111362  1.673755484  1.80101407
19  0.88788403  0.56171109  2.73045895 -0.152956042 -0.48886193
20 -0.70749145  0.29337136  1.69920239  0.768324524  1.45401160
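The columns above are drawn with different means (0, 0.2, 0.5, 0.8 and 1), but that has no effect on the correlation matrix. Pearson correlation does not change when a constant is added to a variable or when the variable is multiplied by a positive constant, while multiplying by a negative constant flips the sign. The sketch below checks this on two freshly simulated vectors. The seed and the names a and b are arbitrary choices for this illustration.

```r
set.seed(1)  # arbitrary seed for this illustration
a <- rnorm(20)
b <- rnorm(20)

r_original <- cor(a, b)
r_shifted  <- cor(a + 0.8, 2 * b + 1)  # shift both, rescale b by a positive factor
r_flipped  <- cor(a, -b)               # rescale b by a negative factor

all.equal(r_original, r_shifted)   # TRUE: location and positive scale do not matter
all.equal(r_flipped, -r_original)  # TRUE: a negative factor only flips the sign
```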
Find the correlation matrix of all the variables in df:
> cor(df)
            x1         x2          x3          x4          x5
x1  1.00000000 -0.1332350  0.25115920 -0.04210749 -0.28891754
x2 -0.13323501  1.0000000 -0.15071432 -0.15398933  0.14759671
x3  0.25115920 -0.1507143  1.00000000 -0.05268172 -0.02505888
x4 -0.04210749 -0.1539893 -0.05268172  1.00000000  0.27861734
x5 -0.28891754  0.1475967 -0.02505888  0.27861734  1.00000000
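With five variables the matrix is easy to scan by eye. For wider data frames it can help to list each pair once, ranked by absolute correlation. The helper rank_pairs below is a hypothetical function written for this illustration and is not part of base R. It uses upper.tri() so that each pair appears once and the diagonal of 1s is skipped. The data frame is rebuilt with the same seed and calls used above.

```r
# Hypothetical helper (not part of base R): list each variable pair once,
# ordered from the largest to the smallest absolute correlation.
rank_pairs <- function(m) {
  idx <- which(upper.tri(m), arr.ind = TRUE)  # upper triangle: each pair once, no diagonal
  out <- data.frame(
    var1 = rownames(m)[idx[, "row"]],
    var2 = colnames(m)[idx[, "col"]],
    r    = m[idx]                             # matrix indexing by (row, col) pairs
  )
  out[order(-abs(out$r)), , drop = FALSE]
}

# Rebuild df exactly as in the example above.
set.seed(9)
x1 <- rnorm(20)
x2 <- rnorm(20, 0.2)
x3 <- rnorm(20, 0.5)
x4 <- rnorm(20, 0.8)
x5 <- rnorm(20, 1)
df <- data.frame(x1, x2, x3, x4, x5)

rank_pairs(cor(df))
```

Going by the matrix printed above, the first rows should be x1 with x5 (r ≈ -0.289) and x4 with x5 (r ≈ 0.279). Because these columns were generated independently and there are only 20 observations, correlations of this size are consistent with chance. With n = 20, |r| must exceed roughly 0.44 to be significant at the 5% level.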
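Now consider a data frame of discrete count variables drawn from Poisson distributions with rpois. No new seed is set, so these values continue the random-number stream started by set.seed(9) above: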
> a1<-rpois(20,2)
> a2<-rpois(20,5)
> a3<-rpois(20,8)
> a4<-rpois(20,10)
> a5<-rpois(20,15)
> df_new<-data.frame(a1,a2,a3,a4,a5)
> df_new
   a1 a2 a3 a4 a5
1   2  8  9  5 13
2   1  4  7 11 16
3   2  2  5 12 11
4   1  3 12  9 15
5   1  4  8  4 14
6   0  6  9  8 14
7   2  6 12 10  9
8   7  5 13 11 20
9   0  6  6 13 19
10  4  7 10  8 12
11  0  3 14  8 20
12  3  2 10 15 13
13  2  8  7 12 14
14  2  6 10 11 14
15  2  1  5 10 21
16  2  3 12 10 14
17  3  6  7  9 17
18  0  7  6 14 16
19  2  6  6  9 15
20  2  3  7  8 12
Find the correlation matrix of all the variables in df_new:
> cor(df_new)
            a1          a2          a3          a4           a5
a1 1.000000000  0.02485671  0.26409706  0.05617819  0.009229284
a2 0.024856710  1.00000000 -0.04540504 -0.10727065 -0.184062998
a3 0.264097059 -0.04540504  1.00000000 -0.17991092 -0.013487095
a4 0.056178192 -0.10727065 -0.17991092  1.00000000  0.115063107
a5 0.009229284 -0.18406300 -0.01348709  0.11506311  1.000000000
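Pearson's coefficient, the default used above, measures only linear association. When the data are discrete counts with many ties, or when a relationship may be monotone but curved, a rank-based coefficient is a common cross-check. cor() supports this through its method argument, which accepts "pearson" (the default), "kendall" or "spearman". The toy vectors below are an illustration, not part of the original analysis. They show the two methods disagreeing on a relationship that always increases but is not a straight line.

```r
# A monotone but non-linear relationship: y always increases with x.
x <- 1:10
y <- x^3
toy <- data.frame(x, y)

pearson  <- cor(toy)                       # default method = "pearson"
spearman <- cor(toy, method = "spearman")  # Pearson correlation of the ranks

pearson["x", "y"]   # below 1: the points do not lie on a straight line
spearman["x", "y"]  # 1 (up to rounding): the ranks agree perfectly
```

For the count data above, the equivalent call would be cor(df_new, method = "spearman"). Tied values are given their average rank before the correlation is computed.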