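Standardized coefficients in regression, also called beta coefficients, are obtained by standardizing the dependent and independent variables before fitting the model. Standardizing a variable means transforming its values so that its mean becomes 0 and its standard deviation becomes 1. A standardized coefficient therefore gives the change in the dependent variable, in standard deviations, for a one-standard-deviation increase in the independent variable. In R, we can obtain the standardized coefficients of a linear regression model by wrapping each variable in the scale() function inside the model formula when creating the model.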
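To make the transformation concrete, here is a minimal sketch of what scale() does to a single numeric vector with its default arguments. The vector v is made up for illustration and is not part of the data used in this article.

```r
# Illustrative values only (not the article's data)
v <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Manual standardization: subtract the mean, divide by the sample standard deviation
z_manual <- (v - mean(v)) / sd(v)

# scale() returns a one-column matrix with attributes; as.numeric() drops them
z_scale <- as.numeric(scale(v))

all.equal(z_manual, z_scale)  # TRUE
mean(z_scale)                 # 0, up to floating-point error
sd(z_scale)                   # 1, up to floating-point error
```

Because scale() returns a matrix rather than a plain vector, the terms in a fitted model are labelled scale(x) rather than x.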
Consider the following data frame:
> set.seed(99)
> x <- rnorm(10, 1.5)
> y <- rnorm(10, 2)
> df1 <- data.frame(x, y)
> df1
Output
           x          y
1  1.7139625  1.2542310
2  1.9796581  2.9215504
3  1.5878287  2.7500544
4  1.9438585 -0.5085540
5  1.1371621 -1.0409341
6  1.6226740  2.0002658
7  0.6361548  1.6059810
8  1.9896243  0.2549723
9  1.1358831  2.4986315
10 0.2057580  2.2709538
Create the regression model:
> Model1 <- lm(y ~ x, data = df1)
> summary(Model1)
Output
Call:
lm(formula = y ~ x, data = df1)

Residuals:
    Min      1Q  Median      3Q     Max
-2.5458 -0.7047  0.1862  0.9178  1.7566

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   1.9635     1.2055   1.629    0.142
x            -0.4034     0.7988  -0.505    0.627

Residual standard error: 1.453 on 8 degrees of freedom
Multiple R-squared: 0.0309, Adjusted R-squared: -0.09024
F-statistic: 0.2551 on 1 and 8 DF, p-value: 0.6272
Create the regression model with standardized coefficients:
> Model1_standardized_coefficients <- lm(scale(y) ~ scale(x), data = df1)
> summary(Model1_standardized_coefficients)
Output
Call:
lm(formula = scale(y) ~ scale(x), data = df1)

Residuals:
    Min      1Q  Median      3Q     Max
-1.8288 -0.5063  0.1338  0.6593  1.2619

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.701e-18  3.302e-01   0.000    1.000
scale(x)    -1.758e-01  3.480e-01  -0.505    0.627

Residual standard error: 1.044 on 8 degrees of freedom
Multiple R-squared: 0.0309, Adjusted R-squared: -0.09024
F-statistic: 0.2551 on 1 and 8 DF, p-value: 0.6272
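Comparing the two summaries, the slope changes from -0.4034 to -0.1758, while its t value, p-value and the R-squared are identical, and the intercept is zero up to floating-point error. Standardizing only rescales the variables, so the standardized slope can also be recovered from the unstandardized one by multiplying by sd(x)/sd(y); with a single predictor it equals the correlation between x and y. The sketch below checks both identities on the same data, assuming the default random number generator so that set.seed(99) reproduces df1. The names b_raw, b_std and b_converted are introduced here for illustration.

```r
# Recreate the article's first data frame
set.seed(99)
x <- rnorm(10, 1.5)
y <- rnorm(10, 2)
df1 <- data.frame(x, y)

Model1 <- lm(y ~ x, data = df1)
Model1_standardized_coefficients <- lm(scale(y) ~ scale(x), data = df1)

# Slopes from the two fits (unname() drops the differing term labels)
b_raw <- unname(coef(Model1)["x"])
b_std <- unname(coef(Model1_standardized_coefficients)["scale(x)"])

# Rescale the unstandardized slope by the ratio of standard deviations
b_converted <- b_raw * sd(df1$x) / sd(df1$y)

all.equal(b_std, b_converted)          # TRUE
all.equal(b_std, cor(df1$x, df1$y))    # TRUE (single-predictor case only)
```

The equality with the correlation holds only for simple regression; with several predictors the standardized coefficients are partial effects and generally differ from the pairwise correlations.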
Let's look at another example, this time with three independent variables:
> y <- rnorm(10, 2.5)
> x1 <- rnorm(10, 0.2)
> x2 <- rnorm(10, 0.5)
> x3 <- rnorm(10, 1.5)
> df2 <- data.frame(x1, x2, x3, y)
> df2
Output
             x1         x2          x3        y
1   1.573053947  0.6329786 -0.07655243 3.598922
2   0.650256559 -1.1792643  2.12408260 3.252513
3   0.053706144  0.2215204  1.83022068 2.440583
4   0.328097240 -1.0524110  1.10187774 2.155431
5  -2.094720947 -0.8796993  0.41860307 2.722668
6  -1.166568921 -0.8570566  1.42307794 3.051786
7   0.002520447 -0.4211372  0.97446338 3.183643
8   0.268085782 -0.3668177  1.89128965 1.954121
9   0.290503410  2.1566444  0.81954674 1.132564
10  0.522759967  0.3449203  0.75130307 3.900052
Create the regression model with standardized coefficients:
> Model2_standardized_coefficients <- lm(scale(y) ~ scale(x1) + scale(x2) + scale(x3), data = df2)
> summary(Model2_standardized_coefficients)
Output
Call:
lm(formula = scale(y) ~ scale(x1) + scale(x2) + scale(x3), data = df2)

Residuals:
    Min      1Q  Median      3Q     Max
-1.4389 -0.5336  0.1917  0.3699  1.2726

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.577e-17  2.970e-01   0.000    1.000
scale(x1)    3.896e-01  3.415e-01   1.141    0.297
scale(x2)   -6.845e-01  3.682e-01  -1.859    0.112
scale(x3)   -4.808e-01  3.409e-01  -1.410    0.208

Residual standard error: 0.9392 on 6 degrees of freedom
Multiple R-squared: 0.4119, Adjusted R-squared: 0.1179
F-statistic: 1.401 on 3 and 6 DF, p-value: 0.331
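If a model has already been fitted on the original scale, the standardized coefficients can be computed from it without refitting, since each one equals the unstandardized slope multiplied by sd(x_j)/sd(y). The sketch below does this for the second example. The helper standardized_coefs() is hypothetical (it is not a base R function) and assumes a model with an intercept and numeric predictors only. The data are typed in from the printed df2 above, so they are rounded to the digits shown.

```r
# df2 as printed in the article (values rounded to the displayed digits)
df2 <- data.frame(
  x1 = c(1.573053947, 0.650256559, 0.053706144, 0.328097240, -2.094720947,
         -1.166568921, 0.002520447, 0.268085782, 0.290503410, 0.522759967),
  x2 = c(0.6329786, -1.1792643, 0.2215204, -1.0524110, -0.8796993,
         -0.8570566, -0.4211372, -0.3668177, 2.1566444, 0.3449203),
  x3 = c(-0.07655243, 2.12408260, 1.83022068, 1.10187774, 0.41860307,
         1.42307794, 0.97446338, 1.89128965, 0.81954674, 0.75130307),
  y  = c(3.598922, 3.252513, 2.440583, 2.155431, 2.722668,
         3.051786, 3.183643, 1.954121, 1.132564, 3.900052)
)

# Hypothetical helper: rescale the slopes of a fitted lm() object.
# Assumes an intercept term and numeric predictors only.
standardized_coefs <- function(model) {
  X <- model.matrix(model)[, -1, drop = FALSE]  # predictors, intercept column dropped
  y <- model.response(model.frame(model))       # response used in the fit
  coef(model)[-1] * apply(X, 2, sd) / sd(y)
}

fit_raw    <- lm(y ~ x1 + x2 + x3, data = df2)
fit_scaled <- lm(scale(y) ~ scale(x1) + scale(x2) + scale(x3), data = df2)

beta_converted <- standardized_coefs(fit_raw)
beta_scaled    <- coef(fit_scaled)[-1]

# The two approaches agree (term labels differ, hence unname())
all.equal(unname(beta_converted), unname(beta_scaled))  # TRUE
```

Both vectors should agree with the estimates for scale(x1), scale(x2) and scale(x3) in the summary above, up to the rounding of the typed-in data.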