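Suppose an R data frame has two continuous columns and one categorical column. We can find the correlation coefficient between the two continuous columns separately for each category of the categorical column. To do this, we can use the by function and pass it a function that calls cor with method = "spearman", as shown in the following examples.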
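The same idea can be wrapped in a small reusable function. This is a minimal sketch. The name group_spearman and its arguments are hypothetical and not part of base R. It uses split() and vapply() instead of by(), so the result is a plain named numeric vector with one Spearman coefficient per category.

```r
# Hypothetical helper (not part of base R): Spearman correlation between two
# numeric columns, computed separately for each category of a grouping column.
group_spearman <- function(data, group, x, y) {
  # split() returns one data frame per category of the grouping column
  parts <- split(data, data[[group]])
  # vapply() enforces exactly one numeric value per category
  vapply(parts,
         function(d) cor(d[[x]], d[[y]], method = "spearman"),
         numeric(1))
}

# Small deterministic example: in g1 b rises with a, in g2 b falls as a rises
toy <- data.frame(
  g = rep(c("g1", "g2"), each = 4),
  a = c(1, 2, 3, 4, 1, 2, 3, 4),
  b = c(10, 20, 30, 40, 40, 30, 20, 10)
)
result <- group_spearman(toy, "g", "a", "b")
print(result)  # g1 = 1, g2 = -1
```

Passing the column names as strings keeps the helper independent of any particular data frame.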
Consider the following data frame:
> x1 <- sample(c("A", "B", "C"), 20, replace = TRUE)  # random: no seed is set, so your values will differ
> y1 <- rnorm(20, 1, 0.24)
> z1 <- rpois(20, 2)
> df1 <- data.frame(x1, y1, z1)
> df1
Output
   x1        y1 z1
1   A 1.1155324  2
2   C 0.9801564  3
3   B 0.9116162  1
4   A 0.8406772  3
5   C 0.8009355  2
6   A 0.9331637  2
7   B 1.0642089  1
8   B 1.1633515  0
9   B 1.1599037  5
10  B 1.0509981  2
11  B 0.7574267  1
12  B 0.8456225  1
13  B 0.8926751  2
14  B 0.6074419  3
15  C 0.7999792  0
16  A 1.0685236  2
17  B 0.9756677  3
18  A 0.9495342  0
19  C 1.0109747  2
20  A 0.9090985  4
Find the Spearman correlation between y1 and z1 for each category in x1:
> # x is the subset of df1 for one category, so take the columns from x, not from df1
> by(df1, df1$x1, FUN = function(x) cor(x$y1, x$z1, method = "spearman"))
Output
df1$x1: A
[1] -0.6375356
df1$x1: B
[1] -0.1069944
df1$x1: C
[1] 0.6324555
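The original call draws random numbers without set.seed(). To check the per-category values on the exact data shown above, the sketch below rebuilds df1 from its printed values. This assumes the printed y1 values, rounded to 7 decimals, are close enough that the ranks are unchanged. It collects the per-category results with sapply() over split(), which gives the same numbers as by() as a named vector. It also shows that Spearman's rho is Pearson's r on ranks, with tied z1 counts receiving their average rank.

```r
# df1 rebuilt from the values printed above (assumption: y1 as printed)
df1 <- data.frame(
  x1 = c("A", "C", "B", "A", "C", "A", "B", "B", "B", "B",
         "B", "B", "B", "B", "C", "A", "B", "A", "C", "A"),
  y1 = c(1.1155324, 0.9801564, 0.9116162, 0.8406772, 0.8009355,
         0.9331637, 1.0642089, 1.1633515, 1.1599037, 1.0509981,
         0.7574267, 0.8456225, 0.8926751, 0.6074419, 0.7999792,
         1.0685236, 0.9756677, 0.9495342, 1.0109747, 0.9090985),
  z1 = c(2, 3, 1, 3, 2, 2, 1, 0, 5, 2, 1, 1, 2, 3, 0, 2, 3, 0, 2, 4)
)

# Same computation as by(), collected into a named numeric vector
rho <- sapply(split(df1, df1$x1),
              function(d) cor(d$y1, d$z1, method = "spearman"))
print(rho)  # A approx. -0.6375356, B approx. -0.1069944, C approx. 0.6324555

# Spearman's rho equals Pearson's r on ranks; rank() gives tied values
# (the repeated counts in z1) their average rank
a <- df1[df1$x1 == "A", ]
rho_a_ranks <- cor(rank(a$y1), rank(a$z1))
print(all.equal(unname(rho["A"]), rho_a_ranks))  # TRUE
```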
> x2 <- sample(c("India", "China", "France"), 20, replace = TRUE)  # random: no seed is set
> y2 <- rexp(20, 0.335)
> z2 <- runif(20, 2, 10)
> df2 <- data.frame(x2, y2, z2)
> df2
Output
       x2          y2       z2
1  France  2.31790394 2.649538
2   China 10.61012173 8.340615
3  France  5.00085220 6.602884
4  France  1.67707140 7.722530
5   India  9.60663732 9.837268
6  France  1.46030289 5.370930
7  France 10.44614704 9.035748
8   India  0.39506766 6.318701
9   China  1.83071453 7.282782
10  China  0.23080001 7.210144
11  India  2.27763766 9.233019
12  China 18.21276888 9.928614
13 France  1.72085517 9.176826
14  India  4.77786071 8.899026
15  China  8.55501571 7.240147
16  China  0.19832026 5.641800
17  India  0.03113389 6.928705
18  China  0.56958471 3.496314
19  China  0.72728737 6.903436
20  India  8.73571474 5.286486
Find the Spearman correlation between y2 and z2 for each category in x2:
> by(df2, df2$x2, FUN = function(x) cor(x$y2, x$z2, method = "spearman"))
Output
df2$x2: China
[1] 0.8571429
df2$x2: France
[1] 0.2
df2$x2: India
[1] 0.3142857
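Because y2 and z2 are continuous, there are no ties, and Spearman's rho reduces to the textbook formula 1 - 6 Σd² / (n(n² - 1)), where d is the difference between the two ranks. The sketch below rebuilds df2 from the values printed above, which is an assumption made only for reproducibility. It compares cor(..., method = "spearman") per category with that formula. The helper name spearman_no_ties is hypothetical and is only valid when neither column has ties.

```r
# df2 rebuilt from the values printed above (assumption: values as printed)
df2 <- data.frame(
  x2 = c("France", "China", "France", "France", "India", "France", "France",
         "India", "China", "China", "India", "China", "France", "India",
         "China", "China", "India", "China", "China", "India"),
  y2 = c(2.31790394, 10.61012173, 5.00085220, 1.67707140, 9.60663732,
         1.46030289, 10.44614704, 0.39506766, 1.83071453, 0.23080001,
         2.27763766, 18.21276888, 1.72085517, 4.77786071, 8.55501571,
         0.19832026, 0.03113389, 0.56958471, 0.72728737, 8.73571474),
  z2 = c(2.649538, 8.340615, 6.602884, 7.722530, 9.837268, 5.370930,
         9.035748, 6.318701, 7.282782, 7.210144, 9.233019, 9.928614,
         9.176826, 8.899026, 7.240147, 5.641800, 6.928705, 3.496314,
         6.903436, 5.286486)
)

# Hypothetical helper: Spearman's rho via the no-ties formula
spearman_no_ties <- function(x, y) {
  d <- rank(x) - rank(y)          # difference of ranks for each pair
  n <- length(x)
  1 - 6 * sum(d^2) / (n * (n^2 - 1))
}

groups <- split(df2, df2$x2)      # one data frame per country
rho_cor <- sapply(groups, function(d) cor(d$y2, d$z2, method = "spearman"))
rho_formula <- sapply(groups, function(d) spearman_no_ties(d$y2, d$z2))
print(rho_cor)  # China approx. 0.8571429, France 0.2, India approx. 0.3142857
print(all.equal(rho_cor, rho_formula))  # TRUE
```

With ties, as in z1 from the first example, the shortcut formula no longer matches cor(). There, the Pearson-on-average-ranks definition is the one to use.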