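To draw a random sample in R, we can use the sample() function. When weights are supplied for the values, pass them through the prob argument so that each value's chance of selection is based on its weight. For example, if a data frame df has a column X of values and a column Weight of corresponding weights, a random sample of 10 rows can be drawn as follows:
df[sample(seq_len(nrow(df)),10,prob=df$Weight),]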
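As a minimal sketch of the general form above, the snippet below builds a small made-up data frame. The name df_demo, its values, and the seed are assumptions for illustration only. It shows that prob accepts raw weights that do not sum to one, and that a row with weight 0 is never drawn.

```r
# Hypothetical toy data: the name df_demo and its values are illustrative only
set.seed(42)  # arbitrary seed so repeated runs give the same draw
df_demo <- data.frame(X = c(10, 20, 30, 40, 50),
                      Weight = c(5, 0, 1, 3, 1))

# prob takes raw weights; they must be non-negative and not all zero,
# but they do not have to sum to one
idx <- sample(seq_len(nrow(df_demo)), size = 3, prob = df_demo$Weight)
weighted_sample <- df_demo[idx, ]
print(weighted_sample)

# Row 2 has weight 0, so it cannot appear in idx
print(2 %in% idx)
```

Because replace defaults to FALSE, no row is picked twice, and the requested size cannot exceed the number of rows with a positive weight.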
Consider the following data frame:
set.seed(1256)
x <- rnorm(20, 5, 1)
weight_x <- sample(1:10, 20, replace = TRUE)
df <- data.frame(x, weight_x)
df
Output
          x weight_x
1  4.126636       10
2  5.806501        1
3  5.768463       10
4  5.980315        8
5  6.593158        2
6  4.298533       10
7  6.196574        4
8  4.136517        5
9  4.504645       10
10 4.416107        6
11 5.257177       10
12 5.836453        1
13 5.334041       10
14 4.959786        2
15 3.406828        7
16 4.149746        2
17 4.657464        4
18 4.820102       10
19 5.401021        9
20 6.718216        6
Drawing samples of different sizes using the weight column:
df[sample(seq_len(nrow(df)),5,prob=df$weight_x),]
Output
          x weight_x
11 5.257177       10
19 5.401021        9
13 5.334041       10
10 4.416107        6
5  6.593158        2
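To see how the weights shape the draws, one rough check is to sample many times with replacement and compare how often each row is picked against weight / sum(weights). The sketch below rebuilds df with the same calls as above; the exact values it produces may depend on the R version, and n_draws is an arbitrary choice made for this illustration.

```r
# Rebuild the example data frame (generated values may vary across R versions)
set.seed(1256)
x <- rnorm(20, 5, 1)
weight_x <- sample(1:10, 20, replace = TRUE)
df <- data.frame(x, weight_x)

# Draw many single rows independently (replace = TRUE) using the weights
n_draws <- 10000  # arbitrary number of draws, chosen for this sketch
draws <- sample(seq_len(nrow(df)), n_draws, replace = TRUE, prob = df$weight_x)

# Observed selection frequency per row versus the normalised weight
observed <- as.numeric(table(factor(draws, levels = seq_len(nrow(df))))) / n_draws
expected <- df$weight_x / sum(df$weight_x)
print(round(cbind(weight = df$weight_x, expected, observed), 3))
```

With replace = FALSE, as in the examples in this article, the weights are applied sequentially to the rows that remain after each pick, so the chance that a row ends up in the sample is not exactly proportional to its weight.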
df[sample(seq_len(nrow(df)),3,prob=df$weight_x),]
Output
          x weight_x
13 5.334041       10
3  5.768463       10
18 4.820102       10
df[sample(seq_len(nrow(df)),7,prob=df$weight_x),]
Output
          x weight_x
9  4.504645       10
19 5.401021        9
12 5.836453        1
5  6.593158        2
15 3.406828        7
11 5.257177       10
6  4.298533       10
df[sample(seq_len(nrow(df)),10,prob=df$weight_x),]
Output
          x weight_x
4  5.980315        8
9  4.504645       10
19 5.401021        9
1  4.126636       10
13 5.334041       10
12 5.836453        1
11 5.257177       10
18 4.820102       10
10 4.416107        6
3  5.768463       10
df[sample(seq_len(nrow(df)),9,prob=df$weight_x),]
Output
          x weight_x
8  4.136517        5
11 5.257177       10
7  6.196574        4
4  5.980315        8
9  4.504645       10
6  4.298533       10
19 5.401021        9
18 4.820102       10
16 4.149746        2
df[sample(seq_len(nrow(df)),4,prob=df$weight_x),]
Output
          x weight_x
1  4.126636       10
6  4.298533       10
11 5.257177       10
7  6.196574        4
df[sample(seq_len(nrow(df)),15,prob=df$weight_x),]
Output
          x weight_x
3  5.768463       10
15 3.406828        7
19 5.401021        9
16 4.149746        2
9  4.504645       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
18 4.820102       10
6  4.298533       10
4  5.980315        8
17 4.657464        4
1  4.126636       10
20 6.718216        6
13 5.334041       10
df[sample(seq_len(nrow(df)),2,prob=df$weight_x),]
Output
          x weight_x
11 5.257177       10
13 5.334041       10
df[sample(seq_len(nrow(df)),12,prob=df$weight_x),]
Output
          x weight_x
1  4.126636       10
3  5.768463       10
8  4.136517        5
11 5.257177       10
10 4.416107        6
6  4.298533       10
13 5.334041       10
4  5.980315        8
20 6.718216        6
12 5.836453        1
18 4.820102       10
19 5.401021        9
df[sample(seq_len(nrow(df)),18,prob=df$weight_x),]
Output
          x weight_x
5  6.593158        2
4  5.980315        8
6  4.298533       10
20 6.718216        6
15 3.406828        7
3  5.768463       10
9  4.504645       10
10 4.416107        6
13 5.334041       10
19 5.401021        9
8  4.136517        5
11 5.257177       10
18 4.820102       10
1  4.126636       10
7  6.196574        4
12 5.836453        1
17 4.657464        4
16 4.149746        2
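All the calls above sample without replacement, so the requested size can never exceed the 20 rows of df. If a larger sample is needed, one option is to pass replace = TRUE. The helper below is a hypothetical wrapper (weighted_rows is not a base R function) that sketches this under simple validation assumptions.

```r
# Hypothetical helper: weighted_rows is an illustrative name, not part of base R
weighted_rows <- function(data, size, weights, replace = FALSE) {
  # Weights must line up with the rows, be non-negative, and not all be zero
  stopifnot(length(weights) == nrow(data), all(weights >= 0), any(weights > 0))
  if (!replace && size > sum(weights > 0)) {
    stop("size exceeds the number of rows with positive weight; use replace = TRUE")
  }
  idx <- sample(seq_len(nrow(data)), size, replace = replace, prob = weights)
  data[idx, , drop = FALSE]
}

# Rebuild the example data frame (generated values may vary across R versions)
set.seed(1256)
x <- rnorm(20, 5, 1)
weight_x <- sample(1:10, 20, replace = TRUE)
df <- data.frame(x, weight_x)

# 25 rows from a 20-row data frame is only possible with replacement
big_sample <- weighted_rows(df, 25, df$weight_x, replace = TRUE)
print(nrow(big_sample))
```

When a row is drawn more than once, R makes the repeated row names unique by adding a suffix such as 11.1.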