R Cheat Sheet (13): Simulation
Author: nex3z
2015-05-20
使用像R一样的统计编程语言的好处之一,是其具备大量生成随机数的工具。
1. 随机采样
使用sample() 函数可以生成随机数,下面的例子模拟了投掷一个六面骰子四次的结果,连续执行可以得到不同的结果:
> sample(1:6, 4, replace = TRUE)
> sample(1:6, 4, replace = TRUE)
> sample(1:6, 4, replace = TRUE)
[1] 3 6 3 1
> sample(1:6, 4, replace = TRUE)
[1] 3 2 3 4
> sample(1:6, 4, replace = TRUE)
[1] 3 6 3 1
> sample(1:6, 4, replace = TRUE)
[1] 3 2 3 4
其中第一个参数1:6 为采样的范围,第二个参数4 为采样次数,第三个参数replace = TRUE 表示同样的数字在出现一次之后,还可以继续出现,留空则不允许同一数字出现多次:
[1] 8 7 20 13 4 14 5 17 11 10
> sample(1:20, 10)
[1] 8 7 20 13 4 14 5 17 11 10
> sample(1:20, 10)
[1] 8 7 20 13 4 14 5 17 11 10
当不指明采样次数时,默认使用采样次数与数据长度相等,
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
[1] "F" "Z" "R" "D" "H" "M" "W" "L" "J" "C" "N" "U" "Q" "K" "I" "P" "G" "Y" "O" "E" "T" "A" "B" "X" "S" "V"
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
> sample(LETTERS)
[1] "F" "Z" "R" "D" "H" "M" "W" "L" "J" "C" "N" "U" "Q" "K" "I" "P" "G" "Y" "O" "E" "T" "A" "B" "X" "S" "V"
> LETTERS
[1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
> sample(LETTERS)
[1] "F" "Z" "R" "D" "H" "M" "W" "L" "J" "C" "N" "U" "Q" "K" "I" "P" "G" "Y" "O" "E" "T" "A" "B" "X" "S" "V"
LETTERS 为R中预定义的变量,其中包含了26个英文字母。
现在假设我们要仿真投掷100次硬币的结果,硬币是不公平的,有0.3的概率投掷出反面,0.7的概率投掷出正面。此时要为sample() 函数加上prob 参数指明概率:
> flips <- sample(c(0, 1), 100, replace=TRUE, prob=c(0.3, 0.7))
[1] 1 1 1 1 0 1 0 1 1 0 1 0 1 1 1 1 0 0 1 1 0 1 0 0 0 1 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0
[61] 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 1 1 0 1 0
> flips <- sample(c(0, 1), 100, replace=TRUE, prob=c(0.3, 0.7))
> flips
[1] 1 1 1 1 0 1 0 1 1 0 1 0 1 1 1 1 0 0 1 1 0 1 0 0 0 1 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0
[61] 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 1 1 0 1 0
> sum(flips)
[1] 62
> flips <- sample(c(0, 1), 100, replace=TRUE, prob=c(0.3, 0.7))
> flips
[1] 1 1 1 1 0 1 0 1 1 0 1 0 1 1 1 1 0 0 1 1 0 1 0 0 0 1 0 1 1 0 1 1 1 1 0 0 1 1 1 1 0 1 0 1 1 1 1 1 0 1 0 1 1 0 0 0 0 1 0 0
[61] 1 1 1 0 0 1 0 1 0 1 1 0 1 1 1 1 0 1 1 0 1 0 1 0 1 1 0 0 1 0 1 1 1 1 1 1 1 0 1 0
> sum(flips)
[1] 62
实际仿真的结果中有62次掷出正面。
2. 随机分布
2.1. 二项式分布
上面所示的投掷硬币实验的输出是二进制的(0或1),可以使用rbinom() 函数来仿真二项随机变量(binomial random variable)。R中的每一种概率分布都有一个对应的r*** 函数(r代表random),一个d函数(d代表density),一个p函数(p代表probability),和一个q***函数(q代表for quantile)。这里使用rbinom() 的函数名中,r代表随机,binom代表二项分布,其签名为:
rbinom(n, size, prob)
下面再次仿真投掷100次硬币的结果,硬币是不公平的,有0.3的概率投掷出反面,0.7的概率投掷出正面:
> rbinom(1, size=100, prob=0.7)
> rbinom(1, size=100, prob=0.7)
[1] 74
> rbinom(1, size=100, prob=0.7)
[1] 74
这里直接得到仿真结果,投掷出74次正面。
下面请求100次观测,每次观测包含1次实验,得到每次实验的结果:
> flips2 <- rbinom(n=100, size=1, prob=0.7)
[1] 1 1 1 1 1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1
[61] 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
> flips2 <- rbinom(n=100, size=1, prob=0.7)
> flips2
[1] 1 1 1 1 1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1
[61] 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
> sum(flips2)
[1] 80
> flips2 <- rbinom(n=100, size=1, prob=0.7)
> flips2
[1] 1 1 1 1 1 0 1 1 0 1 0 0 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 0 0 1 0 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1
[61] 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 1 0 0 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
> sum(flips2)
[1] 80
2.2. 正态分布
函数rnorm() 中的norm代表正态分布,其签名为:
rnorm(n, mean = 0, sd = 1)
rnorm(n, mean = 0, sd = 1)
rnorm(n, mean = 0, sd = 1)
下面生成10个均值为0,标准差为1的随机数:
[1] -0.3103136 -0.6610793 -1.1848685 -1.4658500 0.7673424 0.1602987 -0.8568694 1.3657184 -0.6502526 -0.9302382
> rnorm(10)
[1] -0.3103136 -0.6610793 -1.1848685 -1.4658500 0.7673424 0.1602987 -0.8568694 1.3657184 -0.6502526 -0.9302382
> rnorm(10)
[1] -0.3103136 -0.6610793 -1.1848685 -1.4658500 0.7673424 0.1602987 -0.8568694 1.3657184 -0.6502526 -0.9302382
下面生成10个均值为100,标准差为25的随机数:
[1] 92.38162 115.10283 90.42909 112.25354 126.44536 88.07762 107.79431 63.36577 92.94852 129.13348
> rnorm(10, 100, 25)
[1] 92.38162 115.10283 90.42909 112.25354 126.44536 88.07762 107.79431 63.36577 92.94852 129.13348
> rnorm(10, 100, 25)
[1] 92.38162 115.10283 90.42909 112.25354 126.44536 88.07762 107.79431 63.36577 92.94852 129.13348
2.3. 泊松分布
函数rpois() 中的pois代表泊松分布,其签名为:
rpois(n, lambda)
用例如:
> rpois(5, 10)
[1] 11 11 10 13 9
> rpois(5, 10)
[1] 11 11 10 13 9
除上面介绍到的分布之外,R中还提供了很多其他的分布,使用方法可以直接查阅文档。
2.4. 生成随机矩阵
使用replicate() 函数可以通过重复操作来生成矩阵,下面生成一个随机数矩阵:
> my_pois <- replicate(15, rpois(5, 10))
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 11 13 14 7 8 15 8 13 8 9 13 6 9 9 9
[2,] 16 9 5 5 5 13 10 11 11 11 11 9 8 6 8
[3,] 8 8 7 6 3 14 10 10 14 13 10 6 10 8 12
[4,] 15 11 7 11 11 10 8 10 16 7 14 15 11 7 5
[5,] 13 10 10 6 12 9 11 13 8 8 13 14 13 11 9
> my_pois <- replicate(15, rpois(5, 10))
> my_pois
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 11 13 14 7 8 15 8 13 8 9 13 6 9 9 9
[2,] 16 9 5 5 5 13 10 11 11 11 11 9 8 6 8
[3,] 8 8 7 6 3 14 10 10 14 13 10 6 10 8 12
[4,] 15 11 7 11 11 10 8 10 16 7 14 15 11 7 5
[5,] 13 10 10 6 12 9 11 13 8 8 13 14 13 11 9
> my_pois <- replicate(15, rpois(5, 10))
> my_pois
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15]
[1,] 11 13 14 7 8 15 8 13 8 9 13 6 9 9 9
[2,] 16 9 5 5 5 13 10 11 11 11 11 9 8 6 8
[3,] 8 8 7 6 3 14 10 10 14 13 10 6 10 8 12
[4,] 15 11 7 11 11 10 8 10 16 7 14 15 11 7 5
[5,] 13 10 10 6 12 9 11 13 8 8 13 14 13 11 9