R Cheat Sheet (7): Matrices and Data Frames

Author: nex3z 2015-05-07

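This section covers matrices and data frames. Both are "rectangular" data types that store tabular data in rows and columns. The difference is that a matrix can only hold data of a single type, whereas a data frame can hold columns of different types.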

Contents

1. Matrix
2. Data Frame

1. Matrix

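For the vectors we saw earlier, length() returns the number of elements. Calling dim() on a vector, however, returns NULL, because a plain vector has no dim attribute: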

> my_vector <- 1:20
> my_vector
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
> length(my_vector)
[1] 20
> dim(my_vector)
NULL

dim() retrieves the dim (dimension) attribute of its argument. Assigning to dim() sets that attribute, which changes the object's dimensions:

> dim(my_vector) <- c(4, 5)
> dim(my_vector)
[1] 4 5
> attributes(my_vector)
$dim
[1] 4 5
> my_vector
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20

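After dim(my_vector) <- c(4, 5), my_vector is a 4 × 5 matrix (4 rows, 5 columns), and its class is now matrix (in R 4.0.0 and later, class() reports both "matrix" and "array" here):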

> class(my_vector)
[1] "matrix"
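
The following is a minimal sketch of two details of dim assignment that the output above implies: the values are laid out column by column, and the product of the new dimensions has to equal the length of the vector. The names v, w, and result are only illustrative.

```r
v <- 1:20
dim(v) <- c(4, 5)

# Elements are laid out column by column (column-major order),
# so the first column holds the first four values.
v[, 1]        # 1 2 3 4
v[2, 3]       # row 2, column 3: 10

is.matrix(v)  # TRUE

# The product of the dimensions must equal the length of the vector;
# otherwise R signals an error, which is caught here.
result <- tryCatch({
  w <- 1:20
  dim(w) <- c(3, 5)
  "ok"
}, error = function(e) "error")
result        # "error"
```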

To avoid confusion, we will use my_matrix in place of my_vector from here on:

> my_matrix <- my_vector

A matrix can also be created directly with matrix():

> my_matrix2 <- matrix(1:20, 4, 5)
> my_matrix2
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    5    9   13   17
[2,]    2    6   10   14   18
[3,]    3    7   11   15   19
[4,]    4    8   12   16   20
> identical(my_matrix, my_matrix2)
[1] TRUE
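
By default matrix() fills its values column by column, just like the dim assignment. As a short sketch, the byrow argument switches this to row-by-row filling. The names by_col and by_row are illustrative, and the arguments are passed by name (nrow, ncol) for readability.

```r
# Default: fill column by column.
by_col <- matrix(1:20, nrow = 4, ncol = 5)

# byrow = TRUE: fill row by row instead.
by_row <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)

by_col[1, ]  # 1 5 9 13 17
by_row[1, ]  # 1 2 3 4 5

# Transposing a 5 x 4 column-filled matrix gives the same layout as byrow = TRUE.
same_layout <- identical(by_row, t(matrix(1:20, nrow = 5, ncol = 4)))
same_layout  # TRUE
```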

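Now imagine the following scenario: the numbers in our table are measurements from a clinical trial, where each row represents a patient and each column represents a measured variable. We would like to label the rows so that we can tell which patient each row belongs to. First, create a character vector containing the patients' names: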

> patients <- c("Bill", "Gina", "Kelly", "Sean")

Then use cbind() to combine patients and my_matrix:

> cbind(patients, my_matrix)
     patients                       
[1,] "Bill"   "1" "5" "9"  "13" "17"
[2,] "Gina"   "2" "6" "10" "14" "18"
[3,] "Kelly"  "3" "7" "11" "15" "19"
[4,] "Sean"   "4" "8" "12" "16" "20"

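Note that in the result above every number is wrapped in double quotes, which means the numbers have been converted to character. Because a matrix can only hold data of a single type, cbind(patients, my_matrix) implicitly coerces the numbers in my_matrix to the type of the elements of patients, namely character.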
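
The coercion can be checked directly with typeof(). This is a small sketch that rebuilds patients and my_matrix so it runs on its own; the name combined is illustrative.

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, 4, 5)

combined <- cbind(patients, my_matrix)

typeof(my_matrix)  # "integer"
typeof(combined)   # "character"
dim(combined)      # 4 6

# The numbers are now strings, so they have to be converted back
# before any arithmetic can be done on them.
first_measure <- as.numeric(combined[, 2])
first_measure + 1  # 2 3 4 5
```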

2. Data Frame

With a data frame, we can add patients without altering the numeric data in my_matrix:

> my_data <- data.frame(patients, my_matrix)
> my_data
  patients X1 X2 X3 X4 X5
1     Bill  1  5  9 13 17
2     Gina  2  6 10 14 18
3    Kelly  3  7 11 15 19
4     Sean  4  8 12 16 20
> class(my_data)
[1] "data.frame"
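
To confirm that each column keeps its own type, the class of every column can be inspected. A minimal sketch follows. It passes stringsAsFactors = FALSE explicitly, which is an addition to the call shown above: before R 4.0.0 the default would convert patients to a factor rather than leaving it as character. The name col_classes is illustrative.

```r
patients <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, 4, 5)

# stringsAsFactors = FALSE keeps `patients` as character on every R version.
my_data <- data.frame(patients, my_matrix, stringsAsFactors = FALSE)

# One class per column: the names stay character, the measurements stay integer.
col_classes <- sapply(my_data, class)
col_classes

dim(my_data)  # 4 6
```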

In addition to identifying the rows, we can also name each column of the data frame:

> cnames <- c("patient", "age", "weight", "bp", "rating", "test")
> colnames(my_data) <- cnames
> my_data
  patient age weight bp rating test
1    Bill   1      5  9     13   17
2    Gina   2      6 10     14   18
3   Kelly   3      7 11     15   19
4    Sean   4      8 12     16   20
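
Once the columns are named, they can be referred to by name. The sketch below rebuilds the same data frame so it is self-contained, then selects a column with $ and filters rows with a logical condition. The name gina is illustrative.

```r
my_data <- data.frame(
  c("Bill", "Gina", "Kelly", "Sean"),
  matrix(1:20, 4, 5),
  stringsAsFactors = FALSE
)
colnames(my_data) <- c("patient", "age", "weight", "bp", "rating", "test")

# Select a single column by name.
my_data$weight    # 5 6 7 8

# Select the rows that match a condition on a named column.
gina <- my_data[my_data$patient == "Gina", ]
gina$age          # 2

# Numeric columns remain numeric, so summaries work directly.
mean(my_data$bp)  # 10.5
```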
