Lab 5: Basic Data Management (Part 2)

Published: 2022-12-07

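Objectives: work with dates and missing values; become familiar with data type conversion; learn to create and recode variables; learn to sort, merge, and subset datasets; and learn to keep and drop variables.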

Tasks:

1. Create the following data frame, city:

name province zipcode
1 Shijiazhuang Hebei 050000
2 Taiyuan Shanxi 030000
3 Guiyang Guizhou 550000
> name<-c('Shijiazhuang','Taiyuan','Guiyang')
> province<-c('Hebei','Shanxi','Guizhou')
> zipcode<-c('050000','030000','550000')
> city<-data.frame(name,province,zipcode)
> city
          name province zipcode
1 Shijiazhuang    Hebei  050000
2      Taiyuan   Shanxi  030000
3      Guiyang  Guizhou  550000

2. Use $ to add a population column to the city data frame, with the values 1103, 446, and 488.

> city$population<-c(1103,446,488)
> city
          name province zipcode population
1 Shijiazhuang    Hebei  050000       1103
2      Taiyuan   Shanxi  030000        446
3      Guiyang  Guizhou  550000        488

3. Use within() or transform() to add a log(population) column to the city data frame from step 2.

> transform(city,'log(population)'=log(city$population)) # population must be numeric; the quoted name is sanitized to log.population.
          name province zipcode population log.population.
1 Shijiazhuang    Hebei  050000       1103        7.005789
2      Taiyuan   Shanxi  030000        446        6.100319
3      Guiyang  Guizhou  550000        488        6.190315
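
The task also allows within(). The following is a minimal sketch under the assumption that city holds the same three rows as above; the column name log_population is chosen here for illustration and does not come from the original. Unlike the transform() call above, the result is assigned back, so the new column is actually kept.

```r
# Rebuild the city data frame from steps 1 and 2
city <- data.frame(
  name       = c("Shijiazhuang", "Taiyuan", "Guiyang"),
  province   = c("Hebei", "Shanxi", "Guizhou"),
  zipcode    = c("050000", "030000", "550000"),
  population = c(1103, 446, 488)
)

# within() evaluates the block inside the data frame and returns a modified
# copy, so the result must be assigned back to keep the new column
city <- within(city, {
  log_population <- log(population)  # illustrative name; natural logarithm
})
city
```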

4. Create the data frame data1 shown in the figure, where the values of Sale2013 through Sale2016 are generated as random numbers.

[Figure: target data frame data1]

> Sale2013<-trunc(runif(5,2000,5000))
> Sale2014<-trunc(runif(5,2000,5000))
> Sale2015<-trunc(runif(5,2000,5000))
> Sale2016<-trunc(runif(5,2000,5000))
> Name<-c('苹果','谷歌','脸书','亚马逊','腾讯')
> Company<-c('Apple','Google','Facebook','Amozon','Tencent')
> data1<-data.frame(Name,Company,Sale2013,Sale2014,Sale2015,Sale2016)
> data1
    Name  Company Sale2013 Sale2014 Sale2015 Sale2016
1   苹果    Apple     4811     2277     3281     2053
2   谷歌   Google     2620     3368     3020     3058
3   脸书 Facebook     3556     2091     4630     4396
4 亚马逊   Amozon     3933     4463     3702     4174
5   腾讯  Tencent     2926     4232     4081     4994
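
Because runif() draws new values on every run, the numbers printed above will not reappear when the code is rerun. A possible sketch for making the data reproducible is shown here; the seed value 2022 and the helper name draw_sales are assumptions introduced only for this example.

```r
set.seed(2022)  # arbitrary seed, assumed here only to make the draws repeatable

# Hypothetical helper: n truncated uniform draws between 2000 and 5000
draw_sales <- function(n = 5) trunc(runif(n, min = 2000, max = 5000))

data1 <- data.frame(
  Name     = c("苹果", "谷歌", "脸书", "亚马逊", "腾讯"),
  Company  = c("Apple", "Google", "Facebook", "Amozon", "Tencent"),
  Sale2013 = draw_sales(),
  Sale2014 = draw_sales(),
  Sale2015 = draw_sales(),
  Sale2016 = draw_sales()
)
data1
```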

5. Use the melt function to reshape data1 into the following long format:

Name Company Year Sale
苹果 Apple Sale2013 5000
谷歌 Google Sale2013 3500
脸书 Facebook Sale2013 2300
亚马逊 Amozon Sale2013 2100
腾讯 Tencent Sale2013 3100
苹果 Apple Sale2014 5050
谷歌 Google Sale2014 3800
脸书 Facebook Sale2014 2900
亚马逊 Amozon Sale2014 2500
腾讯 Tencent Sale2014 3300
苹果 Apple Sale2015 5050
谷歌 Google Sale2015 4000
脸书 Facebook Sale2015 3200
亚马逊 Amozon Sale2015 2800
腾讯 Tencent Sale2015 3700
苹果 Apple Sale2016 6000
谷歌 Google Sale2016 4800
脸书 Facebook Sale2016 4500
亚马逊 Amozon Sale2016 3500
腾讯 Tencent Sale2016 4300
> install.packages('reshape') # install the reshape package
> library(reshape)
> melt(data1,id.vars=c("Name","Company"),measure.vars=c("Sale2013","Sale2014","Sale2015","Sale2016"),variable_name ="Year")
     Name  Company     Year value
1    苹果    Apple Sale2013  4811
2    谷歌   Google Sale2013  2620
3    脸书 Facebook Sale2013  3556
4  亚马逊   Amozon Sale2013  3933
5    腾讯  Tencent Sale2013  2926
6    苹果    Apple Sale2014  2277
7    谷歌   Google Sale2014  3368
8    脸书 Facebook Sale2014  2091
9  亚马逊   Amozon Sale2014  4463
10   腾讯  Tencent Sale2014  4232
11   苹果    Apple Sale2015  3281
12   谷歌   Google Sale2015  3020
13   脸书 Facebook Sale2015  4630
14 亚马逊   Amozon Sale2015  3702
15   腾讯  Tencent Sale2015  4081
16   苹果    Apple Sale2016  2053
17   谷歌   Google Sale2016  3058
18   脸书 Facebook Sale2016  4396
19 亚马逊   Amozon Sale2016  4174
20   腾讯  Tencent Sale2016  4994
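
The target table names its last column Sale, while the output above calls it value. One way to get the requested name, sketched here on the assumption that the reshape2 package (loaded later in task 10) is installed, is reshape2's melt(), which accepts variable.name and value.name. The sales figures are copied from the data1 printout above.

```r
library(reshape2)  # assumes reshape2 is installed

# Values copied from the data1 printout above
data1 <- data.frame(
  Name     = c("苹果", "谷歌", "脸书", "亚马逊", "腾讯"),
  Company  = c("Apple", "Google", "Facebook", "Amozon", "Tencent"),
  Sale2013 = c(4811, 2620, 3556, 3933, 2926),
  Sale2014 = c(2277, 3368, 2091, 4463, 4232),
  Sale2015 = c(3281, 3020, 4630, 3702, 4081),
  Sale2016 = c(2053, 3058, 4396, 4174, 4994)
)

# Columns not listed in id.vars are treated as measure variables
long <- melt(data1,
             id.vars       = c("Name", "Company"),
             variable.name = "Year",   # reshape2 spelling; reshape uses variable_name
             value.name    = "Sale")
head(long)
```

If reshape and reshape2 are both attached, whichever was loaded last masks the other's melt(); writing reshape2::melt(...) removes the ambiguity.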

6. Randomly generate 30 student scores in the interval [0,100] and store them in the vector score. Recode them by the rules below and store the recoded values in class.

class=1, if score<60

class=2, if score>=60 and score<=80

class=3, if score>80

> score<-trunc(runif(30,0,100))
> score
 [1] 50 56 44  0 39 63 68 73 68 70 36  3 65 16 47 53 64 88 16 93 70 40 25 84 42  9 86 56 73  0
> class<-ifelse(score<60,1,ifelse(score>=60 & score<=80,2,3))
> class
 [1] 1 1 1 1 1 2 2 2 2 2 1 1 2 1 1 1 2 3 1 3 2 1 1 3 1 1 3 1 2 1

7. Change the numeric codes in class to character codes, so that 1, 2, and 3 become "A", "B", and "C".

> class<-ifelse(score<60,'A',ifelse(score>=60 & score<=80,'B','C'))
> class
 [1] "A" "A" "A" "A" "A" "B" "B" "B" "B" "B" "A" "A" "B" "A" "A" "A" "B" "C" "A" "C" "B" "A" "A" "C" "A" "A"
[27] "C" "A" "B" "A"
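
The nested ifelse() calls in tasks 6 and 7 can also be written with cut(). This is only a sketch, and it assumes the scores are whole numbers, as they are after trunc(): the break at 59 then separates "below 60" from "60 to 80". The short score vector is illustrative and is not the lab data.

```r
score <- c(0, 59, 60, 80, 81, 99)  # small illustrative vector, not the lab data

# Integer scores only: (-Inf,59] -> A, (59,80] -> B, (80,Inf) -> C
grade <- cut(score, breaks = c(-Inf, 59, 80, Inf), labels = c("A", "B", "C"))

class_chr <- as.character(grade)  # character codes, as in task 7
class_num <- as.integer(grade)    # level codes 1, 2, 3, as in task 6
data.frame(score, class_num, class_chr)
```

Note that trunc(runif(30,0,100)) can only produce 0 through 99; if a score of 100 should be possible, sample(0:100, 30, replace = TRUE) is one alternative.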

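8. Store the first 6 rows of the mtcars dataset in mydf. Call fix() to open the interactive editor and rename wt to "weight", then use the rename function to rename vs to "Engine". The edited mydf should display as follows:

[Figure: mydf with columns renamed to weight and Engine]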

> mydf<-mtcars[1:6,]
> mydf
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
> fix(mydf)
> library(reshape)
> rename(mydf,c(vs='Engine'))
                   mpg cyl disp  hp drat weight  qsec Engine am gear carb
Mazda RX4         21.0   6  160 110 3.90  2.620 16.46      0  1    4    4
Mazda RX4 Wag     21.0   6  160 110 3.90  2.875 17.02      0  1    4    4
Datsun 710        22.8   4  108  93 3.85  2.320 18.61      1  1    4    1
Hornet 4 Drive    21.4   6  258 110 3.08  3.215 19.44      1  0    3    1
Hornet Sportabout 18.7   8  360 175 3.15  3.440 17.02      0  0    3    2
Valiant           18.1   6  225 105 2.76  3.460 20.22      1  0    3    1
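
fix() needs an interactive session, and rename() returns a modified copy, so the call above prints the renamed data but leaves mydf itself unchanged unless the result is assigned. A scriptable base R sketch that applies both renames in place might look like this:

```r
mydf <- mtcars[1:6, ]

# Rename by matching on the old names; no editor or extra package needed
names(mydf)[names(mydf) == "wt"] <- "weight"
names(mydf)[names(mydf) == "vs"] <- "Engine"
names(mydf)
```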

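9. Generate the data frame stu.score; part of the data is shown below:

[Figure: sample rows of stu.score]

Where:

1.) The data frame has 96 observations in total.

2.) No is the student ID: 6 students, 1001-1006.

3.) sub.new is the subject: four courses, Math, English, Python, and Java.

4.) term.new is the term: four terms, 1-4.

5.) score follows a normal distribution with mean 70 and standard deviation 10.

6.) Hence there are 6 × 4 × 4 = 96 observations.

7.) Display the first 12 rows and the last 12 rows.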

> no<-paste(100,1:6,sep = '') # paste() recycles 100 across 1:6
> no
[1] "1001" "1002" "1003" "1004" "1005" "1006"
> sub.new<-rep(c('Math','Eng','Python','Java'),each=6,times=4) # times=4 is optional: data.frame() would recycle the 24 values to 96 rows
> sub.new
 [1] "Math"   "Math"   "Math"   "Math"   "Math"   "Math"   "Eng"    "Eng"    "Eng"    "Eng"    "Eng"   
[12] "Eng"    "Python" "Python" "Python" "Python" "Python" "Python" "Java"   "Java"   "Java"   "Java"  
[23] "Java"   "Java"   "Math"   "Math"   "Math"   "Math"   "Math"   "Math"   "Eng"    "Eng"    "Eng"   
[34] "Eng"    "Eng"    "Eng"    "Python" "Python" "Python" "Python" "Python" "Python" "Java"   "Java"  
[45] "Java"   "Java"   "Java"   "Java"   "Math"   "Math"   "Math"   "Math"   "Math"   "Math"   "Eng"   
[56] "Eng"    "Eng"    "Eng"    "Eng"    "Eng"    "Python" "Python" "Python" "Python" "Python" "Python"
[67] "Java"   "Java"   "Java"   "Java"   "Java"   "Java"   "Math"   "Math"   "Math"   "Math"   "Math"  
[78] "Math"   "Eng"    "Eng"    "Eng"    "Eng"    "Eng"    "Eng"    "Python" "Python" "Python" "Python"
[89] "Python" "Python" "Java"   "Java"   "Java"   "Java"   "Java"   "Java"  
> term.new<-rep(c(1:4),each=24)
> term.new
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3
[54] 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
> score<-trunc(rnorm(96,70,10))
> score
 [1] 68 80 66 65 71 75 80 71 65 77 79 70 79 58 59 68 53 81 85 61 63 77 68 79 69 65 61 63 65 75 74 59 60 82 74
[36] 56 66 71 77 54 68 57 61 85 62 64 72 61 99 68 59 64 64 64 67 87 77 61 72 67 72 65 66 66 44 61 75 69 55 57
[71] 59 68 53 61 76 69 78 64 54 61 70 64 66 75 69 76 66 66 80 70 88 68 87 70 62 55
> stu.score<-data.frame(no,sub.new,term.new,score)
> head(stu.score,12)
     no sub.new term.new score
1  1001    Math        1    68
2  1002    Math        1    80
3  1003    Math        1    66
4  1004    Math        1    65
5  1005    Math        1    71
6  1006    Math        1    75
7  1001     Eng        1    80
8  1002     Eng        1    71
9  1003     Eng        1    65
10 1004     Eng        1    77
11 1005     Eng        1    79
12 1006     Eng        1    70
> tail(stu.score,12)
     no sub.new term.new score
85 1001  Python        4    69
86 1002  Python        4    76
87 1003  Python        4    66
88 1004  Python        4    66
89 1005  Python        4    80
90 1006  Python        4    70
91 1001    Java        4    88
92 1002    Java        4    68
93 1003    Java        4    87
94 1004    Java        4    70
95 1005    Java        4    62
96 1006    Java        4    55
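
The three rep()/paste() vectors above encode a full crossing of student, subject, and term. As an alternative sketch, expand.grid() can generate that crossing directly; the seed value is an assumption added for repeatability, so the scores will not match the printout above.

```r
set.seed(1)  # arbitrary seed, assumed here only for repeatability

# expand.grid() varies its first argument fastest, which gives the
# student-within-subject-within-term ordering shown above
stu.score <- expand.grid(
  no       = paste0(100, 1:6),
  sub.new  = c("Math", "Eng", "Python", "Java"),
  term.new = 1:4,
  stringsAsFactors = FALSE
)
stu.score$score <- trunc(rnorm(nrow(stu.score), mean = 70, sd = 10))

nrow(stu.score)      # 6 students x 4 subjects x 4 terms
head(stu.score, 12)
tail(stu.score, 12)
```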

10. Continue working with stu.score.

1.) From the stu.score data frame above, extract the first-term scores into the sub data frame stu.score1.

> stu.score1<-stu.score[stu.score$term.new==1,]
> stu.score1
     no sub.new term.new score
1  1001    Math        1    68
2  1002    Math        1    80
3  1003    Math        1    66
4  1004    Math        1    65
5  1005    Math        1    71
6  1006    Math        1    75
7  1001     Eng        1    80
8  1002     Eng        1    71
9  1003     Eng        1    65
10 1004     Eng        1    77
11 1005     Eng        1    79
12 1006     Eng        1    70
13 1001  Python        1    79
14 1002  Python        1    58
15 1003  Python        1    59
16 1004  Python        1    68
17 1005  Python        1    53
18 1006  Python        1    81
19 1001    Java        1    85
20 1002    Java        1    61
21 1003    Java        1    63
22 1004    Java        1    77
23 1005    Java        1    68
24 1006    Java        1    79

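2.) Convert stu.score1 to wide format and store it in the data frame stu.score1.new. The expected result looks like this (the console output further down comes from a different random draw than the stu.score1 listing above, so its scores differ):

[Figure: expected wide-format stu.score1.new]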

> library(reshape2)
> stu.score1.new<-dcast(stu.score1,no+term.new~sub.new,value.var = 'score')
> stu.score1.new
    no term.new Eng Java Math Python
1 1001        1  66   58   80     68
2 1002        1  61   70   46     69
3 1003        1  65   67   75     59
4 1004        1  73   77   58     70
5 1005        1  65   65   75     64
6 1006        1  90   72   62     62
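
For comparison, base R's reshape() can produce the same wide layout without an extra package. The sketch below uses a six-row excerpt (Math and Eng for students 1001-1003, values taken from the stu.score1 listing above) so that it stays self-contained; the name wide is illustrative.

```r
# Excerpt of stu.score1: Math and Eng scores for three students in term 1
stu.score1 <- data.frame(
  no       = rep(c("1001", "1002", "1003"), times = 2),
  sub.new  = rep(c("Math", "Eng"), each = 3),
  term.new = 1,
  score    = c(68, 80, 66, 80, 71, 65)
)

# One row per (no, term.new); one score column per subject
wide <- reshape(stu.score1,
                idvar     = c("no", "term.new"),
                timevar   = "sub.new",
                v.names   = "score",
                direction = "wide")

# reshape() names the new columns score.Math, score.Eng; drop the prefix
names(wide) <- sub("^score\\.", "", names(wide))
rownames(wide) <- NULL
wide
```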

3.) From stu.score1.new, select the rows where Eng > 70 and Python > 80. (The actual condition only needs to be consistent with your own experimental data.)

> stu.score1.new[stu.score1.new$Eng>70&stu.score1.new$Python>80,]
[1] no       term.new Eng      Java     Math     Python  
<0 rows> (or 0-length row.names)
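
The filter returns no rows because no student in this draw has a Python score above 80. Since the task allows adapting the condition to the data, here is a sketch using subset() with a lower Python threshold; the threshold of 60 is an assumption picked for illustration, and the values are copied from the stu.score1.new printout above.

```r
# Values copied from the stu.score1.new printout above
stu.score1.new <- data.frame(
  no       = c("1001", "1002", "1003", "1004", "1005", "1006"),
  term.new = 1,
  Eng      = c(66, 61, 65, 73, 65, 90),
  Java     = c(58, 70, 67, 77, 65, 72),
  Math     = c(80, 46, 75, 58, 75, 62),
  Python   = c(68, 69, 59, 70, 64, 62)
)

# subset() looks up column names inside the data frame, so the
# stu.score1.new$ prefix is not needed in the condition
res <- subset(stu.score1.new, Eng > 70 & Python > 60)
res
```

With these values the condition keeps students 1004 and 1006.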

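11. Use pie() from the graphics package in the base R installation to draw a basic pie chart, and pie3D() from the plotrix package to draw a 3D pie chart. Using the number of respondents at each satisfaction level in data3_1.csv as the data source, draw the pie charts shown below.

[Figure: target basic pie chart and 3D pie chart of satisfaction levels]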

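Steps:

1.) Read data3_1.csv into the dataset data3_1.

2.) Call par(mfrow=c(1,2),mai=c(0.1,0.6,0.1,0.6),cex=0.8) to set a layout of 1 row and 2 columns. The mai argument is a numeric vector of margin sizes in the order bottom, left, top, right, measured in inches.

3.) Use table() to build the frequency table of satisfaction levels and save it to tab.

4.) Use names() to get the vector of category names and save it to name.

5.) Use prop.table(tab) to compute the proportion of each satisfaction level, multiply by 100 to get percentages, and store them in percent.

6.) Use paste() to format the label vector as: 中立 34%, that is, with a single ASCII space between the category name and the percentage.

7.) Draw the basic pie chart with pie().

8.) Draw the 3D pie chart with pie3D from the plotrix package.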

> data3_1<-read.csv('E:/R语言/作业/data3_1.csv')
> par(mfrow=c(1,2),mai=c(0.1,0.6,0.1,0.6),cex=0.8)
> tab<-table(data3_1$满意度)
> tab

不满意   满意   中立 
   800    520    680 
> name<-names(tab)
> name
[1] "不满意" "满意"   "中立"  
> percent<-prop.table(tab)*100
> percent

不满意   满意   中立 
    40     26     34  
> lab<-paste(name,' ',percent,'%',sep = '')
> lab
[1] "不满意 40%" "满意 26%"   "中立 34%"  
> pie(percent,labels = lab,col = rainbow(length(lab)))
> install.packages('plotrix') # install first if not already installed
> library(plotrix)
> pie3D(percent,labels = lab,explode = 0.1)
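
Since data3_1.csv is not included with the post, the steps can be tried on a stand-in vector. The sketch below rebuilds the satisfaction column from the counts in the table() output above, which is an assumption about the file's contents. It rounds the percentages in case they are not whole numbers, and it covers only the base pie() chart because plotrix may not be installed.

```r
# Stand-in for data3_1$满意度, rebuilt from the counts printed above
satisfaction <- rep(c("不满意", "满意", "中立"), times = c(800, 520, 680))

tab     <- table(satisfaction)               # frequency table
percent <- round(100 * prop.table(tab))      # whole-number percentages
lab     <- paste0(names(tab), " ", percent, "%")

pdf(NULL)  # null device, so the sketch also runs without a display
pie(as.vector(percent), labels = lab, col = rainbow(length(lab)))
invisible(dev.off())
lab
```

In an interactive session the pdf(NULL) and dev.off() lines can be dropped so that the chart appears in the plot window.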
