R: Data Tidying and Cleaning (Factors)

Published: 2024-04-30

Cookbook for R: Manipulating Data ~ Factors

Renaming levels of a factor

First, create an example factor:

x <- factor(c("alpha","beta","gamma","alpha","beta"))
x
#> [1] alpha beta  gamma alpha beta 
#> Levels: alpha beta gamma

levels(x)
#> [1] "alpha" "beta"  "gamma"

To rename the levels of a factor,
the easiest way is to use revalue() or mapvalues() from the plyr package:

library(plyr)
revalue(x, c("beta"="two", "gamma"="three"))
#> [1] alpha two   three alpha two  
#> Levels: alpha two three

mapvalues(x, from = c("beta", "gamma"), to = c("two", "three"))
#> [1] alpha two   three alpha two  
#> Levels: alpha two three

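If you would rather not depend on plyr, R's built-in functions can do the same job. Note that the assignments below modify x directly, so there is no need to save the result back into x.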

# Rename by name: change "beta" to "two"
levels(x)[levels(x)=="beta"] <- "two"

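# You can also rename by position, but this is risky:
# if the order or number of levels in the data changes, the wrong level gets renamed

# Rename by index in the levels vector: change the third level, "gamma", to "three"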
levels(x)[3] <- "three"
x
#> [1] alpha two   three alpha two  
#> Levels: alpha two three

# Rename all levels at once
levels(x) <- c("one","two","three")
x
#> [1] one   two   three one   two  
#> Levels: one two three
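
The by-name renaming shown above can be generalized with a lookup vector. The following is a minimal sketch in base R only; the helper name rename_levels is hypothetical (it is not a function from base R or plyr), and levels that do not appear in the lookup are left unchanged.

```r
# Hypothetical helper (not part of base R or plyr): rename selected
# levels by name and leave every other level untouched.
rename_levels <- function(f, lookup) {
  lv <- levels(f)
  hit <- lv %in% names(lookup)         # existing levels that have a new name
  lv[hit] <- unname(lookup[lv[hit]])   # look up each new name by its old name
  levels(f) <- lv
  f
}

x <- factor(c("alpha", "beta", "gamma", "alpha", "beta"))
y <- rename_levels(x, c(beta = "two", gamma = "three"))
levels(y)
#> [1] "alpha" "two"   "three"
```

Because the lookup is keyed by the old level name rather than by position, it keeps working if the order of the levels changes.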

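You can also rename factor levels by name without plyr.
Keep in mind, though, that this works only if ALL levels are present in the list;
any level that is not in the list is replaced with NA.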

# Rename factor levels by name
x <- factor(c("alpha","beta","gamma","alpha","beta"))
levels(x) <- list(A="alpha", B="beta", C="gamma")
x
#> [1] A B C A B
#> Levels: A B C
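
To make the NA caveat concrete, here is a small sketch of what happens when one level is left out of the list. The variable name x_partial is only illustrative.

```r
x_partial <- factor(c("alpha", "beta", "gamma", "alpha", "beta"))

# "gamma" is deliberately left out of the list
levels(x_partial) <- list(A = "alpha", B = "beta")

levels(x_partial)
#> [1] "A" "B"
is.na(x_partial)
#> [1] FALSE FALSE  TRUE FALSE FALSE
```

The value that used to be "gamma" is now NA, and no warning is raised, so it is worth checking the result with is.na() or table() after this kind of assignment.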

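You can also rename factor levels with R's string search-and-replace functions.
Note: the ^ and $ around alpha ensure that the entire string matches.
Without them, a level named alphabet would also match.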

# Create a simple example
x <- factor(c("alpha","beta","gamma","alpha","beta"))
x
#> [1] alpha beta  gamma alpha beta 
#> Levels: alpha beta gamma

levels(x) <- sub("^alpha$", "one", levels(x))
x
#> [1] one   beta  gamma one   beta 
#> Levels: one beta gamma


# Replace every "a" in the level names with "X"
levels(x) <- gsub("a", "X", levels(x))
x
#> [1] one   betX  gXmmX one   betX 
#> Levels: one betX gXmmX

# gsub() replaces every match within each level name
# sub() replaces only the first match within each level name
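
The effect of the ^ and $ anchors is easiest to see on a level set that actually contains alphabet. This is a small sketch on a plain character vector of level names (the vector lv is made up for illustration):

```r
lv <- c("alpha", "alphabet", "beta")

# Without anchors, "alpha" also matches inside "alphabet"
unanchored <- sub("alpha", "one", lv)
unanchored
#> [1] "one"    "onebet" "beta"

# With ^ and $, only the exact level name "alpha" matches
anchored <- sub("^alpha$", "one", lv)
anchored
#> [1] "one"      "alphabet" "beta"
```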

Re-computing the levels of a factor

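Sometimes you need to re-compute the levels of a factor.
This is useful when a factor has levels that do not actually occur in the data.
That can happen when data is imported, or when some rows are removed.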

For a single factor:

x <- factor(c("alpha","beta","alpha"), levels=c("alpha","beta","gamma"))
x
#> [1] alpha beta  alpha
#> Levels: alpha beta gamma

# Drop the unused level
x <- factor(x)
x
#> [1] alpha beta  alpha
#> Levels: alpha beta
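
Unused levels typically appear after subsetting. As a sketch of that situation, the example below filters out one value and then removes the leftover level in two other ways that base R provides for a single factor: droplevels() and the drop = TRUE argument of `[`.

```r
x_full <- factor(c("alpha", "beta", "gamma", "alpha"))

# Subsetting keeps the full level set, even for values no longer present
x_sub <- x_full[x_full != "gamma"]
levels(x_sub)
#> [1] "alpha" "beta"  "gamma"

# Option 1: droplevels() on the factor
levels(droplevels(x_sub))
#> [1] "alpha" "beta"

# Option 2: drop unused levels while subsetting
x_sub2 <- x_full[x_full != "gamma", drop = TRUE]
levels(x_sub2)
#> [1] "alpha" "beta"
```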

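After importing data, you may have a data frame that mixes factors with other kinds of vectors, and want to re-compute the levels of all the factors at once.
The droplevels() function does this: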

# Create a data frame with some factors (with extra levels)
df <- data.frame(
    x = factor(c("alpha","beta","alpha"), levels=c("alpha","beta","gamma")),
    y = c(5,8,2),
    z = factor(c("red","green","green"), levels=c("red","green","blue"))
)

df$x
#> [1] alpha beta  alpha
#> Levels: alpha beta gamma
df$z
#> [1] red   green green
#> Levels: red green blue


# Drop the extra levels
df <- droplevels(df)

df$x
#> [1] alpha beta  alpha
#> Levels: alpha beta
df$z
#> [1] red   green green
#> Levels: red green
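
The same situation arises when rows are removed from a data frame. The sketch below uses made-up data (df2) and also shows the except argument of droplevels(), which takes the indices of columns whose levels should be left alone.

```r
df2 <- data.frame(
    x = factor(c("alpha", "beta", "alpha")),
    y = c(5, 8, 2),
    z = factor(c("red", "green", "red"))
)

# Removing the "beta" row leaves the old levels behind in both factors
kept <- df2[df2$x != "beta", ]
levels(kept$x)
#> [1] "alpha" "beta"
levels(kept$z)
#> [1] "green" "red"

# Drop unused levels in every factor column
cleaned <- droplevels(kept)
levels(cleaned$x)
#> [1] "alpha"
levels(cleaned$z)
#> [1] "red"

# Drop unused levels everywhere except column 3 (z)
partly <- droplevels(kept, except = 3)
levels(partly$z)
#> [1] "green" "red"
```

Keeping a level on purpose can be useful when, for example, a later table or plot should still show an empty category.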

Changing the order of levels of a factor

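Factors in R come in two varieties, ordered and unordered: for example, {small, medium, large} and {pen, brush, pencil}.
For most analyses it does not matter whether a factor is ordered or unordered.

1. If the factor is ordered, the specific order of the levels matters (small < medium < large).

2. If the factor is unordered, the levels still appear in some order,
but that order is only a matter of convenience (pen, pencil, brush): it determines, for example, how output is printed or how items are arranged in a plot.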
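
One place where the distinction does show up is comparison. As a brief sketch with made-up values, an ordered factor supports <, > and max(), while the same comparison on an unordered factor produces a warning and NA:

```r
lv <- c("small", "medium", "large")
vals <- c("small", "large", "medium")

ord <- factor(vals, levels = lv, ordered = TRUE)
unord <- factor(vals, levels = lv)

is.ordered(ord)
#> [1] TRUE
ord < "large"
#> [1]  TRUE FALSE  TRUE
as.character(max(ord))
#> [1] "large"

# For an unordered factor, `<` is not meaningful: R warns and returns NA
unord_cmp <- suppressWarnings(unord < "large")
unord_cmp
#> [1] NA NA NA
```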

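One way to change the level order is to call factor() on the factor and specify the order directly.
In this example, the function ordered() could be used instead of factor().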

Example data:

# Create a factor with the levels in the wrong order
sizes <- factor(c("small", "large", "large", "small", "medium"))
sizes
#> [1] small  large  large  small  medium
#> Levels: large medium small

Method 1: specify the level order explicitly with factor()

sizes <- factor(sizes, levels = c("small", "medium", "large"))
sizes
#> [1] small  large  large  small  medium
#> Levels: small medium large
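
It can help to check what re-leveling with factor() actually changes. In the sketch below (variable names are illustrative), the values stay the same and only the underlying integer codes move. It also shows a pitfall: any value whose level is missing from levels silently becomes NA, for instance after a typo.

```r
s <- factor(c("small", "large", "large", "small", "medium"))
as.integer(s)        # codes under the default alphabetical levels
#> [1] 3 1 1 3 2

fixed <- factor(s, levels = c("small", "medium", "large"))
as.character(fixed)  # the values themselves are unchanged
#> [1] "small"  "large"  "large"  "small"  "medium"
as.integer(fixed)    # only the codes change
#> [1] 1 3 3 1 2

# "large" is missing from levels here ("big" is a deliberate mistake)
broken <- factor(s, levels = c("small", "medium", "big"))
sum(is.na(broken))
#> [1] 2
```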

Method 2: for an ordered factor, use ordered() to set the level order

sizes <- ordered(c("small", "large", "large", "small", "medium"))
sizes <- ordered(sizes, levels = c("small", "medium", "large"))
sizes
#> [1] small  large  large  small  medium
#> Levels: small < medium < large

Method 3: use relevel() to move a particular level to the front of the level list
This method does not work on ordered factors.

sizes <- factor(c("small", "large", "large", "small", "medium"))
sizes
#> [1] small  large  large  small  medium
#> Levels: large medium small

# Make medium the first level
sizes <- relevel(sizes, "medium")
sizes
#> [1] small  large  large  small  medium
#> Levels: medium large small

# Make small the first level
sizes <- relevel(sizes, "small")
sizes
#> [1] small  large  large  small  medium
#> Levels: small medium large
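
The restriction to unordered factors is enforced at run time. The sketch below captures the error with tryCatch() instead of letting it halt the script; the variable names are illustrative.

```r
ord_sizes <- ordered(c("small", "large", "medium"),
                     levels = c("small", "medium", "large"))

# relevel() stops with an error for ordered factors; capture it here
result <- tryCatch(relevel(ord_sizes, "medium"), error = function(e) e)
inherits(result, "error")
#> [1] TRUE
```

To reorder an ordered factor, pass the full level order to factor() or ordered() as in Methods 1 and 2.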

Method 4: specify the correct order when creating the factor

sizes <- factor(c("small", "large", "large", "small", "medium"),
                levels = c("small", "medium", "large"))
sizes
#> [1] small  large  large  small  medium
#> Levels: small medium large

To reverse the order of a factor's levels:

sizes <- factor(c("small", "large", "large", "small", "medium"))
sizes
#> [1] small  large  large  small  medium
#> Levels: large medium small

sizes <- factor(sizes, levels=rev(levels(sizes)))
sizes
#> [1] small  large  large  small  medium
#> Levels: small medium large
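
For an ordered factor, reversing the levels also reverses the meaning of comparisons, which is worth keeping in mind. A small sketch with made-up values:

```r
ord <- ordered(c("small", "large", "medium"),
               levels = c("small", "medium", "large"))

# factor() keeps the factor ordered; only the level order is flipped
rev_ord <- factor(ord, levels = rev(levels(ord)))
is.ordered(rev_ord)
#> [1] TRUE
levels(rev_ord)
#> [1] "large"  "medium" "small"

ord < "large"        # originally, "large" is the highest level
#> [1]  TRUE FALSE  TRUE
rev_ord < "large"    # after reversing, "large" is the lowest level
#> [1] FALSE FALSE FALSE
```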
