Cleaning up factor levels (collapsing multiple levels/labels)

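What is the most effective (i.e. efficient/appropriate) way to clean up a factor that contains multiple levels which need to be collapsed into one?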

Example:

## Given: 
x <- c("Y", "Y", "Yes", "N", "No", "H")   # The 'H' should be treated as NA

## expectedOutput
[1] Yes  Yes  Yes  No   No   <NA>
Levels: Yes No  # <~~ NOTICE ONLY **TWO** LEVELS

One option, of course, is to clean the strings beforehand using sub and friends.
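As a minimal sketch of that first option, the strings could be normalised with sub before the factor is built. The variable names cleaned and x.clean are only illustrative, and the anchored patterns assume that "Y" and "N" are the only short spellings present:

```r
x <- c("Y", "Y", "Yes", "N", "No", "H")

# Normalise the short spellings to the full labels; the ^...$ anchors
# make sure only whole-string matches ("Y", "N") are rewritten
cleaned <- sub("^Y$", "Yes", x)
cleaned <- sub("^N$", "No", cleaned)

# Values not listed in `levels` (here "H") become NA
x.clean <- factor(cleaned, levels = c("Yes", "No"))
x.clean
```

This works, but every alternative spelling needs its own substitution, which is what motivates looking for something more direct.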

Another method is to allow duplicate labels and then drop them:

## Duplicate levels ==> "Warning: deprecated"
x.f <- factor(x, levels=c("Y", "Yes", "No", "N"), labels=c("Yes", "Yes", "No", "No"))

## the above line can be wrapped in either of the next two lines
factor(x.f)      
droplevels(x.f) 

However, is there a more effective way?

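While I know that the levels and labels arguments should be vectors, I experimented with lists, named lists and named vectors to see what would happen.
Needless to say, none of the following got me any closer to my goal.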

  factor(x, levels=list(c("Yes", "Y"), c("No", "N")), labels=c("Yes", "No"))
  factor(x, levels=c("Yes", "No"), labels=list(c("Yes", "Y"), c("No", "N")))

  factor(x, levels=c("Y", "Yes", "No", "N"), labels=c(Y="Yes", Yes="Yes", No="No", N="No"))
  factor(x, levels=c("Y", "Yes", "No", "N"), labels=c(Yes="Y", Yes="Yes", No="No", No="N"))
  factor(x, levels=c("Yes", "No"), labels=c(Y="Yes", Yes="Yes", No="No", N="No"))
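
Use the levels function and pass it a named list, where the names are the desired level names and the elements are the current level names that should be renamed.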

x <- c("Y", "Y", "Yes", "N", "No", "H")
x <- factor(x)
levels(x) <- list(Yes=c("Y", "Yes"), No=c("N", "No"))
x
## [1] Yes  Yes  Yes  No   No   <NA>
## Levels: Yes No

This is described in the levels documentation; also see the examples there.

value: For the ‘factor’ method, a
vector of character strings with length at least the number
of levels of ‘x’, or a named list specifying how to rename
the levels.

This can also be done in one line, as Marek does here: http://stackoverflow.com/a/10432263/210673; the `levels<-` sorcery is explained here: http://stackoverflow.com/a/10491881/210673

> `levels<-`(factor(x), list(Yes=c("Y", "Yes"), No=c("N", "No")))
[1] Yes  Yes  Yes  No   No   <NA>
Levels: Yes No
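
If the same collapsing is needed in several places, the named-list approach could be wrapped in a small helper. This is only a sketch: collapse_levels and its mapping argument are hypothetical names, not part of base R.

```r
# Hypothetical helper (not part of base R): collapse the levels of `x`
# according to `mapping`, a named list of the form list(new = c("old1", "old2"))
collapse_levels <- function(x, mapping) {
  f <- factor(x)
  levels(f) <- mapping   # levels not mentioned in `mapping` become NA
  f
}

x <- c("Y", "Y", "Yes", "N", "No", "H")
res <- collapse_levels(x, list(Yes = c("Y", "Yes"), No = c("N", "No")))
res
```

Because "H" does not appear in the mapping, it ends up as NA, matching the expected output in the question.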
http://stackoverflow.com/questions/19410108/cleaning-up-factor-levels-collapsing-multiple-levels-labels
