如何将data.frame分组几周然后求和?

假设我有几年的数据,如下所示

# load date package and set random seed
library(lubridate)
set.seed(42)

# create data.frame of dates and income
date <- seq(dmy("26-12-2010"), dmy("15-01-2011"), by = "days")
df <- data.frame(date = date, 
                 wday = wday(date),
                 wday.name = wday(date, label = TRUE, abbr = TRUE),
                 income = round(runif(21, 0, 100)),
                 week = format(date, format="%Y-%U"),
                 stringsAsFactors = FALSE)

#          date wday wday.name income    week
# 1  2010-12-26    1       Sun     91 2010-52
# 2  2010-12-27    2       Mon     94 2010-52
# 3  2010-12-28    3      Tues     29 2010-52
# 4  2010-12-29    4       Wed     83 2010-52
# 5  2010-12-30    5     Thurs     64 2010-52
# 6  2010-12-31    6       Fri     52 2010-52
# 7  2011-01-01    7       Sat     74 2011-00
# 8  2011-01-02    1       Sun     13 2011-01
# 9  2011-01-03    2       Mon     66 2011-01
# 10 2011-01-04    3      Tues     71 2011-01
# 11 2011-01-05    4       Wed     46 2011-01
# 12 2011-01-06    5     Thurs     72 2011-01
# 13 2011-01-07    6       Fri     93 2011-01
# 14 2011-01-08    7       Sat     26 2011-01
# 15 2011-01-09    1       Sun     46 2011-02
# 16 2011-01-10    2       Mon     94 2011-02
# 17 2011-01-11    3      Tues     98 2011-02
# 18 2011-01-12    4       Wed     12 2011-02
# 19 2011-01-13    5     Thurs     47 2011-02
# 20 2011-01-14    6       Fri     56 2011-02
# 21 2011-01-15    7       Sat     90 2011-02

我想把每周(周日到周六)的“收入”加起来.目前我做以下事情:

Weekending 2011-01-01 = sum(df$income[1:7]) = 487
Weekending 2011-01-08 = sum(df$income[8:14]) = 387
Weekending 2011-01-15 = sum(df$income[15:21]) = 443

但是,我想要一个更强大的方法,它将自动按周计算.我无法弄清楚如何自动将数据子集化为数周.任何帮助将非常感激.

最佳答案
首先使用格式将日期转换为周数,然后使用plyr :: ddply()来计算摘要:

library(plyr)
df$week <- format(df$date, format="%Y-%U")
ddply(df, .(week), summarize, income=sum(income))
     week income
1 2011-52    413
2 2012-01    435
3 2012-02    379

有关format.date的更多信息,请参阅?strptime,特别是将%U定义为周数的位.

编辑:

鉴于修改的数据和要求,一种方法是将日期除以7以得到表示星期的数字. (或者更准确地说,除以一周内的秒数来获得自纪元以来的周数,默认情况下是1970-01-01.

在代码中:

df$week <- as.Date("1970-01-01")+7*trunc(as.numeric(df$date)/(3600*24*7))
library(plyr)
ddply(df, .(week), summarize, income=sum(income))

        week income
1 2010-12-23    298
2 2010-12-30    392
3 2011-01-06    294
4 2011-01-13    152

我没有检查星期日的星期界限.您必须检查此项,并在公式中插入适当的偏移量.

转载注明原文:如何将data.frame分组几周然后求和? - 代码日志