R: Assign column values based on the closest match in a second data frame

I have two data frames, logger and df (the times are numeric):

logger <- data.frame(
time = c(1280248354:1280248413),
temp = runif(60,min=18,max=24.5)
)

df <- data.frame(
obs = c(1:10),
time = runif(10,min=1280248354,max=1280248413),
temp = NA
)

I want to search logger$time for the closest match to each row of df$time, and assign the associated logger$temp to df$temp. So far I have managed this with the following loop:

for (i in 1:length(df$time)){
closestto<-which.min(abs((logger$time) - (df$time[i])))
df$temp[i]<-logger$temp[closestto]
}

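However, I now have large data frames (logger has 13,620 rows and df has 266,138) and the processing time is long. I have read that loops are not the most efficient way to do this, but I am not familiar with the alternatives. Is there a faster way?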

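Best answer
I would use data.table. It makes joining on keys very easy and very fast. There is even a really useful roll = "nearest" argument for exactly the behaviour you are looking for (except that in your example data it is not necessary, because all the times in df appear in logger). In the example below I rename df$time to df$time1 to make it clear which column belongs to which table...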

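#  Load package
library( data.table )

#  Rename df$time to time1 so it is clear which column belongs to which table
names( df )[ names( df ) == "time" ] <- "time1"

#  Make data.frames into data.tables with a key column
ldt <- data.table( logger , key = "time" )
dt <- data.table( df , key = "time1" )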

#  Join based on the key column of the two tables (time & time1)
#  roll = "nearest" gives the desired behaviour
#  list( obs , time1 , temp ) gives the columns you want to return from dt
ldt[ dt , list( obs , time1 , temp ) , roll = "nearest" ]
#          time obs      time1     temp
# 1: 1280248361   8 1280248361 18.07644
# 2: 1280248366   4 1280248366 21.88957
# 3: 1280248370   3 1280248370 19.09015
# 4: 1280248376   5 1280248376 22.39770
# 5: 1280248381   6 1280248381 24.12758
# 6: 1280248383  10 1280248383 22.70919
# 7: 1280248385   1 1280248385 18.78183
# 8: 1280248389   2 1280248389 18.17874
# 9: 1280248393   9 1280248393 18.03098
#10: 1280248403   7 1280248403 22.74372
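
For comparison, here is a minimal base R sketch of the same nearest-match lookup without data.table. It is not part of the answer above: nearest_index is a hypothetical helper name, and it assumes logger$time is sorted in strictly increasing order (true for the example data). findInterval() does a binary search, so each row of df no longer scans all of logger.

```r
# Hypothetical helper (not from the original answer): for each value in x,
# return the index of the nearest element of `sorted`.
# Assumes `sorted` is in strictly increasing order, as logger$time is here.
nearest_index <- function(x, sorted) {
  lo <- findInterval(x, sorted)        # last index with sorted[lo] <= x, 0 if none
  lo <- pmax(lo, 1L)                   # values below the range map to the first element
  hi <- pmin(lo + 1L, length(sorted))  # right-hand neighbour, capped at the last element
  # Take the right neighbour only when it is strictly closer; ties keep the
  # left one, which matches which.min() in the original loop
  ifelse(abs(sorted[hi] - x) < abs(x - sorted[lo]), hi, lo)
}

set.seed(1)  # arbitrary seed, only so the sketch is reproducible
logger <- data.frame(
  time = 1280248354:1280248413,
  temp = runif(60, min = 18, max = 24.5)
)
df <- data.frame(
  obs = 1:10,
  time = runif(10, min = 1280248354, max = 1280248413),
  temp = NA
)

# Vectorised replacement for the for loop in the question
df$temp <- logger$temp[nearest_index(df$time, logger$time)]
```

This writes the result straight into df$temp and keeps df in its original row order. If logger$time could contain duplicates or be unsorted, it would need sorting and de-duplicating first.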
