
#### R: Kendall's tau-b and tau-c measures of association

``````
Kendall_tau_a = (P - Q) / (n * (n - 1) / 2)

Kendall_tau_b = (P - Q) / ( (P + Q + Y0) * (P + Q + X0) )^0.5

Kendall_tau_c = (P - Q) * ( (2 * m) / (n^2 * (m - 1)) )
``````

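tau-a: equals the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n-1)/2, where n is the sample size. It makes no adjustment for ties.

tau-b: accounts explicitly for ties, i.e. pairs whose two members share the same value on x or on y. It equals concordant minus discordant pairs divided by the geometric mean of the number of pairs not tied on x and the number of pairs not tied on y. In the formula above, X0 and Y0 count the pairs tied only on x and only on y, so P + Q + Y0 is the number of pairs not tied on x and P + Q + X0 is the number not tied on y.

tau-c: a variant for larger tables that is also suited to non-square tables. It equals concordant minus discordant pairs multiplied by a factor that adjusts for table size, where m is the smaller of the number of rows and the number of columns.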
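To make the pair definitions concrete, here is a minimal sketch that classifies every pair of observations by brute force and plugs the counts into the three formulas. The function name `kendall_from_pairs` and the sample vectors are illustrative assumptions, not part of the original answer, and the approach is only practical for small samples. It assumes each variable takes at least two distinct values, since tau-c divides by m - 1.

```r
# Brute-force pair classification (illustrative helper, fine for small n).
kendall_from_pairs = function(x, y) {
  stopifnot(length(x) == length(y))
  n   = length(x)
  idx = combn(n, 2)                        # every unordered pair of observations
  dx  = sign(x[idx[1, ]] - x[idx[2, ]])
  dy  = sign(y[idx[1, ]] - y[idx[2, ]])
  P   = sum(dx * dy > 0)                   # concordant pairs
  Q   = sum(dx * dy < 0)                   # discordant pairs
  X0  = sum(dx == 0 & dy != 0)             # pairs tied on x only
  Y0  = sum(dy == 0 & dx != 0)             # pairs tied on y only
  m   = min(length(unique(x)), length(unique(y)))
  list(P = P, Q = Q, X0 = X0, Y0 = Y0,
       tau_a = (P - Q) / (n * (n - 1) / 2),
       tau_b = (P - Q) / sqrt((P + Q + X0) * (P + Q + Y0)),
       tau_c = (P - Q) * (2 * m) / (n^2 * (m - 1)))
}

# made-up sample data with ties in both variables
x = c(1, 2, 2, 3, 4, 4, 5)
y = c(2, 1, 3, 3, 5, 4, 4)
res = kendall_from_pairs(x, y)

# base R's cor(..., method = "kendall") handles ties the tau-b way,
# so it should agree with res$tau_b
agrees = isTRUE(all.equal(res$tau_b, cor(x, y, method = "kendall")))
```

Comparing against `cor()` is a cheap way to check the tau-b arithmetic; tau-a and tau-c have no base R counterpart that I am aware of, so they are not cross-checked here.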

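``````
# number of concordant pairs in contingency table t
P = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  # for each cell, count the observations lying below and to the right of it
  sum(t * mapply(function(r, c) sum(t[(r_ndx > r) & (c_ndx > c)]),
                 r = r_ndx, c = c_ndx))
}

# number of discordant pairs in contingency table t
Q = function(t) {
  r_ndx = row(t)
  c_ndx = col(t)
  # for each cell, count the observations lying below and to the left of it
  sum(t * mapply(function(r, c) sum(t[(r_ndx > r) & (c_ndx < c)]),
                 r = r_ndx, c = c_ndx))
}

# the two lines below assume t already holds a contingency table

# sample size (total number of observations)
n = sum(t)

# the lesser of the number of rows or columns
m = min(dim(t))
``````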

These four quantities are all that tau-c needs:

> P
> Q
> m
> n

(plus X0 and Y0 for tau-b)
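The tie counts needed for tau-b can be read off the table margins. The following is a sketch under my own assumptions: `count_pairs` and `kendall_tau_b` are hypothetical helper names, and `row_only` / `col_only` stand for the pairs tied only on the row variable and only on the column variable, which play the roles of X0 and Y0 in the formula.

```r
# Count concordant (default) or discordant pairs in a contingency table.
count_pairs = function(tab, discordant = FALSE) {
  tab = as.matrix(tab)
  ri = row(tab)
  ci = col(tab)
  total = 0
  for (i in seq_len(nrow(tab))) for (j in seq_len(ncol(tab))) {
    # cells in later rows, to the right (concordant) or to the left (discordant)
    other = if (discordant) (ri > i) & (ci < j) else (ri > i) & (ci > j)
    total = total + tab[i, j] * sum(tab[other])
  }
  total
}

kendall_tau_b = function(tab) {
  tab = as.matrix(tab)
  P = count_pairs(tab)
  Q = count_pairs(tab, discordant = TRUE)
  both     = sum(choose(tab, 2))                    # tied on both variables
  row_only = sum(choose(rowSums(tab), 2)) - both    # tied on the row variable only
  col_only = sum(choose(colSums(tab), 2)) - both    # tied on the column variable only
  (P - Q) / sqrt((P + Q + row_only) * (P + Q + col_only))
}

# made-up sample data; table() puts x in rows and y in columns
x = c(1, 2, 2, 3, 4, 4, 5)
y = c(2, 1, 3, 3, 5, 4, 4)
tau_b = kendall_tau_b(table(x, y))

# cross-check against base R, whose Kendall correlation is tau-b
matches_cor = isTRUE(all.equal(tau_b, cor(x, y, method = "kendall")))
```

Because the denominator is symmetric in the two tie counts, it does not matter which variable is placed in the rows.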

``````
kendall_tau_c = function(t) {
  t = as.matrix(t)
  m = min(dim(t))
  n = sum(t)
  # return the value visibly rather than ending on an assignment
  (m * 2 * (P(t) - Q(t))) / ((n^2) * (m - 1))
}
``````

``````
cpa_group = c(4, 2, 4, 3, 2, 2, 3, 2, 1, 5, 5, 1)
revenue_per_customer_group = c(3, 3, 1, 3, 4, 4, 4, 3, 5, 3, 2, 2)
weight = c(1, 3, 3, 2, 2, 4, 0, 4, 3, 0, 1, 1)

dfx = data.frame(CPA=cpa_group, LCV=revenue_per_customer_group, freq=weight)

# reshape data frame so there is 1 row for each event
# (prerequisite step to create the contingency table)
dfx2 = data.frame( lapply(dfx, function(x){rep(x, dfx$freq)}))

# build the table from the expanded data, using the column names defined above
t = xtabs(~ LCV + CPA, dfx2)

kc = kendall_tau_c(t)

# kc is about -0.53 here; tabulating the unexpanded dfx
# (ignoring freq) gives about -0.35
``````
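As an alternative to repeating rows, `xtabs` accepts a count column on the left-hand side of its formula. The sketch below reuses the sample data above and checks that both routes give the same table; the names `t_expanded` and `t_weighted` are illustrative.

```r
cpa_group = c(4, 2, 4, 3, 2, 2, 3, 2, 1, 5, 5, 1)
revenue_per_customer_group = c(3, 3, 1, 3, 4, 4, 4, 3, 5, 3, 2, 2)
weight = c(1, 3, 3, 2, 2, 4, 0, 4, 3, 0, 1, 1)

dfx = data.frame(CPA = cpa_group, LCV = revenue_per_customer_group, freq = weight)

# route 1: repeat each row freq times, then count rows
dfx2 = dfx[rep(seq_len(nrow(dfx)), dfx$freq), ]
t_expanded = xtabs(~ LCV + CPA, dfx2)

# route 2: let xtabs sum the freq column directly
t_weighted = xtabs(freq ~ LCV + CPA, dfx)

same_table = all(dim(t_expanded) == dim(t_weighted)) &&
  all(t_expanded == t_weighted)
```

One caveat with the second route: rows whose frequency is 0 still contribute their category levels, so a level that appears only in zero-frequency rows would add an all-zero row or column and change m. That does not happen with this data, but it is worth checking before computing tau-c.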