# Category: Methods for comparing two sets

Title: Large enough sample size to rank two groups of data reliably according to their means

Authors: Zhesi Shen, Liying Yang, Zengru Di, Jinshan Wu

Abstract:

Often we need to compare two sets of data, say $X$ and $Y$, and often via comparing their means $\mu_X$ and $\mu_Y$. However, when the two sets overlap heavily (for example, $\sqrt{\sigma_X^2+\sigma_Y^2} \gg |\mu_X-\mu_Y|$), ranking the two sets according to their means might not be reliable. Based on the observation that replacing the one-by-one comparison, where we take one sample from each set at a time and compare the two samples, with the $K_X$-by-$K_Y$ comparison, where we take $K_X$ samples $\{x_1, x_2, \ldots, x_{K_X}\}$ from one set and $K_Y$ samples $\{y_1, y_2, \ldots, y_{K_Y}\}$ from the other set at a time and compare the averages $\frac{1}{K_X}\sum_{j=1}^{K_X} x_j$ and $\frac{1}{K_Y}\sum_{j=1}^{K_Y} y_j$, reduces the overlap and thus improves the reliability, we propose a definition of the minimum representative size $\kappa$ of each set for comparing sets by requiring, roughly speaking, $\sqrt{\sigma_{K_X}^2+\sigma_{K_Y}^2} \ll |\mu_X-\mu_Y|$. Applied to journal comparison, this minimum representative size $\kappa$ might be used as a complementary index to the journal impact factor (JIF) to indicate how reliable a comparison of two journals by their JIFs is. More generally, the idea of a minimum representative size can be used whenever two sets of data with overlapping distributions are compared.

Keywords: Journal impact factor; Minimum representative size; Bootstrap sampling
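
The $K_X$-by-$K_Y$ comparison described in the abstract can be illustrated with a small bootstrap experiment. The sketch below is not the authors' procedure: the function names, the choice $K_X = K_Y = K$, the 0.9 agreement threshold, and the number of bootstrap trials are all assumptions made for illustration. It estimates how often the average of $K$ resampled values from the set with the larger mean beats the average of $K$ resampled values from the other set, and reports the smallest $K$ for which that frequency reaches the threshold.

```python
import numpy as np


def win_probability(x, y, k_x, k_y, n_trials=2000, rng=None):
    """Bootstrap estimate of P(mean of k_x draws from x > mean of k_y draws from y).

    Draws are taken with replacement; `rng` may be a seed or a numpy Generator.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Each row is one bootstrap group; average across the row.
    x_means = rng.choice(x, size=(n_trials, k_x), replace=True).mean(axis=1)
    y_means = rng.choice(y, size=(n_trials, k_y), replace=True).mean(axis=1)
    return float(np.mean(x_means > y_means))


def minimum_representative_size(x, y, threshold=0.9, k_max=1000,
                                n_trials=2000, seed=0):
    """Smallest K such that a K-by-K comparison agrees with the ranking by
    means at least `threshold` of the time (illustrative definition only).

    Returns None if no K up to k_max reaches the threshold.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Put the set with the larger mean first so "x wins" means "agrees".
    if x.mean() < y.mean():
        x, y = y, x
    rng = np.random.default_rng(seed)
    for k in range(1, k_max + 1):
        if win_probability(x, y, k, k, n_trials=n_trials, rng=rng) >= threshold:
            return k
    return None
```

For two heavily overlapping sets, `win_probability(x, y, 1, 1)` stays close to 0.5, while larger group sizes push it toward 1. In a journal setting, `x` and `y` would be the per-paper citation counts of the two journals. Because the estimate is a Monte Carlo frequency, the returned size fluctuates slightly with `seed` and `n_trials`.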
