# Category: Methods for comparing two sets

Title: Large enough sample size to rank two groups of data reliably according to their means

Authors: Zhesi Shen, Liying Yang, Zengru Di, Jinshan Wu

Abstract:

Often we need to compare two sets of data, say $X$ and $Y$, and often we do so by comparing their means $\mu_X$ and $\mu_Y$. However, when the two sets overlap heavily (for example, when $\sqrt{\sigma_X^2+\sigma_Y^2} \gg |\mu_X-\mu_Y|$), ranking the two sets according to their means might not be reliable. Replacing the one-by-one comparison, where we take one sample from each set at a time and compare the two samples, with the $K_X$-by-$K_Y$ comparison, where we take $K_X$ samples $\{x_1, x_2, \ldots, x_{K_X}\}$ from one set and $K_Y$ samples $\{y_1, y_2, \ldots, y_{K_Y}\}$ from the other set at a time and compare the averages $\frac{1}{K_X}\sum_{j=1}^{K_X} x_j$ and $\frac{1}{K_Y}\sum_{j=1}^{K_Y} y_j$, reduces the overlap and thus improves the reliability. Based on this observation, we propose a definition of the minimum representative size $\kappa$ of each set for comparing sets by requiring, roughly speaking, $\sqrt{\sigma_{K_X}^2+\sigma_{K_Y}^2} \ll |\mu_X-\mu_Y|$. Applied to journal comparison, this minimum representative size $\kappa$ might be used as a complementary index to the journal impact factor (JIF) to indicate how reliable a comparison of two journals by their JIFs is. More generally, the idea of a minimum representative size can be used whenever two sets of data with overlapping distributions are compared.

Keywords: Journal impact factor; Minimum representative size; Bootstrap sampling
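The $K_X$-by-$K_Y$ comparison described in the abstract can be illustrated with a small bootstrap sketch. This is not the authors' implementation: the function names `win_probability` and `minimum_representative_size`, the use of the same $K$ for both sets, and the 0.9 agreement threshold are all assumptions made for illustration. The sketch relies on the standard fact that, for independent draws, the variance of an average of $K$ samples is $\sigma^2/K$, so averaging more samples narrows the overlap between the two sets.

```python
import random
from statistics import fmean


def win_probability(x, y, k_x, k_y, trials=2000, rng=None):
    """Bootstrap estimate of P(mean of k_x draws from x > mean of k_y draws from y)."""
    if rng is None:
        rng = random.Random(0)
    wins = 0
    for _ in range(trials):
        mean_x = fmean(rng.choices(x, k=k_x))  # draw with replacement
        mean_y = fmean(rng.choices(y, k=k_y))
        if mean_x > mean_y:
            wins += 1
    return wins / trials


def minimum_representative_size(x, y, threshold=0.9, k_max=200, trials=2000, seed=0):
    """Smallest K for which the K-by-K comparison agrees with the ordering of the
    full-sample means in at least `threshold` of the bootstrap trials.

    Assumption (not from the paper): the same K is used for both sets and
    0.9 is the required agreement level. Returns None if no K <= k_max works.
    """
    rng = random.Random(seed)
    if fmean(x) < fmean(y):
        x, y = y, x  # make x the set with the larger mean
    for k in range(1, k_max + 1):
        if win_probability(x, y, k, k, trials, rng) >= threshold:
            return k
    return None


# Synthetic, heavily overlapping data (made up for this example).
data_rng = random.Random(42)
set_x = [data_rng.gauss(1.0, 1.0) for _ in range(500)]
set_y = [data_rng.gauss(0.0, 1.0) for _ in range(500)]

print("one-by-one agreement:", win_probability(set_x, set_y, 1, 1))
print("20-by-20 agreement:  ", win_probability(set_x, set_y, 20, 20))
print("estimated kappa:     ", minimum_representative_size(set_x, set_y))
```

The linear scan over $K$ is chosen for readability. For sets whose means are very close, a larger `k_max` and a coarser search over $K$ would be needed, and the result depends on the chosen threshold.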

## References

1. Shen, Z., Yang, L., Di, Z., & Wu, J. (2019). Large enough sample size to rank two groups of data reliably according to their means. Scientometrics, 118, 653-671. https://doi.org/10.1007/s11192-018-2995-0
2. Shen, Z., Yang, L., & Wu, J. (2018). Lognormal distribution of citation counts is the reason for the relation between Impact Factors and Citation Success Index. Journal of Informetrics, 12(1), 153-157. https://doi.org/10.1016/j.joi.2017.12.007
3. Stringer, M. J., Sales-Pardo, M., & Amaral, L. A. N. (2008). Effectiveness of journal ranking schemes as a tool for locating information. PLoS ONE, 3, e1683.
4. Milojević, S., Radicchi, F., & Bar-Ilan, J. (2017). Citation success index - An intuitive pair-wise journal comparison metric. Journal of Informetrics, 11, 223-231.
5. Liu, X., Păunescu, M., Proteasa, V., & Wu, J. (2018). Minimum representative size in comparing research performance of universities: The case of medicine faculties in Romania. Journal of Data and Information Science, 3(3), 32-42. https://doi.org/10.2478/jdis-2018-0013