Overview
1. The idea behind the Gini coefficient
2. Two implementations of the Gini coefficient
The idea behind the Gini coefficient
The original formula for computing the Gini coefficient:
https://kimberlyfessel.com/mathematics/applications/gini-use-cases/
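For reference, the pairwise form that gini_ori_numpy in the next section computes can be written as the relative mean absolute difference. This is a reconstruction from that code, not a quotation of the linked page; the symbols n (sample size), x_i (individual values) and \bar{x} (their mean) are notation assumed here.

```latex
G = \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2 n^2 \bar{x}}
  = \frac{\sum_{i<j} \lvert x_i - x_j \rvert}{n^2 \bar{x}}
```

The second form counts each unordered pair once, which is why the factor of 2 disappears from the denominator.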
Two implementations of the Gini coefficient
import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate
import scipy.integrate
predictions = [0.9, 0.3, 0.8, 0.75, 0.65, 0.6, 0.78, 0.7, 0.05, 0.4, 0.4, 0.05, 0.5, 0.1, 0.1]
actual = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
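def gini_ori_numpy(x):
    # Sum |x_i - x_j| over every unordered pair (i < j).
    total = 0
    for i, xi in enumerate(x[:-1], 1):
        total += np.sum(np.abs(xi - x[i:]))
    # Each pair is counted once, so divide by n^2 * mean rather than 2 * n^2 * mean.
    return total / (len(x)**2 * np.mean(x))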
incomes = np.array([50, 50, 70, 70, 70, 90, 150, 150, 150, 150])
gini_ori_numpy(incomes)
0.226
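The explicit loop above is easy to read, but it can also be expressed without a Python loop. The sketch below shows two alternatives: a broadcasting version that materializes all n x n differences, and a sorted version that only needs O(n log n) time. The function names gini_pairwise and gini_sorted are made up for this illustration and do not come from the referenced articles.

```python
import numpy as np

def gini_pairwise(x):
    """Relative mean absolute difference via broadcasting (O(n^2) memory)."""
    x = np.asarray(x, dtype=float)
    # Full n x n matrix of |x_i - x_j|; every unordered pair appears twice.
    diffs = np.abs(x[:, None] - x[None, :])
    return diffs.sum() / (2 * len(x) ** 2 * x.mean())

def gini_sorted(x):
    """Same quantity from sorted values, avoiding the n x n matrix."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    # For ascending x, the sum over i < j of (x_j - x_i) equals sum_k (2k - n - 1) * x_k.
    return np.sum((2 * ranks - n - 1) * x) / (n ** 2 * x.mean())

incomes = np.array([50, 50, 70, 70, 70, 90, 150, 150, 150, 150])
print(gini_pairwise(incomes), gini_sorted(incomes))  # both approximately 0.226
```

For small arrays any of the three versions will do; the sorted form is probably the one to reach for when the input is large.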
np.c_ concatenates vectors column-wise
np.c_[np.array([1,2,3]), np.array([4,5,6])]
array([[1, 4],
       [2, 5],
       [3, 6]])
np.cumsum computes a cumulative sum along the given axis
a = np.array([[1,2,3], [4,5,6]])
np.cumsum(a,axis=0) # sum over rows for each of the 3 columns
array([[1, 2, 3],
       [5, 7, 9]])
np.cumsum(a,axis=1) # sum over columns for each of the 2 rows
array([[ 1,  3,  6],
       [ 4,  9, 15]])
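To see why np.cumsum matters here, it may help to build a Lorenz curve with it: sort the values, take the cumulative share of the total, and measure how far that curve sits below the diagonal. The sketch below is one way to do this with plain NumPy; lorenz_curve and gini_from_lorenz are hypothetical helper names, and the trapezoid rule is written out by hand rather than relying on a library integrator.

```python
import numpy as np

def lorenz_curve(x):
    """Cumulative share of the total held by the poorest k items, k = 0..n."""
    x = np.sort(np.asarray(x, dtype=float))
    cum_share = np.cumsum(x) / x.sum()
    # Prepend the origin so the curve runs from (0, 0) to (1, 1).
    return np.concatenate(([0.0], cum_share))

def gini_from_lorenz(x):
    """Gini as 1 - 2 * (area under the Lorenz curve), trapezoid rule."""
    lorenz = lorenz_curve(x)
    n = len(lorenz) - 1
    # Population shares are evenly spaced with step 1/n.
    area = (lorenz[:-1] + lorenz[1:]).sum() / (2 * n)
    return 1 - 2 * area

incomes = np.array([50, 50, 70, 70, 70, 90, 150, 150, 150, 150])
print(gini_from_lorenz(incomes))  # approximately 0.226
```

On this income example the result agrees with the pairwise formula, which is expected for discrete data integrated with the trapezoid rule.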
Next is the Gini implementation
def gini(actual, pred):
    assert len(actual) == len(pred)
    # Columns: actual value, prediction, original index.
    # np.float was removed in NumPy 1.24; the builtin float is used instead.
    # The array is named data so it does not shadow the builtin all().
    data = np.asarray(np.c_[actual, pred, np.arange(len(actual))], dtype=float)
    # np.lexsort((b, a)) sorts by a first, then by b, and returns the
    # indices of that ordering:
    #   a = [1,5,1,4,3,4,4]  # First column
    #   b = [9,4,0,4,0,2,1]  # Second column
    #   ind = np.lexsort((b, a))  # Sort by a, then by b
    #   array([2, 0, 4, 6, 5, 3, 1])
    #   [(a[i], b[i]) for i in ind]
    #   [(1, 0), (1, 9), (3, 0), (4, 1), (4, 2), (4, 4), (5, 4)]
    # Here we sort by prediction (descending), then by index to break ties.
    data = data[np.lexsort((data[:, 2], -1 * data[:, 1]))]
    totalLosses = data[:, 0].sum()
    giniSum = data[:, 0].cumsum().sum() / totalLosses
    giniSum -= (len(actual) + 1) / 2.
    return giniSum / len(actual)

def gini_normalized(actual, pred):
    # Divide by the Gini of a perfect ordering (sorting by the actual values).
    return gini(actual, pred) / gini(actual, actual)
Gini of two variables (actual values and predictions)
gini_predictions = gini(actual, predictions)
gini_max = gini(actual, actual)
ngini = gini_normalized(actual, predictions)
print('Gini: %.3f, Max. Gini: %.3f, Normalized Gini: %.3f' % (gini_predictions, gini_max, ngini))
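As a sanity check, the normalized Gini of a binary target is commonly related to ROC AUC by normalized Gini = 2 * AUC - 1. The sketch below re-implements gini compactly and compares it against a brute-force pairwise AUC on the same example data. auc_pairwise is a hypothetical helper written for this check, and the identity is only expected to hold exactly when no positive and negative example share the same prediction, since the index tie-break in gini treats ties differently from AUC.

```python
import numpy as np

def gini(actual, pred):
    # Columns: actual, prediction, original index (used to break ties).
    data = np.c_[actual, pred, np.arange(len(actual))].astype(float)
    data = data[np.lexsort((data[:, 2], -data[:, 1]))]
    gini_sum = data[:, 0].cumsum().sum() / data[:, 0].sum()
    gini_sum -= (len(actual) + 1) / 2.0
    return gini_sum / len(actual)

def gini_normalized(actual, pred):
    return gini(actual, pred) / gini(actual, actual)

def auc_pairwise(actual, pred):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    actual = np.asarray(actual)
    pred = np.asarray(pred, dtype=float)
    pos = pred[actual == 1]
    neg = pred[actual == 0]
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size

predictions = [0.9, 0.3, 0.8, 0.75, 0.65, 0.6, 0.78, 0.7, 0.05, 0.4, 0.4, 0.05, 0.5, 0.1, 0.1]
actual = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

ngini = gini_normalized(actual, predictions)
auc = auc_pairwise(actual, predictions)
print('Normalized Gini: %.3f, 2*AUC - 1: %.3f' % (ngini, 2 * auc - 1))
```

In this data set the only tied predictions belong to negative examples, so the two numbers should coincide.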
Reference
[1]. https://www.statology.org/gini-coefficient-python/.
[2]. https://www.kaggle.com/code/batzner/gini-coefficient-an-intuitive-explanation/notebook
[3]. https://blog.csdn.net/qq_34418352/article/details/109514966
[4]. https://www.kilians.net/post/gini-coefficient-intuitive-explanation/#:~:text=The%20Normalized%20Gini%20coefficient%20is,could%20give%20you%20a%20better (a better article on Gini)
[5]. https://www.kaggle.com/code/tezdhar/faster-gini-calculation/notebook (a faster normalized Gini implementation)
Please credit the source when reposting: goldandrabbit.github.io