Gini Coefficient

Ads_RecSys

Created At : 2023-08-20 20:08

Overview

1. The idea behind the Gini coefficient
2. Two implementations of the Gini coefficient

The idea behind the Gini coefficient

Formula for the original Gini coefficient:
https://kimberlyfessel.com/mathematics/applications/gini-use-cases/
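
For reference, one common way to write this formula is as half the relative mean absolute difference. The symbols are assumptions introduced here for illustration and are not taken from the linked page: $x_1, \dots, x_n$ are the observed values (e.g. incomes) and $\bar{x}$ is their mean.

```latex
G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2 n^{2} \bar{x}}
  = \frac{\sum_{i<j} \lvert x_i - x_j \rvert}{n^{2} \bar{x}}
```

The second form counts each unordered pair once, which is why the factor 2 drops out. G is 0 when all values are equal and reaches (n - 1) / n when a single observation holds the entire total.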

Two implementations of the Gini coefficient

import numpy as np
import matplotlib.pyplot as plt
import scipy.interpolate
import scipy.integrate

predictions = [0.9, 0.3, 0.8, 0.75, 0.65, 0.6, 0.78, 0.7, 0.05, 0.4, 0.4, 0.05, 0.5, 0.1, 0.1]
actual      = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

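def gini_ori_numpy(x):
  # Mean absolute difference form: sum |xi - xj| over each unordered pair once
  total = 0
  for i, xi in enumerate(x[:-1], 1):
    # x[i:] holds the elements after xi, so no pair is counted twice
    total += np.sum(np.abs(xi - x[i:]))
  # Counting each pair once cancels the factor 2 in the usual 2 * n^2 * mean denominator
  return total / (len(x)**2 * np.mean(x))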

incomes = np.array([50, 50, 70, 70, 70, 90, 150, 150, 150, 150])
gini_ori_numpy(incomes)
0.226
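
The double loop above costs O(n^2) time. As a rough sketch (the names `gini_pairwise` and `gini_sorted` are hypothetical and not from the original post), the same quantity can be computed with broadcasting, or in O(n log n) by sorting first and using the rank-weighted identity G = sum((2i - n - 1) * x_(i)) / (n * sum(x)), where x_(i) is the i-th smallest value.

```python
import numpy as np

def gini_pairwise(x):
    """Broadcasted mean-absolute-difference form (O(n^2) memory)."""
    x = np.asarray(x, dtype=float)
    # |x_i - x_j| for every ordered pair; each unordered pair appears twice
    diffs = np.abs(x[:, None] - x[None, :])
    return diffs.sum() / (2 * len(x) ** 2 * x.mean())

def gini_sorted(x):
    """Rank-weighted form on ascending-sorted values (O(n log n))."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    ranks = np.arange(1, n + 1)
    return np.dot(2 * ranks - n - 1, x) / (n * x.sum())

incomes = np.array([50, 50, 70, 70, 70, 90, 150, 150, 150, 150])
print(round(gini_pairwise(incomes), 3), round(gini_sorted(incomes), 3))  # 0.226 0.226
```

Both variants agree with `gini_ori_numpy` on the income example. The sorted form is usually the better choice for large inputs because it avoids materializing the n-by-n difference matrix.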

np.c_ concatenates vectors column-wise

np.c_[np.array([1,2,3]), np.array([4,5,6])]
array([[1, 4],
       [2, 5],
       [3, 6]])

np.cumsum computes the cumulative sum along a given axis

a = np.array([[1,2,3], [4,5,6]])
np.cumsum(a,axis=0)      # sum over rows for each of the 3 columns
array([[1, 2, 3],
       [5, 7, 9]])
np.cumsum(a,axis=1)      # sum over columns for each of the 2 rows
array([[ 1,  3,  6],
       [ 4,  9, 15]])
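
One place cumulative sums appear in the Gini calculation is the Lorenz curve, i.e. the cumulative share of the total held by the k smallest observations. The sketch below is illustrative only (`gini_from_lorenz` is a hypothetical name, not from the original post). It relies on the discrete identity G = (n + 1 - 2 * sum(L_k)) / n, where L_k is the cumulative share after sorting in ascending order.

```python
import numpy as np

def gini_from_lorenz(x):
    x = np.sort(np.asarray(x, dtype=float))   # ascending: smallest first
    lorenz = np.cumsum(x) / x.sum()           # cumulative share of the total
    n = len(x)
    return (n + 1 - 2 * lorenz.sum()) / n

incomes = np.array([50, 50, 70, 70, 70, 90, 150, 150, 150, 150])
print(round(gini_from_lorenz(incomes), 3))  # 0.226
```

This matches the pairwise result for the same incomes, which suggests the cumulative-sum view and the absolute-difference view describe the same quantity.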

Next, the Gini implementation

def gini(actual, pred):
  assert (len(actual) == len(pred))
  # Columns: actual value, predicted value, original index
  data = np.asarray(np.c_[actual, pred, np.arange(len(actual))], dtype=float)
  """
  np.lexsort((b,a)) sorts by a first, then by b to break ties, and returns the indices of that ordering
  a = [1,5,1,4,3,4,4] # First column
  b = [9,4,0,4,0,2,1] # Second column
  ind = np.lexsort((b,a)) # Sort by a, then by b
  array([2, 0, 4, 6, 5, 3, 1])
  [(a[i],b[i]) for i in ind]
  [(1, 0), (1, 9), (3, 0), (4, 1), (4, 2), (4, 4), (5, 4)]

  Here we sort by predicted value in descending order, then by index to break ties
  """
  data = data[np.lexsort((data[:, 2], -1 * data[:, 1]))]
  totalLosses = data[:, 0].sum()
  # Sum of the cumulative share of actual values captured at each rank
  giniSum = data[:, 0].cumsum().sum() / totalLosses

  # Subtract the value that an uninformative (uniform) ordering would produce
  giniSum -= (len(actual) + 1) / 2.
  return giniSum / len(actual)

def gini_normalized(actual, pred):
  return gini(actual, pred) / gini(actual, actual)
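
Written out, the two functions above compute the following. The notation is introduced here for illustration and is not from the original post: $a_{(1)}, \dots, a_{(n)}$ are the actual values ordered by descending prediction (ties broken by original index), and $C_k$ is their running total.

```latex
C_k = \sum_{i=1}^{k} a_{(i)}, \qquad
G(a, p) = \frac{1}{n}\left(\sum_{k=1}^{n}\frac{C_k}{C_n} - \frac{n+1}{2}\right), \qquad
G_{\text{norm}}(a, p) = \frac{G(a, p)}{G(a, a)}
```

The subtracted term (n + 1) / 2 is the value the sum takes when the actual values are spread evenly across the ordering, so G measures how far the predictions pull large actual values toward the front. G(a, a) is the score of a perfect ordering, so the normalized value is 1 when the predictions rank the actual values perfectly.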

Gini between two variables (actual vs. predictions)

gini_predictions = gini(actual, predictions)
gini_max = gini(actual, actual)
ngini= gini_normalized(actual, predictions)
print('Gini: %.3f, Max. Gini: %.3f, Normalized Gini: %.3f' % (gini_predictions, gini_max, ngini))
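
For the sample data above, this prints `Gini: 0.189, Max. Gini: 0.300, Normalized Gini: 0.630`. As a sanity check, for binary labels with no tied scores between a positive and a negative, the normalized Gini should equal 2 * AUC - 1. The sketch below is self-contained and hypothetical: `gini_score`, `normalized_gini`, and `pairwise_auc` are illustrative names, and the stable `argsort` is a swapped-in equivalent of the `lexsort` call, not the original post's code.

```python
import numpy as np

def gini_score(actual, pred):
    actual = np.asarray(actual, dtype=float)
    pred = np.asarray(pred, dtype=float)
    # Stable sort on -pred: descending prediction, ties keep original order
    order = np.argsort(-pred, kind="stable")
    n = len(actual)
    cum_share = np.cumsum(actual[order]) / actual.sum()
    return (cum_share.sum() - (n + 1) / 2) / n

def normalized_gini(actual, pred):
    return gini_score(actual, pred) / gini_score(actual, actual)

def pairwise_auc(actual, pred):
    actual = np.asarray(actual)
    pred = np.asarray(pred, dtype=float)
    pos, neg = pred[actual == 1], pred[actual == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count 0.5
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

predictions = [0.9, 0.3, 0.8, 0.75, 0.65, 0.6, 0.78, 0.7, 0.05, 0.4, 0.4, 0.05, 0.5, 0.1, 0.1]
actual      = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0]

ngini = normalized_gini(actual, predictions)
auc = pairwise_auc(actual, predictions)
print('Normalized Gini: %.3f, 2*AUC - 1: %.3f' % (ngini, 2 * auc - 1))
# Normalized Gini: 0.630, 2*AUC - 1: 0.630
```

The pairwise AUC here is O(P * N) in the number of positives and negatives, which is fine for a small check but not meant for large datasets.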

Reference

[1]. https://www.statology.org/gini-coefficient-python/
[2]. https://www.kaggle.com/code/batzner/gini-coefficient-an-intuitive-explanation/notebook
[3]. https://blog.csdn.net/qq_34418352/article/details/109514966
[4]. A more thorough article on Gini: https://www.kilians.net/post/gini-coefficient-intuitive-explanation/#:~:text=The%20Normalized%20Gini%20coefficient%20is,could%20give%20you%20a%20better
[5]. A faster normalized Gini implementation: https://www.kaggle.com/code/tezdhar/faster-gini-calculation/notebook

Please credit the source when reposting: goldandrabbit.github.io