Motivation
1. Proposes an upgraded version of DCN V2, GDCN, which introduces an information gating component that adaptively learns the importance of the previous layer's cross results.
GDCN
1. The core of GDCN:
x_{l+1} = x_0 \odot (W_l \cdot x_l + b_l) \odot sigmoid(W_g \cdot x_l) + x_l
Intuitively,
1. GDCN adds an information gating component on top of the DCN structure and adaptively learns the importance of the previous layer's cross results: we expect the gate to amplify more important features and damp the influence of less important ones; as the number of cross layers grows, each layer's information gate filters the next-order cross features and effectively controls the information flow (see the NumPy sketch below).
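Before the full TensorFlow code, a minimal NumPy sketch of a single gated cross layer may help; the dimension d = 4, the random weights, and the seed are illustrative assumptions, not the paper's setup:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

d = 4                                     # embedding dimension (assumed)
rng = np.random.default_rng(0)
x_0 = rng.normal(size=(1, d))             # input to the cross network
x_l = x_0                                 # current layer state
W_l = rng.normal(size=(d, d)) / np.sqrt(d)
W_g = rng.normal(size=(d, d)) / np.sqrt(d)
b_l = np.zeros(d)

cross = x_0 * (x_l @ W_l + b_l)           # DCN-V2 cross term
gate = sigmoid(x_l @ W_g)                 # information gate, each entry in (0, 1)
x_next = cross * gate + x_l               # gated cross term + residual

print(gate)  # per-dimension weight the gate assigns to the cross term

Dimensions where the gate is close to 1 keep the cross signal almost intact, while dimensions close to 0 fall back to the residual x_l.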
Code
My own TensorFlow implementation:
"""
tensorlow online version
"""
def dcn_stack_net(self, input_layer, cross_layer_num, output_size):
"""
x_l = x_0 \odot (W \cdot x_l + b) + x_l
\odot 代表 element-wise 的tensor 乘法, 通常 tf.multiply() 实现
"""
name = 'dcn'
act_fun = tf.nn.tanh
with tf.variable_scope("dcn_stack_net"):
x_0 = input_layer
x_l = x_0
input_size = input_layer.shape1.value
for i in range(cross_layer_num):
w = tf.get_variable(shape=[input_size, input_size], name=f"{name}_kernerl_{i+1}", initializer=tf.random_normal_initializer(stddev=1.0 / math.sqrt(float(input_size))), trainable=True)
b = tf.get_variable(shape=[input_size], name=f"{name}_bias_{i+1}", initializer=tf.zeros_initializer, trainable=True)
dot_ = tf.multiply(x_0, tf.add(tf.matmul(x_l, w), b))
x_l = tf.add(dot_, x_l)
# x_l = act_fun(x_l)
output_w = tf.get_variable(shape=[input_size, output_size], name=f"{name}_output_w")
output_layer = tf.matmul(x_l, output_w)
return output_layer
def gdcn_stack_net(self, input_layer, cross_layer_num, output_size):
    """
    formulation: c_{l+1} = c_0 \odot (W_l \cdot c_l + b_l) \odot sigmoid(W_g \cdot c_l) + c_l
    code:        x_{l+1} = x_0 \odot (W_l \cdot x_l + b_l) \odot sigmoid(W_g \cdot x_l) + x_l
    """
    name = 'gdcn'
    with tf.variable_scope("gdcn_stack_net"):
        x_0 = input_layer
        x_l = x_0
        input_size = input_layer.shape[1].value
        for i in range(cross_layer_num):
            wl = tf.get_variable(shape=[input_size, input_size], name=f"{name}_wl_{i+1}", initializer=tf.random_normal_initializer(stddev=1.0 / math.sqrt(float(input_size))), trainable=True)
            wg = tf.get_variable(shape=[input_size, input_size], name=f"{name}_wg_{i+1}", initializer=tf.random_normal_initializer(stddev=1.0 / math.sqrt(float(input_size))), trainable=True)
            b = tf.get_variable(shape=[input_size], name=f"{name}_b_{i+1}", initializer=tf.zeros_initializer(), trainable=True)
            # cross term, same as DCN-V2
            dot1 = tf.multiply(x_0, tf.add(tf.matmul(x_l, wl), b))
            # information gate: per-dimension weight in (0, 1)
            dot2 = tf.nn.sigmoid(tf.matmul(x_l, wg))
            # gated cross term plus the residual connection
            x_l = tf.add(tf.multiply(dot1, dot2), x_l)
        output_w = tf.get_variable(shape=[input_size, output_size], name=f"{name}_output_w")
        output_layer = tf.matmul(x_l, output_w)
        return output_layer
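A hypothetical usage sketch (the instance name `model`, the 64-dim placeholder, and the hyperparameters are assumptions, not from the original code):

# Assumes both methods live on some model class instance `model`
x = tf.placeholder(tf.float32, shape=[None, 64])  # batch of concatenated embeddings
dcn_out = model.dcn_stack_net(x, cross_layer_num=3, output_size=32)    # [None, 32]
gdcn_out = model.gdcn_stack_net(x, cross_layer_num=3, output_size=32)  # [None, 32]

Per cross layer, GDCN adds exactly one extra input_size x input_size gate matrix wg on top of DCN-V2's w and b.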
Reference
[1] Towards Deeper, Lighter and Interpretable Cross Network for CTR Prediction.
Please credit the source when reposting: goldandrabbit.github.io