Overview
1. Feature Engineering Basics
2. Embedding
3. Training & Optimization Tricks
4. Coding and Analysis Principles
Feature Engineering Basics
Discretizer: bin continuous data into intervals (see the sketch after this list).
1. Uniform – all bins in each feature have identical widths.
2. Quantile – all bins in each feature contain the same number of points.
3. KMeans – values in each bin share the same nearest center of a 1D k-means clustering.
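All three strategies are implemented in scikit-learn's KBinsDiscretizer; a minimal sketch:

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

X = np.random.exponential(scale=2.0, size=(1000, 1))  # a skewed continuous feature

for strategy in ("uniform", "quantile", "kmeans"):
    disc = KBinsDiscretizer(n_bins=5, encode="ordinal", strategy=strategy)
    X_binned = disc.fit_transform(X)
    # bin_edges_ shows how each strategy cuts the same feature differently
    print(strategy, disc.bin_edges_[0].round(2))
```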
Binarizer: encode categorical features (sketch below).
OneHotEncoder, OrdinalEncoder, feature hashing (the hashing trick), etc.
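A quick comparison of the three encoders in scikit-learn:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder
from sklearn.feature_extraction import FeatureHasher

df = pd.DataFrame({"city": ["tokyo", "paris", "tokyo", "london"]})

onehot = OneHotEncoder(handle_unknown="ignore").fit_transform(df[["city"]])
ordinal = OrdinalEncoder().fit_transform(df[["city"]])

# Hashing trick: fixed output width, no fitted vocabulary, collisions possible.
hasher = FeatureHasher(n_features=8, input_type="string")
hashed = hasher.transform(df["city"].map(lambda v: [v]))
```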
Scaler
Linear Scaler
- Min-Max, MaxAbs, Normalize, Robust (clip with a quantile range, e.g. the IQR).
Non-Linear Scaler
- Gaussian-like scalers: Box-Cox, Yeo-Johnson, RankGauss (sketch below).
- Distribution-like scalers: use the inverse CDF of a target distribution to transform cdf(x).
- Non-linear transforms such as Box-Cox or Yeo-Johnson may also be useful on the target y: they add diversity for ensembling, sometimes with better performance.
- Linear transforms such as standard scaling usually work better for X.
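A minimal scikit-learn sketch of these scalers; RankGauss is approximated here with QuantileTransformer, which rank-transforms and maps through the normal inverse CDF:

```python
import numpy as np
from sklearn.preprocessing import (
    MinMaxScaler, RobustScaler, PowerTransformer, QuantileTransformer
)

X = np.random.lognormal(size=(1000, 1))  # a heavy-tailed feature

X_minmax = MinMaxScaler().fit_transform(X)
X_robust = RobustScaler(quantile_range=(5.0, 95.0)).fit_transform(X)

# Gaussian-like scalers
X_yj = PowerTransformer(method="yeo-johnson").fit_transform(X)
X_rankgauss = QuantileTransformer(output_distribution="normal").fit_transform(X)
```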
Embedding
Learn a lookup table as the category representation.
Category Embedding / Entity Embedding (PyTorch sketch below)
Numerical Embedding
- Bin numerical data into intervals.
- Treat each bin as a category and embed it.
- Add vicinal (neighboring-bin) information.
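A minimal entity-embedding sketch in PyTorch; the dimensions are illustrative assumptions:

```python
import torch
import torch.nn as nn

class EntityEmbedding(nn.Module):
    def __init__(self, n_categories: int, emb_dim: int = 8):
        super().__init__()
        # Lookup table: one learnable vector per category (or numeric bin).
        self.emb = nn.Embedding(n_categories, emb_dim)

    def forward(self, cat_ids: torch.Tensor) -> torch.Tensor:
        return self.emb(cat_ids)

# A binned numerical feature is used exactly like a categorical one:
bin_ids = torch.tensor([0, 3, 3, 9])                 # e.g. discretizer output
vectors = EntityEmbedding(n_categories=10)(bin_ids)  # shape (4, 8)
```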
Key-Value Based (pandas sketch below)
- df.groupby(B)[A].agg(func); func can be anything that maps a subgroup of A to a scalar, F(subgroup(A)) -> x.
- Common aggregate funcs: mean, std, skew, kurtosis, entropy, min, max, median, frequency, size, etc.
- Residuals and weights of least-squares fits work well on sequential values.
- Higher-order interactions: aggregate an aggregate, i.e. nested groupby().
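A pandas sketch of key-value aggregation features, including a custom least-squares func (column names are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user": ["a", "a", "b", "b", "b"],
    "amount": [10.0, 30.0, 5.0, 7.0, 9.0],
})

# One statistic per key, merged back onto each row as a feature.
agg = df.groupby("user")["amount"].agg(["mean", "std", "max", "size"])
agg.columns = [f"user_amount_{c}" for c in agg.columns]
df = df.merge(agg.reset_index(), on="user", how="left")

# Custom func: slope of a least-squares fit over each key's value sequence.
slope = df.groupby("user")["amount"].agg(
    lambda s: np.polyfit(np.arange(len(s)), s.to_numpy(), 1)[0]
)
```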
Latent Representation
Use decomposition algorithms to extract latent features. They tend to work better in neural networks than in GBDT. Simply fit all handcrafted features with different decomposition algorithms, as sketched below.
- Topic models (LDA, SVD, NMF, etc.)
- Manifold learning (t-SNE)
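A minimal sketch stacking several decompositions over a handcrafted-feature block:

```python
import numpy as np
from sklearn.decomposition import NMF, TruncatedSVD, LatentDirichletAllocation

X = np.abs(np.random.randn(500, 50))  # stand-in for non-negative handcrafted features

svd_feats = TruncatedSVD(n_components=10).fit_transform(X)
nmf_feats = NMF(n_components=10, init="nndsvda", max_iter=500).fit_transform(X)
lda_feats = LatentDirichletAllocation(n_components=10).fit_transform(X)

# Concatenate the latent blocks as extra inputs (usually for NN models).
X_latent = np.hstack([svd_feats, nmf_feats, lda_feats])
```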
Symbolic Learning
Use a set of symbolic operators (+ - * / cos sin tan ^ log min max neg …) as the function set to represent a relationship (sketch below).
- Totally different interaction features.
- Automated feature learning.
- Works in many natural-science settings, such as astronomical, geological, and biological tasks.
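One common implementation is genetic programming, e.g. gplearn's SymbolicTransformer; a sketch with illustrative hyperparameters:

```python
import numpy as np
from gplearn.genetic import SymbolicTransformer

X = np.random.randn(500, 5)
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2])  # hidden symbolic relationship

st = SymbolicTransformer(
    generations=10,
    population_size=1000,
    function_set=("add", "sub", "mul", "div", "sin", "cos", "log", "min", "max", "neg"),
    n_components=5,   # number of evolved features to keep
    random_state=0,
)
X_sym = st.fit_transform(X, y)  # evolved interaction features
```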
Auto Encoders
- Learn to reconstruct clean samples from corrupted ones (denoising; sketch below).
- Regularize the latent space to match a prior (usually a multivariate normal distribution, as in a VAE).
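A minimal denoising autoencoder sketch in PyTorch (layer sizes and noise level are illustrative):

```python
import torch
import torch.nn as nn

class DenoisingAE(nn.Module):
    def __init__(self, n_features: int, latent_dim: int = 16):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(),
                                     nn.Linear(64, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                     nn.Linear(64, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = DenoisingAE(n_features=30)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(256, 30)                  # clean batch
x_noisy = x + 0.1 * torch.randn_like(x)   # corrupted input
loss = nn.functional.mse_loss(model(x_noisy), x)  # reconstruct the clean x
loss.backward()
opt.step()
# After training, model.encoder(x) serves as a latent feature extractor.
```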
Graphs
- FeatureA – FeatureB – aggregate: almost anything can be turned into a graph.
- Maximize the likelihood of preserving the network neighborhoods of nodes.
- Unsupervised learning methods: DeepWalk, node2vec (sketch below).
- Semi-supervised or supervised learning: GCN.
- DeepWalk and node2vec work in many tasks, such as CTR prediction and fraud detection.
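A minimal DeepWalk-style sketch (uniform random walks fed to skip-gram Word2Vec; walk counts and dimensions are illustrative):

```python
import random
import networkx as nx
from gensim.models import Word2Vec

G = nx.karate_club_graph()  # stand-in for a graph built from feature co-occurrence

def random_walk(g, start, length=10):
    walk = [start]
    for _ in range(length - 1):
        neighbors = list(g.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(n) for n in walk]

walks = [random_walk(G, n) for n in G.nodes() for _ in range(10)]
# Skip-gram over walks maximizes the likelihood of each node's walk neighborhood.
w2v = Word2Vec(walks, vector_size=32, window=5, sg=1, min_count=1)
node_vec = w2v.wv["0"]  # embedding of node 0, usable as a downstream feature
```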
End2End Model
- Online learning models: FTRL, mini-batch FM, etc.
- DeepFM, DeepFFM, xDeepFM, etc.
- Attention-based interaction models.
Target Mean Encoding
- Leave-one-out / K-fold / historical sliding-window schemes to avoid overfitting (K-fold sketch below).
- Posterior-prior (smoothing) scheme.
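A K-fold target-encoding sketch with smoothing toward the global prior (prior_weight is an illustrative hyperparameter):

```python
import pandas as pd
from sklearn.model_selection import KFold

def kfold_target_encode(df, cat_col, target_col, n_splits=5, prior_weight=10.0):
    """Out-of-fold target mean encoding, smoothed toward the global prior."""
    encoded = pd.Series(index=df.index, dtype=float)
    prior = df[target_col].mean()
    for tr_idx, val_idx in KFold(n_splits, shuffle=True, random_state=0).split(df):
        stats = df.iloc[tr_idx].groupby(cat_col)[target_col].agg(["mean", "size"])
        # Small categories are pulled toward the prior to reduce overfitting.
        smooth = (stats["mean"] * stats["size"] + prior * prior_weight) \
                 / (stats["size"] + prior_weight)
        encoded.iloc[val_idx] = df.iloc[val_idx][cat_col].map(smooth).fillna(prior).to_numpy()
    return encoded

df = pd.DataFrame({"cat": list("aabbbcc"), "y": [1, 0, 1, 1, 0, 0, 1]})
df["cat_te"] = kfold_target_encode(df, "cat", "y")
```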
Model-Based Target Encoding
- Use a base model to predict the target with k-folds; use the out-of-fold probabilities as features (sketch below).
- Use a base model to predict residuals (target - probs).
- Sometimes using level-N stacking probabilities works better.
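A sketch of out-of-fold base-model probabilities used as features (the base model here is an arbitrary illustrative choice):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Out-of-fold probabilities from a base model, k-folded to avoid leakage.
oof_prob = cross_val_predict(
    LogisticRegression(max_iter=1000), X, y, cv=5, method="predict_proba"
)[:, 1]

residual = y - oof_prob                     # target for a residual model
X_stacked = np.column_stack([X, oof_prob])  # probs appended as a feature
```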
Training & Optimization Tricks
Common Tricks
- Learning rate schedule with linear scaling: for SGD, a reference lr of 0.1 at batch size 256 scales as lr = 0.1 × batch_size / 256 (sketch below).
- Label smoothing.
- No bias decay.
- Initialize the bias with the prior distribution.
- Focal loss is a good measure for estimating a model's capability.
- Combining multiple losses lets the model converge rapidly toward better performance.
- Use a standard scaler transform first.
- Sometimes useful, but it all depends on the data.
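A PyTorch sketch combining several of these tricks: linear lr scaling, no bias decay, warm-up plus cosine decay, and label smoothing (epoch counts and hyperparameters are illustrative):

```python
import math
import torch

batch_size = 1024
base_lr = 0.1 * batch_size / 256   # linear scaling rule

model = torch.nn.Linear(10, 2)
# No bias decay: weight decay only on weight tensors, not biases.
params = [
    {"params": [p for n, p in model.named_parameters() if "bias" not in n],
     "weight_decay": 1e-4},
    {"params": [p for n, p in model.named_parameters() if "bias" in n],
     "weight_decay": 0.0},
]
opt = torch.optim.SGD(params, lr=base_lr, momentum=0.9)

warmup, total = 5, 100   # epochs
def lr_lambda(epoch):
    if epoch < warmup:
        return (epoch + 1) / warmup              # linear warm-up
    t = (epoch - warmup) / (total - warmup)
    return 0.5 * (1 + math.cos(math.pi * t))     # cosine decay

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
loss_fn = torch.nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing
```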
NLP Task Training Tricks
- Diverse pre-trained embeddings.
- More than 90% of a model's complexity can reside in the embedding layer.
- Contextual-level embeddings always help.
- But now we have BERT.
- Dynamic padding boosts training speed and model performance (sketch below).
- Add test OOV vocabulary to the embedding matrix.
- Replace OOV tokens with "something".
- AdamW really works in many NLP tasks.
- Spatial dropout and shuffle noise in classification tasks.
- Translation augmentation: En-Fr, En-Fr-En (back-translation).
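A minimal dynamic-padding collate function for PyTorch DataLoaders; each batch is padded only to its own longest sequence:

```python
import torch
from torch.nn.utils.rnn import pad_sequence

def dynamic_pad_collate(batch):
    """Pad each batch to its own max length instead of a global max length."""
    seqs, labels = zip(*batch)
    padded = pad_sequence([torch.as_tensor(s) for s in seqs],
                          batch_first=True, padding_value=0)
    return padded, torch.as_tensor(labels)

batch = [([5, 2, 9], 1), ([7, 1], 0), ([3, 3, 3, 3], 1)]
x, y = dynamic_pad_collate(batch)  # x.shape == (3, 4): padded per batch
# Use as DataLoader(dataset, batch_size=..., collate_fn=dynamic_pad_collate).
```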
CV Task Training Tricks
- ResNet is always a good baseline model for experiments.
- Warm-up learning rate + cosine learning rate decay.
- Manual learning rate schedules.
- Accumulate gradients across batches with different losses.
- Use more shape-relevant augmentation.
- CNNs tend to learn texture information rather than shape information.
- Auto augmentation (sketch below).
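An illustrative torchvision pipeline with standard crops/flips plus the learned AutoAugment policy:

```python
import torchvision.transforms as T

# ImageNet-style training transforms; AutoAugment applies a learned policy.
train_tf = T.Compose([
    T.RandomResizedCrop(224),
    T.RandomHorizontalFlip(),
    T.AutoAugment(T.AutoAugmentPolicy.IMAGENET),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
```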
Pseudo Learning
- Filter hard examples in the training dataset using OOF predictions.
- Global variance information leakage.
- Add the most confident test predictions to the training dataset.
Real Tricks
- Test distribution leakage, data extension.
- Simple mini-batch training models.
- Use a single model to label the test dataset.
- Retrain the model, adding 10~30% pseudo-labeled test data to each batch (sketch below).
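A minimal pseudo-labeling sketch; the base model and the 20% confidence cutoff are illustrative choices within the 10~30% range above:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X_train, y_train = make_classification(n_samples=1000, random_state=0)
X_test, _ = make_classification(n_samples=500, random_state=1)

model = GradientBoostingClassifier().fit(X_train, y_train)
prob = model.predict_proba(X_test)[:, 1]

# Keep only the most confident test predictions as pseudo-labels.
conf = np.maximum(prob, 1 - prob)
keep = np.argsort(conf)[-int(0.2 * len(X_test)):]

X_aug = np.vstack([X_train, X_test[keep]])
y_aug = np.concatenate([y_train, (prob[keep] > 0.5).astype(int)])
model = GradientBoostingClassifier().fit(X_aug, y_aug)  # retrain with pseudo labels
```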
Coding and Analysis Principles
1. Logical rigor (clear code architecture, automated frameworks, MECE feature engineering)
2. Code reusability (don't reinvent the wheel)
3. Business-grounded analysis (bad-case analysis, Kernels, Discussions, similar competitions)
4. Reflection and summarization
Please credit the source when reposting: goldandrabbit.github.io