Non-standard predictors
- Nonlinear regression
- Nonparametric
Classification Identifying to which category an object belongs. Applications : Spam detection, Image recognition.
Algorithms (see the sketch after this list):
- k-nearest neighbors - supervised
- decision trees (C4.5) - supervised - noncontiguous; splits data with if/else rules
- gradient boosted decision trees
- random forest - supervised/unsupervised - best split
- classification and regression trees (CART)
- SVM - supervised/unsupervised - maximum margin
- naive Bayes - supervised
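A minimal sketch of two of these classifiers, assuming scikit-learn is available; the dataset and hyperparameters are illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k-nearest neighbors: majority vote of the k closest training points
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# decision tree: nested if/else splits learned from the features
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print(knn.score(X_test, y_test), tree.score(X_test, y_test))
```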
Regression Predicting a continuous-valued attribute associated with an object.
Applications : Drug response, Stock prices.
Algorithms (see the sketch after this list):
- SVR
- Ridge regression
- Lasso
- Simple linear regression
- OLS
- GLM
- Bayesian regression
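A minimal sketch contrasting three of the regressors above (OLS, ridge, lasso), assuming scikit-learn and NumPy; the synthetic data and penalty strengths are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)    # ordinary least squares
ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty drives some coefficients to 0

print(ols.coef_, ridge.coef_, lasso.coef_, sep="\n")
```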
Clustering Automatic grouping of similar objects into sets.
Applications : Customer segmentation, Grouping experiment outcomes
Algorithms :
Dimensionality reduction Reducing the number of random variables to consider.
- Applications :
- Visualization
- Increased efficiency
- Algorithms :
Model selection Comparing, validating and choosing parameters and models.
Goal : Improved accuracy via parameter tuning
Modules :
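A minimal sketch of parameter tuning via cross-validated grid search, assuming scikit-learn; the model and grid values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# try each candidate value of C with 5-fold cross-validation
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```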
Preprocessing Feature extraction and normalization.
- Application : Transforming input data such as text for use with machine learning algorithms.
- Modules :
ML
CLASSIFICATION AND REGRESSION PROBLEMS
There are numerous algorithms for predicting continuous variables or categorical variables from a set of continuous predictors and/or categorical factor effects. For example, in GLM (General Linear Models) and GRM (General Regression Models), we can specify a linear combination (design) of continuous predictors and categorical factor effects (e.g., with two-way and three-way interaction effects) to predict a continuous dependent variable. In GDA (General Discriminant Function Analysis), we can specify such designs for predicting categorical variables, i.e., to solve classification problems.
Regression-type problems. Regression-type problems are generally those where we attempt to predict the values of a continuous variable from one or more continuous and/or categorical predictor variables.
Classification-type problems. Classification-type problems are generally those where we attempt to predict values of a categorical dependent variable (class, group membership, etc.) from one or more continuous and/or categorical predictor variables. There are a number of methods for analyzing classification-type problems and for computing predicted classifications, either from simple continuous predictors (e.g., binomial or multinomial logit regression in GLZ), from categorical predictors (e.g., log-linear analysis of multi-way frequency tables), or both (e.g., via ANCOVA-like designs in GLZ or GDA).
Tree methods are nonparametric and nonlinear, and their results are simple to interpret. Using them requires specifying criteria for predictive accuracy, criteria for selecting splits, and a rule for when to stop splitting.
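A minimal sketch of how those three choices surface as hyperparameters, assuming scikit-learn's tree implementation; the settings are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(
    criterion="gini",       # split-selection criterion (impurity measure)
    class_weight=None,      # predictive-accuracy criterion: misclassification costs
    min_samples_split=10,   # stopping rule: minimum node size to split
    max_depth=4,            # stopping rule: depth limit
    random_state=0,
).fit(X, y)
print(export_text(tree))    # simplicity of results: readable if/else rules
```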
Data classification -> effectiveness
Learning Models : We can think of a model as a template. When data is processed through a learning model, what comes out the other end is insight. The model is nothing more than a set of operations performed on the data. Models are typically built in a static environment by working through the data (drilling/rolling, pivoting, slicing/dicing, etc.) and may involve integrating multiple mining functions (e.g., classifying, then clustering).
According to Golfarelli and Rizzi, these are the measures of effectiveness of the classifier:
- _Predictive accuracy_ : How well does it predict the categories for new observations?
- _Speed_ : What is the computational cost of using the classifier?
- _Robustness_ : How well do the models created perform if data quality is low?
- _Scalability_ : Does the classifier function efficiently with large amounts of data?
- _Interpretability_ : Are the results understandable to users?
Types of Models
Generative
- Gaussian Mixture Models
- Hidden Markov Models
- Naive Bayes
- GANs
Discriminative
- NN
- SVM
- Logistic Regression
Descriptive
- Derived from the Attributes of Data (mean, median, mode, avg)
MLE is the workhorse estimation technique of frequentist statistics.
Latent variables (see the sketch after this list):
- expectation maximization
- methods of moments
- signal separation
  - principal component analysis
  - singular value decomposition
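A minimal sketch relating PCA to the SVD, assuming NumPy; the synthetic data is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([2.0, 1.0, 0.1])  # unequal variances

Xc = X - X.mean(axis=0)                # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                        # principal directions (rows)
explained_var = S**2 / (len(X) - 1)    # variance along each direction
scores = Xc @ Vt.T                     # data projected onto the components
print(explained_var)
```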
properties of estimators (bias, consistency, efficiency, sufficiency, robustness).
Testing: Type I and II errors, power, likelihood ratios
Methodology of probabilistic process models:
- Dirichlet
- Gaussian
- basis/kernel expansion
- splines, wavelets
- support vector machines
- other local regression models
Feature Reduction:
- PCA - captures the directions of greatest variance
- cross-correlation analysis
- linear discriminant analysis - combines features
concepts (see the sketch after this list):
- accuracy: (tp+tn)/(p+n)
- precision: tp/(tp+fp)
- specificity: tn/(fp+tn)
- sensitivity (recall): tp/(tp+fn)
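A minimal sketch computing these four metrics from illustrative confusion-matrix counts:

```python
# illustrative counts: true/false positives and negatives
tp, fp, tn, fn = 80, 10, 90, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)  # (tp+tn)/(p+n)
precision = tp / (tp + fp)
specificity = tn / (fp + tn)
sensitivity = tp / (tp + fn)                # a.k.a. recall
print(accuracy, precision, specificity, sensitivity)
```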
Regression (see the sketch after this list):
- linear regression - numeric outcomes
- logistic regression - categorical outcomes
- ensemble learning - bagging, boosting, stacking, additive regression
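A minimal sketch of two of these ensemble strategies (bagging and boosting) on a regression task, assuming scikit-learn; the data and settings are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# bagging: average many trees fit on bootstrap resamples of the data
bagged = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)
# boosting: fit trees sequentially, each correcting the previous residuals
boosted = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

print(bagged.score(X, y), boosted.score(X, y))
```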
Neural Nets (supervised/unsupervised):
- autoencoders
- deep belief nets
- Hebbian learning
- GANs
- implicit density models
- SOM (self-organizing maps)
Clustering (see the sketch after this list):
- hierarchical
- k-means (Euclidean, Minkowski, or Manhattan distance; number of clusters k)
  - supervised/unsupervised - assigns points to the nearest centroid
- anomaly detection - outliers, supervised/unsupervised
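A minimal sketch of k-means with a chosen cluster count, assuming scikit-learn (whose KMeans uses Euclidean distance); the blob data is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# fit 2 clusters; each point is assigned to the nearest centroid
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
print(km.labels_[:10])
```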
RNN
- LSTM
- Hierarchical
- Stochastic
FeedForward :
- MLP (multilayer perceptron)
- Autoencoder
- Probabilistic
- Convolutional
- Time Delay
Online Learning (see the sketch after this list):
- Data efficient and adaptable
- No data storage needed
- Stochastic Gradient Descent
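A minimal sketch of online learning with stochastic gradient descent, assuming scikit-learn; `partial_fit` consumes one mini-batch at a time, so past batches never need to be stored. The stream and labels are simulated:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier()  # linear model trained by stochastic gradient descent

for _ in range(100):                       # simulated stream of mini-batches
    X = rng.normal(size=(32, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    clf.partial_fit(X, y, classes=[0, 1])  # classes required on the first call

X_test = rng.normal(size=(200, 4))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
print(clf.score(X_test, y_test))
```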
t-distribution (as used in t-SNE) - visualize high-dimensional data
RISK : Redundancy in the data. If the input features contain redundant information (e.g., highly correlated features), some learning algorithms (e.g., linear regression, logistic regression, and distance-based methods) will perform poorly because of numerical instabilities. These problems can often be solved by imposing some form of regularization; see the sketch below.
Presence of interactions and non-linearities. If each of the features makes an independent contribution to the output, then algorithms based on linear functions (e.g., linear regression, logistic regression, support vector machines, naive Bayes) and distance functions (e.g., nearest neighbor methods, support vector machines with Gaussian kernels) generally perform well. However, if there are complex interactions among features, then algorithms such as decision trees and neural networks work better, because they are specifically designed to discover these interactions. Linear methods can also be applied, but the engineer must manually specify the interactions when using them.
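A minimal sketch of the redundancy risk, assuming NumPy and scikit-learn: with a nearly duplicated feature, OLS coefficients become unstable, while ridge regularization keeps them small and stable. The data is illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x + rng.normal(scale=1e-6, size=200)])  # redundant pair
y = x + rng.normal(scale=0.1, size=200)

print(LinearRegression().fit(X, y).coef_)  # typically huge, offsetting values
print(Ridge(alpha=1.0).fit(X, y).coef_)    # roughly [0.5, 0.5], stable
```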
Critical review: configuration and risk, rationale for complexity, weighted parameters, weighted performance metrics, risk assessments and mitigations, technology roadmaps.
Common Error Measures (see the sketch after this list):
- (root) mean squared error - continuous data, sensitive to outliers
- median absolute deviation - continuous data, often more robust
- sensitivity (recall) - if you want few missed positives
- specificity - if you want few negatives called positives
- accuracy - weights false positives/negatives equally
- concordance - one example is kappa
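A minimal sketch contrasting RMSE with the median absolute deviation on illustrative numbers; a single outlying prediction dominates RMSE but barely moves the MAD:

```python
import numpy as np

actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([1.1, 1.9, 3.2, 3.8, 15.0])  # last prediction is an outlier

rmse = np.sqrt(np.mean((actual - pred) ** 2))
mad = np.median(np.abs(actual - pred))
print(rmse, mad)  # ~4.47 vs. 0.2
```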
Key issues: accuracy, overfitting, interpretability, computational speed.
Pay attention to: confounding variables, complicated interactions, skewness, outliers, nonlinear patterns, variance changes, units/scale issues, overloading, regression, correlation and causation.
Confounder: a variable that is correlated with both the outcome and covariates
- confounders can change the regression line
- detected with exploration
Hierarchical clustering - distance or similarity? continuous (Euclidean/correlation), binary (Manhattan); see the sketch below.
Graphs - help understand properties, find patterns, suggest future modeling, debug, and communicate.
Bagging and boosting - combine classifiers to improve accuracy but make results harder to interpret.
Predictive
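A minimal sketch of hierarchical clustering under two of the distance choices above, assuming SciPy; the blob data is illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

# build dendrograms under Euclidean and Manhattan (cityblock) distances
Z_euclidean = linkage(X, method="average", metric="euclidean")
Z_manhattan = linkage(X, method="average", metric="cityblock")

labels = fcluster(Z_euclidean, t=2, criterion="maxclust")  # cut into 2 clusters
print(labels)
```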