Non-standard predictors
- Nonlinear regression
- Nonparametric
Classification Identifying to which category an object belongs. Applications : Spam detection, Image recognition.
Algorithms (see the sketch after this list):
- k-nearest neighbors - supervised
- decision trees (C4.5) - supervised - noncontiguous; splits data with if/else rules
- gradient boosted decision trees
- random forest - supervised/unsupervised - best split
- classification and regression trees (CART)
- SVM - supervised/unsupervised - maximum margin
- naive Bayes - supervised
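A minimal sketch of two of these classifiers, assuming scikit-learn is available; the dataset and hyperparameters are illustrative only:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# k-nearest neighbors: majority vote of the k closest training points
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# decision tree: nested if/else splits learned from the features
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print(knn.score(X_test, y_test), tree.score(X_test, y_test))
```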
Regression Predicting a continuous-valued attribute associated with an object.
Applications : Drug response, Stock prices.
Algorithms (see the sketch after this list):
- SVR
- Ridge regression
- Lasso
- Simple linear regression
- OLS
- GLM
- Bayesian regression
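A minimal sketch contrasting three of the regressors above (OLS, ridge, lasso), assuming scikit-learn and NumPy; the synthetic data and penalty strengths are illustrative:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, 0.5, 0.0, 0.0, 2.0]) + rng.normal(scale=0.1, size=100)

ols = LinearRegression().fit(X, y)    # ordinary least squares
ridge = Ridge(alpha=1.0).fit(X, y)    # L2 penalty shrinks coefficients
lasso = Lasso(alpha=0.1).fit(X, y)    # L1 penalty drives some coefficients to 0

print(ols.coef_, ridge.coef_, lasso.coef_, sep="\n")
```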
Clustering Automatic grouping of similar objects into sets.
Applications : Customer segmentation, Grouping experiment outcomes
Algorithms :
Dimensionality reduction Reducing the number of random variables to consider.
- Applications :
- Visualization
- Increased efficiency
- Algorithms :
Model selection Comparing, validating and choosing parameters and models.
Goal : Improved accuracy via parameter tuning
Modules :
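A minimal sketch of parameter tuning via cross-validated grid search, assuming scikit-learn; the model and grid values are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# try each candidate value of C with 5-fold cross-validation
search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```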
Preprocessing Feature extraction and normalization.
- Application : Transforming input data such as text for use with machine learning algorithms.
- Modules :
ML
CLASSIFICATION AND REGRESSION PROBLEMS
There are numerous algorithms for predicting continuous variables or categorical variables from a set of continuous predictors and/or categorical factor effects. For example, in GLM (General Linear Models) and GRM (General Regression Models), we can specify a linear combination (design) of continuous predictors and categorical factor effects (e.g., with two-way and three-way interaction effects) to predict a continuous dependent variable. In GDA (General Discriminant Function Analysis), we can specify such designs for predicting categorical variables, i.e., to solve classification problems.
Regression-type problems. Regression-type problems are generally those where we attempt to predict the values of a continuous variable from one or more continuous and/or categorical predictor variables.
Classification-type problems. Classification-type problems are generally those where we attempt to predict values of a categorical dependent variable (class, group membership, etc.) from one or more continuous and/or categorical predictor variables. There are a number of methods for analyzing classification-type problems and for computing predicted classifications, either from simple continuous predictors (e.g., binomial or multinomial logit regression in GLZ), from categorical predictors (e.g., log-linear analysis of multi-way frequency tables), or both (e.g., via ANCOVA-like designs in GLZ or GDA).
Tree methods are nonparametric and nonlinear, and their results are simple to interpret. Using them requires specifying criteria for predictive accuracy, criteria for selecting splits, and a rule for when to stop splitting.
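A minimal sketch of how those three choices surface as hyperparameters, assuming scikit-learn's tree implementation; the settings are illustrative:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(
    criterion="gini",       # split-selection criterion (impurity measure)
    class_weight=None,      # predictive-accuracy criterion: misclassification costs
    min_samples_split=10,   # stopping rule: minimum node size to split
    max_depth=4,            # stopping rule: depth limit
    random_state=0,
).fit(X, y)
print(export_text(tree))    # simplicity of results: readable if/else rules
```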
Data classification -> effectiveness
Learning Models : We can think of a model as a template. When data is processed through a learning model, what comes out the other end is insight. The model is nothing more than a set of operations performed on the data. Models are typically built in a static environment by working through the data (drilling/rolling, pivoting, slicing/dicing, etc.) and may involve integrating multiple mining functions (e.g., classifying, then clustering).
According to Golfarelli and Rizzi, these are the measures of effectiveness of the classifier:
- _Predictive accuracy_ : How well does it predict the categories for new observations?
- _Speed_ : What is the computational cost of using the classifier?
- _Robustness_ : How well do the models created perform if data quality is low?
- _Scalability_ : Does the classifier function efficiently with large amounts of data?
- _Interpretability_ : Are the results understandable to users?
Types of Models
Generative
- Gaussian Mixture Models
- Hidden Markov Models
- Naive Bayes
- GANs
Discriminative
- NN
- SVM
- Logistic Regression
Descriptive
- Derived from the Attributes of Data (mean, median, mode, avg)
MLE is the workhorse estimation technique of frequentist statistics.
Latent variables (see the sketch after this list):
- expectation maximization
- methods of moments
- signal separation
  - principal component analysis
  - singular value decomposition
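A minimal sketch relating PCA to the SVD, assuming NumPy; the synthetic data is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3)) * np.array([2.0, 1.0, 0.1])  # unequal variances

Xc = X - X.mean(axis=0)                # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

components = Vt                        # principal directions (rows)
explained_var = S**2 / (len(X) - 1)    # variance along each direction
scores = Xc @ Vt.T                     # data projected onto the components
print(explained_var)
```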
properties of estimators (bias, consistency, efficiency, sufficiency, robustness).
Testing: Type I and II errors, power, likelihood ratios
Methodology of probabilistic process models:
- Dirichlet
- Gaussian
- basis/kernel expansion
- splines, wavelets
- support vector machines
- other local regression models
Feature Reduction:
- PCA - captures the directions of greatest variance
- cross-correlation analysis
- linear discriminant analysis - combines features
concepts (see the sketch after this list):
- accuracy: (tp+tn)/(p+n)
- precision: tp/(tp+fp)
- specificity: tn/(fp+tn)
- sensitivity (recall): tp/(tp+fn)
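A minimal sketch computing these four metrics from illustrative confusion-matrix counts:

```python
# illustrative counts: true/false positives and negatives
tp, fp, tn, fn = 80, 10, 90, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)  # (tp+tn)/(p+n)
precision = tp / (tp + fp)
specificity = tn / (fp + tn)
sensitivity = tp / (tp + fn)                # a.k.a. recall
print(accuracy, precision, specificity, sensitivity)
```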
Regression (see the sketch after this list):
- linear regression - numeric outcomes
- logistic regression - categorical outcomes
- ensemble learning - bagging, boosting, stacking, additive regression
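A minimal sketch of two of these ensemble strategies (bagging and boosting) on a regression task, assuming scikit-learn; the data and settings are illustrative:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor, GradientBoostingRegressor

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)

# bagging: average many trees fit on bootstrap resamples of the data
bagged = BaggingRegressor(n_estimators=50, random_state=0).fit(X, y)
# boosting: fit trees sequentially, each correcting the previous residuals
boosted = GradientBoostingRegressor(n_estimators=100, random_state=0).fit(X, y)

print(bagged.score(X, y), boosted.score(X, y))
```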
Neural Nets (supervised/unsupervised):
- autoencoders
- deep belief nets
- Hebbian learning
- GANs
- implicit density models
- SOM (self-organizing maps)
Clustering (see the sketch after this list):
- hierarchical
- k-means (Euclidean, Minkowski, or Manhattan distance; number of clusters k)
  - supervised/unsupervised - assigns points to the nearest centroid
- anomaly detection - outliers, supervised/unsupervised
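A minimal sketch of k-means with a chosen cluster count, assuming scikit-learn (whose KMeans uses Euclidean distance); the blob data is illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# fit 2 clusters; each point is assigned to the nearest centroid
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
print(km.labels_[:10])
```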
RNN
- LSTM
- Hierarchical
- Stochastic
FeedForward :
- MLP (multilayer perceptron)
- Autoencoder
- Probabilistic
- Convolutional
- Time Delay
Online Learning (see the sketch after this list):
- Data efficient and adaptable
- No data storage needed
- Stochastic Gradient Descent
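A minimal sketch of online learning with stochastic gradient descent, assuming scikit-learn; `partial_fit` consumes one mini-batch at a time, so past batches never need to be stored. The stream and labels are simulated:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
clf = SGDClassifier()  # linear model trained by stochastic gradient descent

for _ in range(100):                       # simulated stream of mini-batches
    X = rng.normal(size=(32, 4))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)
    clf.partial_fit(X, y, classes=[0, 1])  # classes required on the first call

X_test = rng.normal(size=(200, 4))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
print(clf.score(X_test, y_test))
```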
t-distribution (as used in t-SNE) - visualize high-dimensional data
RISK : Redundancy in the data. If the input features contain redundant information (e.g., highly correlated features), some learning algorithms (e.g., linear regression, logistic regression, and distance-based methods) will perform poorly because of numerical instabilities. These problems can often be solved by imposing some form of regularization; see the sketch below.
Presence of interactions and non-linearities. If each of the features makes an independent contribution to the output, then algorithms based on linear functions (e.g., linear regression, logistic regression, support vector machines, naive Bayes) and distance functions (e.g., nearest neighbor methods, support vector machines with Gaussian kernels) generally perform well. However, if there are complex interactions among features, then algorithms such as decision trees and neural networks work better, because they are specifically designed to discover these interactions. Linear methods can also be applied, but the engineer must manually specify the interactions when using them.
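A minimal sketch of the redundancy risk, assuming NumPy and scikit-learn: with a nearly duplicated feature, OLS coefficients become unstable, while ridge regularization keeps them small and stable. The data is illustrative:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x, x + rng.normal(scale=1e-6, size=200)])  # redundant pair
y = x + rng.normal(scale=0.1, size=200)

print(LinearRegression().fit(X, y).coef_)  # typically huge, offsetting values
print(Ridge(alpha=1.0).fit(X, y).coef_)    # roughly [0.5, 0.5], stable
```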
Critical review: configuration and risk, rationale for complexity, weighted parameters, weighted performance metrics, risk assessments and mitigations, technology roadmaps.
Common Error Measures (see the sketch after this list):
- (root) mean squared error - continuous data, sensitive to outliers
- median absolute deviation - continuous data, often more robust
- sensitivity (recall) - if you want few missed positives
- specificity - if you want few negatives called positives
- accuracy - weights false positives/negatives equally
- concordance - one example is kappa
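A minimal sketch contrasting RMSE with the median absolute deviation on illustrative numbers; a single outlying prediction dominates RMSE but barely moves the MAD:

```python
import numpy as np

actual = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
pred = np.array([1.1, 1.9, 3.2, 3.8, 15.0])  # last prediction is an outlier

rmse = np.sqrt(np.mean((actual - pred) ** 2))
mad = np.median(np.abs(actual - pred))
print(rmse, mad)  # ~4.47 vs. 0.2
```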
Key issues: accuracy, overfitting, interpretability, computational speed.
Pay attention to: confounding variables, complicated interactions, skewness, outliers, nonlinear patterns, variance changes, units/scale issues, overloading, regression, correlation and causation.
Confounder: a variable that is correlated with both the outcome and covariates
- confounders can change the regression line
- detected with exploration
Hierarchical clustering - distance or similarity? continuous (Euclidean/correlation), binary (Manhattan); see the sketch below.
Graphs - help understand properties, find patterns, suggest future modeling, debug, and communicate.
Bagging and boosting - combine classifiers to improve accuracy but make results harder to interpret.
Predictive
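A minimal sketch of hierarchical clustering under two of the distance choices above, assuming SciPy; the blob data is illustrative:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(6, 1, (20, 2))])

# build dendrograms under Euclidean and Manhattan (cityblock) distances
Z_euclidean = linkage(X, method="average", metric="euclidean")
Z_manhattan = linkage(X, method="average", metric="cityblock")

labels = fcluster(Z_euclidean, t=2, criterion="maxclust")  # cut into 2 clusters
print(labels)
```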