Sunday, April 28, 2019

Coursera Andrew Ng course: Machine learning

Lessons learned

Linear Regression:
  • Fit the data with a hypothesis that is linear in the parameters, h(x) = theta' * x (polynomial terms of x can be added as extra features)
  • The cost function
  • Gradient descent update of theta
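A minimal sketch of the three bullets above for a single feature, assuming the usual squared-error cost and a batch update. The toy data, the learning rate `alpha` and the iteration count are made up for illustration and are not from the course notes.

```python
def predict(theta, x):
    """Hypothesis h(x) = theta[0] + theta[1] * x for a single feature."""
    return theta[0] + theta[1] * x


def cost(theta, xs, ys):
    """Squared-error cost J = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(xs)
    return sum((predict(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.1, iterations=2000):
    """Batch gradient descent; both parameters are updated simultaneously."""
    theta = [0.0, 0.0]
    m = len(xs)
    for _ in range(iterations):
        errors = [predict(theta, x) - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        theta = [theta[0] - alpha * grad0, theta[1] - alpha * grad1]
    return theta


# Toy data generated from y = 1 + 2x (assumed, for illustration only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
theta = gradient_descent(xs, ys)
```

On this noise-free data the fitted parameters approach the intercept 1 and slope 2 used to generate it.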

Logistic regression:
  • Classification of data
  • Cost function
  • Logistic function
  • Gradient descent update
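The logistic regression pieces listed above could look roughly like this. It is a sketch assuming the standard log-loss cost. The data set, `alpha` and the number of iterations are invented for the example.

```python
import math


def sigmoid(z):
    """Logistic function, maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def hypothesis(theta, x):
    """Predicted probability that y = 1 for the feature vector x."""
    return sigmoid(sum(t * xj for t, xj in zip(theta, x)))


def cost(theta, X, y):
    """Log-loss: J = -1/m * sum(y*log(h) + (1-y)*log(1-h))."""
    total = 0.0
    for xi, yi in zip(X, y):
        h = hypothesis(theta, xi)
        total += -yi * math.log(h) - (1 - yi) * math.log(1 - h)
    return total / len(X)


def gradient_step(theta, X, y, alpha):
    """One batch gradient descent update of every theta[j]."""
    m = len(X)
    errors = [hypothesis(theta, xi) - yi for xi, yi in zip(X, y)]
    return [
        t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
        for j, t in enumerate(theta)
    ]


# Toy, non-separable data; the first column is the constant intercept feature.
X = [[1.0, 0.5], [1.0, 1.0], [1.0, 1.5], [1.0, 2.0], [1.0, 2.5], [1.0, 3.0]]
y = [0, 0, 1, 0, 1, 1]
theta = [0.0, 0.0]
initial_cost = cost(theta, X, y)
for _ in range(500):
    theta = gradient_step(theta, X, y, alpha=0.5)
final_cost = cost(theta, X, y)
```

The update has the same form as in linear regression; only the hypothesis changes from a linear function to the logistic function of one.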
Regularization
  • To prevent overfitting
  • Linear regression cost function
  • Gradient descent
  • Logistic regression cost function
  • Gradient descent update
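As a rough illustration of regularization, here is the linear regression case with an L2 penalty, assuming the common convention that the intercept theta[0] is not penalized. The names `lam`, `fit` and the toy data are assumptions for this sketch.

```python
def predict(theta, x):
    """Linear hypothesis theta' * x."""
    return sum(t * xj for t, xj in zip(theta, x))


def regularized_cost(theta, X, y, lam):
    """J = 1/(2m) * (sum of squared errors + lam * sum(theta[j]^2 for j >= 1))."""
    m = len(X)
    squared = sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y))
    penalty = lam * sum(t ** 2 for t in theta[1:])  # theta[0] is not penalized
    return (squared + penalty) / (2 * m)


def regularized_step(theta, X, y, alpha, lam):
    """One gradient descent update with the penalty term added for j >= 1."""
    m = len(X)
    errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
    new_theta = []
    for j, t in enumerate(theta):
        grad = sum(e * xi[j] for e, xi in zip(errors, X)) / m
        if j > 0:
            grad += lam * t / m
        new_theta.append(t - alpha * grad)
    return new_theta


def fit(X, y, lam, alpha=0.1, iterations=5000):
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        theta = regularized_step(theta, X, y, alpha, lam)
    return theta


# Toy data from y = 1 + 2x; the first column is the intercept feature.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.0, 3.0, 5.0, 7.0]
theta_plain = fit(X, y, lam=0.0)
theta_shrunk = fit(X, y, lam=10.0)
```

With `lam=10.0` the slope shrinks from roughly 2 to roughly 0.67, which is the effect that counters overfitting. The logistic regression version adds the same penalty term to its own cost and gradient.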

Neural networks

  • Cost function
Backpropagation
  • Starting with the error of the output layer, propagate backwards to find the error of each layer, then use these errors to compute the gradient

  • It is easy to make a mistake when computing the gradient, so every several hundred iterations one can check it against a numerical estimate and confirm that the cost is still decreasing. The cross-check function is slow, so it should not run on every iteration
  • Thetas should be initialized to random values, NOT all zeros
  • Summary:
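The gradient cross-check and the random initialization mentioned in the bullets above can be sketched as follows. The two-sided difference with `epsilon=1e-4` and the range `epsilon_init=0.12` are assumed values, and `example_cost` is a stand-in for a real network cost.

```python
import random


def numerical_gradient(cost_fn, theta, epsilon=1e-4):
    """Two-sided finite-difference estimate of the gradient (slow: 2 cost calls per parameter)."""
    grad = []
    for j in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[j] += epsilon
        minus[j] -= epsilon
        grad.append((cost_fn(plus) - cost_fn(minus)) / (2 * epsilon))
    return grad


def random_init(n, epsilon_init=0.12, seed=0):
    """Random weights in [-epsilon_init, epsilon_init] to break symmetry."""
    rng = random.Random(seed)
    return [rng.uniform(-epsilon_init, epsilon_init) for _ in range(n)]


# Hypothetical cost with a known analytic gradient, used only to show the check.
def example_cost(theta):
    return theta[0] ** 2 + 3.0 * theta[0] * theta[1]


def example_gradient(theta):
    return [2.0 * theta[0] + 3.0 * theta[1], 3.0 * theta[0]]


theta = [1.5, -0.5]
numeric = numerical_gradient(example_cost, theta)
analytic = example_gradient(theta)
max_diff = max(abs(n - a) for n, a in zip(numeric, analytic))
weights = random_init(5)
```

If `max_diff` is not tiny, the analytic (backpropagation) gradient most likely contains a bug.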


Diagnostics
  1. Split the data into 60% training, 20% cross validation and 20% test
  2. Train the algorithm with different settings (number of features, polynomial order, regularization parameter) on the training set.
  3. Choose the setting that minimizes the cross validation set error
  4. Estimate the generalization error on the test set
  • Bias and variance

  • Summary of diagnostics
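Steps 1 to 4 above can be sketched as below. The helpers `split_data` and `select_model`, and the toy one-parameter model in `train_ridge`, are hypothetical names and choices made for this example.

```python
import random


def split_data(examples, seed=0):
    """Shuffle, then split 60% / 20% / 20% into training, cross validation and test."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 60 // 100
    n_cv = n * 20 // 100
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_cv],
            shuffled[n_train + n_cv:])


def select_model(candidates, train_fn, error_fn, train, cv):
    """Train one model per candidate setting; keep the one with the lowest CV error."""
    best = None
    for setting in candidates:
        model = train_fn(train, setting)
        error = error_fn(model, cv)
        if best is None or error < best[0]:
            best = (error, setting, model)
    return best[1], best[2]


def train_ridge(train, lam):
    """Toy model y = theta * x with an L2 penalty lam (closed-form solution)."""
    return sum(x * y for x, y in train) / (sum(x * x for x, y in train) + lam)


def squared_error(theta, examples):
    return sum((theta * x - y) ** 2 for x, y in examples) / (2 * len(examples))


# Toy data from y = 2x, so no regularization should win here.
examples = [(float(x), 2.0 * x) for x in range(1, 21)]
train, cv, test = split_data(examples)
best_lam, best_theta = select_model([0.0, 1.0, 10.0, 100.0],
                                    train_ridge, squared_error, train, cv)
test_error = squared_error(best_theta, test)
```

The test set is touched only once, after the setting has been chosen, so `test_error` is not biased by the selection.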

SVM
Linear Kernel
  • Cost function
  • With cost defined as
Gaussian Kernel
  • The similarity between a point and each training example, which falls off with the distance between them, is used as a ‘feature’

  • Overfitting with Gaussian Kernels
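A small sketch of the Gaussian kernel features described above, assuming the usual form exp(-||x - l||^2 / (2 * sigma^2)) with the training points as landmarks. The points and the `sigma` values are made up.

```python
import math


def gaussian_kernel(x, landmark, sigma):
    """Similarity in (0, 1]: 1 when x equals the landmark, near 0 when far away."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def kernel_features(x, training_points, sigma):
    """One feature per training point: the similarity of x to that point."""
    return [gaussian_kernel(x, point, sigma) for point in training_points]


training_points = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
features = kernel_features([0.0, 0.0], training_points, sigma=1.0)
narrow = kernel_features([0.0, 0.0], training_points, sigma=0.1)
```

With a small `sigma` the features drop to almost zero for anything but the nearest points, which makes the decision boundary very flexible and is where the overfitting comes from. A larger `sigma` gives smoother features.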


Logistic vs SVM vs Neural network

Unsupervised learning
K means

  • Random initialization of the centroids can be done by picking data points at random. We can try 100 random initializations and keep the one with the lowest cost
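The random-restart idea above could be sketched like this, assuming the cost is the mean squared distance of each point to its assigned centroid. The function names, the fixed iteration count and the toy points are assumptions.

```python
import random


def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def assign(points, centroids):
    """Index of the closest centroid for every point."""
    return [min(range(len(centroids)),
                key=lambda k: squared_distance(p, centroids[k]))
            for p in points]


def distortion(points, centroids, labels):
    """Cost: mean squared distance of each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[l])
               for p, l in zip(points, labels)) / len(points)


def k_means(points, k, rng, iterations=20):
    # Initialize the centroids by picking k distinct data points at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    labels = assign(points, centroids)
    return centroids, labels, distortion(points, centroids, labels)


def best_of_restarts(points, k, restarts=100, seed=0):
    """Run K-means from many random initializations; keep the lowest-cost run."""
    rng = random.Random(seed)
    runs = [k_means(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda run: run[2])


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
centroids, labels, cost = best_of_restarts(points, k=2)
```

Restarts matter because a single run can settle in a poor local optimum; comparing runs by cost picks the best clustering found.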
Dimensionality reduction using PCA
  1. Data must be normalized first
  2. Compute covariance matrix
  3. Compress your data
  • K should be chosen so that


Precision/Recall
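For reference, a short sketch of how the two quantities are computed for a binary classifier, using the standard definitions. The label lists are invented.

```python
def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP), recall = TP / (TP + FN), for the positive class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
```

These are more informative than plain accuracy when one class is rare.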
Online learning
  • As each new example arrives, take a gradient descent step on it to get the new theta
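A minimal sketch of such an online update, assuming a linear hypothesis with squared error for simplicity. The simulated stream, `alpha` and the function names are made up for the example.

```python
def online_update(theta, x, y, alpha):
    """One gradient descent step using a single freshly arrived example."""
    error = sum(t * xj for t, xj in zip(theta, x)) - y
    return [t - alpha * error * xj for t, xj in zip(theta, x)]


def example_stream(n):
    """Simulated stream of (features, target) pairs drawn from y = 1 + 2x."""
    for i in range(n):
        x1 = (i % 10) / 10.0
        yield [1.0, x1], 1.0 + 2.0 * x1


theta = [0.0, 0.0]
for x, y in example_stream(5000):
    theta = online_update(theta, x, y, alpha=0.1)  # the example can be discarded afterwards
```

Each example is used once and then dropped, so the model can keep adapting without storing the whole data set.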
MapReduce
  • Split the work over separate cores/computers, let each one process its share, and combine the partial results at the end

  • To decide which part of the pipeline is most worth improving, do a ceiling analysis
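The MapReduce bullet above can be illustrated with a batch gradient whose sum is split into chunks. This sketch runs the "workers" sequentially in one process; the function names and the data are assumptions.

```python
def partial_gradient(theta, chunk):
    """Map step: un-normalized gradient sum over one chunk of the data."""
    grad = [0.0] * len(theta)
    for x, y in chunk:
        error = sum(t * xj for t, xj in zip(theta, x)) - y
        for j, xj in enumerate(x):
            grad[j] += error * xj
    return grad


def combine(partials, m):
    """Reduce step: add the partial sums and divide by the total number of examples."""
    return [sum(p[j] for p in partials) / m for j in range(len(partials[0]))]


def split_into_chunks(data, n_chunks):
    return [data[i::n_chunks] for i in range(n_chunks)]


# Toy data from y = 1 + 2x with an intercept feature.
data = [([1.0, float(i)], 1.0 + 2.0 * i) for i in range(8)]
theta = [0.0, 0.0]

# Each call below could run on a separate core or machine.
partials = [partial_gradient(theta, chunk) for chunk in split_into_chunks(data, 4)]
distributed = combine(partials, len(data))
single = combine([partial_gradient(theta, data)], len(data))
```

Because the gradient is a sum over examples, the chunked result matches the single-machine result, which is what makes this kind of algorithm easy to distribute.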







