
ML and DL in a Nutshell

Kunal Singh
Nov 7, 2023

Machine Learning

Machine learning is a branch of artificial intelligence that enables computers to learn and make predictions from data without explicit programming. It involves the development of algorithms and models that improve their performance as they analyse and generalize from large datasets.

Types of Machine Learning:

Supervised, Unsupervised, Reinforcement Learning

1. Supervised Machine Learning

· Supervised machine learning is a type of machine learning where the algorithm is trained on a labelled dataset.

· In this approach, the algorithm learns to make predictions or classifications based on input data while being provided with the correct output (labels) during training.

· The goal is to generalize from the training data and make accurate predictions or classifications on new, unseen data.

Supervised ML is classified into 2 types:

  1. Classification
  2. Regression

1. Classification: (Categorical data)

Classification is a type of supervised learning where the algorithm is trained to assign data points to predefined categories or classes. It is used when the target variable is categorical, such as determining whether an email is spam or not spam, or classifying images of animals into different species.

1.1 SVM

SVM is primarily used for classification but can also be applied to regression. The main objective is to find a hyperplane that best separates the data into distinct classes while maximizing the margin between them. The hyperplane is a decision boundary in an N-dimensional space, where N is the number of features. The extreme data points closest to the boundary, called support vectors, are used to position the hyperplane and maximize the margin; deleting a support vector changes the position of the hyperplane.

· The margin boundaries correspond to linear-function outputs of −1 and +1; data points whose output falls between −1 and 1 lie inside the margin.

Key SVM terms: Hyperplane, Margin, Support Vectors, Kernel Trick

Types of SVM:

1. Linear SVM:

Linear SVM is used for linearly separable data: if a dataset can be divided into two classes by a single straight line, it is termed linearly separable data, and the classifier used is called a Linear SVM classifier.

2. Non-Linear SVM:

Non-Linear SVM is used for non-linearly separable data: if a dataset cannot be separated by a straight line, it is termed non-linear data, and the classifier used is called a Non-linear SVM classifier.
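To make the linear vs. non-linear distinction concrete, here is a minimal scikit-learn sketch (the library, dataset, and hyperparameters are illustrative assumptions, not something the article specifies) that fits both a linear SVM and an RBF-kernel SVM on a toy two-moons dataset.

```python
# Illustrative sketch: linear vs. non-linear (RBF-kernel) SVM on a toy dataset.
# Assumes scikit-learn is installed; dataset and parameters are arbitrary choices.
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by a single straight line.
X, y = make_moons(n_samples=300, noise=0.2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Linear SVM: finds a straight-line decision boundary (a hyperplane in 2D).
linear_svm = SVC(kernel="linear", C=1.0).fit(X_train, y_train)

# Non-linear SVM: the RBF kernel trick implicitly maps points to a
# higher-dimensional space where a separating hyperplane exists.
rbf_svm = SVC(kernel="rbf", gamma="scale", C=1.0).fit(X_train, y_train)

print("Linear SVM accuracy:", linear_svm.score(X_test, y_test))
print("RBF SVM accuracy:   ", rbf_svm.score(X_test, y_test))
print("Support vectors per class (RBF):", rbf_svm.n_support_)
```

On data like this, the linear kernel typically underperforms the RBF kernel, because no single straight line can separate the two classes.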

Applications of SVM

Face detection, Text and hypertext categorization, Classification of Images, Bioinformatics, Handwriting recognition, etc.

1.2 Decision Tree

A decision tree is a supervised machine learning algorithm used for both classification and regression tasks.

The model works like a series of conditional control statements (if/else rules) applied to the features.

It is a graphical representation of a decision-making process that resembles an inverted tree, where each internal node represents a test on a feature, each branch represents an outcome of that test, and each leaf node represents a class label or a numerical value.
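As a small illustration of this inverted-tree idea, the sketch below (assuming scikit-learn, with the Iris dataset and max_depth=3 as arbitrary choices) fits a decision tree and prints its nodes as if/else rules.

```python
# Illustrative sketch: fitting a small decision tree and printing its if/else
# rules. Assumes scikit-learn; the Iris dataset and max_depth=3 are arbitrary.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, criterion="entropy", random_state=0)
tree.fit(iris.data, iris.target)

# Each internal node is a test on one feature; each leaf is a class label.
print(export_text(tree, feature_names=list(iris.feature_names)))
```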

Decision Tree assumptions:

Binary Splits, Recursive Partitioning, Feature Independence, Homogeneity, Top-Down Greedy Approach, Categorical and Numerical Features, Overfitting, No Missing Values, No Outliers

Decision Tree Concepts:

1. Entropy:

Entropy is a measure of impurity or disorder in a dataset. In the context of Decision Trees, it quantifies the uncertainty or randomness of the class labels in a set of data. Mathematically, for a binary classification problem (two classes, e.g., 0 and 1), the entropy of a node is calculated as:

Entropy = − p1 · log2(p1) − p2 · log2(p2)

where p1 and p2 are the proportions of data points belonging to the two classes. Lower entropy indicates a purer (more homogeneous) set of data.

2. Information Gain:

Information gain measures the reduction in entropy (or impurity) achieved by partitioning the data based on a particular attribute. It helps determine which attribute provides the most valuable information for splitting the data and creating a more organized tree. The information gain for an attribute is calculated as the difference between the entropy of the parent node and the weighted average of the entropies of the child nodes created by the attribute split. The attribute with the highest information gain is selected as the splitting criterion for the next node in the tree.

3. Gini Index:

The Gini index measures the probability that two items selected at random from a population belong to the same class; this probability is 1 if the population is pure. It works with a categorical target variable ("Success" or "Failure") and performs only binary splits. The attribute with the lowest Gini index (i.e., the attribute that minimizes impurity) is chosen for splitting the data.
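The three concepts above can be checked by hand. The sketch below computes entropy, the Gini index, and information gain for a hypothetical binary split; the label counts are made up purely for demonstration.

```python
# Illustrative sketch: computing entropy, Gini index, and information gain
# for a binary split by hand, using the formulas above. The example label
# counts are invented for demonstration.
import math

def entropy(labels):
    """Entropy of a list of class labels (log base 2)."""
    n = len(labels)
    result = 0.0
    for cls in set(labels):
        p = labels.count(cls) / n
        result -= p * math.log2(p)
    return result

def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(cls) / n) ** 2 for cls in set(labels))

def information_gain(parent, children):
    """Entropy of the parent minus the weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Hypothetical parent node: 6 positives, 4 negatives.
parent = [1] * 6 + [0] * 4
# A candidate split that sends most positives left and most negatives right.
left, right = [1, 1, 1, 1, 1, 0], [1, 0, 0, 0]

print("Parent entropy:  ", round(entropy(parent), 3))
print("Parent Gini:     ", round(gini(parent), 3))
print("Information gain:", round(information_gain(parent, [left, right]), 3))
```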

1.3 Random Forest

Random Forest is an ensemble learning technique in machine learning, which is used for both classification and regression tasks.

It is a group of decision trees, where each tree is built from a bootstrap sample of the training data (drawn with replacement) and considers a randomly selected subset of features at each split.

It’s a versatile and powerful algorithm that combines the predictions from multiple decision trees to make more accurate and robust predictions.
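A minimal scikit-learn sketch of this idea follows; the breast-cancer dataset and the hyperparameters are arbitrary choices used only to show how the ensemble is configured.

```python
# Illustrative sketch of a Random Forest: many decision trees trained on
# bootstrap samples, each split considering a random subset of features.
# Assumes scikit-learn; dataset and hyperparameters are arbitrary.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

forest = RandomForestClassifier(
    n_estimators=200,       # number of trees in the ensemble
    max_features="sqrt",    # features considered at each split
    bootstrap=True,         # each tree sees a bootstrap sample (with replacement)
    random_state=0,
)

scores = cross_val_score(forest, X, y, cv=5)
print("5-fold accuracy:", round(scores.mean(), 3))
```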

2. Regression: (Continuous data)

Regression is a type of supervised learning where the algorithm is trained to predict a continuous output or numerical value. It is used when the target variable is a real number, such as predicting house prices based on features like square footage, number of bedrooms, and location.

2.1 Linear Regression

· Linear regression is used for modelling the relationship between a dependent variable (target) and one or more independent variables (features) by assuming a linear relationship between them.

· The goal of linear regression is to find a linear equation that best fits the data and can be used for making predictions or understanding the relationship between variables.

o y = mx + b, where m is the slope and b is the intercept

· The goal of linear regression is to find the best-fitting line that minimizes the difference between the predicted values and the actual data points.

· This is typically achieved using techniques like the least squares method.

Example:-

· Insurance companies need to know the association between age and healthcare cost.
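As a toy version of that example, the sketch below fits y = mx + b with ordinary least squares on synthetic age/cost pairs; the numbers are invented for illustration, not real insurance data.

```python
# Illustrative sketch: fitting y = m*x + b with ordinary least squares.
# The age / cost values below are synthetic, made up only for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

ages = np.array([[22], [35], [47], [53], [61], [68]])    # feature x (age in years)
costs = np.array([1800, 2600, 3900, 4400, 5600, 6300])   # target y (annual cost)

model = LinearRegression().fit(ages, costs)
print("slope m:    ", round(model.coef_[0], 2))      # cost increase per year of age
print("intercept b:", round(model.intercept_, 2))
print("predicted cost at age 40:", round(model.predict([[40]])[0], 2))
```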

2.2 Logistic Regression

· Logistic regression is used for binary classification problems, where the goal is to predict whether a data point belongs to one of two possible classes (e.g., yes/no, true/false, spam/not spam).

· In logistic regression, the algorithm models the probability that a data point belongs to a particular class.

· The output of logistic regression is a value between 0 and 1, representing the probability of the data point belonging to the positive class.

· It uses the logistic (or sigmoid) function to transform a linear combination of input features into a value between 0 and 1.

· In logistic regression, multiple independent variables are mapped to a single dependent variable.

· Logistic regression is commonly used in various fields, including medicine (predicting disease outcomes), finance (credit scoring), and marketing (customer churn prediction), where binary classification is a common task.
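The sketch below shows both pieces: the sigmoid squashing a linear score into (0, 1), and scikit-learn's LogisticRegression producing class probabilities that are thresholded at 0.5. The generated dataset and the threshold are arbitrary choices.

```python
# Illustrative sketch: the sigmoid maps a linear score to a probability in
# (0, 1), and scikit-learn's LogisticRegression does the same end to end.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    """Logistic function: squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0.0), sigmoid(3.0), sigmoid(-3.0))  # 0.5, ~0.95, ~0.05

X, y = make_classification(n_samples=200, n_features=4, random_state=1)
clf = LogisticRegression().fit(X, y)

probs = clf.predict_proba(X[:3])[:, 1]   # probability of the positive class
labels = (probs >= 0.5).astype(int)      # threshold at 0.5 for the class label
print("P(class=1):", probs.round(3), "-> predicted labels:", labels)
```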

2. Unsupervised Machine Learning

Unsupervised Machine Learning is a type of machine learning that learns from data without human supervision. Unsupervised machine learning models are given unlabelled data and allowed to discover patterns and insights without any explicit guidance or instruction.

2.1 K-means Clustering

· The key objective of a k-means algorithm is to organize data into clusters such that there is high intra-cluster similarity and low inter-cluster similarity.

· An item will belong to exactly one cluster, not several; that is, the algorithm generates a specific number of disjoint, non-hierarchical clusters.

· The most popularly used clustering techniques are K-means clustering and hierarchical clustering.

· K-means uses a divide-and-conquer strategy and is a classic example of an expectation-maximization (EM) style algorithm.

Algorithm Workflow:

Step 1:

K centroids are randomly picked, and every point is assigned to the cluster of its nearest centroid. A centroid is the arithmetic mean (average position) of all the points in its cluster.

Step 2:

Each centroid is recalculated as the average of the coordinates of all the points in its cluster. Then Step 1 is repeated until the clusters converge.
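A minimal NumPy sketch of this two-step loop is shown below; the data, the value of K, and the convergence check are assumptions made for illustration.

```python
# Illustrative sketch of the K-means loop described above:
# Step 1: assign each point to its nearest centroid.
# Step 2: move each centroid to the mean of its assigned points.
# Repeat until the centroids stop moving. Data and K are arbitrary.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centroids
    for _ in range(n_iter):
        # Step 1: distance from every point to every centroid, pick the nearest.
        distances = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 2: recompute each centroid as the mean of its cluster.
        new_centroids = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.allclose(new_centroids, centroids):   # converged
            break
        centroids = new_centroids
    return labels, centroids

# Two obvious blobs around (0, 0) and (5, 5).
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids = kmeans(X, k=2)
print("Centroids:\n", centroids.round(2))
```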

K-means applications:

Image Segmentation, News Article Clustering, Anomaly Detection.

3. Reinforcement Learning

Reinforcement Learning is a feedback-based machine learning technique in which an agent learns to behave in an environment by performing actions and observing the results of those actions.

For each good action, the agent gets positive feedback, and for each bad action, the agent gets negative feedback or penalty.

In Reinforcement Learning, the agent learns automatically from this feedback, without any labelled data.

Since there is no labelled data, the agent is bound to learn from its experience alone.

Approaches to implement Reinforcement Learning

There are mainly three ways to implement reinforcement learning in ML, which are:

  1. Value-based:
    The value-based approach aims to find the optimal value function, i.e., the maximum value attainable at a state under any policy. The agent therefore estimates the long-term return from any state s under a policy π. (A minimal sketch of this approach follows this list.)
  2. Policy-based:
    The policy-based approach finds the optimal policy for the maximum future reward without using the value function. Here the agent tries to apply a policy such that the action performed at each step helps to maximize the future reward.
    The policy-based approach has mainly two types of policy:
  • Deterministic: the policy (π) produces the same action at any given state.
  • Stochastic: the action produced is determined by a probability distribution.
  3. Model-based: In the model-based approach, a virtual model of the environment is created, and the agent explores that environment to learn it. There is no single solution or algorithm for this approach because the model representation differs for each environment.
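As noted above, here is a minimal tabular Q-learning sketch of the value-based approach. The toy corridor environment, rewards, and hyperparameters are all assumptions made for illustration; real problems use richer environments and function approximation.

```python
# Minimal tabular Q-learning sketch for the value-based approach. The toy
# environment is a 1-D corridor where the agent walks left or right toward a
# goal; rewards and hyperparameters are assumptions made for illustration.
import random

N_STATES, GOAL = 5, 4            # states 0..4, reward when reaching state 4
ACTIONS = [-1, +1]               # move left or right
alpha, gamma, epsilon = 0.1, 0.9, 0.2
Q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action index]

for episode in range(500):
    state = 0
    while state != GOAL:
        # Epsilon-greedy: mostly exploit current value estimates, sometimes explore.
        a = random.randrange(2) if random.random() < epsilon else Q[state].index(max(Q[state]))
        next_state = min(max(state + ACTIONS[a], 0), N_STATES - 1)
        reward = 1.0 if next_state == GOAL else 0.0   # positive feedback at the goal
        # Q-learning update: nudge Q toward reward + discounted best future value.
        Q[state][a] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][a])
        state = next_state

# The learned values should prefer moving right (toward the goal) in every state.
print(["right" if q[1] > q[0] else "left" for q in Q[:GOAL]])
```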

4. DEEP LEARNING

Deep learning is a subfield of machine learning and artificial intelligence that focuses on training artificial neural networks with many layers, often referred to as deep neural networks. These networks are designed to automatically learn and represent data through a hierarchy of increasingly abstract and complex features. Deep learning algorithms built on such complex neural networks are the state-of-the-art solutions for complex problems like image and voice recognition.

Fundamentals of Deep Networks:

· Neural Network: Deep networks are based on artificial neural networks, which are composed of interconnected nodes called neurons. Neurons are organised into layers: input, hidden, and output. Each neuron receives inputs, applies a transformation to produce an output, and passes it to the next layer.

· Parameters: Internal variables of a neural network that are learned during the training process.

· Layers: A neural network consists of an input layer, hidden layers, and an output layer. Each layer is composed of multiple neurons.

· Activation Function: Introduces non-linearity into the network, allowing it to learn complex relationships in the data (e.g., the sigmoid function).

· Loss Function: Quantifies the difference between the predicted outputs of a network and the true or desired outputs. (A small numeric sketch of these pieces follows below.)
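The numeric sketch below ties these terms together: made-up parameters, a sigmoid activation, a forward pass through one hidden layer, and a squared-error loss. All weights and inputs are arbitrary numbers chosen only for illustration.

```python
# Tiny numeric sketch: a one-hidden-layer network's forward pass with a
# sigmoid activation and a squared-error loss. Weights and inputs are made up.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))   # activation: adds non-linearity

x = np.array([0.5, -1.2])             # input layer (2 features)

# Parameters learned during training (here: arbitrary fixed values).
W1 = np.array([[0.1, 0.4], [-0.3, 0.8], [0.7, -0.2]])   # hidden layer weights (3x2)
b1 = np.array([0.0, 0.1, -0.1])                          # hidden layer biases
W2 = np.array([[0.2, -0.5, 0.3]])                        # output layer weights (1x3)
b2 = np.array([0.05])                                    # output layer bias

h = sigmoid(W1 @ x + b1)      # hidden layer output
y_hat = sigmoid(W2 @ h + b2)  # network prediction in (0, 1)

y_true = 1.0
loss = (y_true - y_hat[0]) ** 2   # loss: squared difference from the target
print("prediction:", y_hat[0], "loss:", loss)
```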

Types of Deep Learning Networks

1. ANN

Artificial neurons are the building blocks of multilayer artificial neural networks. An ANN consists of interconnected nodes organized into layers. These networks are designed to process and transform data in a way that allows them to learn patterns and make predictions.

Components of ANN:

Perceptrons, Activation Functions, Multilayer Perceptron, Back Propagation
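As a rough illustration of the perceptron idea (backpropagation generalizes this weight-update intuition to multilayer networks), here is a single perceptron trained with the classic perceptron learning rule on the AND function; the learning rate and epoch count are arbitrary.

```python
# Illustrative sketch: a single perceptron trained with the perceptron
# learning rule on the AND function. Learning rate and epochs are arbitrary.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])   # inputs
y = np.array([0, 0, 0, 1])                       # AND labels

w = np.zeros(2)    # weights
b = 0.0            # bias
lr = 0.1           # learning rate

for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1 if xi @ w + b > 0 else 0        # step activation
        error = target - pred
        w += lr * error * xi                     # adjust weights toward correct output
        b += lr * error

print("weights:", w, "bias:", b)
print("predictions:", [(1 if xi @ w + b > 0 else 0) for xi in X])
```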

2. CNN (Convolutional Neural Network)

3. RNN (Recurrent Neural Network)

4. GNN (Graph Neural Network)

5. NLP (Natural Language Processing)

