
Draft of Chapter 1
With Regards

Ritesh Bhagwat


Note: The following is a draft version.
Introduction to Data-Driven Computing & AI
The journey into the world of artificial intelligence is extraordinary. It is extraordinary because it shows us that just by changing our perspective on something we already know, we can learn something new and amazing. The foundation of all Artificial Intelligence is built on things that most of us have already studied in school and college. If you have studied math up to high school level, chances are that there will be nothing fundamentally new in this book. But I can assure you that we will learn something new from all the things we already know. Everything will be built on what we know, and this is the very beauty of Artificial Intelligence. So, let us get started on this journey of a lifetime.
When I was growing up as a teenager in the 1990s, the way to stand out in a conversation was by talking about intelligent, scientific things. If we knew what a “light year” is or what “supersonic” means, we appeared intelligent. If we could do a complex math calculation quickly, we were hailed as geniuses. All these traits were essentially about remembering data and manipulating it. For a very long time, human intelligence has been judged by our memory and by how we process the memory (data) stored in our brains. With the arrival of smartphones and similar technology, the definition of human intelligence is going to change. With Artificial Intelligence doing all the data processing in a gadget as small as our phone, the need to remember or memorize data will not be so meaningful. But we as human beings will have to evolve to a higher level. Maybe knowledge and information will be for machines and wisdom will be for us humans, which is a good thing, as we will evolve in consciousness.
If you are reading this book, I’m sure you have heard that AI is taking over and that AI is going to change the world. Have you ever wondered what AI is? How is AI, or modern computing, different from traditional computing? Let us do a fun activity to understand why we need AI and what types of questions AI tries to solve.
The following table lists 10 famous personalities of the world along with their domain of work and gender.

S. No.  Name                   Domain of Work         Gender
1       Roger Federer          Sports                 M
2       Sachin Tendulkar       Sports                 M
3       Mahatma Gandhi         Leadership             M
4       Steffi Graf            Sports                 F
5       Nelson Mandela         Leadership             M
6       Robert Downey Junior   Art & Movies           M
7       Tom Cruise             Art & Movies           M
8       Steve Jobs             Tech                   M
9       Scarlett Johansson     Art & Movies           F
10      Bill Gates             Tech & Philanthropy    M

Now try to answer the following questions.
1. How many of the above personalities are female?
2. Name the personalities whose domain of work is Tech.
3. Are there any female personalities in the above list whose domain of work is Art & Movies?
4. Who are the top two most popular personalities in the list?
5. Among the above personalities, whose domain of work is the best?
The answers to the first 3 questions are very simple.
1. 2 (Steffi Graf and Scarlett Johansson)
2. Steve Jobs & Bill Gates
3. Yes (Scarlett Johansson)
How about the fourth and fifth questions? Do we have a universal answer to them? No, we don’t. These questions are subjective, not well defined, or vague in nature. For every individual, the definition of a popular personality or of the best domain of work is different.
If we ask 1000 respondents to answer these 5 questions:
When they answer correctly, all 1000 respondents will give the same answers to the first three questions.
We will most probably get different answers to the 4th and 5th questions.

Let us look at the example in a different way. We can get the answers to the first three questions by setting rules. We can write a simple program that scans the domain of work and gender of our personalities, and we will get the answers. So, we have data, we set rules, and we get answers.
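As a sketch of the “data + rules = answers” idea, here is a minimal Python program for the first three questions. The table is typed in by hand as (name, domain, gender) tuples, and the function names are our own, chosen for illustration. Note one assumption: Bill Gates is listed under “Tech & Philanthropy”, so the Tech rule checks whether the domain contains “Tech” rather than whether it equals it.

```python
# The table above as (name, domain of work, gender) tuples.
PERSONALITIES = [
    ("Roger Federer", "Sports", "M"),
    ("Sachin Tendulkar", "Sports", "M"),
    ("Mahatma Gandhi", "Leadership", "M"),
    ("Steffi Graf", "Sports", "F"),
    ("Nelson Mandela", "Leadership", "M"),
    ("Robert Downey Junior", "Art & Movies", "M"),
    ("Tom Cruise", "Art & Movies", "M"),
    ("Steve Jobs", "Tech", "M"),
    ("Scarlett Johansson", "Art & Movies", "F"),
    ("Bill Gates", "Tech & Philanthropy", "M"),
]

def count_female(people):
    """Question 1: count the rows whose gender is F."""
    return sum(1 for _, _, gender in people if gender == "F")

def tech_personalities(people):
    """Question 2: names whose domain mentions Tech ('Tech & Philanthropy' counts)."""
    return [name for name, domain, _ in people if "Tech" in domain]

def female_in_art_and_movies(people):
    """Question 3: female personalities whose domain is Art & Movies."""
    return [name for name, domain, gender in people
            if gender == "F" and domain == "Art & Movies"]

print(count_female(PERSONALITIES))              # 2
print(tech_personalities(PERSONALITIES))        # ['Steve Jobs', 'Bill Gates']
print(female_in_art_and_movies(PERSONALITIES))  # ['Scarlett Johansson']
```

Every rule here is written by us in advance; the program only applies them to the data.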


Now, how can we answer the fourth and fifth questions? The best we can do is ask all our 1000 respondents to vote with their answers. The most common answer becomes the rule. If, for the fifth question, the most common answer is that the best line of work is “Art & Movies”, then that becomes our answer. Do remember that if we change the number of respondents, the answer can also change. So, what we are doing here is: we have data, we give answers to the data, and we get a rule.
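The answer-driven approach can be sketched just as briefly. The votes below are made up purely for illustration (they are not survey results); the point is that the “rule” is simply whichever answer is most common, and that adding respondents can change it.

```python
from collections import Counter

def majority_answer(responses):
    """The most common answer becomes the rule.

    Ties go to the answer encountered first, because Counter.most_common
    keeps first-seen order for equal counts.
    """
    return Counter(responses).most_common(1)[0][0]

# Hypothetical answers to question 5 (illustration only, not real data).
votes = ["Art & Movies", "Sports", "Art & Movies", "Tech", "Sports", "Art & Movies"]
print(majority_answer(votes))       # Art & Movies

# Add two more respondents and the "rule" changes.
more_votes = votes + ["Sports", "Sports"]
print(majority_answer(more_votes))  # Sports
```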

We can see the difference between the two approaches. One is rule-driven, and the other is answer-driven.

A problem for which we cannot set rules to get answers qualifies as a problem that should be solved by Artificial Intelligence.

In the context of AI, problems that we can solve with rules fall under traditional or classical computing, and those for which we can’t set rules are the area of Artificial Intelligence, which I loosely term modern computing.
Do note that rule-based questions can also be solved by artificial intelligence, but it is not worth solving them with AI, as it is computationally expensive. It is like having a pizza, a knife and a sword. You should always use the knife to cut the pizza and not the sword. You can cut the pizza with a sword, but it is not worth it.
Now that we have a bit of understanding about AI, let us use a simple example to see what we mean when we say that everything in AI is built on things we already know.
Suppose you run an OTT platform like Netflix or Amazon Prime and you have three loyal customers: Steve, Natasha and Tony. These customers watch movies and rate them. A fourth customer, Scott, logs into the platform, watches two movies, Iron Man and Jerry Maguire, and rates both. We have to recommend more movies to Scott to keep him hooked to our platform.


How can we do this using the data that we have? If we can figure out which of Steve, Tony and Natasha has movie preferences like Scott’s, then we can recommend to Scott other movies watched by that customer. To our surprise, we can use high school math to do this! Let us see how it works.

Let’s assume the ratings given by the customers are represented in the following table.

Customer     Iron Man   Jerry Maguire
Steve        2          1
Natasha      3          3
Tony         5          4
(Scott logs in)
Scott        4          2



The easiest way to see which of the three customers’ preferences are closest to Scott’s is to subtract the ratings given by Scott from the ratings given by each of the other customers. In other words, we can find the “distance” between Scott’s ratings and those of Steve, Natasha and Tony.
Let us say the rating of Iron Man is represented by x and that of Jerry Maguire by y. The distance can then be calculated by the formula:
$$d = |x_1 - x_2| + |y_1 - y_2|$$
Where:
$x_1$ = Rating of Iron Man by an old customer (Steve, Natasha or Tony)
$x_2$ = Rating of Iron Man by Scott
$y_1$ = Rating of Jerry Maguire by an old customer (Steve, Natasha or Tony)
$y_2$ = Rating of Jerry Maguire by Scott

|K| represents the absolute value of K, which is never negative: it simply drops the sign of K. For example:
|3| = 3
|-3| = 3

Computing distance as the sum of absolute differences like this is known as the Manhattan distance.

Referring to the above table, the Manhattan distance between:
Scott and Steve = |4-2| + |2-1| = 3
Scott and Natasha = |4-3| + |2-3| = 2
Scott and Tony = |4-5| + |2-4| = 3

So, we can see that the Manhattan distance between Natasha and Scott is the lowest of the three; hence Natasha’s movie preferences should be the closest to Scott’s. We can go ahead and recommend to Scott all the other movies that Natasha has watched and rated highly. Chances are that he will like those movies too.
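Here is one way to express the Manhattan-distance calculation and the recommendation step in Python. It is a minimal sketch: the Iron Man and Jerry Maguire ratings come from the table above, but the extra movies (“Movie A” to “Movie F”) and the cut-off of 4 stars for “highly rated” are hypothetical, added only so there is something to recommend.

```python
# Ratings from the table above: (Iron Man, Jerry Maguire).
RATINGS = {
    "Steve": (2, 1),
    "Natasha": (3, 3),
    "Tony": (5, 4),
}
SCOTT = (4, 2)

# Hypothetical: other movies each customer has rated (NOT from the chapter).
OTHER_MOVIES = {
    "Steve": {"Movie A": 5, "Movie B": 2},
    "Natasha": {"Movie C": 5, "Movie D": 4, "Movie E": 1},
    "Tony": {"Movie F": 4},
}

def manhattan(a, b):
    """Sum of absolute differences between corresponding ratings."""
    return sum(abs(x - y) for x, y in zip(a, b))

def closest_customer(target, ratings):
    """Existing customer whose ratings have the smallest Manhattan distance to target."""
    return min(ratings, key=lambda name: manhattan(ratings[name], target))

def recommend(target, ratings, other_movies, min_rating=4):
    """Movies the closest customer rated at least min_rating (assumed 'highly rated' cut-off)."""
    neighbour = closest_customer(target, ratings)
    return [movie for movie, stars in other_movies[neighbour].items() if stars >= min_rating]

for name, scores in RATINGS.items():
    print(name, manhattan(scores, SCOTT))        # Steve 3 / Natasha 2 / Tony 3
print(recommend(SCOTT, RATINGS, OTHER_MOVIES))  # ['Movie C', 'Movie D']
```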



This was the Manhattan distance. Most of us are more familiar with something known as the Euclidean distance, which can also be used to solve the same problem. The Euclidean distance is calculated by the formula:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2}$$
Where:
$x_1$ = Rating of Iron Man by an old customer (Steve, Natasha or Tony)
$x_2$ = Rating of Iron Man by Scott
$y_1$ = Rating of Jerry Maguire by an old customer (Steve, Natasha or Tony)
$y_2$ = Rating of Jerry Maguire by Scott

The Euclidean distance between:
Scott and Steve = √((4-2)² + (2-1)²) = √5 ≈ 2.24
Scott and Natasha = √((4-3)² + (2-3)²) = √2 ≈ 1.41
Scott and Tony = √((4-5)² + (2-4)²) = √5 ≈ 2.24
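The same comparison with the Euclidean distance, as a short Python sketch. Because the function loops over pairs of ratings, it works unchanged for any number of movies. Python 3.8+ also provides `math.dist`, which computes the same quantity for two points.

```python
import math

# Ratings from the table: (Iron Man, Jerry Maguire).
RATINGS = {"Steve": (2, 1), "Natasha": (3, 3), "Tony": (5, 4)}
SCOTT = (4, 2)

def euclidean(a, b):
    """Square root of the sum of squared rating differences (any number of movies)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

for name, scores in RATINGS.items():
    print(name, round(euclidean(scores, SCOTT), 2))  # Steve 2.24 / Natasha 1.41 / Tony 2.24

closest = min(RATINGS, key=lambda name: euclidean(RATINGS[name], SCOTT))
print(closest)  # Natasha
```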

We can see that the Euclidean distance between Natasha and Scott is the lowest. In the case of the Euclidean distance, there is another way of representing the dataset: we can plot it in an X-Y coordinate system.

Here the x-axis represents Iron Man and the y-axis represents Jerry Maguire. Steve has given a rating of 2 to Iron Man and 1 to Jerry Maguire, so his coordinates are (2, 1). The same applies to everyone else’s ratings. Just by looking at the plot, we can see that Natasha and Scott are closest to each other and hence have the most similar movie preferences. We have two movies, so we have a two-dimensional space; if we had three movies, we would move up to a three-dimensional space, and with n movies we move to an n-dimensional space.
By the way, the problem that we just solved is known as collaborative filtering. Essentially, we just built a recommendation engine using collaborative filtering, and all that by just using school math! How cool is that!

Fun Fact
“We studied two distances, namely the Manhattan distance and the Euclidean distance. Both come from a family of distances known as the Minkowski distance. A general formula for the Minkowski distance, written for our two movies, is:
$$d = \left( |x_1 - x_2|^p + |y_1 - y_2|^p \right)^{1/p}$$
(with one such term per movie when there are more movies). In this formula of the Minkowski distance, if:
p = 1, it is known as the Manhattan distance
p = 2, it is known as the Euclidean distance
There are other related distances as well. For example, the Hamming distance, which counts the positions at which two vectors differ, is sometimes described as the p = 0 case, although the formula above is not defined at p = 0. You can look up these other distances, as they are beyond the scope of our book. The most commonly used distance is the Euclidean distance.”
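The Minkowski formula translates directly into code. In this sketch, `minkowski` reproduces the Manhattan result for p = 1 and the Euclidean result for p = 2. We restrict it to p ≥ 1, because for smaller p the result no longer behaves like a proper distance (the triangle inequality can fail). The `hamming` helper is a separate function, included only for comparison.

```python
import math

def minkowski(a, b, p):
    """(|a1 - b1|^p + |a2 - b2|^p + ...)^(1/p) for two equal-length rating vectors."""
    if p < 1:
        raise ValueError("this sketch only accepts p >= 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def hamming(a, b):
    """Number of positions at which the two vectors differ."""
    return sum(x != y for x, y in zip(a, b))

scott, steve, natasha = (4, 2), (2, 1), (3, 3)
print(minkowski(scott, steve, 1))            # 3.0 (Manhattan)
print(round(minkowski(scott, steve, 2), 2))  # 2.24 (Euclidean)
print(hamming(scott, natasha))               # 2
```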

Machine Learning: What does it mean?
At its core, machine learning is the ability of a system to learn on its own without being explicitly programmed. What sets machine learning apart from traditional computing is this “human-like” ability to learn on its own.
As kids, we have all made the mistake of touching something very hot. That burning sensation is unforgettable, and what we learn from the experience is never to touch something hot. In a similar way, when a machine is exposed to data, it remembers that data and makes its decisions based on the memory it gathered from it.


What do we mean when we say a human-like ability to make decisions? Let’s say it is raining heavily outside and a friend comes to our home and asks us to go for a picnic. How do we decide whether it’s worth going for the picnic in heavy rain? We decide based on past experience, right? Put as a process, there are roughly three steps involved:




Recall: Recall what happened in a similar scenario before
Process: Think through the current scenario
Decide: Take a decision
Applying this three-step technique to our picnic decision:





Recall: Whenever it rains very heavily, traffic is hit badly. It happened last time.
Process: It is raining very heavily now, so traffic should be hit badly.
Decide: Let us stay at home and have a hot cup of green tea!

Humans make decisions based on experience. The experience of machines is data.
Machines make decisions based on data.

But how does a machine get experience? Let’s try to understand this from the following example. Suppose we have a dataset of 5 patients with their blood pressure and whether each patient has heart disease or not.


Patient No.   Blood Pressure   Heart Disease
1             High             Yes
2             High             Yes
3             High             Yes
4             High             Yes
5             Normal           No
6             High             ???



Based on these 5 data points, we want to predict the heart condition of a 6th patient who has high blood pressure. We pass the data to the machine and ask for an answer. The machine will scan the data and find a “pattern”: whoever has high blood pressure also has heart disease. So it is highly likely that the machine will tell us the 6th patient has a heart problem, i.e., the answer is Yes.
Notice that we could also have reached this conclusion by using the statistical concept of correlation between blood pressure and heart disease. Statistics and statistical modeling play a very important role in the field of machine learning.
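To see the correlation claim in numbers, we can encode the five labelled patients as 0/1 values (our own encoding choice: High = 1, Normal = 0, Yes = 1, No = 0) and compute the Pearson correlation coefficient by hand. With this data the two columns move together perfectly, so the coefficient comes out as 1. On Python 3.10+, `statistics.correlation` should give the same value.

```python
import math

# Encoding (our assumption): blood pressure High -> 1, Normal -> 0;
# heart disease Yes -> 1, No -> 0. Rows are patients 1 to 5 from the table.
blood_pressure = [1, 1, 1, 1, 0]
heart_disease = [1, 1, 1, 1, 0]

def pearson_correlation(xs, ys):
    """Pearson's r: joint deviation of xs and ys, scaled by each one's spread."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return co_deviation / (spread_x * spread_y)

print(round(pearson_correlation(blood_pressure, heart_disease), 3))  # 1.0
```

A value near +1 means the two columns rise and fall together; near 0 would mean no linear relationship.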
In the context of Artificial Intelligence, which, as we saw earlier, is answer-driven and not rule-driven, it is also important to note that to identify whether the sixth patient has heart disease or not:
We gave the machine answers in the form of the records of 5 patients.
The machine, in response, gave us a rule: as per the data, whoever has high blood pressure also has heart disease.
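The “answers in, rule out” idea can be sketched as a tiny lookup-table learner: for each blood-pressure value, remember the most common outcome in the training records. This majority-vote rule is our own simplification for illustration, not a specific algorithm from the text; real classifiers learn far richer rules, but the input (labelled records) and the output (a rule) have the same shape.

```python
from collections import Counter, defaultdict

# The five labelled patients from the table: (blood pressure, heart disease).
TRAINING = [
    ("High", "Yes"),
    ("High", "Yes"),
    ("High", "Yes"),
    ("High", "Yes"),
    ("Normal", "No"),
]

def learn_rule(rows):
    """Map each blood-pressure value to the outcome seen most often with it."""
    outcomes = defaultdict(Counter)
    for blood_pressure, disease in rows:
        outcomes[blood_pressure][disease] += 1
    return {bp: counts.most_common(1)[0][0] for bp, counts in outcomes.items()}

rule = learn_rule(TRAINING)
print(rule)          # {'High': 'Yes', 'Normal': 'No'}
print(rule["High"])  # Yes, the prediction for the 6th patient
```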

It is important to note that this rule may not be medically correct, but it is correct with respect to the data the machine was exposed to. If the data were different, the outcome would have been different. It is generally believed that the more data a machine has, the better the outcomes.
To sum up, in the above activity we:
“Trained” the machine with a dataset. This dataset is called the training data.
Asked the machine for answers on new data on which it was not trained.

This is, in essence, how machine learning works. You train your algorithm (machine) on large datasets, and the algorithm learns obvious and not-so-obvious (hidden) patterns in them. When you expose the algorithm to a new dataset that it has not seen before, it tries to answer your question on the new dataset based on what it learned from the training data.
In practical scenarios, there is one more step before exposing your algorithm to a new dataset. This is called the testing stage. We break our original dataset into 2 parts, typically in a ratio such as 80:20 or 75:25:
Training Data
Testing Data

You train the algorithm on the training data and validate/test it on the testing data. In the testing data we already have the answers, so first we hide all the answers as if they were not present. We expose our testing data to the model that was built on the training data, and we predict the outcomes for the testing data. Then we compare these predicted outcomes with the actual outcomes that we kept hidden. By comparing the two, we can evaluate:
How accurate is our model?
How big is the error in our model?

Once the algorithm gives good results on the testing data, it is ready to be used on real-life problems.
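The training/testing procedure above can be sketched in Python. Everything here is an assumption for illustration: the ten patient records are made up, the “model” is the simple majority-vote lookup from the patient example, and unseen blood-pressure values default to “No”. The shuffle uses a fixed seed so the split is repeatable; the exact accuracy printed depends on which records land in the test set.

```python
import random
from collections import Counter, defaultdict

def train_test_split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of the rows, then set aside test_ratio of them for testing."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def learn_rule(rows):
    """Majority-vote lookup: each feature value maps to its most common label."""
    outcomes = defaultdict(Counter)
    for feature, label in rows:
        outcomes[feature][label] += 1
    return {value: counts.most_common(1)[0][0] for value, counts in outcomes.items()}

def accuracy(predicted, actual):
    """Fraction of predictions that match the hidden answers."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical records (not from the chapter): (blood pressure, heart disease).
patients = [("High", "Yes")] * 6 + [("Normal", "No")] * 3 + [("High", "No")]

training, testing = train_test_split(patients, test_ratio=0.2)  # 80:20 split
rule = learn_rule(training)

hidden_answers = [label for _, label in testing]         # kept aside
predictions = [rule.get(bp, "No") for bp, _ in testing]  # model sees features only
score = accuracy(predictions, hidden_answers)
print(f"accuracy = {score:.2f}, error rate = {1 - score:.2f}")
```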


Types of Machine Learning
As beginners, we need to know that there are two main types of machine learning:
Supervised Machine Learning
Unsupervised Machine Learning
There is another type of machine learning known as reinforcement learning. Let us leave that for now, as it is outside the scope of this beginner-level introduction.
To understand the difference between supervised and unsupervised machine learning, we have to understand what labelled and unlabelled data are.

Labelled Data and Unlabelled Data

Labelled data has a tag attached to it. The tag can be anything, like a name, a number, a class or a type. Unlabelled data does not have a tag attached to it.
In the above picture, the unlabelled data is just a bunch of fruits (objects). Imagine if we did not know what fruits look like; then for us those would be just a bunch of objects, as there is no description of them available. For a machine (computer), an unlabelled dataset is just a bunch of objects.
On the other hand, the labelled data has a clear classification: those objects are apples and pears. Even if someone doesn't know what an apple or a pear looks like, she can just read the label and understand that one is something called an apple and the other is something called a pear. For a machine, these are not just any objects but 2 distinct types of objects: one is an apple and the other is a pear.


Now that we understand what labelled and unlabelled data are, let us turn our attention to the following 2 points:
Supervised learning is all about working with labelled data.
Unsupervised learning is all about working with unlabelled data.
Let us explore these 2 points in more detail.

Supervised Machine Learning

Supervised machine learning answers questions like the one above. Based on the colour, shape and size, the machine learning algorithm will identify whether a fruit is an apple, a grape or a banana. These attributes, like colour, shape and size, are known as the features of the data.
Consider a credit card transaction. Based on the location, time and amount of the transaction, a machine learning algorithm can try to identify whether the transaction is fraudulent or legitimate. Here a fraudulent transaction is labelled as one and a legitimate transaction is labelled as zero.
Consider another example: x-rays of lungs. A machine learning algorithm will study the patterns in an x-ray and try to predict whether the patient has a lung infection or not. An x-ray of an infected lung is classified as one and an x-ray of a healthy lung is classified as zero.
Consider another example where you are trying to buy a house and want to predict its price. You can take features like the number of bedrooms, area in square feet, nearness to a hospital, location and city, and accordingly predict the price of the house.
All the above examples represent supervised learning, where we know exactly what we are looking for and have a label attached to the data.
Supervised learning can be further classified into 2 types:




Regression
Classification


Regression:
When the label in a supervised learning problem is a continuous value that can be represented on a number line, we say that we are working on a regression problem. The cost of a house is a continuous value; similarly, the price of a company's stock, tomorrow's temperature and the salary of an employee with a particular skill set are all continuous values. All these values can be represented on a number line, hence predicting them is a regression problem.

Classification:
When the label in a supervised learning problem is a discrete value, it is a classification problem. When we are trying to predict whether a patient has a disease or not, the outcome is represented by the discrete values zero and one. When we are trying to predict whether a credit card transaction is fraudulent or not, the outcome is discrete. When we are trying to predict whether a customer is going to buy a product or not, the outcome is again discrete. So all these are classification problems.

Regression vs Classification

In the above figure, if you want to predict the temperature, it can take any value on the number line, so it's a regression problem. If we just want to find out whether tomorrow's weather is going to be rainy or sunny, then it's a classification problem, since the outcome takes 2 discrete values.
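To make the regression/classification difference concrete, here is a small sketch using k-nearest neighbours, a technique we have chosen here because it reuses the Euclidean distance from the movie example. The weather records are hypothetical. Both predictions look at the same three nearest past days; the only difference is what we do with their labels: average them when the label is continuous (temperature) or take a majority vote when it is discrete (rainy or sunny).

```python
import math
from collections import Counter

# Hypothetical past days: ((humidity %, cloud cover %), temperature in °C, weather).
PAST_DAYS = [
    ((80, 90), 24.0, "Rainy"),
    ((85, 95), 23.5, "Rainy"),
    ((40, 10), 31.0, "Sunny"),
    ((35, 20), 32.5, "Sunny"),
    ((75, 80), 25.0, "Rainy"),
]

def nearest_days(features, k=3):
    """The k past days whose features are closest (Euclidean distance) to `features`."""
    return sorted(PAST_DAYS, key=lambda day: math.dist(day[0], features))[:k]

def predict_temperature(features, k=3):
    """Regression: the label is continuous, so average the neighbours' temperatures."""
    neighbours = nearest_days(features, k)
    return sum(temperature for _, temperature, _ in neighbours) / len(neighbours)

def predict_weather(features, k=3):
    """Classification: the label is discrete, so take a majority vote of the neighbours."""
    neighbours = nearest_days(features, k)
    return Counter(weather for _, _, weather in neighbours).most_common(1)[0][0]

today = (78, 85)
print(round(predict_temperature(today), 2))  # 24.17
print(predict_weather(today))                # Rainy
```

Note that the regression output (24.17) need not match any value seen in training, while the classification output is always one of the known labels.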

Binary and Multiclass Classification:
One more distinction in classification is binary versus multiclass classification. When the output is discrete and can have only two outcomes, like yes or no, one or zero, or rainy or sunny, it is a binary classification problem. Predicting whether a patient has a disease or not is a binary classification problem because it can have only 2 outcomes. If we are trying to predict whether a fruit is an apple, a grape or a pineapple, then the problem is a multiclass classification problem because there are more than 2 discrete outcomes. Multiclass classification problems are very common in the field of computer vision, where you need to detect objects in the real world.

Unsupervised Learning:

Now that we are clear about supervised learning, let's talk a bit about unsupervised learning. Imagine a bag that has 3 types of fruit (objects) in it. When we pass pictures of these fruits through an unsupervised machine learning algorithm, the algorithm will create 3 segments: one representing fruit 1, the second fruit 2 and the third fruit 3. We can give them any labels, like A, B and C. The important point to note here is that the machine learning algorithm has not called those fruits apples, grapes and bananas; it has just put them into 3 different categories, namely A, B and C. The labels could just as well have been 0, 1 and 2, or red, green and orange. These 3 segments are created based on the colour, size and shape of the fruit. As stated earlier, colour, shape and size are the features of the objects. This is precisely what unsupervised learning is.
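A common unsupervised algorithm for creating such segments is k-means clustering; the text does not name a specific algorithm, so treat this as one possible sketch. The fruit measurements below (a 0 to 1 “redness” score and a length in cm) are hypothetical, and the algorithm never sees the fruit names. It returns segment numbers 0, 1 and 2, which, just as in the text, mean nothing until a human looks at them. We start from a deterministic farthest-point initialisation so the example is repeatable.

```python
import math

# Hypothetical fruit features: (redness 0-1, length in cm). Names are comments only.
FRUITS = [
    (0.90, 8.0), (0.85, 7.5), (0.95, 8.5),    # apple-like
    (0.60, 2.0), (0.55, 1.8), (0.65, 2.2),    # grape-like
    (0.10, 18.0), (0.15, 19.0), (0.05, 17.5), # banana-like
]

def farthest_point_init(points, k):
    """Start with the first point, then keep adding the point farthest from all chosen ones."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids

def k_means(points, k, iterations=20):
    """Return a segment number (0..k-1) for each point."""
    centroids = farthest_point_init(points, k)
    for _ in range(iterations):
        # Assignment step: each point joins the segment of its nearest centroid.
        labels = [min(range(k), key=lambda i: math.dist(p, centroids[i])) for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, label in zip(points, labels) if label == i]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[i])  # keep an empty segment's centroid
        if new_centroids == centroids:  # nothing moved: converged
            break
        centroids = new_centroids
    return labels

print(k_means(FRUITS, 3))
```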
Unsupervised learning has many use cases in the real world. Imagine that you are a phone manufacturer planning to launch a premium phone. You have a large database of customers, and you would like to target only your premium customers for that phone. If you have the spending patterns of the customers, you can categorise them into premium and non-premium customers. What you need to do is run an unsupervised machine learning algorithm on the customer dataset and ask it to return two different categories based on spending patterns. The feature used for segmentation here is the spending pattern.
Unsupervised learning is often used as a precursor to supervised learning. Imagine you want to find out whether a fruit is an apple or not, but the dataset you have consists of images of 3 different types of fruit. What you can do is run an unsupervised learning algorithm to cluster the images into 3 different segments. You can then label one segment Apple, another Grapes and the third Pineapple. Now that you have labelled apple data, you can take the cluster of images labelled Apple and build a supervised learning algorithm that tries to identify an apple.
Similarly, imagine you are working on a healthcare problem where you need lung X-rays to identify a respiratory problem. However, the dataset you have is a huge repository of MRI and X-ray scans of different body parts, including lungs, liver, brain and kidneys. So, before working on the actual problem, you first need to separate the lung X-rays from the non-lung X-rays. This can be done with unsupervised learning, where the algorithm studies the patterns in the images and creates separate clusters for lung, kidney, brain and liver scans. Now you can take just the lung scans and start working on your problem.

****************More to come Soon*****************


