Video Transcript:

So, we'll start with a quick recap of probability theory. The assignment is also designed to make you go back and read about these things, which you would have done at some point, and I'll just quickly go over the zeroth module, which is a recap of probability theory. That's why I said it is embarrassingly basic. So, axioms of probability: for any event, we know that the probability of the event should be greater than or equal to zero, and if you have the universal set, which contains all possible outcomes, then the probability of the universal set is going to be one.
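As a concrete illustration (my own sketch, not from the lecture), a discrete distribution can be checked against these two axioms; the grade probabilities below are made up:

```python
# Hypothetical grade distribution; the numbers are made up for illustration.
P = {"A": 0.25, "B": 0.45, "C": 0.30}

# Axiom 1: every event has probability >= 0.
assert all(p >= 0 for p in P.values())

# Axiom 2: the universal set (all outcomes taken together) has probability 1.
assert abs(sum(P.values()) - 1.0) < 1e-9
```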
These are the basic axioms of probability. Now, random variables. So, here's the intuition behind random variables, right? Suppose a student can get one of three possible grades: A, B, or C. One way of looking at it is that among all the possible events there are these three events: the student gets grade A, the student gets grade B, or the student gets grade C. There would be students in each of these events, and you're trying to find the probability of each event, right? The other way of looking at it is that you have this set of students, and you have a random variable, which unfortunately is not a variable; it's actually a function, which maps each of these students from your set to a particular value, right? So, that's what a random variable is. A random variable is actually a function, which maps your outcomes to values, right? So, for each of these students we have a function which connects them to one of these three possible grades. That's another way of looking at it: one way was to think of these grades themselves as events; the other way is to think that you have a set which has a lot of outcomes, and each element of the set can be mapped to some value, which is a grade. Okay? So, we will see why this is the better way of doing it. So,
irrespective of whether you take the first view or the second view, everything remains the same. The answers that you are going to get, if I ask you what is the probability of the grade being a certain value, grade A, B, or C, will be the same under either view; that doesn't matter. But the reason we focus on random variables rather than the first view is that you might be interested in several things about a student, right? You might be interested in the heights of different students: how many of them are short, how many of them are tall, and so on. How many are adults, how many are young, and so on. There are various things about a student that you could ask, and each of these random variables actually operates on the same set and maps it to different values, right? So, this view is more modular, or more reusable in that sense, right? You have this set of possible outcomes, and for each of them you are trying to map them to certain values, and these values could be different: grades, height, age, what not, right? So, you could have a random variable for each of the quantities that you are interested in, and then you could ask questions, right? Give me all the outcomes for which the grade is a certain value, the height is a certain value, and the age is a certain value, right? So, the more formal definition is that a random
variable is a function which maps each outcome in your universal set to a value, right? In the previous example, the grade, which in shorthand is represented as the random variable capital G, is the random variable, or the function, which maps each student to one of the three possible grades A, B, and C, right? So, remember, a random variable is a function; it's not a variable. I don't know why it is called a variable, but it is called a variable, okay? And then you could have a random variable which maps students to ages, a random variable which maps them to heights, and so on, right? And the event "grade is equal to A" is actually a shorthand for the following event: give me all those outcomes from my universal set for which, when I apply the function to the outcome, the answer is grade A, right? So, when I say I want the probability of grade equal to A, this is what I actually mean, or if I ask for the set grade equal to A, this is the set that I am looking at. Everyone is fine with this, okay? So, all of you should be comfortable with this definition of a random variable; this is not my definition, just the generic definition, okay? Now, a random variable can either be continuous or discrete, right? Discrete is the example of grades, where you have grades A, B, C, D, and so on, while height, weight, and so on are continuous random variables.
These can take on any real value; they're not discrete. Okay? For this discussion, and for the remaining 30% of the course, we'll be focusing only on discrete random variables unless otherwise mentioned; I don't think we'll ever look at continuous random variables, we'll only focus on discrete random variables, right? Okay? So, now that's what a random variable is. Now that we understand random variables, we can talk about different things related to random variables. The first thing that we can talk about is the
marginal distribution. So, what do we mean by the marginal distribution of a random variable? If I ask you, give me a distribution for the random variable grade, what will you actually give me? What does the marginal distribution in the discrete case actually mean? If I ask you for the marginal distribution of a random variable, what do you need to give me? The probability of each setting of the random variable, right? So, if the random variable can take values A, B, C, suppose the grade can take values A, B, C, then you need to give me the table that you see on the slide, right? The only table which is there. Okay?
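To make this concrete, here is a small sketch (my own illustration, with made-up students and grades): the random variable G is literally a function from outcomes (students) to values, and the marginal table P(G) assigns one probability to each value:

```python
# A random variable is a function from outcomes to values.
# Students and their grades are made up for illustration.
G = {"asha": "A", "bob": "B", "chen": "B", "dev": "C"}

# Assume all students (outcomes) are equally likely.
outcomes = list(G)
P_outcome = {s: 1 / len(outcomes) for s in outcomes}

# The event "G = A" is the set of outcomes the function maps to A.
event_A = {s for s in outcomes if G[s] == "A"}

# The marginal distribution P(G): one probability per value of G.
P_G = {}
for s in outcomes:
    P_G[G[s]] = P_G.get(G[s], 0.0) + P_outcome[s]

print(P_G)  # {'A': 0.25, 'B': 0.5, 'C': 0.25}
```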
And we denote this marginal distribution compactly as P of G. So, when I say P of G, I actually mean this entire vector, or this entire table: P of G equal to A, P of G equal to B, and P of G equal to C, and so on. That's what a marginal distribution is: it specifies a probability for every value that the random variable can take. I know this is very elementary, but it's very important for understanding how many parameters you need to learn for a particular joint distribution or marginal distribution and so on, right? Now, what's a joint distribution? Suppose, in
addition to grade, which can take values A, B, C, you also have this random variable intelligence, which unfortunately can take only two values in our world: high or low. Okay? What is the joint distribution of grade and intelligence? It specifies a probability for every combination of grade and intelligence. So you have this cross product: there are three possible values for grade and two possible values for intelligence, and for each of these six combinations you are going to specify a probability value, right? So, this table that you see is the joint distribution, right? Remember that we are always used to saying that the joint distribution is P of G comma I, right? But that means you have P of G comma I for every value of G and every value of I; that's what you need to specify. Again, I am repeating this because when I ask you to give me a joint distribution, or to learn a joint distribution from a given set of training data, this table is what I expect: I expect you to give me values for all possible combinations of the input random variables. That's why this is important, okay? Now, what's a conditional distribution? So, if I ask you, and this is what we typically write,
I want P of G given I; what does that mean? How many values do I need to give you? Again, assume that G can take three values and I can take two values, right? So, if I ask you to give me this conditional distribution, how many values do I need to give you? Six values, the same as the joint distribution. What will I have to give you? I'll have to give you these tables: given that I is equal to high, what are the probabilities for G equal to A, B, and C; and the other table, given that I is equal to low, what are the probabilities for G equal to A, B, and C, right? Okay? And there's some other simple stuff: this is how you write the conditional distribution, as the joint distribution over the marginal distribution, right? So, this equation actually connects all the things that we have seen so far.
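Under made-up numbers for P(G, I) (not the lecture's table), the relation "conditional = joint / marginal" can be sketched as:

```python
# Made-up joint distribution over grade G in {A,B,C} and intelligence I in {high,low}.
P_GI = {
    ("A", "high"): 0.18, ("B", "high"): 0.15, ("C", "high"): 0.07,
    ("A", "low"):  0.07, ("B", "low"):  0.23, ("C", "low"):  0.30,
}

# Marginal of I: sum the joint over G.
P_I = {}
for (g, i), p in P_GI.items():
    P_I[i] = P_I.get(i, 0.0) + p

# Conditional P(G | I) = P(G, I) / P(I): one 3-value table per value of I.
P_G_given_I = {(g, i): p / P_I[i] for (g, i), p in P_GI.items()}

print(round(P_G_given_I[("A", "high")], 3))  # 0.18 / 0.40 = 0.45
```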
The joint distribution is the conditional distribution times the marginal distribution; is that fine? Okay? Fine. So, you should be comfortable with this: if I ask you to give me a joint distribution, and I tell you how many values my random variables can take, you should be able to tell me how many parameters I need to specify that distribution. That's what this basic material is meant to stimulate you to do, okay?
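A quick parameter count (my own sketch, using the grade/intelligence example): if G takes 3 values and I takes 2, the joint table needs 3 × 2 = 6 numbers; in general, n variables with K values each need K to the power n:

```python
# Number of entries needed to specify each kind of table.
K_G, K_I = 3, 2  # grade: A/B/C; intelligence: high/low

marginal_G = K_G                   # P(G): one value per setting of G
joint_GI = K_G * K_I               # P(G, I): every combination
conditional_G_given_I = K_G * K_I  # P(G | I): a K_G-entry table per value of I

print(marginal_G, joint_GI, conditional_G_given_I)  # 3 6 6

# In general: n variables with K values each -> K**n joint entries.
n, K = 10, 2
print(K ** n)  # 1024
```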
Fine. And what's the joint distribution of n random variables? How many values do I need to give you? If each of these random variables can take K values, how many values will the joint distribution have? K to the power n, right? And you're used to this because you have done a lot of logic, right? Where you assume Boolean variables, and for all combinations you try to write down some truth table and solve it; it's very similar to that. In other words, the joint distribution assigns P of X1 equal to x1, X2 equal to x2, and so on, for all possible values that each variable Xi can take, okay? And if each random variable can take two values, you'll have two raised to the n entries in the joint distribution, okay? And the other thing is: just as for two random
variables, you could write the joint distribution as a product of a conditional and a marginal, how do you write the joint distribution of n random variables? So, I am going to start using some terminology: the joint distribution of two random variables factorizes as a conditional distribution times a marginal distribution. What about the joint distribution of n random variables? What's the one rule which has stayed with us so far and will continue to stay with us? The chain rule, right? So, again we'll have the chain rule here: you can assume that all of these variables except X1 are clubbed together, given X1, times the probability of X1; that's the same as this form, right? And then you just keep doing this recursively till you get the following, right? P(X1, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) ... P(Xn | X1, ..., Xn-1): the ith variable depends on all the i minus 1 variables before it, and you have a product of all of these, right? Fine, this is known as the chain rule, and you can clearly see that the two-variable case is just a special case of this form, right? So, just be very comfortable with the chain rule; this is going to be very important when you are talking about various things in directed graphical models, or undirected graphical models, or whatnot, right? So, it's very essential that you completely understand the chain rule, and maybe I'll get back to it later, okay? So, now from joint distributions to marginal
distributions. Suppose I'm given the joint distribution over two random variables A and B, okay? So, the first table that you see here, what kind of a distribution is it? Joint, conditional, marginal? Joint distribution. Now, from here, I want to find the marginal distributions of A and of B. What does that actually mean? What am I given, and what am I asking for? P of A and P of B. So how do I get the marginal distribution from the joint distribution? Sum over what? Okay? Fine. So, first of all, if I have to give you the marginal distribution of A, how many values do I need to give you? Two values; I'm assuming that all my random variables are binary, so two values. So, from the joint distribution, how will I get these two values? I'll sum up two rows: I'll keep the value of A the same and sum over the values of B, and the same for the other, right? This is again straightforward, all of you know it, but just be comfortable with the fact that you can obtain the marginal distribution from the joint distribution by summing over the variables which are not of interest, right? So, when you want P of A, you sum over the values of B; when you want P of B, you sum over the values of A, okay? And this is written more compactly, right? So, this is like, for all possible values that B can take, you are going to sum, but compactly this is how we write it, right? We often suppress the value assignment and just talk about P of A comma B, okay?
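Summing out the variables you don't care about can be sketched like this (a made-up joint over two binary variables, my own numbers):

```python
# Made-up joint distribution over two binary variables A and B.
P_AB = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

# Marginal P(A): keep the value of A fixed and sum over the values of B.
P_A = {a: sum(P_AB[(a, b)] for b in (0, 1)) for a in (0, 1)}
# Marginal P(B): sum over the values of A.
P_B = {b: sum(P_AB[(a, b)] for a in (0, 1)) for b in (0, 1)}

print({a: round(p, 3) for a, p in P_A.items()})  # {0: 0.4, 1: 0.6}
```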
Now, from here, if you are given n random variables, how are you going to find the marginal distribution from this joint distribution? Sum over all the other variables, right? So, do you see a problem with this summation? You do see a problem with this summation, right? There's a problem with the basic joint distribution itself; we'll come back to it, but if you just vaguely appreciate it at this point, that's fine, we'll come back to it in a few more slides, okay? So, even if you are given n random variables and a joint distribution, you can get the marginal distribution for each of these n random variables by summing over all those other variables that you don't care about, okay? Fine, and again this is written more compactly as this. What is independence? When do I say that a variable X is independent of the variable Y? In terms of probability, what's the equation that you write? P of X given Y is equal to P of X: knowing the value of Y does not change your belief about X. That's the English way of saying it, right? And we denote this as X independent of Y; this is a standard notation. Again, we would expect the grade to be dependent on intelligence, but perhaps not dependent on weight or height or something; there is probably no connection between them, okay? And recall that by the chain rule for two variables, we have P of X comma Y is equal to P of X into P of Y given X. So, what will this simplify to? The combination of the chain rule and the independence definition gives you P of X comma Y equal to P of X into P of Y for the joint distribution of two variables, if the variables are independent, okay? Fine. So, that's all the basic stuff from
probability that we need. I would encourage you to go back and just be comfortable with all of this. And with this, we can now start discussing Directed Graphical Models.
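The two closing facts of this recap, the chain rule factorization and the simplification under independence, can be checked numerically on a made-up joint (a uniform distribution over three bits, chosen for simplicity, which happens to make the variables independent):

```python
import itertools

# Made-up joint over three binary variables; uniform, so every entry is 1/8.
P = {xs: 1 / 8 for xs in itertools.product((0, 1), repeat=3)}

def marg(P, keep):
    """Marginalize the joint onto the index positions listed in `keep`."""
    out = {}
    for xs, p in P.items():
        key = tuple(xs[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

P1 = marg(P, [0])       # P(X1)
P12 = marg(P, [0, 1])   # P(X1, X2)
P2 = marg(P, [1])       # P(X2)

# Chain rule: P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2).
for (x1, x2, x3), p in P.items():
    chain = P1[(x1,)] * (P12[(x1, x2)] / P1[(x1,)]) * (p / P12[(x1, x2)])
    assert abs(chain - p) < 1e-12

# Independence: if X1 is independent of X2, then P(X1, X2) = P(X1) * P(X2).
for x1, x2 in itertools.product((0, 1), repeat=2):
    assert abs(P12[(x1, x2)] - P1[(x1,)] * P2[(x2,)]) < 1e-12
```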