Video Transcript:

So, we'll start with a quick recap of probability theory. The assignment is also designed to make you go back and read about these things, which you would have done at some point, and I'll just quickly go over the zeroth module, which is a recap of probability theory. That's why I said it is embarrassingly basic. So, axioms of probability: for any event, we know that the probability of the event should be greater than or equal to zero, and if you have the universal set, which contains all possible outcomes, then the probability of the universal set is going to be one.
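As a concrete illustration (my own sketch, not from the lecture), a discrete distribution can be checked against these two axioms; the grade probabilities below are made up:

```python
# Hypothetical grade distribution; the numbers are made up for illustration.
P = {"A": 0.25, "B": 0.45, "C": 0.30}

# Axiom 1: every event has probability >= 0.
assert all(p >= 0 for p in P.values())

# Axiom 2: the universal set (all outcomes taken together) has probability 1.
assert abs(sum(P.values()) - 1.0) < 1e-9
```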
These are the basic axioms of probability. Now, random variables. So, here's the intuition behind random variables, right? Suppose a student can get one of three possible grades: A, B, or C. One way of looking at it is that among all the possible events there are these three events: the student gets grade A, the student gets grade B, or the student gets grade C. There would be students in each of these events, and you're trying to find the probability of each event, right? The other way of looking at it is that you have this set of students, and you have a random variable, which unfortunately is not a variable; it's actually a function, which maps each of these students from your set to a particular value, right? So, that's what a random variable is. A random variable is actually a function, which maps your outcomes to values, right? So, for each of these students we have a function which connects them to one of these three possible grades. That's another way of looking at it: one way was to think of these grades themselves as events; the other way is to think that you have a set which has a lot of outcomes, and each element of the set can be mapped to some value, which is a grade. Okay? So, we will see why this is the better way of doing it. So,
irrespective of whether you take the first view or the second view, everything remains the same. The answers that you are going to get, if I ask you what is the probability of the grade being a certain value, grade A, B, or C, will be the same under either view; that doesn't matter. But the reason we focus on random variables rather than the first view is that you might be interested in several things about a student, right? You might be interested in the heights of different students: how many of them are short, how many of them are tall, and so on. How many are adults, how many are young, and so on. There are various things about a student that you could ask, and each of these random variables actually operates on the same set and maps it to different values, right? So, this view is more modular, or more reusable in that sense, right? You have this set of possible outcomes, and for each of them you are trying to map them to certain values, and these values could be different: grades, height, age, what not, right? So, you could have a random variable for each of the quantities that you are interested in, and then you could ask questions, right? Give me all the outcomes for which the grade is a certain value, the height is a certain value, and the age is a certain value, right? So, the more formal definition is that a random
variable is a function which maps each outcome in your universal set to a value, right? In the previous example, the grade, which in shorthand is represented as the random variable capital G, is the random variable, or the function, which maps each student to one of the three possible grades A, B, and C, right? So, remember, a random variable is a function; it's not a variable. I don't know why it is called a variable, but it is called a variable, okay? And then you could have a random variable which maps students to ages, a random variable which maps them to heights, and so on, right? And the event "grade is equal to A" is actually a shorthand for the following event: give me all those outcomes from my universal set for which, when I apply the function to the outcome, the answer is grade A, right? So, when I say I want the probability of grade equal to A, this is what I actually mean, or if I ask for the set grade equal to A, this is the set that I am looking at. Everyone is fine with this, okay? So, all of you should be comfortable with this definition of a random variable; this is not my definition, just the generic definition, okay? Now, a random variable can either be continuous or discrete, right? Discrete is the example of grades, where you have grades A, B, C, D, and so on, while height, weight, and so on are continuous random variables.
These can take on any real value; they're not discrete. Okay? For this discussion, and for the remaining 30% of the course, we'll be focusing only on discrete random variables unless otherwise mentioned; I don't think we'll ever look at continuous random variables, we'll only focus on discrete random variables, right? Okay? So, now that's what a random variable is. Now that we understand random variables, we can talk about different things related to random variables. The first thing that we can talk about is the
marginal distribution. So, what do we mean by the marginal distribution of a random variable? If I ask you, give me a distribution for the random variable grade, what will you actually give me? What does the marginal distribution in the discrete case actually mean? If I ask you for the marginal distribution of a random variable, what do you need to give me? The probability of each setting of the random variable, right? So, if the random variable can take values A, B, C, suppose the grade can take values A, B, C, then you need to give me the table that you see on the slide, right? The only table which is there. Okay?
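To make this concrete, here is a small sketch (my own illustration, with made-up students and grades): the random variable G is literally a function from outcomes (students) to values, and the marginal table P(G) assigns one probability to each value:

```python
# A random variable is a function from outcomes to values.
# Students and their grades are made up for illustration.
G = {"asha": "A", "bob": "B", "chen": "B", "dev": "C"}

# Assume all students (outcomes) are equally likely.
outcomes = list(G)
P_outcome = {s: 1 / len(outcomes) for s in outcomes}

# The event "G = A" is the set of outcomes the function maps to A.
event_A = {s for s in outcomes if G[s] == "A"}

# The marginal distribution P(G): one probability per value of G.
P_G = {}
for s in outcomes:
    P_G[G[s]] = P_G.get(G[s], 0.0) + P_outcome[s]

print(P_G)  # {'A': 0.25, 'B': 0.5, 'C': 0.25}
```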
And we denote this marginal distribution compactly as P of G. So, when I say P of G, I actually mean this entire vector, or this entire table: P of G equal to A, P of G equal to B, and P of G equal to C, and so on. That's what a marginal distribution is: it specifies a probability for every value that the random variable can take. I know this is very elementary, but it's very important for understanding how many parameters you need to learn for a particular joint distribution or marginal distribution and so on, right? Now, what's a joint distribution? Suppose, in
addition to grade, which can take values A, B, C, you also have this random variable intelligence, which unfortunately can take only two values in our world: high or low. Okay? What is the joint distribution of grade and intelligence? It specifies a probability for every combination of grade and intelligence. So you have this cross product: there are three possible values for grade and two possible values for intelligence, and for each of these six combinations you are going to specify a probability value, right? So, this table that you see is the joint distribution, right? Remember that we are always used to saying that the joint distribution is P of G comma I, right? But that means you have P of G comma I for every value of G and every value of I; that's what you need to specify. Again, I am repeating this because when I ask you to give me a joint distribution, or to learn a joint distribution from a given set of training data, this table is what I expect: I expect you to give me values for all possible combinations of the input random variables. That's why this is important, okay? Now, what's a conditional distribution? So, if I ask you, and this is what we typically write,
I want P of G given I; what does that mean? How many values do I need to give you? Again, assume that G can take three values and I can take two values, right? So, if I ask you to give me this conditional distribution, how many values do I need to give you? Six values, the same as the joint distribution. What will I have to give you? I'll have to give you these tables: given that I is equal to high, what are the probabilities for G equal to A, B, and C; and the other table, given that I is equal to low, what are the probabilities for G equal to A, B, and C, right? Okay? And there's some other simple stuff: this is how you write the conditional distribution, as the joint distribution over the marginal distribution, right? So, this equation actually connects all the things that we have seen so far.
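Under made-up numbers for P(G, I) (not the lecture's table), the relation "conditional = joint / marginal" can be sketched as:

```python
# Made-up joint distribution over grade G in {A,B,C} and intelligence I in {high,low}.
P_GI = {
    ("A", "high"): 0.18, ("B", "high"): 0.15, ("C", "high"): 0.07,
    ("A", "low"):  0.07, ("B", "low"):  0.23, ("C", "low"):  0.30,
}

# Marginal of I: sum the joint over G.
P_I = {}
for (g, i), p in P_GI.items():
    P_I[i] = P_I.get(i, 0.0) + p

# Conditional P(G | I) = P(G, I) / P(I): one 3-value table per value of I.
P_G_given_I = {(g, i): p / P_I[i] for (g, i), p in P_GI.items()}

print(round(P_G_given_I[("A", "high")], 3))  # 0.18 / 0.40 = 0.45
```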
The joint distribution is the conditional distribution times the marginal distribution; is that fine? Okay? Fine. So, you should be comfortable with this: if I ask you to give me a joint distribution, and I tell you how many values my random variables can take, you should be able to tell me how many parameters I need to specify that distribution. That's what this basic material is meant to stimulate you to do, okay?
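A quick parameter count (my own sketch, using the grade/intelligence example): if G takes 3 values and I takes 2, the joint table needs 3 × 2 = 6 numbers; in general, n variables with K values each need K to the power n:

```python
# Number of entries needed to specify each kind of table.
K_G, K_I = 3, 2  # grade: A/B/C; intelligence: high/low

marginal_G = K_G                   # P(G): one value per setting of G
joint_GI = K_G * K_I               # P(G, I): every combination
conditional_G_given_I = K_G * K_I  # P(G | I): a K_G-entry table per value of I

print(marginal_G, joint_GI, conditional_G_given_I)  # 3 6 6

# In general: n variables with K values each -> K**n joint entries.
n, K = 10, 2
print(K ** n)  # 1024
```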
Fine. And what's the joint distribution of n random variables? How many values do I need to give you? If each of these random variables can take K values, how many values will the joint distribution have? K to the power n, right? And you're used to this because you have done a lot of logic, right? Where you assume Boolean variables, and for all combinations you try to write down some truth table and solve it; it's very similar to that. In other words, the joint distribution assigns P of X1 equal to x1, X2 equal to x2, and so on, for all possible values that each variable Xi can take, okay? And if each random variable can take two values, you'll have two raised to the n entries in the joint distribution, okay? And the other thing is: just as for two random
variables, you could write the joint distribution as a product of a conditional and a marginal, how do you write the joint distribution of n random variables? So, I am going to start using some terminology: the joint distribution of two random variables factorizes as a conditional distribution times a marginal distribution. What about the joint distribution of n random variables? What's the one rule which has stayed with us so far and will continue to stay with us? The chain rule, right? So, again we'll have the chain rule here: you can assume that all of these variables except X1 are clubbed together, given X1, times the probability of X1; that's the same as this form, right? And then you just keep doing this recursively till you get the following, right? P(X1, ..., Xn) = P(X1) P(X2 | X1) P(X3 | X1, X2) ... P(Xn | X1, ..., Xn-1): the ith variable depends on all the i minus 1 variables before it, and you have a product of all of these, right? Fine, this is known as the chain rule, and you can clearly see that the two-variable case is just a special case of this form, right? So, just be very comfortable with the chain rule; this is going to be very important when you are talking about various things in directed graphical models, or undirected graphical models, or whatnot, right? So, it's very essential that you completely understand the chain rule, and maybe I'll get back to it later, okay? So, now from joint distributions to marginal
distributions. Suppose I'm given the joint distribution over two random variables A and B, okay? So, the first table that you see here, what kind of a distribution is it? Joint, conditional, marginal? Joint distribution. Now, from here, I want to find the marginal distributions of A and of B. What does that actually mean? What am I given, and what am I asking for? P of A and P of B. So how do I get the marginal distribution from the joint distribution? Sum over what? Okay? Fine. So, first of all, if I have to give you the marginal distribution of A, how many values do I need to give you? Two values; I'm assuming that all my random variables are binary, so two values. So, from the joint distribution, how will I get these two values? I'll sum up two rows: I'll keep the value of A the same and sum over the values of B, and the same for the other, right? This is again straightforward, all of you know it, but just be comfortable with the fact that you can obtain the marginal distribution from the joint distribution by summing over the variables which are not of interest, right? So, when you want P of A, you sum over the values of B; when you want P of B, you sum over the values of A, okay? And this is written more compactly, right? So, this is like, for all possible values that B can take, you are going to sum, but compactly this is how we write it, right? We often suppress the value assignment and just talk about P of A comma B, okay?
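Summing out the variables you don't care about can be sketched like this (a made-up joint over two binary variables, my own numbers):

```python
# Made-up joint distribution over two binary variables A and B.
P_AB = {
    (0, 0): 0.1, (0, 1): 0.3,
    (1, 0): 0.2, (1, 1): 0.4,
}

# Marginal P(A): keep the value of A fixed and sum over the values of B.
P_A = {a: sum(P_AB[(a, b)] for b in (0, 1)) for a in (0, 1)}
# Marginal P(B): sum over the values of A.
P_B = {b: sum(P_AB[(a, b)] for a in (0, 1)) for b in (0, 1)}

print({a: round(p, 3) for a, p in P_A.items()})  # {0: 0.4, 1: 0.6}
```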
Now, from here, if you are given n random variables, how are you going to find the marginal distribution from this joint distribution? Sum over all the other variables, right? So, do you see a problem with this summation? You do see a problem with this summation, right? There's a problem with the basic joint distribution itself; we'll come back to it, but if you just vaguely appreciate it at this point, that's fine, we'll come back to it in a few more slides, okay? So, even if you are given n random variables and a joint distribution, you can get the marginal distribution for each of these n random variables by summing over all those other variables that you don't care about, okay? Fine, and again this is written more compactly as this. What is independence? When do I say that a variable X is independent of the variable Y? In terms of probability, what's the equation that you write? P of X given Y is equal to P of X: knowing the value of Y does not change your belief about X. That's the English way of saying it, right? And we denote this as X independent of Y; this is a standard notation. Again, we would expect the grade to be dependent on intelligence, but perhaps not dependent on weight or height or something; there is probably no connection between them, okay? And recall that by the chain rule for two variables, we have P of X comma Y is equal to P of X into P of Y given X. So, what will this simplify to? The combination of the chain rule and the independence definition gives you P of X comma Y equal to P of X into P of Y for the joint distribution of two variables, if the variables are independent, okay? Fine. So, that's all the basic stuff from
probability that we need. I would encourage you to go back and just be comfortable with all of this. And with this, we can now start discussing Directed Graphical Models.
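The two closing facts of this recap, the chain rule factorization and the simplification under independence, can be checked numerically on a made-up joint (a uniform distribution over three bits, chosen for simplicity, which happens to make the variables independent):

```python
import itertools

# Made-up joint over three binary variables; uniform, so every entry is 1/8.
P = {xs: 1 / 8 for xs in itertools.product((0, 1), repeat=3)}

def marg(P, keep):
    """Marginalize the joint onto the index positions listed in `keep`."""
    out = {}
    for xs, p in P.items():
        key = tuple(xs[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

P1 = marg(P, [0])       # P(X1)
P12 = marg(P, [0, 1])   # P(X1, X2)
P2 = marg(P, [1])       # P(X2)

# Chain rule: P(x1, x2, x3) = P(x1) * P(x2 | x1) * P(x3 | x1, x2).
for (x1, x2, x3), p in P.items():
    chain = P1[(x1,)] * (P12[(x1, x2)] / P1[(x1,)]) * (p / P12[(x1, x2)])
    assert abs(chain - p) < 1e-12

# Independence: if X1 is independent of X2, then P(X1, X2) = P(X1) * P(X2).
for x1, x2 in itertools.product((0, 1), repeat=2):
    assert abs(P12[(x1, x2)] - P1[(x1,)] * P2[(x2,)]) < 1e-12
```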