In this lecture, we will discuss the fundamental operations of image processing and give an overview of image processing.
So, let us first understand how images are represented in a computer. Consider an image displayed on your screen, and take a small portion of it, as shown by this rectangle. If I zoom into this portion, you will find an enlarged view of that part of the image. You can see that some of the details are more visible here, but certain rectangular regions of pixels also become visible. If I zoom further into this area, you will observe small squares of uniform brightness, whose values are shown here by the numbers inside those squares. So, finally, as you can see, at the very bottom level of representation every point of the image has a number, and that number represents the brightness value at that particular point. While displaying it on a screen, however, each point is represented by a small area, and this brightness value is shown over that area.
So, the image is represented as a 2D array of integers; these numbers are those integers. As you know, for a two-dimensional array you also need to specify the array size, and that size is given by the width and height of the image. For example, in this case the width of the image is 256, which means there are 256 points along its width, and the height is 384, so there are 384 points along its height.
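The representation described above can be sketched in a few lines of Python; this is my illustrative example, not code from the lecture, and the dimensions are small toy values rather than the 256 by 384 image discussed here.

```python
# A grayscale image as a 2D array of integers: height rows, width columns.
width, height = 4, 3

# Each entry is a brightness value; 0 is black, 255 is white.
image = [
    [ 12,  50, 200, 255],
    [  0,  90, 180, 240],
    [ 30, 120, 160, 210],
]

assert len(image) == height      # number of rows equals the height
assert len(image[0]) == width    # number of columns equals the width
print(image[1][2])               # brightness value at row 1, column 2
```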
When we represent a colour image, there are three such two-dimensional arrays, one for each of the primary colours: red, green and blue. If you consider any particular point, the corresponding array elements with the same array indices together represent a colour as a combination of these three primaries. So, you have a red component of the image, which I can display using only red on the screen; a green component, displayed here in green; and a blue component, displayed in blue. Once we superimpose all these components on a screen, we get the colour representation of the image itself. So, a colour image in this case is represented by three two-dimensional arrays.
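As a small sketch of this three-array representation (my own toy example, not from the lecture), note how the same indices into all three arrays give one colour:

```python
# A 2x2 colour image as three 2D arrays, one brightness value per channel.
red   = [[255,   0], [128,  64]]
green = [[  0, 255], [128,  64]]
blue  = [[  0,   0], [128, 255]]

def pixel_colour(r, c):
    """Combine the three channel values at the same array indices."""
    return (red[r][c], green[r][c], blue[r][c])

print(pixel_colour(0, 0))  # a pure red pixel
print(pixel_colour(1, 0))  # equal channels give a mid grey
```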
When this information is stored on a computer hard disk, that is, in secondary storage, you know that any information in secondary storage is stored as a file. Similarly, an image has to be stored as a file, and a file consists of a stream of bytes. In this case, every byte, or a collection of bytes, represents a pixel, and we can consider the file a stream of pixels.
However, to work with the image in your program, you require other associated information about that image, which also has to be stored with this stream. Usually it is stored ahead of the stream in a predefined format, which is called the header of the image file. For example, a header should contain the width, the height, and the number of components, which for a colour image would be three, one per colour channel.
It should also give the number of bytes per pixel; as I mentioned, a pixel could be represented by several bytes, but in the most elementary representation it is usually 1 byte per pixel, with unsigned integer values varying from 0 to 255. Of course, there should also be an end-of-file marker.
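To make the header-plus-pixel-stream idea concrete, here is a sketch with an invented header layout: four little-endian 16-bit unsigned integers for width, height, number of components, and bytes per pixel, followed by the pixel bytes. This layout is hypothetical, made up for illustration only; real formats such as BMP or TIFF define their own header structures.

```python
import struct

# Hypothetical header: width, height, components, bytes per pixel,
# each a little-endian 16-bit unsigned integer (8 bytes total).
HEADER = struct.Struct("<4H")

def write_image(width, height, components, bytes_per_pixel, pixels):
    """Pack the header ahead of the pixel stream."""
    return HEADER.pack(width, height, components, bytes_per_pixel) + bytes(pixels)

def read_image(data):
    """Parse the header, then slice out the pixel stream."""
    width, height, components, bpp = HEADER.unpack_from(data, 0)
    pixels = data[HEADER.size:]
    return width, height, components, pixels

data = write_image(2, 2, 1, 1, [0, 64, 128, 255])
w, h, c, pix = read_image(data)
print(w, h, c, list(pix))
```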
There are different standard file formats available for image representation; they are well standardized and their documentation is available. So, when you get an image file in one of these formats, you should know the corresponding format, parse the header, and extract the image. Some example formats are TIFF, BMP, GIF, etc.
So, let us consider how an image is formed in an optical camera. Consider a camera, represented here, whose lens plays a very critical role. There is a plane where the points of the three-dimensional scene are projected onto a two-dimensional plane; that is usually the focal plane of the lens. The image is formed there, and those points are finally sensed by corresponding sensors and digitized, giving you a digital representation.
So, let us see how a point in three-dimensional space is mapped to an image point. Consider a point P from which light is reflected; that light passes through the centre of the lens and intersects the plane of projection, or image plane. The point represented here by lowercase p is the image of the scene point represented by capital P. So, in this case image formation has taken place due to the phenomenon of reflection, as you can see.
There is another piece of information encoded at this image point: the amount of energy received there, reflected from the scene point P. That is the action of the lens: it tries to collect as much of the reflected energy as possible and concentrate it on the image point p, which is what is called focusing. That is why you get a very sharp picture when the lens is properly focused: a sharp point representation of the scene point.
So, this is another encoding: the amount of energy reflected from the scene point is received and sensed at the image point. The interpretation of the image, then, is that it holds a brightness distribution over this two-dimensional plane, where the value at each point is proportional to the amount of energy reflected from its corresponding scene point.
Let us look at this more closely once again: the rule of projection I mentioned, which provides a very simple mathematical tool to compute, given a point P in the three-dimensional world, its corresponding image point in the two-dimensional plane. As we can see, the image point is formed if I draw a line from the point P through a particular fixed point O, here the centre of the lens, and extend that ray until it hits the image plane; the intersection point defines the image point of the three-dimensional scene point.
So, O is the centre of projection, as I mentioned, and this is the image plane. We can summarize the rule as follows: the image point is formed by the intersection of the ray from a point P, passing through the centre of projection O, with the image plane. This kind of projection is known as perspective projection.
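In the standard pinhole model, with the centre of projection at the origin and the image plane at distance f along the optical axis, this rule reduces to dividing by depth. The lecture does not write the formula out, so the following is my sketch under those standard assumptions:

```python
def perspective_project(P, f=1.0):
    """Project the 3D point P = (X, Y, Z) through the centre of
    projection at the origin onto an image plane at distance f:
    x = f*X/Z, y = f*Y/Z (the standard pinhole equations)."""
    X, Y, Z = P
    if Z == 0:
        raise ValueError("point lies in the plane of the centre of projection")
    return (f * X / Z, f * Y / Z)

# All points on the same ray through O map to the same image point.
print(perspective_project((2.0, 4.0, 4.0)))  # (0.5, 1.0)
print(perspective_project((1.0, 2.0, 2.0)))  # (0.5, 1.0)
```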
But this is not the only way images are formed; there are other imaging principles, other rules of projection. For example, here I am showing an image of a cube where one of its faces has been projected onto the plane. As you can see, all these points are projected in parallel onto the plane; there is a particular direction of projection that has to be specified, which could be the normal to the plane or any other direction in three-dimensional space.
We can summarize this rule as follows: image points are formed by the intersection of parallel rays with the image plane. One example of this kind of imaging is X-ray imaging, where parallel X-ray beams pass through our body, through bones and tissues, and then strike the X-ray plate, which acts like an image plane in this case and forms the image. This projection is known as parallel projection.
Let us take another imaging principle. Suppose you have the surface of an object, and your imaging sensor has a transmitter, which transmits an electromagnetic or acoustic wave; the reflected wave is then received by a receiver. The time interval between transmission and reception can be measured, and if you know the velocity of the wave, you can compute the corresponding distance to that surface point. You can scan radially, taking a measurement at every regular interval over the surface points, or you can translate the transmitter-receiver pair along certain directions and repeat the operation. So, for every surface point along that path you get a distance. And not only can you measure distance: the amount of reflection you get from the surface also tells you about the orientation and the material properties of the surface.
One example is the echocardiogram, where acoustic waves, ultrasound waves, are used; so this is one such example.
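The distance computation described above is just velocity times half the round-trip time, since the wave travels to the surface and back. A minimal sketch (the tissue speed used below is a typical textbook value, not a figure from the lecture):

```python
def range_from_echo(time_interval_s, wave_speed_m_s):
    """Distance to the surface from the round-trip time of a pulse:
    the wave covers the distance twice, hence the division by two."""
    return wave_speed_m_s * time_interval_s / 2.0

# Ultrasound in soft tissue travels at roughly 1540 m/s; a 0.1 ms
# round trip then corresponds to a depth of about 7.7 cm.
print(range_from_echo(1e-4, 1540.0))
```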
So, to summarize, what is an image; how do I define an image? In a very short sentence, we can say it is an impression of the physical world. To elaborate a little, we can say that it is a spatial distribution of a measurable quantity encoding the geometry and material properties of objects.
Now I will discuss a few concepts and operations from image processing. In this course we will require some of these concepts; as I mentioned earlier, you are not required to have taken a first-level image processing course to attend this computer vision course, and these are the primers that I will be discussing here. However, it would be good to follow some image processing textbooks to learn more details about these concepts.
So, let us first consider a very simple concept: the first-level statistics of the distribution of pixel values, which can be captured in the form of a frequency distribution of the brightness values in a particular image. Here I have shown an image of a scanned page; we call this class of images document images. Once again, in this image you have a brightness value at every pixel.
As you can see, there are mostly two types of pixels: one depicts the text of the document, and the other belongs to the background. Usually, in the histogram, you would expect to obtain a bimodal kind of characteristic. But in this case, since there are so many white pixels, the histogram is more skewed, and the distribution in the text zone looks a little flat.
We will come back to how this could be processed further to make it more bimodal, but for now let us concentrate on this fact: an image histogram is nothing but the frequency distribution of the brightness values. And from this frequency distribution, you can get the probability distribution; you can convert it into a probability distribution of the brightness values.
If I normalize the histogram, that is, divide all the frequencies by the total number of pixels of the image, then I get the probability distribution of the brightness values.
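The histogram and its normalization can be sketched as follows; this is my illustrative version on a tiny toy image, not code from the lecture.

```python
def histogram(image, levels=256):
    """Frequency distribution of brightness values in a 2D image."""
    h = [0] * levels
    for row in image:
        for v in row:
            h[v] += 1
    return h

def normalize(hist):
    """Divide each frequency by the total pixel count to obtain the
    probability distribution of brightness values."""
    n = sum(hist)
    return [f / n for f in hist]

image = [[0, 0, 255], [255, 255, 128]]
h = histogram(image)
p = normalize(h)
print(h[0], h[128], h[255])  # frequencies of the three occupied levels
print(p[255])                # 3 of 6 pixels have value 255
```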
One of the problems of document image analysis is to separate the foreground from the background, and this process is called binarization. One of the simple techniques of binarization is to use a threshold value to decide whether a pixel is foreground or background.
In our present context, in the example I have given here, the foreground is the dark pixels and the background is the bright pixels, the white region of the document. After binarization, each pixel is set to one of two values; for example, we can let 255 represent the white region and 0 represent the text portion, the dark pixels.
One simple binarization algorithm could be as follows: choose a threshold value T, some value in the brightness interval; a pixel greater than T is set to 255, otherwise it is set to 0.
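That rule is a one-liner; here is a small sketch of it (my own example values, not from the lecture):

```python
def binarize(image, T):
    """Set pixels brighter than threshold T to 255 (white background),
    the rest to 0 (dark foreground/text)."""
    return [[255 if v > T else 0 for v in row] for row in image]

# Bright background pixels (200..220) against dark text pixels (25..40).
page = [[200, 30, 220], [210, 25, 40]]
print(binarize(page, 156))  # [[255, 0, 255], [255, 0, 0]]
```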
This is a very simple algorithm; let us see its effect. Consider this document, with its histogram displayed here. Take a particular threshold value, say 156, perform the thresholding, and you get an image like this. You see that there are only two types of pixels: pixels with value 0 and pixels with value 255, when your threshold value is 156.
If I choose another value, say 192, I get a different binarized image, and you can see the difference between these two images. If the threshold value is higher, you get more foreground pixels and your text becomes sharper, but there is also more spurious noise in your document, which is not desirable. So, what is the optimum threshold value? What is a desirable threshold value, one that makes the text sharper, picks up the proper foreground pixels, and also removes the noisy pixels?
This kind of manual choice of threshold may not help when you are trying to process many documents, so one objective would be to automate this thresholding operation.
One of the techniques I will discuss here is a Bayesian classification of foreground and background pixels of a particular image. In this case, we assume that the histogram of the image is bimodal; a schematic diagram is shown here. There are two modes, two peaks, in this histogram, and our assumption is that most of the pixels around one peak come from the foreground.
The pixels coming from the background are centered around the other mode. So, we consider two classes of pixels, and these are the notations for the classes: they are denoted w1 and w2, as an abstract representation of this problem.
So, what do we need to do in this case? We need to compute the probability of class w1 given x, and the probability of class w2 given x. This is because, by the Bayesian classification rule, we assign the pixel x to class w1 if the probability of w1 given x is greater than the probability of w2 given x; otherwise, we assign it to w2. That is the Bayes classification rule.
So, how can we compute this particular probability, which is incidentally called the posterior probability? We can apply Bayes' theorem, and here I have written out the theorem.
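Writing out the theorem referred to above, in the notation used here:

```latex
P(\omega_i \mid x) \;=\; \frac{P(x \mid \omega_i)\, P(\omega_i)}{P(x)},
\qquad i = 1, 2,
```

and the classification rule assigns pixel $x$ to $\omega_1$ whenever $P(\omega_1 \mid x) > P(\omega_2 \mid x)$.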
Consider a pixel x: the probability of a class given that pixel can be computed from three quantities, namely the probability of the class itself, multiplied by the probability of x given that class, and divided by the probability of x. This is Bayes' theorem. It is simpler to compute the probability of x given the class, which is called the likelihood, than to compute the posterior directly. We can assume that the pixels around the foreground mode form a class distribution coming from the foreground, and we can assume it is a Gaussian distribution.
This is the probability distribution of a pixel x given that it comes from class w1. Similarly, for the background class, we can consider another probability distribution: the probability distribution of x given w2. These probabilities are called likelihoods, and they can be computed easily, rather than computing the posterior directly.
You can also compute the class probabilities themselves: if I assume some threshold value has been chosen, then the proportional areas of the histogram on either side give me those two probabilities. I will describe this in a subsequent slide, but it is interesting to note that you may not use the probability of x at all in this computation. After computing the two numerators, you only need to compare them, because each posterior is proportional to its numerator; p(x) is already fixed by the data itself.
So, let us see how these computations can be carried out. There is an algorithm by which we determine this threshold; we call it the expectation-maximization algorithm. Let me explain it here. Consider the histogram of the image once again, or the probability distribution of the pixels, and assume an initial threshold value, say at this point. This value divides the brightness interval into two halves.
We can consider that one half belongs to the foreground region and the other half to the background region; so this is the representation of the foreground part, and this of the background part. Given this threshold, we can compute the probability of w1 and the probability of w2 by computing the area of each part and taking the proportional areas of the two regions.
This is how the class probabilities can be computed once a threshold value is given. After that, we concentrate on each class separately, and from there compute the parameters of the probability of x given w1, assuming it is Gaussian, and similarly the parameters of the probability of x given w2, also assuming it is Gaussian.
If we look again at the Gaussian distribution function, as you can see it has two parameters: mu, which is the mean of the distribution, and sigma, which is its standard deviation. So, to compute this likelihood, you simply need to compute these parameters; then you can compute the probability of any value x given those parameters. Let us call the parameters for class w1 mu 1 and sigma 1, and the corresponding parameters for w2 mu 2 and sigma 2.
This is how the corresponding parameters are computed. You can see that the probability of w1 is computed as the area under p(x), a summation from 0 to the threshold, and the probability of w2 is just 1 minus the probability of w1, because it is the complementary part of the area. Then mu 1 is the mean over the foreground region and sigma 1 the standard deviation over that region; similarly, mu 2 is the mean over the background region and sigma 2 squared the variance, or sigma 2 the standard deviation, over that region. This is how we compute them.
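For concreteness, for a chosen threshold $T$ and normalized histogram $p(x)$, the quantities just described are:

```latex
P(\omega_1) = \sum_{x=0}^{T} p(x), \qquad
P(\omega_2) = 1 - P(\omega_1), \qquad
\mu_1 = \frac{1}{P(\omega_1)} \sum_{x=0}^{T} x\, p(x), \qquad
\sigma_1^2 = \frac{1}{P(\omega_1)} \sum_{x=0}^{T} (x - \mu_1)^2\, p(x),
```

with $\mu_2$ and $\sigma_2^2$ given by the same weighted sums taken over $x = T+1, \dots, 255$.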
So, we are computing the means and variances of these values; these are all simple arithmetic expressions for weighted means and weighted variances. If you consult a statistics text, it will be very clear how these values are computed by these expressions. Once we have these values, we determine a new threshold value such that, up to it, the probability of w1 given x is greater than the probability of w2 given x.
As soon as the posterior of w1 becomes less than that of w2, we take that point as the new threshold value. We expected the old value to be the threshold, but after computing these parameters, after maximizing the probabilities of occurrence of these pixels, we find there is a better threshold value, one that gives a better probabilistic account of the observations. So, we iterate this process: the new value becomes the current threshold, and we keep iterating until the process converges. This is your Bayesian-classification-based binarization method.
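The iterative procedure above can be sketched roughly as follows. This is a simplified illustration under my own assumptions: a 256-level normalized histogram, reasonably well-separated Gaussian-like modes, and an arbitrary initial threshold of 128; it is not code shown in the lecture.

```python
import math

def gaussian(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    sigma = max(sigma, 1e-6)  # guard against a degenerate (zero-spread) class
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def em_threshold(p, T=128, max_iter=100):
    """p: normalized 256-bin histogram. Alternate between estimating
    P(w1), P(w2), mu and sigma of each class from the current split, and
    moving the threshold to the last grey level where class w1 still has
    the larger (unnormalized) posterior P(w1) * p(x | w1)."""
    for _ in range(max_iter):
        P1 = sum(p[:T + 1])
        P2 = 1.0 - P1
        if P1 <= 0.0 or P2 <= 0.0:
            break
        mu1 = sum(x * p[x] for x in range(T + 1)) / P1
        mu2 = sum(x * p[x] for x in range(T + 1, 256)) / P2
        s1 = math.sqrt(sum((x - mu1) ** 2 * p[x] for x in range(T + 1)) / P1)
        s2 = math.sqrt(sum((x - mu2) ** 2 * p[x] for x in range(T + 1, 256)) / P2)
        new_T = T
        for x in range(256):
            if P1 * gaussian(x, mu1, s1) >= P2 * gaussian(x, mu2, s2):
                new_T = x
        if new_T == T:
            return T
        T = new_T
    return T
```

For a bimodal histogram with modes around 50 and 200, the iteration settles near the midpoint where the two weighted Gaussians cross.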
There is another method we can consider here, which is quite similar and which also defines an objective function whose optimization gives you the threshold value. This objective function is the between-class variance of the two classes. The between-class variance, as you can see, is defined by the parameters I discussed earlier: the class probability of w1, the class probability of w2, and the means of the two classes.
Given a threshold value, I can compute this sigma squared B: the probability of w1 from one part, the probability of w2 from the other, and mu 1 and mu 2 from the respective regions. You compute this value at every candidate threshold in your interval, say 0 to 255, and you take the value where the between-class variance is maximum as your threshold. This thresholding technique was proposed by Otsu, and it is known as Otsu's thresholding technique.
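A naive exhaustive-search sketch of this criterion follows; it is my illustration of the idea, not code from the lecture (practical implementations reuse cumulative sums instead of recomputing them at every candidate T).

```python
def otsu_threshold(p):
    """p: normalized 256-bin histogram. Return the threshold T that
    maximizes the between-class variance
    sigma_B^2(T) = P(w1) * P(w2) * (mu1 - mu2)^2."""
    best_T, best_var = 0, -1.0
    for T in range(255):  # T = 255 would leave class w2 empty
        P1 = sum(p[:T + 1])
        P2 = 1.0 - P1
        if P1 <= 0.0 or P2 <= 0.0:
            continue
        mu1 = sum(x * p[x] for x in range(T + 1)) / P1
        mu2 = sum(x * p[x] for x in range(T + 1, 256)) / P2
        var_between = P1 * P2 * (mu1 - mu2) ** 2
        if var_between > best_var:
            best_T, best_var = T, var_between
    return best_T
```

On a clearly bimodal histogram, any threshold between the two clusters attains the maximum, and the search returns one such value.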
You can see an example of this processing with that particular document image. We have computed the Otsu threshold, which is 157 in this case, and it gives this kind of image. If I consider the Bayesian threshold, we get another image; incidentally, though the threshold values are the same, you can see there is a little difference between the two images, although their quality is almost similar. This difference arises because we would like to make the histogram bimodal: before applying Bayesian classification, we process the image so that the histogram has a sharper mode in the foreground zone as well. How we do this processing, I will discuss next.
So, this is the method I was referring to: it is a contrast enhancement method, and here the concept of pixel mapping is used. The idea of pixel mapping is that an input pixel value is mapped to an output pixel value in such a way that the dynamic range of the input is expanded. Suppose, in this case, the input dynamic range runs from 0 to a value around half the interval; in the output we stretch that dynamic range to the full 0 to 255, which makes the contrast sharper. One property you need to preserve, of course, is that this function has to be monotonically increasing: if you have two pixels x1 and x2, and x1 is higher than x2, then the corresponding y1 should also be higher than y2. That keeps the display consistent, showing brighter pixels brighter and darker pixels darker, and that is why you require this property.
One popular function used in this particular case is derived from the probability distribution of the pixel values itself: it is the cumulative distribution, scaled by 255, which ensures the output range runs from 0 to 255.
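This mapping, y = 255 times the cumulative distribution of x (histogram equalization), can be sketched as follows; the toy low-contrast image below is my own example, not from the lecture.

```python
def equalize_map(p):
    """Build the mapping y = round(255 * CDF(x)) from the normalized
    histogram p. The CDF is non-decreasing, so the map preserves the
    brightness ordering of pixels."""
    cdf, acc = [], 0.0
    for px in p:
        acc += px
        cdf.append(acc)
    return [round(255 * c) for c in cdf]

# A low-contrast image whose values occupy only the levels 100..103.
flat = [100] * 4 + [101] * 2 + [102] * 1 + [103] * 1
p = [flat.count(v) / len(flat) for v in range(256)]
m = equalize_map(p)
print(m[100], m[101], m[102], m[103])  # the occupied levels spread out towards 0..255
```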
If I perform this operation, we can see that we get a contrast-stretched image where the features are more prominent. You can also look at its histogram: it has a similar shape, but the dynamic range has been expanded and the modes are more clearly visible. In fact, this is the technique I was referring to, the one we applied to the document that was processed for binarization. With this, let me stop here; this is the first part of this particular talk, and we will continue with the next part in the next lecture. Thank you very much for listening.
Keywords: Images, projection, histogram, thresholding,
expectation maximization, Bayesian, equalization