Computer Arithmetic
The two most common forms of computer arithmetic are integer arithmetic and floating point arithmetic. Both are briefly discussed below.
Integer Arithmetic:
The result of any integer arithmetic operation is always an integer.
The range of integers that can be represented on a given computer is finite.
The result of an integer division is usually given as the quotient only.
The remainder is discarded (truncated), because fractional quantities cannot be
represented under the integer representation.
Eg:
Remark:
(1) Simple rules like $\frac{a+b}{c} = \frac{a}{c} + \frac{b}{c}$,
where $a$, $b$, $c$ are integers, may not hold under
computer integer arithmetic due to the truncation of the
remainder.
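The effect of truncation can be checked with a small Python sketch. The values of `a`, `b` and `c` and the rule $(a+b)/c = a/c + b/c$ are assumptions chosen for illustration. Python's `//` operator rounds toward negative infinity, which coincides with truncation for the non-negative operands used here.

```python
# Integer division keeps only the quotient; the remainder is discarded.
quotient = 7 // 2     # the fractional part 0.5 is lost
remainder = 7 % 2     # the remainder is available separately if needed

# Hypothetical operands, chosen so that the two sides differ.
a, b, c = 5, 7, 4

lhs = (a + b) // c    # 12 // 4
rhs = a // c + b // c  # 5 // 4 + 7 // 4, each term truncated separately

print("7 // 2 =", quotient, "remainder", remainder)
print("(a + b) // c =", lhs)
print("a // c + b // c =", rhs)
print("rule holds:", lhs == rhs)
```

Each division on the right-hand side discards its own remainder, so the two sides disagree even though they are equal in exact arithmetic.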
Floating Point Arithmetic:
In floating point arithmetic all numbers are stored and
processed in normalized exponential form. The process of
addition under floating point arithmetic is discussed first.
Addition under Floating Point Arithmetic:
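Let $x$ and $y$ be the two numbers to be added and $z$ be the result. The normalized floating point representations of $x$ and $y$ are $x = M_x \times \beta^{E_x}$ and $y = M_y \times \beta^{E_y}$, respectively, where $M$ denotes the mantissa, $E$ the exponent and $\beta$ the base. The rules for carrying out the addition are as follows:
a) Set $E_z = \max(E_x, E_y)$.
Say $E_x \ge E_y$; then $E_z = E_x$.
b) Right shift $M_y$ by $E_x - E_y$ places, so that the exponents of $x$ and $y$ are the same, and call the result $M_y'$.
c) Set $M = M_x + M_y'$.
d) Normalize $M \times \beta^{E_z}$ and let $M_z \times \beta^{E_z}$ be its normalized representation (with $E_z$ adjusted if normalization shifts the mantissa).
e) Set $z = M_z \times \beta^{E_z}$.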
E.g : Add the numbers and
a)
b) on right shifting by 3 we get
c)
d) which is already in normalized form
i.e ,
e)
Remark: Subtraction is nothing but addition of numbers with different signs.
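The addition rules can be sketched in Python as follows. This is only an illustration under stated assumptions: a decimal machine with a 4-digit mantissa (`DIGITS`), truncation instead of rounding, and numbers stored as hypothetical pairs `(m, e)` standing for $0.m \times 10^{e}$. The operands in the usage lines are made up for the sketch.

```python
DIGITS = 4  # assumed mantissa length of the hypothetical machine


def truncate_shift(m, places):
    """Right shift an integer mantissa by `places` digits, truncating toward zero."""
    sign = -1 if m < 0 else 1
    return sign * (abs(m) // 10 ** places)


def normalize(m, e):
    """Return (m, e) with exactly DIGITS mantissa digits (leading digit nonzero)."""
    if m == 0:
        return 0, 0
    sign = -1 if m < 0 else 1
    m = abs(m)
    while m >= 10 ** DIGITS:        # mantissa overflow: shift right, raise exponent
        m //= 10
        e += 1
    while m < 10 ** (DIGITS - 1):   # leading zeros: shift left, lower exponent
        m *= 10
        e -= 1
    return sign * m, e


def fp_add(x, y):
    """Add two (mantissa, exponent) pairs following steps a) to e)."""
    (mx, ex), (my, ey) = x, y
    if ex < ey:                      # make x the operand with the larger exponent
        (mx, ex), (my, ey) = (my, ey), (mx, ex)
    ez = ex                          # a) exponent of the result
    my_shifted = truncate_shift(my, ex - ey)  # b) align the exponents
    m = mx + my_shifted              # c) add the mantissas
    return normalize(m, ez)          # d), e) normalize and return


# 0.9642e5 + 0.4535e2: the smaller operand loses three digits in the shift.
print(fp_add((9642, 5), (4535, 2)))
# 0.9000e2 + 0.9000e2: the mantissa overflows and is renormalized.
print(fp_add((9000, 2), (9000, 2)))
# Subtraction as addition with a negative mantissa: 0.1234e0 - 0.1233e0.
print(fp_add((1234, 0), (-1233, 0)))
```

In the first call the exact sum is 96465.35, while the sketch returns the pair standing for $0.9646 \times 10^{5}$, because the digits shifted out in step b) are lost.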
Multiplication Under Floating Point Arithmetic:
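If $x = M_x \times \beta^{E_x}$ and $y = M_y \times \beta^{E_y}$ are two real numbers in normalized form, with mantissas $M_x$, $M_y$, exponents $E_x$, $E_y$ and base $\beta$, then their product is $z = x \times y = (M_x \times M_y) \times \beta^{E_x + E_y}$, which is then normalized.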
E.g : Say
, then
Since
is
already in normalized form ,
.
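A minimal Python sketch of floating point multiplication is given below. It assumes a decimal machine with a 4-digit mantissa (`DIGITS`), truncation of the excess digits, and numbers stored as hypothetical pairs `(m, e)` standing for $0.m \times 10^{e}$. None of these choices come from the text above; they are made only to keep the example small.

```python
DIGITS = 4  # assumed mantissa length of the hypothetical machine


def fp_mul(x, y):
    """Multiply two normalized (mantissa, exponent) pairs.

    Mantissas are multiplied, exponents are added, and the result is
    normalized and truncated back to DIGITS digits.
    """
    (mx, ex), (my, ey) = x, y
    if mx == 0 or my == 0:
        return 0, 0
    sign = -1 if (mx < 0) != (my < 0) else 1
    p = abs(mx) * abs(my)            # product carries up to 2 * DIGITS digits
    e = ex + ey
    if p < 10 ** (2 * DIGITS - 1):   # leading zero: shift left once to normalize
        p *= 10
        e -= 1
    m = p // 10 ** DIGITS            # truncate to DIGITS digits
    return sign * m, e


# 0.2000e3 * 0.3000e2 = 200 * 30
print(fp_mul((2000, 3), (3000, 2)))
# 0.1234e0 * 0.5678e0 = 0.07006652 exactly; the last digits are truncated.
print(fp_mul((1234, 0), (5678, 0)))
```

The product of two normalized mantissas lies between 0.01 and 1, so at most one left shift is needed to bring it back to normalized form.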
Remark:
(1)
(after
normalization)
During floating point arithmetic the mantissa 'M' may be truncated because only a limited number of bits is available to represent it on a computer.
(2) Floating point arithmetic is prone to the following errors:
a) Errors due to inexact representation of a decimal
number in binary form. For example,
$0.1_{10} = 0.000110011001100\ldots_2$. Since the binary equivalent of
$0.1$ has a repeating fraction, it has to be terminated at
some point.
b) Errors due to round-off.
c) Subtractive cancellation: It is possible that some mantissa
positions are unspecified. These unspecified positions may be
arbitrarily filled by the computer. This may lead to a serious loss
of significance when two nearly equal numbers are subtracted.
For example if
and
then
has only one significant digit. However, the mantissa has room
to store more significant digits, and these positions may
be filled arbitrarily because they are unspecified. Further, if the
operands themselves are approximate representations because of this
non-specification problem, the overall loss of significance
becomes serious.
d) Basic laws of arithmetic, such as the associative and distributive laws, may
not be satisfied, i.e. in general $(x + y) + z \ne x + (y + z)$ and $x (y + z) \ne x y + x z$.
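The errors listed under (2) can be observed directly in Python. The sketch below assumes that Python's `float` is an IEEE 754 double, as it is on practically all current platforms. The specific operands are made up for illustration.

```python
from decimal import Decimal

# a) Inexact representation: Decimal(0.1) shows the value actually stored.
stored_tenth = Decimal(0.1)
print("0.1 is stored as", stored_tenth)

# b) Round-off: the rounded sum of two stored values is not the stored 0.3.
print("0.1 + 0.2 == 0.3 ?", 0.1 + 0.2 == 0.3)

# c) Subtractive cancellation: 1 + 1e-15 is rounded when stored, and the
#    subtraction exposes that rounding error in the leading digits.
nearly_one = 1.0 + 1e-15
difference = nearly_one - 1.0
relative_error = abs(difference - 1e-15) / 1e-15
print("computed difference:", difference)
print("relative error of the difference:", relative_error)

# d) Associativity fails: the grouping changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print("(0.1 + 0.2) + 0.3 =", left)
print("0.1 + (0.2 + 0.3) =", right)
print("associative here?", left == right)
```

In part c) both operands are accurate to about 16 digits, yet their difference is wrong in the second digit, which is the loss of significance described above.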
(3) Numerical computation involves a series of computations
consisting of basic arithmetic operations. There may be round-off
or truncation error at every step of the computation. These
errors accumulate as the number of computations in a
process increases. There can also be situations where even a single operation
magnifies the round-off errors to a level that completely ruins
the result.
A computation process in which the cumulative effect of all input errors is grossly magnified is said to be numerically unstable. It is important to understand the conditions under which the process is likely to be 'sensitive' to input errors and become unstable. Investigations to see how small changes in input parameters influence the output are termed as sensitivity analysis.
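The accumulation of round-off error over many operations can be seen in a short Python sketch. The count of one million additions is an arbitrary choice for illustration, and `math.fsum` is used only as a reference that tracks the lost low-order bits. IEEE 754 double precision is assumed.

```python
import math

N = 1_000_000          # arbitrary number of additions, chosen for illustration
EXPECTED = 100000.0    # N * 0.1 in exact arithmetic

# Naive accumulation: every addition is rounded, and the errors build up.
naive_total = 0.0
for _ in range(N):
    naive_total += 0.1

# Reference: fsum returns the correctly rounded sum of the stored values.
reference_total = math.fsum([0.1] * N)

print("naive total:    ", naive_total)
print("reference total:", reference_total)
print("accumulated error:", abs(naive_total - EXPECTED))
```

Each individual addition is accurate to about 16 digits, but the naive total is visibly wrong well before the 16th digit because a million small errors have accumulated.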
(4) The effect of round-off and truncation errors on the final numerical result may be reduced by:
a) Increasing the number of significant figures carried by the computer, either
through hardware or through software manipulations. For instance,
one may use double precision for floating point arithmetic
operations.
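b) Minimizing the number of arithmetic operations.
Here one may try to rearrange a formula to reduce the number of
arithmetic operations.
For example, in the evaluation of a polynomial $p(x) = a_0 x^n + a_1 x^{n-1} + \cdots + a_{n-1} x + a_n$,
it may be rearranged as $p(x) = (\cdots((a_0 x + a_1) x + a_2) x + \cdots + a_{n-1}) x + a_n$,
which requires fewer arithmetic operations.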
c) A formula like
may be replaced by to avoid
subtractive cancellation.
d) While finding the sum of a set of numbers, arrange the set so
that the numbers are in ascending order of absolute value, i.e. when
$|a| > |b| > |c|$, then $(c + b) + a$ is better than $(a + b) + c$.
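Two of the suggestions under (4) can be sketched in Python: the nested rearrangement of a polynomial and summation in ascending order of absolute value. The function names, the coefficient ordering (highest power first) and the sample values are assumptions made for this illustration. IEEE 754 double precision is assumed.

```python
def poly_naive(coeffs, x):
    """Evaluate a polynomial term by term; coeffs are ordered highest power first."""
    n = len(coeffs) - 1
    return sum(a * x ** (n - i) for i, a in enumerate(coeffs))


def poly_nested(coeffs, x):
    """Evaluate the same polynomial in nested form: one multiply and one add per coefficient."""
    result = 0
    for a in coeffs:
        result = result * x + a
    return result


def sum_in_order(values):
    """Add the values left to right, exactly in the order given."""
    total = 0.0
    for v in values:
        total += v
    return total


# Nested evaluation gives the same value with fewer multiplications.
coeffs = [2, -6, 2, -1]          # hypothetical polynomial 2x^3 - 6x^2 + 2x - 1
print(poly_naive(coeffs, 3), poly_nested(coeffs, 3))

# Summation order: one large value and two small ones.
values = [1e16, 1.0, 1.0]
descending = sum_in_order(values)                   # (a + b) + c
ascending = sum_in_order(sorted(values, key=abs))   # (c + b) + a
print("largest first: ", descending)
print("smallest first:", ascending)
```

When the large value is added first, each 1.0 is too small to change the running total and is lost. Adding the small values first lets them combine into 2.0, which is large enough to be registered.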
(5) It may not be possible to simultaneously reduce both the truncation and round-off error effects on the final result of a numerical computation. For instance, in an iterative procedure, when one tries to reduce the round-off error by increasing the step size, the truncation error may increase, and vice versa. Hence care has to be taken to choose a step size that balances the two errors.
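The trade-off between truncation and round-off error can be illustrated with a forward-difference approximation of a derivative. This setting is an assumption chosen for the sketch, not a procedure taken from the text. The function $e^x$, the point $x = 1$ and the step sizes $h = 10^{-k}$ are likewise arbitrary choices.

```python
import math


def forward_difference(f, x, h):
    """Approximate f'(x) by (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact_derivative = math.exp(x0)   # the derivative of exp is exp itself

# Error of the approximation for step sizes h = 10**-k, keyed by k.
errors = {}
for k in range(1, 16):
    h = 10.0 ** (-k)
    approx = forward_difference(math.exp, x0, h)
    errors[k] = abs(approx - exact_derivative)
    print(f"h = 1e-{k:02d}   error = {errors[k]:.3e}")

best_k = min(errors, key=errors.get)
print("smallest error at h = 1e-%02d" % best_k)
```

For large $h$ the truncation error of the difference formula dominates. For very small $h$ the subtraction of nearly equal function values makes round-off dominate, so the error is smallest at an intermediate step size.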