19
pages
English
Documents
Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres
19
pages
English
Documents
Le téléchargement nécessite un accès à la bibliothèque YouScribe Tout savoir sur nos offres
Publié par
Langue
English
Fixed-Point Representation
& Fractional Math
By Erick L. Oberstar
©2004-2007 Oberstar Consulting
Revision 1.2
Released August 30, 20071
Table of Contents
Table of Contents............................................................................................................................ 1
Summary......................................................................................................................................... 2
1. Fixed-Point Representation..................................................................................................... 2
1.1. Fixed-point Range - Integer Portion ............................................................................... 3
1.1.1. Fixed-point Range for Unsigned Integers................................................................... 3
1.1.2. Fixed-point Range for Signed Integers....................................................................... 5
1.1.3. Comments/Conclusions ............................................................... 9
1.2. Fixed-point Resolution - Fractional Portion 9
1.3. Range & Resolution - Putting Them Together ............................................................. 10
1.4. Scaling A Floating-point Number To Fixed-point........................................................ 11
2. Math With Eight Bit Examples............................................................................................. 12
2.1. Q1.7 Format .................................................................................................................. 12
2.2. Q2.6 Format 12
2.3. Addition - Q1.7+Q2.6=Q2.6 Format ............................................................................ 12
2.4. Multiply - Q1.7xQ2.6=Q3.13 Format........................................................................... 13
2.5. An Example Using Real Numbers................................................................................ 14
3. Implementation Caveats........................................................................................................ 15
3.1. Computing QI & QI ≤0.................................................................................................. 15
3.2. Addition ........................................................................................................................ 16
3.3. Multiplication................................................................................................................ 17
3.4. General Caveats & Example C Code 17
4. References............................................................................................................................. 18 2
Summary
In a majority of the commercially available processors on the market today there is no hardware
support for floating-point arithmetic due to the cost the extra silicon imposes on a processor’s
total cost. In fact a large portion of processors do not even have hardware support for integer
multiplication. This necessitates software emulation for floating-point arithmetic and possibly
even software emulation for computing integer multiplications. This software overhead can
significantly limit the rate at which algorithms can be executed.
By implementing algorithms using fixed-point (integer) mathematics, a significant improvement
in execution speed can be observed because of inherent integer math hardware support in a large
number of processors, as well as the reduced software complexity for emulated integer multiply
and divide. This speed improvement does come at the cost of reduced range and accuracy of the
algorithms variables. The purpose of this paper is to investigate the issues relating to algorithm
implementation utilizing fixed-point rather than floating-point mathematics.
The Q[QI].[QF] format fixed-point number format analyzed in this paper is broken down in
subsequent sections into integer and fractional content for the purpose of study and
understanding. The separate sections on integer and fractional content are subsequently
combined to provide an overall understanding of the nature of Q[QI].[QF] format fixed-point
numbers.
1. Fixed-Point Representation
To more accurately construct an algorithm, double or single precision floating-point data and
coefficient values should be used. However there is significant processor overhead required to
perform floating-point calculations resulting from the lack of hardware based floating-point
support. In some cases such as with lower powered embedded processors there is not even
compiler support for double precision floating-point numbers. Floating-point overhead limits the
effective iteration rate of an algorithm.
To improve mathematical throughput or increase the execution rate (i.e. increase the rate the
algorithm could be repetitively run), calculations can be performed using two’s complement
signed fixed-point representations. Fixed-point representations require the programmer to create
a virtual decimal place in between two bit locations for a given length of data (variable type).
For the purposes of this paper the notion of a Q-point for a fixed-point number is introduced.
This labeling convention is as follows:
Q[QI].[QF]
Where QI = # of integer bits & QF = # of fractional bits
3
The number of integer bits (QI) plus the number of fractional (QF) bits yields the total number of
bits used to represent the number. Sum QI+QF is know as the Word Length (WL) and this sum
usually corresponds to variable widths supported on a given processor. Typical word lengths
would be {8,16,32} bits corresponding to {char, int, long int} C/C++ variable types commonly
implemented in compilers for microcontrollers or DSPs.
For example: a Q3.5 number would be an 8-bit value with three integer bits and five fractional
bits. For signed integer variable types we will include the sign bit in QI as it does have integer
weight albeit negative in sign. WL varies over processors and integer type names can infer
different word lengths in various tool chains (i.e some compilers treat int as 16-bit, some as 32-
bit) [ISO/IEC 9899:TC2]. Therefore, for the purpose of this paper the previously referenced
word lengths / type names are implied and used.
The Q[QI].[QF] format fixed-point number format is broken down in subsequent sections into
integer and fractional content for the purpose of study and understanding. The separate sections
on integer and fractional content are subsequently combined to provide an overall understanding
of the nature of Q[QI].[QF] format fixed-point numbers.
1.1. Fixed-point Range - Integer Portion
To represent a floating-point number in fixed-point a floating-point number needs to be viewed
as two distinct parts, the integer content, and the fractional content. The integer range of a
floating-point variable (i.e. its Min to Max range) in an algorithm sets the number of bits (QI)
required to represent the integer portion of the number. Keep in mind that QI itself can only hold
integer values because of the binary nature of a bit – it exists or doesn’t.
There are two different methods of computing the number of integer bits required (QI) for each
type of number, unsigned and signed.
1.1.1. Fixed-point Range for Unsigned Integers
This relationship for unsigned numbers (positive only) is defined by the minimum and maximum
of any QI-bit number shown in the following equation:
QI02≤ α≤−1( )
Equation 1
Method 1:
Solving Equation 1 for the required number of bits QI:
QI≥+ceiling log α 1 ( ( ))24
QI = ceiling log α +1 ( ( ))2
Equation 2
where α is the floating-point variable being ranged & ceiling rounds towards + ∞.
Note: The log () value in Equation 2 can never be negative which implies that QI is always ≥ 1. 2
The benefit of QI < 1 is addressed later in this paper.
For example an unsigned (positive only) variable α = 5.4321:
QI=+ceiling log 5.4321 1= ceiling 2.6835= 3( ) ( ) ()2
Example 1
∴ 3 bits are required for the integer portion of α
As sanity check, verify α less than the maximum 3-bit unsigned number.
3 5.4321 ≤ 2 −=1 7 - Yes!
Method 2:
Although the previous method above (Method 1) is one possible way of computing QI for
unsigned values, there is another way that is arguably better, especially when dealing with
numbers that could be much smaller than |1| that must be implemented on standard variable
types.
Taking our initial bounding inequality Equation 1:
QI02≤ α≤−1( )
The inequality can be rewritten:
QI02≤<α
Equation 3
Note: The upper boundary conditions changes from ≤ to < and the boundary value changes to
one integer count higher that the maximum QI-bit unsigned number.
Solving Equation 3 for QI for the constraint:
QIα < 2
log α < QI()25
QI > log α ( ( ))2
Equation 4
Knowing that QI is an integer number of bits we can create an equation to compute a QI that
satisfies constraint Equation 4 by adding 1 and truncating the result (rounding tward