User’s Manual for Keil Compiler, V 1.2, November 2003 XC166Lib A DSP Library for XC16x Family Microcontrollers N e v e r s t o p t h i n k i n g . Edition 2003-11 Published by Infineon Technologies AG 81726 München, Germany © Infineon Technologies AG 2006. All Rights Reserved. LEGAL DISCLAIMER THE INFORMATION GIVEN IN THIS APPLICATION NOTE IS GIVEN AS A HINT FOR THE IMPLEMENTATION OF THE INFINEON TECHNOLOGIES COMPONENT ONLY AND SHALL NOT BE REGARDED AS ANY DESCRIPTION OR WARRANTY OF A CERTAIN FUNCTIONALITY, CONDITION OR QUALITY OF THE INFINEON TECHNOLOGIES COMPONENT. THE RECIPIENT OF THIS APPLICATION NOTE MUST VERIFY ANY FUNCTION DESCRIBED HEREIN IN THE REAL APPLICATION. INFINEON TECHNOLOGIES HEREBY DISCLAIMS ANY AND ALL WARRANTIES AND LIABILITIES OF ANY KIND (INCLUDING WITHOUT LIMITATION WARRANTIES OF NON-INFRINGEMENT OF INTELLECTUAL PROPERTY RIGHTS OF ANY THIRD PARTY) WITH RESPECT TO ANY AND ALL INFORMATION GIVEN IN THIS APPLICATION NOTE. Information For further information on technology, delivery terms and conditions and prices please contact your nearest Infineon Technologies Office (www.infineon.com). Warnings Due to technical requirements components may contain dangerous substances. For information on the types in question please contact your nearest Infineon Technologies Office. Infineon Technologies Components may only be used in life-support devices or systems with the express written approval of Infineon Technologies, if a failure of such components can reasonably be expected to cause the failure of that life-support device or system, or to affect the safety or effectiveness of that device or system. Life support devices or systems are intended to be implanted in the human body, or to support and/or maintain and sustain and/or protect human life. If they fail, it is reasonable to assume that the health of the user or other persons may be endangered. User’s Manual for Keil Compiler, V 1.2, November 2003 XC166Lib A DSP Library for XC16x Family G ua n gy u W a n g AI MC MA TM (Munich, Germany) Microcontrollers N e v e r s t o p t h i n k i n g . XC166Lib Revision History:2003-11V 1.2 Previous Version: 2002-02 V 1.0, 2002-09 V1.1 Page Subjects (major changes since last revision) We Listen to Your Comments Any information within this document that you feel is wrong, unclear or missing at all? Your feedback will help us to continuously improve the quality of this document. Please send your proposal (including a reference to this document) to: [email protected] DSP Library for XC16x Family Page Table of Contents 1 1.1 1.2 1.3 1.4 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Introduction to XC166Lib, a DSP Library for XC16x . . . . . . . . . . . . . . . . . Features . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Future of the XC166Lib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Support Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 14 14 14 14 2 2.1 2.2 2.3 2.4 Installation and Build . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XC166Lib Content . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Installing XC166Lib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Building XC166Lib . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Source Files List . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 16 16 16 17 3 3.1 3.2 3.3 3.4 3.4.1 3.4.2 3.4.3 3.4.4 3.4.5 DSP Library Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XC166Lib Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Calling a DSP Library Function from C Code . . . . . . . . . . . . . . . . . . . . . . XC166Lib Example Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XC166Lib Implementation - A Technical Note for Keil Compiler . . . . . . . . Memory Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Memory Models for Keil C Compiler . . . . . . . . . . . . . . . . . . . . . . . . . . . Optimization Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cycle Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Testing Methodology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19 19 19 19 19 19 20 21 24 24 4 4.1 4.1.1 4.2 4.2.1 4.2.2 4.2.3 4.2.4 4.2.5 4.2.6 4.3 4.3.1 4.3.2 4.3.3 4.3.4 4.4 4.4.1 4.4.2 4.4.3 4.4.4 4.4.5 Function Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Argument Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Arithmetic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Complex Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Complex Number Representation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Complex Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Complex Arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Complex Number Schematic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . FIR Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Transversal Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Lattice Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Multirate FIR Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . IIR Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Direct Form 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Direct Form 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Cascaded Biquad IIR Filter with Direct Form 2 . . . . . . . . . . . . . . . . . . . Cascaded Biquad IIR Filter with Direct Form 1 . . . . . . . . . . . . . . . . . . . Lattice IIR Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 25 25 27 27 27 27 28 29 29 40 40 41 42 44 73 73 73 75 75 76 User’s Manual for Keil Compiler I-1 V 1.2, 2003-11 DSP Library for XC16x Family 4.4.6 4.5 4.5.1 4.5.2 4.6 4.6.1 4.6.2 4.6.3 4.6.4 4.6.5 4.7 4.7.1 4.7.2 4.7.3 4.7.4 4.8 4.8.1 4.9 4.10 4.10.1 4.10.2 4.10.3 Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 Adaptive Digital Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Delayed LMS algorithm for an adaptive filter . . . . . . . . . . . . . . . . . . . 112 Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113 Fast Fourier Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123 Radix-2 Decimation-In-Time FFT Algorithm . . . . . . . . . . . . . . . . . . . . 126 Radix-2 Decimation-In-Frequency FFT Algorithm . . . . . . . . . . . . . . . . 128 Complex FFT Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 Calculation of Real Forward FFT from Complex FFT . . . . . . . . . . . . . 133 Calculation of Real Inverse FFT from Complex FFT . . . . . . . . . . . . . . 134 C166S V2 Core Implementation Note . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Organization of FFT functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136 Implementation of Real Forward FFT . . . . . . . . . . . . . . . . . . . . . . . . . 136 Implementation of Real Inverse FFT . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138 Matrix Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 Mathematical Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168 Statistical Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187 Implementation Description . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190 Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212 5 References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216 User’s Manual for Keil Compiler -2 V 1.2, 2003-11 DSP Library for XC16x Family Page List of Figures Figure 4-1 Figure 4-2 Figure 4-3 Figure 4-4 Figure 4-5 Figure 4-6 Figure 4-7 Figure 4-8 Figure 4-9 Figure 4-10 Figure 4-11 Figure 4-12 Figure 4-13 Figure 4-14 Figure 4-15 Figure 4-16 Figure 4-17 Figure 4-18 Figure 4-19 Figure 4-20 Figure 4-21 Figure 4-22 Figure 4-23 Figure 4-24 Figure 4-25 Figure 4-26 Figure 4-27 Figure 4-28 Figure 4-29 Figure 4-30 Figure 4-31 Figure 4-32 Figure 4-33 Figure 4-34 Figure 4-35 Figure 4-36 Figure 4-37 Figure 4-38 Figure 4-39 Figure 4-40 Figure 4-41 Figure 4-42 Figure 4-43 The Complex Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28 16-bit Complex number representation . . . . . . . . . . . . . . . . . . . . . . . . 29 Complex Number addition for 16 bits. . . . . . . . . . . . . . . . . . . . . . . . . . 31 Complex number subtraction for 16 bits . . . . . . . . . . . . . . . . . . . . . . . 34 Complex number multiplication for 16 bits . . . . . . . . . . . . . . . . . . . . . . 36 32 bit real number multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 Block Diagram of the FIR Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 Lattice FIR Filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 Block Diagram of FIR Decimation Filter . . . . . . . . . . . . . . . . . . . . . . . . 43 Equivelent Implementation of FIR Decimation Filter . . . . . . . . . . . . . . 43 Block Diagram of FIR Interpolation Filter . . . . . . . . . . . . . . . . . . . . . . . 44 Equivelent Implementation of FIR Interpolation Filter . . . . . . . . . . . . . 44 Fir_16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47 Fir_32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51 Fir_cplx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55 Fir_sym . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 Fir_lat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 Fir_dec . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 Fir_inter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 Canonical Form (Direct Form 2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74 Cascaded Biquad IIR Filter with Direct Form 2 Implementation . . . . . 75 Cascaded Biquad IIR Filter with Direct Form 1 Implementation . . . . . 76 Lattice IIR Filter Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77 IIR_1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 IIR_2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 IIR_bi_1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90 IIR_bi_2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95 IIR_bi2_32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100 IIR_bi_24 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 IIR_lat. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109 Adaptive filter with LMS algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 Adap_filter_16 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116 Adap_filter_32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 Complexity Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124 8-point DIT FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126 Alternate Form of 8-point DIT FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . 127 8-point DIF FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128 Alternate Form of 8-point DIF FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . 129 8-point DIT complex FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130 Butterfly of Radix-2 DIT complex FFT . . . . . . . . . . . . . . . . . . . . . . . . 130 8-point DIF complex FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131 Butterfly of Radix-2 DIF complex FFT . . . . . . . . . . . . . . . . . . . . . . . . 132 Bit_reverse . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139 User’s Manual for Keil Compiler II-1 V 1.2, 2003-11 DSP Library for XC16x Family Figure 4-44 Figure 4-45 Figure 4-46 Figure 4-47 Figure 4-48 Figure 4-49 Figure 4-50 Figure 4-51 Figure 4-52 Figure 4-53 Figure 4-54 Figure 4-55 Figure 4-56 Figure 4-57 Figure 4-58 Figure 4-59 Figure 4-60 Figure 4-61 FloatTo1Q15 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Real_DIT_FFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Real_DIF_IFFT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Q15toFloat . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Matrix_mul . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Matrix_trans . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Sine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . P_series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Windowing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Division of two 1Q15 fractional inputs . . . . . . . . . . . . . . . . . . . . . . . . Division of 1Q31 fractional by 1Q15 fractional . . . . . . . . . . . . . . . . . . Raw autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Biased autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Unbiased autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Raw cross-correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Biased cross-correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Unbiased cross-correlation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . User’s Manual for Keil Compiler -2 142 146 152 155 160 164 169 172 175 178 180 190 193 196 200 204 208 212 V 1.2, 2003-11 DSP Library for XC16x Family Page List of Tables Table 2-1 Table 2-2 Table 3-1 Table 4-1 Directory Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Source files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XC166Lib Data Types . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Argument Conventions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . User’s Manual for Keil Compiler III-1 16 17 19 25 V 1.2, 2003-11 DSP Library for XC16x Family Preface This is the User Manual of version 1.2 for XC166Lib - a DSP library for Infineon XC16x microcontroller family. To make the user easy by using the XC166Lib we provide a separate User Manual for each compiler to describe the implementations. This User Manual describes the implementations for Keil compiler. XC16x microcontroller family is the most recent generation of the popular C166 microcontroller families. The core of XC16x family is C166S V2 Core that combines high performance with enhanced modular architecture. Impressive DSP performance and advanced interrupt handling with fast context switching make XC16x the instrument of choice for powerful applications. This manual describes the implementation of essential algorithms for general digital signal processing applications on the XC16x using Keil compiler. For Keil compiler the DSP library is developed mainly in inline assembly and most of the source code files is stored in c files. The DSP library can be used as a library of basic functions for developing bigger DSP applications on XC16x microcontroller. The library serves as a user guide and a reference for XC16x microcontroller DSP programmers. It demonstrates how the processor’s architecture can be exploited for achieving high performance. The various functions and algorithms implemented and described are: • • – – – • – – • • • – – – Arithmetic Functions Filters FIR IIR Adaptive filters Transforms FFT IFFT Matrix Operations Mathematical Operations Statistical Functions Auto-correlation Cross-correlation Convolution Each function is described in detail under the following headings: Signature: This gives the function interface. Inputs: Lists the inputs to the function. User’s Manual for Keil Compiler -10 V 1.2, 2003-11 DSP Library for XC16x Family Outputs: Lists the output of the function if any. Return: Gives the return value of the function if any. Implementation Description: Gives a brief note on the implementation, the size of the inputs and the outputs, alignment requirements etc. Pseudocode: The implementation is expressed as a pseudocode using C conventions. Techniques: The techniques employed for optimization are listed here. Register Usage: Lists the registers that are used for parameter transfer from C to Assembly or inverse. Assumptions: Lists the assumptions made for an optimal implementation such as constraint on DPRAM. The input output formats are also given here. Memory Note: A detailed sketch showing how the arrays are stored in memory, the alignment requirements of the different memories, the nature of the arithmetic performed on them. The diagrams give a great insight into the actual implementation. Further, the path of an Example calling program, the Cycle Count and Code Size are given for each function. User’s Manual for Keil Compiler -11 V 1.2, 2003-11 DSP Library for XC16x Family Organization Chapter 1, Introduction, gives a brief introduction of the XC166Lib and its features. Chapter 2, Installation and Build, describes the XC166Lib content, how to install and build the XC166Lib. Chapter 3, DSP Library Notations, describes the DSP Library data types, arguments, calling a function from the C code and the assembly code, and the implementation notes. Chapter 4, Function Descriptions, describes the arithmetic functions, FIR filters, IIR filters, Adaptive filters, Fast Fourier Transforms, Matrix operations and Mathematical operations. Each function is described with its signature, inputs, outputs, return, implementation description, pseudocode, techniques used, registers used for parameter transfer, assumptions made, memory note, example, cycle count and code size. Chapter 5, gives the list of related references. Acknowledgements All source codes in XC166Lib have been designed, developed and tested using Tasking and Keil Tool chains. We in advance would like to acknowledge users for their feedback and suggestions to improve this product. Guangyu Wang XC166Lib Developer - Infineon User’s Manual for Keil Compiler -12 V 1.2, 2003-11 DSP Library for XC16x Family Acronyms and Definitions Acronyms Definitions DIT Decimation-In-Time DIF Decimation-In-Frequency LMS Least Mean Square DSP Digital Signal Processing XC166Lib DSP Library functions for XC16x FFT Fast Fourier Transform FIR Finite Impulse Response IIR Infinite Impulse Response Documentation/Symbol Conventions The following is the list of documentation/symbol conventions used in this manual. Documentation/Symbol Conventions Documentation/Sym- Description bol convention Courier Pseudocode Times-italic File name User’s Manual for Keil Compiler -13 V 1.2, 2003-11 DSP Library for XC16x Family Introduction 1 Introduction 1.1 Introduction to XC166Lib, a DSP Library for XC16x The XC166Lib, a DSP Library for XC16x microcontroller is C-callable, hand-coded assembly, general purpose signal processing routines. The XC166Lib includes commonly used DSP routines. The throughput of the system using the XC166Lib routines is considerably better than those achieved using the equivalent code written in ANSI C language. The XC166Lib significantly helps in understanding the general purpose signal processing routines, its implementation on XC16x microcontroller family. It also reduces the DSP application development time. Furthermore, The XC166Lib is also a good reference for XC16x microcontroller DSP programmer. The routines are broadly classified into the following functional categories: • • • • • • • • Arithmetic functions FIR filters IIR filters Adaptive filters Fast Fourier Transforms Matrix operations Mathematical operations Statistical functions 1.2 • • • • • • • Features Common DSP algorithms with source codes Hand-coded and optimized assembly modules with CoMAC instructions C-callable functions on Keil compiler Multi platform support - Win 95, Win 98, Win NT Examples to demonstrate the usage of functions Example input test vectors and the output test data for verification Complete User’s manual covering many aspects of implementation 1.3 Future of the XC166Lib The planned future releases will have the following improvements. • Expansion of the library by adding more functions in the domain of generic core routines of DSP. • Upgrading the existing 16 bit functions to 32 bit 1.4 Support Information Any suggestions for improvement, bug report if any, can be sent via e-mail to User’s Manual for Keil Compiler 1-14 V 1.2, 2003-11 DSP Library for XC16x Family Introduction [email protected]. Visit www.infineon.com /C166DSPLIB for update on XC166Lib releases. User’s Manual for Keil Compiler 1-15 V 1.2, 2003-11 DSP Library for XC16x Family Installation and Build 2 Installation and Build 2.1 XC166Lib Content The following table depicts the XC166Lib content with its directory structure. Table 2-1 Directory Structure Directory name Contents XC166Lib Directories which has all the files related None to the XC166Lib Tasking Directories which has all the files related None to the XC166Lib for Tasking compiler Keil Directories which has all the files related None to the XC166Lib for Keil compiler Source Directories of source files. Each directory *.c has respective assembly language *.asm *.a66 implementation files of the library functions Include Directory and common include files for ’C’ of the Keil compiler DspLib_Keil.h Docs User’s Manual for Keil compiler *.pdf Examples Example directories . Each directory *.c contains example "c" functions to depict the usage of XC166Lib. 2.2 Files Installing XC166Lib XC166Lib is distributed as a ZIP file. To install the XC166Lib on the system, unzip the ZIP file and extract them to the defined directory. The directory structure is as given in “XC166Lib Content” on Page 2-16 2.3 Building XC166Lib Include the DspLib_Keil.h into your project and also include the same into the files that need to call the library function like: #include “DspLib_Keil.h” Now include the respective source files for the required functionality into your project. Refer the functionality table, Table 2-2. User’s Manual for Keil Compiler 2-16 V 1.2, 2003-11 DSP Library for XC16x Family Installation and Build Build the system and start using the library. 2.4 Source Files List Table 2-2 Source files Complier: Keil Arithmetic Functions CplxAdd_16.c CplxSub_16.c CplxMul_16.c Mul_32.c FIR Filters Fir_16.c Fir_32.c Fir_cplx.c Fir_dec.c Fir_inter.c Fir_lat.c Fir_sym.c IIR Filters IIR_1.c IIR_2.c IIR_bi_1.c IIR_bi_2.c IIR_bi_24.c IIR_bi2_32.c IIR_lat.c Adaptive Filters Adap_filter_16.c Adap_filter_32.c FFT Bit_reverse.c real_DIT_FFT.a66 real_DIF_IFFT.a66 Matrix Operations User’s Manual for Keil Compiler 2-17 V 1.2, 2003-11 DSP Library for XC16x Family Installation and Build Table 2-2 Source files Matrix_mul.c Matrix_trans.c Mathematical Operations Power_series.c Windowing.c Sine.c div_q15.c div_q31.c Sqrt.c Statistical Functions Auto_raw.c Auto_bias.c Auto_unbias.c Cross_raw.c Cross_bias.c Cross_unbias.c Convol.c Miscellaneous FloatTo1Q15.c Q15toFloat.c User’s Manual for Keil Compiler 2-18 V 1.2, 2003-11 DSP Library for XC16x Family DSP Library Notations 3 DSP Library Notations 3.1 XC166Lib Data Types The XC166Lib handles the following fractional data types. Table 3-1 XC166Lib Data Types 1Q15 (DataS) 1Q15 operand is represented by a short data type that is predefined as DataS in header files DspLib_Keil.h. 1Q31 (DataL) 1Q31 operand is represented by a long data type that is predefined as DataL in header files DspLib_Keil.h. CplxS Complex data type that contains the two 1Q15 data arranged in Re-Im format. CplxL Complex data type that contains the two 1Q31 data arranged in Re-Im format. 3.2 Calling a DSP Library Function from C Code After installing the XC166Lib, include a XC166Lib function in the source code as follows: 1. Choose the memory model for C compiler 2. Include the header file DspLib_Keil.h 3. Include the source file that contains required DSP functions into the project along with the other source files 4. Include the Keil compiler system files TRAPS.C and START_V2.A66 5. Set the Options for Target - Device to select an MCU with XC16x 6. Build the system 3.3 XC166Lib Example Implementation The examples of how to use the XC166Lib functions are implemented and are placed in examples subdirectory. This subdirectory contains a subdirectory for set of functions. 3.4 XC166Lib Implementation - A Technical Note for Keil Compiler 3.4.1 Memory Issues The XC16x microcontroller family uses the C166S V2 Core that is a 16 bit microcontroller core, with impressive DSP performance. There are two sets of instructions for C166S V2 Core, namely Normal Instruction Set and DSP Instruction Set (MAC-instructions). Normal instruction set is compatible with the microcontroller family C166, while the DSP instruction set is especially designed for implementing DSP algorithms. XC166Lib was User’s Manual for Keil Compiler 3-19 V 1.2, 2003-11 DSP Library for XC16x Family DSP Library Notations developed mainly using DSP instruction set. But the normal instruction set has been also often used in the routines, in particular, for initializing memories and registers. For each instruction set there is a different addressing mode. DSP instructions use some standard C166 addressing modes such as GPR direct and #data5 for immediate shift value. To supply the MAC instructions with up to 2 new operands in one CPU cycle, new MAC instruction addressing models have been added in C166S V2 Core. These allow indirect addressing with address pointer post-modification. Double indirect addressing requires 2 pointers, one of which can be supplied by any GPR, the other is provided by one of two Specific Function Registers (SFRs) IDX0 and IDX1. Two pairs of offset registers QR0/QR1 and QX0/QX1 are associated with each pointer (GPR or IDXi). The GPR pointer gives access to the entire memory space, whereas IDXi are limited to the internal Dual-Port RAM (DPRAM), except for the CoMOV instruction. The XC166Lib is implemented with the C166S V2 Core memory addressing architecture. The following information gives memory conditions in order to work properly. Because the specific function register IDXi is limited to the internal DPRAM, in order to use MAC instructions properly we must first initialize IDXi with the address pointed to DPRAM space from 00’F200H to 00’FE00H (3KBytes) before using MAC instructions. This means that we must locate one of operands in MAC-instructions with double indirect addressing modes in range from 00’F200H to 00’FE00H. Using Keil complier we can easily realize it through defining the variables which should be located in the DPRAM area as on-chip memory type idata, e.g. short idata x[n] ; After compiling the vector x will be automatically located in the DPRAM area because Keil compiler locates all variables with the memory type idata in the DPRAM area. Note that C166S V2 Core has defined 3 KBytes of DPRAM. However, the XC16x microcontroller family is equipped with only 2 KBytes of DPRAM in the range from 00’F600H to 00’FE00H. The limited DPRAM area can make difficulty by executing DSP algorithms on XC16x, if the size of the vector is larger than 2 KBytes, e.g. a 1024-point Fir filter. When using pointer post-modification addressing models, the address pointed to must be a legal address, even if its content is not modified. An odd value will trigger the classB hardware Trap (Illegal Word Operand Access Trap (ILLOPA)). 3.4.2 Memory Models for Keil C Compiler Just as we said, the DSP library is developed mainly for calling from a C program. Therefore, the memory modes selected by C and assembly modules must be same in order to avoid memory model conflicts. Keil tool chain supports seven memory models: tiny, small, compact, hcompact, medium, large and hlarge. The basic differences between them are: User’s Manual for Keil Compiler 3-20 V 1.2, 2003-11 DSP Library for XC16x Family DSP Library Notations Beside the tiny model operating in the non-segmented CPU mode the other six memory models operate in the segmented CPU mode. The variables by the tiny and small memory models are located in the near area and the functions calls generate near calls (up to 64Kb code size). The both memory modes provide the same efficient code. The difference of tiny and small memory models is that the code and data in the tiny model are limited to the first segment of 64K, while the small model allows code and data to locate anywhere in the space. The compact, hcompact and medium models operate as small model does, except that the compact model uses the memory type far for variables, the hcompact model uses the memory type huge for variables and the medium model generates function calls as far call. For both large and hlarge models the function calls are generated as far calls by default. The difference between them is that the large memory locates the variables in the far memory, while the hlarge locates the variables in the huge memory. Because the most functions in the library are written in the inline assembly, for those library functions the parameter passing will be done by the compiler automatically. So the library functions written in the inline assembly can be used in all memory models without regarding the memory types of pointers passed from C to assembly code. But for the library functions which are pure assembly module and stored in assembly file *.a66, if we want to use them in the memory models with far or huge variable memory types, we need to redefine the pointers in the function arguments as near pointers, because the library functions use only 16 bit pointers by parameter passing from C to assembly routine. There are only 16 registers R0-R15 on XC16x that can be used for programming. By calling the XC166Lib routines from C, according to Keil Compiler conventions the registers from R8 to R12 will be used to translate parameters from C to the assembly code, and the rest parameters will be translated through using stack. R4 and R5 are used to store the output values after executing the XC166Lib routines. R1-R7 can be used in the assembly routine without regard for preserving their contents. 3.4.3 Optimization Techniques DSP optimization techniques depend strongly on the core architecture. So, different cores have different optimization techniques. Furthermore, the number of tricks that can be played by a DSP programmer are endless. In the development of XC166Lib the following optimization techniques have been used. • data dependencies removing Due to the pipeline requirement of the C166S V2 CPU there are a lot of possible data dependencies between instructions using GPRs. In the C166S V2 Core the dedicated hardware is added to detect and resolve the data dependencies. However, in the C166S V2 Core none of the instructions using indirect addessing modes are capable User’s Manual for Keil Compiler 3-21 V 1.2, 2003-11 DSP Library for XC16x Family DSP Library Notations of using a GPR, which is to be updated by one of the two immediately preceding instructions. This means that the instruction using indirect addressing modes will lead two cycles stall. To use these two cycles for the optimization we can insert before this instruction a multicycle or two single cycle instructions that must not update the GPR used for indirect addressing. Example: Assembly without optimization ( 6 cycles ) ............. ADD R1, R2 MOV R8, [R1] ADD R5, R1 ADD R6, R1 ; instruction using indirect addressing mode ............. Assembly with optimization ( 4 cycles ) ............. ADD R1, R2 ADD R5, R1 ; inserted one cycle instruction ADD R6, R1 ; inserted one cycle instruction MOV R8, [R1] ; instruction using indirect addressing mode ............. • memory bandwidth conflicts removing Memory bandwidth conflicts can occur if instructions in the pipeline access the same memory ares at the same time. The CoXXX instructions are specially designed for DSP implementation. To avoid the memory bandwidth conflicts in the DPRAM ares, one of the operands should be located in the internal SRAM to guarantee a single cycle execution time of the CoXXX instructions. • instruction re-ordering By writing DSP routines with CoXXX instructions it is often needed to change and update the Special Function Registers (SFRs), such as IDX0, IDX1, QX0, QX1, QR0, QR1, and so on. CPU-SFRs control the CPU functionality and behavior. Therefore, special care is required to ensure that instructions in the pipeline always work with the correct SFRs values. With instruction re-ordering the flow of instructions through the pipeline can be improved to optimize the routines. Example: Assembly code without optimization ( 7 cycles ) ............. User’s Manual for Keil Compiler 3-22 V 1.2, 2003-11 DSP Library for XC16x Family DSP Library Notations EXTR #1 MOV IDX1, #12 CoMUL [IDX1], [R1] MOV R6, R1 ADD R2, R1 ; initialize IDX1 with 12 ............. Assembly code with optimization ( 5 cycles ) ............. EXTR #1 MOV IDX1, #12 ; initialize IDX1 with 12 MOV R6, R1 ; instruction re-ordering ADD R2, R1 ; instruction re-ordering CoMUL [IDX1], [R1] ............. • loop unrolling The equation is written twice or more inside a loop. Example (unrolling factor 2): Assembly code without loop unrolling ( 17 cycles ) ............ MOV R3, #3 CoMAC [IDX0+], [R1+] CoMAC [IDX0+], [R1+] CMPD1 R3,#0h JMPR cc_NZ,loop loop: ............ Assembly code with loop unrolling ( 13 cycles ) ............ MOV R3, #1 CoMAC [IDX0+], [R1+] loop: CoMAC [IDX0+], [R1+] CoMAC [IDX0+], [R1+] User’s Manual for Keil Compiler 3-23 V 1.2, 2003-11 DSP Library for XC16x Family DSP Library Notations CoMAC [IDX0+], [R1+] CMPD1 R3,#0h JMPR cc_NZ,loop ............ 3.4.4 Cycle Count The cycle count given for each function in this User’s Manual represents the cycles to be needed for executing the assembly instructions in the function. They have been verified on XC16x boards. To some degree one can understand as the theoretical cycle count. The given values of cycle count can only be achieved only on conditions that all data and assembly codes are located in the internal memory area and no pipeline conflicts occur in the program. Note that the real cycle count may be much larger than the given values, if the data or source code are located in the external memory. 3.4.5 Testing Methodology The XC166Lib is tested on XC16x board. The test is believed to be accurate and reliable. However, the developer assumes no responsibility for the consequences of use of the XC166Lib. User’s Manual for Keil Compiler 3-24 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions 4 Function Descriptions Each function is described with its signature, inputs, outputs, return, implementation description, pseudocode, techniques used, assumptions made, register usage, memory note, example, cycle count and code size. Functions are classified into the following categories. • • • • • • • • Arithmetic functions FIR filters IIR filters Adaptive filters Fast Fourier Transforms Matrix operations Mathematical operations Statistical functions 4.1 Conventions 4.1.1 Argument Conventions The following conventions have been followed while describing the arguments for each individual function. Table 4-1 Argument Conventions Argument Convention X, x Input data or input data vector beside in FFT functions, where X representing the FFT spectra of the input vector x Y, y Output data or output data vector D_buffer, d_buffer Delay buffer N_x The size of input vectors H, h Filter coefficient vector N_h The size of coefficient vector H DataS Data type definition equating a short, a 16-bit value representing a 1Q15 number DataL Data type definition equating a long, a 32-bit value representing a 1Q31 number User’s Manual for Keil Compiler 4-25 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions Table 4-1 Argument Conventions Argument Convention CplxS Data type definition equating a short, a 16-bit value representing a 1Q15 complex number CplxL Data type definition equating a long, a 32-bit value representing a 1Q31 complex number User’s Manual for Keil Compiler 4-26 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions 4.2 Arithmetic Functions 4.2.1 Complex Numbers A complex number z is an ordered pair (x,y) of real numbers x and y, written as z= (x,y) where x is called the real part and y the imaginary part of z. 4.2.2 Complex Number Representation A complex number can be represented in different ways, such as Rectangular form : C = R + iI [4.1] Trigonometric form : C = M [ cos ( φ ) + j sin ( φ ) ] [4.2] Exponential form : C = Me Magnitude and angle form : C = M ∠φ iφ [4.3] [4.4] In the complex functions implementation, the rectangular form is considered. 4.2.3 Complex Plane To geometrically represent complex numbers as points in the plane two perpendicular coordinate axis in the Cartesian coordinate system are chosen. The horizontal x-axis is called the real axis, and the vertical y-axis is called the imaginary axis. Plot a given complex number z= x + iy as the point P with coordinates (x, y). The xy-plane in which the complex numbers are represented in this way is called the Complex Plane. User’s Manual for Keil Compiler 4-27 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions (Imaginary Axis) y P z = x + iy (Real Axis) O Figure 4-1 4.2.4 x The Complex Plane Complex Arithmetic Addition if z1 and z2 are two complex numbers given by z1 =x1+iy1 and z2 = x2 + iy2, z1+z2 = (x1+iy1) + (x2 + iy2) = (x1+x2) + i(y1+y2) [4.5] Subtraction if z1 and z2 are two complex numbers given by z1 =x1+iy1 and z2 = x2 + iy2, z1-z2 = (x1-x2) + i(y1-y2) [4.6] Multiplication if z1 and z2 are two complex numbers given by z1 =x1+iy1 and z2 = x2 + iy2, z1.z2 = (x1+iy1).(x2 + iy2) = x1x2 + ix1y2 + iy1x2 + i2 y1y2 = (x1x2 - y1y2) + i(x1y2 + x2y1) User’s Manual for Keil Compiler [4.7] 4-28 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions Conjugate The complex conjugate, z of a complex number z = x+iy is given by z = x - iy [4.8] and is obtained by geometrically reflecting the point z in the real axis. 4.2.5 Complex Number Schematic 15 0 High Addr. Imaginary Low Addr. Real Sign Bit Figure 4-2 4.2.6 16-bit Complex number representation Descriptions The following arithmetic functions for 16 bit and 32 bit are described. • • • • 16 bit complex addition 16 bit complex subtraction 16 bit complex multiplication 32 bit real multiplication User’s Manual for Keil Compiler 4-29 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions CplxAdd_16 Signature Inputs 16 bit Complex Number Addition void CplxAdd_16 (CplxS* X, CplxS* Y, ClpxS* R) X : Pointer to 16 bit Complex input value in 1Q15 format Y : Pointer to 16 bit Complex input value in 1Q15 format Output None Return Pointer to the sum of two complex numbers as a 16 bit complex number in 1Q15 format Implementation Description This function computes the sum of two 16 bit complex numbers. Wraps around the result in case of overflow. The algorithm is as follows Rr = xr + yr [4.9] Ri = xi + yi Pseudo code { R.real = X.real + Y.real; //add the real part R.imag = X.imag + Y.imag; //add the imaginary part return R; } Techniques None Assumption Register Usage • From .c file to inline assembly file: Decided by compiler User’s Manual for Keil Compiler 4-30 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions CplxAdd_16 16 bit Complex Number Addition (cont’d) Memory Note 15 0 Imaginary 15 Imaginary 0 Imaginary 0 15 + Real Real + Real Figure 4-3 Example Complex Number addition for 16 bits C166Lib\Keil\Examples\Arith_16\Arith_16.c Cycle Count Initialization and read input values 6 Real Addition 3 Imaginary Addition 3 Total 12 Initialization and read input values 12 bytes Real Addition 10 bytes Code Size User’s Manual for Keil Compiler 4-31 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions CplxAdd_16 16 bit Complex Number Addition (cont’d) Imaginary Addition 10 bytes Total 32 bytes User’s Manual for Keil Compiler 4-32 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions CplxSub_16 Signature Inputs 16 bit Complex Number Subtraction void CplxSub_16 (CplxS* X, CplxS* Y, CplxS* R ) X : Pointer to 16 bit complex input value in 1Q15 format Y : Pointer to 16 bit complex input value in 1Q15 format Output None Return Pointer to the difference of two complex numbers as a 16 bit complex number Implementation Description This function computes the difference of two 16 bit complex numbers. Wraps around the result in case of underflow. The algorithm is as follows. Rr = xr – yr [4.10] Ri = xi – yi Pseudo code { R.real = X.real - Y.real; //subtract the real part R.imag = X.imag - Y.imag; //subtract the imaginary part return R; } Techniques None Assumption Register Usage • From .c file to inline assembly file: Decided by compiler User’s Manual for Keil Compiler 4-33 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions CplxSub_16 16 bit Complex Number Subtraction (cont’d) Memory Note 15 0 Imaginary 0 15 Real Imaginary 15 0 Real Imaginary Real Figure 4-4 Example Complex number subtraction for 16 bits C166Lib\Keil\Examples\Arith_16\Arith_16.c Cycle Count Initialization and read input values 6 Real Subtraction 3 Imaginary Subtraction 3 Total 12 Initialization and read input values 12 bytes Real Subtraction 10 bytes Imaginary Subtraction 10 bytes Total 32 bytes Code Size User’s Manual for Keil Compiler 4-34 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions CplxMul_16 16 bit Complex Number Multiplication Signature void CplxMul_16 (CplxS* X, CplxS* Y, CplxS* R ) Inputs X : Pointer to 16 bit Complex input value in 1Q15 format Y : Pointer to 16 bit Complex input value in 1Q15 format Return Pointer to the multiplication result in 1Q15 format Implementation Description This function computes the product of the two 16 bit complex numbers. Wraps around the result in case of overflow. The complex multiplication is computed as follows. Rr = xr × yr – xi × yi Ri = xi × yr + xr × yi Pseudo code { R->real = X.real*Y.real - Y.imag*X.imag; R->imag = X.real*Y.imag + Y.real*X.imag; } Techniques None Assumption Register Usage • From .c file to inline assembly file: Decided by compiler User’s Manual for Keil Compiler 4-35 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions CplxMul_16 16 bit Complex Number Multiplication (cont’d) Memory Note 15 0 Imaginary x Real 15 x 0 Imaginary + x Imaginary 0 15 - Real x Real Figure 4-5 Example Complex number multiplication for 16 bits C166Lib\Keil\Examples\Arith_16\Arith_16.c Cycle Count Initialization and read input values 5 Real multiplication 3 Imaginary multiplication 3 Total 11 Initialization and read input values 10 bytes Real multiplication 12 bytes Code Size User’s Manual for Keil Compiler 4-36 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions CplxMul_16 16 bit Complex Number Multiplication (cont’d) Imaginary multiplication 12 bytes Total 34 bytes User’s Manual for Keil Compiler 4-37 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions Mul_32 32 bit Real Multiplication Signature DataL Mul_32 (DataL X, DataL Y) Inputs X : 32 bit real input value in 1Q31 Y : 32 bit real input value in 1Q31 Return Multiplication result in 1Q31 format Implementation Description This function computes the product of the two 32 bit real numbers. Wraps around the result in case of overflow. The multiplication is computed as follows. R = x L × y L + x L × y H + ( x H × y L + x H × y H ) » 16 Pseudo code { R = xL*yL + xL*yH + (xH*yL + xH*yH)>>16 ; } Techniques None Assumption Register Usage • From .c file to inline assembly file: Decided by compiler. • From .asm file to .c file: RL(LSW) is stored in R4. RH(MSW) is stored in R5. User’s Manual for Keil Compiler 4-38 V 1.2, 2003-11 DSP Library for XC16x Family Function Descriptions Mul_32 32 bit Real Multiplication (cont’d) Memory Note 15 0 XL x XH 32 x 16 RL + RH x YL 0 15 32 x 16 YH Figure 4-6 Example 32 bit real number multiplication C166Lib\Keil\Examples\Arith_16\Arith_16.c Cycle Count Code Size Initialization 5 Multiplication 8 Output 2 Total 15 Initialization 10 bytes Multiplication 32 bytes Output 8 Total User’s Manual for Keil Compiler 50 4-39 bytes V 1.2, 2003-11 DSP Library for XC16x Family 4.3 FIR Filters The FIR (Finite Impulse Response) filter, as its name suggests, will always have a finite duration of non-zero output values for given finite duration of non-zero input values. FIR filters use only current and past input samples, and none of the filter's previous output samples, to obtain a current output sample value. For causal FIR systems, the system function has only zeros (except for poles at z=0). The realization of an FIR filter has many forms. But the transversal and lattice forms are most useful structures in practice. 4.3.1 Transversal Form The transversal form of an Fir filter showed in Figure 4-7 is realized by a tapped delay line. The delay line stores the past input values. The input x(n) for the current calculation will become x(n-1) for the next calculation. The output from each tap is summed to generate the filter output. For a general N tap FIR filter, the difference equation is N–1 y(n) = ∑ hi ⋅ x ( n – i ) [4.11] i=0 where, x(n) : the filter input for nth sample y(n) : output of the filter for nth sample hi : filter coefficients N : filter order The filter coefficients, which decide the scaling of current and past input samples stored in the delay line, define the filter response. The transfer function of the filter in Z-transform is [ z ]- = H[z] = Y ----------X[z ] N–1 ∑ hi ⋅ z –i [4.12] i=0 User’s Manual for Keil Compiler -40 V 1.2, 2003-11 DSP Library for XC16x Family Delay Line x(n) (Filter Input) h0 x(n) z-1 X h1 x(n-1) z-1 z-1 hN-1 X x(n-N+1) X + y(n) (Filter Output) Figure 4-7 4.3.2 Block Diagram of the FIR Filter Lattice Form The structure of a lattice FIR filter is showed in Figure 4-8 . Each stage of the filter has an input and output that are related by the equations: yi ( n ) = yi – 1 ( n ) + ki ui ( n – 1 ) ui ( n ) = ki yi – 1 ( n ) + ui – 1 ( n – 1 ) 1 ≤ i≤ M [4.13] The initial values are equal to the filter input x(n): User’s Manual for Keil Compiler -41 V 1.2, 2003-11 DSP Library for XC16x Family y0 ( n ) = x ( n ) u0 ( n ) = x ( n ) At the last stage we have the output of the lattice FIR filter y(n)=yM(n). + x(n) k1 z-1 y1(n) y2(n) …. + k2 u1(n) + + kM + z-1 …. z-1 u2(n) Figure 4-8 4.3.3 yM(n) + uM(n) Lattice FIR Filter Multirate FIR Filters Multirate filters are a kind of digital filters that change the sampling rate of a digital signal. A multirate filter converts a digital signal with sampling rate M to another digital signal with sampling rate N. The both digital signals represent the same analog signal at differtent sampling rates. A multirate filter can be realized in an FIR filter or an IIR filter. Due to the advantages of FIR filters, such as linear phase, unconditional stability, simple structure and easy coefficient design, most of the multirate filters are implemented with FIR filters. Hier we describe only FIR multirate filters. The basic operations of the multirate filters are decimation and interpolation. Decimation reduces the sample rate of a signal and can be used to eliminate redundant or unnecessary information contained in the signal. Interpolation increases the sample rate of a signal through filling in missing information between the samples of a signal based on the calculation on the existing data. 4.3.3.1 FIR Decimation Filter The FIR decimation filter can be described using the equation N–1 y(m) = ∑ h ( k )x ( mM – k ) [4.14] k=0 User’s Manual for Keil Compiler -42 V 1.2, 2003-11 DSP Library for XC16x Family where h(k) is filter coefficient vector, x(n) is the input signal and M represents the decimation factor. Figure 4-9 shows the block diagram of an FIR decimation filter. Its equivelent form is showed in Figure 4-10 . x(n) Figure 4-9 h(k) w (n) LM y(m ) Block Diagram of FIR Decimation Filter h(0) x(n) z -1 z y(m) h(1) -1 h(2) z -1 z -1 LM h(3) i i i i i i h(N-1) i i i Figure 4-10 Equivelent Implementation of FIR Decimation Filter 4.3.3.2 FIR Interpolation Filter In comparison with decimation filter the interpolation filter can be used in the reconstruction of a digital signal from another digital signal. Figure 4-11 shows a block diagram of an interpolation filter, where the low-pass filter of the interpolator uses an Fir filter structure. An Fir interpolation filter can be described as N–1 y(m) = ∑ h ( k )x ( ( m – k ) ⁄ L ) [4.15] k=0 User’s Manual for Keil Compiler -43 V 1.2, 2003-11 DSP Library for XC16x Family for m-k=0, L, 2L, ... . An equivelent implementation of the Fir interpolation filter is showed in Figure 4-12. x(n) L h(k) y(m ) Figure 4-11 Block Diagram of FIR Interpolation Filter x(n) h(0) KL z -1 z -1 z -1 z -1 y(m) h(1) h(2) h(3) i i i i i i h(N-1) i i i Figure 4-12 Equivelent Implementation of FIR Interpolation Filter 4.3.4 Descriptions The following FIR filter routines are implemented in the library. • • • • • • • Fir filter with transversal form, 16 bit filter coefficients, Sample processing Fir filter with transversal form, 32 bit filter coefficients, Sample processing Complex Fir filter with transversal form, 16 bit filter coefficients, Vector processing Symmetric Fir filter, 16 bit filter coefficients, Vector processing Lattice Fir filter, 16 bit filter coefficients, Vector processong Fir decimation filter, 16 bit filter coefficients, Vector processing Fir interpolation filter, 16 bit filter coefficients, Vector processing User’s Manual for Keil Compiler -44 V 1.2, 2003-11 DSP Library for XC16x Family Fir_16 FIR Filter with transversal form, 16 bit coefficients, Sample processing Signature DataS Fir_16 ( DataS* H, DataS* IN, DataS N_h, DataS* D_buffer) Inputs H : Pointer to filter coefficients in 1Q15 format IN : Pointer to the new input sample in 1Q15 format L : Filter order D_buffer Pointer to delay buffer Output : Return Y Implementation Description The implementation of FIR filter uses transversal structure (direct form). A single input is processed at a time and output for every sample is returned. The filter operates on 16-bit real input, 16-bit coefficients and gives 16-bit real output. The number of coefficients given by the user is arbitrary. The delay line is implemented in parallel to the multiply-accumulate operation using instructions CoMACM. Delay buffer will be located in the DPRAM area. User’s Manual for Keil Compiler : -45 Filter output in 1Q15 format V 1.2, 2003-11 DSP Library for XC16x Family Fir_16 FIR Filter with transversal form, 16 bit coefficients, Sample processing (cont’d) Pseudo code { short x(N_h)={0,...}; short Y; short i; //Update for(i=0; x(i) = x(N_h-1) //Input vector //Filter result the input vector with the new input value i<N_h-1; i++) x(i+1); = IN; //move the new input unto X[N_h-1] //Calculate the current FIR output Y = 0; for(i=0; i<N_h; i++) Y = Y + h(i)*x(N_h-1-i) //FIR filter output return Y; //Filter output returned } Techniques • Memory bandwidth conflicts removing Assumptions • Delay buffer must be located in DPRAM area Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Y is stored in R4 User’s Manual for Keil Compiler -46 V 1.2, 2003-11 DSP Library for XC16x Family Fir_16 FIR Filter with transversal form, 16 bit coefficients, Sample processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM d(n-N_h+1) d(n-N_h+2) . . d(n-3) d(n-2) d(n-1) d(n-1) Memory h(N_h-1) h(N_h-2) . . . h(2) h(1) h(0) DPRAM d(n-N_h+2) d(n-N_h+3) . . d(n-2) d(n-1) d(n) d(n) IDX0 IDX0 Memory h(N_h-1) h(N_h-2) . . . h(2) h(1) h(0) CoMACM 1Q15 1Q15 Before After Figure 4-13 Fir_16 Example C166Lib\Keil\Examples\Filters\Fir\Fir16.c Cycle Count Memory Initialization 9 Read new input into DPRAM 2 FIR loop N_h User’s Manual for Keil Compiler -47 V 1.2, 2003-11 DSP Library for XC16x Family Fir_16 FIR Filter with transversal form, 16 bit coefficients, Sample processing (cont’d) Write output 3 Total N_h + 14 Example: N_h = 14 cycle = 28 Code Size Memory 18 bytes 8 bytes FIR loop 10 bytes Write output 10 bytes Total 46 bytes initialization Read new input into DPRAM User’s Manual for Keil Compiler -48 V 1.2, 2003-11 DSP Library for XC16x Family Fir_32 FIR Filter with transversal form, 32 bit coefficients, Sample processing Signature DataS Fir_32 ( DataL* H, DataS* IN, DataS N_h, DataS* D_buffer) Inputs H : Pointer to filter coefficients in 1Q31 format IN : Pointer to the new input sample in 1Q15 format N_h : Filter order D_buffer Pointer to delay buffer located in DPRAM area from 0xf200 to 0xfe00 (3 KBytes) Output : Return Y Implementation Description The implementation of FIR filter uses transversal structure (direct form). A single input is processed at a time and output for every sample is returned. The filter operates on 16-bit real input, 32-bit coefficients and gives 16-bit real output. The number of coefficients given by the user is arbitrary. The delay line is implemented in parallel to the multiply-accumulate operation using instructions CoMACM. Delay buffer will be located in the DPRAM area. User’s Manual for Keil Compiler : -49 Filter output in 1Q15 format V 1.2, 2003-11 DSP Library for XC16x Family Fir_32 FIR Filter with transversal form, 32 bit coefficients, Sample processing (cont’d) Pseudo code { short short short long x(N_h)={0,...}; Y; i; temp; //Update for(i=0; x(i) = x(N_h-1) //Input vector //Filter result the input vector with the new input value i<N_h-1; i++) x(i+1); = IN; //move the new input unto X[N_h-1] //Calculate the current FIR output temp = 0; for(i=0; i<N_h; i++) temp = temp + h(i)*x(N_h-1-i) Y = (short)temp; return Y; //FIR filter output //Filter output returned } Techniques • Memory bandwidth conflicts removing • Instruction re-ordering • Loop unrolling Assumptions • Delay buffer must be located in DPRAM area Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Y is stored in R4 User’s Manual for Keil Compiler -50 V 1.2, 2003-11 DSP Library for XC16x Family Fir_32 FIR Filter with transversal form, 32 bit coefficients, Sample processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM d(n-1) d(n-1) d(n-2) d(n-3) . . d(n-N_h+2) d(n-N_h+1) Memory hH(0) hL(0) hH(1) hL(1) . . hH(N_h-1) hL(N_h-1) DPRAM d(n) d(n) d(n-1) d(n-2) . . d(n-N_h+3) d(n-N_h+2) IDX0 IDX0 Memory hH(0) hL(0) hH(1) hL(1) . . hH(N_h-1) hL(N_h-1) CoMACM 1Q15 1Q15 Before After Figure 4-14 Fir_32 Example C166Lib\Keil\Examples\Filters\Fir\Fir32.c Cycle Count Memory Initialization 16 Read new input into DPRAM 2 FIR loop (LSW) N_h + 4 FIR loop (MSW) N_h + 1 User’s Manual for Keil Compiler -51 V 1.2, 2003-11 DSP Library for XC16x Family Fir_32 FIR Filter with transversal form, 32 bit coefficients, Sample processing (cont’d) Write output 1 Total 2*N_h + 24 Example: N_h = 12 cycle = 48 Code Size Memory 32 bytes initialization Read new input into DPRAM 8 bytes FIR loop (LSW) 28 bytes FIR loop (MSW) 14 bytes Write output 4 bytes Total User’s Manual for Keil Compiler 86 bytes -52 V 1.2, 2003-11 DSP Library for XC16x Family Fir_cplx Complex Fir filter with transversal form, 16 bit filter coefficients, Vector processing Signature DataS Fir_cplx ( CplxS* x, CplxS* h, CplxS* y, DataS* d_buffer, DataS N_x, DataS N_h) Inputs x : Pointer to the input vector in 1Q15 format h : Pointer to filter coefficients in 1Q15 format d_buffer : Pointer to delay buffer Pointer to the new input sample in 1Q15 format N_x : Size of input vector N_h Output Filter order y : Filter output vector in 1Q15 format Return Implementation Description The implementation of the complex FIR filter uses transversal structure (direct form). A vector of the inputs is processed at a time and the output vector is returned. The filter operates on 16-bit complex input vector, 16-bit complex coefficients and gives 16-bit complex output vector. The number of coefficients given by the user is arbitrary. The delay line is implemented in parallel to the multiply-accumulate operation using instructions CoMACM. Delay buffer must be located in the DPRAM area. User’s Manual for Keil Compiler -53 V 1.2, 2003-11 DSP Library for XC16x Family Fir_cplx Complex Fir filter with transversal form, 16 bit filter coefficients, Vector processing (cont’d) Pseudo code { CplxS CplxS CplxS CplxS short x[N_x]; //Input vector Y[N_x]; //Filter result h[N_h]; //filter coefficient vector d_buffer[N_h]={0,...} i, j; for(j=0; j<N_x; j++) { //Update the delay buffer with the new input value for(i=0; i<N_h-1; i++) d_buffer[i] = d_buffer[i+1]; d_buffer[N_h-1] = x[j]; //move the new input unto d_buffer[N_h-1] //Calculate the current FIR output y_real[j] = 0; y_imag[j] = 0; for(i=0; i<N_h; i++) { y_real[j] = y_real[j] + h_real[i]*d_buffer_real[j-i]h_imag[i]*d_buffer_imag[j-i]; y_imag[j] = y_imag[j] + h_real[i]*d_buffer_imag[j-i]h_imag[i]*d_buffer_real[j-i]; } } } Techniques Assumptions • Delay buffer must be located in DPRAM area Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: User’s Manual for Keil Compiler -54 V 1.2, 2003-11 DSP Library for XC16x Family Fir_cplx Complex Fir filter with transversal form, 16 bit filter coefficients, Vector processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM dim(n-N_h+1) dre(n-N_h+1) . . dim(n-1) dre(n-1) dim(n-1) dre(n-1) DPRAM dim(n-N_h+2) dre(n-N_h+2) . . dim(n) dre(n) dim(n) dre(n) IDX0 Memory him(N_h-1) hre(N_h-1) . . him(1) hre(1) him(0) hre(0) IDX0 Memory him(N_h-1) hre(N_h-1) . . him(1) hre(1) him(0) hre(0) CoMACM 1Q15 1Q15 Before After Figure 4-15 Fir_cplx Example C166Lib\Keil\Examples\Filters\Fir\FirCplx.c Cycle Count Read parameters 1 Memory initialization 10 Set counters 3 Fir loop User’s Manual for Keil Compiler N_x(4N_h + 24) -55 V 1.2, 2003-11 DSP Library for XC16x Family Fir_cplx Complex Fir filter with transversal form, 16 bit filter coefficients, Vector processing (cont’d) Return 1 Total N_x(4N_h + 24) + 15 Example: N_x = 1 N_h = 14 cycle = 95 Code Size Read parameters Memory initialization 20 bytes FIR loop 124 bytes Return 2 bytes Total User’s Manual for Keil Compiler 2 bytes 148 bytes -56 V 1.2, 2003-11 DSP Library for XC16x Family Fir_sym Symmetric Fir filter, 16 bit filter coefficients, Vector processing Signature DataS Fir_sym ( DataS* x, DataS* h, DataS* y, DataS* d_buffer, DataS N_x, DataS N_h) Inputs x : Pointer to the input vector in 1Q15 format h : Pointer to symmetric filter coefficients in 1Q15 format d_buffer : Pointer to delay buffer N_x : Size of input vector N_h Output Half of the filter order y : Filter output vector in 1Q15 format Return Implementation Description The implemented Fir filter has the symmetric filter coefficient, i.e. h(i)=h(2N_h-1-i), for i=0,1,..., N_h-1. The implementation uses transversal structure (direct form) and vector processing. The filter operates on 16-bit real input vector, 16bit real coefficients and gives 16-bit real output vector. The number of coefficients should be even.The delay line is implemented in parallel to the multiply-accumulate operation using instructions CoMACM. Delay buffer must be located in the DPRAM area. User’s Manual for Keil Compiler -57 V 1.2, 2003-11 DSP Library for XC16x Family Fir_sym Symmetric Fir filter, 16 bit filter coefficients, Vector processing (cont’d) Pseudo code { short short short short short x[N_x]; y[N_x]; h[2N_h]; d_buffer[2N_h]={0,...}; i, j; //Input vector //Filter result vector //filter coefficient vector //delay buffer for(j=0; j<N_x; j++) { //Update the input vector with the new input value for(i=0; i<2N_h-1; i++) d_buffer[i] = d_buffer[i+1]; d_buffer(2N_h-1) = x[j]; //move the new input unto d_buffer[2N_h-1] //Calculate the current FIR output y[j] = 0; for(i=0; i<2N_h; i++) y[j] = y[j] + h(i)*d_buffer(2N_h-1-i) //FIR filter output } return; } Techniques Assumptions • Delay buffer must be located in DPRAM area Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: User’s Manual for Keil Compiler -58 V 1.2, 2003-11 DSP Library for XC16x Family Fir_sym Symmetric Fir filter, 16 bit filter coefficients, Vector processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM d(n-N_h+1) d(n-N_h+2) . . d(n-3) d(n-2) d(n-1) d(n-1) DPRAM d(n-N_h+2) d(n-N_h+3) . . d(n-2) d(n-1) d(n) d(n) IDX0 Memory h(N_h-1) h(N_h-2) . . . h(2) h(1) h(0) IDX0 Memory h(N_h-1) h(N_h-2) . . . h(2) h(1) h(0) CoMACM 1Q15 1Q15 Before After Figure 4-16 Fir_sym Example C166Lib\Keil\Examples\Filters\Fir\FirSym.c Cycle Count Read parameters 1 Memory initialization 7 Set counters 3 Fir loop N_x(2N_h + 12) User’s Manual for Keil Compiler -59 V 1.2, 2003-11 DSP Library for XC16x Family Fir_sym Symmetric Fir filter, 16 bit filter coefficients, Vector processing (cont’d) Return 1 Total N_x(2N_h + 12) + 11 Example: N_x = 1 N_h = 14 cycle = 51 Code Size Read parameters 2 bytes Memory initialization 14 bytes Set counters 6 FIR loop Return Total User’s Manual for Keil Compiler -60 68 bytes 2 bytes 92 bytes V 1.2, 2003-11 DSP Library for XC16x Family Fir_lat Lattice Fir filter, 16 bit filter coefficients, Vector processing Signature DataS Fir_lat ( DataS* x, DataS* K, DataS* y, DataS* u, DataS N_x, DataS M) Inputs x : Pointer to the input vector in 1Q15 format K : Pointer to lattice coefficients in 1Q15 format u : Pointer to state variable vector N_x : Size of input vector M Output Nummber of stages of the lattice filter y : Filter output vector in 1Q15 format Return Implementation Description The implementation uses lattice structure showed in Figure 4-8 and vector processing. The filter operates on 16bit real input vector, 16-bit real coefficients and gives 16-bit real output vector. The number of stages used in the lattice Fir filter implementation is arbitrary. Lattice coefficient K must be located in the DPRAM area. User’s Manual for Keil Compiler -61 V 1.2, 2003-11 DSP Library for XC16x Family Fir_lat Lattice Fir filter, 16 bit filter coefficients, Vector processing (cont’d) Pseudo code { short x[N_x]; short yn[M]={0,..}; short K[M]; short un[M]; short un-1[M]={0,...}; //Input vector //Output vector of different stages at time n //Lattice filter coefficient vector //State variable vector at time n //State variable vector at time n-1 short i, j; for(j=0; j<N_x; j++) { //Initialization yn[j] = x[j]; //Calculate the output and state vector for(i=1; i<M; i++) { un[i] = K[i]*yn[j] + un-1[i-1]; //State variable yn[j] = yn[j] + K[i]*un-1[i-1]; //Output un-1[i] = un[i]; //Update the state vector } //Update the first state variable un-1[0] = x[j]; } return; } Techniques Assumptions • Lattice coefficient vector must be located in DPRAM area. Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: User’s Manual for Keil Compiler -62 V 1.2, 2003-11 DSP Library for XC16x Family Fir_lat Lattice Fir filter, 16 bit filter coefficients, Vector processing (cont’d) Memory Note DPRAM High Addr. Low Addr. Memory K(M-1) K(M-2) . . K(1) K(0) X IDX0 + X ui-1(n-1) ui-1(n-1) . . ui-1(n-1) ui-1(n-1) + yi(n) ui(n) x(n) Figure 4-17 Fir_lat Example C166Lib\Keil\Examples\Filters\Fir\FirLat.c Cycle Count Read parameters 1 Memory initialization 9 Set counters 1 Fir loop N_x(15M - 6) User’s Manual for Keil Compiler -63 V 1.2, 2003-11 DSP Library for XC16x Family Fir_lat Lattice Fir filter, 16 bit filter coefficients, Vector processing (cont’d) Return 1 Total N_x(15M - 6) + 12 Example: N_x = 1 M=3 cycle = 51 Code Size Read parameters 2 bytes Memory initialization 18 bytes Set counters 2 bytes 62 bytes 2 bytes 86 bytes FIR loop Return Total User’s Manual for Keil Compiler -64 V 1.2, 2003-11 DSP Library for XC16x Family Fir_dec Fir decimation filter, 16 bit filter coefficients, Vector processing Signature DataS Fir_dec ( DataS* x, DataS* h, DataS* y, DataS* d_buffer, DataS N_x, DataS N_h, DataS D) Inputs x : Pointer to the input vector in 1Q15 format h : Pointer to the filter coefficient vector in 1Q15 format d_buffer : Pointer to the delay buffer N_x : Size of the input vector N_h : Filter order D Output Decimation factor y : Return Implementation Description Pointer to output vector : The implementation of the decimation filter is after Figure 4-9, where the FIR filter uses transversal structure (direct form). The vector processing is used in the implementation. The filter operates on 16-bit real input, 16-bit coefficients and gives 16bit real output. The number of coefficients and the decimator factor given by the user is arbitrary. The delay line is implemented in parallel to the multiply-accumulate operation using instructions CoMACM. Delay buffer will be located in the DPRAM area. User’s Manual for Keil Compiler -65 V 1.2, 2003-11 DSP Library for XC16x Family Fir_dec Fir decimation filter, 16 bit filter coefficients, Vector processing (cont’d) Pseudo code { short short short short x[N_x]; d_buffer[N_h]; y[N_x/D]; i, j, k; //Input vector //Delay buffer //Filter result for(j=0; j<N_x/D; j++) { for(k=0; k<D; k++) { //Update the delay buffer with the new input value for(i=0; i<N_h-1; i++) d_buffer[i] = d_buffer[i+1]; d_buffer[N_h-1] = x[j*D+k]; if(k==0) {//Calculate the current decimation FIR filter y[j] = 0; for(i=0; i<N_h; i++) y[j] = y[j] + h(i)*d_buffer[N_h-i]; //filter output } } } return; } Techniques • Memory bandwidth conflicts removing Assumptions • Delay buffer must be located in DPRAM area Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Ouput in vector y User’s Manual for Keil Compiler -66 V 1.2, 2003-11 DSP Library for XC16x Family Fir_dec Fir decimation filter, 16 bit filter coefficients, Vector processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM d(n-N_h+1) d(n-N_h+2) . . d(n-3) d(n-2) d(n-1) d(n-1) X DPRAM d(n-N_h+2) d(n-N_h+3) . . d(n-2) d(n-1) d(n) d(n) CoMACM Memory h(N_h-1) h(N_h-2) . . . h(2) h(1) h(0) IDX0 Memory h(N_h-1) h(N_h-2) . . . h(2) h(1) h(0) LM Output 1Q15 1Q15 Before After Figure 4-18 Fir_dec Example C166Lib\Keil\Examples\Filters\Fir\FirDec.c Cycle Count Read parameter 1 Memory Initialization 7 Set counters 6 FIR loop N_x(N_h + 13) Return 1 User’s Manual for Keil Compiler -67 V 1.2, 2003-11 DSP Library for XC16x Family Fir_dec Fir decimation filter, 16 bit filter coefficients, Vector processing (cont’d) Total N_x(N_h + 13) + 15 Example: N_x = 1 N_h = 10 cycle = 33 Code Size Read parameter 2 bytes Memory 14 bytes initialization Set counters 12 bytes FIR loop 44 bytes Return 2 bytes Total User’s Manual for Keil Compiler 74 bytes -68 V 1.2, 2003-11 DSP Library for XC16x Family Fir_inter Fir interpolation filter, 16 bit filter coefficients, Vector processing Signature DataS Fir_inter ( DataS* x, DataS* h, DataS* y, DataS* d_buffer, DataS N_x, DataS N_h, DataS I) Inputs x : Pointer to the input vector in 1Q15 format h : Pointer to the filter coefficient vector in 1Q15 format d_buffer : Pointer to the delay buffer N_x : Size of the input vector N_h : Filter order I Output Interpolation factor y : Return Implementation Description Pointer to output vector : The implementation of the interpolation filter is after Figure 411, where the FIR filter uses transversal structure (direct form). The vector processing is used in the implementation. The filter operates on 16-bit real input, 16-bit coefficients and gives 16-bit real output. The number of coefficients and the interpolation factor given by the user is arbitrary. The delay line is implemented in parallel to the multiply-accumulate operation using instructions CoMACM. Delay buffer will be located in the DPRAM area. User’s Manual for Keil Compiler -69 V 1.2, 2003-11 DSP Library for XC16x Family Fir_inter Fir interpolation filter, 16 bit filter coefficients, Vector processing (cont’d) Pseudo code { short short short short x[N_x]; d_buffer[N_h]; y[N_x*I]; i, j, k, m; //Input vector //Delay buffer //Filter result for(j=0; j<N_x; j++) { for(k=0; k<I; k++) { //Update the delay buffer with the new input value for(i=0; i<N_h-1; i++) d_buffer[i] = d_buffer[i+1]; if(k==0) d_buffer[N_h-1] = x[j]; else d_buffer[N_h-1] = 0; //Calculate the current decimation FIR filter y[j*I+k] = 0; for(i=0; i<N_h; i++) y[j*I+k] = y[j*I+k] + h(i)*d_buffer[N_h-i]; //filter output } } return; } Techniques • Memory bandwidth conflicts removing Assumptions • Delay buffer must be located in DPRAM area Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Ouput in vector y User’s Manual for Keil Compiler -70 V 1.2, 2003-11 DSP Library for XC16x Family Fir_inter Fir interpolation filter, 16 bit filter coefficients, Vector processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM d(n-N_h+1) d(n-N_h+2) . . d(n-3) d(n-2) d(n-1) d(n-1) Memory h(N_h-1) h(N_h-2) . . . h(2) h(1) h(0) Input KL DPRAM d(n-N_h+2) d(n-N_h+3) . . d(n-2) d(n-1) d(n) d(n) IDX0 Memory h(N_h-1) h(N_h-2) . . . h(2) h(1) h(0) CoMACM X Output 1Q15 1Q15 Before After Figure 4-19 Fir_inter Example C166Lib\Keil\Examples\Filters\Fir\FirInter.c Cycle Count Read parameter 1 Memory Initialization 7 Set counters 6 FIR loop N_x(N_h + 13) User’s Manual for Keil Compiler -71 V 1.2, 2003-11 DSP Library for XC16x Family Fir_inter Fir interpolation filter, 16 bit filter coefficients, Vector processing (cont’d) Return 1 Total N_x(N_h + 13) + 15 Example: N_x = 1 N_h = 10 cycle = 33 Code Size Read parameter 2 bytes Memory 14 bytes initialization Set counters 12 bytes FIR loop 44 bytes Return 2 bytes Total User’s Manual for Keil Compiler 74 bytes -72 V 1.2, 2003-11 DSP Library for XC16x Family 4.4 IIR Filters Infinite Impulse Response (IIR) filters have infinite duration of non-zero output values for a given finite duration of non-zero impulse input. Infinite duration of output is due to the feedback used in IIR filters. Recursive structures of IIR filters make them computational efficient but because of feedback not all IIR structures are realizable (stable). An IIR filter has different implementation structures according to its mathematical description. 4.4.1 Direct Form 1 th The N order difference equation for the direct form 1 of the IIR filter is given by y(n ) = N M i=1 i=0 ∑ a(i – 1) ⋅ y(n – i ) + ∑ b( i) ⋅ x(n – i) [4.16] where, x(n) is the nth input and y(n) is the corresponding output. If M=N=2, we have the biquad (second order) IIR filter as y ( n ) = b ( 0 )x ( n ) + b ( 1 )x ( n – 1 ) + b ( 2 )x ( n – 2 ) + a ( 0 )y ( n – 1 ) + a ( 1 )y ( n – 2 ) [4.17] where a(0), a(1) correspond to the poles and b(0), b(1), b(2) correspond to the zeroes of the filter. The equivalent transform function is –1 –2 [ z ]- = ------------------------------------------------------------b ( 0 ) + b ( 1 )z + b ( 2 )z H[z] = Y ----------–1 –2 X[z] 1 – a ( 0 )z – a ( 1 )z 4.4.2 . [4.18] Direct Form 2 In the case of a linear shift-invariant system, the overall input-output relationship of a cascade is independent of the order in which systems are cascaded. This property suggests a second direct form realization. Breaking Equation [4.16] into two parts in terms of zeroes and poles of transfer function (M=N), we have User’s Manual for Keil Compiler -73 V 1.2, 2003-11 DSP Library for XC16x Family N u(n) = x(n) + ∑ a(i – 1 ) ⋅ u(n – i) [4.19] i=1 N y(n) = ∑ b(i) ⋅ u(n – i) [4.20] i=0 where intermediate state variable u(n) is used to calculate the filter output y(n). This representation is called "Direct Form 2" implementation of an IIR filter and is illustrated in Figure 4-20. Direct Form 2 has an advantage over Direct Form 1 as it requires less data memory. x(n) u(n) + b(0) + y(n) z-1 a(0) + u(n-1) b(1) + z-1 a(1) + + u(n-2) b(2) + u(n-N+1) a(N-2) b(N-1) + z-1 + a(N-1) u(n-N) b(N) + Figure 4-20 Canonical Form (Direct Form 2) User’s Manual for Keil Compiler -74 V 1.2, 2003-11 DSP Library for XC16x Family 4.4.3 Cascaded Biquad IIR Filter with Direct Form 2 If N=2, Equation [4.19] and Equation [4.20] are reduced to the biquad (second order) IIR filter with direct form 2 implementation: u(n ) = x(n ) + a( 0) ⋅ u( n – 1) + a(1 ) ⋅ u(n – 2 ) [4.21] y(n ) = b(0 ) ⋅ u(n ) + b(1 ) ⋅ u(n – 1 ) + b( 2) ⋅ u( n – 2) [4.22] Any higher order IIR filter can be constructed by cascading several biquad stages together. A cascaded realization of a fourth order system using direct form 2 realization of each biquad subsystem would be as shown in the following diagram. x(n) + u1(n) b1(0) + u2(n) b2(0) + z-1 + u1(n-1) + + a2(0) b2(1) u2(n-1) + z-1 z-1 a1(1) y(n) z-1 b1(1) a1(0) + b1(2) a2(1) u1(n-2) b2(2) u2(n-2) Figure 4-21 Cascaded Biquad IIR Filter with Direct Form 2 Implementation 4.4.4 Cascaded Biquad IIR Filter with Direct Form 1 In Figure 4-21 each biquad subsystem is realized with direct form 2 structure. Similarly, the biquad subsystem can also be implemented with direct form 1 structure. Rewriting Equation [4.17]we have y(n) = b(0) ⋅ x(n) + u(n – 1) u(n) = a(1) ⋅ y(n) + b(1) ⋅ x(n) + w(n – 1) w(n) = a(2) ⋅ y(n) + b(2) ⋅ x(n) User’s Manual for Keil Compiler -75 [4.23] [4.24] [4.25] V 1.2, 2003-11 DSP Library for XC16x Family where u(n) and w(n) are state variables at time n. According to Equation [4.23], Equation [4.24] and Equation [4.25] we have another cascaded biquads IIR filter implementation showed in Figure 4-22. x(n) + + b1(0) b1(1) + z-1 u1(n) a1(0) + + + b2(0) b2(1) z-1 u2(n) a2(0) y(n) + z-1 z-1 b1(2) + b2(2) w1(n) w2(n) a2(1) a1(1) Figure 4-22 Cascaded Biquad IIR Filter with Direct Form 1 Implementation A Comparison between FIR and IIR filters: • IIR filters are computational efficient than FIR filters i.e., IIR filters require less memory and fewer instruction when compared to FIR to implement a specific transfer function. • The number of necessary multiplications are least in IIR while it is most in FIR. • IIR filters are made up of poles and zeroes. The poles give IIR filter an ability to realize transfer functions that FIR filters cannot do. • IIR filters are not necessarily stable, because of their recursive nature it is designer’s task to ensure stability, while FIR filters are guaranteed to be stable. • IIR filters can simulate prototype analog filter while FIR filters cannot. • Probability of overflow errors is quite high in IIR filters in comparison to FIR filters. • FIR filters are linear phase as long as H(z) = H(z-1) but all stable, realizable IIR filters are not linear phase except for the special cases where all poles of the transfer function lie on the unit circle. 4.4.5 Lattice IIR Filter Similar with lattice Fir filter the IIR filter can be also realized using the lattice filter structure. The typical application of the lattice IIR filter is found in voice analysis and sythesis system, where lattice Fir filer is used to analyse the speech and the lattice IIR filter is used in the synthesis of speech. User’s Manual for Keil Compiler -76 V 1.2, 2003-11 DSP Library for XC16x Family The lattice IIR filter showed in equations. Figure 4-23 can be described using the following yi – 1 ( n ) = yi ( n ) –ki ui – 1 ( n – 1 ) [4.26] 0 ≤ i≤ M– 1 ui ( n ) = ki yi – 1 ( n ) + ui – 1 ( n – 1 ) The initial value is equal to the filter input x(n): yM ( n ) = x ( n ) At the last stage (i=0) we have the output of the lattice IIR filter y(n)=y0(n). yM-1(n) yM(n)=x(n) …. + kM + …. z-1 uM(n) y(n)=y0(n) y 1(n) + + k2 k1 + + z-1 u1(n) uM-1(n) z-1 u0(n) Figure 4-23 Lattice IIR Filter Structure 4.4.6 Descriptions The following IIR filter routines are described. • 16 bit filter coefficients, direct form 1, sample processing • 16 bit filter coefficients, direct form 2, sample processing • 16 bit filter coefficients, N-cascaded real biquads, direct form 1, 5-coefs per biquad, vector processing • 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing • 32 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing User’s Manual for Keil Compiler -77 V 1.2, 2003-11 DSP Library for XC16x Family • 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 4-coefs per biquad, vector processing • Lattice IIR filter, 16 bit filter coefficients, vector processong User’s Manual for Keil Compiler -78 V 1.2, 2003-11 DSP Library for XC16x Family IIR_1 16 bit filter coefficients, Direct form 1, Sample processing Signature DataS IIR_1( DataS* h, DataS* IN, DataS N, DataS* x_y) Inputs h : Pointer to filter coefficients in 2Q14 IN : Pointer to new input sample in 1Q15 N : Filter order x_y : Pointer to delay buffer containing the past input and output samples Output : Return Y Implementation Description The routine is implemented according to the Equation [4.16] , which is called direct form 1 implementation. The vector h contains IIR filter coefficient a(i) and b(i), and is stored in memory. Before starting the routine the delay buffer x_y containing the past input and output signals should be located in DPRAM area. This IIR filter routine processes one sample at a time and returns the output for that sample in 1Q15 format. User’s Manual for Keil Compiler : -79 Filter output in 1Q15 format V 1.2, 2003-11 DSP Library for XC16x Family IIR_1 16 bit filter coefficients, Direct form 1, Sample processing (cont’d) Pseudo code { ; ; ; ; x(n) = input signal at time n y(n) = output signal at time n a(k), b(k) = IIR filter coefficients N refer to the filter order in Equation [4.16] DataS DataS DataS a[N], b[N+1]; y[N], x[N+1]; i, temp; //Filter vectors //input and output signal vectors ;Move the new input sample into input vector in DPRAM for(i=N; i>0; i--) x(n-i-1) = x(n-i); x(n) = IN; //New input sample ;IIR filtering y(n) = 0; for(i=0 to N) y(n) = y(n)+b(i)*x(n-i); for(i=1 to N) y(n) = y(n)+a(i)*y(n-i); Y = y(n) return Y; //Filter Output returned } Techniques • Memory bandwidth conflicts removing • Instruction re-ordering Assumptions • Delay buffer must be located in DPRAM area Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Y is stored in R4 Memory Note User’s Manual for Keil Compiler -80 V 1.2, 2003-11 DSP Library for XC16x Family IIR_1 High Addr. Low Addr. High Addr. Low Addr. 16 bit filter coefficients, Direct form 1, Sample processing (cont’d) DPRAM DPRAM y(n-1) y(n) : y(n-N+1) y(n-N) x(n-1) x(n-1) x(n-2) : x(n-M+2) x(n-M+1) x(n-M) : y(n-N+2) y(n-N+1) x(n) x(n) x(n-1) : x(n-M+3) x(n-M+2) x(n-M+1) IDX0 Memory Memory a(0) a(1) : a(N-2) a(N-1) b(0) b(1) : b(M-2) b(M-1) b(M) 2Q14 IDX0 CoMACM a(0) a(1) : a(N-2) a(N-1) b(0) b(1) : b(M-2) b(M-1) b(M) 2Q14 CoMACM Before After Figure 4-24 IIR_1 Example C166Lib\Keil\Examples\Filters\IIR\ IIRform1.c Cycle Count Memory Initialization 10 Read the new sample into DPRAM 2 User’s Manual for Keil Compiler -81 V 1.2, 2003-11 DSP Library for XC16x Family IIR_1 16 bit filter coefficients, Direct form 1, Sample processing (cont’d) First IIR loop N+1 Repeat count reinitialization 2 Second IIR loop N Write the output 6 Total 2N + 21 Example: N=4 cycle = 29 Code Size Memory initialization 20 bytes Read the new sample into DPRAM 8 bytes First IIR loop 10 bytes Repeat count reinitialization 4 bytes Second IIR loop 10 bytes Write the output 22 bytes Total 74 bytes User’s Manual for Keil Compiler -82 V 1.2, 2003-11 DSP Library for XC16x Family IIR_2 16 bit filter coefficients, Direct form 2, Sample processing Signature DataS IIR_2 ( DataS* h, DataS* IN, DataS N, DataS* u) Inputs h : Pointer to filter coefficients in 2Q14 IN : Pointer to new input sample in 1Q15 N : Filter order u : Pointer to state variable vector Output : Return Y : Filter output in 1Q15 format Implementation Description The IIR filter routine is implemented based on direct form II implementation (Nth Order) showed in Figure 4-20. The filter operates on 16-bit real input, 16-bit real coefficients and returns 16-bit real output. The number of inputs is arbitrary. Pseudo code { ; ; ; ; ; x(n) = input signal at time n u(n) = state variable at time n y(n) = output signal at time n a(k), b(k) = IIR filter coefficients N = M refer to the filter order in Equation [4.16] DataS DataS DataS a[N], b[N+1]; u[N+1]; i; //Filter vectors //state variable vector ;Calculate the sate variable at time n, u(n) u(n) = x(n); for(i=1; i<N; i++) u(n) = u(n) + a(i-1)*u(n-i); ;Calculate the output at time n y(n) = 0; for(i=0; i<=N; i++) y(n) = y(n)+b(i)*u(n-i); Y = y(n) return Y; //Filter Output returned } User’s Manual for Keil Compiler -83 V 1.2, 2003-11 DSP Library for XC16x Family IIR_2 16 bit filter coefficients, Direct form 2, Sample processing (cont’d) Techniques • Memory bandwidth conflicts removing • Use of CoMAC and CoMACM instructions • Filter output converted to 16-bit with saturation Assumptions • The state variable vector must be located in DPRAM area. Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Y is stored in R4 User’s Manual for Keil Compiler -84 V 1.2, 2003-11 DSP Library for XC16x Family IIR_2 16 bit filter coefficients, Direct form 2, Sample processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM u(n-1) u(n-1) u(n-2) u(n-3) : u(n-N+2) u(n-N+1) u(n-N) DPRAM u(n) u(n) u(n-1) u(n-2) u(n-3) : u(n-N+2) u(n-N+1) IDX0 Memory Memory b(0) b(1) b(2) b(3) : b(N-1) b(N) a(0) a(1) a(2) : a(N-1) a(N) b(0) b(1) b(2) b(3) : b(N-1) b(N) a(0) a(1) a(2) : a(N-1) a(N) CoMACM CoMAC 2Q14 2Q14 Before After IDX0 Figure 4-25 IIR_2 Example C166Lib\Keil\Examples\Filters\IIR\ IIRform2.c Cycle Count Memory Initialization User’s Manual for Keil Compiler 9 -85 V 1.2, 2003-11 DSP Library for XC16x Family IIR_2 16 bit filter coefficients, Direct form 2, Sample processing (cont’d) Read the new input sample 2 First IIR loop N+4 Repeat count reinitialization 1 Second IIR loop N+1 Write the output 3 Return 1 Total 2N + 21 Example: N=4 cycle = 29 Code Size Memory initialization Read the new input sample First IIR loop Repeat count reinitialization 18 bytes 6 bytes 20 bytes 2 bytes Second IIR loop 8 bytes Write the output 12 bytes Return Total User’s Manual for Keil Compiler -86 2 bytes 68 bytes V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_1 16 bit filter coefficients, N-cascaded real biquads, direct form 1, 5-coefs per biquad, vector processing Signature DataS IIR_bi_1(DataS* x, DataS* h, DataS* y, DataS* u_w, DataS N_x, DataS N_biq) Inputs x : Pointer to input vector in 1Q15 h : Pointer to filter coefficent vecotr in 1Q15 u_w : Pointer to the state variable vector N_x : Size of input vector N_biq : Number of the biquads y : Pointer to output vector in 1Q15 format Output Return Implementation : The IIR filter is implemented as a cascade of direct form 1 biquads according to Figure 4-21. If the number of biquads is 'N', the filter order is 2*N. A vector of samples is processed at a time and the output vector for those samples is returned. The filter operates on 16-bit real vector input, 16-bit real coefficients and returns 16-bit real vector output. The number of inputs is arbitrary. Length of delay line is 2*N. User’s Manual for Keil Compiler -87 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_1 16 bit filter coefficients, N-cascaded real biquads, direct form 1, 5-coefs per biquad, vector processing (cont’d) Pseudo code { ; ; ; ; ; ; X[N_x], input vector xi(n) = input buffer at time n of biquad number i ui(n), wi(n) = state variables at time n of biquad number i yi(n) = output signal at time n of biquad number i ai(k), bi(k) = Filter coefficients of biquad number i N = number of biquads DataS DataS DataS DataS ai(n), bi(n); ui(n), wi(n); yi(n), Y[N_x]; i, j; //Filter vectors //state variable vector //output for(j=0;j<N_x;j++) { xi(n) = X[j]; for(i=1; i<=N_biq; i++) { ;Calculate the output at time n yi(n) = bi(0)*xi(n) + ui(n-1); ;Update the sate variable ui(n)at time n ui(n) = bi(1)*xi(n) + ai(0)*yi(n) + wi(n-1); ;Update the sate variable wi(n)at time n wi(n) = bi(2)*xi(n) + ai(1)*yi(n); ;Set i-th biquad output to (i+1)-th biquad input xi+1(n) = yi(n); } ;Output of the last biquad Y[j] = yN-1(n); } return; } User’s Manual for Keil Compiler -88 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_1 16 bit filter coefficients, N-cascaded real biquads, direct form 1, 5-coefs per biquad, vector processing (cont’d) Techniques • • • • Assumptions • The state variable vector must be located in DPRAM area. Register Usage • From .c file to .asm file: Memory bandwidth conflicts removing Use of CoMAC instructions Instruction re-ordering Filter output converted to 16-bit with saturation Decided by compiler • From .asm file to .c file: Y is stored in R4 Memory Note User’s Manual for Keil Compiler -89 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_1 High Addr. Low Addr. 16 bit filter coefficients, N-cascaded real biquads, direct form 1, 5-coefs per biquad, vector processing (cont’d) DPRAM wN(n-1) uN(n-1) : : w2(n-1) u2(n-1) w1(n-1) u1(n-1) DPRAM wN(n) uN(n) : : w2(n) u2(n) w1(n) u1(n) IDX0 IDX0 Memory High Addr. Low Addr. bN(2) aN(1) bN(1) aN(0) bN(0) : : : b1(2) a1(1) b1(1) a1(0) b1(0) bN(2) aN(1) bN(1) aN(0) bN(0) : : : b1(2) a1(1) b1(1) a1(0) b1(0) 1Q15 1Q15 Before After Figure 4-26 IIR_bi_1 Example C166Lib\Keil\Examples\Filters\IIR\ IIRbi_1.c Cycle Count Read parameter User’s Manual for Keil Compiler 1 -90 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_1 16 bit filter coefficients, N-cascaded real biquads, direct form 1, 5-coefs per biquad, vector processing (cont’d) Memory Initialization 11 Biquad loop N_x(17*N_biq + 7) Return 1 Total N_x(17*N_biq +7) + 13 Example: N_x = 1 N_biq = 2 cycle = 54 Code Size Read parameter 2 bytes Memory initialization 22 bytes Biquad loop 118 bytes 2 bytes 144 bytes Return Total User’s Manual for Keil Compiler -91 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_2 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing Signature DataS IIR_bi_2 ( DataS* x, DataS* h, DataS* y, DataS* u, DataS N_x, DataS N_biq ) Inputs x : Pointer to input vector in 1Q15 h : Pointer to filter coefficients in 1Q15 u : Pointer to state variable vector N_x : Size of input vector N_biq : Number of the biquads y : Pointer to filter output in 1Q15 Output Return Implementation Description : The IIR filter is implemented as a cascade of direct form 2 biquads according to Figure 4-21. If number of biquads is 'N', the filter order is 2*N. The filter operates on 16-bit real input, 16-bit real coefficients and returns 16-bit real output. The number of inputs is arbitrary. Length of delay line is 2*N. User’s Manual for Keil Compiler -92 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_2 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing (cont’d) Pseudo code { ; ; ; ; ; ; X[N_x], input vector xi(n) = input buffer at time n of biquad number i ui(n) = state variables at time n of biquad number i yi(n) = output signal at time n of biquad number i ai(k), bi(k) = Filter coefficients of biquad number i N = number of biquads DataS DataS DataS DataS ai(n), bi(n); ui(n); yi(n), Y; i, j; //Filter vectors //state variable vector //output for(j=0; j<N_x; j++) { xi(n) = X[j]; for(i=1; i<=N_biq; i++) { ;Update the sate variable ui(n)at time n ui(n) = xi(n) + ai(0)*ui(n-1) + ai(1)*ui(n-2); ;Calculate the output at time n yi(n) = bi(0)*ui(n) + bi(1)*ui(n-1) + bi(2)*ui(n-2); ;Set i-th biquad output to (i+1)-th biquad input xi+1(n) = yi(n); } ;Output of the last biquad Y[j] = yN-1(n); } return; //Filter Output returned } Techniques • • • • Memory bandwidth conflicts removing Use of CoMAC instructions Instruction re-ordering Filter output converted to 16-bit with saturation User’s Manual for Keil Compiler -93 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_2 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing (cont’d) Assumptions • The state variable vector must be located in DPRAM area. Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Y is stored in R4 User’s Manual for Keil Compiler -94 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_2 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing (cont’d) Memory Note High Addr. Low Addr. DPRAM uN(n-1) uN(n-2) : : u2(n-1) u2(n-2) u1(n-1) u1(n-2) DPRAM uN(n-1) uN(n-2) : : u2(n-1) u2(n-2) u1(n-1) u1(n-2) IDX1 IDX0 IDX1 IDX0 Memory High Addr. Low Addr. bN(0) bN(1) bN(2) aN(0) aN(1) : : : b1(0) b1(1) b1(2) a1(0) a1(1) bN(0) bN(1) bN(2) aN(0) aN(1) : : : b1(0) b1(1) b1(2) a1(0) a1(1) 1Q15 1Q15 Before After Figure 4-27 IIR_bi_2 Example C166Lib\Keil\Examples\Filters\IIR\ IIRbi_2.c Cycle Count User’s Manual for Keil Compiler -95 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_2 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing (cont’d) Parameter read 1 Memory Initialization 16 Biquad loop N_x(13*N_biq + 7) Return 1 Total N_x(13*N_biq + 7) + 18 Example: N_x = 1 N_biq = 2 cycle = 51 Code Size Parameter read 2 bytes Memory Initialization 32 bytes Biquad loop 54 bytes 2 bytes 90 bytes Return Total User’s Manual for Keil Compiler -96 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi2_32 32 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing Signature DataS IIR_bi2_32 ( DataS* x, DataSL* h, DataS* y, DataS* u, DataS N_x, DataS N_biq ) Inputs x : Pointer to input vector in 1Q15 h : Pointer to filter coefficients in 1Q31 u : Pointer to state variable vector N_x : Size of input vector N_biq : Number of the biquads y : Pointer to filter output in 1Q15 Output Return Implementation Description : The IIR filter is implemented as a cascade of direct form 2 biquads according to Figure 4-21 . If number of biquads is 'N', the filter order is 2*N. A vector of samples is processed at a time and the output vector for those samples is returned. The filter operates on 16-bit real input, 32-bit real coefficients and returns 16-bit real output. The number of inputs is arbitrary. Length of delay line is 2*N. User’s Manual for Keil Compiler -97 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi2_32 32 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing (cont’d) Pseudo code { ; ; ; ; ; ; X[N_x], input vector xi(n) = input buffer at time n of biquad number i ui(n) = state variables at time n of biquad number i yi(n) = output signal at time n of biquad number i ai(k), bi(k) = Filter coefficients of biquad number i N = number of biquads DataS DataS DataS DataS ai(n), bi(n); ui(n); yi(n), Y; i, j; //Filter vectors //state variable vector //output for(j=0; j<N_x; j++) { xi(n) = X[j]; for(i=1; i<=N_biq; i++) { ;Update the sate variable ui(n)at time n ui(n) = xi(n) + ai(0)*ui(n-1) + ai(1)*ui(n-2); ;Calculate the output at time n yi(n) = bi(0)*ui(n) + bi(1)*ui(n-1) + bi(2)*ui(n-2); ;Set i-th biquad output to (i+1)-th biquad input xi+1(n) = yi(n); } ;Output of the last biquad Y[j] = yN-1(n); } return; //Filter Output returned } Techniques • • • • Memory bandwidth conflicts removing Use of CoMAC instructions Instruction re-ordering Filter output converted to 16-bit with saturation User’s Manual for Keil Compiler -98 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi2_32 32 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing (cont’d) Assumptions • The state variable vector must be located in DPRAM area. Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Y is stored in R4 User’s Manual for Keil Compiler -99 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi2_32 32 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing (cont’d) Memory Note High Addr. Low Addr. DPRAM uN(n-1) uN(n-2) : : u2(n-1) u2(n-2) u1(n-1) u1(n-2) DPRAM uN(n-1) uN(n-2) : : u2(n-1) u2(n-2) u1(n-1) u1(n-2) IDX1 IDX0 IDX1 IDX0 Memory High Addr. Low Addr. bN(0) bN(1) bN(2) aN(0) aN(1) : : : b1(0) b1(1) b1(2) a1(0) a1(1) bN(0) bN(1) bN(2) aN(0) aN(1) : : : b1(0) b1(1) b1(2) a1(0) a1(1) 1Q31 1Q31 Before After Figure 4-28 IIR_bi2_32 Example C166Lib\Keil\Examples\Filters\IIR\ IIRbi2_32.c Cycle Count User’s Manual for Keil Compiler -100 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi2_32 32 bit filter coefficients, N-cascaded real biquads, direct form 2, 5-coefs per biquad, vector processing (cont’d) Parameter read 1 Memory Initialization 18 Biquad loop N_x(24*N_biq + 7) Return 1 Total N_x(24*N_biq + 7) + 20 Example: N_x = 1 N_biq = 2 cycle = 75 Code Size Parameter read 2 bytes Memory Initialization 36 bytes Biquad loop 156 bytes 2 bytes Return Total User’s Manual for Keil Compiler 196 bytes -101 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_24 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 4-coefs per biquad, vector processing Signature DataS IIR_bi_24 ( DataS* x, DataS* h, DataS* y, DataS* u, DataS N_x, DataS N_biq ) Inputs x : Pointer to input vector in 1Q15 h : Pointer to filter coefficients in 1Q15 u : Pointer to state variable vector N_x : Size of input vector N_biq : Number of the biquads y : Pointer to filter output in 1Q15 Output Return Implementation Description : The IIR filter is implemented as a cascade of direct form 2 biquads according to Figure 4-21 . If number of biquads is 'N', the filter order is 2*N. A vector of samples is processed at a time and the output vector for those samples is returned. The filter operates on 16-bit real input, 16-bit real coefficients and returns 16-bit real output. The number of inputs is arbitrary. Length of delay line is 2*N. User’s Manual for Keil Compiler -102 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_24 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 4-coefs per biquad, vector processing (cont’d) Pseudo code { ; ; ; ; ; ; X[N_x], input vector xi(n) = input buffer at time n of biquad number i ui(n) = state variables at time n of biquad number i yi(n) = output signal at time n of biquad number i ai(k), bi(k) = Filter coefficients of biquad number i N = number of biquads DataS DataS DataS DataS ai(n), bi(n); ui(n); yi(n), Y; i, j; //Filter vectors //state variable vector //output for(j=0; j<N_x; j++) { xi(n) = X[j]; for(i=1; i<=N_biq; i++) { ;Update the sate variable ui(n)at time n ui(n) = xi(n) + ai(0)*ui(n-1) + ai(1)*ui(n-2); ;Calculate the output at time n yi(n) = bi(0)*ui(n) + bi(1)*ui(n-1) + bi(2)*ui(n-2); ;Set i-th biquad output to (i+1)-th biquad input xi+1(n) = yi(n); } ;Output of the last biquad Y[j] = yN-1(n); } return; //Filter Output returned } Techniques • • • • Memory bandwidth conflicts removing Use of CoMAC instructions Instruction re-ordering Filter output converted to 16-bit with saturation User’s Manual for Keil Compiler -103 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_24 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 4-coefs per biquad, vector processing (cont’d) Assumptions • The state variable vector must be located in DPRAM area. Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: Y is stored in R4 User’s Manual for Keil Compiler -104 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_24 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 4-coefs per biquad, vector processing (cont’d) Memory Note High Addr. Low Addr. DPRAM uN(n-1) uN(n-2) : : u2(n-1) u2(n-2) u1(n-1) u1(n-2) DPRAM uN(n-1) uN(n-2) : : u2(n-1) u2(n-2) u1(n-1) u1(n-2) IDX1 IDX0 IDX1 IDX0 Memory High Addr. Low Addr. bN(1) bN(2) aN(0) aN(1) : : : : : b1(1) b1(2) a1(0) a1(1) bN(1) bN(2) aN(0) aN(1) : : : : : b1(1) b1(2) a1(0) a1(1) 1Q15 1Q15 Before After Figure 4-29 IIR_bi_24 Example C166Lib\Keil\Examples\Filters\IIR\ IIRbi_24.c Cycle Count User’s Manual for Keil Compiler -105 V 1.2, 2003-11 DSP Library for XC16x Family IIR_bi_24 16 bit filter coefficients, N-cascaded real biquads, direct form 2, 4-coefs per biquad, vector processing (cont’d) Parameter read 1 Memory Initialization 16 Biquad loop N_x(13*N_biq + 5) Return 1 Total N_x(13*N_biq + 5) + 17 Example: N_x = 1 N_biq = 2 cycle = 48 Code Size Parameter read 2 bytes Memory Initialization 32 bytes Biquad loop 100 bytes 2 bytes 136 bytes Return Total User’s Manual for Keil Compiler -106 V 1.2, 2003-11 DSP Library for XC16x Family IIR_lat Lattice IIR filter, 16 bit filter coefficients, Vector processing Signature DataS Fir_lat ( DataS* x, DataS* K, DataS* y, DataS* u, DataS N_x, DataS M) Inputs x : Pointer to the input vector in 1Q15 format K : Pointer to lattice coefficients in 1Q15 format u : Pointer to state variable vector N_x : Size of input vector M Output Nummber of stages of the lattice filter y : Filter output vector in 1Q15 format Return Implementation Description The implementation uses lattice structure showed in Figure 4-23 and vector processing. The filter operates on 16bit real input vector, 16-bit real coefficients and gives 16-bit real output vector. The number of stages used in the lattice IIR filter implementation is arbitrary. Lattice coefficient K must be located in the DPRAM area. User’s Manual for Keil Compiler -107 V 1.2, 2003-11 DSP Library for XC16x Family IIR_lat Lattice IIR filter, 16 bit filter coefficients, Vector processing (cont’d) Pseudo code { short x[N_x]; short yn[M]={0,..}; short K[M]; short un[M]; short un-1[M]={0,...}; //Input vector //Output vector of different stages at time n //Lattice filter coefficient vector //State variable vector at time n //State variable vector at time n-1 short i, j; for(j=0; j<N_x; j++) { //Initialization yn[j] = x[j]; //Calculate the output and state vector for(i=1; i<M; i++) { yn[j] = yn[j] - K[i]*un-1[i-1]; //Output un[i] = K[i]*yn[j] + un-1[i-1]; //Update the state variable } } return; } Techniques Assumptions • Lattice coefficient vector must be located in DPRAM area. Register Usage • From .c file to .asm file: Decided by compiler • From .asm file to .c file: User’s Manual for Keil Compiler -108 V 1.2, 2003-11 DSP Library for XC16x Family IIR_lat Lattice IIR filter, 16 bit filter coefficients, Vector processing (cont’d) Memory Note DPRAM High Addr. Low Addr. Memory K(M-1) K(M-2) . . K(1) K(0) X IDX0 + X ui-1(n-1) ui-1(n-1) . . ui-1(n-1) ui-1(n-1) yi(n) ui(n) x(n) Figure 4-30 IIR_lat Example C166Lib\Keil\Examples\Filters\IIR\IIRLat.c Cycle Count Read parameters 1 Memory initialization 9 IIR loop N_x(11*M + 1) Return 1 User’s Manual for Keil Compiler -109 V 1.2, 2003-11 DSP Library for XC16x Family IIR_lat Lattice IIR filter, 16 bit filter coefficients, Vector processing (cont’d) Total N_x(11*M + 1) + 11 Example: N_x = 1 M=3 cycle = 45 Code Size Read parameters 2 bytes Mmeory initialization 18 bytes FIR loop 66 bytes 2 bytes Return Total User’s Manual for Keil Compiler 88 -110 bytes V 1.2, 2003-11 DSP Library for XC16x Family 4.5 Adaptive Digital Filters An adaptive filter adapts to changes in its input signals automatically. Conventional linear filters are those with fixed coefficients.These can extract signals where the signal and noise occupy fixed and separate frequency bands. Adaptive filters are useful when there is a spectral overlap between the signal and noise or if the band occupied by the noise is unknown or varies with time. In an adaptive filter, the filter characteristics are variable and they adapt to changes in signal characteristics. The coefficients of these filters vary and cannot be specified in advance. The self-adjusting nature of adaptive filters is largely used in applications like telephone echo cancelling, radar signal processing, equalization of communication channels etc. Adaptive filters with the LMS (Least Mean Square) algorithm are the most popular kind. The basic concept of an LMS adaptive filter is as follows. x(n) FIR (h(0,n), h(1,n) ... h(N-1,n) ) y(n) + d(n) e(n) LMS Algorithm Figure 4-31 Adaptive filter with LMS algorithm The filter part is an N-tap FIR filter with coefficients h(0,n), h(1,n),..., h(N-1,n), whose input signal is x(n) and output is y(n). The difference between the actual output y(n) and a desired output d(n), gives an error signal e( n) = d( n) – y(n ) User’s Manual for Keil Compiler [4.27] -111 V 1.2, 2003-11 DSP Library for XC16x Family The algorithm uses the input signal x(n) and the error signal e(n) to adjust the filter coefficients h(0,n), h(1,n),..., h(N-1,n), such that the difference, e(n) is minimized on a criterion. The LMS algorithm uses the minimum mean square error criterion min h(0,n), h(1,n),..., h(N-1,n) E(e2(n)) [4.28] Where E denotes statistical expectation. The algorithm of a regular (non-delayed) LMS adaptive filter is mathematically expressed as follows. y ( n ) = h ( 0, n ) ⋅ x ( n ) + h ( 1, n ) ⋅ x ( n – 1 ) + h ( 2, n ) ⋅ x ( n – 2 ) + … + h ( N – 1, n ) ⋅ x ( n – N + 1 ) [4.29] e( n) = d(n ) – y( n) [4.30] h ( k, n + 1 ) = h ( k, n ) + x ( n – k ) × µ × e ( n ) [4.31] where µ >0 is a constant called step-size. Note that the filter coefficients are time varying. h(i,n) denotes the value of the i-th coefficient at time n. The algorithm has three stages: 1. calculate the current filter output y(n). 2. calculate the current error value e(n) using the current expected value d(n) and currently calculated output y(n). 3. update the filter coefficients for next iteration using current error e(n) and samples x(n-k). Step-size µ controls the convergence of the filter coefficients to the optimal (or stationary) state. The larger the µ value, the faster the convergence of the adaptation. On the other hand, a large value of µ also leads to a large variation of h(i,n) (a bad accuracy) and thus a large variation of the output error (a large residual error). Therefore, the choice of µ is always a trade-off between fast convergence and high accuracy. µ must not be larger than a certain threshold. Otherwise, the LMS algorithm diverges. 4.5.1 Delayed LMS algorithm for an adaptive filter In the regular LMS adaptive filter the update of filter coefficients makes use of current error value and input value. This makes the choice of step-size µ more difficult due to the effect of µ on convergence of adaptive filter. To minimize the effect of µ on the filter convergence a delayed LMS algorithm for an adaptive filter is introduced. The algorithm of a delayed LMS adaptive filter can be represented by the following mathematical equations. User’s Manual for Keil Compiler -112 V 1.2, 2003-11 DSP Library for XC16x Family N–1 y(n) = ∑ h ( k, n ) ⋅ x ( n – k ) [4.32] k=0 h ( k, n + 1 ) = h ( k, n ) + x ( n – k – 1 ) × µ × e ( n – 1 ) e( n) = d(n ) – y( n) [4.33] [4.34] where, y(n) : output sample of the filter at index n x(n) : input sample of the filter at index n d(n) : expected output sample of the filter at index n h(0,n),h(1,n),.. : filter coefficients at index n N : filter order (number of coefficients) µ : step-size e(n) : error value at index n The algorithm has three stages: 1. calculate the current filter output y(n). 2. update filter coefficients for the next iteration using previous error value e(n-1) and the delayed input sample x(n-k-1). 3. calculate the current error value e(n) and store it in memory. 4.5.2 Descriptions In the DSP library, the delayed LMS adaptive filters are implemented. The following are the implemented delayed LMS adaptive FIR filter functions with 16 bit input and 16 bit or 32 bit filter coefficients. • 16 bit real coefficients, delayed LMS, Sample processing • 32 bit real coefficients, delayed LMS, Sample processing User’s Manual for Keil Compiler -113 V 1.2, 2003-11 DSP Library for XC16x Family Adap_filter_16 Signature 16 bit real coefficients, Delayed LMS, Sample Processing DataS Adap_filter_16 ( DataS* DataS* DataS* DataS* DataS DataS DataS* h, IN, D, error, N_h, Step, d_buffer ) Inputs h : Pointer to filter coefficients in 1Q15 IN : Pointer to new input value D : Pointer to the expected signal at time n error : Pointer to error signal N_h Filter size Step : d_buffer Adaptive gain Pointer to delay buffer Return Y Implementation Description Delayed LMS algorithm has been used to realize an adaptive FIR filter. That is, the update of coefficients in the current instant is done using the error in the previous output. : Output value of the filter in 1Q15 The FIR filter is implemented using transversal structure and is realized as a tapped delay line. In this routine, both of signals and filter coefficients have 16 bit precision. This routine processes one sample at a time and returns output of that sample. The input for which the output is to be calculated is sent as an argument to the function. User’s Manual for Keil Compiler -114 V 1.2, 2003-11 DSP Library for XC16x Family Adap_filter_16 16 bit real coefficients, Delayed LMS, Sample Processing (cont’d) Pseudo code { ; ; ; ; ; ; x(n) = input signal at time n d(n) = desired signal at time n y(n) = output signal at time n h(k,n) = k-th coefficient at time n gain = adaptive gain N = number of coefficient taps in the filter DataS DataS DataS DataS DataS x(n); d(n); h(k,n); y(n), Y; k; //input signal //desired signal //filter coefficient //output ;Calculate the output at time n y(n) = 0; for(k=0; k<N_h; k++) y(n) = y(n) + h(k,n)*x(n-k); ;Update the filter coefficients for(k=0; k<N_h; k++) h(k,n+1) = h(k,n) + gain*e(n-1)*x(n-k-1); ;Error signal at time n e(n) = d(n) - y(n); Y = y(n); return Y; //Filter Output returned } Techniques • • • • Memory bandwidth conflicts removing Use of CoMACM instructions Instruction re-ordering Filter output converted to 16-bit with saturation Assumptions • Delay buffer must be located in DPRAM area. Register Usage • From .c file to .asm file: Defined by the Compiler • From .asm file to .c file: (R4) = Y User’s Manual for Keil Compiler -115 V 1.2, 2003-11 DSP Library for XC16x Family Adap_filter_16 16 bit real coefficients, Delayed LMS, Sample Processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM d(n-1) d(n-1) d(n-2) d(n-3) . . d(n-N_h+2) d(n-N_h+1) DPRAM d(n) d(n) d(n-1) d(n-2) . . d(n-N_h+3) d(n-N_h+2) IDX0 Memory h(0,n) h(1,n) h(2,n) h(3,n) . . h(N_h-2,n) h(N_h-1,n) IDX0 Memory h(0,n+1) h(1,n+1) h(2,n+1) h(3,n+1) . . h(N_h-2,n+1) h(N_h-1,n+1) CoMACM 1Q16 1Q16 Before After Figure 4-32 Adap_filter_16 Example C166Lib\Keil\Example\Adaptive_filter\Ad_filter_16.c Cycle Count Memory Initialization 13 Read the new input sample Calculate the current output User’s Manual for Keil Compiler 3 N_h + 6 -116 V 1.2, 2003-11 DSP Library for XC16x Family Adap_filter_16 16 bit real coefficients, Delayed LMS, Sample Processing (cont’d) Filter coefficient adaptation 7N_h + 1 Calculation of the error signal Total 3 8*N_h + 26 Example: N_h = 12 cycle = 122 Code Size Memory Initialization 26 bytes Read the new input sample 12 bytes Calculate the current output 32 bytes Filter coefficient adaptation 52 bytes Calculation of the error signal 6 bytes Total 128 bytes User’s Manual for Keil Compiler -117 V 1.2, 2003-11 DSP Library for XC16x Family Adap_filter_32 32 bit real coefficients, Delayed LMS, Sample Processing Signature DataS Adap_filter_32( DataL* DataS* DataS* DataS* DataS DataS DataS* ) Inputs h : Pointer to filter coefficients in 1Q31 IN : Pointer to new input value D : Pointer to the expected signal at time n error : Pointer to error signal N_h h, IN, D, error, N_h, Step, d_buffer Filter size Step : d_buffer Adaptive gain Pointer to delay buffer Return Y Implementation Description LMS algorithm has been used to realize an adaptive FIR filter. That is, the update of coefficients in the current instant is done using the error in the previous output. : Output value of the filter The FIR filter is implemented using transversal structure and is realized as a tapped delay line. Here, the filter coefficients have 32 bit precision, while input and output signals have 16 bit precision. This routine processes one sample at a time and returns output of that sample. The input for which the output is to be calculated is sent as an argument to the function. User’s Manual for Keil Compiler -118 V 1.2, 2003-11 DSP Library for XC16x Family Adap_filter_32 32 bit real coefficients, Delayed LMS, Sample Processing (cont’d) Pseudo code { ; ; ; ; ; ; x(n) = input signal at time n d(n) = desired signal at time n y(n) = output signal at time n h(k,n) = k-th coefficient at time n gain = adaptive gain N = number of coefficient taps in the filter DataS DataS DataL DataS DataS x(n); d(n); h(k,n); y(n), Y; k; //input signal //desired signal //filter coefficient //output ;Calculate the output at time n y(n) = 0; for(k=0; k<N_h; k++) y(n) = y(n) + (short)h(k,n)*x(n-k); ;Update the filter coefficients for(k=0; k<N_h; k++) h(k,n+1) = h(k,n) + (long)gain*e(n)*x(n-k); ;Error signal at time n e(n) = d(n) - y(n); Y = y(n); return Y; //Filter Output returned } Techniques • • • • Memory bandwidth conflicts removing Use of CoMAC instructions Instruction re-ordering Filter output converted to 16-bit with saturation Assumptions • Delay buffer must be located in DPRAM area. Register Usage • From .c file to .asm file: Defined by the compiler • From .asm file to .c file: (R4) = Y User’s Manual for Keil Compiler -119 V 1.2, 2003-11 DSP Library for XC16x Family Adap_filter_32 32 bit real coefficients, Delayed LMS, Sample Processing (cont’d) Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM d(n-1) d(n-1) d(n-2) d(n-3) . . d(n-N_h+2) d(n-N_h+1) Memory hH(0,n) hL(0,n) hH(1,n) hL(1,n) . . hH(N_h-1,n) hL(N_h-1,n) DPRAM d(n) d(n) d(n-1) d(n-2) . . d(n-N_h+3) d(n-N_h+2) IDX0 IDX0 Memory hH(0,n) hL(0,n) hH(1,n) hL(1,n) . . hH(N_h-1,n) hL(N_h-1,n) CoMACM 1Q16 1Q16 Before After Figure 4-33 Adap_filter_32 Example C166Lib\Keil\Example\Adaptive_filter\Ad_filter_32.c Cycle Count Memory Initialization 20 Read the new input Calculation of the current output y(n) (LSW ) User’s Manual for Keil Compiler -120 3 N_h + 3 V 1.2, 2003-11 DSP Library for XC16x Family Adap_filter_32 32 bit real coefficients, Delayed LMS, Sample Processing (cont’d) Calculation of the current output y(n) (MSW ) N_h + 3 Filter coefficient adaptation 6N_h +1 Calculation of the error signal 4 Total 8*N_h + 34 Example: N_h = 12 cycle = 130 Code Size Memory Initialization 40 bytes Read the new input 12 bytes Calculation of the current output y(n) (LSW ) 24 bytes Calculation of the current output y(n) (MSW ) 26 bytes Filter coefficient adaptation 48 bytes Calculation of the error signal 10 bytes Total 160 bytes User’s Manual for Keil Compiler -121 V 1.2, 2003-11 DSP Library for XC16x Family User’s Manual for Keil Compiler -122 V 1.2, 2003-11 DSP Library for XC16x Family 4.6 Fast Fourier Transforms Spectrum (Spectral) analysis is a very important methodology in Digital Signal Processing. Many applications have a requirement of spectrum analysis. The spectrum analysis is a process of determining the correct frequency domain representation of the sequence. The analysis gives rise to the frequency content of the sampled waveform such as bandwidth and centre frequency. One of the method of doing the spectrum analysis in Digital Signal Processing is by employing the Discrete Fourier Transform (DFT). The DFT is used to analyze, manipulate and synthesize signals in ways not possible with continuous (analog) signal processing. It is a mathematical procedure that helps in determining the harmonic, frequency content of a discrete signal sequence. The DFT is defined by N –1 ∑ x ( n )WN X(k) = nk [4.35] n=0 where WN = e – j2π ⁄ N = cos ( 2πnk ⁄ N ) – j sin ( 2πnk ⁄ N ) . [4.36] Using Equation [4.36] in Equation [4.35] we can rewrite X(k) as N–1 ∑ x ( n ) [ cos ( 2πnk ⁄ N ) – j sin ( 2πnk ⁄ N ) ] X(k) = [4.37] n=0 X(k) is the kth DFT output component for k=0,1,2,....,N-1 x(n) is the sequence of discrete sample for n=0,1,2,...,N-1 j is imaginary unit –1 N is the number of samples of the input sequence (and number of frequency points of DFT output). While the DFT is used to convert the signal from time domain to frequency domain. The complementary function for DFT is the IDFT, which is used to convert a signal from frequency to time domain. The IDFT is given by 1 x ( n ) = ---N N–1 ∑ X ( k )e j2πnk ⁄ N [exponential form] [4.38] k=0 User’s Manual for Keil Compiler -123 V 1.2, 2003-11 DSP Library for XC16x Family 1x ( n ) = --N N–1 ∑ X ( k ) [ cos ( 2πnk ⁄ N ) + j sin ( 2πnk ⁄ N ) ] [4.39] k=0 Notice the difference between DFT in Equation [4.37] and Equation [4.39] , the IDFT Kernel is the complex conjugate of the DFT and the output is scaled by N. WNnk, the Kernel of the DFT and IDFT is called the Twiddle-Factor and is given by, In exponential form, e e – j 2πnk ⁄ N j2πnk ⁄ N for DFT for IDFT In rectangular form, cos ( 2πnk ⁄ N ) – j sin ( 2πnk ⁄ N ) for DFT cos ( 2πnk ⁄ N ) + j sin ( 2πnk ⁄ N ) for IDFT While calculating DFT, a complex summation of N complex multiplications is required for each of N output samples. N2 complex multiplications and N(N-1) complex additions compute an N-point DFT. The processing time required by large number of calculation limits the usefulness of DFT. This drawback of DFT is overcome by a more efficient and fast algorithm called Fast Fourier Transform (FFT). The radix-2 FFT computes the DFT in N*log2(N) complex operations instead of N2 complex operations for that of the DFT. (where N is the transform length.) The FFT has the following preconditions to operate at a faster rate. • The radix-2 FFT works only on the sequences with lengths that are power of two. • The FFT has a certain amount of overhead that is unavoidable, called bit reversed ordering. The output is scrambled for the ordered input or the input has to be arranged in a predefined order to get output properly arranged. This makes the straight DFT better suited for short length computation than FFT. The graph shows the algorithm complexity of both on a typical processor like pentium. User’s Manual for Keil Compiler -124 V 1.2, 2003-11 DSP Library for XC16x Family Execution time (seconds) 1000 correlation DFT 100 10 1 0.1 FFT 0.01 0.001 8 16 32 64 128 256 512 Number points in DFT 1024 2048 4096 Figure 4-34 Complexity Graph The Fourier transform plays an important role in a variety of signal processing applications. Anytime, if it is more comfortable to work with a signal in the frequency domain than in the original time or space domain, we need to compute Fourier transform. FFT is an incredibly efficient algorithm for computing DFT. The main idea of FFT is to nk exploit the periodic and symmetric properties of the DFT Kernel W N . The resulting algorithm depends strongly on the transform length N. The basic Cooley-Tukey algorithm assumes that N is a power of two. Hence it is called radix-2 algorithm. Depending on how the input samples x(n) and the output data X(k) are grouped, either a decimation-in-time (DIT) or a decimation-in-frequency (DIF) algorithm is obtained. The technique used by Cooley and Tukey can also be applied to DFTs, where N is a power of r. The resulting algorithms are referred as radix-r FFT. It turns out that radix-4, radix8, and radix-16 are especially interesting. In cases where N cannot be represented in terms of powers of single number, mixed-radix algorithms must be used. For example for 28 point input, since 28 cannot be represented in terms of powers of 2 and 4 we use radix-7 and radix-4 FFT to get the frequency spectrum. In this library the basic radix-2 decimation-in-frequency FFT algorithm is implemented. User’s Manual for Keil Compiler -125 V 1.2, 2003-11 DSP Library for XC16x Family 4.6.1 Radix-2 Decimation-In-Time FFT Algorithm The decimation-in-time (DIT) FFT divides the input (time) sequence into two groups, one of even samples and the other of odd samples. N/2-point DFTs are performed on these sub-sequences and their outputs are combined to form the N-point DFT. First, x(n) the input sequence in the Equation [4.35] is divided into even and odd subsequences. N ---- – 1 2 X(k) = 2nk ∑ x ( 2n )WN 2nk ∑ x ( 2n )WN + WN 2nk WN k for k=0 to N-1 N ---- – 1 2 [4.40] 2nk ∑ x ( 2n + 1 )WN n=0 n=0 But, ( 2n + 1 )k ∑ x ( 2n + 1 )WN + n=0 n=0 N ---- – 1 2 = N ---- – 1 2 = (e ( – j2π ) ⁄ N 2nk ) = (e ( – j2π ) ⁄ ( N ⁄ 2 ) nk nk ) = WN ⁄ 2 By substituting the following in Equation [4.40] h(n)=x(2n) g(n)=x(2n+1) , Equation [4.40] becomes N⁄2–1 X(k) = ∑ nk k h ( n )W N ⁄ 2 + W N N⁄2–1 ∑ nk g ( n )W N ⁄ 2 n=0 n=0 k = H ( k ) + WN G ( k ) for k=0 to N-1 [4.41] Equation [4.41] is the radix-2 DIT FFT equation. It consists of two N/2-point DFTs H(k) and G(k) performed on the subsequences of even and odd samples of the input sequence x(n), respectively. Multiples of WN, the Twiddle-Factors are the coefficients in the FFT calculation. Further, WN k+N⁄2 = (e – j 2π ⁄ N k ) × (e – j 2π ⁄ N N ⁄ 2 User’s Manual for Keil Compiler ) k = –WN . -126 [4.42] V 1.2, 2003-11 DSP Library for XC16x Family Equation [4.41] can be expressed in two equations k X ( k ) = H ( k ) + WN G ( k ) [4.43] k X ( k + N ⁄ 2 ) = H ( k ) – WN G ( k ) for k=0 to N/2-1 [4.44] The decomposition procedure can be continued until two-point DFTs are reached. Figure 4-36 illustrates the flow graph of a real 8-point DIT FFT. Input Stage 1 Stage 3 + x(0) W0 x(4) x(1) W0 x(2) W2 W0 x(3) - x(6) W0 + x(1) W0 x(5) W0 x(4) W1 x(5) + x(3) Output x(0) + x(2) x(7) Stage 2 W0 W2 W2 W3 x(6) x(7) - Figure 4-35 8-point DIT FFT Note that the input sequence x(n) in Figure 4-35 is in the scrabbled order. The same procedure can be repeated with the linear input order. Then we have the alternate form of the DIT FFT shown in Figure 4-36 , where the output sequence X(n) is rearranged to appear in bit-reversed order. The only difference between Figure 4-35 and Figure 4-36 is a rearrangement of the butterflies and the computational load and final results are identical. In the library the implementation of real-valued forward FFT is based on Figure 4-36 . User’s Manual for Keil Compiler -127 V 1.2, 2003-11 DSP Library for XC16x Family Input Stage 1 Stage 2 Stage 3 x(0) Output + x(0) - x(4) + x(2) - x(6) + x(1) - x(5) + x(3) - x(7) W0 x(1) W0 x(2) W0 W2 x(3) W0 x(4) W0 W1 x(5) W0 W2 W0 W2 x(6) W3 x(7) Figure 4-36 Alternate Form of 8-point DIT FFT 4.6.2 Radix-2 Decimation-In-Frequency FFT Algorithm A second variant of the radix-2 FFT is the decimation-in-frequency algorithm. In order to get this algorithm, we split the input sequence into the first and second halves and write the DFT as N –1 X(k) = n=0 N⁄2–1 = ∑ n=0 N⁄2–1 = nk ∑ x ( n )WN ∑ nk x ( n )W N + N⁄2–1 ∑ ( n + N ⁄ 2 )k x ( n + N ⁄ 2 )W N n=0 k nk [ x ( n ) + ( – 1 ) x ( n + N ⁄ 2 ) ]W N , for k=0 to N. [4.45] n=0 For the even and odd numbered DFT points we get User’s Manual for Keil Compiler -128 V 1.2, 2003-11 DSP Library for XC16x Family N⁄2–1 X ( 2k ) = ∑ nk [ x ( n ) + x ( n + N ⁄ 2 ) ]W N ⁄ 2 n=0 N⁄2–1 X ( 2k + 1 ) = ∑ n [4.46] nk [ x ( n ) – x ( n + N ⁄ 2 ) ]W N W N ⁄ 2 , for k=0 to N/2. [4.47] n=0 As with the decimation-in-time algorithm, the N-point DFT is decomposed into two N/2point DFTs. Using the principle repeatedly results in an FFT algorithm where the input values appear in their natural order, but where the output values appear in bit reversed order. The complexity is the same as for the decimation-in-time FFT. Figure 4-37 shows the flow graph of an 8-point DIF FFT. The comparison of Figure 4-35 and Figure 4-37 shows that the two graphs can be viewed as transposed versions of one another. Input Stage 1 Stage 2 Stage 3 Output x(0) + x(0) - x(1) W0 x(2) W2 x(3) W0 + - x(4) x(2) W0 x(6) W0 + x(4) W1 - x(5) W2 W0 W3 W2 x(6) x(5) x(3) + - x(7) x(1) W0 W0 x(7) Figure 4-37 8-point DIF FFT Similar to FFT with decimation-in-time, if the inputs in Figure 4-37 are rearranged in bitinverse order and the outputs in linear order, we have an alternate implementation for DIF FFT showed in Figure 4-38 . User’s Manual for Keil Compiler -129 V 1.2, 2003-11 DSP Library for XC16x Family Input Stage 1 Stage 2 Stage 3 Output + x(0) x(0) W0 - x(4) x(1) W0 + x(2) x(2) W0 W2 x(3) - x(6) W0 + x(1) x(4) - x(5) W0 W3 x(5) + x(3) W0 W2 W0 x(6) W1 - x(7) W2 x(7) Figure 4-38 Alternate Form of 8-point DIF FFT 4.6.3 Complex FFT Algorithm If the input sequence is complex, according to Equation [4.43] andEquation [4.44] , we can write the complex DFT as k X ( k ) = P ( k ) + WN Q ( k ) k X ( k + N ⁄ 2 ) = P ( k ) – WN Q ( k ) for k=0 to N/2-1, [4.48] where P(k) and Q(k) are the complex even and odd partial DFTs. As same as real FFT, the complex FFT has also two implementations, i.e. decimation-in-time and decimationin-frequency complex FFT.Figure 4-39 shows an 8-point DIT complex FFT implementation. In Figure 4-39 , each pair of arrows represents a Butterfly. The whole of the complex FFT is computed by different patterns of Butterflies. These are called groups and stages. User’s Manual for Keil Compiler -130 V 1.2, 2003-11 DSP Library for XC16x Family Stage 1 Input x0 x1 x2 x3 x4 x5 x6 x7 Stage 2 Stage 3 x(0) + x(1) W0 x(2) - x(3) W0 x(4) + x(5) W0 x(6) W2 - x(7) x(8) W0 + x(9) x(10) W0 W1 - x(11) x(12) x(13) x(14) W0 W2 + W0 W2 W3 - x(15) Output x(0) x(1) x(8) x(9) x(4) x(5) x(12) x(13) x(2) x(3) x(10) x(11) x(6) x(7) x(14) x(15) x0 x4 x2 x6 x1 x5 x3 x7 Figure 4-39 8-point DIT complex FFT For 8-point DIT FFT the first stage consists of one groups of four Butterfly, second consists of two groups of two butterflies and third stage has four group of one Butterflies. Each Butterfly is represented as in Figure 4-40 . Dual node spacing Primary node Dual node Re(Pn) + Im(Pn) Re(Qn) WNk - Im(Qn) Re(Pn+1) Im(Pn+1) Re(Qn+1) Im(Qn+1) Figure 4-40 Butterfly of Radix-2 DIT complex FFT User’s Manual for Keil Compiler -131 V 1.2, 2003-11 DSP Library for XC16x Family The output is derived as follows k Pn + 1 = Pn + Qn ⋅ WN [4.49] k Qn + 1 = Pn –Qn ⋅ WN , [4.50] where n represents the number of stages. Of course, the complex FFT can also be implemented with decimation-in-frequency FFT showed in Figure 4-38 . In this case, an 8-point complex FFT has the implementation structure depicted in Figure 4-41 . Stage 1 Input x0 x4 x2 x6 x1 x5 x3 x7 x(0) x(1) x(8) x(9) x(4) x(5) x(12) x(13) x(2) x(3) x(10) x(11) x(6) x(7) x(14) x(15) Stage 2 Stage 3 Output x(0) + x(1) W0 - x(2) x(3) W0 + x(4) x(5) W2 - W0 x(6) x(7) W0 + x(8) x(9) W0 W1 - x(10) x(11) + W2 W0 W2 W0 x(12) x(13) W3 - x(14) x(15) x0 x1 x2 x3 x4 x5 x6 x7 Figure 4-41 8-point DIF complex FFT The corresponding butterfly is illustrated in Figure 4-42, in that the input-output relationship writes Pn + 1 = Pn + Qn [4.51] k Qn + 1 = ( Pn – Qn ) ⋅ WN User’s Manual for Keil Compiler [4.52] -132 V 1.2, 2003-11 DSP Library for XC16x Family Dual node spacing Primary node Re(Pn) Dual node Re(Qn) + Im(Pn) Re(Pn+1) Im(Pn+1) WNk - Im(Qn) Re(Qn+1) Im(Qn+1) Figure 4-42 Butterfly of Radix-2 DIF complex FFT 4.6.4 Calculation of Real Forward FFT from Complex FFT Having only real valued input data x(n), the computational effort of a N point real FFT can be reduced to a N/2 point complex FFT. Firstly, even indexed data h(n)=x(2n) and odd indexed data g(n)=x(2n+1) are separated. According to Equation [4.43] and Equation [4.44], The spectrum X(k) can be decomposed into the spectra H(k) and G(k) as follows: X ( k ) = H ( k ) + cos 2πk ---------- – j sin 2πk ---------- G ( k ) N N for k=0 to N/2-1. [4.53] In order to cut the N-point real FFT into N/2-point complex transformation a complex input vector y(n) = h(n)+jg(n) is formed with the index n running from 0 to N/2-1. The real part values are formed by the even indexed input data h(n). The imaginary part is formed by the odd indexed input data g(n). Then y(n) is transformed into the frequency domain resulting in a spectrum consisting of the spectra H(k) and G(k). Y ( k ) = H ( k ) + jG ( k ) = Re { Y ( k ) } + jIm { Y ( k ) } [4.54] Now the complex spectra H(k) and G(k) have to be extracted out of the complex spectrum Y(k). By employing symmetry relations, the spectra H(k) and G(k) can be derived from the spectrum Y(k) as follows: { Y ( N ⁄ 2 – k ) } + Re { Y ( k ) } Re { H ( k ) } = Re ---------------------------------------------------------------------------2 Im { Y ( k ) } – Im { Y ( N ⁄ 2 – k ) }Im { H ( k ) } = --------------------------------------------------------------------------2 { Y ( k ) } + Im { Y ( N ⁄ 2 – k ) } Re { G ( k ) } = Im ---------------------------------------------------------------------------2 Re { Y ( N ⁄ 2 – k ) } – R e { Y ( k ) }Im { G ( k ) } = ------------------------------------------------------------------------2 User’s Manual for Keil Compiler -133 [4.55] V 1.2, 2003-11 DSP Library for XC16x Family Therefore, computing an N-point real forward FFT has the following steps: 1. Generating the N/2 point complex sequence y(n)=x(2n)+jx(2n+1), 2. Computing the complex spectra Y(k) using complex FFT, 3. Extracting H(k) and G(k) from computed Y(k) according to ,Equation [4.55] 4. Calculating the first N/2+1 points of X(k) based on Equation [4.53] using H(k) and G(k), 5. Using the symmetry relation of the spectra of the real sequence to get the complete N-point X(k). 4.6.5 Calculation of Real Inverse FFT from Complex FFT Using the algorithm in Section 4.6.4 we can calculate a N-point real-valued FFT through N/2-point complex FFT. Now we present the corresponding inverse FFT algorithm to repeat the original real-valued input sequences based on the FFT spectrum. Usually a N-point real-valued forward FFT produces N-point complex spectrum. To get the original real-valued sequences one can simply input this N-point complex spectrum to an inverse FFT. But it needs N-point real inverse FFT. With the symmetry properties of real-valued FFT a N-point real inverse FFT can be reduced to N/2-point complex inverse FFT. This can be realized through two steps. In the first step the real FFT spectrum will be unpacked to the corresponding complex spectrum similar with the unpack stage in the forward FFT. Then, the results are inputted into a N/2-point inverse complex FFT to get real-valued sequences. In the unpack stage only the first (N/2+1)-point spectrum are needed because the first (N/2+1)-point spectrum of a N-point real-valued FFT contain all information. According to Equation [4.53] and the symmetry properties of H(k) and G(k), we have Re { X ( k ) } + Re { X ( N ⁄ 2 – k ) } Re { H ( k ) } = ---------------------------------------------------------------------------2 Im { X ( k ) } – Im { X ( N ⁄ 2 – k ) }Im { H ( k ) } = --------------------------------------------------------------------------2 for k=0 to N/2-1. [4.56] Defining G′ ( k ) = X ( k ) – H ( k ) , [4.57] and replacing Equation [4.56] into Equation [4.57] we get User’s Manual for Keil Compiler -134 V 1.2, 2003-11 DSP Library for XC16x Family { X ( k ) } – R e { X ( N ⁄ 2 – k ) }Re { G′ ( k ) } = Re ------------------------------------------------------------------------2 Im { X ( k ) } + Im {X(N ⁄ 2 – k)} Im { G′ ( k ) } = ---------------------------------------------------------------------------2 [4.58] Finally we have the expression of the complex spectrum Y(k) from Equation [4.54] 2πk- ⋅ Re { G′ ( k ) } – cos 2πk Re { Y ( k ) } = Re { H ( k ) } – sin ------------------ ⋅ Im { G′ ( k ) } N N 2πk 2πk Im { Y ( k ) } = Im { H ( k ) } + cos ---------- ⋅ Re { G′ ( k ) } – sin ---------- ⋅ Im { G′ ( k ) } N N [4.59] where K=0 to N/2-1. Summarily, computing an N-point real inverse FFT has the following steps: 1. Extracting H(k) according to Equation [4.56] , 2. Extracting G’(k) according to Equation [4.58], 3. Computing the complex spectrum Y(k) based on H(k) and G’(k) using Equation [4.59], 4. Calculating N/2-point inverse complex FFT with input Y(k). User’s Manual for Keil Compiler -135 V 1.2, 2003-11 DSP Library for XC16x Family 4.7 C166S V2 Core Implementation Note 4.7.1 Organization of FFT functions In the library the radix-2 FFT is implemented. Basically there are following three kinds of functions: • Bit reverse function (Bit_reverse.c) • Floating point to 1Q15 format function (FloatTo1Q15.c) • FFT kernel function The first two functions will be as subroutines called by FFT kernel function. The subroutine Bit_reverse.c is used for bit reversing the binary presentation of the input indices to get the output data indices. FloatTo1Q15.c is used to change the floating point format to 1Q15 fractional format. The kernel FFT realizes the kernel FFT algorithm that may be complex and real FFT. Butterflies are implemented in the form of macros. The kernel FFT implementation in the C166S V2 Core Library is based on an application note performed early by Siemens HL MCB AT, in which an implementation of a realvalued 1024-point radix-2 decimation-in-time forward FFT for C166 microcontroller family is described. However, there are many basic differences between the two implementations. Firstly, the FFT implementation for C166S V2 Core is not only Ccallable but also Assembler-callable, while the early implementation for C166 is only Assembler-callable. Secondly, the FFT implementation for C166S V2 Core is performed with the new DSP MAC instruction set, while the early implementation for C166 has only used the normal instruction set. Therefore, the FFT implementation for C166S V2 Core is a more optimal implementation in comparison with early implementation. Thirdly, the size of input data of the new FFT implementation for C166S V2 Core can be changed from 2 to 2048 points, while the early implementation is only suitable for 1024-point FFT. This library provides not only forward FFT but also inverse FFT implementation as well as their implementation theory basis. 4.7.2 Implementation of Real Forward FFT A real valued N point FFT can be reduced to a N/2 point complex FFT followed by an unweave phase in order to compute the N/2 point spectrum. The N/2 complex FFT consists of log2(N/2) stages and each stage calculates N/4 butterflies. The input data is stored in the vector FFT_in that consists of N 16 bit words and is defined in C main function. Since we perform an in-place FFT, the input data field will be overwritten by the output data. For providing the trigonometric functions, the precomputed sinus and cosines values are stored in memory. Because the input data and the trigonometric function values are represented by a 15 bit fixed-point fraction in two’s complement (1Q15 format), the floating-point sinus and cosines values have to be changed into 1Q15 format with the routine FloatToQ15. To rearrange the bit reversed User’s Manual for Keil Compiler -136 V 1.2, 2003-11 DSP Library for XC16x Family output of the complex FFT and to calculate the twiddle factor Wk, a bit reversal index table has been precomputed. Regarding the N/2 point complex FFT, the number of twiddle factors amounts N/2. However, due to the symmetry only the first N/4 are used. The implementation consists of log2(N/2) stages. Each stage contains basically three loops, i.e. Outloop, Mitloop and Inloop. One stage has one Outloop and N/2 Inloops, but different Mitloops due to different twiddle factors. Each Inloop implements one butterfly with a twiddle factor. For example, an 8-point DIT complex FFT in Figure 4-39 has the following structure: Stage 1 (Outloop_1): 4 butterflies with W0 Stage 2 (Outloop_2): 2 butterflies with W0 2 butterflies with W2 Stage 3 (Outloop_3): 1 butterfly with W0 1 butterfly with W1 1 butterfly with W2 1 butterfly with W3 The output of the decimation-in-time FFT shows a bit reversed order that has to be ordered to calculate the final frequency spectrum. Supposing the input data has been in a sequential order, the indices of the output data can be easily computed by bit reversing the binary presentation of the input indices. The table below gives an example of the bit reversal for an 8-point FFT. Order of input data Order of output data index binary binary index 0 000 000 0 1 001 100 4 2 010 010 2 3 011 110 6 4 100 001 1 5 101 101 5 6 110 011 3 7 111 111 7 The second part of the program unweaves the bit reversed output of the N/2 point FFT to extract the N point real valued FFT. User’s Manual for Keil Compiler -137 V 1.2, 2003-11 DSP Library for XC16x Family 4.7.3 Implementation of Real Inverse FFT From Section 4.6.5 we know that the N-point real inverse FFT can be realized by N/2 point complex inverse FFT. Different from forward FFT the unpack stage will be performed at beginning to get the N/2 point complex input spectra according to N point real FFT spectra. The input data is the first (N/2+1)-point real FFT spectra and stored in the vector with size N/2+1 that consists of N+2 16 bit words and is defined in C main function. After unpacking the N/2 point complex spectra are strode in vector FFT_out. Note that the input data should be by 1/N scaled real FFT spectra. All data has 1Q15 format. Here we use the name real inverse FFT, which doesn’t mean that the input data is real value. Actually here the input data is complex value coming from a real-valued FFT. The word real indicates that the input value come from a real-valued forward FFT instead of a complex forward FFT. The butterfly of inverse FFT has the same structure as forward FFT beside the twiddle – nk instead of factor. The twiddle factor of the inverse FFT butterfly has the form W N nk W N . Similar with forward FFT the inverse FFT can be also implemented with DIT and DIF. For example, for an 8-point real inverse FFT, if the DIF implementation in Figure 441 is used, there are following structure: Stage 1 (Outloop_1): 1 butterfly with W0 1 butterfly with W1 1 butterfly with W2 1 butterfly with W3 Stage 2 (Outloop_2): 2 butterflies with W0 2 butterflies with W2 Stage 3 (Outloop_3): 4 butterflies with W0 In this case the output shows the linear order while the input has bit-reversed order. The corresponding butterfly has the structure showed in Figure 4-42. 4.7.4 Description The following functions are described. • • • • • Bit_reverse.c FloatTo1Q15.c real_DIT_FFT.c real_DIF_IFFT.a66 Q15toFloat.c User’s Manual for Keil Compiler -138 V 1.2, 2003-11 DSP Library for XC16x Family Bit_reverse Reverse the binary bit of the input data Signature DataS Bit_reverse( DataS* X, Inputs X : 16 bit Input data vector N : Size of the vector, N=2n (n<12) Output X : Bit reversed data vector Return n : Exponent of N Implementation Description This routine is a subroutine in the DIT FFT implementation packet and used to obtain a bit reversed data in respect to the input data. The bit reversed data is also stored in the vector X. DataS N) For an 8 bit input data there is following algorithm: Before bit reverse: b7.b6.b5.b4.b3.b2.b1.b0 After bit reverse: b0.b1.b2.b3.b4.b5.b6.b7 Pseudo code { ; X = input/output data vector ; N = the size of the vector, N=2n (n<12) DataS* DataS DataS X; N; i, k, n; //input/output vector //size of vector N for(k=0; k<N; k++) { temp = 0; for(i=15; i>0; i--) temp = temp + (X(k)<<i)>>(15-i); //write the output into X X(k) = temp; } n = (DataS)log10(N); Y = n; return Y; //Filter Output returned } Techniques User’s Manual for Keil Compiler -139 V 1.2, 2003-11 DSP Library for XC16x Family Bit_reverse Reverse the binary bit of the input data (cont’d) Assumptions • 16 bit Input data vector X • N = 2n ( n < 12) Register Usage • From .c file to .asm file: Defined by the compiler • From .asm file to .c file: (R4) = n (exponent of N) Memory Note Memory Memory High Addr. b15b14b13 ...... b2b1b0 b0b1b2 ...... b13b14b15 b15b14b13 ...... b2b1b0 b0b1b2 ...... b13b14b15 b15b14b13 ...... b2b1b0 b0b1b2 ...... b13b14b15 . . . . . . X(N-1) . . . b15b14b13 ...... b2b1b0 b0b1b2 ...... b13b14b15 Low Addr. b15b14b13 ...... b2b1b0 b0b1b2 ...... b13b14b15 1Q15 1Q15 Before bitreverse After bitreverse X(0) Figure 4-43 Bit_reverse Example C166Lib\Keil\Examples\FFT\real_FFT\bit_rev.c Cycle Count Exponent determination 6 Bit reverse: User’s Manual for Keil Compiler -140 V 1.2, 2003-11 DSP Library for XC16x Family Bit_reverse Reverse the binary bit of the input data (cont’d) if N=21 3 if N=22 , N=23 10 if N=24 , N=25 12 if N=26 , N=27 14 if N=28 , N=29 10 if N=2 , N=2 16 11 18 Restore state 3 Return 1 Total Code Size Exponent determination 12 bytes Bit reverse: if N=21 6 bytes if N=22 , N=23 20 bytes if N=24 , N=25 24 bytes N=27 28 bytes if N=28 , N=29 32 bytes if if N=26 , N=210 , N=211 36 bytes Restore state 6 bytes Return 2 bytes Total User’s Manual for Keil Compiler 304 bytes -141 V 1.2, 2003-11 DSP Library for XC16x Family FloatTo1Q15 Change the floating point format to fixed point format 1Q15 Signature DataS FloatTo1Q15 ( float x) Inputs x : Output : Return : Implementation Description Input in float format Output in 1Q15 format This routine is a subroutine in the DIT FFT implementation packet and used to change the floating point data into a 1Q15 fixed point format. The floating point format has the structure: 1. word: s e e e e e e e e mmmmmmm 2. word: mmmmmmmmmmmmmmmm where s =sign, e=exponent, m=mantissa. After format change we have the 1Q15 fixed point data with the structure: s. b1 b2 b3 b4 ...... b15 sig. 2-1 2-2 2-3 2-4 ...... 2-15 For example: binary 0111 1111 1111 1111 0110 0000 0000 0000 1010 0000 0000 0000 1000 0000 0000 0000 hex 7FFF 6000 A000 8000 value +1 +0.75 - 0.75 -1 Pseudo code Assumption DPRAM begins with the address # f200h (it can be changed) Register Usage • From .c file to .asm file: Defined by the Compiler • From .asm file to .c file: The output is stored in R4. User’s Manual for Keil Compiler -142 V 1.2, 2003-11 DSP Library for XC16x Family FloatTo1Q15 Change the floating point format to fixed point format 1Q15 (cont’d) Memory Note s m15 e7 m14 ... m13 e0 m22 m2 ... ... m1 m16 m0 s b1 b2 sign 2-1 2-2 Floating point format ... b14 b15 2-14 2-15 Fixed point format Figure 4-44 FloatTo1Q15 Example C166Lib\Keil\Examples\Misc\Float_1Q15.c Cycle Count Read exponent 10 Read mantissa 7 Output 12 Return 1 Total 30 Read exponent 20 bytes Read mantissa 14 bytes Output 20 bytes Return 2 bytes Code SIze Total User’s Manual for Keil Compiler 56 bytes -143 V 1.2, 2003-11 DSP Library for XC16x Family real_DIT_FFT Real Forward Radix-2 Decimation-in-Time Fast FourierTransformation Signature void real_DIT_FFT ( Inputs x DataS* DataS* DataS DataS* DataS* x, index, exp, table, X ) : index Output Bit reversed input index vector exp : Exponent of the input block size table : The precalculated trigonometric function (sinus and cosine) table X : The FFT output vector in 1Q15 format Return Implementation Description 16 bit real input vector : This function computes the N-point real forward radix-2 decimation-in-time Fast Fourier Transform on the given Npoint real input array. The detailed implementation is given in the Section 4.7.2 . The function is implemented as a complex FFT of size N/2 followed by a unpack stage to unpack the real FFT results. Normally an FFT of a real sequence of size N produces a complex sequence of size N or 2N real numbers that will not fit in the input sequence. To accommodate all the results without requiring extra memory locations, the output reflects only half of the complex spectrum plus the spectrum at the nyquist point (N/2). This still provides the full information because an FFT of a real sequence has even symmetry around the nyquist point. User’s Manual for Keil Compiler -144 V 1.2, 2003-11 DSP Library for XC16x Family real_DIT_FFT Real Forward Radix-2 Decimation-in-Time Fast FourierTransformation (cont’d) Pseudo code { DataS* DataS* CplxS* CplxS DataS table; //sinus and cosine table x; //16 bit real input vector X; //FFT output vector P(n), P(n+1), Q(n), Q(n+1), Y(N/2), H(N/2), G(N/2); k,i,n; //building the complex sequences P(n) and Q(n) Q(n) = x(2n) + jx(2n+1); P(n) = x(2n+N/4) + jx(2n+N/4+1); //Outloop = 1 to exp=log2(N/2) for(k=0; k++; k<exp) { //Midloop = 1 to N/4 according to which stage for(i=0; i<Midloop; i++) { //Inloop = N/2 to 1 (number of butterflies) for(n=0; n<Inloop; n++) { P(n+1) = Re(P(n))+Re(Q(n))*cos(X)+Im(Q(n))*sin(X) +j[Im(P(n))-Im(Q(n))*cos(X)-Re(Q(n)*sin/X)]; Q(n+1) = Re(P(n))-Re(Q(n))*cos(X)-Im(Q(n))*sin(X) +j[Im(P(n))-Im(Q(n))*cos(X)-Re(Q(n)*sin/X)]; } //if all elements processed, jump out of the Midloop } } //output the first half of N/2 point complex FFT values Y, //Extracting H and G from computed Y according to Equation [4.55], //Calculating the first N/2 points of X based on Equation [4.53] using H and G } Assumption • Scale factor = 1/N (N = 2exp), preventing overflow • FFT spectra = N * X(k) Techniques • • • • Memory bandwidth conflicts removing Use of CoMAC instructions Instruction re-ordering output converted to 16-bit with saturation User’s Manual for Keil Compiler -145 V 1.2, 2003-11 DSP Library for XC16x Family real_DIT_FFT Register Usage Real Forward Radix-2 Decimation-in-Time Fast FourierTransformation (cont’d) • From .c file to .asm file: R8 points to input vector x R9 points to index vector (R10) = exp R11 points to the trigonometric function table R12 contains the pointer of output vector X User’s Manual for Keil Compiler -146 V 1.2, 2003-11 DSP Library for XC16x Family real_DIT_FFT Real Forward Radix-2 Decimation-in-Time Fast FourierTransformation (cont’d) Memory Note Hi Memory Input-Buffer Last Stage Output-Spectrum x(N-1) Bit reversed data fetch X(N/2) x(N-2) . . X(N/2-1) Hi Memory . . Stage 1 -- Stage exp-1 x(3) X(3) x(2) X(2) x(1) X(1) x(0) X(0) Butterfly 32 bit 16 bit N point real input data in 1Q15 in linear order Sinus, cosine table The first N/2+1 point FFT spectra in linear order sinus(N-1) The N point real data is rearranged as N/2 point complex data sinus(N-2) . . sinus(3) sinus(2) sinus(1) sinus(0) 16 bit Figure 4-45 Real_DIT_FFT User’s Manual for Keil Compiler -147 V 1.2, 2003-11 DSP Library for XC16x Family real_DIT_FFT Example Real Forward Radix-2 Decimation-in-Time Fast FourierTransformation (cont’d) C166Lib\Keil\Examples\FFT\real_FFT\ real_T_FFT.c Cycle Count Store state 3 Read parameters and memory Initialization 12 Butterfly 31 FFT kernel exp – 2 n N 6 + 2 • 29 + ---- • 36 4 n=0 ∑ (FFT_cycle) where exp=Log2N. Unpack stage 71 + (N/4-1)*44 (U_cycle) Restore state 3 Return 1 Total 19 + FFT_cycle + U_cycle Examples: N = 8 : cycle = 360 N = 16 : cycle = 876 N = 1024: cycle = 109111 Code Size Store state 6 bytes Read parameters and memory initialization 24 bytes Butterfly 96 bytes FFT kernel 88 bytes User’s Manual for Keil Compiler -148 V 1.2, 2003-11 DSP Library for XC16x Family real_DIT_FFT Real Forward Radix-2 Decimation-in-Time Fast FourierTransformation (cont’d) Unpack Stage 178 bytes Restore state 6 bytes Return 2 bytes Total User’s Manual for Keil Compiler 400 bytes -149 V 1.2, 2003-11 DSP Library for XC16x Family real_DIF_IFFT Real Inverse Radix-2 Decimation-in-Frequency Fast Fourier Transformation Signature void real_DIF_IFFT Inputs index ( DataS* DataS* DataS DataS* DataS* ) : exp Output Bit reversed input index vector Exponent of the input block size table : The precalculated trigonometric function (sinus and cosine) table X : Input vector, FFT spectra in 1Q15 format x : 16 bit output of inverse FFT, real sequences Return Implementation Description x, index, exp, table, X : This function computes the N-point real inverse radix-2 Fast Fourier Transform with decimation-in-frequency on the given N-point FFT spectra. The detailed implementation is given in the Section 4.7.3. The function is implemented as a unpack stage followed by a complex inverse FFT of size N/2. The unpack stage aims to unpack the N-point real FFT as N/2-point complex FFT. The N/2-point complex inverse FFT is implemented with decimation-in-frequency structure showed in Figure 4-41, nk – nk where the butterfly W N should be replaced by W N . User’s Manual for Keil Compiler -150 V 1.2, 2003-11 DSP Library for XC16x Family real_DIF_IFFT Real Inverse Radix-2 Decimation-in-Frequency Fast Fourier Transformation (cont’d) Pseudo code { CplxS* DataS* CplxS DataS X; //input FFT spectra x; //16 bit real output vector P(n), P(n+1), Q(n), Q(n+1), Y(N/2), H(N/2), G’(N/2); k,i,n; //extracting H(k) according to Equation [4.56] Re{H(n)} = {Re{X(n)} + Re{X(N/2-n)}}/2; Im{H(n)} = {Im{X(n)} - Im{X(N/2-n)}}/2; //extracting G’(k) according to Equation [4.58] Re{G’(n)} = {Re{X(n)} - Re{X(N/2-n)}}/2; Im{G’(n)} = {Im{X(n)} + Im{X(N/2-n)}}/2; //computing the complex spectrum Y according to Equation [4.59] Re{Y(n)} = Re{H(n)}-sin(2*pi*n/N)*Re{G’(n)}cos(2*pi*n/N)*Im{G’(k)}; Im{Y(n)} = Im{H(n)}+cos(2*pi*n/N)*Re{G’(n)}sin(2*pi*n/N)*Im{G’(k)}; //define P(n) = Y(2n); Q(n) = Y(2n+1); //N/2-point inverse complex FFT //Outloop = 1 to exp=log2(N/2) for(k=0; k++; k<exp) { //Midloop = 1 to N/4 according to which stage for(i=0; i<Midloop; i++) { //Inloop = 1 to N/4 (number of butterflies) for(n=0; n<Inloop; n++) { P(n+1) = Re{P(n)}+Re{Q(n)}+j[Im{P(n)}+Im{Q(n)}]; Re{Q(n+1)} = (Re{P(n)}-Re{Q(n)})*cos(x)(Im{Q(n)}-Im{Q(n)}*sin(X); Im[Q(n+1)} = (Im{P(n)}-Im{Q(n)})*cos(X)+ (Re(Q(n)-Re{Q(n)})*sin/X); } //if all elements processed, jump out of the Midloop } } } User’s Manual for Keil Compiler -151 V 1.2, 2003-11 DSP Library for XC16x Family real_DIF_IFFT Real Inverse Radix-2 Decimation-in-Frequency Fast Fourier Transformation (cont’d) Assumption • The inputs are the scaled FFT spectra with factor = 1/N (N = 2exp) • Output without scale Techniques • • • • Register Usage • From .c file to .asm file: Memory bandwidth conflicts removing Use of CoMAC instructions Instruction re-ordering output converted to 16-bit with saturation R8 pointer of output vector x R9 pointer of index vector (R10) = exp R11 pointer of trigonomic function table R12 contains the pointer of input vector X User’s Manual for Keil Compiler -152 V 1.2, 2003-11 DSP Library for XC16x Family real_DIF_IFFT Real Inverse Radix-2 Decimation-in-Frequency Fast Fourier Transformation (cont’d) Memory Note Hi Memory Input-Buffer Output-Buffer Stage 1 -- Stage exp-1 Output-Buffer X(N/2) Y(N/2-1) x(N-1) X(N/2-1) Y(N/2-2) x(N-2) . . . . . . X(3) Y(3) x(3) X(2) Y(2) x(2) X(1) Y(1) X(0) Y(0) 32 bit 32 bit N/2+1 point complex input data in 1Q15 in linear order x(1) Butterfly x(0) 16 bit rearranged N/2 point complex data in 1Q15 in bit-reversed order Sinus, cosine table sinus(N-1) sinus(N-2) . First Stage: . sinus(3) The N/2+1 point complex data is rearranged as N/2 point complex data sinus(2) sinus(1) sinus(0) 16 bit Figure 4-46 Real_DIF_IFFT User’s Manual for Keil Compiler -153 V 1.2, 2003-11 DSP Library for XC16x Family real_DIF_IFFT Example Real Inverse Radix-2 Decimation-in-Frequency Fast Fourier Transformation (cont’d) C166Lib\Keil\Examples\real_IFFT\ real_F_IFFT.c Cycle Count Store state 4 Read parameters and memory Initialization 7 Butterfly 28 Unpack stage 6 + 48*N/4 (U_cycle) IFFT kernel exp – 2 n N 7 + 2 • 28 + ---- • 33 4 n=0 ∑ (IFFT_cycle) where exp=Log2N. Restore state 4 Return 1 Total 16 + IFFT_cycle + U_cycle Examples: N = 8 : cycle = 338 N = 16 : cycle = 817 N = 1024: cycle = 77654 Code Size Store state 8 bytes Read parameters and memory initialization 14 bytes Butterfly 78 bytes Unpack Stage 122 bytes IFFT kernel 102 bytes User’s Manual for Keil Compiler -154 V 1.2, 2003-11 DSP Library for XC16x Family real_DIF_IFFT Real Inverse Radix-2 Decimation-in-Frequency Fast Fourier Transformation (cont’d) Restore state 8 bytes Return 2 bytes Total User’s Manual for Keil Compiler 334 bytes -155 V 1.2, 2003-11 DSP Library for XC16x Family Q15toFloat Change the Fixed point 1Q15 format to Floating point format Signature float Q15toFloat (DataS X) Inputs X : The input value in 1Q15 format Output None Return Y : The output value in floating format Implementation Description This routine is used to change the 1Q15 fixed point format into the floating point data. we have the 1Q15 fixed point data with the structure: s. b1 b2 b3 b4 ...... b15 sig. 2-1 2-2 2-3 2-4 ...... 2-15 The floating point format has the structure: 1. word: mmmmmmmmmmmmmmmm 2. word: s e e e e e e e e mmmmmmm where s =sign, e=exponent, m=mantissa. After format change Pseudo code Techniques None Assumption Register Usage • From .c file to .asm file: Defined by the Compiler • Output: R4 =Mantissa, R5 =Exponent User’s Manual for Keil Compiler -156 V 1.2, 2003-11 DSP Library for XC16x Family Q15toFloat Change the Fixed point 1Q15 format to Floating point format (cont’d) Memory Note s sign b1 2-1 b2 ... b14 b15 m15 s ... m14 e7 ... ... e0 m2 m22 m1 ... m0 R4 m16 R5 2-14 2-15 2-2 Floating point format Fixed point format Figure 4-47 Q15toFloat Example C166Lib\Keil\Examples\Misc\Q15_Float.c Cycle Count Check if X=0 or X=1 4 Get sign bit 6 Initialise Exponent and Mantissa 3 Check Most significant bit 1+3N Compute output 8 floating Write Output 6 Return 1 Total 28+3N Example: N=3 Cycle =37 Code Size Check if X=0 or X=1 12 bytes Get sign bit 12 bytes User’s Manual for Keil Compiler -157 V 1.2, 2003-11 DSP Library for XC16x Family Q15toFloat Change the Fixed point 1Q15 format to Floating point format (cont’d) Initialise Exponent and Mantissa 6 bytes Check Most significant bit 12 bytes Compute output 16 bytes floating Write Output 12 bytes Return 2 bytes Total User’s Manual for Keil Compiler 72 bytes -158 V 1.2, 2003-11 DSP Library for XC16x Family 4.8 Matrix Operations A matrix is a rectangular array of numbers (or functions) enclosed in brackets. These numbers (or functions) are called entries or elements of the matrix.The number of entries in the matrix is product of number of rows and columns. An m × n matrix means matrix with m rows and n columns. In the double-subscript notation for the entries, the first subscript always denotes the row and the second the column. 4.8.1 Descriptions The following Matrix Operations are described. • Multiplication • Transpose User’s Manual for Keil Compiler -159 V 1.2, 2003-11 DSP Library for XC16x Family Matrix_mul Signature [MxN][NxP] Matrix Multiplication void Matrix( DataS* DataS* DataS* DataS DataS DataS DataS x_1, x_2, y, row1, col1, row2, col2 ) Inputs Output x_1 : Pointer to the first matrix in 1Q15 format x_2 : Pointer to the second matrix in 1Q15 format row1 : M, the number of rows in the first matrix col1, row2 : N, the number of columns in the first matrix, the number of rows in the second matrix col2 : P, M, the number of rows in the second matrix y : Pointer to the output in 1Q15 format Return Implementation Description The multiplication of two matrices A and B is done. The multiplication results will be stored in the [MxP] matrix y. Both the input matrices and output matrix are 16-bit. All the element of the matrix are stored row-by-row in the buffer. y 11 y 12 …y 1P a 11 a 12 …a 1N y 21 y 22 …y 2P a 21 a 22 …a 2N ′ ′ ′ ′ y M1 y M2 User’s Manual for Keil Compiler ′ ′ …y MP = -160 ′ ′ ′ ′ a M1 a M2 ′ ′ …a MN b 11 b 12 …b 1P × b 21 b 22 …b 2P ′ ′ ′ ′ b N1 b N2 ′ ′ …b NP V 1.2, 2003-11 DSP Library for XC16x Family Matrix_mul [MxN][NxP] Matrix Multiplication (cont’d) Pseudo code { DataS DataS DataS DataS x_1[M*N]; x_2[N*P]; y[M*P]; i,j; //first input matrix //second input matrix //output matrix for (i=0 to M-1) for (k=0 to P-1) { y(i,k) = 0; for (j=0 to N-1) y(i,k) = y(i,k) + x_1(i,j)*x_2(j,k); } } Assumption Techniques • Memory bandwidth conflicts removing • Use of CoMAC instructions • Instruction re-ordering Register Usage • From .c file to .asm file: Decided by the compiler User’s Manual for Keil Compiler -161 V 1.2, 2003-11 DSP Library for XC16x Family Matrix_mul [MxN][NxP] Matrix Multiplication (cont’d) Memory Note Input Matrix B Input Matrix A High Addr. Low Addr. a[M-1,N-1] b[N-1,P-1] . b[N-1,P-2] a[1,1] . a[1,0] . a[0,N-1] . . b[0,2] a[0,1] b[0,1] b[0,0] a[0,0] DPRAM MAC High Addr. Low Addr. Memory Output Matrix Y High Addr. y[M-1,P-1] y[M-1,P-2] . . . y[0,2] y[0,1] Low Addr. y[0,0] Memory Figure 4-48 Matrix_mul Example C166Lib\Keil\Examples\Matrix\Matrixmul.c Cycle Count User’s Manual for Keil Compiler -162 V 1.2, 2003-11 DSP Library for XC16x Family Matrix_mul [MxN][NxP] Matrix Multiplication (cont’d) Read parameters and memory Initialization 13 Matrix loop M(4+P(6+3N)) Return 1 Total 14 + M(4 + P(6 + 3N)) Example: M = 4, N = 3, P = 5 cycle = 330 Code Size Read parameters and memory initialization 26 bytes Matrix loop 40 bytes Return 2 bytes Total User’s Manual for Keil Compiler 68 bytes -163 V 1.2, 2003-11 DSP Library for XC16x Family Matrix_trans [MxN] Matrix Transpose Signature void Matrix_trans( DataS* DataS* DataS DataS ) Inputs x : Pointer to the input matrix in 1Q15 format row : M, the number of rows in the matrix col : N, the number of columns in the matrix y : Pointer to the output in 1Q15 format Output x, y, row, col Return Implementation Description This function performs Transpose of the given matrix.It takes pointer to input matrix,size of row and size of column as input. multiplication of two matrices A and B is done. The Transpose results will be stored in the [MxN] matrix y. Both the input matrices and output matrix are 16-bit. All the element of the input matrix are stored row-by-row in the buffer. x 11 x 21 …x M1 x 11 x 12 …x 1N x 12 x 22 …x M2 ′ ′ ′ ′ x 1N x 2N User’s Manual for Keil Compiler x 21 x 22 …x 2N = ′ ′ ′ ′ ′ ′ ′ ′ …x MN x M1 x M2 …x MN -164 V 1.2, 2003-11 DSP Library for XC16x Family Matrix_trans [MxN] Matrix Transpose (cont’d) Pseudo code { DataS* DataS* DataS DataS x; y; M,N; i,k; //input matrix //output matrix //Number of rows,columns for (i=0 to M-1) for (k=0 to N-1) { y(k,i) =x(i,k); } } Assumption Techniques • Memory bandwidth conflicts removing • Instruction re-ordering Register Usage • From .c file to .asm file: Decided by the compiler User’s Manual for Keil Compiler -165 V 1.2, 2003-11 DSP Library for XC16x Family Matrix_trans [MxN] Matrix Transpose (cont’d) Memory Note High Addr. Low Addr. Input Matrix x Output Matrix y x[M-1,N-1] y[N-1,P-1] . y[N-1,P-2] x[1,1] . x[1,0] y[1,0] x[0,N-1] . . y[0,2] x[0,1] y[0,1] x[0,0] y[0,0] Memory High Addr. Low Addr. Memory MOV Figure 4-49 Matrix_trans Example C166Lib\Keil\Examples\Matrix\Matrixtrans.c Cycle Count Memory Initialization 10 Matrix loop M(2N+4) Return 1 Total 11+M(2N+4)) Example: M = 4, N = 3 cycle = 51 Code Size User’s Manual for Keil Compiler -166 V 1.2, 2003-11 DSP Library for XC16x Family Matrix_trans [MxN] Matrix Transpose (cont’d) Memory initialization 20 bytes Matrix loop 22 bytes Return 2 bytes Total User’s Manual for Keil Compiler 44 bytes -167 V 1.2, 2003-11 DSP Library for XC16x Family 4.9 Mathematical Operations Here we provide some assembly routines of basic mathematical operations which can be used in many DSP algorithms, such as speech codecs and spectrum analysis. The following routines are described. • • • • • • Sine N-th order power series Windowing with N coefficients div_q15 div_q31 Sqrt User’s Manual for Keil Compiler -168 V 1.2, 2003-11 DSP Library for XC16x Family Sine Sine function Signature DataS Sine( DataS* x) Inputs x : Output By pi normalized input value between [-1,1] in 1Q15 format, x=xrad/pi, where xrad contains the angle in radians [-p1,p1]. : Return y Implementation Description For the input value in 1th quadrant ( 0, π ⁄ 2 ) the Sin(x) can be approximated using the polynomial expansion. : output in 1Q15 format sin ( x ) = 3.140625 ( x ) + 0.02026367 ( x ) 3 2 – 5.325196 ( x ) + 0.5446778 ( x ) + 1.800293 ( x ) 4 [4.60] 5 · 0 ≤ x ≤ 05 . The values of Sine function in other quadrants are computed by using the relations, sin ( – x ) = – sin ( x ) and sin ( 180 – x ) = sin x . The function takes 16 bit input in 1Q15 format to accommodate the range ( – 1, 1 ) . The output is 16 bits in 1Q15 format. Coefficients are stored in 4Q12 format. The absolute value of the input is calculated. If the input is negative (III/IV Quadrant), then sign=1. If absolute value of the input is greater than 1/2 (II/III Quadrant), it is subtracted from 1. If sign=1, the result is negated to give the final sine result. To have an optimal implementation with zero overhead load store, the polynomial in Equation [4.60] is rearranged as below. User’s Manual for Keil Compiler -169 V 1.2, 2003-11 DSP Library for XC16x Family Sine Sine function (cont’d) sin ( x ) = ( ( ( ( 1.800293 x + 0.5446778 )x – 5.325196 )x + 0.02026367 )x + 3.140625 )x [4.61] Hence, 4 multiply-accumulate and 1 multiply instruction will compute the Equation [4.61]. Pseudo code { DataS DataS DataS DataS DataS a[5]; x; y; i; sign; //coefficient vector //input value //output value //sign of input //determine the sign of input sign = 0; if(x<0) sign = 1 //if x in III/IV quadrant if(abs(x)>0.5) x = 1-abs(x); //polynomial y = a(0)*x + a(1); y = y*x + a(2); y = y*x + a(3); y = y*x + a(4); y = y*x; //x in I/II quandrant if(sign == 0) y = y; //x in III/IV quandrant else if(sign == 1) y = -y; } Assumption User’s Manual for Keil Compiler -170 V 1.2, 2003-11 DSP Library for XC16x Family Sine Sine function (cont’d) Techniques • Use of CoMAC instructions • Instruction re-ordering Register Usage • Defined by the compiler • From .asm file to.c file y is stored in R4 Memory Note Data Memory High Addr. x Input a(0) (R4)=y Output 4Q12 1Q15 a(4) a(3) a(2) a(1) Low Addr. . . . . . . 5 ∑ a(i − 1) x i i =1 Figure 4-50 Sine Example C166Lib\Keil\Examples\Math\ Sinus.c Cycle Count Memory Initialization 1 Polynomial 26 User’s Manual for Keil Compiler -171 V 1.2, 2003-11 DSP Library for XC16x Family Sine Sine function (cont’d) Return 1 Total 28 Code Size Memory initialization 2 bytes Polynomial 88 bytes Return 2 bytes Total User’s Manual for Keil Compiler 92 bytes -172 V 1.2, 2003-11 DSP Library for XC16x Family P_series N-th order power series Signature DataS P_series( DataS* a, DataS* IN, DataS N) Inputs a : Pointer to the coefficient vector in 1Q15 format IN : Input value N : Size of series that must be even number Y : Series output in 1Q15 format Output Return Implementation Description The routine is implemented with CoMUL and CoADD instructions. To optimize the routine, the implementation uses "loop unrolling" technique and assumes that N is even. The implementation formula is: N y = ∑ a(i) ⋅ x i i=0 = [ [ [ [ a ( N ) ⋅ x + a ( N – 1 ) ]x + a ( N – 2 ) ]x + a ( N – 3 ) ]x + … ] Pseudo code { DataS DataS DataS DataS a[N]; X; Y; i; //coefficient vector //input value //output value Y = a(N); for(i=0; i<N; i++) Y = Y*X + a(N-1); } Assumption • N = even number Techniques • Use of CoMAC instructions • Loop unrolling • Instruction re-ordering User’s Manual for Keil Compiler -173 V 1.2, 2003-11 DSP Library for XC16x Family P_series N-th order power series (cont’d) Register Usage • From .c file to .asm file: Defined by the compiler • From .asm file to.c file Y is stored in R4 Memory Note M em ory High Addr. IN Input a(0) (R4)=y O utput 1Q 15 1Q 15 a(N) a(N-1) a(2) a(1) Low Addr. . . . . . . N ∑ a (i ) x i i=0 Figure 4-51 P_series Example C166Lib\Keil\Examples\Math\ Power_Serie.c Cycle Count Memory Initialization 13 Series loop 7N/2 Return 1 User’s Manual for Keil Compiler -174 V 1.2, 2003-11 DSP Library for XC16x Family P_series N-th order power series (cont’d) Total 14 + 7N/2 Example: N = 10 cycle = 49 Code Size Memory initialization 26 bytes Series loop 28 bytes Return 2 bytes Total User’s Manual for Keil Compiler 56 bytes -175 V 1.2, 2003-11 DSP Library for XC16x Family Windowing Windowing with N coefficients Signature void Windowing( DataS* DataS* DataS* DataS h, x, y, N Inputs h : Pointer to the coefficient vector in 1Q15 format x : Pointer to the input vector in 1Q15 format N : The length of the window y : Output vector in 1Q15 format ) Output Return : Implementation Description To optimize the routine, the implementation uses "loop unrolling" technique and assumes that N is even number. The implementation formula is: y(i ) = x( i) ⋅ w( i), for i = 0, 1, …N – 1 Pseudo code { DataS DataS DataS DataS w[N]; x[N]; y[N]; i; //coefficient vector //input value //output value for(i=0; i<N; i++) y(i) = x(i)*w(i); } Assumption • N should be even number. Techniques • Use of CoMUL instructions • Loop unrolling with the factor 2 Register Usage • From .c file to .asm file: Defined by the compiler User’s Manual for Keil Compiler -176 V 1.2, 2003-11 DSP Library for XC16x Family Windowing Windowing with N coefficients (cont’d) Memory Note High Addr. Low Addr. Input vector X Coefficient vector h x(n) h(n) x(n-1) h(n-1) . . . . . . x(n-N+3) h(n-N+3) x(n-N+2) h(n-N+2) x(n-N+1) CoMUL Memory h(n-N+1) Memory Output Matrix Y High Addr. y(n) y(n-1) . . . y(n-N+3) y(n-N+2) Low Addr. y(n-N+1) Memory Figure 4-52 Windowing Example C166Lib\Keil\Examples\Math\ Window.c User’s Manual for Keil Compiler -177 V 1.2, 2003-11 DSP Library for XC16x Family Windowing Windowing with N coefficients (cont’d) Cycle Count Memory Initialization 4 Loop 7N/2 Return 1 Total 5 + 7N/2 Example: N = 128 cycle = 453 Code Size Memory initialization 8 Loop 24 bytes Return 2 Total 34 bytes User’s Manual for Keil Compiler -178 bytes bytes V 1.2, 2003-11 DSP Library for XC16x Family div_q15 Division of two 1Q15 fractional inputs Signature DataS div_q15( DataS DataS ) Inputs x : Dividend in 1Q15 format y : Divisor in 1Q15 format Output x, y : Return r Implementation Description This function performs the division of two fractional inputs. The dividend and the divisor are in 1Q15 format.The result is saturated to +1 or -1 if |x|>|y|. : Result in 1Q15 format x--- = x---------------× 2 31- × 2 –15 y y × 2 15 Pseudo code Assumption |x| should be less than |y| Techniques Register Usage • From .c file to .asm file: Defined by the compiler • Output in R4 User’s Manual for Keil Compiler -179 V 1.2, 2003-11 DSP Library for XC16x Family div_q15 Division of two 1Q15 fractional inputs (cont’d) Memory Note Dividend x15 x14 ... x1 x0 Divisor y15 y14 ... y1 y0 DIVL r15 r14 ... r1 r0 R4 Output Input Figure 4-53 Division of two 1Q15 fractional inputs Example C166Lib\Keil\Examples\Math\ div_16.c Cycle Count Compute values absolute Compute signs 6 8 Division 11 Return 1 Total 26 Code Size Compute values absolute 20 bytes Compute signs 24 bytes Division 18 bytes Return Total User’s Manual for Keil Compiler -180 2 bytes 64 bytes V 1.2, 2003-11 DSP Library for XC16x Family div_q31 Division of 1Q31 fractional by 1Q15 fractional Signature DataS div_q31( DataL x, DataS y ) Inputs x : Dividend in 1Q31 format y : Divisor in 1Q15 format Return r : Result in 1Q15 format Implementation Description This function performs the division of two fractional inputs. The dividend is in1Q31 format and the divisor is in 1Q15 format.The result is saturated to +1 or -1 if |x|>|y|,which is in 1Q15 format. Output x--- = x---------------× 2 31y y × 2 15 Pseudo code Assumption • |x|should be less than |y| Techniques • Use of CoMUL instructions Register Usage • From .c file to .asm file: Defined by the compiler • Output in R4 User’s Manual for Keil Compiler -181 V 1.2, 2003-11 DSP Library for XC16x Family div_q31 Division of 1Q31 fractional by 1Q15 fractional (cont’d) Memory Note Dividend x31 x30 ... x17 x16 Dividend x15 x14 ... x1 x0 ... Input y1 y0 Divisor y15 y14 DIVL r15 r14 ... r1 r0 R4 Output Figure 4-54 Division of 1Q31 fractional by 1Q15 fractional Example C166Lib\Keil\Examples\Math\div_32.c Cycle Count Compute values absolute Compute signs 8 8 Division 11 Return 1 Total 28 Code Size Compute values 22 bytes Compute signs 24 bytes Division 18 bytes User’s Manual for Keil Compiler absolute -182 V 1.2, 2003-11 DSP Library for XC16x Family div_q31 Division of 1Q31 fractional by 1Q15 fractional (cont’d) Return Total User’s Manual for Keil Compiler -183 2 bytes 60 bytes V 1.2, 2003-11 DSP Library for XC16x Family Sqrt Square root Signature DataS Sqrt( DataS x Inputs x : Input value in 1Q15 format in the range [0,1) Output y : Output value in 1Q15 format ) Return Implementation Description : The square root of the input value x can be calculated by using the following polynomial expansion Sqrt ( x ) = 1.454895 ( x ) – 1.34491 ( x ) 2 + 1.106812 ( x ) 3 – 0.536499 ( x ) 4 + 0.1121216 ( x ) 5 + 0.2075806 where,0.5<=x<=1 The coefficents of polynomial are stored in 2Q14 format.The square root table(table of scale factors)stores 1/sqrt(2^n) in 1Q15 format where n ranges from 0 to15. As the polynomial expansion needs input only in the range 0.5 to 1,the given input has to be scaled up.If the input value is less than 0.5,then it is scaled up by the powers of two,so that the scaled input value lies in the range 0.5 to 1.This scaled input is used for polynomial calculation.The calculated output is scaled down by 1/sqrt(2^n) to get the actual output.The obtained result is in 1Q15 format. User’s Manual for Keil Compiler -184 V 1.2, 2003-11 DSP Library for XC16x Family Sqrt Square root (cont’d) Pseudo code { DataS DataS DataS DataS x; //input value C; // Coeffcient y; //output value *scale factor; //scaling if(x>=0.5 && x<1) { y=((((c^5*x+c^4)x+c^3)x+c^2)x+c)x; //compute Square root y=y+0.2075806; //compute Square root } if(x>=0 && x<0.5) { x=x*scale factor; y=((((c^5*x+c^4)x+c^3)x+c^2)x+c)x; //compute Square root y=y+0.2075806; y=y/scale factor; } Assumption • Input is positive Techniques • Use of CoMUL instructions Register Usage • From .c file to .asm file: Defined by the compiler • From .asm file to .c file R4 = y Memory Note Example C166Lib\Keil\Examples\Math\ Squart.c Cycle Count Register Initialization 1 checking if x<=0 3 checking if x>0.5 3N compute root 17 Multiplication scalar User’s Manual for Keil Compiler square with -185 5 V 1.2, 2003-11 DSP Library for XC16x Family Sqrt Square root (cont’d) Return 1 Total 27+3N Example: N=4 cycle=39 Code Size Register Initialization 2 bytes checking if x<0 8 bytes checking if x>0.5 12 bytes compute root square 66 bytes with 14 bytes 2 bytes Multiplication scalar Return Total User’s Manual for Keil Compiler 104 bytes -186 V 1.2, 2003-11 DSP Library for XC16x Family 4.10 Statistical Functions The following statistical functions are described. • Correlation Cross-correlation Autocorrelation • Convolution 4.10.1 Correlation 4.10.1.1 Definitions of Correlation Correlation determines the degree of similarity between two signals. If two signals are identical their correlation coefficient is 1, and if they are completely different it is 0. If they are identical by 180 phase shift between them, then the correlation coefficient is -1. There are two types of correlation, Cross-Correlation and Autocorrelation. When two independent signals are compared, the procedure is cross-correlation. When the same signal is compared to phase shifted copies of itself, the procedure is autocorrelation. Autocorrelation is used to extract the fundamental frequency of a signal. The distance between correlation peaks is the fundamental period of the signal. Suppose that N1 and N2 represent the size of input signals x1 and x2, respectively, and N=N1+N2 and N1 ≥ N2 . Extending x1 to N-point vector through adding N2-points of zero at the beginning, and x2 to N-point vector through adding N1-points of zero in the end, we can define the discrete cross-correlation as follows. Raw Cross-correlation: N1 + j – 1 r(j) = ∑ x1 ( i )x2 ( i + j ) – N 2 + 1 ≤ j ≤ N1 – 1 , [4.62] i=0 Biased Cross-correlation: 1- × r ( j ) = -----N1 N1 + j – 1 ∑ x1 ( i )x2 ( i + j ) i=0 , – N 2 + 1 ≤ j ≤ N1 – 1 [4.63] Unbiased Cross-correlation: 1 r ( j ) = ------------------------- × N – abs ( j ) N1 + j – 1 ∑ x1 ( i )x2 ( i + j ) , i=0 User’s Manual for Keil Compiler -187 – N 2 + 1 ≤ j ≤ N1 – 1 [4.64] V 1.2, 2003-11 DSP Library for XC16x Family The above definitions contain the full-length cross-correlation of the real input signals x1 and x2 with N points output, which consists of N2 points of the negative-side and N1 points of the positive-side. If the input vectors x1 and x2 are same and equal to x with the size of N, we have the following definitions of the positive-side of the autocorrelation. Raw Autocorrelation: N–j–1 r( j) = ∑ x ( i )x ( i + j ) for j = 0 to Nr ≤ N – 1 [4.65] for j = 0 to Nr ≤ N – 1 [4.66] for j = 0 to Nr ≤ N – 1 , [4.67] i=0 Biased Autocorrelation: 1- × r ( j ) = --N N–j–1 ∑ x ( i )x ( i + j ) i=0 Unbiased Autocorrelation: 1 -× r ( j ) = ----------N–j N–j–1 ∑ x ( i )x ( i + j ) i=0 where j is the lag value, as it indicates the shift/lag considered for the r(j) autocorrelation. Nr is the correlation length and it determines how much data is used for each autocorrelation result. Note that the full-length autocorrelation of vector x will have 2N-1 points with even symmetry around the lag 0 point r(0). The above definitions define only the positive half for memory and computational savings. 4.10.1.2 Implementation Note Directly implementing the cross-correlation according to definitions in Equation [4.62] , Equation [4.63] and Equation [4.64] has difficulty due to the negative index. To make the algorithms realizable in assembly we need to rewrite the definitions. Raw Cross-correlation: User’s Manual for Keil Compiler -188 V 1.2, 2003-11 DSP Library for XC16x Family The negative-side can be rewriten as with positive index j r(j) = ∑ x1 ( i )x2 ( N2 – j – 1 + i ) , 0 ≤ j ≤ N2 – 1 [4.68] , 0 ≤ j ≤ N1 – N2 – 1 [4.69] i=0 and the positive-side as N2 – 1 r ( j + N2 ) = ∑ x1 ( i + j )x2 ( i ) i=0 N1 – j – 1 ∑ r ( j + N2 ) = x1 ( i )x2 ( N2 – j – 1 + i ) N1 – N2 < j ≤ N1 – 1 , . [4.70] i=0 Biased Cross-correlation: The negative-side: 1 r ( j ) = ------- × N1 j ∑ x1 ( i )x2 ( N2 – j – 1 + i ) 0 ≤ j ≤ N2 – 1 , i=0 [4.71] The positive-side: 1- × r ( j + N2 ) = -----N1 1- × r ( j + N2 ) = -----N1 N2 – 1 ∑ x1 ( i + j )x2 ( i ) i=0 , 0 ≤ j ≤ N1 – N2 – 1 N1 – j – 1 ∑ x1 ( i )x2 ( N2 – j – 1 + i ) , [4.72] N1 – N2 < j ≤ N1 – 1 [4.73] i=0 Unbiased Cross-correlation: The negative-side: 1 r ( j ) = ------------------------ × abs ( j + 1 ) j ∑ x1 ( i )x2 ( N2 – j – 1 + i ) i=0 User’s Manual for Keil Compiler -189 , 0 ≤ j ≤ N2 – 1 [4.74] V 1.2, 2003-11 DSP Library for XC16x Family The positive-side: 1 r ( j + N2 ) = ------- × N2 N2 – 1 ∑ x1 ( i + j )x2 ( i ) i=0 1 r ( j + N2 ) = ----------------------------× N1 – abs ( j ) 4.10.2 , 0 ≤ j ≤ N1 – N2 – 1 [4.75] N1 – j – 1 ∑ x1 ( i )x2 ( N2 – j – 1 + i ) , N1 – N2 < j ≤ N1 – 1 [4.76] i=0 Implementation Description The following routines are described. • • • • • • Raw autocorrelation Biased autocorrelation Unbiased autocorrelation Raw cross-correlation Biased cross-correlation Unbiased cross-correlation User’s Manual for Keil Compiler -190 V 1.2, 2003-11 DSP Library for XC16x Family Auto_raw Raw autocorrelation Signature void Auto_raw( DataS* DataS* DataS DataS x, y, N_x, N_y Inputs x : Pointer to input vector in 1Q15 format N_x : The length of input vector N_y (Nr) : The length of output vector y : Pointer to output vector in 1Q15 format ) Output Return : Implementation Description The function performs the positive side of the raw autocorrelation function of real vector x according to the definition in Equation [4.65] . The arguments to the function are pointer to the input vector, pointer to output buffer to store autocorrelation result, size of input buffer and number of auto correlated outputs desired. The input values and output values are both in 16 bit fractional format. Pseudo code { DataS DataS DataS DataS DataS *x; *y; N_x; N_y; i,j; //Ptr to input vector //Ptr to output vector //size of input //size of autocorrelation result for(j=0; j<N_y; j++) { y[j] = 0; for(i=0; i<N_x-j; i++) y[j] = y[j] + x[i+j]*x[i]; } } Assumption • N_y<=N_x Techniques • Use of CoMUL instructions User’s Manual for Keil Compiler -191 V 1.2, 2003-11 DSP Library for XC16x Family Auto_raw Register Usage Raw autocorrelation (cont’d) • From .c file to .asm file: Defined by the compiler Memory Note High Addr. Low Addr. Input vector x(i) shifted input vector x(i+j) x(Nx-1) x(Nx-j-1) x(Nx-2) . . . . x(1) . x(0) x(2) . x(1) . x(0) 0 CoMAC Memory Memory Output y High Addr. y(Ny-1) y(Ny-2) . . . y(2) y(1) Low Addr. y(0) Memory Figure 4-55 Raw autocorrelation User’s Manual for Keil Compiler -192 V 1.2, 2003-11 DSP Library for XC16x Family Auto_raw Example Raw autocorrelation (cont’d) C166Lib\Keil\Examples\Static_funcs\Auto_correlation\Autoc.c Cycle Count Memory Initialization 4 Loop_cyc Ny – 1 2⋅ ∑ ( Nx – j + 8 ) j=0 Return 1 Total 5 + loop_cyc Example: Nx = 10, Ny = 3 cycle = 107 Code Size Memory Initialization Loop_cyc Return Total User’s Manual for Keil Compiler 8 46 bytes 2 bytes 56 -193 bytes bytes V 1.2, 2003-11 DSP Library for XC16x Family Auto_bias Biased autocorrelation Signature void Auto_bias( DataS* DataS* DataS DataS ) Inputs x : Pointer to input vector in 1Q15 format N_x : The length of input vector N_y (Nr) : The length of output vector y : Pointer to output vector in 1Q15 format Output Return x, y, N_x, N_y : Implementation Description The function performs the positive side of the biased autocorrelation function of real vector x according to the definition in Equation [4.66]. The arguments to the function are pointer to the input vector, pointer to output buffer to store autocorrelation result, size of input buffer and number of auto correlated outputs desired. The input values and output values are both in 16 bit fractional format. Pseudo code { DataS DataS DataS DataS DataS *x; *y; N_x; N_y; i,j; //Ptr to input vector //Ptr to output vector //size of input //size of autocorrelation result for(j=0; j<N_y; j++) { y[j] = 0; for(i=0; i<N_x-j; i++) y[j] = (y[j] + x[i+j]*x[i])/N_x; } } Assumption • N_y<=N_x Techniques • Use of CoMUL instructions User’s Manual for Keil Compiler -194 V 1.2, 2003-11 DSP Library for XC16x Family Auto_bias Register Usage Biased autocorrelation (cont’d) • From .c file to .asm file: Defined by the compiler Memory Note High Addr. Input vector x(i) shifted input vector x(i+j) x(Nx-1) x(Nx-j-1) x(Nx-2) . Low Addr. . . . x(1) . x(0) x(2) . x(1) . x(0) 0 CoMAC Memory Memory Output y High Addr. y(Ny-1) y(Ny-2) . . . y(2) y(1) Low Addr. y(0) Memory Figure 4-56 Biased autocorrelation User’s Manual for Keil Compiler -195 V 1.2, 2003-11 DSP Library for XC16x Family Auto_bias Example Biased autocorrelation (cont’d) C166Lib\Keil\Examples\Static_funcs\ Auto_correlation\Autoc.c Cycle Count Memory Initialization 4 Loop_cyc Ny – 1 2⋅ ∑ ( Nx – j + 17 ) j=0 Return 1 Total 5 + loop_cyc Example: Nx = 10, Ny = 3 cycle = 161 Code Size Memory Initialization Loop_cyc Return Total User’s Manual for Keil Compiler -196 8 bytes 54 bytes 2 bytes 64 bytes V 1.2, 2003-11 DSP Library for XC16x Family Auto_unbias Unbiased autocorrelation Signature void Auto_unbias( DataS* DataS* DataS DataS ) Inputs x : Pointer to input vector in 1Q15 format N_x : The length of input vector N_y (Nr) : The length of output vector y : Pointer to output vector in 1Q15 format Output Return x, y, N_x, N_y : Implementation Description The function performs the positive side of the unbiased autocorrelation function of real vector x according to the definition in Equation [4.67]. The arguments to the function are pointer to the input vector, pointer to output buffer to store autocorrelation result, size of input buffer and number of auto correlated outputs desired. The input values and output values are both in 16 bit fractional format. Pseudo code { DataS DataS DataS DataS DataS *x; *y; N_x; N_y; i,j; //Ptr to input vector //Ptr to output vector //size of input //size of autocorrelation result for(j=0; j<N_y; j++) { y[j] = 0; for(i=0; i<N_x-j; i++) y[j] = (y[j] + x[i+j]*x[i])/(N_x-j); } } Assumption • N_y<=N_x Techniques • Use of CoMUL instructions User’s Manual for Keil Compiler -197 V 1.2, 2003-11 DSP Library for XC16x Family Auto_unbias Register Usage Unbiased autocorrelation (cont’d) • From .c file to .asm file: Defined by the compiler Memory Note High Addr. Input vector x(i) shifted input vector x(i+j) x(Nx-1) x(Nx-j-1) x(Nx-2) . . . . x(1) . x(0) x(2) . Low Addr. x(1) . x(0) 0 CoMAC Memory Memory Output y High Addr. y(Ny-1) y(Ny-2) . . . y(2) y(1) Low Addr. y(0) Memory Figure 4-57 Unbiased autocorrelation User’s Manual for Keil Compiler -198 V 1.2, 2003-11 DSP Library for XC16x Family Auto_unbias Example Unbiased autocorrelation (cont’d) C166Lib\Keil\Examples\Static_funcs\ Auto_correlation\Autoc.c Cycle Count Memory Initialization 4 Loop_cyc Ny – 1 2⋅ ∑ ( Nx – j + 18 ) j=0 Return 1 Total 5+ loop_cyc Example: Nx = 10, Ny = 3 cycle = 167 Code Size Memory Initialization Loop_cyc Return Total User’s Manual for Keil Compiler -199 8 bytes 56 bytes 2 bytes 66 bytes V 1.2, 2003-11 DSP Library for XC16x Family Cross_raw Raw cross-correlation Signature void Cross_raw( DataS* DataS* DataS* DataS DataS x1, x2 y, N_x1, N_x2 Inputs x1 : Pointer to first input vector in 1Q15 format x2 : Pointer to second input vector in 1Q15 format N_x1 (N1) : The length of the first input vector N_x2 (N2) : The length of the second input vector y : Pointer to output vector in 1Q15 format ) Output Return Implementation Description : The function performs the full-length raw cross-correlation function of real vector x1 and x2 according to Equation [4.68], Equation [4.69] and Equation [4.70]. The arguments to the function are pointers to the input vectors, pointer to output buffer to store autocorrelation result and sizes of input buffers. The input values and output values are both in 16 bit fractional format. User’s Manual for Keil Compiler -200 V 1.2, 2003-11 DSP Library for XC16x Family Cross_raw Raw cross-correlation (cont’d) Pseudo code { DataS DataS DataS DataS DataS DataS *x1; *x2; *y; N1; N2; i,j; //Ptr to the first input vector //Ptr to the second input vector //Ptr to output vector //size of the first input //size of the second input //negative side for(j=0; j<N2; j++) { y[j] = 0; for(i=0; i<j; i++) y[j] = y[j] + x1[i]*x2[N2-j-1+i]; } //positive side if(0<=j<=(N1-N2-1)) { y[j+N2] = 0; for(i=0; i<N2; i++) y[j+N2] = y[j+N2] + x1[i+j]*x2[i]; } else if((N1-N2)<=j<=(N1-1)) { y[j+N2] = 0; for(i=0; i<N1-j; i++) y[j+N2] = y[j+N2] + x1[i]*x2[N2-j-1+i]; } } Assumption • N2<=N1 • x1 must be stored in DPRAM area. Techniques • Use of CoMUL instructions Register Usage • From .c file to .asm file: Defined by the compiler User’s Manual for Keil Compiler -201 V 1.2, 2003-11 DSP Library for XC16x Family Cross_raw Raw cross-correlation (cont’d) Memory Note Input vector x2 Input vector x1 High Addr. Low Addr. x1(N1-1) IDX0 x2(N2-1) x1(N1-2) x2(N2-2) . . . . . . x1(2) x2(2) x1(1) x2(1) x1(0) CoMAC DPRAM x2(0) Memory Output y High Addr. y(N-1) y(N-2) . . . y(2) y(1) Low Addr. y(0) Memory Figure 4-58 Raw cross-correlation Example C166Lib\Keil\Examples\Static_funcs\ Cross_correlation\Cross.c User’s Manual for Keil Compiler -202 V 1.2, 2003-11 DSP Library for XC16x Family Cross_raw Raw cross-correlation (cont’d) Cycle Count Memory Initialization 4 Neg_cyc N2 – 1 ∑ ( j + 18 ) j=0 Positive-side: Intialization 3 Pos_cyc1 (N1-N2)*(N2+19) Pos_cyc2 N2 – 1 ∑ ( N1 – j + 19 ) j=0 Return 1 Total 8 + Neg_cyc + Pos_cyc1 + Pos_cyc2 Example: N1 = 3, N2 = 3 cycle = 128 Code Size Memory Initialization 8 bytes 50 bytes 6 bytes Pos_cyc1 36 bytes Pos_cyc2 44 bytes 2 bytes 146 bytes Neg_cyc Positive-side: Intialization Return Total User’s Manual for Keil Compiler -203 V 1.2, 2003-11 DSP Library for XC16x Family Cross_bias Biased cross-correlation Signature void Cross_bias( DataS* DataS* DataS* DataS DataS x1, x2 y, N_x1, N_x2 Inputs x1 : Pointer to first input vector in 1Q15 format x2 : Pointer to second input vector in 1Q15 format N_x1 (N1) : The length of the first input vector N_x2 (N2) : The length of the second input vector y : Pointer to output vector in 1Q15 format ) Output Return Implementation Description : The function performs the full-length biased cross-correlation function of real vector x1 and x2 according to Equation [4.71],Equation [4.72] and Equation [4.73]. The arguments to the function are pointers to the input vectors, pointer to output buffer to store autocorrelation result and sizes of input buffers. The input values and output values are both in 16 bit fractional format. User’s Manual for Keil Compiler -204 V 1.2, 2003-11 DSP Library for XC16x Family Cross_bias Biased cross-correlation (cont’d) Pseudo code { DataS DataS DataS DataS DataS DataS *x1; *x2; *y; N1; N2; i,j; //Ptr to the first input vector //Ptr to the second input vector //Ptr to output vector //size of the first input //size of the second input //negative side for(j=0; j<N2; j++) { y[j] = 0; for(i=0; i<j; i++) y[j] = (y[j] + x1[i]*x2[N2-j-1+i])/N1; } //positive side if(0<=j<=(N1-N2-1)) { y[j+N2] = 0; for(i=0; i<N2; i++) y[j+N2] = (y[j+N2] + x1[i+j]*x2[i])/N1; } else if((N1-N2)<=j<=(N1-1)) { y[j+N2] = 0; for(i=0; i<N1-j; i++) y[j+N2] = (y[j+N2] + x1[i]*x2[N2-j-1+i])/N1; } } Assumption • N2<=N1 • x1 must be stored in DPRAM area. Techniques • Use of CoMUL instructions Register Usage • From .c file to .asm file: Defined by the compiler User’s Manual for Keil Compiler -205 V 1.2, 2003-11 DSP Library for XC16x Family Cross_bias Biased cross-correlation (cont’d) Memory Note input vector x2 Input vector x1 High Addr. Low Addr. IDX0 x1(N1-1) x2(N2-1) x1(N1-2) x2(N2-2) . . . . . . x1(2) x2(2) x1(1) x2(1) x1(0) CoMAC DPRAM x2(0) Memory Output y High Addr. y(N-1) y(N-2) . . . y(2) y(1) Low Addr. y(0) Memory Figure 4-59 Biased cross-correlation Example C166Lib\Keil\Examples\Static_funcs\ Cross_correlation\Cross.c User’s Manual for Keil Compiler -206 V 1.2, 2003-11 DSP Library for XC16x Family Cross_bias Biased cross-correlation (cont’d) Cycle Count Memory Initialization 4 Neg_cyc N2 – 1 ∑ ( j + 36 ) j=0 Positive-side: Intialization 3 Pos_cyc1 (N1-N2)*(N2+37) Pos_cyc2 N2 – 1 ∑ ( N1 – j + 34 ) j=0 Return 1 Total 8 + Neg_cyc + Pos_cyc1 + Pos_cyc2 Example: N1 = 3, N2 = 3 cycle = 156 Code Size Memory Initialization 8 bytes 56 bytes 6 bytes Pos_cyc1 32 bytes Pos_cyc2 52 bytes 2 bytes Neg_cyc Positive-side: Intialization Return Total User’s Manual for Keil Compiler 156 bytes -207 V 1.2, 2003-11 DSP Library for XC16x Family Cross_unbias Unbiased cross-correlation Signature void Cross_unbias( DataS* DataS* DataS* DataS DataS ) Inputs x1 : Pointer to first input vector in 1Q15 format x2 : Pointer to second input vector in 1Q15 format N_x1 (N1) : The length of the first input vector N_x2 (N2) : The length of the second input vector y : Pointer to output vector in 1Q15 format Output Return Implementation Description x1, x2 y, N_x1, N_x2 : The function performs the full-length unbiased cross-correlation function of real vector x1 and x2 according to Equation [4.74],Equation [4.75] and Equation [4.76] . The arguments to the function are pointers to the input vectors, pointer to output buffer to store autocorrelation result and sizes of input buffers. The input values and output values are both in 16 bit fractional format. User’s Manual for Keil Compiler -208 V 1.2, 2003-11 DSP Library for XC16x Family Cross_unbias Unbiased cross-correlation (cont’d) Pseudo code { DataS DataS DataS DataS DataS DataS *x1; *x2; *y; N1; N2; i,j; //Ptr to the first input vector //Ptr to the second input vector //Ptr to output vector //size of the first input //size of the second input //negative side for(j=0; j<N2; j++) { y[j] = 0; for(i=0; i<j; i++) y[j] = (y[j] + x1[i]*x2[N2-j-1+i])/abs(j+1); } //positive side if(0<=j<=(N1-N2-1)) { y[j+N2] = 0; for(i=0; i<N2; i++) y[j+N2] = (y[j+N2] + x1[i+j]*x2[i])/N2; } else if((N1-N2)<=j<=(N1-1)) { y[j+N2] = 0; for(i=0; i<N1-j; i++) y[j+N2] = (y[j+N2] + x1[i]*x2[N2-j-1+i])/(N1-abs(j)); } } Assumption • N2<=N1 • x1 must be stored in DPRAM area. Techniques • Use of CoMUL instructions Register Usage • From .c file to .asm file: Defined by the compiler User’s Manual for Keil Compiler -209 V 1.2, 2003-11 DSP Library for XC16x Family Cross_unbias Unbiased cross-correlation (cont’d) Memory Note Input vector x2 Input vector x1 High Addr. Low Addr. x1(N1-1) IDX0 x2(N2-1) x1(N1-2) x2(N2-2) . . . . . . x1(2) x2(2) x1(1) x2(1) x1(0) CoMAC DPRAM x2(0) Memory Output y High Addr. y(N-1) y(N-2) . . . y(2) y(1) Low Addr. y(0) Memory Figure 4-60 Unbiased cross-correlation Example C166Lib\Keil\Examples\Static_funcs\ Cross_correlation\Cross.c User’s Manual for Keil Compiler -210 V 1.2, 2003-11 DSP Library for XC16x Family Cross_unbias Unbiased cross-correlation (cont’d) Cycle Count Memory Initialization 4 Neg_cyc N2 – 1 ∑ ( j + 38 ) j=0 Positive-side: Intialization 3 Pos_cyc1 (N1-N2)*(N2+39) Pos_cyc2 N2 – 1 ∑ ( N1 – j + 41 ) j=0 Return 1 Total 8 + Neg_cyc + Pos_cyc1 + Pos_cyc2 Example: N1 = 3, N2 = 3 cycle = 212 Code Size Memory Initialization 8 bytes 60 bytes 6 bytes Pos_cyc1 36 bytes Pos_cyc2 64 bytes 2 bytes 176 bytes Neg_cyc Positive-side: Intialization Return Total User’s Manual for Keil Compiler -211 V 1.2, 2003-11 DSP Library for XC16x Family 4.10.3 Convolution Discrete convolution is a process, whose input is two sequences, that provide a single output sequence. Convolution of two time domain sequences results in a time domain sequence. Same thing applies to frequency domain. Both the input sequences should be in the same domain but the length of the two input sequences need not be the same. Convolution of two sequences X(k) and H(k) of length nX and nH respectively can be given mathematically as nX + nH – 1 R(n) = ∑ H ( k )X ( n – k ) [4.77] k=0 The resulting output sequence R(n) is of length nX+nH-1. The convolution in time domain is multiplication in frequency domain and vice versa. User’s Manual for Keil Compiler -212 V 1.2, 2003-11 DSP Library for XC16x Family Convol Convolution Signature DataS Convol (DataS* x, DataS* h, DataS* y, DataS* d_buffer, DataS N_x, DataS N_h ) Inputs Output x : Pointer to input vector x in 1Q15 format h : Pointer to input vector h in 1Q15 format d_buffer : Pointer to delay buffer N_x : Size of x N_h : Size of h y : Convolution output in 1Q15 format Return Implementation Description The convolution of the two sequences x and h is done.The delay line is implemented in parallel to the multiplyaccumulate operation using instructions CoMACM.The delay buffer is located in the DPRAM area. Pseudo code { DataS DataS DataS DataS *x; *h; i,j; N_h,N_x; //pointer to the input vector x //pointer to the input vector h for(j=0 to N_h+N_x-1) { for(k=0 to N_h-1) { y(j)=y(j)+h(k)*x(j-k); } } return y; } Techniques Use of CoMAC instructions User’s Manual for Keil Compiler -213 V 1.2, 2003-11 DSP Library for XC16x Family Convol Convolution (cont’d) Assumption Delay buffer must be located in DPRAM area(0xf200-0xfe00) Register Usage • From .c file to .asm file: Defined by the compiler Memory Note High Addr. Low Addr. High Addr. Low Addr. DPRAM x(n-N_h) . . . x(n-3) x(n-2) x(n-1) x(n) DPRAM x(n_N_h+1) x(n-N_h+2) . . x(n-2) x(n-1) x(n) x(n) IDX0 Memory h(N_h-1) h(N_h-2) . . h(3) h(2) h(1) h(0) IDX0 Memory h(N_h-1) h(N_h-2) . . h(3) h(2) h(1) h(0) CoMACM 1Q15 1Q15 Before After Figure 4-61 Convolution Example C166Lib\Keil\Examples\Static_funcs\Convolution\conv.c Cycle Count Read parameters 4 Register Initialization 7 Set counters 5 User’s Manual for Keil Compiler -214 V 1.2, 2003-11 DSP Library for XC16x Family Convol Convolution (cont’d) Convolution 2N_h+8N_x-1 Write output 2 Return 1 Total 2N_h+8N_x+18 Example: N_h=4 ,N_x=8 cycle:90 Code Size Read parameters 8 bytes Register Initialization 14 bytes Set counters 10 bytes Convolution 42 bytes Write output 8 bytes Return 2 bytes Total User’s Manual for Keil Compiler 84 bytes -215 V 1.2, 2003-11 DSP Library for XC16x Family References 5 References 1. C166S V2, 16-bit Microcontroller, User Manual, V 1.7, Jan. 2001. 2. C166S V2,A DSP Library for C166S V2 Core,User’s Manual,V1.1,Sept.2002. 3. XC161CJ, Peripheral Units, 16-Bit Single-Chip Microcontroller with C166S V2 Core, V1.1, Dec. 2000. 4. XC16Board, Hardware Manual, Board REV. 200, V1.0, Sept. 2001. 5. TriLib, A DSP Library for TriCore, User’s Manual, V 1.2, Jan. 2001. 6. Application Note: ST10 DSP MAC Signal Processing Algorithms, 1998. 7. Application Note: Fast - Fourier - Transformation for C166 Microcontroller Family, 1997. TMS320C54x DSPLIB User’s Guide, Texas Instruments, Sept. 1998. User’s Manual for Keil Compiler 5-216 V 1.2, 2003-11 -217 Total Quality Management Qualität hat für uns eine umfassende Bedeutung. Wir wollen allen Ihren Ansprüchen in der bestmöglichen Weise gerecht werden. Es geht uns also nicht nur um die Produktqualität – unsere Anstrengungen gelten gleichermaßen der Lieferqualität und Logistik, dem Service und Support sowie allen sonstigen Beratungs- und Betreuungsleistungen. Quality takes on an allencompassing significance at Semiconductor Group. For us it means living up to each and every one of your demands in the best possible way. So we are not only concerned with product quality. We direct our efforts equally at quality of supply and logistics, service and support, as well as all the other ways in which we advise and attend to you. Dazu gehört eine bestimmte Geisteshaltung unserer Mitarbeiter. Total Quality im Denken und Handeln gegenüber Kollegen, Lieferanten und Ihnen, unserem Kunden. Unsere Leitlinie ist jede Aufgabe mit „Null Fehlern“ zu lösen – in offener Sichtweise auch über den eigenen Arbeitsplatz hinaus – und uns ständig zu verbessern. Part of this is the very special attitude of our staff. Total Quality in thought and deed, towards co-workers, suppliers and you, our customer. Our guideline is “do everything with zero defects”, in an open manner that is demonstrated beyond your immediate workplace, and to constantly improve. Unternehmensweit orientieren wir uns dabei auch an „top“ (Time Optimized Processes), um Ihnen durch größere Schnelligkeit den entscheidenden Wettbewerbsvorsprung zu verschaffen. Geben Sie uns die Chance, hohe Leistung durch umfassende Qualität zu beweisen. Wir werden Sie überzeugen. http://www.infineon.com Published by Infineon Technologies AG Throughout the corporation we also think in terms of Time Optimized Processes (top), greater speed on our part to give you that decisive competitive edge. Give us the chance to prove the best of performance through the best of quality – you will be convinced.