Intersil HSP48908VC-32 Two dimensional convolver Datasheet

HSP48908
Data Sheet
May 1999
File Number 2456.5
Two Dimensional Convolver
The Intersil HSP48908 is a high speed Two Dimensional
Convolver which provides a single chip implementation of a
video data rate 3 x 3 kernel convolution on two dimensional
data. It eliminates the need for external data storage through
the use of the on-chip row buffers which are programmable
for row lengths up to 1024 pixels.
There are Internal Register banks for storing two
independent 3 x 3 filter kernels, thus facilitating the
implementation of adaptive filters and multiple filter
operations on the same data. The pixel data path also
includes an on-chip ALU for performing real-time arithmetic
and logical pixel point operations.
Data is provided to the HSP48908 in a raster scan
noninterlaced fashion, and is internally buffered on images
up to 1024 pixels wide for the 3 x 3 convolution operation.
Images with larger rows and convolution with larger kernel
sizes can be accommodated by using external row buffers
and/or multiple HSP48908s. Coefficient and pixel input data
are 8-bit signed or unsigned integers, and the 20-bit
convolver output guarantees no overflow for kernel sizes up
to 4 x 4. Larger kernel sizes can be implemented, however,
since the filter coefficients will normally be less than their
maximum 8-bit values.
The HSP48908 is manufactured using an advanced CMOS
process, and is a low power fully static design. The
configuration of the device is controlled through a standard
microprocessor interface and all inputs/outputs are TTL
compatible.
Features
• Single Chip 3 x 3 Kernel Convolution
• Programmable On-Chip Row Buffers
• DC to 32MHz Clock Rate
• Cascadable for Larger Kernels and Images
• On-Chip 8-Bit ALU
• Dual Coefficient Mask Registers, Switchable in a Single Clock Cycle
• 8-Bit Signed or Unsigned Input and Coefficient Data
• 20-Bit Extended Precision Output
• Standard µP Interface
• Low Power CMOS
Applications
• Image Filtering
• Edge Detection
• Adaptive Filtering
• Real Time Video Filter
Ordering Information
PART NUMBER      TEMP. RANGE (°C)   PACKAGE       PKG. NO.
HSP48908VC-20    0 to 70            100 Ld MQFP   Q100x14x20
HSP48908VC-32    0 to 70            100 Ld MQFP   Q100x14x20
HSP48908JC-20    0 to 70            84 Ld PLCC    N84.1.15
HSP48908JC-32    0 to 70            84 Ld PLCC    N84.1.15
HSP48908GC-20    0 to 70            84 Ld PGA     G84.A
HSP48908GC-32    0 to 70            84 Ld PGA     G84.A
CAUTION: These devices are sensitive to electrostatic discharge; follow proper IC Handling Procedures.
http://www.intersil.com or 407-727-9207 | Copyright © Intersil Corporation 1999
Pinouts
84 PIN PGA, TOP VIEW
(Pin grid diagram: columns A through L, rows 1 through 11.)
84 LEAD PLCC, TOP VIEW
(Pin diagram; the PLCC pin number of each signal is listed under Pin Descriptions.)
100 LEAD MQFP, TOP VIEW
(Pin diagram.)
Block Diagram
(Figure: DIN0-7 passes through the programmable data delay and the ALU, whose second operand comes from the ALU Register loaded from CIN0-9, and then through two row buffers of length r connected in series. 2:1 multiplexers controlled by the cascade mode select either the row buffer outputs or CASI0-7 and CASI8-15 as inputs to the 3 x 3 multiplier array with coefficients A through I. The products are summed, the shifted CASI0-15 value is optionally added, and the 20-bit result is registered on DOUT0-19. The control logic is written through the address decoder (A0-2, LD, CS), and the clock generator derives the internal clock from CLK and HOLD.)
Pin Descriptions
NAME | PLCC PIN | TYPE | DESCRIPTION
VCC | 21, 42, 63, 84 | - | The +5V power supply pins. 0.1µF capacitors between the VCC and GND pins are recommended.
GND | 19, 48, 54, 61, 69, 76, 82 | - | The device ground.
CLK | 20 | I | Input and System Clock. Operations are synchronous with the rising edge of this clock signal.
DIN0-7 | 1-8 | I | Pixel Data Input Bus. This bus is used to provide the 8-bit pixel input data to the HSP48908. The data must be provided in a synchronous fashion, and is latched on the rising edge of the CLK signal.
CIN0-9 | 9-18 | I | Coefficient Input Bus. This input bus is used to load the Coefficient Mask Register(s), the Initialization Register, the Row Buffer Length Register and the ALU microcode. It may also be used to provide a second operand input to the ALU. The meaning of the CIN0-9 bits is determined by the register address bits A0-2. The CIN0-9 data is loaded to the Addressed Register through the use of the CS and LD inputs.
DOUT0-19 | 49-53, 55-60, 62, 64-68, 70-72 | O | Output Data Bus. This 20-bit output port is used to provide the convolution result. The result is the sum of products of the input data samples and their corresponding coefficients. The Cascade inputs CASI0-15 may also be added to the result by selecting the appropriate cascade mode in the Initialization Register.
CASI0-15 | 29-41, 43-45 | I | Cascade Input Bus. This bus is used for cascading multiple HSP48908s to allow convolution with larger kernels or row sizes. It may also be used to interface to external row buffers. The function of this bus is determined by the Cascade Mode bit (Bit 0) of the Initialization Register. When this bit is set to a '0', the value on CASI0-15 is left shifted and added to DOUT0-19. The amount of the shift is determined by bits 7-8 of the Initialization Register. While this mode is intended primarily for cascading, it may also be used to add an offset value, such as to increase the brightness of the convolved image. When the Cascade Mode bit is set to a '1', this bus is used for interfacing to external row buffers. In this mode the bus is divided into two 8-bit busses (CASI0-7 and CASI8-15), thus allowing two additional pixel data inputs. The cascade data is sent directly to the internal multiplier array, which allows for larger row sizes without using multiple HSP48908s.
CASO0-7 | 73-75, 77-81 | O | Cascade Output Bus. This bus is used primarily during cascading to handle larger frames and/or kernel sizes. This output data is the data on DIN0-7 delayed by twice the programmed internal row buffer length.
FRAME | 46 | I | Frame is an asynchronous new frame or vertical sync input. A low on this input resets all internal circuitry except for the Coefficient, ALU, AMC, EOR and INT Registers. Thus, after a Frame reset has occurred, a new frame of pixels may be convolved without reloading these registers.
EALU | 28 | I | Enable ALU Input. This control line gates the clock to the ALU Register. When it is high, the data on CIN0-7 is loaded on the next rising clock edge. When EALU is low, the last value loaded remains in the ALU Register.
HOLD | 22 | I | The Hold Input is used to gate the clock from all of the internal circuitry of the HSP48908. This signal is synchronous, is sampled on the rising edge of CLK and takes effect on the following cycle. While this signal is active (high), the clock will have no effect on the HSP48908 and internal data will remain undisturbed.
RESET | 47 | I | Reset is an asynchronous signal which resets all internal circuitry of the HSP48908. All outputs are forced low in the reset state.
OE | 83 | I | Output Enable. The OE input controls the state of the Output Data bus (DOUT0-19). A LOW on this control line enables the port for output. When OE is HIGH, the output drivers are in the high impedance state. Processing is not interrupted by this pin.
A0-2 | 25-27 | I | Control Register Address. These lines are decoded to determine which register in the control logic is the destination for the data on the CIN0-9 inputs. Register loading is controlled by the A0-2, LD and CS inputs.
LD | 23 | I | Load Strobe. LD is used for loading the Internal Registers of the HSP48908. When CS and LD are active, the rising edge of LD will latch the CIN0-7 data into the register specified by A0-2.
CS | 24 | I | Chip Select. The Chip Select input enables loading of the Internal Registers. When CS is low, the A0-2 address lines are decoded to determine the meaning of the data on the CIN0-7 bus. The rising edge of LD will then load the Addressed Register.
HSP48908
Functional Description
The HSP48908 two-dimensional convolver performs
convolution of 3 x 3 filter kernels. It accepts the image data
in raster scan, non-interlaced format, convolves it with the
filter kernel and outputs the filtered image. The input and
filter kernel data are both 8 bits, while the output data is 20
bits to prevent overflow during the convolution operation.
The HSP48908 has internal storage for two 3 x 3 filter
kernels and is capable of buffering two 1024 x 8-bit rows for
true single chip operation at video frame rates. An 8-bit ALU
in the input pixel data path allows the user to perform
arithmetic and logical operations on the input data in real
time during the convolution. Multiple devices can also be
cascaded together for larger kernel convolution, larger frame
sizes and increased precision.
Image data is input to the convolver via the DIN0-7 bus. The
data is then operated on by the ALU, stored in the row
buffers and convolved with the 3 x 3 array of filter
coefficients. The resultant output data is then latched into
the Output Register. The row buffers are preprogrammed to
the length of one row of the input image to enable the user to
input the image data one pixel at a time in raster scan format
without having to provide external storage.
Initialization of the convolver is done using the CIN0-7 bus to
load configuration data, such as the filter kernel(s) and the
length of the row buffers. The address lines A0-2 are used to
address the Internal Registers for initialization. The
configuration data is loaded using the A0-2, CIN0-9, CS and
LD controls as address, data, chip select and write enable,
respectively. This interface is compatible with standard
microprocessors without the use of any additional glue logic.
Filtered image data comes out of the convolver over the
DOUT0-19 bus. This output bus is 20 bits wide to provide
room for growth during the convolution operation. The 20-bit
bus will allow the use of up to 4 x 4 kernels (using multiple
HSP48908s) without overflow. However, in practical applications,
much larger kernel sizes can be implemented without
overflow since the filter coefficients are typically much
smaller than 8-bit full scale values. DOUT0-19 is also a
registered, three state bus to facilitate cascading multiple
chips and to allow the HSP48908 to reside on a standard
microprocessor system bus.
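The no-overflow figure for 4 x 4 kernels can be checked with a little arithmetic. The sketch below assumes unsigned 8-bit pixels and coefficients that are all at full scale (255), which is the worst case; the function name is illustrative and does not come from the datasheet.

```python
def output_bits_needed(kernel_rows, kernel_cols, data_bits=8, coeff_bits=8):
    """Bits needed to hold the worst-case unsigned sum of products."""
    max_pixel = (1 << data_bits) - 1
    max_coeff = (1 << coeff_bits) - 1
    worst_case_sum = kernel_rows * kernel_cols * max_pixel * max_coeff
    return worst_case_sum.bit_length()


for size in (3, 4, 5):
    print(f"{size} x {size} kernel: {output_bits_needed(size, size)} bits")
```

Under these assumptions a 4 x 4 kernel still fits in 20 bits, while a full-scale 5 x 5 kernel would need 21, which is why larger kernels rely on coefficients that are smaller than full scale.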
Multiple convolvers can also be cascaded together for kernel
sizes larger than 3 x 3 and for convolution on images with
row lengths longer than 1024 pixels. The maximum kernel
size is dependent upon the magnitude of the image data and
the coefficients in a given application; care must always be
taken with very large kernel sizes to prevent overflow of the
20-bit output.
Data Input
Image data coming into the 2D Convolver passes through a
programmable pipeline delay before being sent to the ALU.
The amount of delay (1 to 4 clock cycles) is set in the
Initialization Register during configuration setup (See
Control Logic). Delays greater than one are used primarily in
cascading multiple HSP48908s to align data sequences for
proper output (See Operation).
Arithmetic Logic Unit
The on-chip ALU provides the user with the capability of
performing pixel point operations on incoming image data.
Depending on the instruction in the ALU Microcode Register,
the ALU can perform any one of 19 arithmetic and logical
functions, and shift the resulting number left or right by up to
3 bits. Tables 1 and 2 show the available ALU functions and
the associated 10-bit microcode to be loaded into the ALU
Microcode Register. Note that the shifts take place on the
output of the ALU and are completely independent of the
logical or arithmetic operation being performed. The first
input (A) of the ALU is taken from the pixel input bus
(DIN0-7). The second input (B) is taken from the ALU Register. The
ALU Register is loaded via the CIN0-7 bus while the EALU
control line is valid (see EALU).
TABLE 1. ALU SHIFT OPERATIONS
ALU MICROCODE REGISTER
BIT 9  BIT 8  BIT 7  OPERATION
0      0      0      No Shift (Default)
0      0      1      Shift Right 1
0      1      0      Shift Right 2
0      1      1      Shift Right 3
1      0      0      Shift Left 1
1      0      1      Shift Left 2
1      1      0      Shift Left 3
1      1      1      Not Valid
TABLE 2. ALU PIXEL OPERATIONS
ALU MICROCODE REGISTER
BIT 6  BIT 5  BIT 4  BIT 3  BIT 2  BIT 1  BIT 0  OPERATION
0      0      0      0      0      0      0      Logical (00000000)
1      1      1      1      0      0      0      Logical (11111111)
0      0      1      1      0      0      0      Logical (A) (Default)
0      1      0      1      0      0      0      Logical (B)
1      1      0      0      0      0      0      Logical (NOT A)
1      0      1      0      0      0      0      Logical (NOT B)
0      1      1      0      0      0      1      Arithmetic (A + B)
1      0      0      1      0      1      0      Arithmetic (A - B)
1      0      0      1      1      0      0      Arithmetic (B - A)
0      0      0      1      0      0      0      Logical (A AND B)
0      0      1      0      0      0      0      Logical (A AND NOT B)
0      1      0      0      0      0      0      Logical (NOT A AND B)
0      1      1      1      0      0      0      Logical (A OR B)
1      0      1      1      0      0      0      Logical (A OR NOT B)
1      1      0      1      0      0      0      Logical (NOT A OR B)
1      1      1      0      0      0      0      Logical (A NAND B)
1      0      0      0      0      0      0      Logical (A NOR B)
0      1      1      0      0      0      0      Logical (A XOR B)
1      0      0      1      0      0      0      Logical (A XNOR B)
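As an illustration of how the 10-bit ALU command word is put together, the following sketch packs a shift code from Table 1 and an operation code from Table 2 into one integer. The names are hypothetical. The helper evaluate_logical rests on an observation rather than on anything the datasheet states: for the logical rows of Table 2, bits 6-3 read like a truth table of the function, with bit 3 giving the result for A=1, B=1, bit 4 for A=1, B=0, bit 5 for A=0, B=1 and bit 6 for A=0, B=0.

```python
# Shift field (bits 9-7), Table 1. Positive = shift left, negative = shift right.
SHIFT_FIELD = {0: 0b000, -1: 0b001, -2: 0b010, -3: 0b011,
               1: 0b100, 2: 0b101, 3: 0b110}

# Operation field (bits 6-0), a subset of Table 2.
OPERATION_FIELD = {
    "A": 0b0011000, "B": 0b0101000,
    "A+B": 0b0110001, "A-B": 0b1001010, "B-A": 0b1001100,
    "A AND B": 0b0001000, "A OR B": 0b0111000,
    "A NAND B": 0b1110000, "A NOR B": 0b1000000,
    "A XOR B": 0b0110000, "A XNOR B": 0b1001000,
}


def alu_microcode(operation, shift=0):
    """Build the 10-bit word for the ALU Microcode Register."""
    return (SHIFT_FIELD[shift] << 7) | OPERATION_FIELD[operation]


def evaluate_logical(operation_field, a, b):
    """Evaluate a logical operation on 8-bit a and b.

    Assumption: bits 6-3 act as a truth table (see the text above).
    """
    truth_table = (operation_field >> 3) & 0xF
    result = 0
    for bit in range(8):
        a_bit = (a >> bit) & 1
        b_bit = (b >> bit) & 1
        index = 3 - (2 * a_bit + b_bit)  # (A=1,B=1) -> 0 ... (A=0,B=0) -> 3
        result |= ((truth_table >> index) & 1) << bit
    return result


print(f"{alu_microcode('A+B', shift=-1):010b}")  # A + B, then shift right 1
```

The truth-table reading reproduces the AND, OR, NAND, NOR, XOR and XNOR rows, which is the basis for treating it as a plausible model; the arithmetic rows are simply looked up.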
EALU
The EALU control pin enables loading of the ALU Register.
While the EALU line is high, the data on CIN0-7 is latched
into the ALU Register on the rising edge of CLK. When
EALU goes low, the current value in the ALU Register is held
until EALU is again asserted. Note that the ALU loading
operation makes use of the CIN0-7 inputs, but is completely
independent of CS and LD. Therefore, in order to prevent
overwriting an internal register, care must be taken to ensure
that CS and LD are not active during an EALU cycle.
Programmable Row Buffers
The programmable row buffers are used for buffering raster
input data for the convolution operation. They can be thought
of as Programmable Shift Registers which can each store up
to 1024 8-bit values, thus, delaying each pixel by up to 1024
clock cycles. Functionally, each row buffer can be
represented as a set of registers connected as a 1024 x 8-bit
Serial Shift Register. The output of each buffer can be
represented by the equation Q = D(n-r), where Q is the row
buffer output, D is the buffer input, n is the current clock
cycle and r is the preprogrammed row length of the input
image. Since the two buffers are connected in series, the
data at the cascade outputs (CASO0-7) is delayed by two
row delays and may be used for cascading multiple
convolvers for larger kernel sizes and/or row lengths. The
programmable row buffers can also be bypassed by
selecting the appropriate cascade mode in the Initialization
Register. This mode allows the use of external row buffers
for convolving with row lengths longer than 1024 pixels.
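The relation Q = D(n-r) can be modeled as a fixed-length delay line. The sketch below is a behavioral model only, with hypothetical names, and it assumes the buffers start out cleared to zero; chaining two of them reproduces the two-row delay seen at the cascade outputs.

```python
from collections import deque


class RowBuffer:
    """Behavioral model of one programmable row buffer: Q = D(n - r)."""

    def __init__(self, r):
        if not 1 <= r <= 1024:
            raise ValueError("row length must be in the range 1..1024")
        # Assumption: the buffer contents start out as zeros.
        self._fifo = deque([0] * r, maxlen=r)

    def clock(self, d):
        """Advance one clock: return the pixel stored r clocks ago, store d."""
        q = self._fifo[0]
        self._fifo.append(d & 0xFF)  # maxlen discards the oldest entry
        return q


def cascade_output(pixels, r):
    """Stream seen at CASO0-7: the input delayed by two row buffers in series."""
    first, second = RowBuffer(r), RowBuffer(r)
    return [second.clock(first.clock(p)) for p in pixels]


print(cascade_output(list(range(1, 11)), r=2))
```

With r = 2 the input sequence reappears four clocks later, matching the two-row delay described for CASO0-7.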
8-Bit Multiplier Array
The multiplier array consists of nine 8 x 8 multipliers. Each
multiplier forms the product of a filter coefficient with a
corresponding pixel in the input image. Input and coefficient
data may be in either two’s complement or unsigned integer
format. The nine coefficients form a 3 x 3 filter kernel which
is multiplied by the input pixel data and summed to form a
sum of products for implementation of the convolution
operation as shown below:
INPUT DATA      FILTER KERNEL
P1  P2  P3      A  B  C
P4  P5  P6      D  E  F
P7  P8  P9      G  H  I

OUTPUT = (A x P1) + (B x P2) + (C x P3)
       + (D x P4) + (E x P5) + (F x P6)
       + (G x P7) + (H x P8) + (I x P9)
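The sum of products can be written out directly. This is a minimal sketch with hypothetical names; the sample kernel is a common Laplacian mask chosen only for illustration, and its negative coefficients assume the two's complement coefficient format.

```python
def sum_of_products(pixels, kernel):
    """pixels: 3 x 3 window P1..P9 row by row; kernel: coefficients A..I."""
    return sum(kernel[row][col] * pixels[row][col]
               for row in range(3) for col in range(3))


window = [[10, 10, 10],
          [10, 50, 10],
          [10, 10, 10]]
laplacian = [[0, -1, 0],
             [-1, 4, -1],
             [0, -1, 0]]

print(sum_of_products(window, laplacian))  # 4*50 - 4*10 = 160
```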
Control Logic
The control logic (Figure 1) contains the ALU Microcode
Register, the Initialization Register, the Row Length Register,
and the Coefficient Registers. The control logic is updated
by placing data on the CIN0-9 bus and using the A0-2, CS
and LD control lines to write to the Addressed Register (see
Address Decoder). All of the Control Logic Registers are
loaded with their default values on RESET, and are
unaffected by FRAME.
ALU Microcode Register
The ALU Microcode Register is used to store the command
word for the ALU. The ALU command word is a 10-bit
instruction divided into two fields: the lower 7 bits determine
the ALU operation and the upper 3 bits specify the number
of shifts which occur. The ALU command words are defined
in Tables 1 and 2 (See ALU Section).
FIGURE 1. CONTROL LOGIC BLOCK DIAGRAM
(Block diagram: the address decoder takes A0-2, LD and CS and generates the load strobes for the ALU Microcode Register (AMC), the Initialization Register (INT) and the Row Length Register (RLR), which are written from CIN0-9, together with the load strobes CR0 and CR1 and the bank enables ENCR0 and ENCR1 for Coefficient Register 0 and Coefficient Register 1, which supply coefficients A through I.)
Initialization Register
The Initialization Register is used to configure
the convolver for a particular application. It is loaded through
the use of the CIN0-7 bus along with the CS and LD inputs.
Bit 0 defines the type of cascade mode to be used; Bits 1
and 2 select the number of delays to be included in the input
pixel data path; Bits 3 and 4 define the input and coefficient
data format; Bits 5 and 6 determine the type of rounding to
occur on the DOUT0-19 bus; Bits 7 and 8 define the shift
applied to the cascade input data. The complete definition of
the Initialization Register bits is given in Table 3.
TABLE 3. INITIALIZATION REGISTER DEFINITION
INITIALIZATION REGISTER
BIT 0          FUNCTION = CASCADE MODE
0              Multiplier input from internal row buffers.
1              Multiplier input from external buffers.
BIT 2  BIT 1   FUNCTION = INPUT DATA DELAY
0      0       No Data Delay Registers used.
0      1       One Data Delay Register used.
1      0       Two Data Delay Registers used.
1      1       Three Data Delay Registers used.
BIT 3          FUNCTION = INPUT DATA FORMAT
0              Unsigned integer format.
1              Two's complement format.
BIT 4          FUNCTION = COEFFICIENT DATA FORMAT
0              Unsigned integer format.
1              Two's complement format.
BIT 6  BIT 5   FUNCTION = OUTPUT ROUNDING
0      0       No rounding.
0      1       Round to 16 bits (i.e., DOUT19-4).
1      0       Round to 8 bits (i.e., DOUT19-12).
1      1       Not Valid.
BIT 8  BIT 7   FUNCTION = CASI0-15 INPUT SHIFT
0      0       No shift.
0      1       Shift CASI0-15 left two.
1      0       Shift CASI0-15 left four.
1      1       Shift CASI0-15 left eight.
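The bit fields of Table 3 can be packed into a register word as sketched below. The function and parameter names are hypothetical; only the bit positions and codes are taken from the table.

```python
ROUNDING_CODES = {None: 0b00, 16: 0b01, 8: 0b10}       # bits 6-5
CASI_SHIFT_CODES = {0: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}  # bits 8-7


def init_register(external_buffers=False, input_delay=0, signed_input=False,
                  signed_coeff=False, round_to=None, casi_shift=0):
    """Pack the Initialization Register fields of Table 3 into one integer."""
    if input_delay not in range(4):
        raise ValueError("input_delay must be 0, 1, 2 or 3 delay registers")
    word = int(external_buffers)              # bit 0: cascade mode
    word |= input_delay << 1                  # bits 2-1: input data delay
    word |= int(signed_input) << 3            # bit 3: input data format
    word |= int(signed_coeff) << 4            # bit 4: coefficient data format
    word |= ROUNDING_CODES[round_to] << 5     # bits 6-5: output rounding
    word |= CASI_SHIFT_CODES[casi_shift] << 7  # bits 8-7: CASI0-15 shift
    return word


# Two's complement data and coefficients, round to 16 bits, shift CASI left 4.
print(f"{init_register(signed_input=True, signed_coeff=True, round_to=16, casi_shift=4):09b}")
```

Calling init_register() with no arguments yields 0, which corresponds to internal row buffers, no delay registers, unsigned formats, no rounding and no shift.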
Row Length Register
The Row Length Register is used to store the programmed
number of delays for the internal row buffers. The
Programmed delay is set equal to the row length (r) of the
input image. The input pixel data is stored in the row buffers
to allow corresponding pixels of adjacent rows to be
synchronously sent to the multiplier array for the convolution
operation. The Row Length Register is programmable with
values from 0 to 1023, with 0 defined as a row length of
1024. Row lengths of 1 or 2 lead to meaningless results for a
3 x 3 kernel convolution, while a row length of 3 defines a 1 x 9
filter (See Operation Section). The Row Length Register is
written through the use of A0-2, CS and LD. Once the Row
Length Register has been loaded, the convolver must be reset
before a new row length can be entered, or else the new value
will be ignored. After RESET returns high, the user has 1024
cycles of CLK to load the Row Length Register. After 1024
CLK cycles, the Row Length Register is automatically set to
0 (row length = 1024) and further writes to this register are
ignored.
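To see why a row length of 3 yields a 1 x 9 filter, it helps to list the delay at which each coefficient sees the input stream. The sketch below assumes that coefficient I multiplies the newest pixel and that each kernel row sits one row buffer (r pixels) further back, which agrees with the 9-tap equation given in the Operation section; pipeline latency is ignored and all names are hypothetical.

```python
def tap_delays(r):
    """Delay, in pixels, seen by each coefficient A..I for row length r."""
    kernel_rows = ["ABC", "DEF", "GHI"]
    return {name: (2 - row) * r + (2 - col)
            for row, names in enumerate(kernel_rows)
            for col, name in enumerate(names)}


def convolver_output(samples, coeffs, r, n):
    """Sum of products at sample index n; samples before index 0 read as 0."""
    total = 0
    for name, delay in tap_delays(r).items():
        if n - delay >= 0:
            total += coeffs[name] * samples[n - delay]
    return total


print(tap_delays(3))     # nine consecutive delays, 8 down to 0
print(tap_delays(1024))  # three taps per image row, rows 1024 pixels apart
```

With r = 3 the nine delays are consecutive, so the 3 x 3 array behaves as nine taps on a single stream; with r = 1024 the same taps fall on three adjacent image rows.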
Coefficient Registers (CREG0, CREG1)
The control logic contains two Coefficient Register banks,
CREG0 and CREG1. Each of these register banks is capable of
storing nine 8-bit filter coefficient values (3 x 3 Kernel). The
outputs of the registers are connected to the coefficient input
of the corresponding multiplier in the 3 x 3 multiplier array
(designated A through I). The register bank to be used for
the convolution is selectable by writing to the appropriate
address (See address decoder). All registers in a given bank
are enabled simultaneously, and one of the banks is always
active.
For most applications, only one of the register banks is
necessary. The user can simply load CREG0 after power up,
and use it for the entire convolution operation. (CREG0 is the
Default Register). The alternate register bank allows the
user to maintain two sets of filter coefficients and switch
between them in real time. The coefficient masks are loaded
via the CIN bus by using A0-2, CS and LD. The selection of
the particular register bank to be used in processing is also
done by writing to the appropriate address (see address
decoder). For example, if CREG0 is being used to provide
coefficients to the multipliers, CREG1 can be updated at a
low rate by an external processor; then at the proper time,
CREG1 can be selected, so that the new coefficient mask is
used to process the data. Thus, no clock cycles have been
lost when changing between alternate 3 x 3 filter kernels.
The nine coefficients must be loaded sequentially over the
CIN0-7 bus from A to I. The address of CREG0 or CREG1 is
placed on A0-2, and then the nine coefficients are written to
the corresponding Coefficient Register one at a time by
using the CS and LD inputs.
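The loading sequence can be expressed as a list of (address, data) writes that a host processor would issue. This is a sketch only: the address codes are those of the address map in Table 4, while the function names and the data value of 0 used for the bank-select write are assumptions (the datasheet does not say what data accompanies a select write).

```python
ADDRESS = {
    "CREG0": 0b010, "CREG1": 0b011,                # load coefficient banks
    "SELECT_CREG0": 0b101, "SELECT_CREG1": 0b110,  # choose the active bank
}


def kernel_writes(kernel, bank):
    """(A2-0, CIN) pairs that load a 3 x 3 kernel, in order A to I."""
    address = ADDRESS[f"CREG{bank}"]
    writes = []
    for row in kernel:
        for coeff in row:
            if not -128 <= coeff <= 255:
                raise ValueError("coefficient does not fit in 8 bits")
            writes.append((address, coeff & 0xFF))  # two's complement if negative
    return writes


def select_write(bank):
    """Write that makes the given bank active. Data value 0 is an assumption."""
    return (ADDRESS[f"SELECT_CREG{bank}"], 0)


sobel_x = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
for address, data in kernel_writes(sobel_x, bank=1) + [select_write(1)]:
    print(f"A2-0={address:03b}  CIN=0x{data:02X}")
```

Loading the inactive bank first and issuing the select write last mirrors the update scheme described above, in which the new mask takes effect only once it is complete.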
Address Decoder
The address decoder (see Figure 1) is used for writing to the
control logic of the HSP48908. Loading an Internal Register
is done by selecting the Destination Register with the A0-2
address lines, placing the data on CIN0-9, asserting the CS
and LD control lines. When either CS or LD goes high, the
data on the CIN0-9 lines is latched into the Addressed
Register. The address map for the A0-2 bus is shown in
Table 4.
While loading of the Control Logic Registers is
asynchronous to CLK, the Target Register in the control logic
is being read synchronous to the internal clock. Therefore,
care must be taken when modifying the convolver setup
parameters during processing to avoid changing the
contents of the registers near a rising edge of CLK. The
required setup time relative to CLK is given by the
Specification TLCS. For example, in order to change the
active Coefficient Register from CREG0 to CREG1 during an
active convolution operation, a write will be performed to the
address for selecting CREG1 for internal processing (A2-0 =
110). In order to provide proper uninterrupted operation, LD
should be deasserted at least TLCS prior to the next rising
edge of CLK. Failure to meet this setup time may result in
unpredictable results on the output of the convolver for one
clock cycle. Keep in mind that this requirement applies only
to the case where changes are being made in the control
logic during an active convolution operation. In a typical
convolver configuration routine, this specification would not
be applicable.
TABLE 4. ADDRESS MAP
CONTROL LOGIC ADDRESS MAP
A2-0   FUNCTION
000    Load Row Length Register (RLR).
001    Load ALU Microcode Register (AMC).
010    Load Coefficient Register 0 (CREG0).
011    Load Coefficient Register 1 (CREG1).
100    Load Initialization Register (INT).
101    Select CREG0 for Internal Processing.
110    Select CREG1 for Internal Processing.
111    No Operation.
Cascade I/O
Cascade Input
The cascade input lines (CASI0-15) have two primary
functions. The first is used to allow convolutions with kernel
sizes larger than 3 x 3. This can be implemented by
connecting the DOUT bus of one convolver to the cascade
inputs of another. The second function is for convolution on
images wider than 1024 pixels. This type of operation can be
implemented by using external row buffers to supply the
pixel input data to the CASI0-15 inputs. The cascade input
functions are determined by Initialization Register bit 0.
When this bit is set to a "0", the cascade input data is added
to the convolver output. In this manner, multiple convolvers
can be used to implement larger kernel convolution. When
Initialization Register bit 0 is "1", the data on CASI0-15 is
divided into two 8-bit portions and is sent to the 3 x 3
multiplier array (refer to Block Diagram). This mode of
operation allows the use of external row buffers for
convolution of images with row sizes larger than 1024.
Examples of these configurations are given in the
Operation Section of this specification.
The data on the cascade inputs (CASI0-15) can also be left
shifted by 0, 2, 4, or 8 bits. The amount of shift is determined
by bits 7 and 8 of the Initialization Register (See Table 3).
CASI0-15 is shifted by the specified number of bits and is
added to the 20-bit output DOUT0-19. The shifting function
provides a method for cascading multiple HSP48908s and
allowing a selectable amount of output growth while
maximizing the resolution of the convolver result.
The cascade inputs can also be used as a simple way to add
an offset to the convolved image. Bit 0 of the Initialization
Register would be set to '0', and the desired offset placed on
the CASI0-15 inputs. While multiple offsets can be used and
changed during the convolution operation, note that the
required data setup and hold times with respect to CLK
(TDS and TDH) must be met.
Cascade Output
The cascade output lines (CASO0-7) are outputs from the
second row buffer. Data at these outputs is the input pixel
data delayed by two times the preprogrammed value in the
Row Length Register. The cascade outputs are used to
cascade multiple convolvers by connecting the cascade
outputs of one device to the data inputs of another (see
Operation Section).
Control Signals
HOLD
The HOLD control input provides the ability to disable the
internal clock and stop all operations temporarily. HOLD is
sampled on the rising edge of CLK and takes effect during
the following clock cycle (refer to Figure 2). This signal can
be used to momentarily ignore data at the input of the
convolver while maintaining its current output data and
operational state.
FIGURE 2. HOLD OPERATION
(Timing diagram: CLK, HOLD and INTERNAL CLOCK.)
RESET
The RESET signal initializes all internal flip flops and
registers in the HSP48908. It is an asynchronous signal, and
the convolver will remain in the reset state as long as
RESET is asserted. On reset, all internal registers are set to
zero or their default values, and all outputs are forced low.
Following a reset, the default values in the internal registers
will define the following mode of operation: internal row
buffers used, line length = 1024, no input data delay, logical
A operation (output of ALU = A input, DIN0-7), no output
rounding and unsigned input data format.
The convolver can be reset at any time, but must be reset
before updating the Row Length Register in order to provide
proper operation. After RESET returns high, the user has
1024 cycles of CLK to load the Row Length Register. After
1024 CLK cycles, the Row Length Register is automatically
set to 0 (row length = 1024) and further writes to this register
are ignored.
FRAME
This FRAME input initializes all internal flip flops and
registers except for the coefficient, ALU, ALU microcode,
Row Length, and Initialization Registers. It is used to reset
the convolver between video frames and eliminates the need
to reinitialize the entire convolver or reload the coefficients.
FRAME is an asynchronous input and may occur at any
time. However, it must be deasserted at least TFS ns prior to
the rising clock edge that is to begin operation for the next
frame. While FRAME is asserted, the registers and flip-flops
will remain in the reset state.
Operation
The HSP48908 has three basic modes of operation: single
chip mode, operation with external row buffers and multiple
devices cascaded together for larger convolution kernels
and/or longer row lengths. The mode of operation is defined
by the contents of the Initialization Register, and can be
modified at any time by a microprocessor or other external
means.
Single Chip Mode
A single HSP48908 can be used to perform 3 x 3 convolution
on 8-bit image data with row lengths up to 1024. A block
diagram of this configuration is shown in Figure 3. In this
mode of operation, the image data is input into the DlN0-7
bus in a raster scan order starting with the upper left pixel. To
perform the convolution operation, a group of nine image
pixels is multiplied by the 3 x 3 array of filter coefficients and
their products are summed and sent to the output. For the
example in Figure 3, the pixel value in the output image at
location (m, n) is given by:
POUT(m,n) (A x Pm-1, n-1) + (B x Pm-1, n) + (C x Pm-1, n+1)
+ (D x Pm, n-1) + (E x Pm, n)
+ (F x Pm, n+1)
+ (G x Pm+1, n-1) + (H x Pm+1, n)+ (I x Pm+1, n+1)
change this default setup by loading new values into the ALU
microcode, initialization and Row Length Registers. RESET
also clears the Coefficient Registers and CREG0 is selected
for internal processing. The user can now load the
coefficients one at a time from A to I, via the ClN0-7 inputs
and the LD and CS control lines.
Multiple filter kernels can also be used on the same image
data using the dual Coefficient Registers CREG0 and
CREG1. This type of filtering is used when the
characteristics of the input pixel data change over the image
in such a way that no single filter produces satisfactory
results for the entire image. In order to filter such an image,
the characteristics of the filter itself must change while the
image is being processed. The HSP48908 can perform this
function with the use of an external processor. The
processor is used to calculate the required new filter
coefficients, loads them into the Coefficient Register not in
use, and selects the newly loaded Coefficient Register at the
proper time. The first Coefficient Register can then be
loaded with new coefficients in preparation for the next
change. This can be carried out with no interruption in
processing, provided that the new register is selected
synchronous to the convolver CLK signal.
The HSP48908 can also operate as a one dimensional 9 tap
FIR filter by programming the Row Buffer Length Register
with a value of 3 and setting the Initialization Register bit-0 to
a “0”. This configuration will provide for nine sequential input
values in the input to be multiplied by the coefficient values
in the selected Coefficient Register and provide the proper
filtered output. The equation for the output then becomes:
DOUTn =
A x Dn-8 + B x Dn-7 + C x Dn-6 + D x Dn-5
+ E x Dn-4 + F x Dn-3 + G x Dn-2 + H x Dn-1
+ I x Dn
IMAGE
DATA
20
8
FILTERED
IMAGE
HSP48908
CLK
INITIALIZATION
DATA
FILTER KERNEL
IMAGE DATA
ABC
Pm-1, n -1
Pm-1, n
Pm-1, n + 1
DEF
PM, n -1
Pm, n
Pm, n + 1
GHI
Pm + 1, n -1
Pm + 1, n
Pm + 1, n + 1
This process is continually repeated until the last pixel of the
last row of the image has been input. It can then start again
with the first row of the next frame. The FRAME pin is used
to clear the row buffers, multiplier input latches and
DOUT0-19 registers between frames.
FIGURE 3. 3 x 3 KERNEL ON AN 8-BIT, 1024 x N IMAGE
The setup for single chip operation is straightforward. After
reset, the convolver is configured for row lengths of 1024
pixels, no input data delay, no ALU pixel point operations, no
output rounding, and an unsigned input format. The user can
Use Of External Row Buffers
External row buffers may be used when frames with row
sizes larger than 1024 pixels are desired. To use the
HSP48908 in this mode, the cascade mode control bit (bit 0)
of the Initialization Register is set to ‘1’ to allow the data on
the cascade inputs CASI0-15 to go to the multiplier array.
The inputs of one external row buffer (such as the HSP9500)
are connected to the input data in parallel with the DIN0-7
lines of the convolver, and its outputs are connected to the
CASI0-7 inputs (see Figure 4). A second external row buffer
is connected between the outputs of the first row buffer and
the CASI8-15 inputs of the convolver. The convolution
operation can then be performed by the HSP48908 in the
same manner as the single chip mode. The row length in this
configuration is limited only by the maximum length of the
external row buffers. Note that when using the convolver in
this configuration, the programmable input data delays and
ALU will only operate on the data entering the DIN0-7 inputs
(i.e., the bottom row of the 3 x 3 sum of products). If higher
order filters or pixel point operations are required when using
external row buffers, these functions must be implemented
externally by the user.
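[Figure 4 diagram: 8-bit image data drives DIN0-7 and the first row buffer; the first row buffer drives CASI0-7 and the second row buffer, which drives CASI8-15; DOUT0-19 carries the 20-bit filtered image data.]
FIGURE 4. USING EXTERNAL ROW BUFFERS WITH THE HSP48908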
Cascading Multiple HSP48908s
Multiple HSP48908s are capable of being cascaded to perform
convolution on images with row lengths longer than
1024 pixels and with kernel sizes larger than 3 x 3. Figure 5
illustrates the use of two HSP48908s to perform a 3 x 3
kernel convolution on a 2K x N frame. In this case, the cascade
mode control bit (bit 0) of both Initialization Registers is set
to ‘0’. The loading of the coefficients is accomplished just
as before. However, the 3 x 3 mask is divided into two
portions for proper convolution output as follows: Convolver #1 =
DEF000GHI and Convolver #2 = ABC000000.
[Figure 5 diagram: two HSP48908s (#1 and #2) with 8-bit image data input, a Z^-2 input delay, CASO0-7 and CASI0-15 cascade connections, and 20-bit filtered image data output on DOUT0-19.]
3 x 3 FILTER KERNEL    COEFFICIENT MASKS
                       CONVOLVER #1    CONVOLVER #2
A B C                  D E F           A B C
D E F                  0 0 0           0 0 0
G H I                  G H I           0 0 0
FIGURE 5. 3 x 3 KERNEL CONVOLUTION ON A 2K x N IMAGE
The same configuration can be used to perform 3 x 5
convolution on a 1K x N frame simply by setting up the
coefficients of the convolvers to implement the 3 x 5 mask as
indicated below:
3 x 5 FILTER KERNEL    CONVOLVER COEFFICIENT MASKS
A B C                  G H I    A B C
D E F                  J K L    D E F
G H I                  M N O    0 0 0
J K L
M N O
In addition to larger frames, larger kernels can also be
addressed through cascadability. An example of the
configuration for a 5 x 5 kernel convolution on a 1K x N
frame is shown in Figure 6. Note that in this configuration,
convolver #2 incorporates a 3 clock cycle delay (z^-3) and
convolvers #3 and #4 incorporate 2 clock cycle delays (z^-2) at
their pixel inputs. These delays are required to ensure
proper data alignment in the final sum of products output of
the cascaded convolvers. The number of delays required at
the pixel input is programmable through the use of bits 1 and
2 of the Initialization Register (Refer to Table 3).
[Figure 6 diagram: four HSP48908s (#1 to #4) with 8-bit image data input, Z^-2 and Z^-3 delays at the DIN0-7 pixel inputs, CASO0-7 and CASI0-15 cascade connections, and 20-bit filtered image data output on DOUT0-19.]
5 x 5 FILTER KERNEL    CONVOLVER COEFFICIENT MASKS
A B C D E              0 K L    0 A B
F G H I J              0 P Q    0 F G
K L M N O              0 U V    0 0 0
P Q R S T
U V W X Y              M N O    C D E
                       R S T    H I J
                       W X Y    0 0 0
FIGURE 6. 5 x 5 KERNEL CONVOLUTION ON A 1K x N IMAGE
In any of the cascade configurations, only 16 bits of the 20-bit
output (DOUT0-19) can be connected to the 16 cascade
inputs (CASI0-15) of another convolver. Which 16 bits are
chosen depends upon the amount of growth expected at the
convolver output. The amount of growth is dependent on the
input pixel data and the coefficients selected for the
convolution operation. The maximum possible growth is
calculated in advance by the user, and the convolvers are
set up to appropriately shift the cascade input data through
the use of bits 7 and 8 of the Initialization Register (see
Cascade I/O). Referring to Figure 6, if the maximum growth
out of convolver #1 extends into bit 16 or 17, then DOUT2-17
is connected to the cascade inputs of convolver #3, which is
programmed to shift the input data left by two bits. Likewise,
if the data out of convolver #3 grows into bit 18 or 19, then
DOUT4-19 is connected to the CASI0-15 inputs of
convolver #2, which is programmed to shift the input left by 4
bits.
Cascading For Row Sizes Larger Than 1024
Combining large images with large kernels is accomplished
by implementing external row buffers, external Data Delay
Registers and external adders. Figure 7 illustrates a circuit
for implementation of a 5 x 5 convolution on a 2K x N image.
The 5 x 5 coefficient mask is again distributed among the
four HSP48908s. The width of the DOUT path to be used in
this case is dependent on the amount of resolution required
and the amount of growth expected at the output.
Frame Rate
The total time to process an image is given by the formula:
T = R x C/F, where:
T = time to process a frame.
R = number of rows in the image.
C = number of pixels in a row.
F = clock rate of the HSP48908.
Note that the size of the kernel does not enter into the
equation. Convolvers cascaded for larger kernels or larger
frame sizes, as in the examples shown, process the image in
the same amount of time as a single HSP48908 convolving
the image with a 3 x 3 kernel. Therefore, there is no
performance degradation when cascading multiple
HSP48908s.
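As a worked example of the frame-time formula T = R x C/F, the sketch below evaluates it for two image sizes at the 32MHz clock rate of the -32 grade. The function name and the image sizes are illustrative choices, and blanking intervals or other system overhead are not included.

```python
def frame_time_seconds(rows, cols, clock_hz):
    """T = R x C / F: one pixel is processed per clock cycle."""
    return rows * cols / clock_hz


# 512 x 512 image at 32 MHz
t_512 = frame_time_seconds(512, 512, 32e6)      # 0.008192 s
fps_512 = 1.0 / t_512                           # about 122 frames/s

# 1024 x 1024 image at 32 MHz
t_1024 = frame_time_seconds(1024, 1024, 32e6)   # 0.032768 s
fps_1024 = 1.0 / t_1024                         # about 30.5 frames/s
```

Because the kernel size does not appear in the formula, the same figures apply to the cascaded 5 x 5 configurations.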
[Figure 7 diagram: four HSP48908s, each with DIN0-7 and DOUT0-19 and a pair of external row buffers driving CASI0-7 and CASI8-15, together with external Z^-1 data delay registers on the image data path and external adders (+) that combine the outputs into the filtered image data.]
FIGURE 7. 5 x 5 KERNEL CONVOLUTION ON A 2K x N IMAGE
Absolute Maximum Ratings
Supply Voltage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . +8.0V
Input, Output or I/O Voltage Applied . . . . .GND -0.5V to VCC +0.5V
ESD Classification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Class 1
Thermal Information
Thermal Resistance (Typical, Note 1)      θJA (°C/W)   θJC (°C/W)
  MQFP Package . . . . . . . . . . . . .  48.0         N/A
  PLCC Package . . . . . . . . . . . . .  34.0         N/A
  PGA Package  . . . . . . . . . . . . .  35.0         6.0
Maximum Junction Temperature (TJ)
  MQFP Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150°C
  PLCC Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150°C
  PGA Package . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 175°C
Maximum Storage Temperature Range . . . . . . . . . .-65°C to 150°C
Maximum Lead Temperature (Soldering 10s) . . . . . . . . . . . . . 300°C
  (PLCC, MQFP - Lead Tips Only)
Operating Conditions
Temperature Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . 0°C to 70°C
Voltage Range . . . . . . . . . . . . . . . . . . . . . . . . . . . . +4.75V to 5.25V
Die Characteristics
Number of Transistors or Gates . . . . . . . . . . . . .190,000 Transistors
CAUTION: Stresses above those listed in “Absolute Maximum Ratings” may cause permanent damage to the device. This is a stress only rating and operation of the
device at these or any other conditions above those indicated in the operational sections of this specification is not implied.
NOTE:
1. θJA is measured with the component mounted on an evaluation PC board in free air.
DC Electrical Specifications    VCC = 5.0V ±5%, TA = 0°C to 70°C
PARAMETER                       SYMBOL  MIN   MAX   UNITS  TEST CONDITIONS
Logical One Input Voltage       VIH     2.0   -     V      VCC = 5.25V
Logical Zero Input Voltage      VIL     -     0.8   V      VCC = 4.75V
High Level Clock Input          VIHC    3.0   -     V      VCC = 5.25V
Low Level Clock Input           VILC    -     0.8   V      VCC = 4.75V
Output HIGH Voltage             VOH     2.6   -     V      IOH = 400µA, VCC = 4.75V
Output LOW Voltage              VOL     -     0.4   V      IOL = +2.0mA, VCC = 4.75V
Input Leakage Current           II      -10   10    µA     VIN = VCC or GND, VCC = 5.25V
I/O Leakage Current             IO      -10   10    µA     VOUT = VCC or GND
Standby Power Supply Current    ICCSB   -     500   µA     VIN = VCC or GND, VCC = 5.25V, Outputs Open
Operating Power Supply Current  ICCOP   -     140   mA     f = 20MHz, VIN = VCC or GND, (Note 2)
Input Capacitance               CIN     -     10    pF     f = 1MHz, VCC = Open, All Measurements are referenced to device GND (Note 3)
Output Capacitance              CO      -     12    pF     f = 1MHz, VCC = Open, All Measurements are referenced to device GND (Note 3)
NOTES:
2. Power supply current is proportional to operating frequency. Typical rating for ICCOP is 7.0mA/MHz.
3. Not tested, but characterized at initial design and at major process/design changes.
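The typical ICCOP rating of Note 2 and the θJA values from the Thermal Information section can be combined into a rough junction temperature estimate. The sketch below assumes the standard relation TJ = TA + θJA x P with P = VCC x ICC; this relation and the chosen operating point (32MHz, VCC = 5.25V, TA = 70°C) are assumptions of the example and are not stated in this data sheet. The result is an estimate from a typical rating, not a guaranteed limit, and power dissipated in output loads is ignored.

```python
ICCOP_TYP_MA_PER_MHZ = 7.0          # Note 2, typical rating
THETA_JA_C_PER_W = {"MQFP": 48.0, "PLCC": 34.0, "PGA": 35.0}


def supply_current_ma(clock_mhz):
    """Typical operating current, proportional to clock frequency."""
    return ICCOP_TYP_MA_PER_MHZ * clock_mhz


def junction_temp_c(package, clock_mhz, vcc=5.25, ambient_c=70.0):
    """Estimate TJ = TA + theta_JA * P, with P = VCC * ICC (assumed relation)."""
    power_w = vcc * supply_current_ma(clock_mhz) / 1000.0
    return ambient_c + THETA_JA_C_PER_W[package] * power_w


icc_20 = supply_current_ma(20)              # 140 mA, equal to the ICCOP maximum at 20MHz
tj_mqfp_32 = junction_temp_c("MQFP", 32)    # about 126.4 degrees C
tj_plcc_32 = junction_temp_c("PLCC", 32)    # about 110.0 degrees C
```

Under these assumptions both plastic packages stay below the 150°C maximum junction temperature at the top of the operating temperature range.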
AC Electrical Specifications    VCC = 5.0V ±5%, TA = 0°C to 70°C
                                        -32 (32MHz) -20 (20MHz)
PARAMETER                       SYMBOL  MIN   MAX   MIN   MAX   UNITS  NOTES
Clock Period                    tCYCLE  31    -     50    -     ns
Clock Pulse Width High          tPWH    12    -     20    -     ns
Clock Pulse Width Low           tPWL    13    -     20    -     ns
Data Input Setup Time           tDS     13    -     14    -     ns
Data Input Hold Time            tDH     0     -     0     -     ns
Clock to Data Out               tOUT    -     16    -     22    ns
Address Setup Time              tAS     13    -     13    -     ns
Address Hold Time               tAH     0     -     0     -     ns
Configuration Data Setup Time   tCDS    14    -     16    -     ns
Configuration Data Hold Time    tCDH    0     -     0     -     ns
LD Pulse Width                  tLPW    12    -     20    -     ns
LD Setup Time                   tLCS    25    -     30    -     ns     Note 4
CIN0-7 Setup to CLK             tCS     14    -     16    -     ns
CS Setup to LD                  tCSS    0     -     0     -     ns
CIN0-7 Hold Time from CLK       tCH     0     -     0     -     ns
CS Hold from LD                 tCSH    0     -     0     -     ns
RESET Pulse Width               tRPW    31    -     50    -     ns
FRAME Setup to Clock            tFS     21    -     25    -     ns     Note 5
FRAME Pulse Width               tFPW    31    -     50    -     ns
EALU Setup Time                 tES     12    -     14    -     ns
EALU Hold Time                  tEH     0     -     0     -     ns
HOLD Setup Time                 tHS     11    -     12    -     ns
HOLD Hold Time                  tHH     1     -     1     -     ns
Output Enable Time              tEN     -     16    -     22    ns     Note 6
Output Disable Time             tOZ     -     28    -     32    ns     Note 8
Output Rise Time                tR      -     6     -     6     ns     From 0.8V to 2.0V, Note 8
Output Fall Time                tF      -     6     -     6     ns     From 2.0V to 0.8V, Note 8
NOTES:
4. This specification applies only to the case where the HSP48908 is being written to during an active convolution cycle. It must be met in order to
achieve predictable results at the next rising clock edge. In most applications, the configuration data and coefficients are loaded asynchronously
and the tLCS specification may be disregarded.
5. While FRAME is an asynchronous signal, it must be deasserted a minimum of tFS ns prior to the rising clock edge which is to begin loading
pixel data for a new frame.
6. Transition is measured at ±200mV from steady state voltage with loading as specified in test load circuit with CL = 40pF.
7. AC Testing is performed as follows: Input levels (CLK Input) 4.0 and 0V, Input levels (all other Inputs) 0V and 3.0V, Timing reference levels (CLK)
= 2.0V, (Others) = 1.5V; output load per test load circuit with CL = 40pF. Output transition is measured at VOH ≥ 1.5V and VOL ≤ 1.5V.
8. Controlled via design or process parameters and not directly tested. Characterized upon initial design and after major process and/or design
changes.
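When one convolver's output drives another convolver's cascade inputs on the same clock, the table above gives a simple timing budget: tOUT of the source plus tDS of the destination must fit within tCYCLE. The following sketch computes the remaining slack for each speed grade. It assumes both devices share one clock with no skew, that tDS applies to the CASI inputs (as the DIN0-7, CASI0-15 labelling of the functional timing waveform suggests), and that any board or external logic delay is supplied through the hypothetical `board_delay_ns` parameter.

```python
# Values in ns, taken from the AC Electrical Specifications table.
AC_SPECS_NS = {
    "-32": {"tCYCLE": 31, "tPWH": 12, "tPWL": 13, "tDS": 13, "tOUT": 16},
    "-20": {"tCYCLE": 50, "tPWH": 20, "tPWL": 20, "tDS": 14, "tOUT": 22},
}


def cascade_slack_ns(grade, board_delay_ns=0.0):
    """Slack left in one minimum clock period for a chip-to-chip connection."""
    s = AC_SPECS_NS[grade]
    return s["tCYCLE"] - (s["tOUT"] + s["tDS"] + board_delay_ns)


def clock_widths_fit(grade):
    """Check that the minimum high and low pulse widths fit in one period."""
    s = AC_SPECS_NS[grade]
    return s["tPWH"] + s["tPWL"] <= s["tCYCLE"]


slack_32 = cascade_slack_ns("-32")   # 31 - (16 + 13) = 2 ns
slack_20 = cascade_slack_ns("-20")   # 50 - (22 + 14) = 14 ns
```

At the minimum clock period the -32 grade leaves only about 2ns for routing and skew, so external adders or delay registers in the data path would need their own delays added to the budget.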
Test Load Circuit
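[Test load circuit diagram (equivalent circuit): shows the DUT, switch S1, load capacitance CL (Note 9), and the IOH and IOL sources referenced to 1.5V.]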
NOTES:
9. Includes stray and jig capacitance.
10. Switch S1 Open for ICCSB and ICCOP Tests.
Timing Waveforms
[Waveform diagrams not reproduced. Each caption below lists the signals and timing parameters shown in the corresponding figure.]
FIGURE 8. FUNCTIONAL TIMING
  Signals: CLK; DIN0-7, CASI0-15; DOUT0-19, CASO0-7; CIN0-7 (to ALU register)
  Parameters: tCYCLE, tPWH, tPWL, tDS, tDH, tOUT, tCS, tCH
FIGURE 9. EALU TIMING
  Signals: CLK, EALU
  Parameters: tES, tEH
FIGURE 10. THREE-STATE CONTROL
  Signals: OE, DOUT0-19
  Parameters: tEN, tOZ (reference levels 1.7V, 1.5V, 1.3V)
FIGURE 11. CONFIGURATION TIMING
  Signals: LD, CS, A0-2, CIN0-9
  Parameters: tCSS, tCSH, tLPW, tAS, tAH, tCDS, tCDH
FIGURE 12. SYNCHRONOUS LOAD TIMING
  Signals: CLK, LD
  Parameters: tLCS
FIGURE 13. HOLD TIMING
  Signals: CLK, HOLD, internal clock
  Parameters: tHS, tHH
FIGURE 14. FRAME AND RESET TIMING
  Signals: CLK, RESET, FRAME
  Parameters: tRPW, tFPW, tFS
All Intersil semiconductor products are manufactured, assembled and tested under ISO9000 quality systems certification.
Intersil semiconductor products are sold by description only. Intersil Corporation reserves the right to make changes in circuit design and/or specifications at any time without notice. Accordingly, the reader is cautioned to verify that data sheets are current before placing orders. Information furnished by Intersil is believed to be accurate and
reliable. However, no responsibility is assumed by Intersil or its subsidiaries for its use; nor for any infringements of patents or other rights of third parties which may result
from its use. No license is granted by implication or otherwise under any patent or patent rights of Intersil or its subsidiaries.
For information regarding Intersil Corporation and its products, see web site http://www.intersil.com