# DCE2 - a small footprint sequential clustering algorithm for the DHP in Belle 2 PXD

#### A. Wassatsch

Max-Planck-Institut für Physik, Semiconductor Laboratory, Munich (Germany)

6th International Workshop on DEPFET Detectors and Applications, Bonn (Germany), 7.-9.2.2011



A. Wassatsch (MPI Physik/HLL) DCE2 - clustering algorithm for Belle 2 PXD

## Outline

## DCE2 sequential clustering algorithm

- the algorithm
- performance: compression
- performance: cluster distribution
- parameters
- dce2 test chip architecture
- joint submission of the dce2 test design and lvds TX
- integration



## The algorithm

- area reduction by sequential data processing, compared to the fully parallel operation of the first dce
- software-algorithm-inspired architecture
  - weakly coupled data handling agents
  - waiting queues for input scheduler, free and ready clustering agents
  - binary-tree-based selection algorithm (recursive VHDL)
- lossless data compression
- can handle clusters of any shape within the limits given by the design constraints (fifo depth)
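The binary-tree-based selection mentioned above can be illustrated in software. The following is a minimal Python sketch, not the VHDL from the design: it assumes the selection task is "pick the first agent whose flag is set" (for example the first free clustering agent), and the function name `tree_select` is hypothetical. Each recursion level mirrors one level of a selection tree: both halves are evaluated and the left result takes priority.

```python
from typing import Optional, Sequence


def tree_select(flags: Sequence[bool], offset: int = 0) -> Optional[int]:
    """Return the index of the first set flag, or None if no flag is set.

    The list is split in half recursively, similar in spirit to a
    recursively instantiated selection tree in hardware.
    """
    n = len(flags)
    if n == 0:
        return None
    if n == 1:
        # Leaf node: report this position only if its flag is set.
        return offset if flags[0] else None
    mid = n // 2
    # Evaluate both subtrees (in hardware these would work in parallel).
    left = tree_select(flags[:mid], offset)
    right = tree_select(flags[mid:], offset + mid)
    # Multiplexer at this tree level: the left subtree has priority.
    return left if left is not None else right


if __name__ == "__main__":
    free_agents = [False, False, True, False, True, False, False, False]
    print(tree_select(free_agents))
```

For 8 agents the recursion depth is 3, so the selection cost grows with the logarithm of the number of agents rather than linearly.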



- 64-channel core with 8 agents running at 500 MHz $\Rightarrow \sim 95$ GOPS


#### compression



- can be further improved by using data-dependent sorting and adapted relative address coding



- background: Kolja's test data from 27.10.2010
- overlaid to reach 5% filling



- high cut values increase the number of non-compressible single-pixel events
- at most 19 pixels in one row

## Parameters

- the dce2 algorithm is implemented as a parameterizable VHDL model
- the parameters should be chosen carefully to balance the performance and area requirements

| Parameter | Determines | Value | Remarks |
|---|---|---|---|
| ratio of row clock to core clock | number of pixels that can be handled directly per row | 10 | 5-14, depending on SynLib and power budget |
| number of clustering agents | how many different structures can be handled in a common strip of adjacent rows | 8 | 2<sup>x</sup>, with 20,000 µm<sup>2</sup> each |
| depth of the internal pixel fifo in the clustering agents | maximum number of pixels belonging to a real cluster | 16 | |
| count width of the pixel counter in the clustering agents | maximum number of pixels belonging to a single background event pattern | 31 | |
| depth of the input fifo structure | factor that increases the maximum number of pixels per row | 1 | 10,000 µm<sup>2</sup> each |

- could these be the final values?
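To make the trade-off between the parameters easier to explore, the values above can be collected in a small model. This is a hedged Python sketch, not part of the VHDL design: the class `Dce2Params`, its field names, and both helper methods are hypothetical. It assumes that the maximum number of pixels per row is the clock ratio multiplied by the input fifo depth, and that the per-unit figure quoted for the input fifo is an area like the one quoted for the agents.

```python
from dataclasses import dataclass

# Per-unit areas quoted on the slide (assumed to be areas in um^2 per instance).
AGENT_AREA_UM2 = 20_000
INPUT_FIFO_STAGE_AREA_UM2 = 10_000


@dataclass(frozen=True)
class Dce2Params:
    clock_ratio: int = 10        # core clock cycles per row clock
    n_agents: int = 8            # number of clustering agents, must be 2**x
    pixel_fifo_depth: int = 16   # max pixels of a real cluster
    pixel_count_max: int = 31    # max pixels of a background pattern
    input_fifo_depth: int = 1    # multiplier for pixels per row

    def __post_init__(self) -> None:
        # The slide restricts the number of agents to powers of two.
        if self.n_agents < 1 or self.n_agents & (self.n_agents - 1):
            raise ValueError("n_agents must be a power of two")
        if min(self.clock_ratio, self.pixel_fifo_depth,
               self.pixel_count_max, self.input_fifo_depth) < 1:
            raise ValueError("all parameters must be positive")

    def max_pixels_per_row(self) -> int:
        """Assumed relation: clock ratio scaled by the input fifo depth."""
        return self.clock_ratio * self.input_fifo_depth

    def scalable_area_um2(self) -> int:
        """Area of the parts that scale with the parameters (partial estimate)."""
        return (self.n_agents * AGENT_AREA_UM2
                + self.input_fifo_depth * INPUT_FIFO_STAGE_AREA_UM2)


if __name__ == "__main__":
    p = Dce2Params()
    print(p.max_pixels_per_row(), p.scalable_area_um2())
```

The area helper covers only the two contributions for which a per-unit figure is quoted, so it is a lower bound on the core area and not a replacement for synthesis results.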





## DCE2 test chip architecture

- pad-ring-dominated layout; only selected signals are available via direct I/O pads
- full test implemented via internal JTAG-accessible dual-port test pattern and spy memories





## Joint submission of the DCE2 test design and LVDS TX

- 64-channel dce2 core with JTAG-enabled test pattern generation
- lvds TX test from Bonn




- dce2lvds design submitted and accepted (IP DRCs only)
  - but the run on 28.02.2011 was cancelled again (as were the run for 06.12.2010 and the one in the middle of the year)
    - "a large number of designs is still missing" (end of Jan 2011)
  - each cancellation adds another 3 months
- alternatives
  - TSMC: a full run every month via Europractice, miniasic every 3 months; 40-100 samples
  - UMC: a full run every 4 months via Europractice, miniasic every 4 months; 20-45 samples
- digital IP more or less equivalent
- submission cost comparable
- open points: analog blocks (pll, lvdsTX, ..); C4 bumps (✓); ...
- mpw vendor statement: "Overall TSMC 90nm MPWs may be the best alternate choice"
- different mpw providers recommend considering 65 nm technologies
- also true for us?




## Integration

- inputs (single minimum inverter load)
  - row-wise, fully parallel data input qualified by a strobe signal
  - clk (x times the row clk)
  - row count
- output
  - static information for cluster data (size, position, total energy)
  - fifo stream for pixel data
- power
  - single power supply (1.2 V, approx. < 100 mA at 500 MHz)
- size and integration
  - 260,000 µm<sup>2</sup> synthesised to a dense high-speed ARM lib
  - synthesis to a XILINX xcv5fx100-3: 6500 slices (10%) at 100 MHz for an 8-agent implementation
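The interface described above (rows in, cluster summaries plus pixel streams out) can be mimicked by a small behavioural model. The following Python sketch is an illustration only, not the dce2 implementation: the names `Agent`, `summarize` and `cluster_rows` are hypothetical, adjacency is assumed to include diagonal neighbours, "position" is assumed to be the smallest row and column of the cluster, and the pixel fifo depth limit is not modelled. Rows are processed strictly one after another, each open cluster occupies one agent, and a cluster is emitted as soon as a row passes without extending it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence, Set, Tuple

Pixel = Tuple[int, int, int]  # (row, column, amplitude)


@dataclass(eq=False)  # identity comparison, so list.remove() is unambiguous
class Agent:
    pixels: List[Pixel] = field(default_factory=list)
    prev_cols: Set[int] = field(default_factory=set)  # columns hit in previous row
    cur_cols: Set[int] = field(default_factory=set)   # columns hit in current row


def summarize(pixels: List[Pixel]) -> Dict:
    """Static cluster information plus the pixel stream."""
    return {
        "size": len(pixels),
        "position": (min(p[0] for p in pixels), min(p[1] for p in pixels)),
        "energy": sum(p[2] for p in pixels),
        "pixels": sorted(pixels),
    }


def cluster_rows(rows: Sequence[Sequence[int]], n_agents: int = 8) -> List[Dict]:
    active: List[Agent] = []
    done: List[Dict] = []
    for row, values in enumerate(rows):
        # Group fired pixels (amplitude > 0) into runs of adjacent columns.
        runs: List[List[Pixel]] = []
        for col, amp in enumerate(values):
            if amp <= 0:
                continue
            if runs and runs[-1][-1][1] == col - 1:
                runs[-1].append((row, col, amp))
            else:
                runs.append([(row, col, amp)])
        for run in runs:
            cols = {c for _, c, _ in run}
            reach = {c + d for c in cols for d in (-1, 0, 1)}
            touching = [a for a in active if a.prev_cols & reach]
            if touching:
                target = touching[0]
                for other in touching[1:]:  # this run joins several clusters
                    target.pixels += other.pixels
                    target.prev_cols |= other.prev_cols
                    target.cur_cols |= other.cur_cols
                    active.remove(other)
            else:
                if len(active) >= n_agents:
                    raise RuntimeError("no free clustering agent")
                target = Agent()
                active.append(target)
            target.pixels += run
            target.cur_cols |= cols
        # Agents not extended in this row hold complete clusters: emit them.
        still_open: List[Agent] = []
        for agent in active:
            if agent.cur_cols:
                agent.prev_cols, agent.cur_cols = agent.cur_cols, set()
                still_open.append(agent)
            else:
                done.append(summarize(agent.pixels))
        active = still_open
    done.extend(summarize(a.pixels) for a in active)  # flush after the last row
    return done
```

Because two runs in the same row can only be connected through the previous row, merging agents when a run touches more than one of them is enough to follow U-shaped and other irregular clusters in this model.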





## Conclusion

## DCE2 algorithm

- with carefully chosen parameters, the dce2 algorithm provides lossless data compression without any pixel loss
- the identified complete cluster structures can also be used in the subsequent DAQ system to speed up the further data reduction there

## Outlook

- definition of the final design parameter set
- integrate the dce2 into the dataflow of the DHP
- solve the "foundry" problem




