# Reduction of Power Dissipation Through Parallel Optimization of Test Vector and Scan Register Sequences

Zdeněk Kotásek, Jaroslav Škarvada, Josef Strnadel Brno University of Technology, Faculty of Information Technology Božetěchova 2, 61266 Brno, Czech Republic {kotasek | skarvada | strnadel} @fit.vutbr.cz

Abstract-In the paper, novel method for reducing power dissipation during test application time is presented. When compared to existing methods, its advantage can be seen in the fact that power dissipation is evaluated by means of precise and fast simulation based metric rather than by means of commonly utilized simple metric based on evaluating Hamming distance between test vectors. In our method, the metric is evaluated over CMOS primitives from AMI technological libraries. In order to reduce power dissipation, the sequence of test vectors to be applied and proper ordering of registers within scan chains are optimized. In existing approaches, the optimizations are typically performed separately in a sequence because problems they correspond to are seen to be independent. On contrary to that, we have united the search spaces and solved these two problems as a single optimization task. Genetic algorithm operating over an appropriate encoding of the problem was utilized to optimize the problem. Proposed method was implemented in both single and multiprocessor environments and it was successfully tested to cooperate with commercial tools. At the end of the paper, results achieved over benchmarks from ISCAS85, ISCAS89 and ITC99 sets are presented and compared to results of existing methods.

Keywords – power dissipation, test ser reorganization, scan chain reordering, test application time, genetic algorithm, optimization, search space investigation

## I. INTRODUCTION

Portable computer systems and embedded systems are examples of electronic devices which are powered from batteries, therefore they are designed with the goal of low power dissipation. Low power dissipation becomes important not only during normal functional mode but during test application as well when switching activity is higher than in functional mode. The methods for power reduction during test application are used for the following reasons: to reduce 1) the effects of high current (electro-migration, induction effects, the decline of voltage on power wires etc.) and 2) the requirements on heat dissipation and power supplier during test application. The criterion to compare the result achieved by the methods can be defined in the following way: Let  $P_1$  be mean power dissipation value during test application before applying power optimizing procedure,  $P_2$  be the mean power dissipation during test application after applying an optimizing procedure,  $t_1$  be test application time before applying an optimizing procedure,  $t_2$  be test application time after applying an optimizing procedure. Then, the methods satisfying the condition  $P_1t_1 > P_2t_2$  are able to reduce power dissipation during test application. In general, two approaches to low power testing exist: some of them are directed to reducing *dynamic power* dissipation (*switching power*), while the others have a goal to reduce *static power* dissipation (*leakage power*). It is important to say that in older implementations, dynamic power dissipation was higher than the static one (in [25], it is reported that the dynamic power dissipation is about 90% of the total power dissipation).

The structure of the paper is as follows: First, the background related to our research is mentioned briefly. Then, our approach is explained together with the information about implementation of algorithms developed. As a result of our research, valuable results were gained, they are presented in the subsequent section. In the conclusions, the results are summarized.

## II. BACKGROUND

## A. Power Dissipation – Basic Concepts and Trends

It is evident that a very effective way how to reduce power dissipation is through the reduction of supply voltage. Therefore, it is a trend in modern VLSI technologies to have the power supply voltage lower than in previous technologies. To maintain the value of noise immunity, with the reduction of supply voltage threshold voltage must be reduced as well which causes the static power dissipation to raise exponentially. As a consequence, in 90 nm technology the dynamic power dissipation is 58% of total power dissipation. According to [23] 65 nm technology is seen as the technology in which the values of static power dissipation begins to prevail over the dynamic one). It is even more evident in technologies with higher level of integration (32 nm, 25 nm etc.) in which the static power dissipation is higher than the dynamic one. Thus, to choose proper and effective optimizing procedures to decrease power dissipation, the information about the target technology must be known. Another criterion to be used to categorize methods reducing power dissipation is based on categorizing test set developed to test the component under design. In this way, Test Set Dependent (TSD) and Test Set Independent (TSI) methods can be distinguished [12]. While TSI based methods (static [18] and dynamic [4] power reduction) are based on circuit structure modification only (i.e., the reduction of power dissipation is independent of the test set used), TSD based methods use both test and circuit structure

modifications. In general, TSD based methods are able to achieve better results comparing to TSI based methods. In [27] special ATPG is proposed that increases the correlation between test vectors, in [11] don't care bits are set to 0s or 1s in order to reduce switching activity during test application time, in [8] approach based on elimination of selected pseudorandom vectors produced by an ATPG is presented. In many works, solutions are suggested for scan based structures – e.g., in [17] it is shown that power dissipation can be reduced if test sequence is reorganized, in [16] more scan chains equipped with scan chain disable signal are utilized to reduce the dissipation etc. For tests applied through scan registers, power dissipation can be also influenced by the phase at which test vectors are applied to primary inputs of the component under test. Two basic strategies exist [12]: ASAP (As Soon As Possible) - test vectors are applied to primary inputs as soon as possible and ALAP (As Last As Possible) - test vectors are applied to primary inputs as late as possible. The strategy used can have a strong impact on the number of transitions during test application procedure. An improvement can be recognized in BPIC (Best Primary Input Change time) approach [12] able to calculate the most convenient times for applying test vectors to primary inputs: up to 95 % reduction can be achieved comparing to ASAP/ALAP strategies. Significant portion of TSD based methods is based on test scheduling methods [20] typically applied at SoC level.

## B. Power Dissipation Evaluation

In order to be able to optimize power dissipation, it is important to develop methods which allow to evaluate power dissipation during test application. It is evident that direct measuring of voltage and current delivered to device under test is certainly the most precise and reliable evaluation of power dissipation during test application but it is very difficult to be implemented. To avoid this, indirect methods can be used which are based on measuring temperature during test application [2]. Various statistical and simulation methods are frequently used to evaluate the results of optimizing procedures. These procedures usually use some simplifying *metrics*. To compare the quality of solutions aiming at reducing power dissipation, NTC (Number of Transition Count) appears to be applicable metric [15]. For better comparison of solutions, other metrics can be used, e.g. WNTO (Weighted Number of Transition Count) [12], WSA (Weighted Switching Activity) [6] or TPS (Test Per Scan) strategy [17]. Statistic methods for power dissipation specification indicate low computational complexity (high speed) but lower precision [14]. To attain the highest possible precision, it is necessary to work with the immediate value of power dissipation. Because this is computationally complex problem, an approximation of the dissipation is typically utilized in practice. In [14], the following simulation methods are distinguished: methods utilizing full synthesis, methods utilizing limited synthesis and black boxes method. In the first group of simulation methods, the simulation is performed on the level of chip physical layout. The simulation is the most precise one but the most time consuming one. In the second group of methods, the design is mapped to the predefined set of elements (technological library). From models in the library simulation data with required accuracy needed for simulation can be

gained. These models are developed by means of simulation on physical level and results possibly verified by measuring. The black boxes method is based on grouping selected components into blocks. On these blocks, the responses on predefined input data are gained. During simulation the responses to input data are gained through extrapolation/interpolation from responses gained in previous step. Modern commercial tools are able to generate high quality sets of test vectors with high degree of fault coverage which are not usually optimized to reduce power dissipation. Therefore various methods were developed to optimize the sequences of test vectors [1][3][5][7][10]. If full scan chain is implemented in the circuit, then for test generation process the circuit is seen as combinational and ATPG can be used to generate the test. In that case, fault coverage does not depend on the sequence of test vectors. Thus, also the sequence of scan registers within scan chains can be reorganized to reduce power dissipation. However, the problem of identifying the proper sequence of test vectors/scan registers belongs to the category of NP-hard problems [5].

# III. NEW APPROACH TO POWER DISSIPATION

## A. Motivation for the Research

We analyzed the state of the art of existing methodologies aiming at optimizing power dissipation during test application. To solve this issue, it is necessary to reorganize the sequence of test vectors and the sequence of scan registers. In previous approaches these two issues were solved separately. The drawbacks of previously published methodologies can be summarized in the following way: The results of various methodologies are not evaluated on platforms to which they will be later implemented. The previously published approaches have test vectors as the only input data to the methodology without any information about internal structure of the component under test through which test vectors will propagate. The propagation of test vectors through component internal structure represents additional switching activity which can have a significant impact on power dissipation. Most of methodologies are based on the evaluation of Hamming distance between input test vectors without any coupling with implementation platform. As a result, the impact of test vectors reorganization on power dissipation through switching activity reduction is rather difficult to be precisely evaluated. We also see that both procedures, i. e. test vectors reorganization and scan chain reorganization are performed in sequence as two separate procedures in previous methodologies. Our approach is based on concurrent optimization of both procedures. For this purpose genetic algorithm (GA) was used.

#### B. Basic Principles of the Methodology

For the purposes of the methodology, formal model was developed first. It is based on discrete mathematics concepts. The methodology operates on the formal model. Operators and auxiliary algorithms were defined to encode/decode the task into genotype to phenotype of GA population entities. These algorithms are described by formal language developed for this purpose. The algorithms for test application simulation were also defined together with the principles of power dissipation metrics evaluation. After this procedure, algorithms for the selection of solutions which satisfy the predefined requirements on the value of power dissipation process the data. If acceptable solution is not available then the algorithm continues with developing next generation of solutions, crossover and mutation operators are used for this purpose. The evaluation is concluded if the number of iterations is run out or acceptable solution is attained. It is important to say that it is not guaranteed that the solution of the problem gained by our approach is the optimal one. It is so because the complete state space of the problem is not investigated. It is guaranteed that the solution satisfies the predefined requirement on power dissipation. The computational complexity of the problem is so extensive that the use of GA or some other optimizing procedure is justified.



## C. Implementation Details

In Figure 1., the complete scheme of our methodology is shown. Rhomboids indicate data which are delivered into the optimization process while rectangles demonstrate the steps of the procedure with component under analysis specification in HDL as the input into it. HDL specification is mapped into AMI technological library. Full scan chain is then configured into the design (it can be possibly split into several scan chains), DFTAdvisor is used for this purpose and the sequence of test vectors generated by FlexTest from Mentor Graphics. All these steps belong to initialization phase of the methodology. The set of test vectors is converted into ASCII file, the VHDL or Verilog description of the component is converted into binary format. It is easier to work with these types of data than with VHDL/Verilog formats. During initializing phase, the starting population of entities is developed – consisting of the sequence of test vectors and the order of scan registers, they are then solved as a unified problem. A predefined condition (number of iterations) is determined before the optimizing procedure is started. It is also defined which solutions will be seen as satisfactory ones (in terms of power dissipation during test application). The optimizing procedure is terminated either when the predefined number of iterations is reached or the value of power dissipation achieved. In each step, population entities are assessed. The assessment lies in decoding genome into the vector of priorities which determines the sequence of applying test vectors and the organization of scan chain/chains. The vectors of priorities are utilized in the simulation of test application. The simulation is performed on technological library. As a result of the simulation, a value in a selected

metrics (for example NTC metric) is gained which reflects power dissipation during test application. This value is then converted to fitness value which reflects the quality of the solution. The best solutions are then identified, the crossover and mutation of these solutions is then performed. In this way a new generation of entities (solutions) is produced. In genetic algorithms elitism is used so that the best solutions are not lost during population development. As soon as the termination condition is reached, the best entity is identified, its genome is decoded, the sequence of applying test vectors and the organization of scan chain/chains is derived which is the output of the methodology. In Figure 1. , the application is identified as *permfind* application. The steps of the optimizing procedures can be recognized in dashed line area.

For the communication with the software tool user-friendly interface was developed. The user chooses one of possible dissipation metrics which will be quantified during simulation, sets parameters for genetic algorithm, and identifies both input files: the file describing the component and the file containing test set. If the test set is recognized to be incompatible (see section C below), the application converts it to a compatible version first. The reorganized sequence of test vectors together with the new (i.e. optimized) sequence of scan registers are the outputs of the application. User is also provided with the information how the values of power dissipation metrics were improved (their values before and after executing the optimization).

#### D. Compatibility with Professional DfT Tools

It is important to develop the methodology in such way that it is compatible with professional DfT tools. It means that the outputs of DfT tools can be used as inputs to our methodology and its software implementation [21]. The following rules were defined to satisfy this requirement: 1) The test must be developed in the way which allows to sub-divide it into test cycles. During test cycle either test vectors are applied or scan chain is emptied. 2) For full scan based components, the number of test cycles must be equal to the number of test vectors plus one. For non scan components the number of test cycles must be the same as the number of test vectors. 3)

For full scan based components, the number of clock pulses is equal to the number of FFs in the longest scan chain plus one. 4) In each test cycle (except of the last one in case of full scan), one test vector is applied. 5) For full scan components in each test cycle one test vector is serially loaded into the register. In the last test cycle any value can be loaded to scan chain, the scan chain is emptied from diagnostic data. 6) In full scan components, a response to preceding test cycle is serially read out in each test cycle. 7) To all scan chains which are shorter that the longest one, a fill must be loaded.

For the situations when test sequences do not satisfy these rules, the principles were developed to convert an incompatible test to a compatible one. We recognized two types of possible incompatibilities in sequences of test vectors developed by professional DfT tools: 1) During one test cycle more than one test vector is supposed to be applied via primary inputs. 2) During one test cycle more than one test vector is supposed to be applied through scan chain.

#### IV. EXPERIMENTAL RESULTS

In the paragraph, experimental results gained by our approach are described compared to results of other published methods. It should be noted that comparison in an objective way is fairly problematic because parameters of the methods differ a lot - e.g., circuits analyzed by the methods are mapped onto various platforms, various test pattern generators with different settings are used by the methods or the methods differ in a way they summarize achieved results. Moreover, some data are not available for some methods, so it is impossible to guarantee the equality of input conditions to experiments. In all experiments related to our optimization method, the circuits were mapped onto AMI 0.5um library by means of Leonardo Spectrum tool. While test vector set is the only input to the optimizing procedure for combinational circuits, organization of registers in scan chains must be also taken into account if sequential circuits are processed (the circuits were modified to their full scan versions by DFTAdvisor tool; for simplification, one scan chain was utilized). Test vectors under the stack-atfault model were generated by Flextest tool. PC equipped with two AMD Opteron 2220 dual core CPUs operating at 2.8 GHz was utilized to perform the experiments.

## A. Results and Their Discussion

In TABLE I., results achieved by the proposed method are compared to results produced by the method published in [10]. As a common comparison base, circuits from ISCAS85 and ISCAS89 benchmark suites were used. For each of the methods, averages of the best results attained over 20 genetic algorithm runs are presented in the table. The method published in [10] is based on determining Hamming distance among test vectors and optimizes test vector sequence before reordering of registers is started.

 
 TABLE I.
 DETAIL COMPARISON OF OUR RESULTS TO RESULTS PRESENTED IN [10]

| circuit | [10]<br>(results<br>for HV<br>TSP,<br>GA) | Proposed method<br>(results for Mentor, AMI 0.5 um) |        |                    |                         |                       |  |
|---------|-------------------------------------------|-----------------------------------------------------|--------|--------------------|-------------------------|-----------------------|--|
|         | r1 [%]                                    | # TC                                                | FC [%] | r <sub>2</sub> [%] | r <sub>2HD</sub><br>[%] | r <sub>2SHD</sub> [%] |  |
| c432    | 69.0                                      | 81                                                  | 98.64  | 64.8               | 59.9                    | 84.9                  |  |
| c880    | 80.0                                      | 95                                                  | 100.00 | 75.0               | 75.1                    | 90.9                  |  |
| c1355   | 84.4                                      | 69                                                  | 99.80  | 80.0               | 68.0                    | 89.6                  |  |
| c1908   | 65.7                                      | 8                                                   | 42.47  | 76.1               | 63.7                    | 81.5                  |  |
| c2670   | 90.1                                      | 59                                                  | 59.22  | 86.4               | 88.6                    | 94.0                  |  |
| c7552   | 91.5                                      | 332                                                 | 99.87  | 91.2               | 91.1                    | 98.9                  |  |
| s298    | 58.2                                      | 50                                                  | 96.87  | 81.4               | 79.7                    | 96.7                  |  |
| s444    | 68.9                                      | 65                                                  | 97.07  | 64.6               | 73.5                    | 92.7                  |  |
| s641    | 77.0                                      | 88                                                  | 99.13  | 65.9               | 79.9                    | 91.0                  |  |
| s1423   | 84.7                                      | 133                                                 | 99.52  | 74.8               | 82.3                    | 95.5                  |  |
| s1488   | 58.7                                      | 156                                                 | 88.50  | 70.6               | 72.5                    | 88.6                  |  |
| s5378   | 90.3                                      | 309                                                 | 99.16  | 85.3               | 89.0                    | 99.1                  |  |

Because information about tools utilized to achieve the results produced by the method as well as information about experimental setup are not available in [10], it was impossible to compare other results than reductions achieved.

In TABLE I., the following information is stored for each circuit:  $r_1$  – reduction attained by methodology described in [10], # TC – number of test cycles, FC – fault coverage of test vector set,  $r_2$  – reduction achieved by our method, NTC metric was used,  $r_{2HD}$  – reduction using the metric evaluating Hamming distance among test vectors,  $r_{2SHD}$  – reduction gained if the switching activity was evaluated by simulation, NTC metric was used (test vector application sequence and ordering of registers in scan chains corresponded to results presented in  $r_{2HD}$ ). Results produced by proposed method which are better than those presented in  $r_1$  are visualized in boldface (this holds also for other tables). In the table, it can be seen that for most of the circuits, reduction achieved by our method  $(r_2)$  is better than reduction  $(r_7)$  presented in [10]. Alike for  $r_1$  and  $r_{2HD}$ columns, it can be concluded that even when our method worked also with Hamming distance metric among test vectors, it produced better results than those achieved by method presented in [10]. Thus, even though the same metric was utilized, our method produced better results because the search spaces were not investigated in a sequence, but in parallel.

For further comparison, methods based on metrics others than Hamming distance evaluation were utilized because more information related to them is published. In TABLE II., results attained by our method are compared to those presented in [7]. For each of the methods, the best results achieved over 20 runs of a genetic algorithm are presented in the table. The following information can be identified in columns of the table: TV number of test vectors,  $NTC_1 -$  original NTC of a circuit determined by the method published in [7],  $NTC_{opt1} -$  NTC gained if  $r_1$  reduction was applied, FC - fault coverage gained for test set (these values are not available in [7]),  $NTC_{opt2} -$ NTC gained after reduction produced by our method,  $r_2$ reduction gained by our method. While NTC values provided in the table differ a lot because different technological libraries are utilized by these methods,  $r_1, r_2$  values are very similar.

 
 TABLE II.
 DETAIL COMPARISON OF OUR RESULTS TO RESULTS PRESENTED IN [7]

| cir.  |         | [7] (HITEST,<br>ES2, heuristic) |                     |                       |         | Proposed method<br>(AG/MG mode,<br>Mentor, AMI 0.5 um) |                     |                       |  |
|-------|---------|---------------------------------|---------------------|-----------------------|---------|--------------------------------------------------------|---------------------|-----------------------|--|
|       | #<br>TV | NTC <sub>1</sub>                | NTC <sub>opt1</sub> | r <sub>1</sub><br>[%] | #<br>TC | FC<br>[%]                                              | NTC <sub>opt2</sub> | r <sub>2</sub><br>[%] |  |
| c880  | 27      | 7140                            | 6403                | 89.7                  | 95      | 100.0                                                  | 7487                | 75.0                  |  |
| c1355 | 89      | 33181                           | 28511               | 85.9                  | 69      | 99.8                                                   | 5945                | 80.0                  |  |
| b12   | 33      | 7214                            | 4429                | 61.4                  | 299     | 98.83                                                  | 8214                | 58.0                  |  |
| c1908 | 69      | 48826                           | 38250               | 78.3                  | 107     | 99.46                                                  | 7872                | 75.1                  |  |
| c3540 | 82      | 82241                           | 67603               | 82.2                  | 233     | 98.86                                                  | 67204               | 81.3                  |  |
| c5315 | 56      | 99154                           | 90101               | 90.9                  | 173     | 99.61                                                  | 102950              | 90.1                  |  |
| c6288 | 41      | 462018                          | 424086              | 91.8                  | 69      | 99.96                                                  | 25681               | 91.0                  |  |
| c7552 | 72      | 232123                          | 159414              | 68.7                  | 332     | 99.87                                                  | 237008              | 91.2                  |  |

In the table, it can be seen that (except of c7552 circuit) the results gained by our method are better (i.e., having lower numerical values) for the circuits included in the table. Benefits typical for parallel optimization of both test vector sequence and ordering of registers in scan chains were applicable only to b12, because it was the only sequential circuit involved in the experiment.

In TABLE III. , results of our method are compared to results presented in [5]. The meaning of symbols in the table is the same as in previous table. There was no information available in [5] concerning fault coverage and test vector set, thus it was not possible to compare these parameters to results of our method. Except of c27 circuit, better reduction values were gained by our method than by [5]. All the circuits are sequential circuits, so benefits typical for parallel optimization of both test vector sequence and ordering of registers in scan chains could be applied to all of them.

In TABLE IV., results gained for a subset of ITC99 benchmarks are presented and compared to reduction  $(r_1)$  published in [1] and [3]. In the table,  $r_2$  attained by our method is available in the table together with other parameters having the same meaning as in previous tables. In all cases, it is evident our  $r_2$  values are better than  $r_1$  values presented in [1] and [3].

As WSA was the most precise metric actually implemented in the method presented in the paper, relation between WSA metric and quality of produced results was analyzed in the next experiment. The results are summarized in TABLE V. In the columns of the table, following information is available: *WSA*  $(WSA_{opt}) - WSA$  value before (after) optimization, r – reduction achieved, t – computation time.

 TABLE III.
 Detail Comparison of Our Results to Results

 PRESENTED IN [5]

| circuit | [5] (ATALANTA, greedy search) |                     |                       | AG mode, Mentor, AMI 0.5 um |           |                     |                       |  |
|---------|-------------------------------|---------------------|-----------------------|-----------------------------|-----------|---------------------|-----------------------|--|
| circun  | NTC <sub>1</sub>              | NTC <sub>opt1</sub> | r <sub>1</sub><br>[%] | # TC                        | FC<br>[%] | NTC <sub>opt2</sub> | r <sub>2</sub><br>[%] |  |
| s27     | 49                            | 22                  | 44.9                  | 11                          | 83.06     | 166                 | 66.8                  |  |
| s298    | 28644                         | 26034               | 90.9                  | 187                         | 83.67     | 33186               | 75.0                  |  |
| s344    | 20440                         | 18923               | 92.6                  | 46                          | 97.44     | 19610               | 83.7                  |  |
| s349    | 20790                         | 19088               | 91.8                  | 133                         | 83.59     | 14481               | 71.0                  |  |
| s382    | 58667                         | 54147               | 92.3                  | 773                         | 84.69     | 47437               | 73.3                  |  |
| s386    | 7996                          | 6715                | 84.0                  | 100                         | 98.09     | 38413               | 70.8                  |  |
| s444    | 66186                         | 63050               | 95.3                  | 65                          | 97.07     | 41851               | 64.6                  |  |
|         |                               |                     |                       |                             |           |                     |                       |  |

## TABLE IV. Results Achieved Over ITC99 Benchmarks

| cir | [1]<br><i>r</i> 1<br>[%] | [3]<br><i>r</i> 1<br>[%] | #<br>TC | FC<br>[%] | NTCopt <sub>2</sub> | <i>r</i> <sub>2</sub><br>[%] |
|-----|--------------------------|--------------------------|---------|-----------|---------------------|------------------------------|
| b01 | 92.78                    | -                        | 24      | 77.09     | 1713                | 68.3                         |
| b02 | 87.37                    | -                        | 25      | 70.99     | 757                 | 67.2                         |
| b03 | 98.68                    | 85.2                     | 52      | 96.83     | 59044               | 75.0                         |
| b04 | -                        | 95.9                     | 101     | 95.13     | 747610              | 59.6                         |
| b06 | 71.3                     | -                        | 32      | 88.81     | 4267                | 66.7                         |
| b07 | -                        | 91.4                     | 117     | 96.30     | 445179              | 74.7                         |
| b09 | 87.06                    | 98.7                     | 68      | 98.21     | 136552              | 75.3                         |
| b10 | 94.93                    | 77.3                     | 97      | 95.51     | 100269              | 74.6                         |
| b11 | -                        | -                        | 160     | 92.65     | 1039014             | 88.1                         |
| b13 | -                        | -                        | 100     | 97.52     | 444572              | 81.0                         |
| b14 | -                        | -                        | 620     | 96.61     | 98169725            | 87.6                         |
| b15 | -                        | -                        | 1297    | 96.44     | 249128549           | 93.0                         |

Because fan-outs and capacities are considered when WSA metric is applied, these data are also present in the table:  $Cin_{avg}$  – average capacity of circuit's inputs and  $FO_{avg}$  – average fan-out of circuit's outputs.

# B. Scalability of the Task

All results presented above were achieved in a 1-CPU environment, i.e., just one solution within the search space was evaluated at particular moment (no parallelism). In order to reduce time needed to find high quality results and to demonstrate the scalability of the task, we tried to determine parallelizable parts within the optimized task and to run the task in a multiprocessor environment. Corresponding results are presented in the next. Computational system composed of two 4-core Intel Xeon X5355 CPUs (i.e., 2x4 = 8 CPUs in total) running on 2.66 GHz was utilized. In TABLE VI. results gained for b03 circuit are presented. The meaning of symbols utilized in the table is as follows: # CPUs (n) – number of CPUs utilized to execute the method,  $t_{exe}$  - time needed to execute the method on a given number of CPUs, speedup –  $(t_{exe}$  for 1 CPU)/n ratio, overhead – communication overhead related to the execution on a given number of CPUs, evaluated by means of (1-speedup/#CPUs) × 100 formula.

Because execution times related to such steps as loading dynamic libraries, circuit verification, generation of look-up tables utilized during simulation etc. are included in the overhead it is evident that pure communication overhead will be smaller than presented in the table.

 TABLE V.
 Results Achieved when WSA Metric WAS APPLIED

| cir  | cin <sub>avg</sub><br>[fF] | FO <sub>avg</sub> | WSA     | WSA <sub>opt</sub> | r [%] | t [s]    |
|------|----------------------------|-------------------|---------|--------------------|-------|----------|
| s27  | 14.691                     | 1.148             | 214.5   | 145.6              | 67.9  | 2.321    |
| s298 | 16.648                     | 1.852             | 47413.4 | 37852.8            | 79.8  | 867.010  |
| s344 | 15.937                     | 1.803             | 28241.2 | 24480.2            | 86.7  | 714.826  |
| s349 | 15.996                     | 1.809             | 22414.8 | 15581.3            | 69.5  | 616.456  |
| s382 | 16.334                     | 1.903             | 67192.4 | 51791.3            | 77.1  | 2322.522 |
| s386 | 16.565                     | 1.864             | 55742.6 | 39240.1            | 70.4  | 2309.653 |
| s444 | 16.532                     | 1.923             | 71537.9 | 47215.0            | 66.0  | 2279.354 |
| b01  | 11.539                     | 1.868             | 2913.0  | 1900.7             | 65.3  | 29.251   |
| b02  | 17.717                     | 1.806             | 1182.0  | 832.5              | 70.4  | 18.054   |
| b03  | 17.493                     | 2.227             | 96885.3 | 78356.4            | 80.9  | 2409.439 |
|      |                            |                   |         |                    |       |          |

| TABLE VI. SCALABI | LITY OF THE TASK |
|-------------------|------------------|
|-------------------|------------------|

| #CPUs<br>(n) | t <sub>exe</sub> [s] | speedup | overhead<br>[%] |
|--------------|----------------------|---------|-----------------|
| 1            | 8258.839             | 1.000   | -               |
| 2            | 5679.108             | 1.454   | 27.29           |
| 3            | 4055.467             | 2.036   | 32.12           |
| 4            | 3194.716             | 2.585   | 35.37           |
| 5            | 2562.390             | 3.223   | 35.54           |
| 6            | 2180.236             | 3.788   | 36.87           |
| 7            | 1908.748             | 4.327   | 38.19           |
| 8            | 1716.419             | 4.812   | 39.85           |

#### V. CONCLUSIONS

The results of our activities in the area of power dissipation reduction can be summarized in the following way: 1. methodology to reduce power dissipation based on concurrent optimization of test vectors and scan registers sequences was developed and implemented, 2. library for AMI technology describing features of elements used in the design was developed. The library was used for test application simulation in components implemented into AMI platform, 3. simulation techniques for power dissipation metrics was developed, AMI library was used for this purpose. The results of simulation were used in genetic algorithm to evaluate the value of fitness function reflecting the quality of particular solution, 4. genetic algorithm to investigate the state space of the task and find solutions which satisfy the requirements on power dissipation was developed and implemented. 5. the principles of coding the problem (i. e. the problem of reorganizing test vectors and scan registers) into genotype was developed together with the principles of algorithms to transform genotype into phenotype and vice versa. 6. the methodology was verified, valuable results were gained and compared with other approaches.

It can be concluded that combining two procedures together, namely the reorganization of test vector and scan register sequences into one procedure brought better results than previous methodologies which solved these two problems separately. Detailed information about all aspects of the methodology is available in [21][22].

## **ACKNOWLEDGEMENTS**

This work was supported by the Grant Agency of the Czech Republic (GACR) No.102/09/1668 – "SoC circuits reliability and availability improvement", by GACR No.102/09/H042 – "Mathematical and Engineering Approaches to Developing Reliable and Secure Concurrent and Distributed Computer Systems", by Research Project No.MSM 0021630528 – "Security-Oriented Research in Information Technology" and the grant "BUT FIT-S-10-1".

#### REFERENCES

- Almukhaizim, S., Makris. Y., Yang Y.-S., Veneris, A.: Seamless Integration of SER in Rewiring-Based Design Space Exploration. In: Proceedings of International Test Conference, 2006, pp. 1–9
- [2] Altet, J., Rubio, A.: Thermal Testing of Integrated Circuits. Boston, USA: Kluwer Academic Publishers, 2002, ISBN 1-4020-7076-4
- [3] Babighian, P., Kamhi, G., Vardi, M. PowerQuest: Trace Driven Data Mining for Power Optimization. In: Proceedings of Design, Automation & Test in Europe Conference & Exhibition, 2007, pp. 1–6
- [4] Bonhomme, Y., Girard, P., Guiller, L.: A Gated Clock Scheme For Low Power Testing Of Logic Cores. Journal Of Electronic Testing: Theory And Applications, Vol. 22, 2006: pp. 89 – 99

- [5] Chakravarty. S.. Dabholkar. V.: Minimizing Power Dissipation in Scan Circuits During Test Application. In Proceedings of International Workshop on Low-Power Design. 1994. p. 20.
- [6] Debjyoti, G., Swarup, B., Kaushik, R.: Multiple Scan Chain Design Technique for Power Reduction during Test Application in BIST. In 18th IEEE International Symposium on Defect and Fault Tolerance in VLSI Systems (DFT'03), 2003, pp. 191- 198
- [7] Girard. P., Landrault. C., Pravossoudovitch. S.: Reducing power consumption during test application by test vector ordering. In Proceedings of the 1998 IEEE International Symposium on Circuits and Systems. IEEE Computer Society. 1998. pp. 296–299
- [8] Girard, P., Guiller, L., Landrault, C.:: A Test Vector Inhibiting Technique for Low Energy BIST Design. In Proceedings of the 1999 17TH IEEE VLSI Test Symposium, 1999, p. 407
- [9] Huang, T. C., Lee, K. J.: Reduction Of Power Consumption in Scan-Based Circuits During Test Application by an Input Control Technique. IEEE Transaction on Computer Design of Integrated Circuits and Systems, vol. 20, ?. 7, 2001: pp. 911–917
- [10] Jelodar. M. S., Aavani. A.: Reducing Scan Base Testing Power Using Genetic Algorithm. In Proceedings of 11th Iranian Computer Engineering Conference. Vol. 2, 2006. pp. 308–312
- [11] Kajihara, S., Ishida, K., Miyase, K.: Test Power Reduction for Full Scan Sequential Circuits by Test Vector Modification. In Proceedings of the Second Workshop on RTL ATPG & DFT, 2001, pp. 140 – 145
- [12] Nicolici, N., Al-Hashimi, B. M.: Power-Constrained Testing of VLSI Circuits, Kluwer Academic Publishers, 2003, 178 p.
- [13] Niraj, K. J., Gupta, S.: Testing Of Digital Systems. Cambridge University Press, 2003
- [14] Ravi, S., Raghunathan, A., Chakradhar, S.: Efficient RTL Power Estimation for Large Designs. In Proceedings of the 16th International Conference on VLSI Design, IEEE CS, 2003, pp. 431-439
- [15] Roy, K., Prasad, S. C.: Low-Power CMOS VLSI Circuit Design. USA: A Wiley-Interscience publication, 2000, ISBN 0-471-11488-X, 359 p.
- [16] Sankaralingam, R., Pouya, B., Touba, N. A.: Reducing Power Dissipation During Test Using Scan Chain Disable. In Proceedings of the IEEE VLSI Test Symposium, 2001, pp. 319 - 324
- [17] Sankaralingam, R., Oruganti, R. R., Touba, N. A.: Static Compaction Techniques to Control Scan Vector Power Dissipation. In Proceedings of IEEE VLSI Test Symposium, 2000, pp. 35-40. 29
- [18] Sathanur, A., Pullini, A., Benini, L..: Timing-Driven Row-Based Power Gating. In Proceedings Of Acm/Ieee International Symposium On Low Power Electronics And Design, 2007, pp. 104-109
- [19] Schmitz, M. T., Al-Hashimi, B. M., Eles, P.: System-Level Design Techniques For Energy-Effcient Embedded Systems. Boston, Usa: Kluwer Academic Publishers, 2004, 211 p.
- [20] Schuele, T., Stroele, A.: Test Scheduling For Minimal Energy Consumption under Power Constraints. In Proceedings of the 19th IEEE VLSI Test Symposium, Washington, USA: IEEE Computer Society, 2001, pp. 312 – 318
- [21] Škarvada, J.: BEAST Homepage. [on-line]. c2008, last rev. Nov. 27, 2009. Document available at http://www.fit.vutbr.cz/~skarvada/beast/
- [22] Škarvada, J.: Digital systems test application optimization for low power dissipation. PhD Thesis, Brno, VUT, 2009, 125 p (in Czech)
- [23] Thompson, S., Packan, P., Bohr, M.: MOS Scaling: Transistor Challanges for the 21st Century. Intel Technol. Journal, Vol. 19, 1998
- [24] Usami, K., Horovitz, M.: Clustered Voltage Scaling Techniques For Low-Power Design. In Proceedings Of The International Symposium On Low Power Electronics And Design, pp. 3 - 8, 1995
- [25] Veendrick, H. J. M.: Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits. IEEE Journal of Solid State Circuits, vol. 19, 1984: pp. 468 – 473
- [26] Vranken, H., Waayers, T., Fleury, H.: Enhanced Reduced-Pin-Count Test For Full-Scan Design. In Proceedings Of The Ieee International Test Conference, 2001, pp. 738 – 747
- [27] Wang, S., Gupta, S. K.: Atpg For Heat Dissipation Minimization During Test Application. IEEE Transactions on Computers, vol. 47, No. 2, 1998: pp. 256 - 262