1. ILP Processor Design

    S.J.Lee and K.J.Kim, "Design of a Parallel Pipelined Processor Architecture", Journal of the Institute of Electronics Engineers of Korea, Vol.32-B No.3, p.11-23, Mar. 1995.

    This paper proposes a parallel pipelined processor model that acts as a small VLIW processor architecture, together with a scheduling algorithm for extracting instruction-level parallelism on this architecture. The proposed model has a dual-instruction mode in which at most 4 basic operations execute in parallel. By combining these basic operations, a variable instruction set can be designed for various applications. The scheduling algorithm schedules basic operations for parallel execution and removes pipeline hazards by examining data dependency and resource conflict relations. To examine the operation of the architecture and evaluate its performance, a C compiler and a simulator are developed. By simulating various test programs with the compiler and the simulator, the characteristics and performance of the proposed architecture are measured.
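
    As a rough illustration of scheduling by data dependency and resource conflict, the sketch below greedily packs operations into bundles of at most 4. It is not the algorithm from the paper: the `Op` record, the functional-unit classes and their counts, the unit latency, and the conservative treatment of anti-dependences are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Op:
    name: str
    dest: str      # register written
    srcs: tuple    # registers read
    unit: str      # hypothetical functional-unit class: "alu", "mem" or "br"

def depends(later, earlier):
    """True if `later` must issue in a later cycle than `earlier`."""
    raw = earlier.dest in later.srcs   # true (read-after-write) dependence
    waw = earlier.dest == later.dest   # output dependence
    war = later.dest in earlier.srcs   # anti dependence, handled conservatively
    return raw or waw or war

def schedule(ops, width=4, units=None):
    """Greedy in-order list scheduling with unit latency (an assumption)."""
    units = units or {"alu": 2, "mem": 1, "br": 1}  # assumed unit counts
    cycle_of = {}   # op index -> issue cycle
    bundles = []    # bundles[c] = indices of ops issued in cycle c
    for i, op in enumerate(ops):
        # Data dependency: issue after every earlier op this one depends on.
        cycle = 0
        for j in range(i):
            if depends(op, ops[j]):
                cycle = max(cycle, cycle_of[j] + 1)
        # Resource conflict: slide forward until a slot and a unit are free.
        while True:
            while len(bundles) <= cycle:
                bundles.append([])
            bundle = bundles[cycle]
            same_unit = sum(1 for k in bundle if ops[k].unit == op.unit)
            if len(bundle) < width and same_unit < units[op.unit]:
                bundle.append(i)
                cycle_of[i] = cycle
                break
            cycle += 1
    return [[ops[k].name for k in bundle] for bundle in bundles]
```

    With two loads competing for the single assumed memory unit, the second load is pushed to the next cycle, while an independent ALU operation can share the first cycle with the first load.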

    B.Y.Yoo and S.J.Lee, "A Predicate-Sensitive Scheduling Algorithm in Instruction Level Parallelism Processors", Transactions of the Korea Information Processing Society, Vol.5 No.1, p.202-214, Jan. 1998.

    Exploitation of instruction-level parallelism (ILP) is an effective mechanism for improving the performance of modern superscalar and VLIW processors. Various software techniques can be applied to increase ILP. Among these techniques, predicated execution increases the degree of ILP by removing branch instructions so that instructions from different basic blocks can be merged into a single basic block.
    In this paper, a global predicate-sensitive scheduling algorithm is proposed to improve the performance of ILP processors that support predicated execution. To examine the performance of the proposed algorithm, a C compiler and a simulator are developed. By simulating various benchmark programs with the compiler and the simulator, the performance of the algorithm is measured and its effectiveness is verified. Measurements with 1-, 2-, and 4-issue execution confirm an average performance improvement of 20% or more.
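
    The effect of predicated execution can be sketched as follows: an if/else diamond is merged into one branch-free block whose operations are guarded by a predicate. The tuple encoding (`"setp"`, `"op"`) and the helper names `if_convert` and `run` are hypothetical and chosen only for this example; the scheduling algorithm of the paper is not reproduced here.

```python
def if_convert(cond, then_ops, else_ops):
    """Merge an if/else diamond into a single predicated basic block.

    `then_ops` and `else_ops` are lists of (dest, fn) pairs, where fn
    computes the new value of dest from the register dictionary.
    """
    block = [("setp", "p", cond)]  # p = cond; no branch is emitted
    block += [("op", dest, fn, "p", True) for dest, fn in then_ops]
    block += [("op", dest, fn, "p", False) for dest, fn in else_ops]
    return block

def run(block, regs):
    """Execute a predicated block; ops whose guard is false are nullified."""
    regs = dict(regs)
    preds = {}
    for instr in block:
        if instr[0] == "setp":
            _, pred, cond = instr
            preds[pred] = bool(cond(regs))
        else:
            _, dest, fn, pred, sense = instr
            if preds[pred] == sense:
                regs[dest] = fn(regs)
    return regs

# if x < 0: y = -x  else: y = x
abs_block = if_convert(
    lambda r: r["x"] < 0,
    [("y", lambda r: -r["x"])],
    [("y", lambda r: r["x"])],
)
```

    Because both guarded operations now sit in the same basic block, a scheduler is free to place them side by side, which is the extra parallelism a predicate-sensitive scheduler can exploit.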

    S.J.Lee, "Reducing Branch Penalties by a Branch Scheme with Selective Squashing in ILP Processors", Journal of the Korea Information Science Society, Vol.25 No.7, p.766-776, July 1998.

    Pipeline hazards, which break sustained pipeline flow, are the major impediment to improving performance in ILP processors, which exploit instruction-level parallelism (ILP) by issuing multiple instructions. In particular, control hazards arising from branch instructions are a major hurdle to enhancing performance. In this paper, a branch scheme with selective squashing is proposed to reduce branch penalties. This scheme schedules unsafe instructions from the predicted branch target path into the branch-delay slots that the compiler could not fill with safe instructions. In the case of a branch misprediction, these unsafe instructions are squashed selectively. To squash the unsafe instructions selectively, a minimal amount of hardware is added. It consists of a squashing decoder and branch squashing bit queues, which take squashing bits and a prediction bit from a branch instruction. The performance is evaluated by simulating various test programs, and the results are compared with those of other branch schemes. The experimental results show that the proposed scheme reduces branch penalties effectively.
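
    A minimal sketch of the squashing decision, assuming one squashing bit per delay slot (1 marks an unsafe instruction taken from the predicted path) and a single prediction bit per branch. The function name and this encoding are assumptions for illustration; the actual decoder and queue organization are described in the paper.

```python
def execute_delay_slots(slots, squash_bits, predicted_taken, actually_taken):
    """Return the delay-slot instructions that are allowed to complete.

    slots        -- instructions scheduled into the branch-delay slots
    squash_bits  -- one bit per slot; 1 means the instruction is unsafe
                    (assumed encoding, not taken from the paper)
    """
    mispredicted = predicted_taken != actually_taken
    completed = []
    for instr, unsafe in zip(slots, squash_bits):
        if mispredicted and unsafe:
            continue  # squashed: the unsafe instruction has no effect
        completed.append(instr)
    return completed
```

    Safe instructions always complete, so only the slots filled speculatively from the predicted path are lost on a misprediction.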

    S.J.Lee, "A Performance Measurement and Evaluation System for ILP Processors", Transactions of the Korea Information Processing Society, Vol.5 No.8, p.2164-2178, Aug. 1998.

    In this paper, a performance measurement and evaluation system is developed for ILP (Instruction Level Parallelism) processors, which issue multiple instructions and execute them in parallel. The system consists of a C compiler and a simulator. The compiler takes C source programs as input and generates 3-address style intermediate code. The simulator then accepts the intermediate code and simulates it. The simulation results are the contents of memory before and after simulation, the number of executed clocks, the trace and dynamic count of executed instructions, and the prediction hit ratio and profiling information for each branch instruction. To verify and understand the behavior of the system, the performance of predicated execution and of one of the branch schemes is measured and the results are analyzed.
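
    The kind of output such a simulator produces can be sketched with a toy interpreter for 3-address code. The opcode set (`add`, `sub`, `bgt`), the tuple format and the function name `simulate` are hypothetical and are not the intermediate code of the system described above; the sketch only reports final memory, a dynamic instruction count, and a per-branch taken/not-taken profile.

```python
from collections import Counter

def simulate(code, mem):
    """Run toy 3-address code of the form (op, a, b, c).

    Returns (final memory, dynamic instruction count, branch profile),
    where the profile maps (branch index, taken?) to a count.
    """
    mem = dict(mem)          # keep the caller's "before" memory intact
    pc = 0
    executed = 0
    profile = Counter()
    while pc < len(code):
        op, a, b, c = code[pc]
        executed += 1
        if op == "add":
            mem[a] = mem[b] + mem[c]
        elif op == "sub":
            mem[a] = mem[b] - mem[c]
        elif op == "bgt":    # branch to index c if mem[a] > mem[b]
            taken = mem[a] > mem[b]
            profile[(pc, taken)] += 1
            if taken:
                pc = c
                continue
        else:
            raise ValueError(f"unknown opcode {op!r}")
        pc += 1
    return mem, executed, profile

# s = 3 + 2 + 1, computed by a count-down loop
loop = [
    ("add", "s", "s", "i"),
    ("sub", "i", "i", "one"),
    ("bgt", "i", "zero", 0),
]
```

    Running `simulate(loop, {"s": 0, "i": 3, "one": 1, "zero": 0})` executes the loop body three times, and the profile records how often the backward branch was taken, which is the raw data a prediction hit ratio would be computed from.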

2. Optimizing Compiler

    J.K.Choi and S.J.Lee, "An Extended Graph Coloring Register Allocation Scheme to Enhance the Efficiency of Code Scheduling", Proceedings of The 1998 International Technical Conference on Circuit/Systems, Computers and Communications (ITC-CSCC'98), Vol.II, p.1243-1246, July 1998.

    In this paper, an Extended graph coloring Register Allocation (ERA) scheme is proposed. When registers are assigned during the compiler optimization phases, the scheme uses all of the available registers and, where possible, distributes values evenly across them using an ERA heuristic. Because all available registers are used, the dependence relations among instructions are reduced. Thus, the ERA scheme offers more opportunity for performance enhancement than Chaitin's scheme, the traditional graph coloring scheme. Various experiments show that the proposed scheme enhances the efficiency of code scheduling.
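
    The difference between reusing the lowest-numbered register and spreading values over all registers can be sketched on a small interference graph. The "least-used free register" rule below is a stand-in chosen for this example and is not claimed to be the ERA heuristic; the function name `color` and the graph encoding are likewise hypothetical.

```python
def color(interference, num_regs, spread=False):
    """Greedy graph coloring of an interference graph.

    interference -- dict mapping each value to the set of values live
                    at the same time (its neighbours in the graph)
    spread=False -- pick the lowest-numbered free register
    spread=True  -- pick the least-used free register (assumed stand-in
                    for an even-distribution heuristic)
    """
    assignment = {}
    use_count = [0] * num_regs
    # Color high-degree nodes first; ties keep dictionary order.
    for node in sorted(interference, key=lambda n: -len(interference[n])):
        taken = {assignment[n] for n in interference[node] if n in assignment}
        free = [r for r in range(num_regs) if r not in taken]
        if not free:
            raise RuntimeError(f"{node} would have to be spilled")
        reg = min(free, key=lambda r: use_count[r]) if spread else free[0]
        assignment[node] = reg
        use_count[reg] += 1
    return assignment

graph = {"a": {"b"}, "b": {"a"}, "c": set(), "d": set()}
packed = color(graph, num_regs=4)               # reuses registers 0 and 1
spread = color(graph, num_regs=4, spread=True)  # uses all four registers
```

    In the packed result, `c` and `d` share a register with `a`, which introduces output and anti dependences that a scheduler must respect; the spread result avoids them at no cost because the extra registers were idle anyway.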

    J.K.Choi and S.J.Lee, "A Register Allocation Algorithm to Improve Code Scheduling Efficiency", Submitted to Journal of the Korea Information Science Society, 1998.

    As ILP processors have developed, the importance of optimizing compilers has increased. Among various optimization methods, register allocation and code scheduling are essential to improving the performance of ILP processors. However, these schemes must be applied with care because the results of one can conflict with those of the other. In this paper, we present an extended register allocation algorithm that allocates registers so as to improve the effect of code scheduling. The conventional register allocation algorithm can leave redundant registers after register allocation. The proposed algorithm allocates registers without leaving redundant registers and reduces data dependence relations to create more code scheduling opportunities. Experimental results show that the proposed register allocation algorithm reduces execution clock cycles by 7% and complexity by 73% compared with the conventional scheme.

    J.K.Choi and S.J.Lee, "An Aggressive Register Allocation Algorithm for EPIC Architectures", Submitted to Transactions of the Korea Information Processing Society, 1998.

    Recently, as many parallel processing technologies have been developed, the performance of ILP (Instruction Level Parallelism) processors has grown very rapidly. In particular, EPIC (Explicitly Parallel Instruction Computing) architectures attempt to enhance performance through predicated execution and speculative execution with hardware support. In this paper, a new register allocation algorithm is proposed that improves code scheduling opportunities by exploiting the characteristics of EPIC architectures. Experiments show that, when predicated execution is applied, the proposed register allocation algorithm is more efficient than the conventional scheme. The experimental results show a performance enhancement of about 19% over the conventional scheme, which confirms that the proposed scheme is an effective register allocation method.