MindMap Gallery Chapter 5 Central Processing Unit
This is a mind map about Chapter 5 Central Processing Unit, which summarizes the functions and basic structure of the CPU, the instruction execution process, the functions and basic structure of the data path, and other knowledge points.
Edited at 2024-01-16 15:53:05
Chapter 5 Central Processing Unit
CPU
CPU functions and basic structure
CPU functions
composition
Arithmetic unit
The arithmetic unit processes data.
Controller
The controller coordinates and controls all components of the computer so that the program's instruction sequence is executed. This includes fetching, analyzing, and executing instructions.
Specific functions
Instruction control
Operation control
Timing control
data processing
Interrupt handling
Basic structure of CPU
Arithmetic unit
Arithmetic logic unit (ALU)
Temporary register
Accumulator register (ACC)
General purpose register set
program status word register
shifter
counter
controller
program counter
instruction register
instruction decoder
memory address register
memory data register
timing system
micro operation signal generator
Instruction execution process
instruction cycle
All four working cycles involve a CPU memory access; only the purpose of the access differs.
The fetch cycle fetches the instruction.
The indirect address cycle fetches the effective address of the operand.
The execution cycle fetches the operand.
The interrupt cycle saves the program breakpoint.
instruction cycle data flow
fetch cycle
PC → MAR → address bus → main memory
CU issues read command → control bus → main memory
Main memory → data bus → MDR → IR (stores the instruction)
CU issues control signal → PC content incremented by 1
The task of the fetch cycle is to fetch the instruction code from the main memory based on the contents of the PC and store it in the IR.
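The fetch-cycle data flow can be traced with a small Python sketch. The `Machine` class, its field names, and the word-addressed memory are illustrative assumptions, not part of the original notes; the PC is incremented by 1 as in the flow described here.

```python
from dataclasses import dataclass, field

@dataclass
class Machine:
    """Toy word-addressed machine; every name here is hypothetical."""
    memory: list = field(default_factory=lambda: [0] * 16)
    pc: int = 0
    mar: int = 0
    mdr: int = 0
    ir: int = 0

def fetch_cycle(m: Machine) -> None:
    m.mar = m.pc              # PC -> MAR -> address bus
    m.mdr = m.memory[m.mar]   # CU issues read; main memory -> data bus -> MDR
    m.ir = m.mdr              # MDR -> IR (the instruction is now in IR)
    m.pc += 1                 # CU control signal: (PC) + 1 -> PC

m = Machine()
m.memory[3] = 0xABCD          # pretend this word is an instruction
m.pc = 3
fetch_cycle(m)
print(hex(m.ir), m.pc)        # 0xabcd 4
```

After the call, IR holds the word that PC pointed to and PC already points to the next instruction.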
indirect address cycle
Ad(IR) (or MDR) → MAR → address bus → main memory
CU issues read command → control bus → main memory
Main memory → data bus → MDR (stores the effective address)
The task of the indirect address cycle is to obtain the effective address of the operand.
execution cycle
The task of the execution cycle is to take the operand and generate the execution result through the ALU operation according to the opcode of the instruction word in the IR.
interrupt cycle
CU decrements SP by 1; SP → MAR → address bus → main memory
CU issues write command → control bus → main memory
PC → MDR → data bus → main memory (the program breakpoint is stored in main memory)
CU (entry address of the interrupt service routine) → PC
The CU sends the special memory address used to save the program breakpoint (such as the contents of the stack pointer) to the MAR and onto the address bus. The CU then issues a write command to memory and sends the contents of the PC (the program breakpoint) to the MDR, and the breakpoint is finally stored in memory via the data bus. The CU also sends the entry address of the interrupt service routine to the PC, in preparation for the fetch cycle of the next instruction cycle.
The task of the interrupt cycle is to handle interrupt requests.
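A minimal sketch of the interrupt-cycle data flow, assuming a word-addressed memory modelled as a Python list and a stack that grows toward lower addresses. The function name and parameters are hypothetical.

```python
def interrupt_cycle(memory, pc, sp, handler_entry):
    """Save the breakpoint on the stack and return (new_pc, new_sp)."""
    sp -= 1                    # CU decrements SP by 1
    mar = sp                   # SP -> MAR -> address bus
    mdr = pc                   # PC -> MDR
    memory[mar] = mdr          # CU issues write; MDR -> data bus -> main memory
    return handler_entry, sp   # entry address of the service routine -> PC

memory = [0] * 8
pc, sp = interrupt_cycle(memory, pc=5, sp=8, handler_entry=2)
print(pc, sp, memory[7])       # 2 7 5
```

The breakpoint (old PC value 5) ends up in memory at the new stack top, and the PC is ready for the fetch cycle of the service routine.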
Three-wire (data bus, address bus, control bus) PC
Instruction execution schemes
Single instruction cycle
Multiple instruction cycles
Pipelined scheme
Data path functions and basic structure
Data path functions
The function of the data path is to realize data exchange between the arithmetic units and registers within the CPU.
Basic structure of data path
basic structure
bus structure data path
CPU internal single bus mode
Only one component may drive the bus at any given time; otherwise signals conflict. The solution: every component output that leads to the bus (or to any other shared point where signals could conflict) must be connected through a tri-state gate, unless the component has its own tri-state output.
Since the output of a combinational logic component only depends on its input, the input and output terminals of the combinational logic component cannot be connected on the same bus, otherwise signal conflicts will occur. The first solution is to add a register (or latch) so that only one of all input terminals and output terminals is directly connected to the same bus. The second solution is to increase the number of buses so that each input and output terminal is connected to a different bus.
A, B and F cannot be connected to the same bus, so 2 additional registers (or latches) are required, or a three-bus structure is adopted, or a dual-bus structure is adopted and a register is added.
Adding registers and adding tri-state gates solve different problems: tri-state gates resolve conflicts on the bus, while registers resolve conflicts at combinational logic components.
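The one-driver-at-a-time rule for a single internal bus can be illustrated with a short Python model. The `Bus` class and its methods are hypothetical; each attached component is given an output-enable flag that plays the role of its tri-state gate.

```python
class Bus:
    """Single internal bus: at most one enabled tri-state driver at a time."""

    def __init__(self):
        self.drivers = {}                  # name -> [output_enable, value]

    def attach(self, name, value):
        self.drivers[name] = [False, value]

    def enable(self, name, on=True):
        self.drivers[name][0] = on         # open or close the tri-state gate

    def read(self):
        active = [(n, v) for n, (en, v) in self.drivers.items() if en]
        if len(active) > 1:
            names = ", ".join(n for n, _ in active)
            raise RuntimeError("bus conflict: " + names)
        return active[0][1] if active else None   # None models high impedance

bus = Bus()
bus.attach("PC", 0x100)
bus.attach("MDR", 0x200)
bus.enable("PC")                           # PCout valid
print(hex(bus.read()))                     # 0x100
```

Enabling a second driver while `PC` is still enabled makes `read()` raise, which mirrors the signal conflict the tri-state gates are meant to prevent.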
CPU internal multi-bus mode
Dedicated data path approach
Because the same input terminal is connected to the outputs of several components, each input must be connected to those outputs through a multiplexer so that it receives data from only one output at a time. The exception is an input that is connected to only one component output.
Since there are no output conflicts, GPRs can set up two read ports to improve data transfer performance.
Data transfer between registers
(PC)→MAR PCout and MARin are valid, PC content→MAR
Data transfer between main memory and CPU
(PC)→MAR: PCout and MARin are valid; current instruction address → MAR
1→R: CU issues read command
MEM(MAR)→MDR: MDRin is valid
(MDR)→IR: MDRout and IRin are valid; current instruction → IR
Perform arithmetic or logical operations
(MDR)→MAR: MDRout and MARin are valid; operand effective address → MAR
1→R: CU issues read command
MEM(MAR)→MDR: operand from memory → MDR
(MDR)→Y: MDRout and Yin are valid; operand → Y
(ACC)+(Y)→Z: ACCout and ALUin are valid; CU issues an add command to the ALU; result → Z
(Z)→ACC: Zout and ACCin are valid; result → ACC
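The micro-operation sequence for an addition can be replayed step by step. This is a sketch only: the register dictionary, the word-addressed memory list, and the signal name `Read` (standing in for 1→R) are assumptions made for illustration.

```python
def execute_add(memory, regs):
    """Run the add sequence; return the control signals asserted in each step."""
    steps = []
    regs["MAR"] = regs["MDR"]              # (MDR) -> MAR
    steps.append(("MDRout", "MARin"))
    regs["MDR"] = memory[regs["MAR"]]      # 1 -> R, MEM(MAR) -> MDR
    steps.append(("Read", "MDRin"))
    regs["Y"] = regs["MDR"]                # (MDR) -> Y
    steps.append(("MDRout", "Yin"))
    regs["Z"] = regs["ACC"] + regs["Y"]    # (ACC) + (Y) -> Z
    steps.append(("ACCout", "ALUin", "Add"))
    regs["ACC"] = regs["Z"]                # (Z) -> ACC
    steps.append(("Zout", "ACCin"))
    return steps

memory = [0, 0, 0, 7]                      # operand 7 stored at address 3
regs = {"MDR": 3, "ACC": 5}                # MDR holds the effective address
steps = execute_add(memory, regs)
print(regs["ACC"])                         # 12
```

Each tuple in `steps` lists the signals that must be valid in the same beat, which is why only one register drives the bus per step.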
The functional component needed in the instruction analysis phase is the instruction decoder, but it is not part of the data path.
Data path of single bus structure
Because of the single-bus structure, the ALU in the figure uses two temporary registers, X and Z. X holds the ALU's operand A, while the other operand B comes from the internal bus. Z temporarily stores the operation result.
The PSW register is a program status register, used to store the operation status flags of the ALU, and the temporary status flags will be sent to the operation controller.
PC, AR, DR, IR, X, Z registers and register file Regs are directly connected to the internal bus. In addition, the AR and DR registers are also connected to the memory MEM through the external bus.
In a bus structure, the number of data transfers that can occur simultaneously depends on the number of buses. For a single bus structure, there can be multiple modules on the bus receiving data at the same time, but only one module can send data to the bus at a certain time, otherwise data conflicts will occur.
Therefore, components connected to the bus require output control to prevent data conflicts on the bus. For this reason, every functional component in the figure that outputs to the internal bus uses a tri-state gate for output control (shown as hollow triangular arrows in the figure).
Control signals and their functions
Typical MIPS32 instructions
Execution process of lw instruction
The lw instruction reads a 32-bit memory word from main memory. The assembly format is lw rt, imm(rs).
The memory access address is the register corresponding to the rs field plus the 16-bit signed immediate imm. This is a typical indexed addressing. lw takes out 4 bytes from the corresponding main memory unit and sends them to the rt register for storage.
Since the register bit width is 32 bits, the 16-bit immediate imm needs to be sign-extended to 32 bits before it can be sent to the ALU to calculate the memory access address.
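A minimal sketch of the address calculation for lw: sign-extend the 16-bit immediate, then add it to the value of rs. The function names are hypothetical, and values are kept as unsigned 32-bit integers by masking.

```python
def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to 32 bits (result as an unsigned 32-bit value)."""
    imm16 &= 0xFFFF
    return (imm16 | 0xFFFF0000) if imm16 & 0x8000 else imm16

def lw_address(rs_value: int, imm16: int) -> int:
    """Memory access address = R[rs] + SignExt(imm), wrapped to 32 bits."""
    return (rs_value + sign_extend16(imm16)) & 0xFFFFFFFF

print(hex(lw_address(0x1000, 0x0008)))   # 0x1008
print(hex(lw_address(0x1000, 0xFFFC)))   # 0xffc  (immediate is -4)
```

The second call shows why zero extension would be wrong: 0xFFFC must be treated as -4, not as 65532.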
lw instruction operation process and control signals
The operation of fetch cycle T3 beat M[AR] → DR can also be placed in T2 beat, and the two functions are equivalent. Each M[AR] → DR can be accompanied by an on-chip bus transmission transaction to improve the instruction execution speed.
Execution flow of the sw instruction
The sw instruction writes a 32-bit word to main memory. The assembly format is sw rt, imm(rs).
sw instruction operation process and control signals
Execution flow of beq instruction
The beq instruction is a conditional branch instruction, and the assembly code is beq rs,rt,imm. The function of the beq instruction is to compare the values of registers rs and rt, and perform a branch jump if they are equal.
The value of imm gives the branch target address relative to the next instruction (PC + 4), measured in instructions. To compute the branch target address, imm is sign-extended to 32 bits and shifted left by two bits, and the result is added to the value of PC (already updated to PC + 4 during the fetch stage). The left shift by two bits converts the instruction offset into a byte offset.
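The branch target calculation can be written out as a short sketch. The function name and the example addresses are made up for illustration; the arithmetic is target = (PC + 4) + (SignExt(imm) << 2).

```python
def beq_next_pc(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Return the address of the instruction executed after a beq at `pc`."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    imm = imm16 & 0xFFFF
    offset = imm - 0x10000 if imm & 0x8000 else imm      # signed instruction count
    target = (pc_plus_4 + (offset << 2)) & 0xFFFFFFFF    # << 2 gives a byte offset
    return target if rs_val == rt_val else pc_plus_4

print(hex(beq_next_pc(0x400000, 1, 1, 3)))        # 0x400010 (taken, forward)
print(hex(beq_next_pc(0x400000, 1, 1, 0xFFFF)))   # 0x400000 (taken, imm = -1)
print(hex(beq_next_pc(0x400000, 1, 2, 3)))        # 0x400004 (not taken)
```

An immediate of -1 branches back to the beq itself, because the offset is counted from PC + 4.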
beq instruction operation process and control signals
data path components
Operating components (combinational logic circuits)
data processing unit
The data processing unit is composed of combinational logic circuits, whose output is only related to the current input and is responsible for processing data, such as ALU, sign extension unit, decoder, etc.
State components (sequential logic circuits)
state storage unit
State storage unit (state unit) refers to a unit with storage function, such as memory and register.
Single-cycle processor typical data path
A single-cycle MIPS processor is a MIPS processor in which all instructions are completed in one clock cycle.
Although different instructions may take different amounts of time to execute, the clock cycle of a single-cycle processor is determined by the slowest instruction (the bucket principle: the shortest stave sets the capacity).
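A small worked example of how the slowest instruction sets the clock period. The delay figures and the list of units each instruction uses are invented for illustration only; they do not come from the original notes.

```python
# Hypothetical delays, in picoseconds, of the units an instruction passes through.
UNIT_DELAY_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}

# Units assumed to be used by each instruction.
UNITS_USED = {
    "lw":  ["IF", "ID", "EX", "MEM", "WB"],
    "sw":  ["IF", "ID", "EX", "MEM"],
    "add": ["IF", "ID", "EX", "WB"],
    "beq": ["IF", "ID", "EX"],
}

def instruction_delay(name: str) -> int:
    return sum(UNIT_DELAY_PS[u] for u in UNITS_USED[name])

clock_period = max(instruction_delay(n) for n in UNITS_USED)
for n in UNITS_USED:
    print(n, instruction_delay(n))
print("clock period:", clock_period)   # 800, set by lw
```

Under these assumed numbers, beq needs only 500 ps but still occupies a full 800 ps cycle.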
Because the fetch and execution of an instruction must complete within one clock cycle, no resource in the data path can be reused during the execution of an instruction. Dedicated data paths are therefore used, and any resource needed more than once (such as an adder) must be duplicated.
Both instruction fetching and operand fetching operations require memory access, so instructions and data are stored in instruction memory and data memory respectively to avoid resource conflicts.
Since the single-cycle MIPS processor must complete the instruction within one clock cycle, the instruction register IR is not set, but the instruction word fetched from the instruction memory is directly parsed. Otherwise, just fetching the instruction into the IR will require one clock cycle.
R type arithmetic instruction data path
The arithmetic and logical operation instructions in MIPS are R-type instructions. The following takes the addition instruction as an example.
add rd, rs, rt    # RTL function description: R[rs] + R[rt] → R[rd]
The functional components involved are mainly the register file and the ALU. The source register fields rs and rt of the instruction word read from instruction memory are sent to the read register number ports R1# and R2# of the register file, and the destination register field rd is sent to the write register number port W#. The values of the two source registers are output to the ALU through ports R1 and R2. The funct field of the instruction word determines AluOp, which selects the ALU operation (addition here). The result is sent to the write data port WD of the register file and is written into the destination register rd on the rising clock edge.
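The R-type path can be sketched as field extraction followed by a register-file read, an ALU operation, and a write-back. The field positions follow the standard MIPS32 R-type layout; the function names and the Python list used as a register file are assumptions, and only the add function code (0x20) is handled.

```python
def decode_r_type(word: int) -> dict:
    """Split a 32-bit R-type instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs":     (word >> 21) & 0x1F,   # -> R1#
        "rt":     (word >> 16) & 0x1F,   # -> R2#
        "rd":     (word >> 11) & 0x1F,   # -> W#
        "shamt":  (word >> 6) & 0x1F,
        "funct":  word & 0x3F,           # determines AluOp
    }

def execute_r_type(word: int, regs: list) -> None:
    f = decode_r_type(word)
    a, b = regs[f["rs"]], regs[f["rt"]]  # read ports R1 and R2
    if f["funct"] != 0x20:
        raise NotImplementedError("only add is modelled in this sketch")
    regs[f["rd"]] = (a + b) & 0xFFFFFFFF  # WD written on the clock edge

regs = [0] * 32
regs[9], regs[10] = 5, 7                 # $t1 = 5, $t2 = 7
execute_r_type(0x012A4020, regs)         # add $t0, $t1, $t2
print(regs[8])                           # 12
```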
I-type memory access instruction data path
MIPS memory access instructions are I-type instructions and cover both loading and storing data. The word access instructions are taken as examples.
lw rt, imm16(rs)    # RTL function description: M[R[rs] + SignExt(imm16)] → R[rt]
sw rt, imm16(rs)    # RTL function description: R[rt] → M[R[rs] + SignExt(imm16)]
For lw: the rs field of the instruction word is again sent to port R1# of the register file, and the destination register field rt is sent to the write register number port W#. The 16-bit immediate imm16 is sign-extended to 32 bits, sent to the ALU, and added to the value of the index register rs to form the memory access address. The data read from data memory is then sent to the write data port WD of the register file.
For sw: the rs and rt fields of the instruction word are sent to ports R1# and R2# of the register file. The 16-bit immediate is sign-extended to 32 bits, sent to the ALU, and added to the value of the index register rs to form the main memory address. The value of the rt register read from the register file is sent to the write data port WD of the data memory.
Controller functions and working principle
Controller structure and function
Main functions
Fetch an instruction from main memory and indicate the location of the next instruction in main memory.
Decode or test the instruction and generate the corresponding operation control signals to initiate the specified actions.
Direct and control the flow of data between the CPU, main memory, and input/output devices.
hardwired controller
Hardwired control unit diagram
CU input signal source
Instruction information produced by the instruction decoder.
Machine cycle signals and beat signals produced by the timing system.
Feedback information from the execution units, i.e. flags.
Control signals from the system bus (control bus), such as interrupt requests and DMA requests.
Timing system and micro-operations for hardwired controllers
timing system
clock cycle
The width of each beat corresponds to exactly one clock cycle.
machine cycle
The machine cycle can be considered as a baseline time during the execution of all instructions.
The shortest time to read an instruction word from the memory is regarded as the machine cycle.
Several micro-operations can be completed in one machine cycle. Each micro-operation requires a certain amount of time. The clock signal can be used to control the generation of each micro-operation command.
instruction cycle
The time it takes for the CPU to fetch and execute an instruction from main memory is called the instruction cycle.
Instruction cycles are often represented by several machine cycles, and one machine cycle contains several clock cycles (also called beats or T cycles, which are the most basic unit of CPU operation).
The number of machine cycles in each instruction cycle can vary, and the number of beats in each machine cycle can also vary.
Atomic operations refer to operations that cannot be refined. Usually, the atomic operations inside the CPU are called micro-operations (μOP), and the component control signals that implement μOP are called micro-operation control signals, also called micro-operation commands (μOPCmd). The time to complete a μOP is called a beat, and multiple μOPs can be timing controlled through different beat signals to form a μOP sequence.
Micro-operation command analysis
Micro-operation commands in the fetch cycle
Micro-operation commands for indirect address cycles
Micro-operation commands in the execution cycle
Non-memory-access instructions
Memory access instructions
Branch instructions
CPU control method
Synchronous control method
Can be divided into the following types
Fixed length instruction cycle
The number of machine cycles is fixed, but the number of beats is not fixed
Combined central and local control
The timing relationship of synchronous control is relatively simple and the controller design is convenient, but there is a problem of low CPU efficiency when using slow components.
Asynchronous control method
The timing of each functional component and operation is implemented using a response mechanism. After the control component sends an operation control signal to the functional component, it must wait until the functional component sends a response signal before starting the next operation.
The advantage is that each component can work according to its actual required time, and there is no process of the fast one waiting for the slow one, thus improving the speed of the system, but the structure of the asynchronous control method is more complicated.
Combined control method
Most operation sequences are controlled synchronously using machine cycles and beats; the few operations whose timing is hard to determine in advance are controlled asynchronously.
Hardwired Control Unit Design Steps
microprogrammed controller
Basic concepts of microprogram control
The design idea is to implement each machine instruction as a microprogram. Each microprogram contains several microinstructions, and each microinstruction corresponds to one or several micro-operation commands.
basic terminology
Microcommands and microoperations
A machine instruction can be decomposed into a sequence of micro-operations. These micro-operations are the most basic and irreducible operations in the computer.
The various control commands issued by the control component to the execution component are called microcommands, which are the smallest units that constitute a control sequence.
Microcommands and microoperations have a one-to-one correspondence. Microcommands are the control signals of microoperations, and microoperations are the execution processes of microcommands.
Microinstructions and microcycles
microinstructions
Operation control fields
sequence control field
Microcycle refers to the time required to execute a microinstruction, usually one clock cycle.
Microinstructions control the execution of a corresponding set of microoperations to realize part of the functions of an instruction.
Main memory and control memory
Main memory is used to store programs and data, and is implemented outside the CPU using RAM.
Control memory (CM) is used to store microprograms, which is implemented inside the CPU using ROM.
Programs and microprograms
A program is an ordered collection of instructions.
The function of an instruction is implemented by a microprogram.
A microprogram is equivalent to a μOPCmd sequence, a microinstruction is equivalent to all μOPCmds in one step of the μOPCmd sequence, and a microcommand is equivalent to a μOPCmd.
Each microprogram consists of several microinstructions
Each microinstruction corresponds to a set of μOPCmd
A microcommand is a control signal sent to a component and corresponds one-to-one with a μOPCmd.
Microprogrammed controller composition and working process
Basic components of microprogrammed controller
control memory
Store the microprogram corresponding to each instruction.
microinstruction register
It is used to store microinstructions fetched from CM. Its number of bits is equal to the word length of the microinstruction.
microaddress register
Receive the micro-address sent from the micro-address forming component to prepare for reading the micro-instructions in the CM.
The working process of microprogrammed controller
Note that the next-address field of the last microinstruction of a microprogram usually points to the entry address of the instruction-fetch microprogram, so that the fetch phase is entered again once the instruction has been executed.
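The working loop of a microprogrammed controller can be sketched as follows. The control memory contents, micro-addresses, signal names, and the single `ADD` microprogram are all invented for illustration. A next address of `None` stands for "form the micro-address from the machine opcode".

```python
FETCH = 0   # assumed entry address of the fetch microprogram

# Control memory: micro-address -> (microcommands, next micro-address)
CM = {
    0:  (["PCout", "MARin"], 1),
    1:  (["MemRead", "MDRin", "PC+1"], 2),
    2:  (["MDRout", "IRin"], None),          # dispatch on the opcode in IR
    8:  (["Ad(IR)out", "MARin"], 9),         # start of the ADD microprogram
    9:  (["MemRead", "MDRin"], 10),
    10: (["MDRout", "Add", "ACCin"], FETCH), # last microinstruction -> fetch entry
}
DISPATCH = {"ADD": 8}                        # opcode -> microprogram entry address

def run_one_instruction(opcode: str) -> list:
    """Return the (micro-address, microcommands) pairs issued for one instruction."""
    trace, upc = [], FETCH
    while True:
        commands, nxt = CM[upc]              # read CM at the micro-address into μIR
        trace.append((upc, commands))
        if nxt is None:
            nxt = DISPATCH[opcode]           # address formed from the opcode
        if nxt == FETCH:
            return trace                     # back at the fetch microprogram
        upc = nxt

trace = run_one_instruction("ADD")
print([addr for addr, _ in trace])           # [0, 1, 2, 8, 9, 10]
```

The last microinstruction at address 10 names the fetch entry as its next address, which is what lets the controller start the following instruction without any extra logic.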
Microprograms and machine instructions
The number of microprograms equals the number of machine instructions plus the number of common microprograms for the fetch, indirect address, and interrupt cycles.
How microinstructions are encoded
Direct encoding (direct control) method
Each bit represents one microcommand.
Field direct encoding method
Group mutually exclusive microcommands in the same field, and group compatible microcommands in different fields. Therefore, among the microcommands defined by each subfield, there is at most one valid microcommand at the same time.
Each field is independently coded, each code represents a microcommand and the meaning of each field code is defined separately.
Microcommands can be issued only after passing through a decoding circuit, so this method is slower than direct encoding.
In the worst case, field direct encoding degenerates into direct encoding.
Field indirect encoding method
Certain microcommands in one field need to be interpreted by certain microcommands in another field.
Among the three encoding methods, direct encoding has the longest operation control field and the simplest μOP control signal formation; field indirect encoding has the shortest operation control field and the most complicated μOP control signal formation; field direct encoding is a compromise between the two.
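The difference in operation control field length can be checked with a little arithmetic. The sketch assumes the common convention that each field reserves one code meaning "no microcommand issued", so a field holding k mutually exclusive microcommands needs ceil(log2(k + 1)) bits. The group sizes below are an invented example.

```python
import math

def direct_bits(groups: list) -> int:
    """Direct encoding: one bit per microcommand."""
    return sum(groups)

def field_direct_bits(groups: list) -> int:
    """Field direct encoding: one field per mutually exclusive group,
    with one extra code per field reserved for 'no operation' (assumed)."""
    return sum(math.ceil(math.log2(k + 1)) for k in groups)

groups = [7, 3, 12, 5, 6]            # 33 microcommands in 5 exclusive groups
print(direct_bits(groups))           # 33
print(field_direct_bits(groups))     # 15  (3 + 2 + 4 + 3 + 3)
print(field_direct_bits([1] * 33))   # 33: every field holds one microcommand
```

The last line shows the worst case: when no two microcommands are mutually exclusive, every field is one bit wide and the scheme is as long as direct encoding.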
The address formation method of microinstructions
Directly specified by the next-address field of the microinstruction.
Formed from the opcode of the machine instruction.
Microinstruction format
horizontal microinstructions
A horizontal microinstruction defines and performs several basic parallel operations.
vertical microinstructions
A vertical microinstruction can only define and execute one basic operation.
Mixed microinstructions
Design steps of microprogram control unit
List all μOPCmd sequences
Design microinstruction format
Compile microprogram
Design related circuits
Dynamic microprogramming and nanoprogramming
If the microprogram can be changed according to the user's requirements, the machine has dynamic microprogramming capabilities.
If the microprogram does not control the hardware directly but is instead interpreted by nanoprograms stored in a second-level control memory, that second-level control memory is called nanomemory, and the hardware is controlled directly by nanoinstructions.
Comparison of hardwired and microprogrammed controllers
Exception and interrupt mechanism
Basic concepts of exceptions and interrupts
Unexpected events generated internally by the CPU are called exceptions.
An interrupt request issued to the CPU by a device external to the CPU is called an interrupt.
Classification of exceptions and interrupts
Classification of exceptions
Fault
Trap
Traps are usually detected at the end of instruction execution, and once a trap is detected, exception handling occurs immediately.
System call instructions and conditional trap instructions (such as teq, teqi, tne, and tnei in MIPS) are all trap instructions.
In single-step debugging mode, every ordinary instruction can act as a trap instruction and raise a trap exception. A trap exception is triggered by executing a trap instruction. Similar to a function call, there is no program breakpoint: executing such an instruction unconditionally or conditionally calls an operating system kernel routine, and when that routine finishes, execution returns to the instruction following the trap instruction. (When the trap instruction is a branch instruction, execution returns to the branch target instruction rather than to the next instruction.)
Abort
An abort is a random hardware failure that prevents the CPU from continuing execution; it is unrelated to any specific instruction.
Internal exceptions are detected by the CPU's own internal logic, so no external signal is needed to notify the CPU.
Classification of interrupts
Maskable interrupt
When interrupts are turned off, maskable interrupts cannot get a response from the CPU.
non-maskable interrupt
Non-maskable interrupts are responded to even when interrupts are turned off.
Exception and interrupt response process
Turn off interrupts
Save breakpoints and program state
Identify exceptions and interrupts and go to appropriate handlers
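The three response steps can be sketched in a few lines. The dictionary-based CPU state, the field names, and the table that maps a cause to a handler address are assumptions for illustration; real processors may identify the event in hardware or in software.

```python
def respond(cpu: dict, cause: str, handlers: dict) -> None:
    """Exception/interrupt response: three steps performed before the handler runs."""
    cpu["int_enable"] = False                      # 1. turn off interrupts
    cpu["stack"].append((cpu["pc"], cpu["psw"]))   # 2. save breakpoint and program state
    cpu["pc"] = handlers[cause]                    # 3. identify the event, go to its handler

cpu = {"pc": 0x40, "psw": 0b0101, "int_enable": True, "stack": []}
respond(cpu, "overflow", {"overflow": 0x100, "timer": 0x200})
print(hex(cpu["pc"]), cpu["int_enable"], cpu["stack"])
```

Turning interrupts off first keeps a new request from disturbing the breakpoint save.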
instruction pipeline
Basic concepts of instruction pipeline
Improving processor parallelism from two aspects
temporal parallelism
Pipelining
spatial parallelism
superscalar processor
Definition of instruction pipeline
The execution process of instructions
fetch(IF)
Fetch instructions from instruction memory or cache
Decoding/reading register (ID)
The operation controller decodes the instruction and fetches the operands from the register file.
Execution/calculation address (EX)
Memory access (MEM)
Read and write memory
Write back (WB)
Write instruction execution results back to the register file
The principle of pipeline design is that the number of instruction pipeline segments is based on the number of functional segments used by the most complex instructions.
How to express the pipeline
Basic implementation of pipeline
The introduction of pipeline registers allows instructions in each segment to be parallelized in time.
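An ideal five-segment pipeline can be drawn as a space-time table, with one row per instruction and one column per clock cycle. This sketch assumes no hazards or stalls, so n instructions on a k-segment pipeline finish in k + n - 1 cycles. The function name is hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def schedule(n_instr: int, stages=STAGES) -> list:
    """Ideal pipeline: instruction i occupies segment s in cycle i + s (0-based)."""
    total_cycles = len(stages) + n_instr - 1
    rows = []
    for i in range(n_instr):
        row = ["--"] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name
        rows.append(row)
    return rows

for i, row in enumerate(schedule(3)):
    print(f"I{i + 1}:", " ".join(cell.ljust(3) for cell in row))
```

With three instructions the table is seven cycles wide, and from the third cycle on several segments are busy in the same cycle, each working on a different instruction.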
Pipeline data path
Add a long pipeline register component at the dotted line position in the figure.
Note that there are no pipeline registers behind the WB segment, but the data in this segment is eventually written back to the register file. The program counter PC can also be regarded as a pipeline register, providing data for IF segment instruction fetching.
The register file in the ID segment is a relatively special functional component. It is responsible for reading register operands in the ID segment. The read operation belongs to combinational logic. At the same time, the register file of the ID segment is also responsible for the write-back operation of the instruction execution results of the WB segment. The write operation requires the cooperation of the clock and is a sequential logic.
The source of the register file's write register number port W# is chosen by a multiplexer, controlled by RegDst, from the instruction word in the ID segment, whereas the write data WD comes from the WB segment. The write address and the write data therefore belong to different instructions, which corrupts data.
The fix is to reroute the write register number WriteReg# output by the ID segment multiplexer: instead of going to port W# of the register file, it is latched in the ID/EX pipeline register and passed segment by segment to the WB segment, where the MEM/WB pipeline register finally returns it to port W# of the register file. Note that the multiplexer of the ID segment has been repositioned slightly in the figure.
The data information transferred by different pipeline register latches is not the same.
The IF/ID pipeline register needs to latch the instruction word fetched from the instruction memory and the value of PC + 4.
The ID/EX pipeline register needs to latch the two operands RS and RT read from the register file (the values of the registers named by the rs and rt fields of the instruction word), the write register number WriteReg#, the sign-extended immediate, PC + 4, and any other operands that may be used later.
EX/MEM pipeline registers need to latch ALU operation results, data to be written in the data memory WriteData, write register number WriteReg# and other data.
The MEM/WB pipeline register needs to latch the ALU operation results, data read from the data memory, write register number WriteReg# and other data.
Pipeline control signals
Control signal classification
The execution process of the pipeline (taking lw, which loads a word from data memory into a register, as an example)
fetch(IF)
Although the lw instruction does not use PC + 4 in later segments, PC + 4 is still passed to the IF/ID pipeline register for use by other instructions (such as beq).
When the clock edge arrives, the instruction word is latched in the IF/ID pipeline register and PC is updated to PC + 4.
Decoding/reading register (ID)
In the ID segment, the operation controller generates the operation control signals required for subsequent segments based on the instruction words in the IF/ID pipeline register and transmits them backward.
In addition, the ID segment will also read the values RS and RT of the rs and rt registers in the register file according to the rs and rt fields in the instruction word.
The sign extension unit sign-extends the 16-bit immediate in the instruction word to 32 bits.
The multiplexer generates the instruction's possible write register number WriteReg# from the instruction word (some instructions do not write a register).
These 4 values are transferred to the ID/EX pipeline register together with the sequential instruction address PC + 4.
Execution/calculation address (EX)
For the lw instruction, the EX segment is mainly used to calculate the memory access address. The memory access address obtained by adding the RS value in the ID/EX pipeline register and the sign-extended immediate value is sent to the EX/MEM pipeline register.
The EX segment also needs to calculate the branch target address and generate the branch jump signal BranchTaken.
The value of RT in the ID/EX pipeline register will be used as write data in the MEM segment, so the value of RT will be sent to the EX/MEM pipeline register as write data WriteData.
The write register number WriteReg# in the ID/EX pipeline register will also be directly transferred to the EX/MEM pipeline register.
Memory access (MEM) (read or write)
Using the ALU result latched in the EX/MEM pipeline register (the memory access address), the write data, and the memory read/write control signal MemWrite, the data memory is read or written.
The ALU operation results, WriteReg#, and data read from the data memory in the EX/MEM pipeline register will be sent to the input end of the MEM/WB pipeline register.
Write back (WB)
The WB segment selects the ALU operation result or memory access data from the MEM/WB pipeline register and writes it back to the designated register WriteReg# of the register file.
No functional segment of the pipeline distinguishes between instruction types. All data and operation control signals are output from the pipeline register at the start of the segment, so any data or control signal that a later segment may need must be passed along.
Pipeline hazards and their handling
Structural hazards
A conflict caused by multiple instructions using the same hardware unit in the same clock cycle is called a structural hazard (structural conflict).
Calculating PC+4, calculating the branch target address, and performing arithmetic operations all require an arithmetic unit.
Both instruction fetch and data access require the use of memory.
There is also a potential structural conflict between the register read in the ID segment and the register write in the WB segment. However, the read and write logic of the MIPS register file are completely independent, with addresses and data entering through separate ports, so reads and writes can proceed concurrently and this conflict does not actually arise.
Solution
Use independent instruction memory and data memory.
Stall the program counter PC so that the IF segment pauses for one clock cycle. On the next clock edge the IF/ID pipeline register is synchronously cleared, so a no-op enters the ID segment (an all-zero MIPS instruction is equivalent to a no-op). Once the Load instruction has completed its memory access, the IF segment resumes.
Data hazards
The current instruction needs the result of a previous instruction, but that result has not yet been produced or has not yet been delivered to the required location, so the current instruction cannot continue. This is called a data hazard (data conflict).
The possible data conflicts between the two instructions are as follows:
Read after write conflict (RAW)
If a source operand of instruction I2 is the destination operand of instruction I1, the data conflict is called a read-after-write conflict.
When instructions are executed in a pipelined manner, instruction I2 uses the result of instruction I1. If I2 reads the old value of the register in the ID segment before I1 has written its result to the register, I2 reads incorrect data.
Write after read conflict (WAR)
If the destination operand of instruction I2 is a source operand of instruction I1, the data conflict is called a write-after-read conflict.
In an in-order pipeline such as this one, this dependence has no effect on instruction execution.
Write after write conflict (WAW)
If the destination operands of instructions I2 and I1 are the same, this data conflict is called a write-after-write conflict.
In this in-order pipeline, where registers are written only in the WB segment, a write-after-write conflict has no impact on instruction execution.
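The three definitions above can be checked mechanically. The sketch below is illustrative only: an instruction is represented as a hypothetical `(dest, sources)` pair of register names, and `classify_dependences` is an assumed helper name, with I1 preceding I2 in program order.

```python
def classify_dependences(i1: tuple, i2: tuple) -> set:
    """Return the data dependences from I1 to I2.

    Each instruction is (dest, sources): dest is a register name or None,
    sources is a tuple of register names.
    """
    dest1, srcs1 = i1
    dest2, srcs2 = i2
    found = set()
    if dest1 is not None and dest1 in srcs2:
        found.add("RAW")   # I2 reads what I1 writes
    if dest2 is not None and dest2 in srcs1:
        found.add("WAR")   # I2 writes what I1 reads
    if dest1 is not None and dest1 == dest2:
        found.add("WAW")   # both write the same register
    return found
```

For instance, `add $t0, $t1, $t2` followed by `sub $t3, $t0, $t4` is written as `classify_dependences(("t0", ("t1", "t2")), ("t3", ("t0", "t4")))` and reports only a RAW dependence.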
Solution
Hardware stall and software insertion of "NOP" instructions
Handling register RAW hazards
Data bypassing (forwarding)
If there is a data dependence, the register operands RS and RT in the EX segment are stale. The correct values are the destination operands of the instructions currently in the MEM and WB segments, which have already completed their operation in the EX segment.
Except for Load-class memory access instructions, these destination operands are already held in the EX/MEM and MEM/WB pipeline registers, so the correct operands can be forwarded directly from there to the appropriate inputs of the EX segment (forwarding, also called bypassing).
Forwarding resolves most data dependences without inserting bubbles, avoiding the performance loss that bubbles cause and greatly improving pipeline performance.
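The selection made by a forwarding unit can be sketched as follows. This is a simplified model under assumptions: the field names `RegWrite` and `IsLoad` and the function `forward_select` are hypothetical, register 0 is never forwarded because it is hardwired to zero in MIPS, and a matching Load in EX/MEM is reported as a stall because its data is not yet available.

```python
def forward_select(src_reg: int, ex_mem: dict, mem_wb: dict) -> str:
    """Choose where an EX-segment source operand should come from.

    Returns "EX/MEM", "MEM/WB", "REG" (no dependence) or "STALL".
    """
    def writes(stage: dict) -> bool:
        return stage["RegWrite"] and stage["WriteReg#"] != 0 and stage["WriteReg#"] == src_reg

    # The instruction in MEM is the most recent producer, so it has priority.
    if writes(ex_mem):
        # A Load has not read memory yet, so its result cannot be forwarded from EX/MEM.
        return "STALL" if ex_mem["IsLoad"] else "EX/MEM"
    if writes(mem_wb):
        return "MEM/WB"
    return "REG"
```

Checking EX/MEM before MEM/WB matters when both older instructions write the same register: the newer value must win.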
Compiler optimization that reorders instructions
Control hazards
When the pipeline encounters a branch instruction or another instruction that changes the PC value, the instructions already loaded into the pipeline behind the branch may have to be discarded before they execute because the branch is taken. This is called a control hazard (control conflict), also known as a branch conflict.
Solution
Perform branch prediction on branch instructions and generate the branch target address as early as possible.
Prefetch instructions along both the taken and not-taken paths.
Generate condition codes earlier and faster.
Improve the accuracy of branch direction prediction.
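One widely used way to improve direction prediction is a 2-bit saturating counter. The text above does not name a specific scheme, so the sketch below is only an illustration of that technique; the class `TwoBitPredictor`, its initial state and the helper `count_correct` are assumptions.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state: int = 1):
        self.state = state  # assumption: start in "weakly not taken"

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes: list) -> int:
    """Run one predictor over a sequence of branch outcomes and count correct predictions."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct
```

Because two consecutive mispredictions are needed to flip the prediction, a loop branch that is not taken only once per loop exit costs one misprediction per exit rather than two.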
Pipeline performance indicators
Pipeline throughput
Pipeline speedup
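Both indicators have simple closed forms in the ideal case. The sketch below assumes a k-segment pipeline in which every segment takes the same time dt and n instructions flow through without stalls; the function names are hypothetical. Under those assumptions the total time is (k + n - 1) * dt, throughput is n divided by that time, and speedup is the non-pipelined time n * k * dt divided by it.

```python
def pipeline_time(n: int, k: int, dt: float) -> float:
    """Time for n instructions on an ideal k-segment pipeline with segment time dt."""
    return (k + n - 1) * dt


def throughput(n: int, k: int, dt: float) -> float:
    """Instructions completed per unit time."""
    return n / pipeline_time(n, k, dt)


def speedup(n: int, k: int) -> float:
    """Non-pipelined time divided by pipelined time (dt cancels out)."""
    return (n * k) / (k + n - 1)
```

As n grows, throughput approaches 1/dt and speedup approaches k, which is why long instruction streams benefit most from pipelining.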
Advanced pipeline technology
Superscalar pipeline technology
CPI<1
Multiple independent instructions can be issued concurrently in each clock cycle, that is, two or more instructions are processed and executed in parallel.
The processor is required to be equipped with multiple functional components and instruction decoding circuits, as well as multiple register ports and buses, so that multiple operations can be executed simultaneously.
Superpipelining technology
The more functional segments the pipeline is divided into, the shorter the clock cycle and the higher the instruction throughput. Superpipelining therefore improves performance by raising the pipeline's clock frequency.
For example, if a functional unit is used three times within one original clock cycle, the pipeline runs at three times the original clock frequency.
CPI=1
Very long instruction word (VLIW) technology
By processing multiple instructions in parallel on multiple processing units, several instructions can be issued in one clock cycle.
Basic concepts of multiprocessors
Basic concepts of SISD, SIMD and MIMD
Single instruction stream, single data stream (SISD) architecture
Single instruction stream, multiple data stream (SIMD) architecture
Multiple instruction stream, single data stream (MISD) architecture
Multiple instruction stream, multiple data stream (MIMD) architecture
Basic concepts of hardware multithreading
Fine-grained multithreading
Coarse-grained multithreading
Simultaneous multithreading (SMT)
Basic concepts of multi-core processors
Basic concepts of shared memory multiprocessors
Even though these systems share the same physical address space, they can still run programs independently in their own virtual address spaces.
Two types
Uniform memory access (UMA) multiprocessors
Non-uniform memory access (NUMA) multiprocessors