Chapter 5 Central Processing Unit

This is a mind map about Chapter 5 Central Processing Unit, which summarizes the functions and basic structure of the CPU, the instruction execution process, the functions and basic structure of the data path, and other knowledge points.

Edited at 2024-01-16 15:53:05

PlotWizard

Chapter 5 Central Processing Unit

CPU

CPU functions and basic structure

CPU functions

composition

arithmetic unit

The function of the arithmetic unit is to process data.

controller

The function of the controller is to coordinate and control the components of the computer so that they execute the program's instruction sequence, including fetching instructions, analyzing (decoding) instructions, and executing instructions.

Specific functions

Instruction control

Operation control

Timing control

data processing

Interrupt handling

Basic structure of CPU

arithmetic unit

arithmetic logic unit (ALU)

temporary register

accumulator (ACC)

General purpose register set

program status word register

shifter

counter

controller

program counter

instruction register

instruction decoder

memory address register

memory data register

timing system

micro operation signal generator

Instruction execution process

instruction cycle

Every cycle includes a CPU memory access operation

In the fetch cycle, memory is accessed to fetch the instruction

In the indirect address cycle, memory is accessed to obtain the effective address of the operand

In the execution cycle, memory is accessed to obtain the operand

In the interrupt cycle, memory is accessed to save the program breakpoint

instruction cycle data flow

fetch cycle

PC → MAR → address bus → main memory
CU issues a read command → control bus → main memory
Main memory → data bus → MDR → IR (holds the instruction)
CU issues a control signal → PC content is incremented by 1

The task of the fetch cycle is to fetch the instruction code from the main memory based on the contents of the PC and store it in the IR.
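
The fetch-cycle data flow above can be sketched in a few lines of Python. This is only an illustrative model: the dictionary-based word-addressed memory, the instruction strings, and the function name `fetch_cycle` are assumptions made for the example, not part of the original mind map.

```python
def fetch_cycle(pc, memory):
    """Model one fetch cycle on a toy word-addressed memory.

    Returns the fetched instruction (new IR content) and the updated PC.
    """
    mar = pc            # PC -> MAR (address placed on the address bus)
    mdr = memory[mar]   # CU issues read; main memory -> data bus -> MDR
    ir = mdr            # MDR -> IR (the instruction is now held in IR)
    pc = pc + 1         # CU control signal: (PC) + 1 -> PC
    return ir, pc


# Hypothetical two-instruction program stored at addresses 0 and 1.
memory = {0: "LOAD 5", 1: "ADD 6"}
ir, pc = fetch_cycle(0, memory)
print(ir, pc)
```

Each assignment corresponds to one arrow in the data flow, so the order of the statements mirrors the order of the micro-operations.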

indirect address cycle

Ad(IR) (or MDR) → MAR → address bus → main memory
CU issues a read command → control bus → main memory
Main memory → data bus → MDR (holds the effective address)

The task of the indirect address cycle is to obtain the effective address of the operand.

execution cycle

The task of the execution cycle is to take the operand and generate the execution result through the ALU operation according to the opcode of the instruction word in the IR.

interrupt cycle

CU decrements SP by 1; SP → MAR → address bus → main memory
CU issues a write command → control bus → main memory
PC → MDR → data bus → main memory (the program breakpoint is stored in main memory)
CU (entry address of the interrupt service routine) → PC

The CU sends the special memory address used to save the program breakpoint (such as the contents of the stack pointer) to the MAR and onto the address bus. The CU then issues a write command to the memory and sends the contents of the PC (the program breakpoint) to the MDR; finally the breakpoint is stored in memory via the data bus. In addition, the CU sends the entry address of the interrupt service routine to the PC, in preparation for the fetch cycle of the next instruction cycle.

The task of the interrupt cycle is to handle interrupt requests.

Three-wire (data bus, address bus, control bus) PC

Instruction execution schemes

single instruction cycle

multiple instruction cycles

pipeline scheme

Data path functions and basic structure

Data path functions

The function of the data path is to realize data exchange between the arithmetic units and registers within the CPU.

Basic structure of data path

basic structure

bus structure data path

CPU internal single bus mode

Only one component can drive the bus at a time; otherwise signal conflicts occur. The solution: every component output that leads to the bus (or to any other shared line where signals could conflict) must be connected through a tri-state gate, except for components that have their own tri-state output function.

Since the output of a combinational logic component only depends on its input, the input and output terminals of the combinational logic component cannot be connected on the same bus, otherwise signal conflicts will occur. The first solution is to add a register (or latch) so that only one of all input terminals and output terminals is directly connected to the same bus. The second solution is to increase the number of buses so that each input and output terminal is connected to a different bus.

A, B and F cannot be connected to the same bus, so 2 additional registers (or latches) are required, or a three-bus structure is adopted, or a dual-bus structure is adopted and a register is added.

Adding registers and adding tri-state gates solve different problems. Tri-state gates resolve conflicts on the bus, while registers resolve conflicts at combinational logic components.
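
The single-driver rule for the internal bus can be illustrated with a small Python model. This is a sketch under stated assumptions: the class `SingleBus` and its method names are hypothetical, and opening or closing a tri-state gate is modeled simply as adding or removing an entry in a dictionary.

```python
class SingleBus:
    """Toy model of a CPU-internal single bus guarded by tri-state gates."""

    def __init__(self):
        # component name -> value currently driven onto the bus
        self.drivers = {}

    def enable_output(self, name, value):
        """Open the tri-state gate of `name` so it drives `value` onto the bus."""
        self.drivers[name] = value

    def disable_output(self, name):
        """Close the gate (high-impedance state); the component stops driving."""
        self.drivers.pop(name, None)

    def read(self):
        """Many receivers may read, but exactly one driver is allowed."""
        if len(self.drivers) != 1:
            raise RuntimeError(
                f"bus needs exactly one driver, has {len(self.drivers)}"
            )
        return next(iter(self.drivers.values()))


bus = SingleBus()
bus.enable_output("PC", 0x100)   # PCout is valid
mar = bus.read()                 # MARin is valid: (PC) -> MAR
bus.disable_output("PC")
```

Enabling two outputs at once and then calling `read()` raises an error in this model, which stands in for the signal conflict described above.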

CPU internal multi-bus mode

Dedicated data path approach

Because one input terminal may be fed by the output terminals of several components, and it must receive data from only one of them at a time, each such input terminal is connected to the different component outputs through a multiplexer. The exception is an input terminal that is connected to only one component output, which needs no multiplexer.

Since there are no output conflicts, GPRs can set up two read ports to improve data transfer performance.

Data transfer between registers

(PC) → MAR: PCout and MARin are valid; PC content → MAR

Data transfer between main memory and CPU

(PC) → MAR: PCout and MARin are valid; the current instruction address → MAR
1 → R: CU issues a read command
MEM(MAR) → MDR: MDRin is valid
(MDR) → IR: MDRout and IRin are valid; the current instruction → IR

Perform arithmetic or logical operations

(MDR) → MAR: MDRout and MARin are valid; the operand's effective address → MAR
1 → R: CU issues a read command
MEM(MAR) → MDR: the operand is read from memory → MDR
(MDR) → Y: MDRout and Yin are valid; the operand → Y
(ACC) + (Y) → Z: ACCout and ALUin are valid; CU sends an add command to the ALU; the result → Z
(Z) → ACC: Zout and ACCin are valid; the result → ACC
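
The add sequence can be traced step by step with a short Python sketch. The register names follow the text; the dictionary representation, the function name `add_from_memory`, and the sample values are assumptions chosen for illustration. The sketch assumes MDR already holds the operand's effective address when the sequence starts.

```python
def add_from_memory(regs, memory):
    """Run the micro-operation sequence for ACC <- ACC + MEM[effective address]."""
    regs["MAR"] = regs["MDR"]            # (MDR) -> MAR
    regs["MDR"] = memory[regs["MAR"]]    # 1 -> R, MEM(MAR) -> MDR
    regs["Y"] = regs["MDR"]              # (MDR) -> Y
    regs["Z"] = regs["ACC"] + regs["Y"]  # (ACC) + (Y) -> Z
    regs["ACC"] = regs["Z"]              # (Z) -> ACC
    return regs


# Hypothetical initial state: effective address 0x20 in MDR, 10 in ACC.
regs = {"ACC": 10, "MDR": 0x20, "MAR": 0, "Y": 0, "Z": 0}
memory = {0x20: 32}
add_from_memory(regs, memory)
print(regs["ACC"])
```

Each statement uses the bus once, which is why the sequence needs the temporary registers Y and Z on a single-bus data path.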

The functional component required in the instruction analysis (decode) phase is the instruction decoder, but it does not belong to the data path.

Data path of single bus structure

Because of the single bus structure, the arithmetic unit ALU in the figure uses two temporary registers, X and Z. X holds the ALU's operand A, the other operand B comes from the internal bus, and Z temporarily stores the operation result.

The PSW register is a program status register, used to store the operation status flags of the ALU, and the temporary status flags will be sent to the operation controller.

PC, AR, DR, IR, X, Z registers and register file Regs are directly connected to the internal bus. In addition, the AR and DR registers are also connected to the memory MEM through the external bus.

In a bus structure, the number of data transfers that can occur simultaneously depends on the number of buses. For a single bus structure, there can be multiple modules on the bus receiving data at the same time, but only one module can send data to the bus at a certain time, otherwise data conflicts will occur.

Therefore, components connected to the bus require output control to prevent data conflicts on the bus. For this reason, all functional components in the figure that output to the internal bus use tri-state gates for output control (indicated by hollow triangular arrows in the figure).

Control signals and their functions

Typical MIPS32 instructions

Execution process of lw instruction

The function of the lw instruction is to read a 32-bit memory word from the main memory. The assembly code is lw rt,imm(rs)

The memory access address is the register corresponding to the rs field plus the 16-bit signed immediate imm. This is a typical indexed addressing. lw takes out 4 bytes from the corresponding main memory unit and sends them to the rt register for storage.

Since the register bit width is 32 bits, the 16-bit immediate imm needs to be sign-extended to 32 bits before it can be sent to the ALU to calculate the memory access address.
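
A minimal Python sketch of the address calculation for lw, assuming 32-bit registers and a 16-bit immediate as described above. The function names `sign_extend16` and `lw_address` are illustrative, not taken from the source.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def lw_address(rs_value, imm16):
    """Memory access address: R[rs] + SignExt(imm16), wrapped to 32 bits."""
    return (rs_value + sign_extend16(imm16)) & 0xFFFFFFFF


print(hex(lw_address(0x1000, 8)))        # positive offset
print(hex(lw_address(0x1000, 0xFFFC)))   # 0xFFFC encodes the offset -4
```

Without sign extension, the immediate 0xFFFC would be treated as 65532 rather than -4, and the computed address would be wrong for negative offsets.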

lw instruction operation process and control signals

The operation M[AR] → DR in beat T3 of the fetch cycle can instead be placed in beat T2; the two arrangements are functionally equivalent. Each M[AR] → DR can proceed alongside an internal (on-chip) bus transfer to improve instruction execution speed.

Execution flow of sw instruction

The function of the sw instruction is to write a 32-bit memory word to the main memory. The assembly code is sw rt,imm(rs)

sw instruction operation process and control signals

Execution flow of beq instruction

The beq instruction is a conditional branch instruction, and the assembly code is beq rs,rt,imm. The function of the beq instruction is to compare the values of registers rs and rt, and perform a branch jump if they are equal.

The value of imm gives the branch target relative to the next instruction (PC+4), measured in instructions. Therefore, the branch target address is computed by adding the value of PC (already updated to PC+4 during the instruction fetch stage) to imm sign-extended to 32 bits and shifted left by two bits. The left shift by two bits converts the instruction count into a byte offset.
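
The branch target calculation can be written out as a short Python sketch. The function names are hypothetical, and the model assumes 32-bit addresses and 4-byte instructions as in MIPS32.

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits of imm16 as a signed two's-complement value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, rs_value, rt_value, imm16):
    """Return the address of the instruction executed after beq at address pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    if rs_value != rt_value:
        return pc_plus_4                   # branch not taken
    byte_offset = sign_extend16(imm16) << 2  # instruction count -> byte offset
    return (pc_plus_4 + byte_offset) & 0xFFFFFFFF


print(hex(beq_next_pc(0x100, 7, 7, 3)))       # taken, forward by 3 instructions
print(hex(beq_next_pc(0x100, 7, 7, 0xFFFF)))  # taken, imm = -1 branches to itself
print(hex(beq_next_pc(0x100, 7, 8, 3)))       # not taken
```

An immediate of -1 yields the address of the beq instruction itself, which shows that the offset is counted from PC+4 and not from PC.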

beq instruction operation process and control signals

data path components

Operating components (combinational logic circuits)

data processing unit

The data processing unit is composed of combinational logic circuits, whose output is only related to the current input and is responsible for processing data, such as ALU, sign extension unit, decoder, etc.

State components (sequential logic circuits)

state storage unit

State storage unit (state unit) refers to a unit with storage function, such as memory and register.

Single-cycle processor typical data path

A single-cycle MIPS processor is a MIPS processor in which all instructions are completed in one clock cycle.

Although the execution time of different instructions may vary, by the barrel (weakest-link) principle the clock cycle of a single-cycle processor is determined by the slowest instruction.

Since both fetching and executing an instruction must complete within one clock cycle, no resource in the data path can be reused during the execution of an instruction. Dedicated data paths are therefore used, and resources that are needed more than once (such as adders) must be duplicated.

Both instruction fetching and operand fetching operations require memory access, so instructions and data are stored in instruction memory and data memory respectively to avoid resource conflicts.

Since the single-cycle MIPS processor must complete the instruction within one clock cycle, the instruction register IR is not set, but the instruction word fetched from the instruction memory is directly parsed. Otherwise, just fetching the instruction into the IR will require one clock cycle.

R type arithmetic instruction data path

The arithmetic and logical operation instructions in MIPS are R-type instructions. The following takes the addition instruction as an example.
add rd,rs,rt #RTL function description: R[rs] + R[rt] → R[rd]

The functional components involved in executing the instruction are mainly the register file and the ALU. The source register fields rs and rt of the instruction word read from the instruction memory are sent to the two read register number ports R1# and R2# of the register file, and the destination register field rd is sent to the write register number port W#. The values of the two source registers are output to the ALU through the R1 and R2 ports. The funct field of the instruction word determines AluOp, which makes the ALU perform the corresponding operation (addition in this case). The result is sent to the write data port WD of the register file. When the rising edge of the clock arrives, the operation result is written into the destination register rd.
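
The R-type data path for add can be mimicked in Python by extracting the fields from the instruction word and updating a register file. The field positions follow the standard MIPS32 R-type format; the function name `execute_add`, the list-based register file, and the decision to wrap the result to 32 bits instead of raising an overflow exception are simplifying assumptions of this sketch.

```python
def execute_add(word, regs):
    """Decode an R-type add instruction word and apply R[rs] + R[rt] -> R[rd]."""
    opcode = (word >> 26) & 0x3F
    rs = (word >> 21) & 0x1F     # goes to read port number R1#
    rt = (word >> 16) & 0x1F     # goes to read port number R2#
    rd = (word >> 11) & 0x1F     # goes to write port number W#
    funct = word & 0x3F          # selects the ALU operation
    if opcode != 0 or funct != 0x20:
        raise ValueError("not an add instruction")
    # ALU result goes to the write data port WD; overflow trap is ignored here.
    regs[rd] = (regs[rs] + regs[rt]) & 0xFFFFFFFF
    return rd


regs = [0] * 32
regs[9], regs[10] = 5, 7
# 0x012A4020 encodes add $8, $9, $10 (rd=8, rs=9, rt=10).
execute_add(0x012A4020, regs)
print(regs[8])
```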

Type I memory access instruction data path

MIPS memory access instructions are I-type instructions and include load and store instructions. The word access instructions are taken as examples.
lw rt,imm16(rs) #RTL function description: M[R[rs] + SignExt(imm16)] → R[rt]
sw rt,imm16(rs) #RTL function description: R[rt] → M[R[rs] + SignExt(imm16)]

For lw: the rs field of the instruction word is again sent to the R1# port of the register file, and the destination register field rt is sent to the write register number port W#. The 16-bit immediate imm16 is converted to 32 bits by the sign extension unit, sent to the ALU, and added to the value of the index register rs to form the final access address. The data read from the data memory is then sent to the write data port WD of the register file.

For sw: the rs and rt fields of the instruction word are sent to the R1# and R2# ports of the register file respectively. The 16-bit immediate is converted to 32 bits by the sign extension unit, sent to the ALU, and added to the value of the index register rs to form the final main memory address. The value of the rt register read from the register file is sent to the write data port WD of the data memory.

Controller functions and working principle

Controller structure and function

Main functions

Retrieve an instruction from main memory and indicate the location of the next instruction in main memory.

Decode or test the instructions and generate corresponding operation control signals to initiate specified actions.

Directs and controls the direction of data flow between the CPU, main memory, input and output devices.

hardwired controller

Hardwired control unit diagram

CU input signal source

Instruction information produced by the instruction decoder.

Machine cycle signals and beat signals generated by the timing system.

Feedback information from the execution units, i.e. the flags.

Control signals from the system bus (control bus), such as interrupt requests and DMA requests.

Timing system and micro-operations for hardwired controllers

timing system

clock cycle

The width of each beat corresponds to exactly one clock cycle.

machine cycle

The machine cycle can be considered as a baseline time during the execution of all instructions.

The shortest time to read an instruction word from the memory is regarded as the machine cycle.

Several micro-operations can be completed in one machine cycle. Each micro-operation requires a certain amount of time. The clock signal can be used to control the generation of each micro-operation command.

instruction cycle

The time it takes for the CPU to fetch and execute an instruction from main memory is called the instruction cycle.

Instruction cycles are often represented by several machine cycles, and one machine cycle contains several clock cycles (also called beats or T cycles, which are the most basic unit of CPU operation).

The number of machine cycles in each instruction cycle can vary, and the number of beats in each machine cycle can also vary.

Atomic operations are operations that cannot be subdivided further. The atomic operations inside the CPU are usually called micro-operations (μOP), and the component control signals that implement a μOP are called micro-operation control signals, or micro-operation commands (μOPCmd). The time needed to complete a μOP is called a beat, and multiple μOPs can be sequenced by different beat signals to form a μOP sequence.

Micro-operation command analysis

Micro-operation commands in the fetch cycle

Micro-operation commands for indirect address cycles

Micro-operation commands in the execution cycle

non-memory access instructions

memory access instructions

branch instructions

CPU control method

Synchronous control method

Can be divided into the following types

Fixed length instruction cycle

The number of machine cycles is fixed, but the number of beats is not fixed

Combined central and local control

The timing relationship of synchronous control is relatively simple and the controller design is convenient, but there is a problem of low CPU efficiency when using slow components.

Asynchronous control method

The timing of each functional component and operation is implemented using a response mechanism. After the control component sends an operation control signal to the functional component, it must wait until the functional component sends a response signal before starting the next operation.

The advantage is that each component can work according to its actual required time, and there is no process of the fast one waiting for the slow one, thus improving the speed of the system, but the structure of the asynchronous control method is more complicated.

Combined control method

Most operation sequences are controlled synchronously using machine cycles and beats. For the small number of operations whose timing is hard to determine, asynchronous control is used.

Hardwired Control Unit Design Steps

microprogrammed controller

Basic concepts of microprogram control

The design idea is to write each machine instruction as a microprogram. Each microprogram contains several microinstructions, and each microinstruction corresponds to one or several micro-operation commands.

basic terminology

Microcommands and microoperations

A machine instruction can be decomposed into a sequence of micro-operations. These micro-operations are the most basic and irreducible operations in the computer.

The various control commands issued by the control component to the execution component are called microcommands, which are the smallest units that constitute a control sequence.

Microcommands and microoperations have a one-to-one correspondence. Microcommands are the control signals of microoperations, and microoperations are the execution processes of microcommands.

Microinstructions and microcycles

microinstructions

Operation control fields

sequence control field

Microcycle refers to the time required to execute a microinstruction, usually one clock cycle.

Microinstructions control the execution of a corresponding set of microoperations to realize part of the functions of an instruction.

Main memory and control memory

Main memory is used to store programs and data, and is implemented outside the CPU using RAM.

Control memory (CM) is used to store microprograms, which is implemented inside the CPU using ROM.

Programs and microprograms

A program is an ordered collection of instructions.

The function of an instruction is implemented by a microprogram.

A microprogram is equivalent to a μOPCmd sequence, a microinstruction is equivalent to all μOPCmds in one step of the μOPCmd sequence, and a microcommand is equivalent to a μOPCmd.

Each microprogram consists of several microinstructions

Each microinstruction corresponds to a set of μOPCmd

A microcommand is a control signal sent to a component, in one-to-one correspondence with a μOPCmd

Microprogrammed controller composition and working process

Basic components of microprogrammed controller

control memory

Store the microprogram corresponding to each instruction.

microinstruction register

It is used to store microinstructions fetched from CM. Its number of bits is equal to the word length of the microinstruction.

microaddress register

Receive the micro-address sent from the micro-address forming component to prepare for reading the micro-instructions in the CM.

The working process of microprogrammed controller

Note that the next-address field of the last microinstruction of a microprogram usually points to the entry address of the instruction fetch microprogram, so that the instruction fetch phase is entered again after the instruction has been executed.

Microprograms and machine instructions

The number of microprograms should be the number of machine instructions plus the number of public microprograms corresponding to instruction fetching, indirect addressing, and interrupt cycles.

How microinstructions are encoded

Direct encoding (direct control) method

Each bit represents one microcommand.

Field direct encoding method

Group mutually exclusive microcommands in the same field, and group compatible microcommands in different fields. Therefore, among the microcommands defined by each subfield, there is at most one valid microcommand at the same time.

Each field is independently coded, each code represents a microcommand and the meaning of each field code is defined separately.

Microcommands must be issued after passing through the decoding circuit, so it is slower than the direct encoding method.

In the worst case, field direct encoding degenerates into the direct encoding method.

Field indirect encoding method

Certain microcommands in one field need to be interpreted by certain microcommands in another field.

Among the three encoding methods, the direct encoding method has the longest operation control field and the simplest μOP control signal formation. The field indirect encoding method has the shortest operation control field and the most complicated μOP control signal formation. The field direct encoding method is a compromise between the two.
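
The difference in operation control field length between direct encoding and field direct encoding can be made concrete with a small calculation. The sketch below assumes the common textbook convention that each field reserves one code meaning "no microcommand issued", so a field holding n mutually exclusive microcommands needs ceil(log2(n + 1)) bits. The group sizes are made-up example values.

```python
def direct_encoding_bits(group_sizes):
    """Direct encoding: one bit per microcommand."""
    return sum(group_sizes)


def field_direct_encoding_bits(group_sizes):
    """Field direct encoding: one field per mutually exclusive group.

    Assumes one extra code per field for "no microcommand", so a field with
    n microcommands needs ceil(log2(n + 1)) bits, which equals n.bit_length().
    """
    return sum(n.bit_length() for n in group_sizes)


# Hypothetical grouping of 33 microcommands into 5 mutually exclusive groups.
groups = [7, 3, 12, 5, 6]
print(direct_encoding_bits(groups))        # bits with direct encoding
print(field_direct_encoding_bits(groups))  # bits with field direct encoding
```

If every group contains a single microcommand, both functions return the same value, which matches the remark that direct encoding is the worst case of field direct encoding.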

The address formation method of microinstructions

Directly indicated by the next-address field of the microinstruction.

Formed from the opcode of the machine instruction.

Microinstruction format

horizontal microinstructions

A horizontal microinstruction defines and performs several basic parallel operations.

vertical microinstructions

A vertical microinstruction can only define and execute one basic operation.

Mixed microinstructions

Design steps of microprogram control unit

List all μOPCmd sequences

Design microinstruction format

Compile microprogram

Design related circuits

Dynamic microprogramming and nanoprogramming

If the microprogram can be changed according to the user's requirements, the machine has dynamic microprogramming capabilities.

If the hardware is not directly controlled by microprograms, but the microprograms are instead interpreted by nanoprograms stored in a second-level control memory, this second-level control memory is called nanomemory, and the hardware is directly controlled by nanoinstructions.

Features of hardwired and microprogrammed controllers

Exception and interrupt mechanism

Basic concepts of exceptions and interrupts

Unexpected events generated internally by the CPU are called exceptions.

An interrupt request issued to the CPU by a device external to the CPU is called an interrupt.

Classification of exceptions and interrupts

Classification of exceptions

Fault

Trap

Traps are usually detected at the end of instruction execution, and once a trap is detected, exception handling occurs immediately.

System call instructions and conditional trap instructions (such as teq, teqi, tne, tnei, etc. in MIPS) are all trap instructions.

In single-step debugging mode, every ordinary instruction can act as a trap instruction and raise a trap exception. A trap exception is triggered by executing a trap instruction. Similar to a function call, there is no program breakpoint. Executing such an instruction unconditionally or conditionally invokes an operating system kernel routine; after that routine completes, execution returns to the instruction following the trap instruction. (When the trap instruction is a branch instruction, execution returns to the branch target instruction rather than to the next sequential instruction.)

Abort

An abort is a random hardware failure that prevents the CPU from continuing execution; it is unrelated to any specific instruction.

Exceptions are detected by logic inside the CPU; no external signal is needed to notify the CPU.

Classification of interrupts

Maskable interrupt

When interrupts are turned off, maskable interrupts cannot get a response from the CPU.

non-maskable interrupt

Non-maskable interrupts must be responded to even when interrupts are turned off.

Exception and interrupt response process

Turn off interrupts

Save breakpoints and program state

Identify exceptions and interrupts and go to appropriate handlers
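
The three response steps, together with the maskable/non-maskable distinction, can be sketched as follows. The dictionary layout of the CPU state, the handler table, and the function name `respond` are assumptions made for illustration; real hardware performs these steps in the interrupt cycle rather than in software.

```python
def respond(cpu, cause, handlers, maskable=True):
    """Model the response to an exception or interrupt.

    Returns True if the CPU responds, False if a maskable request is ignored
    because interrupts are currently turned off.
    """
    if maskable and not cpu["interrupts_enabled"]:
        return False
    cpu["interrupts_enabled"] = False               # 1. turn off interrupts
    cpu["stack"].append((cpu["pc"], cpu["psw"]))    # 2. save breakpoint and state
    cpu["pc"] = handlers[cause]                     # 3. go to the matching handler
    return True


# Hypothetical handler entry addresses.
handlers = {"timer": 0x8000, "power_fail": 0x9000}
cpu = {"pc": 0x104, "psw": 0b0010, "interrupts_enabled": True, "stack": []}

print(respond(cpu, "timer", handlers))                       # responded
print(respond(cpu, "timer", handlers))                       # masked: interrupts are off
print(respond(cpu, "power_fail", handlers, maskable=False))  # still responded
```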

instruction pipeline

Basic concepts of instruction pipeline

Improving processor parallelism from two aspects

temporal parallelism

Pipelining

spatial parallelism

superscalar processor

Definition of instruction pipeline

The execution process of instructions

fetch(IF)

Fetch instructions from instruction memory or cache

Decoding/reading register (ID)

The operation controller decodes the instruction and fetches the operands from the register file.

Execution/calculation address (EX)

Memory access (MEM)

Read and write memory

Write back (WB)

Write instruction execution results back to the register file

The principle of pipeline design is that the number of instruction pipeline segments is based on the number of functional segments used by the most complex instructions.

How to express the pipeline

Basic implementation of pipeline

The introduction of pipeline registers allows instructions in each segment to be parallelized in time.

Pipeline data path

Add a pipeline register component at each dotted line position in the figure.

Note that there are no pipeline registers behind the WB segment, but the data in this segment is eventually written back to the register file. The program counter PC can also be regarded as a pipeline register, providing data for IF segment instruction fetching.

The register file in the ID segment is a relatively special functional component. It is responsible for reading register operands in the ID segment. The read operation belongs to combinational logic. At the same time, the register file of the ID segment is also responsible for the write-back operation of the instruction execution results of the WB segment. The write operation requires the cooperation of the clock and is a sequential logic.

The write register number port W# of the register file is driven by a multiplexer, controlled by the RegDst signal, that selects from the instruction word in the ID segment, while the write data WD comes from the WB segment. The write address and the write data therefore belong to different instructions, which would corrupt data.

To fix this, the write register number WriteReg# output by the ID segment multiplexer is no longer sent to the W# port of the register file. Instead it is latched in the ID/EX pipeline register and passed segment by segment to the WB segment; finally, the MEM/WB pipeline register returns it to the W# port of the register file. Note that in the figure the ID segment multiplexer has been repositioned slightly.

Different pipeline registers latch different data.

The IF/ID pipeline register needs to latch the instruction word fetched from the instruction memory and the value of PC+4.

The ID/EX pipeline register needs to latch the two operands RS and RT read from the register file (the values of the registers named by the rs and rt fields of the instruction word), the write register number WriteReg#, the sign-extended immediate, PC+4, and other operands that may be used later.

EX/MEM pipeline registers need to latch ALU operation results, data to be written in the data memory WriteData, write register number WriteReg# and other data.

The MEM/WB pipeline register needs to latch the ALU operation results, data read from the data memory, write register number WriteReg# and other data.

Pipeline control signals

Control signal classification

The execution process of the pipeline (taking the lw instruction, which loads a word from data memory into a register, as an example)

fetch(IF)

Although the lw instruction does not use PC+4 in later segments, PC+4 is still passed to the IF/ID pipeline register for use by other instructions (such as beq).

When the clock edge arrives, the instruction word is latched in the IF/ID pipeline register and PC is updated to PC+4.

Decoding/reading register (ID)

In the ID segment, the operation controller generates the operation control signals required for subsequent segments based on the instruction words in the IF/ID pipeline register and transmits them backward.

In addition, the ID segment will also read the values RS and RT of the rs and rt registers in the register file according to the rs and rt fields in the instruction word.

The sign extension unit sign-extends the 16-bit immediate in the instruction word to 32 bits.

The multiplexer generates the possible write register number WriteReg# of the instruction from the instruction word (some instructions do not write a register).

These four values are passed to the ID/EX pipeline register together with the sequential instruction address PC+4.

Execution/calculation address (EX)

For the lw instruction, the EX segment is mainly used to calculate the memory access address. The memory access address obtained by adding the RS value in the ID/EX pipeline register and the sign-extended immediate value is sent to the EX/MEM pipeline register.

The EX segment also needs to calculate the branch target address and generate the branch jump signal BranchTaken.

The value of RT in the ID/EX pipeline register will be used as write data in the MEM segment, so the value of RT will be sent to the EX/MEM pipeline register as write data WriteData.

The write register number WriteReg# in the ID/EX pipeline register will also be directly transferred to the EX/MEM pipeline register.

Memory access (MEM) (read or write)

The memory is read or written according to the values latched in the EX/MEM pipeline register: the ALU operation result (the memory access address), the write data, and the memory read/write control signal MemWrite.

The ALU operation results, WriteReg#, and data read from the data memory in the EX/MEM pipeline register will be sent to the input end of the MEM/WB pipeline register.

Write back (WB)

The WB segment selects the ALU operation result or memory access data from the MEM/WB pipeline register and writes it back to the designated register WriteReg# of the register file.

The functional segments of the pipeline do not distinguish between instruction types. Each segment takes all of its data and control signals from the pipeline register at its input, so any data or control signal that a later segment may need must be passed along from register to register.
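
The hand-off between pipeline registers described above can be sketched as follows for a single lw instruction. This is a simplified model, not a full datapath: it handles only lw and sw, runs the segments one after another rather than overlapped, and omits the branch logic. The dictionary keys `MemRead`, `RegWrite` and `MemToReg` are assumed control-signal names that do not appear in the original, and `WriteReg` stands for WriteReg#.

```python
LW_OPCODE = 0x23  # MIPS lw
SW_OPCODE = 0x2B  # MIPS sw


def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def id_stage(if_id, regfile):
    """Decode: read RS and RT, extend the immediate, derive control signals."""
    instr = if_id["instr"]
    opcode = (instr >> 26) & 0x3F
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    return {
        "RS": regfile[rs], "RT": regfile[rt],
        "Imm": sign_extend16(instr),
        "WriteReg": rt,                      # lw writes its rt register
        "PC4": if_id["PC4"],
        "MemRead": opcode == LW_OPCODE,      # assumed signal name
        "MemWrite": opcode == SW_OPCODE,
        "RegWrite": opcode == LW_OPCODE,     # assumed signal name
    }


def ex_stage(id_ex):
    """Execute: compute the memory address and pass everything else along."""
    return {
        "ALUResult": (id_ex["RS"] + id_ex["Imm"]) & 0xFFFFFFFF,
        "WriteData": id_ex["RT"],
        "WriteReg": id_ex["WriteReg"],
        "MemRead": id_ex["MemRead"],
        "MemWrite": id_ex["MemWrite"],
        "RegWrite": id_ex["RegWrite"],
    }


def mem_stage(ex_mem, data_mem):
    """Memory access: read or write at the address held in EX/MEM."""
    addr = ex_mem["ALUResult"]
    if ex_mem["MemWrite"]:
        data_mem[addr] = ex_mem["WriteData"]
    mem_data = data_mem.get(addr, 0) if ex_mem["MemRead"] else 0
    return {
        "ALUResult": addr, "MemData": mem_data,
        "WriteReg": ex_mem["WriteReg"],
        "RegWrite": ex_mem["RegWrite"],
        "MemToReg": ex_mem["MemRead"],       # assumed signal name
    }


def wb_stage(mem_wb, regfile):
    """Write back: select memory data or the ALU result."""
    if mem_wb["RegWrite"]:
        value = mem_wb["MemData"] if mem_wb["MemToReg"] else mem_wb["ALUResult"]
        regfile[mem_wb["WriteReg"]] = value


# Usage: lw $8, 8($17) with register 17 holding 100 and memory[108] holding 42.
regfile = [0] * 32
regfile[17] = 100
data_mem = {108: 42}
lw = (LW_OPCODE << 26) | (17 << 21) | (8 << 16) | 8
id_ex = id_stage({"instr": lw, "PC4": 4}, regfile)
ex_mem = ex_stage(id_ex)
mem_wb = mem_stage(ex_mem, data_mem)
wb_stage(mem_wb, regfile)
```

Note that WriteReg is produced in ID but not used until WB, so each intermediate register has to carry it forward unchanged.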

Pipeline hazards and their handling

Structural hazard

Conflicts caused by multiple instructions using the same operating unit in the same clock cycle are called structural conflicts.

Calculating PC + 4, calculating the branch target address, and performing arithmetic operations all require an arithmetic unit.

Both accessing instructions and accessing data require the use of memory.

The register read in the ID segment and the register write in the WB segment could also conflict. However, the read and write logic of the MIPS register file are completely independent: read and write addresses and data enter through different ports, so reads and writes can proceed concurrently. In practice, therefore, this structural conflict does not arise.

Solution

Use independent instruction memory and data memory.

Stall the program counter PC so that the IF segment pauses for one clock cycle. On the next clock edge the IF/ID pipeline register is synchronously cleared, so a no-op enters the ID segment (an all-zero MIPS instruction is equivalent to a no-op). The IF segment restarts once the Load instruction has completed its memory access.

Data hazard

The current instruction needs to use the operation result of the previous instruction, but this result has not yet been generated or has not been delivered to the specified location, which will cause the current instruction to be unable to continue execution. This is called a data conflict.

The possible data conflicts between the two instructions are as follows:

Read-after-write conflict (RAW)

If a source operand of instruction I2 is the destination operand of instruction I1, this data conflict is called a read-after-write conflict.

When instructions are executed in a pipelined manner, I2 depends on the result of I1. If I2 reads the register in the ID segment before I1 has written its result, I2 gets the old value and computes with wrong data.

Write-after-read conflict (WAR)

If the destination operand of instruction I2 is a source operand of instruction I1, this data conflict is called a write-after-read conflict.

In this in-order pipeline, registers are read in the ID segment and written in the WB segment, so I1 always reads before I2 writes. This dependence therefore has no effect on instruction execution.

Write-after-write conflict (WAW)

If instructions I2 and I1 have the same destination operand, this data conflict is called a write-after-write conflict.

In this in-order pipeline, every instruction writes its result in the WB segment, so writes happen in program order. This conflict therefore has no effect on instruction execution.
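
The three definitions above can be checked mechanically from the register names alone. The following is a minimal sketch; the `Instr` type and the `classify` function are hypothetical names introduced for illustration, and special cases such as the MIPS $zero register are ignored.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    dest: Optional[str]        # destination register, or None if nothing is written
    srcs: Tuple[str, ...]      # source registers


def classify(i1: Instr, i2: Instr) -> Set[str]:
    """Return the data conflicts between I1 and a later instruction I2."""
    kinds = set()
    if i1.dest is not None and i1.dest in i2.srcs:
        kinds.add("RAW")       # I2 reads what I1 writes
    if i2.dest is not None and i2.dest in i1.srcs:
        kinds.add("WAR")       # I2 writes what I1 reads
    if i1.dest is not None and i1.dest == i2.dest:
        kinds.add("WAW")       # both write the same register
    return kinds


# add $t0, $t1, $t2  followed by  sub $t3, $t0, $t4
add = Instr(dest="$t0", srcs=("$t1", "$t2"))
sub = Instr(dest="$t3", srcs=("$t0", "$t4"))
```

Here `classify(add, sub)` reports only RAW, which is the one case that needs handling in the pipeline described in this chapter.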

Solution

Hardware stall and software insertion of "NOP" instructions

Data bypass (forwarding) technology

If there is a data dependence, the register operands RS and RT available in the EX segment are stale. The correct values are the destination operands of the instructions currently in the MEM and WB segments, which have already passed through the EX segment and completed their operation.

Except for Load-type memory access instructions, these destination operands are already held in the EX/MEM and MEM/WB pipeline registers. The correct operands can be forwarded directly from there to the point in the EX segment where they are needed (this is also called bypassing).

No bubbles need to be inserted. Forwarding resolves most data conflicts, avoids the performance loss caused by bubbles, and greatly improves pipeline performance.
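
The operand selection performed by forwarding can be sketched as below. This is an illustrative model only: the function name `forward_operand` and the dictionary keys (`RegWrite`, `WriteReg`, `ALUResult`, `IsLoad`, `WriteBackData`) are assumptions, not signal names from the original. The newer result in EX/MEM takes priority over the older one in MEM/WB, and a Load still in EX/MEM is reported as needing a stall because its data has not been read yet.

```python
def forward_operand(reg_num, reg_value, ex_mem, mem_wb):
    """Pick the value the EX segment should use for source register reg_num.

    Returns (value, source), where source names where the value came from.
    """
    if reg_num != 0:  # register 0 is hard-wired to zero in MIPS
        if ex_mem["RegWrite"] and ex_mem["WriteReg"] == reg_num:
            if ex_mem["IsLoad"]:
                return None, "stall"          # load data is not available yet
            return ex_mem["ALUResult"], "EX/MEM"
        if mem_wb["RegWrite"] and mem_wb["WriteReg"] == reg_num:
            return mem_wb["WriteBackData"], "MEM/WB"
    return reg_value, "ID/EX"                 # no dependence: use the value read in ID


# Two earlier instructions both wrote register 8; the most recent one wins.
ex_mem = {"RegWrite": True, "WriteReg": 8, "ALUResult": 30, "IsLoad": False}
mem_wb = {"RegWrite": True, "WriteReg": 8, "WriteBackData": 10}
```

With these registers, `forward_operand(8, 0, ex_mem, mem_wb)` selects 30 from EX/MEM, while a request for register 9 falls through to the value read in the ID segment.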

Instruction compilation optimization and adjustment of instruction order

Control hazard

When the pipeline encounters a branch instruction or any other instruction that changes the PC value, the instructions fetched into the pipeline immediately after it may have to be discarded because the branch is taken. This conflict is called a control conflict, also known as a branch conflict.

Solution

Perform branch prediction on branch instructions and generate the branch target address as early as possible.

Prefetch the target instructions for both the taken and not-taken paths.

Form the condition code earlier and faster.

Improve the accuracy of predicting the branch direction.
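
The original does not name a specific prediction scheme. As one common illustration of dynamic direction prediction, the sketch below uses a 2-bit saturating counter; the class name `TwoBitPredictor`, the initial state, and the `accuracy` helper are all assumptions made for this example.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self):
        self.state = 1  # assumed starting point: weakly not taken

    def predict(self) -> bool:
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def accuracy(outcomes) -> float:
    """Fraction of branch outcomes predicted correctly by a fresh predictor."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct / len(outcomes)


# A loop branch that is taken nine times and then falls through once.
loop_branch = [True] * 9 + [False]
```

On this pattern the predictor misses only the first iteration and the loop exit, so `accuracy(loop_branch)` is 0.8.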

Pipeline performance indicators

Pipeline throughput

Pipeline speedup
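
The outline lists these two indicators without formulas. Under the usual textbook assumptions (k segments of equal duration dt, n instructions, no stalls), the pipeline needs (k + n - 1) * dt to finish, which gives the expressions sketched below. The function names are illustrative.

```python
def pipeline_time(k: int, n: int, dt: float = 1.0) -> float:
    """Time for n instructions on an ideal k-segment pipeline with segment time dt."""
    return (k + n - 1) * dt


def throughput(k: int, n: int, dt: float = 1.0) -> float:
    """Instructions completed per unit time."""
    return n / pipeline_time(k, n, dt)


def speedup(k: int, n: int, dt: float = 1.0) -> float:
    """Non-pipelined time (n * k * dt) divided by pipelined time."""
    return (n * k * dt) / pipeline_time(k, n, dt)
```

For a 5-segment pipeline and 100 instructions, `pipeline_time(5, 100)` is 104 segment times and the speedup is 500/104, a little under the ideal value of 5 that is approached as n grows.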

Advanced pipeline technology

Superscalar pipeline technology

CPI<1

Multiple independent instructions can be executed concurrently in each clock cycle; that is, two or more instructions are issued and executed in parallel.

The processor is required to be equipped with multiple functional components and instruction decoding circuits, as well as multiple register ports and buses, so that multiple operations can be executed simultaneously.

Super pipeline technology

The more pipeline functional segments are divided, the shorter the clock cycle and the higher the instruction throughput rate. Therefore, super-pipeline technology improves pipeline performance by increasing the main frequency of the pipeline.

For example, if each functional component is used three times within one original clock cycle, the pipeline runs at three times the original clock frequency.

CPI=1

Very long instruction word technology

Multiple operations are packed into one long instruction word and processed in parallel by multiple processing units, so several instructions can be issued in one clock cycle.

Basic concepts of multiprocessors

Basic concepts of SISD, SIMD and MIMD

Single instruction stream single data stream (SISD) architecture

Single Instruction Multiple Data (SIMD) architecture

Multiple instruction stream single data stream (MISD) architecture

Multiple Instruction Multiple Data (MIMD) architecture

Basic concepts of hardware multithreading

Fine-grained multithreading

Coarse-grained multithreading

Simultaneous multi-threading

Basic concepts of multi-core processors

Basic concepts of shared memory multiprocessors

Even though these systems share the same physical address space, they can still run programs independently in their own virtual address spaces.

Two types

Uniform Memory Access (UMA) multiprocessor

Non-uniform memory access (NUMA) multiprocessors