MindMap Gallery How the program runs
This book introduces computer practitioners to the composition of computers, the CPU, binary operations, memory, operating systems, program execution, assembly, hardware control, machine learning, and more, giving readers an in-depth understanding of the steps a program goes through from source file to running on the computer. It is a must-read popular science book for computer enthusiasts and practitioners.
Edited at 2024-01-18 19:32:06
How does the program run?
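1. The processor that runs programs: CPU
The CPU consists of four parts: registers, a control unit, an arithmetic unit and a clock, connected to each other by electrical signals.
Registers: temporarily store the data and instructions being processed
Control unit: reads instructions and data from memory into the registers and controls the computer according to the results of executing instructions
Clock: generates the clock signal that times the CPU's operation
Arithmetic unit: operates on the data read from memory into the registers
From a program's point of view, the CPU is a collection of registers with various functions
The CPU can only interpret and execute machine language (native code)
Machine language is produced by compiling a program written in a high-level programming language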
Register 1: Program Counter
Stores the memory address of the next instruction to execute, which determines the flow of the program
Register 2: Flag register
Stores the state of the most recent operation result (positive, negative, zero, overflow, parity)
function call
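A function is called by setting the program counter to the address where the function is stored.
The call instruction first saves the address of the instruction that follows the call (the return address) on the stack; the return instruction takes that address back off the stack and puts it in the program counter.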
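The interplay of program counter and stack during a function call can be sketched with a toy machine. This is an illustrative simulation only: the instruction names (LOAD, ADD, CALL, RET, HALT) and the single accumulator register are assumptions made for the example, not a real instruction set.

```python
def run(program):
    pc = 0        # program counter: address of the next instruction
    stack = []    # holds return addresses
    acc = 0       # accumulator register
    trace = []    # addresses in the order they were executed
    while True:
        op, arg = program[pc]
        trace.append(pc)
        pc += 1                  # by default, move on to the next address
        if op == "LOAD":
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "CALL":
            stack.append(pc)     # save the return address on the stack
            pc = arg             # jump to the address of the function
        elif op == "RET":
            pc = stack.pop()     # resume at the saved return address
        elif op == "HALT":
            return acc, trace

program = [
    ("LOAD", 1),     # address 0
    ("CALL", 4),     # address 1: call the function stored at address 4
    ("ADD", 100),    # address 2: execution resumes here after RET
    ("HALT", None),  # address 3
    ("ADD", 10),     # address 4: function body
    ("RET", None),   # address 5
]

result, trace = run(program)
print(result, trace)
```

The trace shows the program counter jumping from address 1 to address 4 and coming back to address 2 once the function returns.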
Array storage
The base register holds the starting address of the array in memory, and the index register holds the offset of an element relative to that address (the array index).
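As a rough sketch of base plus index addressing, the example below models memory as a bytearray. The memory size, the base address 8 and the one-byte element size are all assumptions chosen for illustration.

```python
memory = bytearray(32)     # simulated memory, one byte per address
base = 8                   # "base register": address where the array starts
values = [10, 20, 30, 40]
memory[base:base + len(values)] = bytes(values)

def load(base, index):
    # Effective address = base register + index register
    return memory[base + index]

third = load(base, 2)
print(third)
```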
2. Data is represented in binary
Why binary is used: the integrated circuits inside a computer have pins that are each in one of two voltage states (for example 0 V or 5 V), which correspond directly to 0 and 1.
The smallest unit of information is the bit, which corresponds to one binary digit.
The basic unit of information is the byte; one byte is 8 bits
Binary calculations
Converting binary to decimal: multiply each bit by its positional weight (a power of 2) and add up the results to get the decimal number
Arithmetic on decimal numbers is still carried out in binary inside the computer: for example, shifting a binary number to the left by one position is equivalent to multiplying it by 2.
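A minimal sketch of both ideas, assuming the binary number is given as a string of 0s and 1s (the function name binary_to_decimal is made up for this example):

```python
def binary_to_decimal(bits: str) -> int:
    total = 0
    # The rightmost bit has weight 2**0, the next 2**1, and so on.
    for position, bit in enumerate(reversed(bits)):
        total += int(bit) * 2 ** position
    return total

value = binary_to_decimal("00100111")   # 32 + 4 + 2 + 1
shifted = value << 1                     # shift left by one position
print(value, shifted)
```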
Subtraction inside the computer is implemented as addition, using the two's complement
The highest bit of a signed binary number is the sign bit: 1 represents a negative number, 0 represents zero or a positive number
Negative numbers are represented by their two's complement
To find the representation of a negative number, first write the positive number as an 8-bit binary number, then invert every bit, and then add 1 to the result.
When an addition carries out of the highest bit, it overflows, and the overflowed bit is simply discarded by the computer.
Unsigned types treat every bit as part of the magnitude, so all values are zero or positive. Signed types use the highest bit for the sign, leaving only n-1 bits for the magnitude, so about half of the representable values are negative and half are non-negative.
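The invert-and-add-1 rule and the discarded overflow bit can be checked with a short sketch. It assumes 8-bit values; the helper names are invented for the example.

```python
BITS = 8
MASK = (1 << BITS) - 1          # 0b11111111

def twos_complement(n):
    # Invert all bits of n, add 1, and keep only the low 8 bits.
    return ((n ^ MASK) + 1) & MASK

def to_signed(pattern):
    # Interpret an 8-bit pattern as a signed number (highest bit = sign).
    return pattern - (1 << BITS) if pattern & 0x80 else pattern

neg_one = twos_complement(1)                 # bit pattern for -1
total = (1 + neg_one) & MASK                 # the carry out of bit 7 is discarded
diff = (5 + twos_complement(3)) & MASK       # 5 - 3 computed as 5 + (-3)
print(format(neg_one, "08b"), total, diff)
```

With 8 bits, the signed interpretation covers -128 to 127, while the unsigned interpretation of the same patterns covers 0 to 255.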
The difference between logical right shift and arithmetic right shift
Logical right shift: the bits are treated as a pattern rather than a number, and the vacated positions on the left are filled with 0
Arithmetic right shift: the bits are treated as a signed number, and the vacated positions on the left are filled with the value of the sign bit (0 or 1)
If the value is a negative number in two's complement representation, filling the vacated high bits with 1 after the right shift correctly implements numerical operations such as multiplying by 1/2, 1/4, 1/8, etc. If it is a positive number, the vacated high bits are filled with 0.
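The difference between the two shifts is easiest to see on a negative 8-bit value. The sketch below assumes 8-bit patterns and a shift count between 0 and 8; the function names are made up for illustration.

```python
BITS = 8
MASK = 0xFF

def logical_right_shift(pattern, n):
    # Vacated high bits are always filled with 0.
    return (pattern & MASK) >> n

def arithmetic_right_shift(pattern, n):
    # Vacated high bits are filled with a copy of the sign bit.
    result = (pattern & MASK) >> n
    if pattern & 0x80:
        result |= (MASK << (BITS - n)) & MASK
    return result

minus_four = 0b11111100                       # -4 in 8-bit two's complement
logical = logical_right_shift(minus_four, 2)
arithmetic = arithmetic_right_shift(minus_four, 2)
print(format(logical, "08b"), format(arithmetic, "08b"))
```

The arithmetic result 11111111 is -1, which is -4 multiplied by 1/4; the logical result 00111111 is 63 and no longer has any numeric relation to -4.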
Writing a binary number in hexadecimal reduces its length to 1/4 of the original, because one hexadecimal digit stands for 4 bits, which is more concise and readable
A value with 0x at the beginning is written in hexadecimal.
3. Floating point numbers
Some decimal fractions (for example 0.1) cannot be represented exactly in binary; the computer can only store an approximation within a limited precision.
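A quick way to observe this is to add 0.1 repeatedly. The sketch below uses Python's float, which is a double precision binary floating point number, and the standard decimal module to display the value that is actually stored.

```python
from decimal import Decimal

total = 0.0
for _ in range(100):
    total += 0.1          # each addition carries a tiny representation error

stored = Decimal(0.1)     # the exact value of the binary approximation of 0.1
print(total == 10.0)      # False: the errors have accumulated
print(stored)
```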
Floating point numbers are represented according to the IEEE standard (IEEE 754)
Single precision floating point number (32 bits)
Sign part: 1 bit, exponent part: 8 bits, mantissa part: 23 bits
Double precision floating point number (64 bits)
Sign part: 1 bit, exponent part: 11 bits, mantissa part: 52 bits
Mantissa representation: normalized form (the integer part is fixed at 1 and is not stored)
Exponent representation: EXCESS system
In the EXCESS system, the middle value of the range that the exponent part can represent is treated as 0, so negative exponents can be represented without a sign bit.
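The three fields of a single precision number can be pulled apart with the standard struct module. This is a sketch for normal (non-zero, non-special) values; the function name decompose is invented for the example, and 127 is the excess value used for the 8-bit exponent.

```python
import struct

def decompose(x):
    # Pack x as a 32-bit float, then reread the same 4 bytes as an integer.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, stored with an excess of 127
    mantissa = bits & 0x7FFFFF        # 23 bits, the leading "1." is implied
    return sign, exponent, mantissa

sign, exponent, mantissa = decompose(0.75)
# Rebuild the value: (-1)^sign * 1.mantissa * 2^(exponent - 127)
rebuilt = (-1) ** sign * (1 + mantissa / 2 ** 23) * 2.0 ** (exponent - 127)
print(sign, exponent, format(mantissa, "023b"), rebuilt)
```

For 0.75 the stored exponent is 126, which stands for an actual exponent of -1, so no sign is needed for the negative exponent.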
4. Memory
Main memory can be read and written, and its contents are lost when the power is turned off.
A memory IC with 8 data signal pins can input or output 8 bits (1 byte) at a time
With 10 address signal pins it can specify 1024 addresses, which is 1K
The same value occupies a different amount of memory depending on its data type. To avoid wasting the 8 bits at each address, order the variable types defined in a program so that memory is used as compactly as possible.
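To see the same value take up different amounts of memory, the sketch below stores 123 as a 1-byte, 2-byte and 4-byte integer with the standard struct module. The "<" prefix selects fixed standard sizes and little-endian byte order, which is an assumption made for the example rather than something the text specifies.

```python
import struct

as_byte = struct.pack("<b", 123)    # 1-byte integer (like char in C)
as_short = struct.pack("<h", 123)   # 2-byte integer (like short in C)
as_int = struct.pack("<i", 123)     # 4-byte integer

sizes = (len(as_byte), len(as_short), len(as_int))
print(sizes, list(as_int))
```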
pointer
A pointer is also a variable. What it represents is not the value of the data, but the address of the memory where the data is stored. By using a pointer, you can read and write data at any specified address.
array
Elements are stored at consecutive addresses in memory, and an index identifies the position of each element.
stacks and queues
Stacks and queues read and write data without specifying an index; both can be implemented on top of an array by setting aside a memory area with a fixed number of elements.
Stack: last in, first out (the first element in is the last one out)
Queue: first in, first out (usually implemented as a ring buffer, which reuses a fixed-size memory area over and over)
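Both structures can be sketched on top of a fixed-size list. The class names and the capacity of 3 are assumptions for illustration; the queue wraps its read and write positions around the end of the buffer, which is what makes it a ring buffer.

```python
class Stack:
    def __init__(self):
        self.items = []

    def push(self, item):
        self.items.append(item)

    def pop(self):
        return self.items.pop()          # the last item pushed comes out first


class RingQueue:
    def __init__(self, capacity):
        self.buf = [None] * capacity     # fixed-size memory area
        self.head = 0                    # position of the next item to read
        self.tail = 0                    # position of the next free slot
        self.count = 0

    def enqueue(self, item):
        if self.count == len(self.buf):
            raise OverflowError("queue is full")
        self.buf[self.tail] = item
        self.tail = (self.tail + 1) % len(self.buf)   # wrap around
        self.count += 1

    def dequeue(self):
        if self.count == 0:
            raise IndexError("queue is empty")
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)   # wrap around
        self.count -= 1
        return item


stack = Stack()
for n in (1, 2, 3):
    stack.push(n)
popped = [stack.pop() for _ in range(3)]

queue = RingQueue(3)
for n in (1, 2, 3):
    queue.enqueue(n)
first = queue.dequeue()
queue.enqueue(4)                          # reuses the slot freed by the dequeue
rest = [queue.dequeue() for _ in range(3)]
print(popped, first, rest)
```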
linked list
Makes it more efficient to add and delete elements, since only the links between elements need to be updated
Binary tree
Makes searching for data more efficient
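A minimal binary search tree shows why searching is efficient: each comparison discards one whole subtree. The Node class and the helper names below are assumptions for this sketch, and the tree is not rebalanced.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.left = None     # subtree with smaller values
        self.right = None    # subtree with larger or equal values

def insert(root, value):
    if root is None:
        return Node(value)
    if value < root.value:
        root.left = insert(root.left, value)
    else:
        root.right = insert(root.right, value)
    return root

def search(root, value):
    # Returns (found, number of nodes visited).
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if value == node.value:
            return True, steps
        node = node.left if value < node.value else node.right
    return False, steps

root = None
for v in (50, 30, 70, 20, 40, 60, 80):
    root = insert(root, v)

found, steps = search(root, 60)
missing, _ = search(root, 65)
print(found, steps, missing)
```

With seven values, the search for 60 visits only three nodes (50, 70, 60) instead of scanning all of them.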
Ways to save memory
Share DLL files to reduce duplicate storage of functions
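Reduce program file size by using the _stdcall calling convention, in which the called function cleans up the stack, so the cleanup code is not repeated at every call site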
5. Disk
Programs stored on disk must be loaded into memory before they can be run. The CPU, which is responsible for parsing and running program content, needs to specify the memory address through the internal program counter before it can read out the program.
Disk caching speeds up disk access
Virtual memory: part of the disk is used as if it were memory. In the paging approach, the program is divided into pages of a fixed size, and pages are swapped between disk and memory as they are needed at run time.
Disks are generally divided into sectors, and files are stored in clusters, each an integer multiple of a sector. No matter how small a file is, it occupies at least one whole cluster.
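The effect of cluster-based allocation on disk usage can be estimated with a small calculation. The 4096-byte cluster size is a hypothetical value chosen for the example; real cluster sizes depend on the disk format. The sketch assumes a non-empty file.

```python
def size_on_disk(file_size, cluster_size=4096):
    # Round the number of clusters up: a partly used cluster is still occupied.
    clusters = -(-file_size // cluster_size)
    return clusters * cluster_size

tiny = size_on_disk(1)           # a 1-byte file still takes a whole cluster
exact = size_on_disk(4096)
spill = size_on_disk(4097)       # one byte over needs a second cluster
print(tiny, exact, spill)
```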
6. Compress data
A file is stored as a sequence of bytes
RLE compression algorithm
Represents repeated data as the data value followed by its repeat count. It has a limitation: if the file contains few repeated runs, the compressed file ends up larger than the original.
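A minimal run-length encoder makes both the benefit and the limitation visible. It is a sketch that writes each run as the character followed by its count in decimal; the function name and this output format are assumptions, and only encoding is shown.

```python
def rle_encode(text):
    if not text:
        return ""
    out = []
    current, count = text[0], 1
    for ch in text[1:]:
        if ch == current:
            count += 1                       # still inside the same run
        else:
            out.append(f"{current}{count}")  # close the finished run
            current, count = ch, 1
    out.append(f"{current}{count}")
    return "".join(out)

repetitive = rle_encode("AAAAAABBCDDEEEEEF")   # 17 characters in
varied = rle_encode("ABCDEF")                  # 6 characters in
print(repetitive, len(repetitive))
print(varied, len(varied))
```

The repetitive input shrinks from 17 to 12 characters, while the input without repeats doubles from 6 to 12.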
Huffman algorithm
The key to the Huffman algorithm is that "data that appears often is represented with fewer than 8 bits, and data that is rarely used can be represented with more than 8 bits."
A Huffman tree is used to assign a code to each character, giving frequent characters short codes and rare characters long codes; each character is a leaf node of the binary tree
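The tree construction can be sketched with a priority queue: repeatedly merge the two least frequent subtrees, prefixing 0 to the codes on one side and 1 to the codes on the other. This is one common way to build Huffman codes, written with the standard heapq module; the function name and the tie-breaking by insertion order are assumptions of the sketch.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    freq = Counter(text)
    # Heap entries: (frequency, unique tie-breaker, {character: code so far})
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:                       # only one distinct character
        return {ch: "0" for ch in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # the two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        merged = {ch: "0" + code for ch, code in left.items()}
        merged.update({ch: "1" + code for ch, code in right.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "AAAAAABBCDDEEEEEF"
codes = huffman_codes(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
print(codes, encoded_bits, 8 * len(text))
```

Because every character sits at a leaf, no code is a prefix of another, so the encoded bit stream can be decoded without separators.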
Reversible (lossless) compression and non-reversible (lossy) compression
7. Program running environment
Operating environment: operating system and computer hardware
Source code->Native code->Run
Windows overcomes hardware differences other than CPU, allowing different models to be compatible with the same program
Different CPUs use different machine languages. Therefore, when the same program is migrated to other CPUs, a CPU-specific native code compiler is required to recompile it into the corresponding native code.
Use virtual machines to obtain other operating system environments
Virtual PC for Mac makes Macintosh hardware behave like an AT-compatible computer, so that Windows can be installed on it
Java virtual machine
The Java virtual machine interprets Java bytecode one instruction at a time, converting it into native code as it runs.
BIOS
BIOS is stored in ROM and is a program pre-built into the computer host
In addition to basic control programs such as keyboard, disk, and graphics card, BIOS also has the function of starting the "boot program"
8. From source file to executable file
Source code needs to be compiled into native code to run
Native code is, in essence, a sequence of numeric values (usually displayed in hexadecimal)
The compiler is responsible for translating source code into native code
The .c file becomes an .obj file after being compiled by the compiler. At this time, the program still cannot run.
The linker combines multiple object files to generate an EXE file. This process is called linking, and the .exe file is produced only after the link step has been run.
Library files are packaged from multiple object files. By specifying a library file when linking, the linker can extract the required object files from it and link them with other object files to generate an EXE file.
The Windows API is the application programming interface of Windows
An API is, in essence, a set of functions. The object code of the API functions is stored in dynamic link library (DLL) files. The library file specified at link time does not contain the object code itself; it only records where the code can be found, so that it is located automatically when the program runs.
A library file that contains the object files themselves and is linked directly into the EXE file is called a static link library.
Variables and functions are required to run an executable file
In the EXE file, the memory addresses assigned to variables and functions are virtual. When the program runs, these virtual addresses are converted into actual memory addresses. The linker records every location that requires address translation at the beginning of the EXE file. This information is called relocation information.
Area composition in memory: variable space, function space, heap space, stack space
The stack is used to store local variables in functions and parameters that need to be passed.
Code that allocates and releases stack space is generated automatically by the compiler, so the programmer does not have to write it.
The heap is used to store arbitrary data
Heap space must be allocated and released explicitly by the program (malloc and free in C, new and delete in C++)
9. Operating system
Nature
It is a monitoring program with the function of loading and running programs.
portability
Source code written in a high-level programming language uses common functions that do not depend on a particular operating system. When it is compiled into native code for a given operating system, those functions call that system's own system calls. This is what makes the source code portable.
Hardware abstraction
Operating systems and high-level programming languages abstract the hardware so that programmers no longer have to worry about system calls and hardware
Features of Windows operating system
Available in 32-bit and 64-bit versions
Provide system calls through a set of API functions
Using GUI
Ability to print output in WYSIWYG format
Provide multi-tasking capabilities
Provide network and database functions
Automatic device driver installation via Plug and Play
10. Assembly Language and Native Code
Assembly language
Assembly language uses mnemonics, short names that correspond one-to-one to native code instructions
Source code written in assembly language must be converted into native machine code before it can be run.
The program that converts assembly language into native code is an assembler, and the conversion process is called assembly
You can also convert native code back into assembly language. This conversion process is called disassembly.
The C language compiler can also convert C language source code into assembly language source code
There are two types of instructions in assembly language
1. General instructions that will be converted into native code
2. Pseudo-instructions specifically for assembler
Pseudo-instructions are responsible for telling the assembler the structure of the program and the method of assembly, so they are also called assembler instructions.
In assembly language source code, text starting with the # sign is a comment
Syntax
opcode
Indicates the action of the command
operand
Indicates the operation object of the instruction
When the program runs, a stack area is set aside in memory, and a register (the stack pointer) keeps track of the top of the stack. Called functions use this stack for their parameters and variables, and the stack area is released after the program ends.
Call functions
Take the parameters from the stack and perform the operation, store the return value in the eax register, and take the return target address from the stack and let the process return
variable
global variables
Declared outside the function, all functions in the program can access it
local variables
Declared inside a function, it can only be accessed within the function in which it is declared.
Loop
conditional branch
11. Access hardware
Programs access hardware through the operating system
Input and output instructions
Port: memory inside the I/O controller
Temporarily stores input and output data
Ports are distinguished by port number, also called the I/O address
By specifying the port number in the in and out instructions, a program can exchange data with the I/O controller and complete input and output operations.
Interrupt handling
Pause the currently running program and run another program (the interrupt handler)
The I/O controller issues an interrupt request and the CPU runs the interrupt handler. An interrupt controller sits between the two and passes the requests from multiple devices to the CPU in order.
DMA
A method of data transmission between external devices and memory directly without being transferred by the CPU. It is commonly used in devices such as networks and disks.
PIO
The way data is transferred between external devices and memory through the CPU is called PIO
Display characters and images
Store the data in the video memory and display it on the monitor
The graphics card has independent video memory and image processor GPU
12. Machine Learning
concept
Programmers only write programs for learning. The content of this program is to let the computer read a large amount of data, then learn the characteristics of this data, and generate a recognition model
supervised learning
In supervised learning, the computer is given a large amount of data labelled with the correct answers.
step
(1) Divide learning data and answer data into training data and test data
(2) Use learning algorithms to learn training data and generate models
(3) Use test data to evaluate the performance of the model
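The three steps can be sketched end to end in plain Python. To keep the example self-contained it swaps in a very simple nearest-centroid classifier as the learning algorithm instead of a support vector machine, and the data points and labels are made up for illustration.

```python
# Labelled data: (features, correct answer) pairs, invented for this sketch.
data = [
    ((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.1, 0.9), "a"), ((0.9, 1.1), "a"),
    ((3.0, 3.1), "b"), ((3.2, 2.9), "b"), ((2.9, 3.0), "b"), ((3.1, 3.2), "b"),
]

# (1) Divide the data into training data and test data.
train = data[0:3] + data[4:7]
test = [data[3], data[7]]

# (2) Learn from the training data: the "model" is one average point per label.
def fit(samples):
    sums = {}
    for features, label in samples:
        total, count = sums.get(label, ([0.0] * len(features), 0))
        sums[label] = ([t + f for t, f in zip(total, features)], count + 1)
    return {label: [t / count for t in total]
            for label, (total, count) in sums.items()}

def predict(model, features):
    def squared_distance(center):
        return sum((f - c) ** 2 for f, c in zip(features, center))
    return min(model, key=lambda label: squared_distance(model[label]))

model = fit(train)

# (3) Evaluate the model on test data it has never seen.
correct = sum(predict(model, features) == label for features, label in test)
accuracy = correct / len(test)
print(accuracy)
```

The same three-step shape applies when a library classifier is used in place of the hand-written fit and predict functions.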
Machine learning algorithms
Support Vector Machines
tool
Python language
Libraries containing various machine learning related functions are provided in Python
Script mode: the Python interpreter interprets and executes source code written in advance
Interactive mode: start the Python interpreter directly, type the program line by line at the keyboard, and have each line interpreted and executed immediately (the machine learning examples use this mode)
Cross-validation
Cross-validation is a way of performing machine learning in which the roles of training data and test data are rotated
It lets you check whether the recognition rate of the learned model is biased by the particular choice of learning data.
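The rotation itself is just index bookkeeping. The sketch below shows a plain k-fold split, assuming the number of samples divides evenly by k (any leftover samples would only ever be used for training); the function name is made up for the example.

```python
def k_fold_splits(n_samples, k):
    indices = list(range(n_samples))
    fold_size = n_samples // k
    for fold in range(k):
        start, stop = fold * fold_size, (fold + 1) * fold_size
        test_idx = indices[start:stop]                 # this fold is held out
        train_idx = indices[:start] + indices[stop:]   # the rest is for training
        yield train_idx, test_idx

splits = list(k_fold_splits(6, 3))
for train_idx, test_idx in splits:
    print(train_idx, test_idx)
```

Training and evaluating once per split, then comparing the recognition rates, shows whether the result depends on which part of the data was used for learning.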