Computer architecture is the set of attributes of a computer as seen by programmers, that is, its logical structure and functional characteristics, including the interrelationships among its hardware and software components. For computer system designers, computer architecture refers to the study of the basic design ideas of computers and the resulting logical structure; for programmers, it refers to the functional description of the system (such as the instruction set and programming methods).
Computer architecture refers to the system structure of software and hardware. It has two meanings. The first is the system structure seen from the programmer's perspective: it studies the conceptual structure and functional characteristics of the computer system, which relate to the characteristics of software design. The second is the system structure seen from the hardware designer's perspective, which is actually the composition or realization of the computer system (see computer organization) and focuses mainly on a reasonable performance-price ratio. To describe and study the attributes (external characteristics) of computers as seen from the programming perspective, Amdahl and others first proposed the concept of computer system structure in 1964.
The conceptual structure and functional characteristics are the computer attributes seen from the programmer's perspective. They include the representation of data in the machine, the addressing modes, the operations on these data, and the control of the execution of these operations (namely, the instruction system). For general-purpose machines, they generally include data representation, addressing modes, register definitions, the instruction system, the interrupt mechanism, the definition of machine working states and state switching, the machine-level input and output structure, and support for information protection.
Computer architecture mainly studies the allocation of functions between software and hardware and the definition of the software-hardware interface. Since the 1970s, significant progress has been made in computer software. Although computers have made huge breakthroughs in performance, speed, price, reliability, organization, and implementation technology compared with the end of the 1950s, their system structure has made no obvious breakthrough. The system structure of most machines still falls within the scope of the von Neumann type, and the system structure seen by programmers has changed little since the end of the 1950s. With regard to the instruction system, for example, programmers are basically still designing far more complex software on the basis of the computer system structure of the late 1950s. The serious gap between the hardware composition of traditional computers and high-level languages and operating systems adversely affects the reliability of software, the efficiency of source program compilation, and the efficiency with which the system solves problems. This is an important problem that computer architecture needs to resolve. The data flow computer architecture that appeared in the 1970s replaced the control-flow-driven execution of traditional computers with data-driven execution, making it possible to automatically remove the obstacles posed by computational dependences and achieve a high degree of parallelism.
The allocation of functions between software and hardware should be considered mainly in terms of implementation cost and the impact on speed and other performance requirements, that is, which allocation improves the performance-price ratio. Implementing the basic, general functions of the operating system in hardware or firmware helps improve the operating system's execution efficiency and speed and reduces overhead, while implementing unstable functions, those that need to change constantly, in software provides the necessary flexibility. Implementation costs include development costs and repetitive production costs. Hardware design and repetitive production cost more than their software counterparts. Functions suitable for hardware implementation should be stable, commonly used, and relatively small, and should be ones whose slower software implementation would significantly hurt the performance of the computer system. Hardware implementation is economically beneficial only for computer systems produced in large volume.
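The trade-off described above can be sketched with a toy cost model. Everything here is an assumption made for illustration: the `Option` class, the cost figures, and the relative performance numbers are hypothetical and are not taken from any real system. The sketch amortizes the one-time design cost over the production volume and compares the resulting performance-price ratio.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A hypothetical way of implementing one function (all figures invented)."""
    name: str
    design_cost: float   # one-time development cost
    unit_cost: float     # repetitive production cost per machine
    performance: float   # relative speed of the function

    def cost_per_unit(self, volume: int) -> float:
        # The design cost is spread over every machine produced.
        return self.design_cost / volume + self.unit_cost

    def perf_price(self, volume: int) -> float:
        # Performance-price ratio at a given production volume.
        return self.performance / self.cost_per_unit(volume)


# Assumed figures: hardware costs more to design and to produce, but runs faster.
software = Option("software", design_cost=10_000, unit_cost=1.0, performance=1.0)
hardware = Option("hardware", design_cost=500_000, unit_cost=5.0, performance=10.0)


def better_option(volume: int) -> str:
    """Return the name of the option with the higher performance-price ratio."""
    return max((software, hardware), key=lambda o: o.perf_price(volume)).name


if __name__ == "__main__":
    for volume in (100, 10_000, 1_000_000):
        print(volume, better_option(volume))
```

With these assumed numbers, software wins at small volumes and hardware wins only once the volume is large enough to amortize its design cost, which matches the qualitative claim in the text.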
Eight kinds of attributes
1. In-machine data representation: the data types and formats that the hardware can directly recognize and operate on
2. Addressing mode: the smallest addressable unit, the types of addressing modes, and address calculation
3. Register organization: the definition, number, and usage rules of operation registers, index registers, control registers, and special registers
4. Instruction system: the operation types and formats of machine instructions, and the ordering and control mechanisms between instructions
5. Storage system: the minimum addressing unit, addressing method, main memory capacity, and maximum addressable space
6. Interrupt mechanism: interrupt types, interrupt levels, interrupt response methods, etc.
7. Input and output structure: how input and output devices are connected, how data is exchanged between the processor/memory and input and output devices, and how the data exchange process is controlled
8. Information protection: information protection methods and hardware information protection mechanisms
Computer architecture is the conceptual structure and functional characteristics of a computer. It refers to the attributes of the computer system as seen by the system programmer, and also includes the logical structure of the computer system as seen by the machine designer. In short, it is a detailed description of the relationships among the various parts of the computer, and it is a comprehensive concept spanning hardware, software, algorithms, and languages. Computer architecture is also called computer system structure. It has developed into a subject with a wide range of content and has become a compulsory course for computer majors in colleges and universities. The main topics of its research are as follows:
1. The instruction system includes the operation type, format, and addressing of machine instructions.
2. The storage system includes multi-level storage hierarchy, virtual storage structure, high-speed buffer storage structure and storage protection.
3. The input and output system includes channel structure, input and output processor structure, etc.
4. The central processing unit structure includes von Neumann structure, non-von Neumann structure, overlapping structure, pipeline structure, parallel processing structure, etc.
5. Multi-computer system includes interconnection technology, multi-processor structure, distributed processing structure, computer network structure, etc.
6. Human-machine communication links include man-machine interface, computer reliability, availability and maintainability (the three are called RAS technology), fault-tolerant technology, fault diagnosis, etc.
In addition, the computer hardware description language, computer system performance evaluation and other content are also studied.
Computer architecture has gone through four different stages of development.
The period before the mid-1960s was the early era of the development of computer architecture. In this period, general-purpose hardware had become quite common, but software was written specially for each specific application. Most people thought that software development did not need to be planned in advance. The software of this time was in effect a small-scale program, and the programmer and the user of the program were often the same person (or the same group of people). Because of the small scale, programs were quite easy to write, there was no systematic method, and software development work was not managed. This individualized software environment made software design a vague process carried out implicitly in people's minds. Apart from the program listing, no other documents were preserved at all.
The period from the mid-1960s to the mid-1970s was the second generation of computer architecture development. During these 10 years, computer technology made great progress. Multi-program, multi-user systems introduced the new concept of human-computer interaction, opened up a new realm of computer applications, and brought the coordination of hardware and software to a new level. Real-time systems could collect, analyze, and transform data from multiple information sources, so that process control could be performed in milliseconds instead of minutes. Advances in online storage technology led to the emergence of the first generation of database management systems. An important feature of this stage was the emergence of "software workshops" and the extensive use of product software. However, "software workshops" basically still used the individualized software development methods formed in the early days. With the increasing popularity of computer applications, the amount of software expanded dramatically. Errors found while a program was running had to be corrected; programs had to be modified when users had new requirements; and when hardware or operating systems were updated, programs usually had to be modified to adapt to the new environment. This software maintenance work consumed resources at an alarming rate. More seriously, the individualized nature of many programs made them ultimately unmaintainable. The "software crisis" began to appear. In 1968, computer scientists from the North Atlantic Treaty Organization held an international conference in the Federal Republic of Germany to discuss the software crisis. At this conference, the term "software engineering" was formally proposed and used, and a new engineering discipline was born.
The third generation of computer architecture development began in the mid-1970s and spanned a full 10 years. During these 10 years, computer technology again made great progress. Distributed systems greatly increased the complexity of computer systems. Local area networks, wide area networks, broadband digital communications, and the increased demand for "instant" data access all placed higher demands on software developers. During this period, however, software was still used mainly in industry and academia, with very few personal applications. The main feature of this period was the emergence and widespread use of microprocessors. "Smart" products with microprocessors at their core could be seen everywhere. The most important smart product, of course, was the personal computer. In less than 10 years, personal computers became a popular commodity.
The fourth generation of computer architecture development began in the mid-1980s and has continued to the present. At this stage, what people feel is the combined effect of hardware and software. Powerful desktop computers, local area networks and wide area networks controlled by complex operating systems, combined with advanced application software, have become the current mainstream. Computer architecture has rapidly changed from a centralized host environment to a distributed client/server (or browser/server) environment. The worldwide information network provides conditions for people to conduct extensive exchanges and fully share resources. The software industry has already occupied a pivotal position in the world economy. With the advancing of the times, new technologies continue to emerge. Object-oriented technology has rapidly replaced traditional software development methods in many fields.
The "fourth generation technology" of software development has changed the way the software industry develops computer programs. Expert systems and artificial intelligence software finally came out of the laboratory and entered practical applications, solving a large number of practical problems. Artificial neural network software using fuzzy logic has demonstrated the bright prospects of pattern recognition and anthropomorphic information processing. Virtual reality technology and multimedia systems make it possible to communicate with users in completely different ways than before. Genetic algorithms make it possible for us to develop software that resides on large parallel biological computers.
Computer architecture addresses the problems that a computer system must solve at the overall and functional level. It is a different concept from computer composition (organization) and computer implementation. One architecture may have multiple compositions, and one composition may have multiple physical implementations.
Computer composition is the logical realization of the computer system structure, including the composition of the machine's internal data flow and control flow and its logical design. Its goal is to combine the various components and devices rationally into a computer that realizes a specific system structure while meeting the desired cost-performance ratio. Generally speaking, the scope of computer composition research includes: determining the width of the data path, determining the degree to which functional components are shared by different operations, determining dedicated functional components, determining the parallelism of functional components, designing buffering and queuing strategies, designing control mechanisms, and determining which reliability techniques to use.
Computer implementation is the physical realization of computer composition. It includes the physical structure of the processor, main memory, and other components; the integration level and speed of devices; the division and connection of devices, modules, plug-ins, and backplanes; the design of special devices; signal transmission technology; power supply, cooling, and assembly technologies; and the related manufacturing processes and techniques.
In 1966, Michael J. Flynn proposed classifying computer systems according to the parallelism of their instruction streams and data streams, defined as follows.
·Instruction stream: the sequence of instructions executed by the machine
·Data stream: the sequence of data called by the instruction stream, including input data and intermediate results
·Degree of parallelism: the maximum number of instructions or data items that can be executed in parallel
Flynn divides computer systems into 4 categories according to how the instruction streams and data streams are organized.
1·Single Instruction Stream Single Data Stream (SISD)
SISD is the traditional sequentially executing single-processor computer. Its instruction unit decodes only one instruction at a time and assigns data to only one operating unit.
2·Single Instruction Stream Multiple Data Stream (SIMD)
SIMD is represented by the parallel processor. The structure is shown in Figure 1. The parallel processor includes multiple identical processing units, PU1 to PUn, controlled by a single instruction unit; following the same instruction stream, each is assigned the different data it needs.
3·Multiple Instruction Stream Single Data Stream (MISD)
An MISD machine has n processing units that, following n different instructions, process the same data stream and its intermediate results in different ways. The output of one processing unit serves as the input of another.
4·Multiple Instruction Stream Multiple Data Stream (MIMD)
MIMD refers to multi-machine systems that achieve parallelism at every level, including jobs, tasks, and instructions. Multiprocessors belong to MIMD.
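The four categories above depend only on whether there is one or more than one instruction stream and data stream, which can be sketched as a small lookup. The names `FlynnClass` and `classify_flynn` are illustrative and do not come from the original text.

```python
from enum import Enum


class FlynnClass(Enum):
    SISD = "single instruction stream, single data stream"
    SIMD = "single instruction stream, multiple data streams"
    MISD = "multiple instruction streams, single data stream"
    MIMD = "multiple instruction streams, multiple data streams"


def classify_flynn(instruction_streams: int, data_streams: int) -> FlynnClass:
    """Map stream counts to one of Flynn's four categories."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    multiple_instructions = instruction_streams > 1
    multiple_data = data_streams > 1
    if not multiple_instructions and not multiple_data:
        return FlynnClass.SISD
    if not multiple_instructions:
        return FlynnClass.SIMD
    if not multiple_data:
        return FlynnClass.MISD
    return FlynnClass.MIMD


if __name__ == "__main__":
    # A sequential uniprocessor, an array processor with 8 processing units,
    # and a 4-processor multiprocessor.
    for streams in ((1, 1), (1, 8), (4, 1), (4, 4)):
        print(streams, classify_flynn(*streams).name)
```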
In 1972, Tse-yun Feng (Feng Zeyun) proposed classifying computer architectures by their maximum degree of parallelism. The maximum degree of parallelism Pm is the maximum number of binary digits that a computer system can process in a unit of time. Assuming that the number of binary bits that can be processed in clock cycle Δti is Pi, the average parallelism over T clock cycles is Pa = (ΣPi)/T (where i = 1, 2, ..., T). The average degree of parallelism depends on the running degree of the system and has nothing to do with the application. Therefore, the average utilization rate of the system over the period T is μ = Pa/Pm = (ΣPi)/(T*Pm). A computer system is represented by a point in a rectangular coordinate system. The abscissa represents the word width (N bits), that is, the number of binary digits of one word that are processed at the same time; the ordinate represents the bit slice width (M bits), that is, the number of words that can be processed simultaneously in one bit slice. The maximum degree of parallelism is then Pm = N*M.
Four different computer structures are derived from this:
①Word serial, bit serial (WSBS for short), where N=1 and M=1.
②Word parallel, bit serial (WPBS for short), where N=1 and M>1.
③Word serial, bit parallel (WSBP for short), where N>1 and M=1.
④Word parallel, bit parallel (WPBP for short), where N>1 and M>1.
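Feng's classification and the formulas for Pm, Pa, and μ can be worked through in a short sketch. The function names and the sample per-cycle bit counts are assumptions chosen for illustration; only the formulas and the four categories come from the text.

```python
from typing import Sequence


def classify_feng(n: int, m: int) -> str:
    """Classify a machine by word width N and bit slice width M."""
    if n < 1 or m < 1:
        raise ValueError("N and M must both be at least 1")
    if n == 1 and m == 1:
        return "WSBS"  # word serial, bit serial
    if n == 1:
        return "WPBS"  # word parallel, bit serial
    if m == 1:
        return "WSBP"  # word serial, bit parallel
    return "WPBP"      # word parallel, bit parallel


def max_parallelism(n: int, m: int) -> int:
    """Pm = N * M."""
    return n * m


def average_parallelism(bits_per_cycle: Sequence[int]) -> float:
    """Pa = (sum of Pi) / T, where T is the number of clock cycles."""
    return sum(bits_per_cycle) / len(bits_per_cycle)


def utilization(bits_per_cycle: Sequence[int], n: int, m: int) -> float:
    """mu = Pa / Pm."""
    return average_parallelism(bits_per_cycle) / max_parallelism(n, m)


if __name__ == "__main__":
    # Hypothetical 32-bit machine that handles one word at a time (N=32, M=1),
    # observed for T=4 cycles with an invented number of bits processed per cycle.
    n, m = 32, 1
    observed = [32, 16, 16, 0]
    print(classify_feng(n, m), max_parallelism(n, m))
    print(average_parallelism(observed), utilization(observed, n, m))
```

For this invented trace, Pm = 32 and Pa = 16, so the machine is used at half of its maximum degree of parallelism.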
Current computer architecture is based on Turing machine theory and belongs to the von Neumann architecture. Essentially, Turing machine theory and the von Neumann architecture are one-dimensional and serial, while multi-core processors have a distributed, discrete, parallel structure, and the mismatch between the two needs to be resolved.
The first problem is matching the serial Turing machine model to the physically distributed multi-core processor. The Turing machine model implies a serial programming model, and it is difficult for serial programs to use multiple physically distributed processor cores to gain performance. At the same time, parallel programming models have not been widely adopted and remain confined to limited fields such as scientific computing. Researchers should seek suitable mechanisms to match the serial Turing machine model to the physically distributed multi-core processor, or at least to narrow the gap between the two, in order to solve the problem that "parallel programs are hard to write, and serial programs gain little acceleration".
To support multi-threaded parallel applications, future multi-core processors should be considered from two directions. The first is to introduce new programming models that express parallelism better. Because a new programming model lets programmers state the parallelism of a program explicitly, performance can be greatly improved. For example, the Cell processor provides different programming models to support different applications. The difficulty lies in how to promote the programming model effectively and how to solve compatibility problems. The second direction is to provide better hardware support that reduces the complexity of parallel programming. Parallel programs often need locks to achieve synchronization and mutual exclusion on critical resources. The programmer must choose the placement of locks carefully, because a conservative locking strategy limits the performance of the program, while a precise locking strategy greatly increases programming complexity. Some studies have made effective explorations in this regard. For example, the Speculative Lock Elision mechanism allows the lock operations executed by a program to be elided when no conflict occurs, thus reducing programming complexity while preserving the performance of parallel execution. This mechanism lets programmers concentrate on the correctness of the program without having to worry too much about its execution performance. More radically, the Transactional Coherence and Consistency (TCC) mechanism handles data consistency in units of multiple memory access operations (transactions), which further reduces the complexity of parallel programming.
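The trade-off between a conservative and a precise locking strategy can be sketched in software. This is only an illustration of the programming burden that mechanisms such as Speculative Lock Elision and TCC aim to reduce; it does not model those hardware mechanisms, and the class and function names are hypothetical. In standard CPython the global interpreter lock prevents a real speedup, so the sketch shows structure rather than performance.

```python
import threading


class CoarseCounter:
    """Conservative strategy: one lock guards every bucket, so all threads serialize."""

    def __init__(self, buckets: int) -> None:
        self._lock = threading.Lock()
        self._counts = [0] * buckets

    def increment(self, bucket: int) -> None:
        with self._lock:
            self._counts[bucket] += 1

    def total(self) -> int:
        with self._lock:
            return sum(self._counts)


class FineCounter:
    """Precise strategy: one lock per bucket, so unrelated updates do not block each other."""

    def __init__(self, buckets: int) -> None:
        self._locks = [threading.Lock() for _ in range(buckets)]
        self._counts = [0] * buckets

    def increment(self, bucket: int) -> None:
        with self._locks[bucket]:
            self._counts[bucket] += 1

    def total(self) -> int:
        # Extra complexity: take every lock in a fixed order to read a
        # consistent snapshot without risking deadlock.
        for lock in self._locks:
            lock.acquire()
        try:
            return sum(self._counts)
        finally:
            for lock in reversed(self._locks):
                lock.release()


def hammer(counter, threads: int = 4, buckets: int = 4, per_thread: int = 1000) -> int:
    """Run several threads that increment buckets, then return the total."""

    def work(thread_id: int) -> None:
        for i in range(per_thread):
            counter.increment((thread_id + i) % buckets)

    workers = [threading.Thread(target=work, args=(t,)) for t in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return counter.total()


if __name__ == "__main__":
    print(hammer(CoarseCounter(4)), hammer(FineCounter(4)))
```

Both versions are correct, but the fine-grained one needs a lock-ordering rule in `total` that the coarse one does not, which is the kind of added complexity the text refers to.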
Mainstream commercial multi-core processors are aimed mainly at parallel applications, so how to use multiple cores to accelerate serial programs remains a problem worthy of attention. The key technique is to use software or hardware to derive, automatically, code or threads that can execute in parallel on a multi-core processor from a serial program. There are three main methods for multi-core acceleration of serial programs: parallelizing compilers, speculative multi-threading, and thread-based prefetching. In traditional parallel compilation, the compiler must spend a great deal of effort to ensure that there are no data dependences between the threads it creates. Many dependences are ambiguous at compile time, especially when pointers are allowed (as in C programs), so the compiler has to adopt conservative strategies to guarantee correct execution. This greatly limits the concurrency that can be extracted from serial programs and means that parallelizing compilers can be used only in a narrow range of cases. To solve these problems, speculative multi-threading and thread-based prefetching were proposed. However, since these ideas were put forward, most research in this direction has been confined to academia; only a few commercial processors have applied the technology, and only in special application areas. We believe that combining dynamic optimization with speculative multi-threading (including thread-based prefetching) is a possible future development trend.
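The dependence check that a parallelizing compiler has to perform can be sketched with Bernstein's conditions, a standard test that is not named in the original text and is used here as an illustrative stand-in: two pieces of code may run in parallel if neither writes a location that the other reads or writes. The location names and the function name are hypothetical.

```python
def can_run_in_parallel(reads_a: set, writes_a: set,
                        reads_b: set, writes_b: set) -> bool:
    """Bernstein's conditions for two code fragments A and B."""
    flow = writes_a & reads_b     # B reads what A writes
    anti = reads_a & writes_b     # B writes what A reads
    output = writes_a & writes_b  # both write the same location
    return not (flow or anti or output)


if __name__ == "__main__":
    # Two iterations of `a[i] = b[i] + 1`: no shared locations are written.
    print(can_run_in_parallel({"b[0]"}, {"a[0]"}, {"b[1]"}, {"a[1]"}))

    # Two iterations of `a[i] = a[i-1] + 1`: the second reads what the first writes.
    print(can_run_in_parallel({"a[0]"}, {"a[1]"}, {"a[1]"}, {"a[2]"}))

    # A write through a pointer whose target is unknown at compile time:
    # a conservative compiler must assume it may touch every location.
    everything = {"a[0]", "a[1]", "b[0]", "b[1]"}
    print(can_run_in_parallel({"b[0]"}, everything, {"b[1]"}, {"a[1]"}))
```

The last case shows why ambiguous pointers force conservative decisions: once the write set has to be widened to every location the pointer might reach, the check fails even if the actual accesses would never have conflicted.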
The second problem is matching the one-dimensional address space of the von Neumann architecture to the multi-dimensional memory access hierarchy of the multi-core processor. Essentially, the von Neumann architecture uses a one-dimensional address space. Uneven data access latencies and the presence of different copies of the same data on multiple processor cores give rise to data consistency problems. Research in this field falls into two categories. The first mainly introduces a new level in the memory access hierarchy. The new level may adopt a one-dimensional distributed implementation; a typical example is the addition of a distributed register network with unified addressing. Global unified addressing avoids the need to consider data consistency, and registers provide faster access than a traditional large-capacity cache. Both TRIPS and RAW have implemented similar inter-core register networks. The new level can also be private, for example by giving each processor core its own private memory access space. The advantage is that the data storage space is better partitioned and data consistency need not be considered for private data. For example, the Cell processor gives each SPE core a private data buffer. The second category mainly involves developing new cache consistency protocols. The important trend is to loosen the coupling between correctness and performance. For example, a speculative cache protocol speculatively executes the related instructions before data consistency is confirmed, thereby reducing the impact of long-latency memory access operations on the pipeline. Token Coherence and TCC have adopted similar ideas.
The third problem is matching the diversity of programs to a single system structure. Future applications will be diverse. On the one hand, the evaluation of a processor is not limited to performance but also includes other indicators such as reliability and security. On the other hand, even if only performance is considered, different applications contain different levels of parallelism. The diversity of applications drives future processors toward configurable, flexible architectures. TRIPS has made fruitful explorations in this regard. For example, its processor cores and on-chip storage system are configurable, so that TRIPS can exploit instruction-level parallelism, data-level parallelism, and thread-level parallelism at the same time.
The emergence of new processing structures such as multi-core and Cell is not only a landmark event in the history of processor architecture, but also a subversion of traditional computing models and computer architecture.
In 2005, a series of far-reaching computer architectures were unveiled, which may lay a foundation for the computer architecture of the next ten years, or at least provide symbolic guidance for processors and even for computer architecture as a whole. As computing density increases, the standards and methods for measuring processor and computer performance are changing. From the application point of view, a satisfactory balance between mobility and performance has been found, which may trigger a rapid expansion of handheld devices. Handheld devices are already relatively widespread, and in computing power, scalability, and energy consumption they fully play the role that a handheld device should have; on the other hand, performance-oriented servers and desktop terminals are beginning to consider reducing power consumption, in line with the trend toward a conservation-minded society.
Cell both adapts to this change and helps create it, so it has emphasized a different design style from the beginning. Besides supporting expansion to multiple processors, its internal SPUs (Synergistic Processor Units) scale well, so the processor can serve both general-purpose and special-purpose processing and allows processing resources to be reconfigured flexibly. This means that, with appropriate software control, Cell can cope with many types of processing tasks while also keeping the complexity of the design down.