THE SIMPLE COMPUTER (SIM)

0.0 THE SIM PROCESSOR

0.1 INTRODUCTION

The C program described here simulates a very simple computer (SIM) that is designed to illustrate features of computer architecture and machine language. SIM consists of a central processing unit (CPU) or processor, a memory, and a console of some kind that allows an operator to change and display the contents of memory locations and processor registers.

Memory consists of 1000 memory cells, each of which contains a 4-digit decimal number called a word. The processor also contains storage locations called processor registers or simply registers. The programmer of the SIM computer must be aware of two registers. The first is called the accumulator (abbreviated acc). It contains a 4-digit decimal number and is analogous to the display window in a calculator. The second register is called the instruction pointer (abbreviated ip). ip contains a 3-digit decimal number that specifies the address of the next instruction to be executed.

    +-------------CPU-------------+
    |                             |
    |      +------+      +-----+  |
    |  acc | 0000 |   ip | 000 |  |
    |      +------+      +-----+  |
    |                             |
    +-----------------------------+
                  / \
                 /| |\
                  | |
                  | |
                 \| |/
                  \ /
          +----------------+
          |                |
          |  addr   cont   |
          |       +------+ |
          |  000  | 1004 | |
          |       +------+ |
          |  001  | 3005 | |
          |       +------+ |
          |  002  | 2006 | |
          |       +------+ |
          |  003  | 0000 | |
          |       +------+ |
          |  004  | 5555 | |
          |       +------+ |
          |  005  | 0234 | |
          |       +------+ |
          |  006  | 0000 | |
          |       +------+ |
          |  ...    ...    |
          |       +------+ |
          |  998  | 0000 | |
          |       +------+ |
          |  999  | 0000 | |
          |       +------+ |
          |                |
          +-----MEMORY-----+

Memory is a passive device that responds to fetch and store requests from the processor. To perform a fetch, the processor sends memory a 3-digit address (e.g. 005) and memory responds by sending the word at that address (e.g. 0234) back to the processor. The contents of memory are not changed. To perform a store, the processor sends memory an address (e.g. 006) and new contents (e.g. 5789). The new contents (5789) replace the old contents (0000).

0.2 PROGRAM EXECUTION

The processor executes programs by blindly performing the following steps.

    1. Using the instruction pointer (ip), fetch the next instruction from memory.
    2. Add 1 to ip (so that instructions are fetched from consecutive memory cells).
    3. Execute the instruction fetched in step 1.
    4. Go back to step 1 to repeat the process.

Graphically:

         +-------------------+      +---------------+     +----------------+
    +--->| inst = memory[ip] |--->  |  ip = ip + 1  |---> |  execute inst  |--->+
    ^    +-------------------+      +---------------+     +----------------+    |
    |                                                                           v
    +<--------------------------------------------------------------------------+

The processor continues fetching and executing instructions until it encounters a halt instruction or an illegal (unimplemented) instruction. In the example above, ip initially contains 000, and if the computer is started, the processor will execute the instruction in address 000 (1004) followed by the instructions in addresses 001 (3005), 002 (2006) and 003 (0000). Because 0000 is the halt instruction, the processor will halt after executing the instruction in address 003.

We will trace the execution of this program in more detail. Because ip contains 000, the processor fetches the instruction in memory address 000 (namely 1004) and increments ip to 001. The instruction 1004 is an example of a ld (for load accumulator from memory) instruction. Load instructions have the format 1xxx, where xxx specifies the address containing the word to be loaded into the accumulator. In this case, the address is 004, so the instruction 1004 will load the number in address 004, namely 5555, into the accumulator. At this point, the processor registers appear as follows. (Note that ip has been incremented to 001.)

             +------+      +-----+
         acc | 5555 |   ip | 001 |
             +------+      +-----+

Since ip now contains 001, the processor will fetch the instruction (3005) from address 001 and increment ip to 002. The instruction 3005 is an example of an add instruction with format 3xxx.
In this case, xxx is 005, so the processor will add the contents of address 005 (namely 0234) to the accumulator, producing:

             +------+      +-----+
         acc | 5789 |   ip | 002 |
             +------+      +-----+

Since ip now contains 002, the processor fetches the instruction 2006 from address 002 and increments ip to 003. 2006 is an example of a st (for store accumulator into memory) instruction with format 2xxx. The processor stores the word 5789 into location 006, erasing the previous contents (0000). The processor registers now contain:

             +------+      +-----+
         acc | 5789 |   ip | 003 |
             +------+      +-----+

Next the processor fetches the halt instruction (0000) from address 003 and increments ip to 004. The processor then stops executing instructions, leaving 5789 in acc, 004 in ip, and 5789 in memory location 006.

0.3 FILE DESCRIPTIONS

The SIM computer is introduced using a series of four C programs contained in the four files sim1.c, sim2.c, sim3.c, and sim4.c. The SIM1 program implements only simple data movement and arithmetic instructions, so only straight-line code can be implemented. SIM2 introduces jump and conditional branch instructions to implement loops and conditional statements. SIM3 introduces input and output instructions as well as additional data manipulation instructions. Finally, SIM4 replaces ip and acc with ten general registers and adds instructions to manipulate these registers. Each successive SIM version is backward compatible, so that the SIM4 computer will execute any program written for the earlier SIMs. In summary, the C program files are:

    * sim1.c    In-line data movement and arithmetic.
    * sim2.c    Adds jump and conditional branch instructions.
    * sim3.c    Adds input, output, and additional data manipulation instructions.
    * sim4.c    Converts sim3.c to a general register machine.
    * sim5.c    A programming assignment.

The file extensions .sim1, .sim2, .sim3, and .sim4 are used to identify SIM machine language programs designed to run on the corresponding SIM computer.
The files include:

    * addv1.sim1     Computes 20+30
    * addv2.sim1     Computes 20+30 with assembly and high level language comments
    * addv3.sim1     Initializes variables a and b and computes c = a + b
    * addv4.sim1     Same as addv3.sim1 with program relocated to different addresses
    * loopv1.sim2    Computes 4+3+2+1 using a loop
    * readv1.sim3    Inputs two numbers and prints the sum
    * readv2.sim3    Inputs n numbers and prints the sum
    * sum1to10.sim4  Sums numbers from 10 to 1 to illustrate use of the general registers
    * reverse.sim4   Inputs n numbers and outputs them in reverse order to illustrate arrays
    * mult.sim4      Inputs 2 numbers and outputs their product to illustrate functions
    * addn.sim4      Uses registers to implement readv2.sim3 more efficiently
    * square.sim5    Uses push and pop to save the link register
    * Square2.sim5   Uses call and ret instructions to call functions
    * Recurse.sim5   Computes 1+2+3+... using recursion

The file card.txt contains a 3-page SIM reference card that describes all of the features of the SIM1 through SIM5 computers. The file card.docx is a Microsoft Word version of the same document that fits on a single page. This document can be used during any quiz or examination. Finally, the current Microsoft Word document is available as simwriteup.docx or as five smaller text files sim1.txt, sim2.txt, sim3.txt, sim4.txt and sim5.txt.

    * sim1.txt to sim5.txt  Text files describing the SIM1 to SIM5 computers
    * simwriteup.docx       Microsoft Word description of SIM1 to SIM5
    * card.html             SIM reference card, HTML version
    * card.docx             SIM reference card, Microsoft Word version

1.0 THE SIMPLE COMPUTER VERSION 1 (SIM1)

1.1 A SAMPLE PROGRAM

The SIM1 simulator sim1.c is a C program that executes programs written in the SIM1 machine language. The SIM1 machine language instructions are described at the beginning of the SIM1 program in #define statements.

    #define HALT 0   /* halt processor */
    #define LD   1   /* load accumulator instruction */
    #define ST   2   /* store accumulator instruction */
    #define ADD  3   /* add (to accumulator) */
    #define SUB  4   /* subtract (from accumulator) */
    #define LDA  5   /* load address */

To create a SIM1 machine language program, the SIM1 programmer creates a data file such as addv1.sim1 (add version 1, sim1 machine language program).

    #addv1.sim1 - Add 20 to 30 and store the sum in address 006
    000 1004   Load the accumulator with the word at 004
    001 3005   Add to the accumulator the word at 005
    002 2006   Store the accumulator in address 006
    003 0000   Halt
    004 0020   Address 004 initialized to 20
    005 0030   Address 005 initialized to 30
    006 0000   Address 006 initialized to 0
    000        Execution starts at 000

Because the first line begins with the character “#”, it is assumed to be a comment. The second line specifies that address 000 should be initialized to 1004. The next six lines specify the initial contents of addresses 001 to 006. The text following the numbers is ignored when the program is loaded into the SIM1 memory. Because the last line contains only a single number, it specifies the address where execution is to begin and marks the end of the machine language program.

The file sim1.c is the C program that simulates a SIM1 computer. This source file is compiled into an executable file called a.out using the C compiler driver gcc (for GNU C Compiler). The following Unix commands are used to compile sim1.c and run it using the file addv1.sim1 as input.

    cis-lclient02:~>gcc sim1.c
    cis-lclient02:~>./a.out < addv1.sim1

The string “cis-lclient02:~>” is the prompt used by the Unix computer cis-lclient02. The command gcc sim1.c requests execution of the gcc program using the file sim1.c as input. This creates a file called a.out, a machine language version of the sim1.c program (a.out is the default name for executable files created by gcc).
On the second line, the command ./a.out < addv1.sim1 requests that the file ./a.out (the a.out file in the current directory) be executed using the file addv1.sim1 (rather than the keyboard) as input. The result is shown below.

    cis-lclient02:~>gcc sim1.c
    cis-lclient02:~>./a.out < addv1.sim1
    #addv1.sim1 - Add 20 to 30 and store the sum in address 006
    000 1004   Load the accumulator with the word at 004
    001 3005   Add to the accumulator the word at 005
    002 2006   Store the accumulator in address 006
    003 0000   Halt
    004 0020   Address 004 initialized to 20
    005 0030   Address 005 initialized to 30
    006 0000   Address 006 initialized to 0
    000        Starting address of SIM program
    Starting execution of SIM program at address 000
    cnt = 1, ip = 000, inst = 1004, acc = 0000
    cnt = 2, ip = 001, inst = 3005, acc = 0020
    cnt = 3, ip = 002, inst = 2006, acc = 0050
    cnt = 4, ip = 003, inst = 0000, acc = 0050
    Processor executed HALT instruction
    cnt = 4, ip = 004, inst = 0000, acc = 0050
    cis-lclient02:~>

The SIM1 simulator displays the contents of the input file (addv1.sim1) as well as the starting address of the program (000). As an aid to debugging, the SIM1 simulator program prints a line each time a machine language statement is executed, including a count of the total number of machine language statements executed (cnt) as well as the values of ip and acc before the instruction (inst) is executed. For example, the line:

    cnt = 3, ip = 002, inst = 2006, acc = 0050

shows that the third instruction to be executed was 2006 in address 002 and that the value of the accumulator before the instruction was executed was 0050.

Notice that there are two different machine language programs involved. The file a.out contains a machine language program created by gcc that will be executed by a 32-bit Intel processor. The file addv1.sim1 contains a SIM1 machine language program that will be interpreted by the code in a.out.

1.2 MACHINE LANGUAGE, ASSEMBLY LANGUAGE, AND HIGH LEVEL LANGUAGES

When electronic digital computers were developed during the late 1940s and early 1950s, programmers created programs using machine language. When a program contains thousands of lines of code, it can be difficult for programmers to remember the function of the various operation codes (e.g. 1xxx, 2xxx, etc.) as well as the function of data stored at particular addresses (e.g. memory cells 004, 005, and 006 in the program above).

Assembly languages were developed in the early 1950s to make programming easier. In assembly language programs, the numeric operation codes and addresses of a machine language program are replaced with symbolic operation codes and symbolic addresses. For example, the listing below displays the contents of the file addv2.sim1, the second version of our addition program. Functionally, this program is identical to the addv1.sim1 program. However, the comments have been changed to illustrate an assembly and a high level language version of the program.

The comments under the heading "ASSEMBLY LANGUAGE" show an assembly language version of the addition program. The operation codes 0000, 1xxx, 2xxx, and 3xxx have been replaced by the symbols halt, ld, st, and add. The addresses 000, 004, 005, and 006 have been replaced with the symbols start, a, b, and c. The symbol .word is an assembly directive that tells the assembler to reserve space in memory for a word of data and to initialize the word to a specified value. The assembly directive .end marks the end of the assembly language program and specifies the address at which execution should begin (in this case address start).

    #          ASSEMBLY LANGUAGE      HIGH LEVEL LANGUAGE
    000 1004   start: ld    a         c = a + b;
    001 3005          add   b
    002 2006          st    c
    003 0000          halt            exit(0);
    004 0020   a:     .word 20        int a = 20;
    005 0030   b:     .word 30        int b = 30;
    006 0000   c:     .word 0         int c = 0;
    000               .end  start

A program called the assembler inputs the assembly language program and outputs an equivalent machine language program that can be executed by the computer hardware. The assembler translates the assembly language program into machine language by substituting numbers for names using two tables: the symbol table and the opcode table.

    The Symbol Table          The Opcode Table

    symbolic  numerical       symbolic  numerical
    address   address         opcode    opcode

    a         004             halt      0000
    b         005             ld        1xxx
    c         006             st        2xxx
    start     000             add       3xxx
                              sub       4xxx
                              lda       5xxx

The opcode table is "built in" to the assembler. The assembler creates the symbol table by assigning successive assembly language statements to successive addresses in memory. The translation from assembly language to machine language is "one to one" in that each assembly language statement usually generates a single machine language instruction.

In the middle to late 1950s, high-level languages similar to the C language were developed. The comments under "HIGH LEVEL LANGUAGE" in the file listing above illustrate how the addv2.sim1 machine language program might be implemented in a high-level language. As the example shows, a single statement in a high-level language (c = a + b;) may be translated into several assembly and machine language statements (one to many). In addition to making programming easier, higher level languages make programs transportable between different "brands" of computers. Machine and assembly language programs will only run on a particular computer architecture, while a high level language program can be translated (by different translators or compilers) to run on any computer architecture.
We will comment our machine language using both assembly language and a high level language.

These language types are still used today. For example, when the command “gcc sim1.c” is entered, the gcc "driver" executes the C preprocessor, the C compiler, the GCC assembler, and the linker, producing several temporary files as well as the a.out machine language file that can be executed by the processor.

            +------------+        +--------+        +---------+        +------+
     -----> |preprocessor| -----> |compiler| -----> |assembler| -----> |linker| ---->
     sim1.c +------------+ sim1.i +--------+ sim1.s +---------+ sim1.o +------+ a.out

When the command ./a.out is entered, the Unix operating system invokes the loader, which places the machine language file a.out into (virtual) memory and starts execution of the program (here, the SIM1 simulator).

            +------+        +---------------------------+
     -----> |loader| -----> | execution of your program |
     a.out  +------+        +---------------------------+

The relationships between the computer hardware, machine language, assembly language, and high level languages are the focus of this course. The Intel 32-bit and AMD 64-bit architectures, GCC assembly language for those processors, and the C programming language will be used to illustrate these concepts.

1.3 THE addv3.sim1 AND addv4.sim1 PROGRAMS

The program addv3.sim1 illustrates the use of the LoaD Address (lda) instruction with format 5xxx. When an lda instruction of the form 5xxx is executed, the number 0xxx is loaded into the accumulator. (Memory cell xxx is not involved in any way.) The program addv3.sim1 is similar to addv2.sim1 except that memory cells 008 and 009 (variables a and b) are initialized by machine language statements during the execution of the program.

    #addv3.sim1 - Initialize A and B, and compute C as the sum
    #MACHINE LANGUAGE   ASSEMBLY LANGUAGE    HIGH LEVEL LANGUAGE
    000 5123            start: lda   123     a = 123;
    001 2008                   st    a
    002 5456                   lda   456     b = 456;
    003 2009                   st    b
    004 1008                   ld    a       c = a + b;
    005 3009                   add   b
    006 2010                   st    c
    007 0000                   halt          exit(0);
    008 0000            a:     .word         int a;
    009 0000            b:     .word         int b;
    010 0000            c:     .word         int c;
    000                        .end  start

The first two instructions of the program load the number 0123 into the accumulator and then store this number into memory word 008. (The equivalent assembly language and high level language are shown along with the machine language.) Variable b is initialized to 0456 by the third and fourth instructions of the program.

The program addv4.sim1 is the same as addv3.sim1 except that the code and data have been moved to different locations in memory.

    #addv4.sim1 - Initialize A and B, and compute C as the sum
    #MACHINE LANGUAGE   ASSEMBLY LANGUAGE    HIGH LEVEL LANGUAGE
    #                          .=100
    100 5123            start: lda   123     a = 123;
    101 2200                   st    a
    102 5456                   lda   456     b = 456;
    103 2201                   st    b
    104 1200                   ld    a       c = a + b;
    105 3201                   add   b
    106 2202                   st    c
    107 0000                   halt          exit(0);
    #                          .=200
    200 0000            a:     .word         int a;
    201 0000            b:     .word         int b;
    202 0000            c:     .word         int c;
    100                        .end  start

In addv3.sim1, the code and data use memory cells 000 through 010, while addv4.sim1 uses 100 through 107 for code and 200 through 202 for data. In assembly language, the symbol "." is called the location counter, and it specifies the address in memory where the code generated from the "next" line is to be placed. Normally, the assembler initializes the location counter to 000 and increments the counter after each assembly language statement is processed so that code and data occupy successive locations in memory. As the addv4.sim1 program illustrates, directly modifying the location counter provides the programmer with a very simple way to relocate (move) a program to another area of memory.
Virtual memory (to be covered in CIS 3207) provides a much more powerful and elegant method of relocating programs.

1.4 SIM1 QUESTIONS AND PROBLEMS

1. What value will the accumulator (acc) contain when the following program halts? (In machine language, instructions can be used as data, which is usually a bad idea.)

       000 1000
       001 3001
       002 0000
       000

2. What value will the accumulator (acc) contain when the following program halts? (The SIM memory words have no place for a + or - sign. However, numbers between 5000 and 9999 can be interpreted as negative numbers. If we were using this signed interpretation, how should the number 9999 be interpreted?)

       000 1003
       001 4004
       002 0000
       003 0050
       004 0051
       000

3. What value will memory location 012 contain when the following program halts?

       000 5002
       001 2012
       002 1012
       003 3012
       004 2012
       005 1012
       006 3012
       007 2012
       008 1012
       009 3012
       010 2012
       000

4. Write an assembly language program equivalent to program 3.

5. What value will the accumulator (acc) contain when the following program halts? (Even short programs can be very confusing.)

       000 5002
       001 3000
       002 4002
       003 2004
       004 0000
       005 0000
       000

6. As program 2 illustrated, the number 9999 can play the role of -1 in SIM1 machine language. As shown in sim1.c, the ADD instruction is implemented with the following code.

       case ADD:
           acc = acc + memory[digit234];
           if (acc > WORDLIMIT)               /* wrap if acc > 9999 */
               acc = acc - (WORDLIMIT + 1);   /* by subtracting 10,000 */
           break;

   As a result:

         1 + 9999 = 0
         2 + 9999 = 1
        20 + 9999 = 19
       500 + 9999 = 499

   a. What number would play the role of -10 (so that adding this number to, for example, 50 would yield 40)? (Hint: if you drove a new car backwards for a mile, the odometer (if it is mechanical) would go backwards from 000000 to 999999. What would happen if you went backwards for 10 miles?)

   b. Can you give a simple formula for taking the negative of any SIM1 number between 1 and 4999?

   c.
What happens if you apply the same formula to numbers between 5001 and 9999?

1.5 SIM1 PRACTICE QUIZ

Each of the following SIM1 programs begins execution at address 000. What will be in the accumulator when each program halts? (You can use your SIM programming card.)

1. acc = __ __ __ __

       000 1003
       001 3004
       002 0000
       003 0123
       004 0101

2. acc = __ __ __ __

       000 1003
       001 4004
       002 0000
       003 0123
       004 0101

3. acc = __ __ __ __

       000 5003
       001 3004
       002 0000
       003 0123
       004 0101

4. acc = __ __ __ __

       000 1001
       001 3004
       002 0000
       003 0123
       004 0101

5. acc = __ __ __ __

       000 1001
       001 3000
       002 0000
       003 0123
       004 0101