CMSC 144 Notes
Computer Systems
What are computer applications?
- C programs
- But what do those programs run on?
Computer systems provide environment for programs to run on hardware
- have multiple programs
- Efficient sharing of hardware is a key functionality of computer systems
Moore’s law → transistor count increasing exponentially (doubles every two years) (has broken down since 2000)
Computer systems provide an abstraction of different hardware
What is a computer system?
graph TD
Application <--> Physics
Application to physics → we need to bridge this gap through layers of abstraction
3 layers in the middle (that we’re going to focus on — there’s many more): OS, Instruction Set Architecture, Microarchitecture
OS: interfaces for controlling and coordinating the use of hardware among various applications and users
ISA: interface between hardware and software
Microarchitecture: implementation of ISA
- key to performance and efficiency
Focus of course is key design principles, not their implementation
- LogisticsDrop lowest HW score (it counts as an 100)
Lecture 0
Bits → 0 or 1
- noise in communication information shouldn’t be enough to flip a bit
Byte → 8 bits (minimum addressable unit in memory)
- can be written as binary, decimal, or hexademical
- hex in c → 0x
Word size → continuous block bits but usually longer than a byte (natural data size a CPU can process)
- for newer system, 64 bits
- Shorter word size → less possible memory addresses
- on a 32 bit machine 2^32 → 4gb of memory
- on a 64 bit machine 2^48→ 256 terabytes (so we don’t have to use all 64)
Word size = memory address size
Address of each word is the address of its first byte
Pointer takes 4 bits on a 32 bit machine and 8 bytes on a 64 bit machine (since it’s pointing to the word)
Byte Ordering
How are bytes ordered in memory?
Big endian: least significant byte has the highest address
Little endian: least significant byte has the lowest address (most common) so for 0x01234567 is represented as 67 45 23 01
- Why?
- Most numbers that are nonzero are the smaller numbers so put them first
- X-endian: least significant byte has the X-est address
Bit-Level Operations
If you’re checking if p is a pointer
p&&*p
Right shift:
Logical shift → fill with 0s
Arithmetic shift → replicate most significant bit
- C uses arithmetic right shifts for signed data
Integers
X = [x_w-1, … , x0]
For signed numbers, the most significant bit is 0 (nonnegative) or 1 (negative), but it’s summed differently
e.g., 1111 = -1 for signed while it equals 15 for unsigned (so for a 4 bit number, if it’s negative, it’s just 8 - the last 3 bits
To get from max to min (max positive to min neg) just apply not
Conversion Casting
Casting does not change bit level encoding! Just changes interpretation
Constants
- integers are considered signed integers unless explicitly defined as unsigned (capital U at end of value)
Implicit casting
unsigned int = int → casts int to unsigned int (vice versa will happen if int = unsigned int)
IF THERE IS A MIX OF UNSIGNED AND SIGNED IN A SINGLE EXPRESSION, SIGNED VALUES IMPLICITLY CAST TO SIGNED
Expanding and Truncating
Sign extension
When going from a short int to an int for example, copy the most significant bit k times (the number of bits you’re adding)
- Casting keeps the bit vector, but extension keeps the value
Truncating
keep bits on right and remove bits on the left
Unsigned Addition
Ignores carry output (truncates) — we get a modular sum
Signed Addition
Negative overflow = true sum is below lowest possible value
Positive overflow = true sum is greater than lowest possible value
Machine-Level Programming I: Basics of Instruction Set Architecture
ISA — interface between software and hardware
Object = state + methods
- state is data
- methods are operations on data
These methods are the interface of an object
Programming Interface
Software programming interface: programming languages
void swap(long *xp, long*yp) {
//change value of two pointers
}
Computer compiles this C code into assembly code (Instruction Set Architecture)
- On the left, you have methods and on the right you have states
swap:
movq (%rdi), %rax
//and more
Why is ISA important?
In early days of computers, there are incompatible lines of computers (new software for each hardware) → ISA enables for same software to work on all hardware
Tradeoff: software developers don’t worry about specs in hardware but need to rewrite ISA when you want to take advantage of new hardware (interface can limit innovation)
Architectural States
CPU
- State is called registers
Memory
- State is a big array with bytes
Memory Basics
- One long array of bytes — smallest addressable unit in memory (you can’t interface with specific bits)
- Each byte has a memory address
- Word → nominal size of integer-valued data in a machine
- Word is size of each memory address
- Word is size of CPU register
- When CPU talks to memory, it talks to memory in terms of words
- Word size limits the memory size!
- If a machine uses 8-bit words → 2^8 = 256 unique memory addresses (the max number of places the CPU can refer to)
- 32 bit words → 2^32 unique addresses → 4GB (not big enough)
- 64 bit words → 2^64… alot, 2^48 gives 256 terabytes
<aside> 💡 If you were to represent 234 in a short, what would it look?
</aside>
Multi-Byte Variables & Byte Ordering
Multi-Byte variable
Two conventions:
- Big Endian: least significant bit has highest address
- Little Endian: least significant bit has the lowest address (most common)
- For example, if you are moduloing, you simple need that first address (sometimes)
- Most frequently used part of the variable is that in the first address
Array memory layout
Little endian does not change the order of the elements in the array but it does change the byte order in the array
E.g.,
short a[2] = {1, 2}
Big Endian → 00 01 00 02
Little Endian → 02 00 01 00
Strings
char S[4] = "123";
Big Endian → 31 32 33 00
Little Endian → 31 32 33 00 (they’re all one byte so there’s no shift)
CPU
- Registers: heavily used program data (temporary variables)
- 16 general purpose registers
- Each has a size of 8 bytes (64 bits — that’s why ISA is called x86-64)
- there’s a lower 4 byte part to each register (that’s what the previous generation used)
Moving Data
movq$0x100, %rax
movq → move 8 bytes (q = 8 bytes) from somewhere to somewhere
%rax → destination
$0x100 → the value to be moved (in 8 byte format
Prefix $ indicates it’s a constant value
movq Source, Dest
Operand types
- Register: one of 16 integer registers
- Ex. movq %rax, %r13
Immediate: movq $0x100, %rax
- value
Absolute: movq 0x100, %rax
- 0x100 treated as memory address and copy value to %rax
Memory: movq (%rax), %r13
- Use value in %rax aas a memory address and copy teh value from memory to %r13
- There is no memory to memory (e.g., no movq (%rax), (%rdx) → this could be long so ew break it up, we can just do this with two instructions)
Machine-Level Programming II: Basics of ISA
x86-64 ISA (designed by intel)
- 64 bit word → 64 bit memory addresses and 64 bit registers
Registers can be used as anything as long as it’s 8 bytes — how we interpret it is up to us (e.g., placing parentheses around the value or taking the value as it is)
Every byte in memory has a memory address — you read starting from an address and then read 8 bytes after that
How mov works byte by byte
rax has 0x123456789abcded (massive number)
rdi has 0x120 (fills everything to left with 0s)
Assume little endian
movq → moves 8 bytes!
movl → moves 4 bytes!
movq %rax, (%rdi)
0x120 | ef |
---|---|
0x121 | cd |
… | … |
0x127 | 01 |
movq (%rdi), %rdx → rdx will have ef in lowest byte etc.
What if you used movl?
you only get 89abcdef
Simple Memory Addressing Modes
- So far, we have to figure out memory address and then put it in a cpu register and then use parentheses
- Normal
- movq (%rcx), %rax → copy Mem[Reg[rcx]] to rax
- Displacement
- movq 8(%rbp), %rdx → copy Mem[Reg[rbp] +8] to rdx
- D(rb)
- D(Rb, Ri, S)
- D: constant displacement (D bytes)
- Rb: base register
- RI: index register (how many steps)
- Scale: 1, 2, 4, 8 (bytes per step)
- D(Rb, Ri, S)
Example:
1(%rax, %rdi, 4)
- Start at %rax, move 4*%rdi, add 1
- 1(%rax, %rdi, 4) ⇒ %rax + 4*%rdi + 1
- if %rax = 0x120, %rdi = 0x2 → the above equals 0x129
Special cases on slides
Address Computation Instruction (leaq)
leaq Src, Dst (lea → load effective address) → just pointer arithmetic
- copies memory address, not value at that memory address
mov → pointer arithmetic and dereference value at that pointer
uses
int arr[] = [1, 2, 3, 4, 5]
return *p &arr[i]
%rdi has arr addr
%rsi has i
leaq(%rdi, %rsi, 4), %rax
Why 4? → int has 4 bytes
Arithmetic calculation
long m3(long x)
{
return x*3;
}
leaq(%rdi, %rdi, 2), %rax (Takes value and addits to 2 more times)
Two and one operand instructions on slides for HW
- there are two right shifts
- left shift can be used for multiplication (e.g., mult by 16 → shift 4 to the left)
%rax holds the return value!
Getting Assembly Code
clang -S -o [outputfile.s] [inputfile.c]
- does it line by line
Machine Level Programming Part 3
- So far, we’ve seen examples run linearly
- But what if has an if/else statement or a loop?
How can we use assembly to implement non-linear code?
ISA provides cmp, set, and jmp
- cmpq → compares two values
- jmp → jump to a location in assembly
cmpq
- Why not just do subtraction and look at that result?
- Changes value of one of the registers (could be good in certain instance)
- Needs a side effect to track result
- Condition codes! CF ZF SF OF (1 bit)
Two’s complement representation
Leading bit indicates sign
- if 1 → interpret as -2^n
Import properties
- values can’t exceed a range (this is overflow)
- -y is ~y+1
cmp
Consider x < y (signed)
How to compare on a bit level?
- if doing arithmetic, throw out overflow
- x - y and y > x iff x -y is negative doesn’t work because of overflow!
t = x - y
x < y iff (t is negative)^overflow (^ is xor)
2 conditional registers
- SF check sign
- OF check overflow
So, x < y iff SF^OF
x≤y → iff (t is negative)^(overflow)| t is zero
→ iff(SF ^ OF)|ZF
unsigned
x < y (unsigned) iff CF (carry bit)
There’s a good slide that’s helpful
Exception: leaq → does not set condition codes
Condition Codes
Explicit testing
Test
testq Src2, Src1
→ testq b, a → like computing a&b without setting destination
ZF set when a&b == 0
SF set when a&b < 0
OF and CF are set to 0
so, if ZF and SF aren’t set, you know a&b > 1
Compare Instruction
cmpq b, a → computing a-b without setting detination
CF set if carry out from most significant bit (unsigned comparisons)
ZF if a == b
SF set if (a-b) < 0
OF set if two’s complement overflow
How to Use Condition Codes?
cmpq %rsi, %rdi
SetX insutrctions
setl (set less than) → SF^OF
so let’s suppose 2-3
cmpq %rsi, %rdi
setl %al
movzbq %al, %rax #Zero rest of %rax so we just get %al value
ret
There’s a bunch of SetX instructions
Every line of code has a memory address and this is how CPU knows where to find it.
- so you can say — jump to this address
Procedures (functions)
CPU simply knows address of each instruction, not the notion of functions
Passing control
- To beginning of procedure code
- Back to return point
Passing data
- Procedure arguments
- Return value
Memory management
- Allocate during procedure execution
- Deallocate upon return
Passing Control
- To start function code, send to first line of function code
- uses callq → shows you where to go back
- retq sends you back to callq address
Stack in memory
- Stack
- Grows downward
- Caller’s stack frame
- Callee’s stack frame goes on bottom of stack (uses a different part of memory) → doesn’t change anything in caller’s stack frame
- There’s a border between them and callee stack frame can determine where caller’s stack frame ends
- Return address stored at boundary
- Stack pointer is %rsp (lowest stack address)
- pushq → pushes stack down 8 bytes and then saves value of register there
- popq → takes 8 bytes off and takes value from bottom of stack (so %rsp is pushed up)
- return address is very next line after call
- %rip = program counter → set to return address
Memory address of line after call is stored on stack so we know where to return to
- when return is called, return address is popped and %rip is set to equal it
Recursion
Grows stack function by function and between every two frames, there’s a return address
Passing Data
First 6 arguments: %rdi, %rsi, %rdx, %rcx, %r8, %r9
→ 7th and later are in memory (this is slow since you’re writing to and reading from memory)
Return value: %rax
Managing Local Data
If I have %rdx equivalent to something in one function, but I have a different value for %rdx in another function, what the fuck? What am I going to do? There’s physical constraints!
Conventions
- “Caller saved” registers → caller saves temporary values in its frame before the call
- “Callee saved” → callee saves temporry values in its frame before using them; callee restores them before returning to caller
Each register is of one of these types
All arguments are caller-saved registers
“push” → does not match to something in C
- push callee-saved registers and then pop it at the end
Advanced Datatypes
Arrays
Array of data type T and length L
- size is L * sizeof(T) bytes
Memory addressing mode to locate element in array
Multi-dimensional Arrays
A[R][C]
Array size R * C *sizeof(element)
Row-major ordering
- Assembly just represents it as a 1d array with R * C elements
To get address → start from beginning, multiply index of row * (columns * bytes) + index of column * bytes
Structs
Block of memory big enough to hold all of the fields
Fields ordered according to declaration
Compiler determines overall size and position of fields; machine-level program has no understanding of structures in source code
Data needs to be aligned — ints need to be at multiple of 4, double need to be at multiple of 8; so we may need gaps between parts of structs
Union allocation
Allocates according to largest element
Memory Hierarchy
CPUs are way faster than memory
Locality → temporal correlation in memory access