CMSC 144 Notes

Computer Systems

What are computer applications?

  • C programs
  • But what do those programs run on?

Computer systems provide environment for programs to run on hardware

  • have multiple programs
  • Efficient sharing of hardware is a key functionality of computer systems

Moore’s law → transistor count increasing exponentially (doubles every two years) (has broken down since 2000)

Computer systems provide an abstraction of different hardware

What is a computer system?

graph TD
  Application <--> Physics

Application to physics → we need to bridge this gap through layers of abstraction

3 layers in the middle (that we’re going to focus on — there’s many more): OS, Instruction Set Architecture, Microarchitecture

OS: interfaces for controlling and coordinating the use of hardware among various applications and users

ISA: interface between hardware and software

Microarchitecture: implementation of ISA

  • key to performance and efficiency

Focus of course is key design principles, not their implementation

  • LogisticsDrop lowest HW score (it counts as an 100)

Lecture 0

Bits → 0 or 1

  • noise in communication information shouldn’t be enough to flip a bit

Byte → 8 bits (minimum addressable unit in memory)

  • can be written as binary, decimal, or hexademical
  • hex in c → 0x

Word size → continuous block bits but usually longer than a byte (natural data size a CPU can process)

  • for newer system, 64 bits
  • Shorter word size → less possible memory addresses
  • on a 32 bit machine 2^32 → 4gb of memory
  • on a 64 bit machine 2^48→ 256 terabytes (so we don’t have to use all 64)

Word size = memory address size

Address of each word is the address of its first byte

Pointer takes 4 bits on a 32 bit machine and 8 bytes on a 64 bit machine (since it’s pointing to the word)

Byte Ordering

How are bytes ordered in memory?

Big endian: least significant byte has the highest address

Little endian: least significant byte has the lowest address (most common) so for 0x01234567 is represented as 67 45 23 01

  • Why?
  • Most numbers that are nonzero are the smaller numbers so put them first
  • X-endian: least significant byte has the X-est address

Bit-Level Operations

If you’re checking if p is a pointer

p&&*p

Right shift:

Logical shift → fill with 0s

Arithmetic shift → replicate most significant bit

  • C uses arithmetic right shifts for signed data

Integers

X = [x_w-1, … , x0]

For signed numbers, the most significant bit is 0 (nonnegative) or 1 (negative), but it’s summed differently

e.g., 1111 = -1 for signed while it equals 15 for unsigned (so for a 4 bit number, if it’s negative, it’s just 8 - the last 3 bits

To get from max to min (max positive to min neg) just apply not

Conversion Casting

Casting does not change bit level encoding! Just changes interpretation

Constants

  • integers are considered signed integers unless explicitly defined as unsigned (capital U at end of value)

Implicit casting

unsigned int = int → casts int to unsigned int (vice versa will happen if int = unsigned int)

IF THERE IS A MIX OF UNSIGNED AND SIGNED IN A SINGLE EXPRESSION, SIGNED VALUES IMPLICITLY CAST TO SIGNED

Expanding and Truncating

Sign extension

When going from a short int to an int for example, copy the most significant bit k times (the number of bits you’re adding)

  • Casting keeps the bit vector, but extension keeps the value

Truncating

keep bits on right and remove bits on the left

Unsigned Addition

Ignores carry output (truncates) — we get a modular sum

Signed Addition

Negative overflow = true sum is below lowest possible value

Positive overflow = true sum is greater than lowest possible value

Machine-Level Programming I: Basics of Instruction Set Architecture

ISA — interface between software and hardware

Object = state + methods

  • state is data
  • methods are operations on data

These methods are the interface of an object

Programming Interface

Software programming interface: programming languages

void swap(long *xp, long*yp) {
//change value of two pointers
}

Computer compiles this C code into assembly code (Instruction Set Architecture)

  • On the left, you have methods and on the right you have states
swap:
	movq (%rdi), %rax
	//and more

Why is ISA important?

In early days of computers, there are incompatible lines of computers (new software for each hardware) → ISA enables for same software to work on all hardware

Tradeoff: software developers don’t worry about specs in hardware but need to rewrite ISA when you want to take advantage of new hardware (interface can limit innovation)

Architectural States

CPU

  • State is called registers

Memory

  • State is a big array with bytes

Memory Basics

  • One long array of bytes — smallest addressable unit in memory (you can’t interface with specific bits)
    • Each byte has a memory address
  • Word → nominal size of integer-valued data in a machine
    • Word is size of each memory address
    • Word is size of CPU register
    • When CPU talks to memory, it talks to memory in terms of words
    • Word size limits the memory size!
      • If a machine uses 8-bit words → 2^8 = 256 unique memory addresses (the max number of places the CPU can refer to)
      • 32 bit words → 2^32 unique addresses → 4GB (not big enough)
      • 64 bit words → 2^64… alot, 2^48 gives 256 terabytes

<aside> 💡 If you were to represent 234 in a short, what would it look?

</aside>

Multi-Byte Variables & Byte Ordering

Multi-Byte variable

Two conventions:

  • Big Endian: least significant bit has highest address
  • Little Endian: least significant bit has the lowest address (most common)
    • For example, if you are moduloing, you simple need that first address (sometimes)
    • Most frequently used part of the variable is that in the first address

Array memory layout

Little endian does not change the order of the elements in the array but it does change the byte order in the array

E.g.,

short a[2] = {1, 2}

Big Endian → 00 01 00 02

Little Endian → 02 00 01 00

Strings

char S[4] = "123";

Big Endian → 31 32 33 00

Little Endian → 31 32 33 00 (they’re all one byte so there’s no shift)

CPU

  • Registers: heavily used program data (temporary variables)
  • 16 general purpose registers
    • Each has a size of 8 bytes (64 bits — that’s why ISA is called x86-64)
    • there’s a lower 4 byte part to each register (that’s what the previous generation used)

Moving Data

movq$0x100, %rax

movq → move 8 bytes (q = 8 bytes) from somewhere to somewhere

%rax → destination

$0x100 → the value to be moved (in 8 byte format

Prefix $ indicates it’s a constant value

movq Source, Dest

Operand types

  • Register: one of 16 integer registers
  • Ex. movq %rax, %r13

Immediate: movq $0x100, %rax

  • value

Absolute: movq 0x100, %rax

  • 0x100 treated as memory address and copy value to %rax

Memory: movq (%rax), %r13

  • Use value in %rax aas a memory address and copy teh value from memory to %r13
  • There is no memory to memory (e.g., no movq (%rax), (%rdx) → this could be long so ew break it up, we can just do this with two instructions)

Machine-Level Programming II: Basics of ISA

x86-64 ISA (designed by intel)

  • 64 bit word → 64 bit memory addresses and 64 bit registers

Registers can be used as anything as long as it’s 8 bytes — how we interpret it is up to us (e.g., placing parentheses around the value or taking the value as it is)

Every byte in memory has a memory address — you read starting from an address and then read 8 bytes after that

How mov works byte by byte

rax has 0x123456789abcded (massive number)

rdi has 0x120 (fills everything to left with 0s)

Assume little endian

movq → moves 8 bytes!

movl → moves 4 bytes!

movq %rax, (%rdi)

0x120 ef
0x121 cd
0x127 01

movq (%rdi), %rdx → rdx will have ef in lowest byte etc.

What if you used movl?

you only get 89abcdef

Simple Memory Addressing Modes

  • So far, we have to figure out memory address and then put it in a cpu register and then use parentheses
  • Normal
    • movq (%rcx), %rax → copy Mem[Reg[rcx]] to rax
  • Displacement
    • movq 8(%rbp), %rdx → copy Mem[Reg[rbp] +8] to rdx
  • D(rb)
    • D(Rb, Ri, S)
      • D: constant displacement (D bytes)
      • Rb: base register
      • RI: index register (how many steps)
      • Scale: 1, 2, 4, 8 (bytes per step)

Example:

1(%rax, %rdi, 4)

  • Start at %rax, move 4*%rdi, add 1
  • 1(%rax, %rdi, 4) ⇒ %rax + 4*%rdi + 1
  • if %rax = 0x120, %rdi = 0x2 → the above equals 0x129

Special cases on slides

Address Computation Instruction (leaq)

leaq Src, Dst (lea → load effective address) → just pointer arithmetic

  • copies memory address, not value at that memory address

mov → pointer arithmetic and dereference value at that pointer

uses

int arr[] = [1, 2, 3, 4, 5]
return *p &arr[i]

%rdi has arr addr

%rsi has i

leaq(%rdi, %rsi, 4), %rax

Why 4? → int has 4 bytes

Arithmetic calculation

long m3(long x)
{
	return x*3;
}

leaq(%rdi, %rdi, 2), %rax (Takes value and addits to 2 more times)

Two and one operand instructions on slides for HW

  • there are two right shifts
  • left shift can be used for multiplication (e.g., mult by 16 → shift 4 to the left)

%rax holds the return value!

Getting Assembly Code

clang -S -o [outputfile.s] [inputfile.c]
  • does it line by line

Machine Level Programming Part 3

  • So far, we’ve seen examples run linearly
  • But what if has an if/else statement or a loop?

How can we use assembly to implement non-linear code?

ISA provides cmp, set, and jmp

  • cmpq → compares two values
  • jmp → jump to a location in assembly

cmpq

  • Why not just do subtraction and look at that result?
    • Changes value of one of the registers (could be good in certain instance)
  • Needs a side effect to track result
    • Condition codes! CF ZF SF OF (1 bit)

Two’s complement representation

Leading bit indicates sign

  • if 1 → interpret as -2^n

Import properties

  • values can’t exceed a range (this is overflow)
  • -y is ~y+1

cmp

Consider x < y (signed)

How to compare on a bit level?

  • if doing arithmetic, throw out overflow
  • x - y and y > x iff x -y is negative doesn’t work because of overflow!

t = x - y

x < y iff (t is negative)^overflow (^ is xor)

2 conditional registers

  • SF check sign
  • OF check overflow

So, x < y iff SF^OF

x≤y → iff (t is negative)^(overflow)| t is zero

→ iff(SF ^ OF)|ZF

unsigned

x < y (unsigned) iff CF (carry bit)

There’s a good slide that’s helpful

Exception: leaq → does not set condition codes

Condition Codes

Explicit testing

Test

testq Src2, Src1

→ testq b, a → like computing a&b without setting destination

ZF set when a&b == 0

SF set when a&b < 0

OF and CF are set to 0

so, if ZF and SF aren’t set, you know a&b > 1

Compare Instruction

cmpq b, a → computing a-b without setting detination

CF set if carry out from most significant bit (unsigned comparisons)

ZF if a == b

SF set if (a-b) < 0

OF set if two’s complement overflow

How to Use Condition Codes?

cmpq %rsi, %rdi

SetX insutrctions

setl (set less than) → SF^OF

so let’s suppose 2-3

cmpq %rsi, %rdi
setl %al
movzbq %al, %rax #Zero rest of %rax so we just get %al value
ret

There’s a bunch of SetX instructions

Every line of code has a memory address and this is how CPU knows where to find it.

  • so you can say — jump to this address

Procedures (functions)

CPU simply knows address of each instruction, not the notion of functions

Passing control

  • To beginning of procedure code
  • Back to return point

Passing data

  • Procedure arguments
  • Return value

Memory management

  • Allocate during procedure execution
  • Deallocate upon return

Passing Control

  • To start function code, send to first line of function code
    • uses callq → shows you where to go back
    • retq sends you back to callq address

Stack in memory

  • Stack
    • Grows downward
    • Caller’s stack frame
    • Callee’s stack frame goes on bottom of stack (uses a different part of memory) → doesn’t change anything in caller’s stack frame
    • There’s a border between them and callee stack frame can determine where caller’s stack frame ends
      • Return address stored at boundary
  • Stack pointer is %rsp (lowest stack address)
    • pushq → pushes stack down 8 bytes and then saves value of register there
    • popq → takes 8 bytes off and takes value from bottom of stack (so %rsp is pushed up)
  • return address is very next line after call
  • %rip = program counter → set to return address

Memory address of line after call is stored on stack so we know where to return to

  • when return is called, return address is popped and %rip is set to equal it

Recursion

Grows stack function by function and between every two frames, there’s a return address

Passing Data

First 6 arguments: %rdi, %rsi, %rdx, %rcx, %r8, %r9

→ 7th and later are in memory (this is slow since you’re writing to and reading from memory)

Return value: %rax

Managing Local Data

If I have %rdx equivalent to something in one function, but I have a different value for %rdx in another function, what the fuck? What am I going to do? There’s physical constraints!

Conventions

  • “Caller saved” registers → caller saves temporary values in its frame before the call
  • “Callee saved” → callee saves temporry values in its frame before using them; callee restores them before returning to caller

Each register is of one of these types

All arguments are caller-saved registers

“push” → does not match to something in C

  • push callee-saved registers and then pop it at the end

Advanced Datatypes

Arrays

Array of data type T and length L

  • size is L * sizeof(T) bytes

Memory addressing mode to locate element in array

Multi-dimensional Arrays

A[R][C]

Array size R * C *sizeof(element)

Row-major ordering

  • Assembly just represents it as a 1d array with R * C elements

To get address → start from beginning, multiply index of row * (columns * bytes) + index of column * bytes

Structs

Block of memory big enough to hold all of the fields

Fields ordered according to declaration

Compiler determines overall size and position of fields; machine-level program has no understanding of structures in source code

Data needs to be aligned — ints need to be at multiple of 4, double need to be at multiple of 8; so we may need gaps between parts of structs

Union allocation

Allocates according to largest element

Memory Hierarchy

CPUs are way faster than memory

Locality → temporal correlation in memory access