Quick Links
- Lecture 2 - Engineering Tests
- Lecture 3 - Coverage and Fuzzing
- Lecture 4 - Mutation Analysis
- Lecture 5 - Syntax and Semantics
- Lecture 6 - Automated Test Case Generation
- Lecture 7 - Symbolic Execution
Lecture 2 - Engineering Tests
Date: January 8, 2025
Why Tests?
- Tests provide assurance that code works.
- Automated tests allow for faster feedback compared to manual testing.
- Not writing tests incurs technical debt, making future maintenance harder.
- Code that lasts years without tests will accumulate bugs unnoticed.
- Key takeaway: If it matters that the code works, write a test for it.
Exploratory Testing
- Typically done by dedicated testers, but useful for developers as well.
- Exploratory testing = Learning + Test Design + Execution.
- Contrasted with scripted testing:
- Scripted testing follows predefined steps.
- Exploratory testing adapts based on findings.
- Best for:
- Rapid feedback on new features.
- Finding the most critical bug quickly.
- Evaluating risks before writing scripted tests.
Process:
- Define a charter (broad testing goal).
- Choose an area of the software to test.
- Design a test informally.
- Execute and log findings.
- Repeat as needed.
Regression Testing
Ensures that new changes don’t break existing functionality.
- Automated: Manual regression testing is inefficient.
- Optimally sized: Too few = missed bugs, too many = slow execution.
- Up-to-date: Must match the current version of the software.
Automating Regression Tests
- Input:
- File-based inputs simplify regression testing.
- UI-based tests often use event capture and replay.
- Output:
- Comparing screenshots (e.g., Mozilla's testing of Gecko engine) is problematic due to nondeterminism.
- A more reliable approach is logging-based verification.
Unit Testing
- Focuses on individual components (functions, classes, or modules).
- Must be fast to execute.
- Use mocks and stubs to isolate components.
Example Test Structure (NUnit - C#)
[Test]
public void GetMinimum_UnsortedArray_ReturnsSmallestValue() {
var arr = new int[] {7, 4, 9, 2, 5};
var minimum = Statistics.GetMinimum(arr);
Assert.AreEqual(2, minimum);
}
- Structure:
- Arrange - Set up test data.
- Act - Execute the function.
- Assert - Verify the output.
Flaky Tests
- Flaky tests fail inconsistently.
- Common causes:
- Improper waiting for async responses.
- Concurrency issues (race conditions, deadlocks).
- Test order dependencies (tests rely on prior tests).
- Fixes: Use explicit waits instead of
sleep()
, proper locking mechanisms, and independent test cases.
Test Doubles
- Used to isolate dependencies in unit tests.
- Types:
- Dummy objects: Only fill parameter lists, do nothing.
- Fakes: Implement real behavior but unsuitable for production.
- Stubs: Return predefined responses.
- Mocks: Verify method calls and behavior.
- Spies: Record interactions but wrap real objects.
Mocking Example (jMock - Java)
Mock warehouse = mock(Warehouse.class);
Mock mailer = mock(MailService.class);
order.setMailer((MailService) mailer.proxy());
mailer.expects(once()).method("send");
Key Takeaways
- Testing is essential for maintainable software.
- Automate everything: Regression tests should always be automated.
- Unit tests should be small and isolated.
- Flaky tests must be fixed to maintain reliability.
- Mock dependencies for more effective unit testing.
Lecture 3 - Coverage and Fuzzing
Date: January 13, 2025
Control-Flow Graph (CFG, very brief refer to notes for examples)
A CFG is a graphical representation of the flow of control in a program.
- Nodes: Represent zero or more statements.
- Edges: Represent possible transitions between statements.
CFG Construction:
- Lexing: Converts characters into tokens.
- Parsing: Converts tokens into a concrete syntax tree.
- Abstract Syntax Tree (AST): Cleans up the concrete syntax tree.
- CFG Generation: Converts the AST into a control-flow graph.
- Optimizations: Apply optimizations to the CFG.
- Code Generation: Convert the CFG into bytecode or machine code.
Basic Blocks
A sequence of instructions with one entry point and one exit point.
- Maximal Basic Blocks: Group statements that always execute together.
Example: In a loop, the body of the loop forms a basic block.
Statement & Branch Coverage
- Statement Coverage: Fraction of statements (nodes) executed by the test suite.
- Branch Coverage: Fraction of branches (edges) taken by the test suite.
Example: Consider the following Python code and test suite:
class Foo:
def m(self, a, b):
if a < 0 and b < 0:
return 4
elif a < 0 and b > 0:
return 3
elif a > 0 and b < 0:
return 2
elif a >= 0 and b >= 0:
return a / b
raise Exception("I didn't think things through")
And the test suite:
import unittest
from foo import Foo
class CoverageTests(unittest.TestCase):
def test_one(self):
f = Foo()
f.m(1, 2)
def test_two(self):
f = Foo()
f.m(1, -2)
def test_three(self):
f = Foo()
f.m(-1, 2)
- Coverage Report: The test suite achieves 100% statement coverage but only 75% branch coverage because it doesn't cover the case where both
a
andb
are negative.
Fuzzing
Fuzzing is a technique for finding software bugs by feeding random or semi-random inputs to a program.
Types of Fuzzing:
- Mutation-Based Fuzzing: Randomly modifies existing inputs.
- Generation-Based Fuzzing: Generates inputs based on a grammar or model.
Fuzzing Workflow
- Generate random inputs.
- Feed inputs to the program.
- Observe crashes, assertion failures, or runtime errors.
Levels of Fuzzing (for C Programs)
- Sequence of ASCII characters;
- Sequence of words, separators, and white space (gets past the lexer);
- Syntactically correct C program (gets past the parser);
- Type-correct C program (gets past the type checker);
- Statically conforming C program (starts to exercise optimizations);
- Dynamically conforming C program;
- Model conforming C program.
Fuzzing runs automatically and really works, but without significant work, it won’t find sophisticated domain-specific issues.
Key Takeaways
-
Code Coverage: Use statement and branch coverage to ensure that your tests are comprehensive and target critical paths in your code.
-
Fuzzing: Fuzzing is an effective technique for uncovering bugs, by feeding the program random or semi-random input.
-
Control Flow Graphs: Represent the flow of control in a program, and are used to calculate coverage metrics.
Lecture 4 - Mutation Analysis
Date: January 15, 2025
Mutation analysis helps determine if a test suite is effective by checking whether it detects small changes (mutations) in the code. These changes, called mutants, simulate real-world programming mistakes.
Concept
- A mutant is a modified version of the original program.
- A test suite is considered effective if it can kill mutants, meaning the output of the test differs when run on the mutant versus the original.
- Mutation analysis is essentially fuzzing the test suite to ensure it catches errors.
Example of Mutants
Original Code
int min(int a, int b) {
int minVal = a;
if (b < a) {
minVal = b;
}
return minVal;
}
Mutant Variations
minVal = b; // ∆ 1
if (b > a) { // ∆ 2
if (b < minVal) { // ∆ 3
BOMB(); // ∆ 4
minVal = a; // ∆ 5
minVal = failOnZero(b); // ∆ 6
- Test Case Example: A test suite with
(a=0, b=1, expected=0)
might detect some mutants but miss others. - Equivalent Mutants: Some mutants don’t change behavior, making them undetectable. (∆ 3)
Mutation Analysis Workflow
- Generate Mutants - Apply mutation operators (small code changes) to create mutants.
- Execute Mutants - Run the test suite on each mutant.
- Classify Mutants:
- Killed: The test suite detects the mutation.
- Survived: The test suite fails to detect the change.
- Equivalent: The mutant behaves identically to the original.
Generating Mutants
- Ground String: A valid program belonging to the programming language grammar.
- Mutation Operators: Rules that specify syntactic variations of strings generated
from a grammar:
- Arithmetic operator replacement (
+
→-
,*
→/
) - Logical connector replacement (
&&
→||
) - Relational operator replacement (
>
→>=
,<
→<=
) - Unary operator deletion (
!x
→x
) - Statement removal
- Arithmetic operator replacement (
- Mutant: The result of applying a mutation operator to a ground string.
- Mutation Score: The percentage of killed mutants over total mutants.
- Strong vs. Weak Mutation:
- Strong: The test must propagate the error to the output.
- Weak: The test must only cause an internal state change.
Let’s consider mutant ∆1 from above, i.e. we change minVal = a
to minVal = b
. In this case:
- reachability: unavoidable;
- infection: need
b != a
; - propagation: wrong minVal needs to return to the caller; that is, we can’t execute the body
of the if statement, so we need
b > a
.
A test case for strong mutation is therefore a = 5
, b = 7
and for weak mutation a = 7
, b = 5
.
Now consider mutant ∆3, which replaces b < a
with b < minVal
. This mutant is an equivalent
mutant, since a = minVal
. (The infection condition boils down to false
.)
Key Takeaways
- Mutation analysis evaluates the effectiveness of test suites.
- Good test suites kill a high percentage of mutants.
- Statement and branch coverage alone are weak indicators of test quality.
- Mutation testing is computationally expensive but provides deep insights into test suite strength.
Lecture 5 - Syntax and Semantics
Date: January 20, 2025
Note: This lecture is a pain in the ass. Consider skipping.
The WHILE Language
A simplified imperative language used for reasoning about code.
Example Program
{ p := 0; x := 1; n := 2 };
while x <= n do {
x := x + 1;
p := p + m
};
print_state
- Statements are separated by
;
(not terminated like in C). - Variables are implicitly declared.
- No side effects in expressions—effects occur only in statements.
Syntax Overview
Terminals (Leaf Nodes)
n ∈ Z
: integerstrue, false ∈ B
: Boolean valuesx, y ∈ L
: variable locations
Non-Terminals (Internal Nodes)
a ∈ Aexp
: arithmetic expressionsb ∈ Bexp
: Boolean expressionss ∈ Stmt
: statements
Grammar Rules
Arithmetic Expressions
e ::= n | x | -e | e1 aop e2 | (e)
aop ::= + | - | *
Boolean Expressions
b ::= true | false | not b | e1 rop e2 | b1 bop b2 | (b)
rop ::= < | <= | = | >= | >
bop ::= and | or
Statements
s ::= skip | x:=e | if b then s [else s] | while b do s | { slist } | print_state | assert b | assume b | havoc v1, ..., vN
slist ::= s (;s)*
prog ::= slist
Operational Semantics: Big-Step
Defines how program execution transforms a state.
A state q
is a mapping from variables to values.
Judgments are written as:
Expression e in state q evaluates to value n.
For statements, the notation is:
statement s executed in state q results in state q′.
Rules for Arithmetic Expressions (Aexp)
The semantics for arithmetic expressions in WHILE are defined using inference rules. Here are the rules:
1. Constants
- A constant evaluates to itself.
2. Variables
- A variable evaluates to its value in the current state .
3. Addition
- The sum of two expressions and evaluates to the sum of their values.
4. Subtraction
- The difference of two expressions and evaluates to the difference of their values.
5. Multiplication
- The product of two expressions and evaluates to the product of their values.
Rules for Boolean Expressions (Bexp)
Boolean expressions are built on top of arithmetic expressions. Here are the rules:
1. True and False
- The constants
true
andfalse
evaluate to themselves.
2. Equality
- The equality of two expressions and evaluates to whether their values are equal.
3. Less Than or Equal
- The less-than-or-equal comparison of two expressions and evaluates to whether is less than or equal to .
4. Logical AND
- The logical AND of two expressions and evaluates to the logical conjunction of their values.
5. Logical OR
- The logical OR of two expressions and evaluates to the logical disjunction of their values.
Rules for Statements
In big-step semantics, statements transform the state of the program. The notation is:
- : The statement to be executed.
- : The initial state.
- : The final state after executing .
Notation
- Empty state:
- Example state:
- Substitution: (yields a state like but with bound to ).
Rules for Evaluating Statements
Statements have side effects because they modify the state rather than directly returning a value. Here are the rules for evaluating statements in WHILE:
1. Skip Statement
- The
skip
statement does nothing. It simply returns the same state .
2. Print State
- The
print_state
statement does not modify the state but may have the side effect of printing the state.
3. Statement Composition
- The composition executes first, resulting in state , and then executes in state , resulting in the final state .
4. Assignment
- The assignment statement evaluates the expression to and updates the state so that is bound to .
5. If-Then-Else Statement
- If the condition evaluates to
true
, theif
statement executes and results in state .
- If the condition evaluates to
false
, theif
statement executes and results in state .
Example of Evaluating Statements
Consider the compound statement:
We can derive its execution as follows:
-
Evaluate in the empty state :
-
Evaluate in the state :
-
Evaluate in the state :
-
The final state after executing the compound statement is:
Semantics of Loops
The semantics of the while
statement are defined in two parts:
1. Loop Condition is False
- If the loop condition evaluates to
false
, thewhile
statement is equivalent toskip
and does not modify the state.
2. Loop Condition is True
- If the loop condition evaluates to
true
, the body is executed once, and then thewhile
loop is executed again in the new state.
Non-Terminating Loops
For infinite loops, we introduce a special state (read as "top") to represent divergence:
-
An infinite loop enters the state , and any statement executed in this state remains in :
-
But we don’t get any insight into what happens. Thats where small-step semantics come in to deal with programs that don’t terminate, but produce useful side effects while they are executing.
Deterministic Semantics
A semantics is deterministic if every program statement has at most one possible derivation in any state:
Two statements are semantically equivalent if they produce the same output state given the same input state:
For example, unrolling a while
loop yields a semantically equivalent statement:
Structural Induction
To prove a property on all possible derivation trees (e.g., for semantics), we use structural induction:
- Base case: Prove for all axioms.
- Inductive case: Assume holds for all smaller derivation trees and show that continues to hold when applying each possible rule once.
For example, we can prove that the semantics are deterministic using structural induction.
Operational Semantics: Small-Step
Describes intermediate execution steps instead of final results.
Judgments are written as:
- : The current program (or statement) to be executed.
- : The current state (or store) of the program.
- : The remaining program (or statement) after executing one step of .
- : The new state after executing one step of .
If there is no more program left to execute after , the notation simplifies to:
Essentially peeling off the first statement from s, which we execute, and we have a remaining program t to execute
Rules in Small-Step Semantics
The rules provided in the section describe how different constructs in the WHILE language are executed step-by-step:
1. Skip Statement
- The statement does nothing. It simply transitions to the same state with no further program to execute.
2. Sequential Composition
- If executing in state results in state , then the program (which means "execute followed by ") reduces to in state .
Alternatively, if reduces to another statement :
- Here, is not fully executed yet, so the remaining program is .
3. If-Then-Else Statement
- If the condition ) evaluates to
true
, then theif
statement reduces to (the "then" branch).
Similarly, if evaluates to false
:
- If the condition evaluates to
false
, theif
statement reduces to (the "else" branch).
4. While Loop
- The
while
loop is transformed into anif
statement. If the condition istrue
, the loop body is executed, followed by thewhile
loop again. If isfalse
, the loop reduces toskip
.
Example of Small-Step Execution
Consider the program:
In small-step semantics, the execution would proceed as follows:
-
The
while
loop is transformed into anif
statement: -
If is
true
, the program reduces to: -
The assignment is executed, updating the state , and the loop continues.
This step-by-step reduction continues until becomes false
, at which point the loop reduces to skip
.
Assertions, Assumptions, and Specification
assert
: Ensures a condition holds at a program point.assume
: Restricts program execution to cases where a condition is true.havoc
: Assigns arbitrary values to variables, modeling unknown inputs.
Example (Symbolic Execution)
havoc(x);
assume(x > 10);
y = x + 1;
assert(y > x);
assert(y < 200);
- Ensures
y > x
always holds but does not guaranteey < 200
.
Key Takeaways
- Big-Step Semantics: Describes how a program executes in a single step.
- Small-Step Semantics: Defines transitions between intermediate execution states.
- Assertions & Assumptions: Help model program correctness and symbolic execution.
Lecture 6 - Automated Test Case Generation
Date: No Clue
To generate interesting inputs:
- Black box - truly random
- Grey Box - guided by code coverage
- White box - guided by code structure
Properties for generic automatic oracles:
- Don't crash
- Dont mis-use memory
- User written assertions
- Domain specific oracles
Valgrind
Inpsects CPU instructions and inserts low-level checks. Execution of every instruciton is interpreted in a sandbox and reported if bad behaviour detected.
- Pros: Detailed analysis.
- Cons: 10x slower in performance.
Address Sanitizer
A runtime libary supported by Clang and GCC that provides compile-time instrumentation. Often used in production code.
- Major part of automated test-case generation validation.
Key Idea: Instrument all memory accesses.
- Compiler insturments store/load instructions with a check if memory is accesible.
- Must be super efficient!
- Meta information about memory is stored.
Example:
Original | Instrumented |
---|---|
|
|
Memory Mapping
Virtual memory has 2 classes:
- Mem is normal application memory.
- Shadow is used to store metadata about main memory.
Poisoning a byte addr
of Mem means writing a special value to
corresponding place in Shadow
Mem and Shadow should be organized such that mapping is super fast.
// instrument access to addr:
shadow_addr = MemToShadow(addr);
if (ShadowIsPoisoned(shadow_addr)) {
ReportError(addr, sz, kIsWrite);
}
Memory Alignment
Process memory is divided into 8 byte words, called QWORDs.
- Heap and Stack allocs are at a QWORD boundary.
- This is called alignment (# of bytes differs on architecture).
Depending on the architecture unaligned memory accesses are expensive.
- Compilers optimize the code so most accesses are aligned
Stae of an allocated QWORD
AddressSanitizer maps each QWORD of Mem into one byte of Shadow.
Each QWORD can be in one of 9 states:
- All 8 bytes are accessible (not poisoned). Shadow value is 0.
- All 8 bytes are inaccessible (poisoned). Shadow value is negative (< 0).
- First k bytes are accessible, the rest 8-k bytes are not, 0 < k < 8. Shadow is k.
Instumentation Example
byte *shadow_addr = addr >> 3 + 0x20000000;
byte shadow_value = *shadow_addr;
if (shadow_value < 0) ReportError(addr, sz, kIsWrite);
else if (shadow_value) {
if (SlowPathCheck(shadow_value, addr, sz)) {
ReportError(addr, sz, kIsWrite);
}
}
bool SlowPathCheck(shadow_value, addr, sz) {
last_accessed_byte = ((addr & 7) + (sz – 1)) & 7;
// returns if the acessed byte is poisoned
return (last_accessed_byte >= shadow_value);
}
Marking Allocation Boundaries with Redzones
Change heap allocator to mark boundaries of allocated segments
- The markers are called redzones
- All calls to
malloc()
are replaced with calls to__asan_malloc()
void *__asan_malloc(size_t sz) {
// Allocate & poison left redzone
void *rz = malloc(RED_SZ);
Poison(rz, RED_SZ);
//Allocate usable memory
void *addr = malloc(sz);
UnPoison(addr, sz);
// Allocate & poison right redzone
rz = malloc(RED_SZ);
Poison(rz, RED_SZ);
// Return usable memory
return addr;
}
Key Takeaways
- Address Sanitizer instruments memory accesses to detect errors.
- Memory Mapping and Memory Alignment are key concepts in Address Sanitizer.
- Redzones are used to mark allocation boundaries.
Lecture 7 - Symbolic Execution
Date: January 27, 2025
Symbolic execution is a technique used to automatically generate test suites that:
- Achieve full branch coverage (or even full path coverage).
- Identify dead code.
- Discover potential division by zero errors.
Steps in Symbolic Execution:
- Transform the program to add explicit tests for division by zero (oracles).
- Traverse the program to compute each program path.
- Solve constraints for each path using a constraint solver (e.g., z3).
- Run the program on tests generated by the previous step.
Example Program:
def Foo(x, y):
"""
requires: x and y are int
ensures: returns floor(max(x, y) / min(x, y))
"""
if x > y:
return x / y
else:
return y / x
Transformed Program:
def Foo(x, y):
"""
requires: x and y are int
ensures: returns floor(max(x, y) / min(x, y))
"""
if x > y:
assert y != 0
return x / y
else:
assert x != 0
return y / x
Main Components of Symbolic Execution
Traversing: automatically explores program paths by executing the program on symbolic input values.
- Forks execution at each branch and records branching conditions.
Solving Constraints: decides path feasibility, and generating test cases to get paths and to find bugs.
Example Paths:
- : Assertion fails.
- : Returns
x / y
. - : Assertion fails.
- : Returns
y / x
.
Symbolic Execution Example
See L07 pdf.
Defining Symbolic Execution
- Symbolic values: Stand in for input variables instead of using specific values.
- Path conditions: Constraints that must hold for a given execution path to be feasible.
- Symbolic state: Pair , where:
Env
maps program variables to symbolic expressions.pc
is a first-order logic formula.
Checking Symbolic State Satisfiability
A concrete state satisfies symbolic state if:
Example:
For concrete state [x → 10, y → 1]
, we check:
For [x → 1, y → 10]
, we get:
Another Symbolic State Example
Checking for [x → 10, y → 2]
:
For [x → 2, y → 10]
, solving gives X = -8, Y = 6
:
Finding Bugs Using Symbolic Execution
Finding a bug requires finding the conditions that trigger it. Bugs include:
- assertion failures
- buffer overflows
- division by zero
Assertions are rewritten as conditionals to create explicit error paths. For example:
assert(x != NULL);
becomes:
if (x == NULL) error();
This ensures that symbolic execution explores error paths and determines whether an error is reachable.
Instrumenting Code for Bug Detection
Symbolic execution can inject runtime checks, similar to sanitizers. Examples include:
y = 100 / x;
// becomes
assert(x != 0);
y = 100 / x;
a[x] = 10;
// becomes
assert(x >= 0 && x < len(a));
a[x] = 10;
Problems of Classical Symbolic Execution
Symbolic execution is powerful but has limitations:
-
Constraint Complexity: Some generated constraints may be too complex for SMT solvers to handle efficiently.
-
Path Explosion: The number of paths grows exponentially with program size, making full analysis infeasible.
-
Real-World Code Challenges: Programs interact with:
- Pointers and data structures
- Files, databases, and networks
- Threads and concurrency
To address these challenges, dynamic symbolic execution will be introduced in the next lecture.