Static Program Analysis - Aarhus Universitet

8m ago
51 Views
0 Downloads
1.18 MB
176 Pages
Transcription

Static Program AnalysisAnders Møller and Michael I. SchwartzbachNovember 30, 2020

Copyright 2008–2020 Anders Møller and Michael I. SchwartzbachDepartment of Computer ScienceAarhus University, DenmarkThis work is licensed under the Creative Commons Attribution-NonCommercialNoDerivatives 4.0 International License. To view a copy of this license, 4.0/.

ContentsPrefaceiii1Introduction1.1 Applications of Static Program Analysis . . . . . . . . . . . . . .1.2 Approximative Answers . . . . . . . . . . . . . . . . . . . . . . .1.3 Undecidability of Program Correctness . . . . . . . . . . . . . .2A Tiny Imperative Programming Language2.1 The Syntax of TIP . . . . . . . . . . . . .2.2 Example Programs . . . . . . . . . . . .2.3 Normalization . . . . . . . . . . . . . . .2.4 Abstract Syntax Trees . . . . . . . . . . .2.5 Control Flow Graphs . . . . . . . . . . .9913141516Type Analysis3.1 Types . . . . . . . . . . . . . . . . . .3.2 Type Constraints . . . . . . . . . . .3.3 Solving Constraints with Unification3.4 Record Types . . . . . . . . . . . . . .3.5 Limitations of the Type Analysis . .192022252933.3737384043Dataflow Analysis with Monotone Frameworks5.1 Sign Analysis, Revisited . . . . . . . . . . . . . . . . . . . . . . .5.2 Constant Propagation Analysis . . . . . . . . . . . . . . . . . . .5.3 Fixed-Point Algorithms . . . . . . . . . . . . . . . . . . . . . . . .51525758345.Lattice Theory4.1 Motivating Example: Sign Analysis . . . .4.2 Lattices . . . . . . . . . . . . . . . . . . . .4.3 Constructing Lattices . . . . . . . . . . . .4.4 Equations, Monotonicity, and Fixed Pointsi.1136

dening6.1 Interval Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . .6.2 Widening and Narrowing . . . . . . . . . . . . . . . . . . . . . .7979827Path Sensitivity and Relational Analysis7.1 Control Sensitivity using Assertions . . . . . . . . . . . . . . . .7.2 Paths and Relations . . . . . . . . . . . . . . . . . . . . . . . . . .8990918Interprocedural Analysis8.1 Interprocedural Control Flow Graphs . . . . . .8.2 Context Sensitivity . . . . . . . . . . . . . . . . .8.3 Context Sensitivity with Call Strings . . . . . . .8.4 Context Sensitivity with the Functional Approach.99. 99. 103. 104. 107Control Flow Analysis9.1 Closure Analysis for the λ-calculus . . . . .9.2 The Cubic Algorithm . . . . . . . . . . . . .9.3 TIP with First-Class Function . . . . . . . .9.4 Control Flow in Object Oriented Languages9Live Variables Analysis . . . . . . . .Available Expressions Analysis . . .Very Busy Expressions Analysis . . .Reaching Definitions Analysis . . . .Forward, Backward, May, and MustInitialized Variables Analysis . . . .Transfer Functions . . . . . . . . . .111. 111. 114. 115. 11910 Pointer Analysis10.1 Allocation-Site Abstraction . . . .10.2 Andersen’s Algorithm . . . . . .10.3 Steensgaard’s Algorithm . . . . .10.4 Interprocedural Pointer Analysis10.5 Null Pointer Analysis . . . . . . .10.6 Flow-Sensitive Pointer Analysis .10.7 Escape Analysis . . . . . . . . . .12112112212612812913113311 Abstract Interpretation11.1 A Collecting Semantics for TIP11.2 Abstraction and Concretization11.3 Soundness . . . . . . . . . . . .11.4 Optimality . . . . . . . . . . . .11.5 Completeness . . . . . . . . . .11.6 Trace Semantics . . . . . . . . .135135141148154156162Bibliography.167

PrefaceStatic program analysis is the art of reasoning about the behavior of computerprograms without actually running them. This is useful not only in optimizingcompilers for producing efficient code but also for automatic error detectionand other tools that can help programmers. A static program analyzer is a program that reasons about the behavior of other programs. For anyone interestedin programming, what can be more fun than writing programs that analyzeprograms?As known from Turing and Rice, all nontrivial properties of the behaviorof programs written in common programming languages are mathematicallyundecidable. This means that automated reasoning of software generally mustinvolve approximation. It is also well known that testing, i.e. concretely runningprograms and inspecting the output, may reveal errors but generally cannotshow their absence. In contrast, static program analysis can – with the right kindof approximations – check all possible executions of the programs and provideguarantees about their properties. One of the key challenges when developingsuch analyses is how to ensure high precision and efficiency to be practicallyuseful. For example, nobody will use an analysis designed for bug finding ifit reports many false positives or if it is too slow to fit into real-world softwaredevelopment processes.These notes present principles and applications of static analysis of programs. We cover basic type analysis, lattice theory, control flow graphs, dataflowanalysis, fixed-point algorithms, widening and narrowing, path sensitivity, relational analysis, interprocedural analysis, context sensitivity, control-flow analysis, several flavors of pointer analysis, and key concepts of semantics-basedabstract interpretation. A tiny imperative programming language with pointers and first-class functions is subjected to numerous different static analysesillustrating the techniques that are presented.We take a constraint-based approach to static analysis where suitable constraintsystems conceptually divide the analysis task into a front-end that generatesconstraints from program code and a back-end that solves the constraints toproduce the analysis results. This approach enables separating the analysisiii

ivPrefacespecification, which determines its precision, from the algorithmic aspects thatare important for its performance. In practice when implementing analyses, weoften solve the constraints on-the-fly, as they are generated, without representingthem explicitly.We focus on analyses that are fully automatic (i.e., not involving programmer guidance, for example in the form of loop invariants or type annotations)and conservative (sound but incomplete), and we only consider Turing complete languages (like most programming languages used in ordinary softwaredevelopment).The analyses that we cover are expressed using different kinds of constraintsystems, each with their own constraint solvers: term unification constraints, with an almost-linear union-find algorithm, conditional subset constraints, with a cubic-time algorithm, and monotone constraints over lattices, with variations of fixed-point solvers.The style of presentation is intended to be precise but not overly formal.The readers are assumed to be familiar with advanced programming languageconcepts and the basics of compiler construction and computability theory.The notes are accompanied by a web site that provides lecture slides, animplementation (in Scala) of most of the algorithms we cover, and additionalexercises:https://cs.au.dk/ amoeller/spa/

Chapter 1IntroductionStatic program analysis aims to automatically answer questions about the possible behaviors of programs. In this chapter, we explain why this can be usefuland interesting, and we discuss the basic characteristics of analysis tools.1.1Applications of Static Program AnalysisStatic program analysis has been used since the early 1960’s in optimizing compilers. More recently, it has proven useful also for bug finding and verificationtools, and in IDEs to support program development. In the following, we givesome examples of the kinds of questions about program behavior that arise inthese different applications.Analysis for program optimization Optimizing compilers (including just-intime compilers in interpreters) need to know many different properties of theprogram being compiled, in order to generate efficient code. A few examples ofsuch properties are: Does the program contain dead code, or more specifically, is function funreachable from main? If so, the code size can be reduced. Is the value of some expression inside a loop the same in every iteration?If so, the expression can be moved outside the loop to avoid redundantcomputations. Does the value of variable x depend on the program input? If not, it couldbe precomputed at compile time. What are the lower and upper bounds of the integer variable x? The answermay guide the choice of runtime representation of the variable. Do p and q point to disjoint data structures in memory? That may enableparallel processing.

21 INTRODUCTIONAnalysis for program correctness The most successful analysis tools that havebeen designed to detect errors (or verify absence of errors) target generic correctness properties that apply to most or all programs written in specific programming languages. In unsafe languages like C, such errors sometimes lead tocritical security vulnerabilities. In more safe languages like Java, such errors aretypically less severe, but they can still cause program crashes. Examples of suchproperties are: Does there exist an input that leads to a null pointer dereference, divisionby-zero, or arithmetic overflow? Are all variables initialized before they are read? Are arrays always accessed within their bounds? Can there be dangling references, i.e., use of pointers to memory that hasbeen freed? Does the program terminate on every input? Even in reactive systems suchas operating systems, the individual software components, for exampledevice driver routines, are expected to always terminate.Other correctness properties depend on specifications provided by the programmer for the individual programs (or libraries), for example: Are all assertions guaranteed to succeed? Assertions express programspecific correctness properties that are supposed to hold in all executions. Is function hasNext always called before function next, and is open alwayscalled before read? Many libraries have such so-called typestate correctnessproperties. Does the program throw an ActivityNotFoundException or aSQLiteException for some input?With web and mobile software, information flow correctness properties havebecome extremely important: Can input values from untrusted users flow unchecked to file systemoperations? This would be a violation of integrity. Can secret information become publicly observable? Such situations areviolations of confidentiality.The increased use of concurrency (parallel or distributed computing) and eventdriven execution models gives rise to more questions about program behavior: Are data races possible? Many errors in multi-threaded programs are causeby two threads using a shared resource without proper synchronization. Can the program (or parts of the program) deadlock? This is often aconcern for multi-threaded programs that use locks for synchronization.

1.2 APPROXIMATIVE ANSWERS3Analysis for program development Modern IDEs perform various kinds ofprogram analysis to support debugging, refactoring, and program understanding. This involves questions, such as: Which functions may possibly be called on line 117, or conversely, wherecan function f possibly be called from? Function inlining and other refactorings rely on such information. At which program points could x be assigned its current value? Can thevalue of variable x affect the value of variable y? Such questions often arisewhen programmers are trying to understand large codebases and duringdebugging when investigating why a certain bug appears. What types of values can variable x have? This kind of question oftenarises with programming languages where type annotations are optionalor entirely absent, for example OCaml, JavaScript, or Python.1.2Approximative AnswersRegarding correctness, programmers routinely use testing to gain confidencethat their programs work as intended, but as famously stated by Dijkstra [Dij70]:“Program testing can be used to show the presence of bugs, but never to show theirabsence.” Ideally we want guarantees about what our programs may do for allpossible inputs, and we want these guarantees to be provided automatically, thatis, by programs. A program analyzer is such a program that takes other programsas input and decides whether or not they have a certain property.Reasoning about the behavior of programs can be extremely difficult, evenfor small programs. As an example, does the following program code terminateon every integer input n (assuming arbitrary-precision integers)?while (n 1) {if (n % 2 0) // if n is even, divide it by twon n / 2;else // if n is odd, multiply by three and add onen 3 * n 1;}In 1937, Collatz conjectured that the answer is “yes”. As of 2017, the conjecturehas been checked for all inputs up to 87 · 260 , but nobody has been able to proveit for all inputs [Roo19].Even straight-line programs can be difficult to reason about. Does the following program output true for some integer inputs?x input; y input; z input;output x*x*x y*y*y z*z*z 42;

41 INTRODUCTIONThis was an open problem since 1954 until 2019 when the answer was foundafter over a million hours of computing [BS19].Rice’s theorem [Ric53] is a general result from 1953 which informally statesthat all interesting questions about the input/output behavior of programs(written in Turing-complete programming languages1 ) are undecidable. This iseasily seen