tokstyle-0.0.9: TokTok C code style checker
Safe HaskellNone
LanguageHaskell2010

Tokstyle.Analysis.PointsTo

Description

This module implements an inter-procedural, context-sensitive, summary-based points-to analysis for C code.

The analysis determines the set of abstract memory locations that each pointer variable could point to at any given program point. This is a foundational analysis required for many subsequent static analyses, such as taint tracking.

The core algorithm works as follows: 1. **AST Traversal**: It first traverses the Abstract Syntax Tree (AST) of the entire program to find all function definitions and declarations. 2. **Worklist Algorithm**: It uses a worklist algorithm to iteratively analyze functions until a fixed point is reached. The worklist contains pairs of (FunctionName, Context), ensuring that functions are re-analyzed if their calling context changes. 3. **Intra-procedural Analysis**: For each function, it performs a standard forward dataflow analysis over its Control Flow Graph (CFG). The state (or "facts") at each point is a PointsToMap, which maps abstract locations to the set of locations they point to. 4. **Function Summaries**: After analyzing a function, it generates a PointsToSummary. This summary captures the function's effect on pointers, including what it returns and what side effects it has on pointers passed as arguments. This is the key to scalability, as it avoids re-analyzing a function's body at every call site. 5. **Context Sensitivity**: The analysis is context-sensitive, meaning it can distinguish between different call sites of the same function. The Context is a list of node IDs (hashes) representing the call stack. This allows for more precise summaries and handling of recursion. 6. **Fixed-Point Iteration**: The worklist algorithm continues until the summaries for all functions stabilize (i.e., no longer change). When a function's summary changes, all of its callers are added back to the worklist to be re-analyzed with the updated information. This process guarantees that the analysis correctly handles complex interactions, including mutual recursion.

Synopsis

Documentation

type PointsToMap = Map AbstractLocation (Set AbstractLocation) Source #

The PointsToMap is the data flow fact. It maps a pointer's abstract location to the set of abstract locations it can point to.

data PointsToContext l Source #

The context for the inter-procedural analysis. This data structure holds all the global information needed while analyzing the program.

Constructors

PointsToContext 

Fields

  • ptcCallGraph :: CallGraph

    The global call graph of the program. This is initially built from direct calls and is updated dynamically as function pointers are resolved.

  • ptcSummaries :: Map FunctionName PointsToSummary

    The current, evolving map of summaries for all functions. This is the central piece of state for the summary-based analysis. It's a map from function names to another map from calling contexts to the summary for that specific context.

  • ptcFuncDefs :: FunctionDefs

    A map from function names to their AST definitions. Used to find the code for a function when it needs to be analyzed.

  • ptcFuncDecls :: FunctionDefs

    A map from function names to their AST declarations (or definitions). Used to check if an identifier is a function.

  • ptcStructDefs :: Map Text (Node (Lexeme l))

    A map from struct/union names to their AST definitions.

  • ptcVarTypes :: Map Text (Node (Lexeme l))

    A map from variable names to their type AST nodes for the current scope.

  • ptcCurrentContext :: Context

    The current call stack context, used for context-sensitive analysis.

  • ptcDynamicCallGraph :: Map (FunctionName, Context) (Set (FunctionName, Context))

    A dynamically constructed call graph that tracks calls discovered during the analysis (e.g., through function pointers). This is more precise than the initial, static call graph.

  • ptcAnalyzedCfgs :: Map (FunctionName, Context) (CFG Text PointsToState)

    A cache of the Control Flow Graphs (CFGs) that have been analyzed. This avoids rebuilding the CFG for the same function and context pair.

  • ptcLocalVars :: Set Text

    The set of local variables (including parameters) for the function currently being analyzed. This is crucial for distinguishing between local and global variables.

  • ptcFileMacros :: MacroDefinitionMap
     

Instances

Instances details
DataFlow PointsToContext Text PointsToState Source #

The PointsTo analysis is an instance of the generic DataFlow framework.

Instance details

Defined in Tokstyle.Analysis.PointsTo

data PointsToState Source #

The state for the points-to analysis, including the points-to map and the set of currently defined macros.

type PointsToSummary = Map Context PointsToSummaryData Source #

The full, context-sensitive points-to summary for a function.

type MacroDefinitionMap = Map FunctionName (Node (Lexeme Text)) Source #

A map from a macro's name to its AST definition node.

buildPointsToContext :: [(FilePath, [Node (Lexeme Text)])] -> CallGraph -> Map FunctionName PointsToSummary -> PointsToContext Text Source #

The main entry point for the inter-procedural points-to analysis. It sets up the initial context and kicks off the fixed-point iteration.

analyzeFunctionWithSummaries :: PointsToContext Text -> FunctionName -> PointsToMap Source #

Analyze a single function using the pre-computed summaries to get the final points-to map at its exit points. This is useful for debugging or for getting the final state of a specific function like main.

toAbstractLocation :: HasCallStack => Node (Lexeme Text) -> AbstractLocation Source #

Helper to convert an LHS expression AST node to an AbstractLocation.

analyzeStatementForPointers :: MacroDefinitionMap -> PointsToContext Text -> Text -> PointsToMap -> Node (Lexeme Text) -> (PointsToMap, Set (FunctionName, Context)) Source #

The transfer function for a single statement. It takes the current points-to state and a statement, and returns the new state and a set of new functions that need to be added to the worklist (due to being called).

fixpointSummaries :: PointsToContext Text -> Worklist (FunctionName, Context) -> PointsToContext Text Source #

The fixed-point iteration loop for computing function summaries. This is the heart of the inter-procedural analysis.

evalPointsToSet :: MacroDefinitionMap -> PointsToContext Text -> Set Text -> FunctionName -> PointsToMap -> Node (Lexeme Text) -> (Set AbstractLocation, Set (FunctionName, Context)) Source #

Evaluates an expression to determine the set of abstract locations it points to. This is a key part of the transfer function, used to resolve the RHS of assignments and the targets of function calls.

transferPointsToState :: PointsToContext Text -> FunctionName -> PointsToState -> Node (Lexeme Text) -> (PointsToState, Set (FunctionName, Context)) Source #

The new top-level transfer function. It inspects the current statement and either updates the macro map or delegates to the existing points-to logic.