cap
HPC workflow that automates the tedious actions of compiling, analyzing, and parsing with bincfg
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.2%) to scientific vocabulary
Keywords
Repository
HPC workflow that automates the tedious actions of compiling, analyzing, and parsing with bincfg
Basic Info
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Compile. Analyze. Prepare.
CAP (Compile. Analyze. Prepare.) is a Python tool to help automate much of the tedious compile -> analyze -> preprocess pipeline required when performing large-scale binary analysis.
The /cap directory contains the main launching script used to start a CAP process.
There are many ways to run it: single threaded, multi-threaded, multi-node HPC. This can compile competition-like data (EG: codeforces, aizu), miscellaneous folders/projects, and can simply analyze precompiled binaries. It is made to be (semi-) easily extendable to new compilers, languages, compiler flags, analyzers, and preprocessing steps. Information on how to extend this code to add new features can be found in the 'Extend Me' section.
This project is well integrated with the CFG-handling tool BinCFG: https://github.com/LLNL/BinCFG
Setup
This section will describe how to set up your data, directories, and configuration files to run CAP.
Directories And Paths
There are a number of directories/paths needed to run CAP. To make things easier, you can pass a single directory with the '--default_dir' flag to specify the default directory; any needed directory that is not explicitly passed on the command line is assumed to live within it (IE: any directories passed explicitly will override their expected location inside the 'default_dir').
The directories/paths needed are -
* '--atomic_data_dir' (default: "[default_dir]/atomic_data", automatically created):
directory to output atomically-updated data shared across multiple threads/processes/nodes
(EG: normalized assembly tokens). It is expected that, if data is meant to be shared, then
all processes that are running concurrently have access to this same folder within the
same filesystem. See the 'OUTPUT_DATA' section for more info on what will appear here.
* '--containers_dir' (default: "[default_dir]/containers", should already exist): directory
containing all file-based containers that are needed to execute compilations and analysis.
This is needed for container programs like singularity, but would not be needed if using
something like docker. See the 'Containers' subsection for info on what containers should
reside here and how they should be built
* '--logs_dir' (default: "[default_dir]/logs", automatically created): directory for logs
* '--output_dir' (default: "[default_dir]/output", automatically created): directory for
output data. See 'OUTPUT_DATA' section for more info on what outputs are generated
* '--input_path' (default: "[default_dir]/input", should be passed and already exist):
file/directory containing the raw data that should be CAP-ed. It must exist before
execution. Defaults to assuming the path at "[default_dir]/input" is a file/directory
containing data to CAP
* '--temp_dir' (default: "[default_dir]/temp", automatically created if nonexistent):
directory within which subdirectories will be created to house temporary scratch files
created during the CAP process. While I make every effort to ensure all temporary files
that are created are eventually deleted, this is not 100% ensured, especially if execution
is interrupted due to exceptions, signals, etc. It is recommended to override the default
directory for this temp_dir and set it to a node's temporary scratch directory when using
HPC (these are often emptied between jobs on HPC)
* '--partitioned_info_path' (default: "[input_path]/[exec_uid].parquet", should exist before
execution if using): path to a file containing the info for a 'partitioned' data CAP
process. Only used if performing the 'partitioned' CAP process. Defaults to a file within
the input_path directory whose name is the execution uid of the current CAP process. See
the 'Input Data' subsection for more info.
* '--partitioned_dir' (default: "[input_path]/partitioned", should exist before execution
if using): path to a folder containing all of the partitioned data for a 'partitioned'
data CAP process. Only used if performing the 'partitioned' CAP process. See the
'Input Data' subsection for more info on how partitioned files should be structured
* '--container_info_path' (default: "[cap]/container_info.yaml", should already exist): path
to a YAML file containing information and configurations about containers used in the CAP
process. If not passed, then this will use the default "container_info.yaml" file shipped
along with this code expected to be right next to the 'CAP.py' file. See the
'Configuration Files' and 'Containers' subsections for info about using the default
containers shipped with this code, and the 'Extend Me' section for how this file is
structured and how to extend it to accommodate new containers.
* '--execution_info_path' (default: "[cap]/execution_info.py", must exist before execution):
path to a python file containing information about this CAP execution such as the
execution_uid, the data to keep, the normalizers/analyzers to use, etc. This file will be
imported, then all of the needed global constants will be imported. See the
'Configuration Files' section for more info on what global constants should be present
Within any of these paths, you may insert various strings that will be replaced when resolving the final paths:
* "[default_dir]": replaces with the full path to the default directory, without a '/' at
the end. Cannot exist within the '--default_dir' directory path
* "[input_path]": replaces with the full path to the input directory, without a '/' at the
end. Cannot exist within the '--input_path' or '--default_dir' path
* "[main]" or "[cap]": replaces with the full path to the directory containing the "CAP.py"
file, without a '/' at the end
* "[exec_uid]": replaces with the execution uid being used for this CAP execution
NOTE: if you do not pass a '--default_dir' value, then it is expected that every one of these file paths which depend on that default path are also passed, even if they are not eventually used.
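To make the substitution concrete, here is a minimal sketch (not CAP's actual implementation) of how these placeholders resolve; all of the example paths and the execution uid below are hypothetical:
```
# Minimal sketch of the placeholder substitution described above (not CAP's actual code).
# All example paths and the execution uid below are hypothetical.
default_dir = "/data/cap_runs"
input_path = "/data/cap_runs/input"
exec_uid = "run-001"

def resolve(path):
    """Replace the documented placeholders in a single path string."""
    return (path.replace("[default_dir]", default_dir)
                .replace("[input_path]", input_path)
                .replace("[exec_uid]", exec_uid))

print(resolve("[default_dir]/output"))              # /data/cap_runs/output
print(resolve("[input_path]/[exec_uid].parquet"))   # /data/cap_runs/input/run-001.parquet
```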
Configuration Files
There are a couple of configurations needed before running:
* container_info - YAML file containing information on containers being used including those
for compilers and analyzers. By default, CAP will use the "container_info.yaml" file
shipped with this code, which should reside right next to 'CAP.py'. It should contain
everything needed for GCC and JavaC compilers and Rose analyzer. Information on how this
file is structured and how new containers can be added can be found in the 'Extend Me'
section
* execution_info - Information about the current CAP execution. By default, this resides
within the 'execution_info.py' file located next to the 'CAP.py' file. It exists as a
python file as it has been useful to make use of python code to generate some of this
information (EG: enumerating many compile_methods in for loops, getting IDE hints,
etc.). I may want to find a better way of creating this configuration in the future...
The execution info file must contain the global variables:
- EXECUTION_UID (str): a unique string identifier of the current execution. This should
be the same between multiple threads/nodes working on this current execution. However,
multiple executions could be run on the same folders or data at the same time without
clobbering each other so long as you change this EXECUTION_UID. This value will be
used in the naming of a lot of output files/logs and whatnot
It may optionally contain the following variables. If not present, they will be set to
their defaults:
- POSTPROCESSING (Optional[Union[str, List[str]]], default=[]): string or list of
strings for the postprocessings to apply to analyzer outputs, or None to not apply
any. Available strings:
* 'cfg': build a CFG() object, one for each of the normalizers in
exec_info['normalizers']
* 'memcfg': build a MemCFG() object, one for each of the normalizers in
exec_info['normalizers']
* 'stats': build a CFG() object and get the graph statistics with
cfg.get_compressed_stats(), one for each of the normalizers in
exec_info['normalizers']
NOTE: these will be stored as pickled bytes() objects
- DROP_COLUMNS (Optional[Union[str, List[str]]], default=[]): by default, all of the
data generated is kept. If not None, this can be a string or list of strings giving the
column or group of columns to drop. You may also pass any columns that would appear
in the metadata, and those will be dropped. Metadata columns to drop can be passed
either as their original name, or with the prefix 'meta_' as they would appear in the
output data. Any columns that do not correspond to data being kept will raise an
error, unless they start with the prefix 'meta_', in which case it is assumed that that
column is a possible metadata column which may or may not exist. Available
non-metadata columns to drop:
'analyzer', 'binaries', 'analyzer_output', 'metadata', 'error', 'compile_stdout',
'compile_stderr', 'analyzer_stdout', 'analyzer_stderr', 'compile_time',
'analysis_time', 'language_family', 'compiler_family', 'compiler', 'compiler_version',
'architecture', 'flags'
There are also a couple of special strings that will drop groups of columns including:
* 'metadata': drop any metadata that was passed in metadata dictionaries
* 'compile_info': drop all the compilation info
* 'timings': drop all the timing information ('compile_time', 'analysis_time')
* 'stdio': drop all the stdio information ('compile_stderr', 'analyzer_stdout', etc.)
* 'stdout': drop all the stdout information ('compile_stdout', 'analyzer_stdout')
* 'stderr': drop all the stderr information ('compile_stderr', 'analyzer_stderr')
See the README.md for what all of these columns are.
- NORMALIZERS (Optional[Union[str, Normalizer, Iterable[Union[None, str, Normalizer]]]],
default=[]): normalizers to use when building postprocessing CFG's. Will build those
CFG's once for each of the normalizers here. Can be:
* None: will not normalize at all (use raw input) for CFG's, will use 'unnormalized'
normalization (default BaseNormalizer()) for MemCFG's
* str: string name of a normalizer to use. See BinCFG documentation for list of
available normalizer strings
* Normalizer: Normalizer-like object to use. See BinCFG documentation
* Iterable of any of the above: will do one of each normalization for each datapoint
- ANALYZERS (Optional[Union[str, List[str]]], default=[]): string or list of strings for
the analyzers to use. Should exist in the container_info YAML file. Can be empty if
you wish to not analyze files, but only compile
- COMPILE_METHODS (Optional[List[Dict[str, Any]]], default=[]): list of dictionaries of
compile methods to apply. See the 'Compile Methods' subsection for more info. Can be
empty if using CAP task 'binaries' to analyze pre-compiled binaries
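As an illustration, a minimal execution_info.py could look like the sketch below. The analyzer, normalizer, and compile-method values are placeholders and must match entries in your own container_info file and BinCFG installation; the compile-method dictionary keys shown are hypothetical (see the 'Compile Methods' subsection for the real format):
```
# Hypothetical execution_info.py sketch. All values are placeholders; analyzers and
# compile methods must match your container_info file, and normalizers must exist in BinCFG.

# Required: unique identifier shared by all threads/nodes of this execution
EXECUTION_UID = "example-run-001"

# Optional: postprocessing steps applied to analyzer outputs ('cfg', 'memcfg', 'stats')
POSTPROCESSING = ["cfg", "stats"]

# Optional: columns or column groups to drop from the output data
DROP_COLUMNS = ["stdio", "binaries"]

# Optional: normalizers used when building postprocessing CFG's (placeholder name)
NORMALIZERS = ["safe"]

# Optional: analyzers to run; the name must exist in the container_info YAML file
ANALYZERS = ["rose"]

# Optional: compile methods to apply (dictionary keys here are illustrative only)
COMPILE_METHODS = [
    {"compiler_family": "gcc", "compiler": "gcc", "version": "11.3.0",
     "architecture": "x86_64", "flags": ["-O2"]},
]
```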
Containers
You must have containers built before execution. These can currently either be singularity image files within the '--containers_dir' directory, or prebuilt docker images available to the local docker daemon. All of the code used to build these containers is in the '/singularity' directory at the top of this repository. See that directory's readme for info on how to set up and build the singularity/docker containers used in CAP.
The currently available container platforms are:
- 'docker'
- 'singularity'
Input Data
There are multiple forms of input data depending on what CAP-ing you wish to perform. These can often be automatically detected; see the 'Automatic Detection' subsection for more info.
Partitioned Data
Assumes your input is partitioned into multiple parquet files based on an integer key 'id'. Makes the following assumptions:
- The '--task' passed was 'partitioned' or automatically detected to be 'partitioned'
- There is a parquet file located at '--partitioned_info_path' (defaults to "[input_path]/[exec_uid].parquet" if not passed) which contains the metadata about the source code files being CAP-ed for this execution uid. This file should contain the columns "id" and "programming_language", designating a unique integer id per individual source code and a string programming language identifier respectively. The "id" is the key that will be used to look up the source code inside the 'partitioned' folder. Any other columns will be treated as metadata and added to each CAP-ed file
- There is a folder located at '--partitioned_dir' (defaults to "[input_path]/partitioned" if not passed) which contains one or more parquet files. These parquet files will be partitioned by the integer "id" key such that each file contains a contiguous range of id's. Each parquet file should be named "[start]-[end].parquet" where 'start' is the starting id (inclusive) and 'end' is the ending id (exclusive) of the range for that file. Each file should contain the columns "id" for the id key and "source" for the source code. Any other columns will be ignored.
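For illustration, a tiny partitioned layout following the description above could be built with a sketch like this (uses pandas/pyarrow; the paths, execution uid, and data are placeholders):
```
# Sketch: build a tiny 'partitioned' input layout as described above.
# Requires pandas + pyarrow. Paths, uid, and data are placeholder examples.
import os
import pandas as pd

input_path = "/data/cap_runs/input"   # hypothetical --input_path directory
exec_uid = "example-run-001"          # must match EXECUTION_UID in execution_info

sources = pd.DataFrame({
    "id": [0, 1, 2, 3],
    "programming_language": ["c", "c", "c++", "java"],
    "source": ["int main(){return 0;}", "...", "...", "..."],
    "contest": ["a", "a", "b", "b"],  # extra column -> treated as metadata
})

# Info file: id/programming_language plus metadata columns, named after the execution uid
info_cols = [c for c in sources.columns if c != "source"]
sources[info_cols].to_parquet(os.path.join(input_path, f"{exec_uid}.parquet"))

# Partitioned folder: files named "[start]-[end].parquet" over contiguous id ranges
part_dir = os.path.join(input_path, "partitioned")
os.makedirs(part_dir, exist_ok=True)
for start, end in [(0, 2), (2, 4)]:
    chunk = sources[(sources["id"] >= start) & (sources["id"] < end)]
    chunk[["id", "source"]].to_parquet(os.path.join(part_dir, f"{start}-{end}.parquet"))
```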
Source Code
A single source code file. You can pass either 'source' or 'source-[language]' as the task with '[language]' being the programming language used (if not there, then the language will be automatically detected). Makes the following assumptions:
- The '--task' passed was 'source' or 'source-[language]', or was automatically determined to be one of those. If '[language]' is not present, then the source code programming language will be automatically determined. See 'Languages and File Types' for info on source code programming languages
- The '--input_path' points to a single source code file
- You have passed one or more compile methods. See the 'Compile Methods' subsection for more info
The 'id' column for this will be the filename.
Binary
A single precompiled binary file. You can pass either 'binary' or 'binary-[file_type]' as the task with '[file_type]' being the type of the binary (currently not used, but may be used in the future). If '[file_type]' is not passed, it will be automatically detected. Makes the following assumptions:
- The '--task' passed was 'binary' or 'binary-[file_type]', or was automatically determined to be one of those. If '[file_type]' is not present, then the binary file type will be automatically detected. See the 'Languages and File Types' section for info on binary file types
- The '--input_path' points to a single precompiled binary file
The 'id' column for this will be the filename.
Tabular
A single file containing some tabular data (EG: csv, parquet, etc.). You can pass either 'tabular' or 'tabular-[file_type]' as the task with '[file_type]' being the type of tabular data. If '[file_type]' is not passed, it will be automatically detected. Makes the following assumptions:
- The '--task' passed was 'tabular' or 'tabular-[file_type]', or was automatically determined to be one of those. If '[file_type]' is not present, then the tabular file type will be automatically detected. See the 'Languages and File Types' section for info on tabular file types
- The '--input_path' points to a single tabular file
- If compiling and analyzing, then this file contains at least the columns 'id', 'source', and 'programming_language'. Source codes will be compiled and analyzed based on their programming language. Any extra columns will be treated as metadata. If the column 'binary' is also present in addition to the 'source' and 'programming_language' columns, then the 'binary' column will be ignored and CAP will default to compiling/analyzing instead of purely analyzing
- If only analyzing, then this file contains at least the columns 'id' and 'binary'. No compilation will be performed. Any extra columns will be treated as metadata. If the file also contains the columns 'source' and 'programming_language', then this option will not be chosen, and CAP will default to compiling/analyzing instead of purely analyzing.
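For example, a minimal compile-and-analyze tabular input could be produced with a sketch like this (column names follow the description above; the values are placeholders):
```
# Sketch: minimal tabular input for the 'tabular' task (compile + analyze case).
import pandas as pd

df = pd.DataFrame({
    "id": [0, 1],
    "source": ["int main(){return 0;}", "int main(){return 1;}"],
    "programming_language": ["c", "c"],
    "dataset": ["toy", "toy"],  # extra column -> kept as metadata
})
df.to_parquet("toy_input.parquet")  # point --input_path at this file
```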
Project
A folder containing all the files relevant to a single project. You can pass either 'project' or 'project-[build_type]' as the task with '[build_type]' being the type of project being built. If '[build_type]' is not passed, it will be automatically detected. Makes the following assumptions:
- The '--task' passed was 'project' or 'project-[build_type]', or was automatically determined to be one of those. If '[build_type]' is not present, then the project build type will be automatically detected. See the 'Languages and File Types' section for info on project build types
- The '--input_path' points to a directory containing all the files required for the project to be built
- There exists a special file within this directory that will determine how things are CAP-ed. This file can be:
1. 'CAP.json': a file special to CAP. It should be in JSON format. The object should be a
dictionary, and it can have the following keys/values:
* 'only_cap': a string or list of string filenames relative to this directory for the
file/files that should be CAP-ed in this project. This is useful for times when there
needs to be multiple files in the same directory as the main CAP file while
compiling/analyzing (EG: header files when compiling, shared libraries when analyzing
with rose, etc.). Files should only be either source code or binary files, not tabular
files or project directories. Files should be auto-detectable
NOTE: dictionary keys will be searched for in the above order. If there are multiple
conflicting keys, the first one found in the order is what will be used
NOTE: files will be searched for in the above order. If there are multiple conflicting files
the first one found is what will be used
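For instance, a CAP.json restricting the CAP process to a single main file could be written like so (a sketch; the directory and file names are placeholders):
```
# Sketch: write a CAP.json limiting CAP-ing to one file in a project directory.
# 'my_project' and 'main.c' are placeholder names.
import json

cap_config = {
    # Only CAP this file; other files in the directory (headers, shared libraries, ...)
    # remain available during compilation/analysis but are not CAP-ed themselves.
    "only_cap": "main.c",
}
with open("my_project/CAP.json", "w") as f:
    json.dump(cap_config, f, indent=2)
```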
Miscellaneous Folders/Files
A folder containing some number of files or subfolders to CAP. You should pass the 'misc' or 'misc-recursive' task; if the task is left as 'auto' and no other task could be determined, CAP will default to the 'misc' task. The 'misc-recursive' task is the same as 'misc', but will recursively check subfolders for more files/projects to CAP. Makes the following assumptions:
- The '--task' passed was 'misc' or 'misc-recursive', or was automatically determined to be that
- The '--input_path' points to a directory containing some number of files/folders. Files will have their types automatically detected, while folders are always assumed to be project folders (See the 'Project' subsection).
The 'id' column for files/projects will be built off of their filepaths relative to the '--input_path' directory like so:
- source code/binaries: 'file@[filepath]' where '[filepath]' is the path to that file relative to the '--input_path' directory
- tabular data: 'tabular@[filepath]-[id]' where '[filepath]' is the path to that file relative to the '--input_path' directory and '[id]' is the value of the 'id' column in the tabular data
- projects: 'project@[folder_path]-[binary_filename]' where '[folder_path]' is the path to that project's folder relative to the '--input_path' directory and '[binary_filename]' is the name of the compiled binary built by that project (NOTE: it's possible for a project to build multiple binaries)
Note on symlinks:
Whenever symlinks are present (either within a project folder, or when CAP-ing a standalone file), the resolved symlink paths will be added as bound directories when using containers automatically. This will apply recursively to project folders.
Languages and File Types
CAP currently has built-in support for a few programming languages and file types (though, others may be added somewhat easily, see the 'Extend Me' section for more info).
Programming Languages
The following programming languages can be CAP-ed (assuming the relevant containers exist) without any modification to CAP:
- 'c', 'c++': C and C++ programming languages
- 'java': Java programming language
These can be automatically detected as source code files as well as handled during compilation and analysis.
Binary Types
These are not currently used; however, the following binary types can currently be automatically detected based on their 'magic' bytes:
- 'elf': ELF files
- 'pe': Windows PE files
- 'macho': MacOS MachO files
- 'java': Java Classfiles
Tabular Types
The following tabular data types can be automatically detected and handled:
- 'csv': CSV files
- 'parquet': Parquet files
Project Types
The following project build types are available:
- 'cap': allows for fine-grained control over what happens based on a CAP.json file located in the directory
- 'cmake': CMake-based projects (detected via a 'CMakeLists.txt' file in the directory)
Precomputed Tokens
The '--atomic_data_dir' folder can contain precomputed tokens used for doing BinCFG postprocessing (creating CFG's, MemCFG's, etc.). These precomputed token files should have their names start with "precomputed_tokens", and should be pickle files containing a 2-tuple of (norm, tokens), where 'norm' is the BinCFG normalizer used and 'tokens' is a dictionary mapping string tokens to their integer value. These are useful as the default AtomicTokenDict in BinCFG can be slow when trying to atomically update the token dictionary file for many tokens/processes at once. Having a good chunk of the tokens precomputed can greatly reduce the number of atomic updates required.
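A precomputed token file matching the description above could be written with a sketch like the following (the directory, normalizer, and token values are placeholders; a string stands in here for the actual BinCFG normalizer object):
```
# Sketch: write a precomputed-token file into the --atomic_data_dir folder.
# Directory, normalizer, and tokens are placeholders; a string stands in for the
# actual BinCFG normalizer used.
import os
import pickle

atomic_data_dir = "/data/cap_runs/atomic_data"
norm = "safe"                              # the normalizer these tokens belong to
tokens = {"add": 0, "mov": 1, "sub": 2}    # token string -> integer value

with open(os.path.join(atomic_data_dir, "precomputed_tokens_safe.pkl"), "wb") as f:
    pickle.dump((norm, tokens), f)
```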
Output Data
Most output data will appear in the '--output_dir' directory. Each running process will output its own data into one or more parquet files. It will save data in chunks, saving to a new file every time the current chunk begins to take up too much memory. Each file will have filepath "[output_dir]/[exec_uid]-[task_id]-[save_idx].parquet", where 'task_id' is the id (integer) of the current process's task (in case you are using multithreading/multi-node), and 'save_idx' is the index of the current chunk of data being saved. The parquet file can have the following columns:
- 'id': the id of the datapoint, usually either int or string
- 'analyzer': the string name of the analyzer used, or None if no analysis was performed or an error occurred before or during analysis.
- 'language_family': the string language family used when compiling (See 'Programming Languages' subsection for list of currently available language families by default), or None if no compilation was performed or an error occurred before or during compilation
- 'compiler_family': the string compiler family used when compiling. These will be the main names of compiler families in the container_info.yaml file. Will be None if no compilation was performed or an error occurred before or during compilation
- 'compiler': the string name of the compiler used when compiling, or None if no compilation was performed or an error occurred before or during compilation
- 'compiler_version': version string of the compiler used when compiling, or None if no compilation was performed or an error occurred before or during compilation
- 'architecture': the string architecture name compiled to when compiling, or None if no compilation was performed or an error occurred before or during compilation
- 'flags': list of string flags passed to compiler when compiling, or None if no compilation was performed or an error occurred before or during compilation
- 'binaries': list of all output binaries associated with this cap file. Each binary is a bytes() object containing the full bytes of the compiled binary. For languages like c/c++, this list will likely have only one element (the compiled binary), but for others like Java, this list may contain multiple elements (one for each classfile produced).
- 'binary_md5_hashes': list of md5 hashes of the output binaries
- 'total_size_mb': total size of all binaries in MB
- 'analyzer_output': the string text output from the analyzer
- 'error' (List[Optional[str]]): string error message for any error occurring during CAP. Will be None if no error occurred. Specifically, this is an error from within python, not an error from compile/analysis stderr
- 'compile_stdout' (List[Optional[str]]): string output from stdout during compilation process, or None if no compilation was performed or an error occurred before or during compilation
- 'compile_stderr' (List[Optional[str]]): string output from stderr during compilation process, or None if no compilation was performed or an error occurred before or during compilation
- 'analyzer_stdout' (List[Optional[str]]): string output from stdout during analyzer process, or None if no analysis was performed or an error occurred before or during analysis
- 'analyzer_stderr' (List[Optional[str]]): string output from stderr during analyzer process, or None if no analysis was performed or an error occurred before or during analysis
- 'compile_time' (List[Optional[float]]): time in seconds required to compile, or None if no compilation was performed or an error occurred before or during compilation
- 'analysis_time' (List[Optional[float]]): time in seconds required to analyze, or None if no analysis was performed or an error occurred before or during analysis
- 'cfg_[norm_name]_[idx]': bytes() containing a pickled BinCFG CFG() object normalized with the normalizer '[norm_name]'. The '[idx]' is just an integer index in the list of normalizers being used, in case multiple normalizers are passed which have the same name. One column will exist per normalizer used iff the 'cfg' string is present in the POSTPROCESSING list of the execution info
- 'memcfg_[norm_name]_[idx]': bytes() containing a pickled BinCFG MemCFG() object normalized with the normalizer '[norm_name]'. The '[idx]' is an integer index as above. One column will exist per normalizer used iff the 'memcfg' string is present in the POSTPROCESSING list of the execution info
- 'stats_[norm_name]_[idx]': bytes() containing a pickled numpy array from BinCFG's CFG().get_compressed_stats(), normalized with the normalizer '[norm_name]'. The '[idx]' is an integer index as above. One column will exist per normalizer used iff the 'stats' string is present in the POSTPROCESSING list of the execution info
- 'meta_[metadata_key]': metadata associated with the CAP-ed datapoint. Each key that was in the metadata will have 'meta_' prepended to it and its value will be stored in its associated column
Any of these columns can be dropped and will not appear in the output files if their name appears in the DROP_COLUMNS parameter in the execution info.
The only other set of possible output data is BinCFG AtomicTokenDict tokens, which will appear in the '--atomic_data_dir' directory.
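To inspect the output, something like the following sketch can be used (the file name is an example following the pattern above, and the exact CFG column names depend on your POSTPROCESSING and NORMALIZERS settings):
```
# Sketch: read one CAP output chunk and unpickle a postprocessed CFG column.
# The file name is an example of the "[exec_uid]-[task_id]-[save_idx].parquet" pattern.
import pickle
import pandas as pd

df = pd.read_parquet("output/example-run-001-0-0.parquet")
print(df.columns.tolist())                       # columns that survived DROP_COLUMNS

row = df.iloc[0]
print(row["id"], row["compiler"], row["compile_time"])

cfg_cols = [c for c in df.columns if c.startswith("cfg_")]
if cfg_cols:
    cfg = pickle.loads(row[cfg_cols[0]])         # pickled BinCFG CFG() object
    print(type(cfg))
```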
Automatic Detection
Files and folders can be automatically detected depending on what task was passed. When the 'auto' task is passed, we check in order:
- Does the '--input_path' point to a directory? If so:
  - 1.a Are the '--partitioned_dir' and '--partitioned_info_path' passed and valid folders/files? If so, assume task='partitioned'
  - 1.b Otherwise, check this folder for a special file designating it as a project build folder (EG: a 'CMakeLists.txt' file). If so, assume task='project'
  - 1.c Otherwise, assume task='misc'
- Otherwise, assume we are pointing to a file:
  - 2.a Check if we were pointing to a tabular file. If so, assume task='tabular'
    - 2.a.0 Does this file end with a known file extension for tabular files? EG: '.csv', '.parquet'
    - 2.a.1 Does this file start with known magic bytes? EG: "PAR1" for parquet files
  - 2.b Check if we were pointing to a precompiled binary file. If so, assume task='binary'
    - 2.b.0 Does this file start with known binary magic bytes? EG: 0xCAFEBABE for java classfiles
  - 2.c Check if we were pointing to a source code file. If so, assume task='source'
    - 2.c.0 Does this file contain known strings that uniquely (or likely uniquely) designate this file as a particular language? EG: something similar to "public static void main(String[] args)" for Java
  - 2.d We couldn't auto-detect the file, so raise an error
Running CAP
CAP can be run in two main ways: from the command line, and by importing CAP.py and calling cap_main(). This code doesn't have the capability to actually execute HPC jobs; that is left to the user. However, there is a runcap.sh file you could modify if you happen to be using singularity and the SLURM job manager.
Command Line
usage: CAP.py [-h] [--atomic_data_dir ATOMIC_DATA_DIR] [--containers_dir CONTAINERS_DIR] [--logs_dir LOGS_DIR] [--output_dir OUTPUT_DIR]
[--input_path INPUT_PATH] [--partitioned_info_path PARTITIONED_INFO_PATH] [--partitioned_dir PARTITIONED_DIR]
[--container_info_path CONTAINER_INFO_PATH] [--temp_dir TEMP_DIR] [--execution_info_path EXECUTION_INFO_PATH]
[--default_dir DEFAULT_DIR] [-n N_JOBS] [-t TASK_ID] [--threads THREADS] [--task TASK] [--container_platform CONTAINER_PLATFORM]
[--fail_on_error] [--hpc_copy_containers] [--specific_tasks SPECIFIC_TASKS]
Compile. Analyze. Prepare.
optional arguments:
-h, --help show this help message and exit
--atomic_data_dir ATOMIC_DATA_DIR
The path to a directory for atomic data
--containers_dir CONTAINERS_DIR
The path to a directory for containers
--logs_dir LOGS_DIR The path to a directory for log files
--output_dir OUTPUT_DIR
The path to a directory for output data
--input_path INPUT_PATH
The path to a directory/file for the input data
--partitioned_info_path PARTITIONED_INFO_PATH
The path to a parquet file containing the metadata and id keys for a "partitioned" CAP process
--partitioned_dir PARTITIONED_DIR
The path to a directory containing partitioned data for a "partitioned" CAP process
--container_info_path CONTAINER_INFO_PATH
The path to a YAML file containing the container_info. Defaults to "container_info.yaml" file
assumed to be right next to this file.
--temp_dir TEMP_DIR The path to a directory for temporary files
--execution_info_path EXECUTION_INFO_PATH
The path to a python file that should be imported to get the execution information
--default_dir DEFAULT_DIR
The path to a directory for any unpassed default directories. Any missing directories will use
default names and be subdirectories of this one
-n N_JOBS, --n_jobs N_JOBS
The number of jobs running. If this argument is not passed, will first check the OS environment
variable SLURM_ARRAY_TASK_COUNT. If that doesn't exist, then will assume n_jobs=1
-t TASK_ID, --task_id TASK_ID
The task_id for this process. Should be in the range [0, num_jobs - 1] and unique for each process.
If this argument is not passed, then the SLURM_ARRAY_TASK_ID environment variable will be used.
--threads THREADS Number of threads to use for this task.
--task TASK Which task to run. Can be: "auto", "partitioned", "source", "source-[language]", "project",
"project-[build_type]", "binary", "binary-[binary_type]", "tabular", "tabular-[file_type]",
"file", "folder", "misc", or "misc-recursive". See README.md for info
--container_platform CONTAINER_PLATFORM
The container platform to use
--fail_on_error By default, most errors will be captured and saved silently into the output data during the CAP
process. If this is True, then any error while CAP-ing a file/folder will instead be raised, an
error will be printed to the log files, and that data will not be stored in the output files.
This will not stop the entire CAP process, however, as files and folders will continue to be CAP-ed.
This just makes the errors visible in the logs and doesn't save them along with the output data
--await_load If passed, then each thread within an execution will wait to begin loading its data until the
previous thread has completed the data loading process to save memory during the initial
loading/splitting phase
--hpc_copy_containers
If this flag is passed, then it is assumed that we are running on HPC systems, and we should copy
container files from the given `containers_dir` into a temporary place on this node's in-memory
filesystem for faster loading of containers. The 'containers' path will be automatically updated
to be "[temp_path]/containers" with all containers for the original "--containers_dir" directory
being copied into that path
--specific_tasks SPECIFIC_TASKS
Specific task_id's you wish to run. Should be a comma separated list of integer task_id's. It is
assumed that if there are multiple tasks, then they should be run in parallel. If this is passed,
then you must pass most values directly. EG: `n_jobs` must be passed, `task_id` must not be passed,
`threads` should be the same value that was used during full execution and will not specify the
number of threads to use to run these specific tasks (it is only used for proper logging), `task`
must be passed and cannot be 'all', and `task_id_offset` must not be passed or must be set to the
default value of 0.
Calling cap_main()
You can also import from CAP.py the cap_main() function and call that yourself. It has signature/docstring:
def cap_main(paths, exec_info, task, n_jobs=1, threads=1, task_id=0, hpc_copy_containers=False,
             container_platform=None, specific_tasks=None):
    """The main entrypoint for a CAP process
See the README.md for info on how to set everything up, arguments, etc.
Paths that for sure exist before calling various _main's: 'atomic_data', 'containers', 'logs', 'output', 'input',
'temp', 'container_info'. Ones that are optional: 'partitioned_info', 'partitioned'
Args:
paths (Dict[str, str]): dictionary of paths to use. Available keys: 'default', 'atomic_data', 'containers',
'logs', 'output', 'input', 'temp', 'partitioned_info', 'partitioned', 'container_info'
Some paths may have various substrings which will be replaced. These are:
- "[default_dir]": replaces with the default directory path
- "[input_path]": replaces with the input data directory path
- "[main]" or "[cap]": replaces with the path to the directory containing this file
These paths must be present to use and the default directory path cannot use any of them
NOTE: not all of these have to be present, just enough so that we can build a path for every needed one
exec_info (Dict[str, Any]): dictionary of execution info. Must contain the keys:
- 'execution_uid' (str): unique string identifier for this execution
Can contain the optional keys:
- 'postprocessing' (Optional[Union[str, List[str]]], default=[]): string or list of strings for the
postprocessings to apply to analyzer outputs, or None to not apply any. Available strings:
* 'cfg': build a CFG() object, one for each of the normalizers in exec_info['normalizers']
* 'memcfg': build a MemCFG() object, one for each of the normalizers in exec_info['normalizers']
* 'stats': build a CFG() object and get the graph statistics with cfg.get_compressed_stats(), one for
each of the normalizers in exec_info['normalizers']
NOTE: these will be stored as pickled bytes() objects
- 'drop_columns' (Optional[Union[str, List[str]]], default=[]): by default, all of the data generated is kept.
This if not None, can be a string or list of strings of the column or group of columns to drop. You may also
pass any columns that would appear in the metadata, and those will be dropped. Metadata columns to
drop can be passed either as their original name, or with the prefix 'meta_' as they would appear in
the output data. Any columns that do not correspond to data being kept will raise an error, unless they
start with the prefix 'meta_', in which case it is assumed that that column is a possible metadata column
which may or may not exist. Available non-metadata columns to drop:
'analyzer', 'binaries', 'analyzer_output', 'metadata', 'error', 'compile_stdout', 'compile_stderr',
'analyzer_stdout', 'analyzer_stderr', 'compile_time', 'analysis_time', 'language_family', 'compiler_family',
'compiler', 'compiler_version', 'architecture', 'flags'
There are also a couple of special strings that will drop groups of columns including:
* 'metadata': drop any metadata that was passed in metadata dictionaries
* 'compile_info': drop all of the compilation info
* 'timings': drop all of the timing information ('compile_time', 'analysis_time')
* 'stdio': drop all of the stdio information ('compile_stderr', 'analyzer_stdout', etc.)
* 'stdout': drop all of the stdout information ('compile_stdout', 'analyzer_stdout')
* 'stderr': drop all of the stderr information ('compile_stderr', 'analyzer_stderr')
See the README.md for what all of these columns are.
- 'normalizers' (Optional[Union[str, Normalizer, Iterable[Union[None, str, Normalizer]]]], default=[]): normalizers
to use when building postprocessing CFG's. Will build those CFG's once for each of the normalizers here. Can be:
* None: will not normalize at all (use raw input) for CFG's, will use 'unnormalized' normalization
(default BaseNormalizer()) for MemCFG's
* str: string name of a normalizer to use
* Normalizer: Normalizer-like object to use
* Iterable of any of the above: will do one of each normalization for each datapoint
- 'analyzers' (Optional[Union[str, List[str]]], default=[]): string or list of strings for the analyzers to
use. Can be empty if you wish to not analyze files, but only compile
- 'compile_methods' (Optional[List[Dict[str, Any]]], default=[]): list of compile methods to use. See the
readme for this
- 'container_platform' (Optional[str]): the container platform to use, or None to detect one by default
- 'fail_on_error' (Optional[bool]): By default, most errors will be captured and saved silently into the
output data during the CAP process. If this is True, then any error while CAP-ing a file/folder will
instead be raised, an error will be printed to the log files, and that data will not be stored in the
output files. This will not stop the entire CAP process, however, as files and folders will continue
to be CAP-ed. This just makes the errors visible in the logs and doesn't save them along with the output data
- 'await_load' (Optional[bool]): If True, then each thread within an execution will wait to begin loading
its data until the previous thread has completed the data loading process to save memory during the initial
loading/splitting phase
task (str): the task being run. Can be:
- "auto": automatically determine what type of task is being run based on the input directories
- "partitioned": CAP-ing partitioned data
- "source": CAP-ing a single source file. Language will be automatically detected
- "source-[language]": CAP-ing a single source file, with the '[language]' being the language family used
- "project": CAP-ing a project (a directory of files that produce one or more binaries as a part of the
same project). Project built type will be automatically detected
- "project-[build_type]": CAP-ing a project (a directory of files that produce one or more binaries as a
part of the same project). '[build_type]' determines the project build type
- "binary": CAP-ing a single precompiled binary. Binary type will be automatically detected
- "binary-[binary_type]": CAP-ing a single precompiled binary. '[binary_type]' determines the binary type
- "tabular": CAP-ing a single file containing tabular data (EG: csv, parquet, etc.). File type will be
automatically detected
- "tabular-[file_type]": CAP-ing a single file containing tabular data (EG: csv, parquet, etc.). '[file_type]'
determines the type of tabular data in the file
- "file": CAP-ing a single file. Type of file will be auto-detected
- "folder": CAP-ing a folder. Type of folder will be auto-detected
- "misc": CAP-ing a bunch of files/folders within a directory. Files and folders types will be automatically
detected and CAP-ed
- "misc-recursive": same as "misc", but will recursively check subfolders for other files/projects to CAP
n_jobs (int): the total number of jobs being run
threads (int): the number of threads to use per task
task_id (int): the id of this current task
hpc_copy_containers (bool): if True, will assume we are running on HPC and copy all of the containers over
to "[temp_dir]/containers" for faster loading
specific_tasks (Optional[List[int]]): if passed, then only these specific tasks will be run
"""
Safety
There are currently some safety concerns with this code. Specifically, while YAML files are safely loaded, we currently allow for arbitrary python code to be executed while loading the container_info YAML file (specifically, when loading information about 'compiler' type containers). So long as you are using a trusted container_info file, you will be fine. See the 'Container Info' subsection in the 'Extend Me' section for more info on why/when this happens (specifically, the information about "Command Strings").
Another concern may be that the execution_info python file that is used to hold CAP execution information is imported and run like a normal python file. So again, don't use untrusted files!
Extend Me
Information on how to extend this code to new languages, analyzers, etc.
Adding a new language:
- Build singularity container for language compiler(s)
- Add container info to the container_info file
- Add cleaning method to clean_source()
- Add parser for language family to the utils.misc.get_language_family() file
- Add file format detections to misc
- Add new compile_[LANG]_file() method to the cap.process_data.compile.py file. See header for method specs.
- Add lang to the cap.process_data.compile.compile_single_file() method
- Add info to CAP.py in FILE_EXTENSIONS_TO_LANGUAGE and BINARY_FILE_EXTENSIONS
- Update README.md and docstrings :)

Adding a new analyzer:

Adding a new container platform:
Container Info
The container_info file contains information about, who would have guessed, containers. It is a YAML file which should look something like:
```
ContainerName1:
  type: ContainerType1
  container: ContainerPath1
  ...

ContainerName2:
  type: ContainerType2
  container:
    docker: ContainerPath2Docker
    singularity: ContainerPath2Singularity
```
The ContainerName is the name of the container, and how that container will be referenced in CAP code and settings (EG: the name of the compiler_family/analyzer being used).
ContainerType is a string designating the type of that container. Currently available container types are: 'compiler', 'analyzer'. In case I forget to update this file, the definitely-up-to-date list of them can be found in the cap.parsing.container_info.CONTAINER_TYPE global variable.
The ContainerPath contains information on how to locate the container being used. It can be a:
- string: the name of the container to use. For singularity files, this will be the location of the '.sif' file relative to the 'containers_dir'. If the name doesn't end in '.sif', then it will be added when looking for a singularity image file. For docker images, this will be the name of the docker image used. This way, one could enter the string 'rose-analysis', and CAP will look for a '[containers_dir]/rose-analysis.sif' file when using singularity, or will attempt to use the 'rose-analysis' docker image when using docker.
- dictionary: the name to use depending on the container platform being used. Currently available container platforms are: 'docker', 'singularity'
A single file may contain multiple container entries. Each of them will require different information depending on what type of container it is.
Analyzer Containers
These objects give information about analyzers, where to find them, how to use them, etc. They should look something like:
```
AnalyzerName:
  type: 'analyzer'
  container: AnalyzerPath
  analysis_cmd: AnalysisCmd
```
With analysis_cmd being the command which should be executed within that container in order to produce analysis output. The input will be placed within a mounted folder at '/mounted/{binary_basename}', and the output is expected to exist at '/mounted/{analyzer_output_basename}' once analysis is complete. You should use the strings "{binary_basename}" and "{analyzer_output_basename}" as those will automatically be inserted into the string with .format() when running the analysis.
It is currently expected that the output will be a single file that should be read in as a string (not bytes), and that the analyzer will not modify the original binary file.
Compiler Containers
These objects give information about compiler families and their compilers, versions, architectures, flags, etc. These can get quite complicated, so there is a lot of syntactic sugar and defaults built in to help ease this and reduce the size of the files.
Basic Structure
Each compiler container object should contain information for one or more compiler 'families' (also called compiler 'collection' or compiler 'suite', EG: GCC, Clang, etc.). Information is structured as: family -> compiler -> version -> architecture (arch). That is, each compiler family has a number of compilers, each of those a number of versions, and so on:
- 'family': a compiler family (collection/suite). EG: GCC, Clang, etc.
- 'compiler': a single compiler in the family. EG: for GCC, these would be 'gcc', 'g++', 'gfortran', etc.
- 'version': a single version of a compiler. EG: for gcc, these could be '5.5.0', '7.5.0', '11.3.0', etc.
NOTE: version strings can optionally start with a 'v' and it will be ignored. EG: 'v5.5.0', 'v7.5.0', etc.
Versions can either take the form of "[VERSION_NUMBER]" or "[VERSION_NUMBER]-[EXTRA_INFO]", where VERSION_NUMBER
is a '.'-separated list of digits of any positive length (EG: '3', '7.5.0', '18.47.62.1234'), and EXTRA_INFO can
be any string to help differentiate versions with the same number.
Versions are comparable. They are compared first in order of their VERSION_NUMBER's, then their EXTRA_INFO. That
is, they are compared by the integer values created by splitting their VERSION_NUMBER on '.'s from left to right,
then by their EXTRA_INFO as a plain string comparison. The lack of VERSION_NUMBER's at that index would make that
version smaller than one that does have a VERSION_NUMBER at that index (same goes for EXTRA_INFO).
EG: "2" < "2.3.5" < "2.4" < "2.4.0" < "2.4.0-alpha" < "2.4.0-beta" < "2.5" < "3" < "18.5"
- 'arch' (architecture): a single target architecture
At each level in this hierarchy, there are some metadata fields that should/can be present. The types of these fields can be:
- "Optional": Optional, doesn't have to be here
- "Partially-Optional": Optional only in the sense that this information doesn't necessarily have to be at this
level, but must exist by the lowest level. EG: the 'binary_name' field must exist by the 'arch' level, but could
theoretically exist at some level above it and be inherited.
- "Inheritable": This field is inheritable, meaning all information lower down the hierarchy will by default inherit
these values from those above it in the hierarchy (if available), and override it. EG: the 'force_flags' field can
be present at every level of the family -> compiler -> version -> architecture hierarchy. Those at lower levels
(say, 'version') would inherit the 'force_flags' of its parents ('compiler' and 'family' in this example) if it
exists in those levels. If the 'force_flags' field was present in this level ('version'), then it would override
those from above (how it overrides may be different depending on the field)
By default, if something is not Optional/Partially-Optional, then it is required at that level. If something is not Inheritable, then it will only be used at that level.
NOTE: Names/values are case-sensitive.
Field Information
Information on all the fields that exist in the compiler:
- 'binary_name' [Partially-Optional, Inheritable]: string name of the binary to use to compile (or path to that
compiler binary). Can be inherited, and overridden by lower levels. Needed by the arch-level
- 'container' [Partially-Optional, Inheritable]: string basename (IE: without file extension) for filename of the
container that contains the compiler binary. Needed by the arch-level
- 'supported_languages' [Partially-Optional, Inheritable]: either string or list of strings for the language/languages
that the compiler binary is able to compile. Needed by the arch-level.
NOTE: it is likely best to do this at the compiler or lower level
- 'use_previous' [Optional]: allows one to use the previous value information at the current hierarchy level as a
starting point. That information will be copied, and all information in this value will override that. EG: the 'g++'
compiler might use the 'gcc' compiler as a use_previous, in which case all information in the 'gcc' compiler will
be copied and used as a starting point for the 'g++' compiler, and all information in the 'g++' compiler would then
override that data. The value can be either:
* string: the string name of the value at that level in the hierarchy to use, so order in the file wouldn't matter
* bool (True): only available at 'version' level. Uses the immediately previous version
NOTE: currently it is only possible to use previous values at the same hierarchy level in the same parent
The priority for data at sublevels varies depending on the key. The `flags` and `force_flags` at lower levels
will override/add to those inherited from upper levels when using use_previous. All other flags will be overridden
instead by those inherited.
- 'flags' [Optional, Inheritable]: flags that are available to this value and all levels lower in the hierarchy. This
should be a dictionary with flag/value pairs. Each 'flag' is the name of a compiler flag (Ignoring the first '-'
that would be present. If there are multiple '-'s that should be used, then add in all but the first one to the name).
Each value can be:
* null (None): this has the effect of removing/ignoring that flag. Useful for removing deprecated/outdated
flags from versions that use_previous
* boolean: This is a flag that is either present or not present, and has no value. The boolean value determines
whether or not there is an associated 'no-' flag as well. If False, then there is no extra flag. If True, then
an extra possible 'no-' flag will be added by inserting the string 'no-' after the first letter of the flag name.
EG: "finline: True" would add the mutually exclusive flags 'finline' and 'fno-inline'.
* string/int: a value to always use for this flag. Value will be directly appended to the flag name, so delimiters
must be inserted here as a string if using. EG: to set a flag '--fake_flag=3', one would use "-fake_flag: '=3'",
or for flag '-m32': "m: '32'" which incidentally is equivalent to "m32: false"
* list/tuple: the first value should always be a string for the separator that will be inserted between the flag
name and any values in this list, and all other values are string/int values that are possible options for this
string. EG: setting the possible flags '-std=c++11', '-std=gnu++11', '-std=c++14', and '-std=gnu++14' would be:
"std: ['=', 'c++11', 'gnu++11', 'c++14', 'gnu++14']", or using the flags '-m64' and '-m32' would be:
"m: ['', 32, 64]"
All string values may also be "command strings" (See: 'Command Strings' section below). These strings may have
substrings within them surrounded by '$$' (like latex), and those substrings will be evaluated using eval(). There
are also some added classes for more functionality.
- 'force_flags' [Optional, Inheritable]: flags that will be automatically set/forced for everything at/lower on the
hierarchy. These will override flags with the same name in the `flags` key for all levels at/below it. This should
be a dictionary with flag/value pairs within it:
```
force_flags:
flag1: value
flag2: value
...
```
EG: the 'gcc' compiler version 5 is only fully-compatible with c++ std versions c++98, c++11, c++14,
gnu++98, gnu++11, gnu++14. So, one might wish to add the following to their compiler info file at the gcc 5.5.0
version:
```
force_flags:
std: ['=', 'c++98', 'gnu++98', 'c++11', 'gnu++11', 'c++14', 'gnu++14']
...
```
Which would force all arch's in that version to only allow a c++ std version of '-std=c++98', '-std=gnu++98'...
This field is also inheritable, so those lower down the hierarchy will by default use the force_flags from the
level above them. This field can also be overridden in that:
* flags in multiple levels will be overridden in the lowest level. NOTE: setting a flag to null will remove
that flag from force_flags in that level and those below it
* flags in a lower level that are not in higher levels will be added to the force_flags for that level and
those below it
* flags in a higher level that are not in a lower level will be inherited
EG: in the example above, let's say that the 'i686' architecture binary (for some reason) cannot use the gnu++
std versions. One could then insert the force_flags in the 'i686' architecture:
```
i686:
force_flags:
std: ['=', 'c++98', 'c++11', 'c++14']
...
```
And that would still use all of the force_flags from the levels above it, but would override the 'std' flag
for only the 'i686' architecture
Complete Compiler Info Structure
The compiler info object should look like:
```
# The name of a single compiler family/collection/suite. EG: 'GCC', 'Clang'
compiler_family:
# Family-level data
'use_previous': # [Optional]
'container': name # [Partially-Optional, Inheritable]
'binary_name': name # [Partially-Optional, Inheritable]
'supported_languages': languages # [Partially-Optional, Inheritable]
'force_flags': # [Optional, Inheritable]
...
'flags': # [Optional, Inheritable]
...
# All the compilers available to the current compiler family
'compilers':
compiler1:
# Compiler-level data
'use_previous': previous_value # [Optional]
'container': name # [Partially-Optional, Inheritable]
'binary_name': name # [Partially-Optional, Inheritable]
'supported_languages': languages # [Partially-Optional, Inheritable]
'force_flags': # [Optional, Inheritable]
...
'flags': # [Optional, Inheritable]
...
# All the versions in this compiler
'versions':
version1:
# Version-level data
'use_previous': previous_value # [Optional]
'container': name # [Partially-Optional, Inheritable]
'binary_name': name # [Partially-Optional, Inheritable]
'supported_languages': languages # [Partially-Optional, Inheritable]
'force_flags': # [Optional, Inheritable]
...
'flags': # [Optional, Inheritable]
...
# All of the architectures available in this version
'architectures':
arch1:
# Architecture-level data
'use_previous': previous_value # [Optional]
'container': name # [Partially-Optional, Inheritable]
'binary_name': name # [Partially-Optional, Inheritable]
'supported_languages': languages # [Partially-Optional, Inheritable]
'force_flags': # [Optional, Inheritable]
...
'flags': # [Optional, Inheritable]
...
```
NOTE: versions (and only versions) can be left empty or set to null to automatically "use_previous: True" with no modifications
Variable Paths/Names
Values that are path/name strings ('container', 'binary_name') can use variables in their values for easy/repetitive naming. They should be inserted in strings as bracket variables that will be formatted with .format(). These names will be inserted right at the end of loading. Kwargs are:
- 'family'/'compiler'/'version'/'arch'/'architecture': insert the current name of this value. 'arch' and 'architecture'
are the same. All of these values will be available since they will only be inserted right at the end, after
inheriting all the way down to the final level. 'version' will insert the raw version string
- 'v': insert the cleaned version string (IE: what you would get by calling str(Version(version_str)))
- 'vX': with `X` being an integer, insert the specified index of the version string. EG: "{v2}" with
version = Version("3.7.2.4-aaa") would insert the string '2', "{v0}" would be '3', etc. Will not insert the
extra version string bits
NOTE: if you wish to actually enter brackets, you can just use "{{...}}" double brackets instead to escape checking for string kwargs inside
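As a rough illustration of how these bracket variables expand (a sketch, not CAP's code; the container-name pattern is hypothetical):
```
# Rough sketch of the bracket-variable substitution described above (not CAP's code).
# "gcc-{compiler}-{v0}.{v1}" is a hypothetical container-name pattern.
pattern = "gcc-{compiler}-{v0}.{v1}"
version = "v7.5.0"
parts = version.lstrip("v").split("-")[0].split(".")
print(pattern.format(compiler="g++", v0=parts[0], v1=parts[1]))  # gcc-g++-7.5
```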
Command Strings
String flags may also contain "command strings" within them surrounded by '$$' (like latex). The substrings within the '$$'s will be evaluated using python's eval() function and thus must be valid python code.
WARNING: this will execute arbitrary python code, do not load compiler info files from untrusted sources.
These strings will be parsed into CFConcatenate() objects (See compiler_selection.cf_actions for more info) that will concatenate all pieces of the strings with the evaluated values replacing the sections surrounded by '$$'.
EG: a flag like '-flag=valXX' where 'XX' can be any integer value between 0 and 99:
flag: "=val$$CFRange(0, 100)$$"
Would make a CFConcatenate() object like:
CFConcatenate(['=val', CFRange(0, 100)])
NOTE: if for some reason you wanted to make the literal string "$$", you could do something like:
$$'$'*2$$ which would be parsed into the literal string "$$"
Currently available objects for command line flags (See compiler_selection.cf_actions for up-to-date objects):
- CFAction(): the base class for all actions. Derivatives must override '__call__'. It is recommended that they also
override '__hash__' for consistent use with other CAP tools. See the docs on the CFAction() object for more info
- CFRandomNoFlag(flag_name: str): Randomly switches between a flag and its 'no-' version. The 'no-' version is the
same as the original value, just with the string 'no-' inserted after the first character. IE: 'finline' -> 'fno-inline'
- CFConstant(const: Union[int, str]): Always return the given value (as a string). Must be a string or integer value.
- CFChoice(choices: Iterable[Union[str, int, CFAction]]): Randomly choose uniformly from a list of values. Each value
can be either a str/int constant or another CFAction().
- CFRange(start: int, end: int): Randomly choose an integer value in the given range from start (inclusive) to end (exclusive)
- CFConcat(vals: Iterable[Union[str, int, CFAction]], sep: str = ''): Evaluates all values in list/tuple and
concatenates them (with optional separator)
Technically, since we call eval() on command strings, you can use whatever variables/imports/etc. are available at the time these objects are evaluated, but do so at your own risk.
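The sketch below mimics that behavior with toy stand-ins for the CF classes; it is not the real compiler_selection.cf_actions implementation, just an illustration of splitting on '$$', eval()-ing the pieces, and concatenating:
```
# Toy stand-ins illustrating how a command string like "=val$$CFRange(0, 100)$$"
# could be split, eval()-ed, and re-concatenated. Not the real CAP classes.
import random
import re

class CFRange:
    def __init__(self, start, end):
        self.start, self.end = start, end
    def __call__(self):
        return str(random.randrange(self.start, self.end))

class CFConcatenate:
    def __init__(self, pieces):
        self.pieces = pieces
    def __call__(self):
        return "".join(p() if callable(p) else str(p) for p in self.pieces)

def parse_command_string(s):
    # Odd-indexed chunks from the split are the '$$...$$' sections; eval() them.
    # (Unsafe on untrusted input, which is exactly the warning above.)
    pieces = [eval(chunk) if i % 2 else chunk
              for i, chunk in enumerate(re.split(r"\$\$(.*?)\$\$", s))]
    return CFConcatenate(pieces)

flag_value = parse_command_string("=val$$CFRange(0, 100)$$")
print(flag_value())  # e.g. '=val42'
```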
Poseidon
The /poseidon directory contains our current work towards extending the Triton binary analysis tool to implement Linux system emulation. It is still in development.
Triton: https://triton-library.github.io/
Release
LLNL-CODE-837816
Owner
- Name: Lawrence Livermore National Laboratory
- Login: LLNL
- Kind: organization
- Email: github-admin@llnl.gov
- Location: Livermore, CA, USA
- Website: https://software.llnl.gov
- Twitter: LLNL_OpenSource
- Repositories: 520
- Profile: https://github.com/LLNL
For over 70 years, the Lawrence Livermore National Laboratory has applied science and technology to make the world a safer place.
GitHub Events
Total
- Create event: 1
Last Year
- Create event: 1
Issues and Pull Requests
Last synced: 12 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0