cvmfs-venv

Example implementation of getting a Python virtual environment to work with CVMFS LCG views

https://github.com/matthewfeickert/cvmfs-venv

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    1 of 1 committers (100.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

cvmfs hep hep-ex lcg physics python python-venv python3 uv

Repository

Basic Info
  • Host: GitHub
  • Owner: matthewfeickert
  • License: mit
  • Language: Shell
  • Default Branch: main
  • Homepage:
  • Size: 93.8 KB
Statistics
  • Stars: 12
  • Watchers: 2
  • Forks: 2
  • Open Issues: 5
  • Releases: 7
Created almost 4 years ago · Last pushed 5 months ago
Metadata Files
Readme License Citation

README.md

cvmfs-venv

Simple command line utility for getting a Python virtual environment to work with CVMFS LCG views. This is done by adding hooks to the Python virtual environment's bin/activate script.

Install

Either clone the repo to your directory or simply download the relevant files and place them on PATH.

```console
$ mkdir -p ~/.local/bin
$ export PATH=~/.local/bin:"${PATH}"  # If ~/.local/bin not on PATH already
$ curl -sL https://raw.githubusercontent.com/matthewfeickert/cvmfs-venv/main/cvmfs-venv.sh -o ~/.local/bin/cvmfs-venv
$ chmod +x ~/.local/bin/cvmfs-venv
```

Use

Source the script to create a Python 3 virtual environment that can coexist with a CVMFS LCG view. The default name is venv.

```console
$ cvmfs-venv --help
Usage: cvmfs-venv [-s|--setup] [--no-system-site-packages] [--no-update] [--no-uv]

Options:
 -h --help     Print this help message
 -s --setup    String of setup options to be parsed
 --no-system-site-packages
               The venv module '--system-site-packages' option is used by default.
               While it is not recommended, this behavior can be disabled through use
               of this flag.
 --no-update   After venv creation don't update pip and setuptools to the latest releases.
               Use of this option is not recommended, but is faster.
 --no-uv       After venv creation don't install uv and use it to update pip, and setuptools.
               By default, uv is installed.

Note: cvmfs-venv extends the Python venv module and so requires Python 3.3+.

Examples:

* Create a Python 3 virtual environment named 'lcg-example' with the Python
runtime provided by LCG view 105 on AlmaLinux 9.

    setupATLAS -3
    lsetup 'views LCG_105 x86_64-el9-gcc12-opt'
    cvmfs-venv lcg-example
    . lcg-example/bin/activate

* Create a Python 3 virtual environment named 'atlas-ab-example' with the
Python runtime provided by ATLAS AnalysisBase release v25.2.15.

    setupATLAS -3
    asetup AnalysisBase,25.2.15
    cvmfs-venv atlas-ab-example
    . atlas-ab-example/bin/activate

* Create a Python 3 virtual environment named 'venv' with whatever Python
runtime "$(command -v python3)" evaluates to.

    cvmfs-venv
    . venv/bin/activate

* Setup LCG view 105 on AlmaLinux 9 and create a Python virtual environment
named 'lcg-example' using the Python 3.9 runtime it provides.

    . cvmfs-venv --setup "lsetup 'views LCG_105 x86_64-el9-gcc12-opt'" lcg-example

* Setup ATLAS AnalysisBase release v25.2.15 and create a Python virtual
environment named 'atlas-ab-example' using the Python 3.9 runtime it
provides.

    . cvmfs-venv --setup 'asetup AnalysisBase,25.2.15' atlas-ab-example

```

Example: Virtual environment with LCG view

```console
$ ssh lxplus
[feickert@lxplus924 ~]$ mkdir -p ~/.local/bin
[feickert@lxplus924 ~]$ export PATH=~/.local/bin:"${PATH}"
[feickert@lxplus924 ~]$ curl -sL https://raw.githubusercontent.com/matthewfeickert/cvmfs-venv/main/cvmfs-venv.sh -o ~/.local/bin/cvmfs-venv
[feickert@lxplus924 ~]$ chmod +x ~/.local/bin/cvmfs-venv
[feickert@lxplus924 ~]$ setupATLAS -3 --quiet
[feickert@lxplus924 ~]$ lsetup 'views LCG_105 x86_64-el9-gcc12-opt'
Requested: views ...
 Setting up views LCG_105:x86_64-el9-gcc12-opt ...
```

Dependencies

cvmfs-venv has no dependencies beyond the ones it aims to extend: A Linux operating system that has CVMFS installed on it with a Python 3.3+ runtime with a functioning venv module.

A full listing of all programs used outside of Bash shell builtins is:

  • cat
  • curl
  • ed or vi
  • find
  • readlink
  • sed
  • Python 3.3+ with pip

Why is this needed?

When an LCG view or an ATLAS computing environment that uses software from CVMFS is set up, it alters the PYTHONPATH environment variable. By placing the contents of all the installed software of an LCG view or ATLAS release onto PYTHONPATH for the rest of the shell session, the protections and isolation of a Python virtual environment are broken. It is not possible to fix this in a reliable and robust way that will not break the access of other software in the LCG view or ATLAS environment dependent on the Python packages in them. The best that can be done is to control the directory tree at the head of PYTHONPATH in a stable manner that allows for most of the benefits of a Python virtual environment (control of install and versions of packages, isolation of directory tree).
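The role of the head of PYTHONPATH can be seen in a small, self-contained experiment. The sketch below is illustrative only: the directory names and the demo_mod module are made up, and it assumes a python3 executable is available. It places two copies of the same module in a stand-in "view" directory and a stand-in "venv" directory and shows that whichever directory comes first on PYTHONPATH is the one Python imports from.

```shell
#!/usr/bin/env bash
# Illustrative only: "view", "venv" and demo_mod are made-up names.
demo_root="$(mktemp -d)"
mkdir -p "${demo_root}/view" "${demo_root}/venv"

# Two copies of the same module, each recording where it lives.
echo 'ORIGIN = "view"' > "${demo_root}/view/demo_mod.py"
echo 'ORIGIN = "venv"' > "${demo_root}/venv/demo_mod.py"

# Import demo_mod with the given PYTHONPATH and print which copy was used.
# The subshell changes into demo_root so the current directory has no demo_mod.
which_copy() {
    (cd "${demo_root}" && PYTHONPATH="$1" python3 -c 'import demo_mod; print(demo_mod.ORIGIN)')
}

only_view="$(which_copy "${demo_root}/view")"
venv_first="$(which_copy "${demo_root}/venv:${demo_root}/view")"

echo "view only on PYTHONPATH    -> ${only_view}"
echo "venv at head of PYTHONPATH -> ${venv_first}"

rm -rf "${demo_root}"
```

With only the "view" directory on PYTHONPATH the view's copy is imported; once the "venv" directory is placed ahead of it, the venv's copy wins.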

While lcgenv allows for package specific environment building, it still lacks the control to specify arbitrary versions of Python packages and will load additional libraries beyond what is strictly required by the target package dependency requirements. That being said, if you are able to use an LCG view or lcgenv without any additional setup, you may not have need of specifying a Python virtual environment.

While Python's venv module does have the --system-site-packages option to "Give the virtual environment access to the system site-packages dir.", this unfortunately isn't quite enough. It does allow for isolation to work, but the manipulation of PYTHONPATH means that, while packages can be installed properly in the local virtual environment and will show up with python -m pip list, if another version of that package is provided by the already set up environment, that version's location on PYTHONPATH will take precedence. Using --system-site-packages without cvmfs-venv is arguably even worse, as it creates confusing differences between what pip reports as installed in the user's virtual environment (the pip list view) and what the runtime environment actually uses.

Caveat: This is an LCG view specific issue mostly. If nothing from LCG is used (like a pure ATLAS AnalysisBase environment, or an environment in a Linux container) then --system-site-packages by itself should be sufficient.

How things work

cvmfs-venv provides a shim layer to manage activation and use of a Python virtual environment created with LCG view resources. It does this by copying the structure of the activate scripts generated by Python's venv module. When venv creates a virtual environment it generates a shell script under the virtual environment's directory tree at bin/activate. This activate script controls and edits the shell environment variables PATH and PYTHONHOME: it places the virtual environment's bin/ directory onto PATH and unsets PYTHONHOME upon activation, and restores their original values when deactivate is run. cvmfs-venv simply extends this existing behavior to also place the virtual environment's site-packages/ directory onto PYTHONPATH during activation and to remove it on deactivation. This is done by injecting Bash snippets directly into the bin/activate script generated by venv at positions found relative to the manipulation of PYTHONHOME.
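The idea behind the injected hooks can be sketched roughly as follows. This is a minimal illustration, not the code cvmfs-venv actually injects: the function names, the _SKETCH_SITE_PACKAGES variable, and the example paths are all assumptions made for the sketch.

```shell
#!/usr/bin/env bash
# Hypothetical names throughout; a sketch of the idea, not the injected code.

# On activation: if something (e.g. an LCG view) has already populated
# PYTHONPATH, place the venv's site-packages directory at its head.
sketch_activate_hook() {
    local site_packages="$1"
    if [ -n "${PYTHONPATH:-}" ]; then
        _SKETCH_SITE_PACKAGES="${site_packages}"
        export PYTHONPATH="${site_packages}:${PYTHONPATH}"
    fi
}

# On deactivation: remove the venv's site-packages directory from the head
# of PYTHONPATH again, leaving the remaining entries untouched.
sketch_deactivate_hook() {
    if [ -n "${_SKETCH_SITE_PACKAGES:-}" ]; then
        export PYTHONPATH="${PYTHONPATH#"${_SKETCH_SITE_PACKAGES}:"}"
        unset _SKETCH_SITE_PACKAGES
    fi
}

# Example usage with placeholder paths.
export PYTHONPATH="/cvmfs/example/view/lib"
sketch_activate_hook "${HOME}/venv/lib/python3.9/site-packages"
echo "active:      ${PYTHONPATH}"
sketch_deactivate_hook
echo "deactivated: ${PYTHONPATH}"
```

In this sketch nothing is changed when PYTHONPATH is empty at activation time, which is why the order of environment setup and activation matters.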

Advantages

  • As cvmfs-venv is just altering the contents of the venv virtual environment's bin/activate script it is extending existing functionality and not trying to remake virtual environments.
  • Once the virtual environment is setup and modified there is no additional dependency on the cvmfs-venv script that generated it.
    • While it saves time, it is not needed. You can set up the environment again without it.

```console
$ . cvmfs-venv --setup "lsetup 'views LCG_105 x86_64-el9-gcc12-opt'" venv
```

vs.

```console
$ setupATLAS -3
$ lsetup "views LCG_105 x86_64-el9-gcc12-opt"
$ . venv/bin/activate
```
  • As the virtual environment is prepended to PYTHONPATH all packages installed in the virtual environment are automatically given higher precedence over existing packages of the same name found in the LCG view.
    • If a package named awkward is found in the venv virtual environment and in the LCG view, import awkward will import it from the virtual environment.
  • Python packages not installed in the venv virtual environment but installed in the LCG view (e.g. ROOT, XRootD) can still be accessed inside of the virtual environment.
    • N.B.: This is considered an "advantage" loosely, as it is only happening as a side effect of the isolation of the virtual environment being broken by the LCG view's PYTHONPATH manipulation.
  • Through additional manipulation of PYTHONPATH and PATH with the added cvmfs-venv-rebase functionality, any software added to PATH or PYTHONPATH (e.g., by CVMFS or ATLAS software) while the virtual environment is active will persist on PATH or PYTHONPATH when the virtual environment is deactivated, allowing for a smoother interactive experience.
    • Example: If you want to use rucio both inside and outside of the virtual environment you can set it up either before sourcing the bin/activate script

```console
$ setupATLAS -3
$ lsetup "views LCG_105 x86_64-el9-gcc12-opt"
$ lsetup "rucio -w"  # PYTHONPATH is altered by lsetup
$ command -v rucio  # rucio is found
$ . venv/bin/activate
(venv) $ command -v rucio  # rucio is found
(venv) $ deactivate
$ command -v rucio  # rucio is found
```

or after, as cvmfs-venv-rebase will update path variables and keep the virtual environment's directory trees at the heads.

```console
$ setupATLAS -3
$ lsetup "views LCG_105 x86_64-el9-gcc12-opt"
$ . venv/bin/activate
(venv) $ lsetup "rucio -w"  # PYTHONPATH is altered by lsetup
(venv) $ command -v rucio  # rucio is found
(venv) $ deactivate  # deactivate calls cvmfs-venv-rebase to persist non-venv paths
$ command -v rucio  # rucio is found
```
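The rebase behavior described above can be pictured as simple list manipulation on a colon-separated path. The following is a hedged sketch of that idea only, not the actual cvmfs-venv-rebase implementation; the function names and example paths are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the idea; not the real cvmfs-venv-rebase.

# Print a colon-separated path value with every occurrence of one entry removed.
sketch_strip_entry() {
    local drop="$1" path_value="$2"
    local rebuilt="" entry
    local -a entries
    IFS=':' read -r -a entries <<< "${path_value}"
    for entry in "${entries[@]}"; do
        if [ -n "${entry}" ] && [ "${entry}" != "${drop}" ]; then
            rebuilt="${rebuilt:+${rebuilt}:}${entry}"
        fi
    done
    printf '%s\n' "${rebuilt}"
}

# Print the path value with the given entry moved back to the head, keeping
# every other entry (e.g. ones added by lsetup while the venv was active).
sketch_rebase_entry() {
    local head_dir="$1" path_value="$2"
    local stripped
    stripped="$(sketch_strip_entry "${head_dir}" "${path_value}")"
    printf '%s\n' "${head_dir}${stripped:+:${stripped}}"
}

# Example: a tool was set up after activation and pushed itself to the head.
altered="/tools/rucio/lib:/venv/site-packages:/view/lib"
sketch_rebase_entry "/venv/site-packages" "${altered}"  # while active
sketch_strip_entry "/venv/site-packages" "${altered}"   # on deactivation
```

The first call shows the venv directory restored to the head of the path, and the second shows the path that remains after deactivation, with the entries added in the meantime still present.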

Disadvantages

  • The isolation of a Python virtual environment is not recovered. Python packages installed in the LCG view can still be accessed inside of the virtual environment. This can result in packages from the LCG view meeting requirements of other dependencies during a package install or upgrade (depending on the upgrade strategy) and installing older versions than expected.
    • Suggestion: When installing or upgrading with pip use the --ignore-installed flag.
    • Suggestion: When using pip list to inspect installed packages, use the --local flag to not list packages from the LCG view:
      (venv) $ python -m pip list --local
  • As the Python version tied to the virtual environment is provided by the LCG view, any changes to the LCG view version require setting up the Python virtual environment from scratch.
    • It is highly advised that your environment be controlled with strict requirements.txt and lock files (e.g. generated from pip-tools using pip-compile), making it reproducible and easy to set up again as a consequence.
    • Example:

```console
(venv) $ python -m pip install --upgrade pip-tools
(venv) $ echo "scipy==1.11.3" > requirements.txt  # define high level requirements
(venv) $ pip-compile --generate-hashes --output-file requirements.lock requirements.txt  # Generate full environment lock file
(venv) $ python -m pip install --no-deps --require-hashes --only-binary :all: --requirement requirements.lock  # secure-install for reproducibility
```
  • Having all of the environment manipulation happen inside of the venv's bin/activate script means that the virtual environment needs to be activated after any LCG view or ATLAS software (which makes PYTHONPATH non-empty) to trigger PYTHONPATH manipulation. This essentially means that the virtual environment must not be activated first in any setup script.
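One way to catch an ordering mistake in a setup script is to check whether the virtual environment's site-packages directory is actually at the head of PYTHONPATH after everything has been set up. The helper below is a hypothetical sketch, not part of cvmfs-venv; the function name and the python3.9 path component are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic helper; not provided by cvmfs-venv.

# Succeed only if PYTHONPATH is non-empty and its first entry is the
# given site-packages directory.
sketch_venv_is_first() {
    local site_packages="$1"
    local current="${PYTHONPATH:-}"
    [ -n "${current}" ] && [ "${current%%:*}" = "${site_packages}" ]
}

# Example usage at the end of a setup script. VIRTUAL_ENV is set by venv's
# activate script; the python3.9 component is a placeholder.
if ! sketch_venv_is_first "${VIRTUAL_ENV:-/nonexistent}/lib/python3.9/site-packages"; then
    echo "warning: venv site-packages is not at the head of PYTHONPATH;" \
         "was the venv activated before the LCG view was set up?" >&2
fi
```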

Citation

The preferred BibTeX entry for citation of cvmfs-venv is

@software{cvmfs-venv,
  author = {Matthew Feickert},
  title = "{cvmfs-venv: v0.0.7}",
  version = {0.0.7},
  doi = {10.5281/zenodo.7751033},
  url = {https://doi.org/10.5281/zenodo.7751033},
  note = {https://github.com/matthewfeickert/cvmfs-venv/releases/tag/v0.0.7}
}

Owner

  • Name: Matthew Feickert
  • Login: matthewfeickert
  • Kind: user
  • Location: Lafayette, Colorado
  • Company: University of Wisconsin-Madison

Research scientist in high energy physics and data science at University of Wisconsin-Madison working on LHC physics with the ATLAS experiment and IRIS-HEP.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "Please cite the following works when using this software."
type: software
authors:
- family-names: "Feickert"
  given-names: "Matthew"
  orcid: "https://orcid.org/0000-0003-4124-7862"
  affiliation: "University of Wisconsin-Madison"
title: "cvmfs-venv: v0.0.7"
version: 0.0.7
doi: 10.5281/zenodo.7751033
repository-code: "https://github.com/matthewfeickert/cvmfs-venv/releases/tag/v0.0.7"
keywords:
  - python
  - cvmfs
  - physics
license: "MIT"

GitHub Events

Total
  • Watch event: 3
  • Delete event: 1
  • Push event: 2
  • Pull request event: 1
  • Create event: 2
Last Year
  • Watch event: 3
  • Delete event: 1
  • Push event: 2
  • Pull request event: 1
  • Create event: 2

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 46
  • Total Committers: 1
  • Avg Commits per committer: 46.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 9
  • Committers: 1
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Matthew Feickert m****t@c****h 46
Committer Domains (Top 20 + Academic)
cern.ch: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 9
  • Total pull requests: 41
  • Average time to close issues: 2 months
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 1
  • Total pull request authors: 2
  • Average comments per issue: 1.22
  • Average comments per pull request: 0.12
  • Merged pull requests: 39
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 6
  • Average time to close issues: 14 days
  • Average time to close pull requests: 12 minutes
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • matthewfeickert (9)
Pull Request Authors
  • matthewfeickert (44)
  • AlexanderHeidelbach (4)
Top Labels
Issue Labels
enhancement (2) bug (1) documentation (1) wontfix (1)
Pull Request Labels
documentation (24) enhancement (13) fix (9) ci (5) chore (1)

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v1 composite