https://github.com/digital-grinnell/network-file-finder

A Python script designed to recursively "find" a list of files in network storage.

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Digital-Grinnell
  • Language: Python
  • Default Branch: main
  • Size: 96.7 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

Network File Finder

A Python script designed to recursively "find" a list of files in network storage. The list of filenames is read from a range of cells in a specified column of a specified Google Sheet. If an exact match can't be found, a "fuzzy" search looks for any file with the same name but with a wildcard in place of the original extension. The sample output below shows a "fuzzy" search that found nothing.

```
Finding a filename match for '/Volumes/DGIngest/Reunion//grinnell-26849.jpg'...
NONE FOUND!
Starting 'fuzzy' search for: '/Volumes/DGIngest/Reunion//grinnell-26849.*'...

NOPE, could not even find a FUZZY match!
```
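The two-stage lookup described above can be sketched roughly as follows. This is a minimal illustration using the standard-library `glob` module, not the script's actual code; the function name `find_file` and its return values are assumptions made for the example.

```python
import glob
import os


def find_file(tree_path, filename):
    """Search tree_path recursively for filename (hypothetical helper).

    Returns a (kind, matches) pair where kind is 'exact', 'fuzzy', or 'none'.
    """
    root = glob.escape(tree_path)

    # Stage 1: exact filename match anywhere under tree_path.
    exact = glob.glob(os.path.join(root, "**", glob.escape(filename)), recursive=True)
    if exact:
        return "exact", sorted(exact)

    # Stage 2: "fuzzy" match, same base name with a wildcard extension.
    stem, _extension = os.path.splitext(filename)
    pattern = os.path.join(root, "**", glob.escape(stem) + ".*")
    fuzzy = glob.glob(pattern, recursive=True)
    if fuzzy:
        return "fuzzy", sorted(fuzzy)

    return "none", []
```

Calling `find_file("/Volumes/DGIngest/Reunion", "grinnell-26849.jpg")` would first look for the `.jpg` file and then for `grinnell-26849.*`, mirroring the two messages in the sample output.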


Python

See Proper Python for applicable guidance when enabling Python parts of this workflow.

Quick Start

```bash
╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main›
╰─$ git pull
Already up to date.
╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main›
╰─$ python3 -m venv .venv
╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main●›
╰─$ source .venv/bin/activate
(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main●›
╰─$ pip3 install -r python-requirements.txt
```

Then...

```zsh
(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main●›
╰─$ python3 network-file-finder.py --column A --worksheet https://docs.google.com/spreadsheets/d/17uNXLP5aTSCfYZ8FXBqTvDd-z0F19FJeAOK5TsCr-PI/edit\#gid\=750240076 --tree-path /Volumes/exports/ --output-csv
```


Use

Running python3 network-file-finder.py --help in the project directory generates this "help" guidance:

```zsh
Number of arguments: 1
Argument List:' ['--help']

python3 network-file-finder.py --help --output-csv --keep-file-list --worksheet --column --tree-path --regex --skip-rows
```

Note: Do NOT add a slash to the end of the --tree-path!

--output-csv

Specifying this optional argument sets show = True, causing the script to open or create a match-list.csv output file containing a shorthand summary of each filename match outcome.
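As a rough illustration of what writing such a summary file might look like, here is a minimal sketch using the standard-library `csv` module. The function name `write_match_list` and the header names are assumptions made for this example; the real column layout of match-list.csv is not documented here.

```python
import csv


def write_match_list(rows, csv_path="match-list.csv"):
    """Write one summary row per target filename (hypothetical helper)."""
    # Header names are illustrative guesses, not the script's real header.
    header = ["row", "target", "exact", "score", "match", "path"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return csv_path
```

Each row here is a plain list of values, one per target filename, in the same order as the header.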

-c or --column

Specifying this optional argument sets column to the given value, causing the script to read target filenames from the corresponding column of the Google Sheet. If not specified, column defaults to G (typically the OBJ column). If specified, -c or --column must be a single uppercase letter, A through Z.
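The rule for `--column` (one uppercase letter, default G) could be enforced along these lines. This sketch uses `argparse` for brevity, which may not be how the script itself parses arguments; `column_letter`, `column_index`, and `build_parser` are hypothetical names.

```python
import argparse
import re


def column_letter(value):
    """argparse type: accept exactly one uppercase letter, A through Z."""
    if not re.fullmatch(r"[A-Z]", value):
        raise argparse.ArgumentTypeError(
            f"column must be a single uppercase letter A-Z, got {value!r}"
        )
    return value


def column_index(letter):
    """Convert 'A'..'Z' to a 1-based spreadsheet column number."""
    return ord(letter) - ord("A") + 1


def build_parser():
    """Build a parser with only the --column option (assumed subset)."""
    parser = argparse.ArgumentParser(prog="network-file-finder.py")
    parser.add_argument("-c", "--column", type=column_letter, default="G")
    return parser
```

With this sketch, `build_parser().parse_args(["--column", "A"]).column` yields `"A"`, while a value such as `g` or `AA` is rejected with a usage error.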

Sample

```zsh
(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main●›
╰─$ python3 network-file-finder.py --column A --worksheet https://docs.google.com/spreadsheets/d/17uNXLP5aTSCfYZ8FXBqTvDd-z0F19FJeAOK5TsCr-PI/edit#gid=750240076 --tree-path /Volumes/exports/ --output-csv
/Users/mcfatem/GitHub/network-file-finder/.venv/lib/python3.10/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

Number of arguments: 7
Argument List:' ['--column', 'A', '--worksheet', 'https://docs.google.com/spreadsheets/d/17uNXLP5aTSCfYZ8FXBqTvDd-z0F19FJeAOK5TsCr-PI/edit#gid=750240076', '--tree-path', '/Volumes/exports/', '--output-csv']

No --regex specified, matching will consider ALL paths and files.
Skipping match for 'objectid' in worksheet row 0

1. Finding best fuzzy filename matches for 'grinnell3601OBJ'...
No significant --regex limit specified.
!!! Found BEST matching file: ['1', 'grinnell3601OBJ', False, '95', 'grinnell3601OBJ.tiff', '/Volumes/exports/college-life/OBJ']

2. Finding best fuzzy filename matches for 'grinnell3611OBJ'...
No significant --regex limit specified.
!!! Found BEST matching file: ['2', 'grinnell3611OBJ', False, '95', 'grinnell3611OBJ.tiff', '/Volumes/exports/college-life/OBJ']

3. Finding best fuzzy filename matches for 'grinnell3610OBJ'...
No significant --regex limit specified.
!!! Found BEST matching file: ['3', 'grinnell3610OBJ', False, '95', 'grinnell3610OBJ.tiff', '/Volumes/exports/college-life/OBJ']
...
```
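The "best fuzzy match" step in the sample above can be approximated as follows. The script depends on fuzzywuzzy, but this sketch swaps in the standard library's `difflib.SequenceMatcher` so that it runs without extra packages; its scores will generally differ from the 95 shown in the log. The function name `best_match` is hypothetical.

```python
import os
from difflib import SequenceMatcher


def best_match(target, tree_path):
    """Return (score, filename, directory) for the closest filename, or None.

    Stand-in for the script's fuzzywuzzy scoring, using difflib instead.
    """
    best = None
    for directory, _subdirs, filenames in os.walk(tree_path):
        for name in filenames:
            # ratio() is in [0, 1]; scale to a 0-100 integer score.
            score = round(100 * SequenceMatcher(None, target, name).ratio())
            if best is None or score > best[0]:
                best = (score, name, directory)
    return best
```

For a target such as `grinnell3601OBJ`, a file named `grinnell3601OBJ.tiff` scores higher than `grinnell3611OBJ.tiff` because the whole target appears unbroken in the first name.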

Owner

  • Name: Digital Grinnell
  • Login: Digital-Grinnell
  • Kind: user
  • Location: Grinnell, Iowa
  • Company: Grinnell College Libraries

Dependencies

python-requirements.txt pypi
  • cachetools ==5.3.0
  • certifi ==2022.12.7
  • charset-normalizer ==3.1.0
  • colorama ==0.4.6
  • fuzzywuzzy ==0.18.0
  • google-auth ==2.16.3
  • google-auth-oauthlib ==1.0.0
  • gspread ==5.7.2
  • idna ==3.4
  • oauthlib ==3.2.2
  • pyasn1 ==0.4.8
  • pyasn1-modules ==0.2.8
  • requests ==2.28.2
  • requests-oauthlib ==1.3.1
  • rsa ==4.9
  • six ==1.16.0
  • urllib3 ==1.26.15