https://github.com/digital-grinnell/network-file-finder
A Python script designed to recursively "find" a list of files in network storage.
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found)
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.9%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Digital-Grinnell
- Language: Python
- Default Branch: main
- Size: 96.7 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Network File Finder
A Python script designed to recursively "find" a list of files in network storage. The list of files will be a range of cells from the specified column of a specified Google Sheet. If an exact match can't be found, a "fuzzy" search is initiated to look for any matching filenames, but with a wildcard in place of the original extension. The effect can be seen in the sample output from a "fuzzy" result shown below.
```
Finding a filename match for '/Volumes/DGIngest/Reunion//grinnell-26849.jpg'...
NONE FOUND! Starting 'fuzzy' search for: '/Volumes/DGIngest/Reunion//grinnell-26849.*'...
NOPE, could not even find a FUZZY match!
```
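The exact-then-wildcard fallback described above can be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the function name `find_with_extension_fallback` is hypothetical, and it assumes the target is a full path whose extension is replaced with `.*`, as in the sample output.

```python
import glob
import os


def find_with_extension_fallback(target_path):
    """Return (kind, matches) where kind is 'exact', 'fuzzy', or 'none'.

    Hypothetical helper, written only to illustrate the fallback idea.
    """
    # First try the path exactly as given.
    if os.path.isfile(target_path):
        return "exact", [target_path]

    # Otherwise swap the extension for a wildcard, e.g. name.jpg -> name.*
    stem, _ext = os.path.splitext(target_path)
    pattern = glob.escape(stem) + ".*"
    matches = sorted(glob.glob(pattern))
    return ("fuzzy", matches) if matches else ("none", [])
```

`glob.escape` keeps any special characters in the directory or file stem from being treated as wildcards, so only the extension is left open.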
Python
See Proper Python for applicable guidance when enabling Python parts of this workflow.
Quick Start
```bash
╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main›
╰─$ git pull
Already up to date.
╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main›
╰─$ python3 -m venv .venv
╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main●›
╰─$ source .venv/bin/activate
(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main●›
╰─$ pip3 install -r python-requirements.txt
```
Then...
```zsh
(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main●›
╰─$ python3 network-file-finder.py --column A --worksheet https://docs.google.com/spreadsheets/d/17uNXLP5aTSCfYZ8FXBqTvDd-z0F19FJeAOK5TsCr-PI/edit\#gid\=750240076 --tree-path /Volumes/exports/ --output-csv
```
Use
Running python3 network-file-finder.py --help in the project directory generates this "help" guidance:
```zsh
Number of arguments: 1
Argument List:' ['--help']

python3 network-file-finder.py --help --output-csv --keep-file-list --worksheet
```
Note: Do NOT add a slash to the end of the --tree-path!
--output-csv
Specifying this optional argument sets show = True, causing the script to open/create a match-list.csv output file that includes a shorthand summary of each filename match outcome.
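As a rough sketch of what writing such a summary file could look like, the snippet below writes one row per target filename with Python's `csv` module. The function name `write_match_list` and the header names are assumptions made for illustration; the row shape is borrowed from the lists printed in the Sample section, and the script's real column layout may differ.

```python
import csv


def write_match_list(rows, csv_path="match-list.csv"):
    """Write one summary row per target filename (hypothetical helper)."""
    # Header names are assumed; the README does not document them.
    header = ["row", "target", "exact", "score", "match", "directory"]
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
    return csv_path
```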
-c or --column
Specifying this optional argument sets column to the specified value, causing the script to read target filenames from the corresponding column of the Google Sheet. column defaults to G (typically the OBJ column) if not specified. If specified, -c or --column must be a single uppercase letter, A through Z.
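The column rule above (a single uppercase letter, defaulting to G) could be enforced with `argparse` along these lines. This is only a sketch: `column_letter`, `column_index`, and `build_parser` are hypothetical names, and the script itself may parse its arguments differently. The 1-based index is the form that gspread's `col_values` expects.

```python
import argparse
import string


def column_letter(value):
    """argparse type check: accept exactly one uppercase letter A-Z."""
    if len(value) != 1 or value not in string.ascii_uppercase:
        raise argparse.ArgumentTypeError(
            "column must be a single uppercase letter A through Z"
        )
    return value


def column_index(letter):
    """Convert 'A'..'Z' to a 1-based column number, e.g. 'G' -> 7."""
    return ord(letter) - ord("A") + 1


def build_parser():
    """Parser covering only the --column option, for illustration."""
    parser = argparse.ArgumentParser(description="Sketch of the --column option")
    parser.add_argument("-c", "--column", type=column_letter, default="G")
    return parser
```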
Sample
```zsh
(.venv) ╭─mcfatem@MAC02FK0XXQ05Q ~/GitHub/network-file-finder ‹main●›
╰─$ python3 network-file-finder.py --column A --worksheet https://docs.google.com/spreadsheets/d/17uNXLP5aTSCfYZ8FXBqTvDd-z0F19FJeAOK5TsCr-PI/edit#gid=750240076 --tree-path /Volumes/exports/ --output-csv
/Users/mcfatem/GitHub/network-file-finder/.venv/lib/python3.10/site-packages/fuzzywuzzy/fuzz.py:11: UserWarning: Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning
  warnings.warn('Using slow pure-python SequenceMatcher. Install python-Levenshtein to remove this warning')

Number of arguments: 7
Argument List:' ['--column', 'A', '--worksheet', 'https://docs.google.com/spreadsheets/d/17uNXLP5aTSCfYZ8FXBqTvDd-z0F19FJeAOK5TsCr-PI/edit#gid=750240076', '--tree-path', '/Volumes/exports/', '--output-csv']

No --regex specified, matching will consider ALL paths and files.
Skipping match for 'objectid' in worksheet row 0

Finding best fuzzy filename matches for 'grinnell3601OBJ'...
No significant --regex limit specified.
!!! Found BEST matching file: ['1', 'grinnell3601OBJ', False, '95', 'grinnell3601OBJ.tiff', '/Volumes/exports/college-life/OBJ']

Finding best fuzzy filename matches for 'grinnell3611OBJ'...
No significant --regex limit specified.
!!! Found BEST matching file: ['2', 'grinnell3611OBJ', False, '95', 'grinnell3611OBJ.tiff', '/Volumes/exports/college-life/OBJ']

Finding best fuzzy filename matches for 'grinnell3610OBJ'...
No significant --regex limit specified.
!!! Found BEST matching file: ['3', 'grinnell3610OBJ', False, '95', 'grinnell3610OBJ.tiff', '/Volumes/exports/college-life/OBJ']
...
```
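The "best fuzzy match" step in the output above can be approximated with a recursive walk plus a string-similarity score. The sketch below is an assumption-laden stand-in, not the script's code: the project depends on fuzzywuzzy, whereas this example uses the standard library's `difflib.SequenceMatcher` so it runs without extra packages, and `best_fuzzy_match` is a hypothetical name. Scores from this sketch will not necessarily equal the values shown in the sample.

```python
import os
from difflib import SequenceMatcher


def best_fuzzy_match(target, tree_path):
    """Walk tree_path and return (score, filename, directory) for the
    closest filename, or None if the tree holds no files."""
    best = None
    for directory, _subdirs, filenames in os.walk(tree_path):
        for filename in filenames:
            # 0-100 score from character-level similarity of the two names.
            score = round(100 * SequenceMatcher(None, target, filename).ratio())
            if best is None or score > best[0]:
                best = (score, filename, directory)
    return best
```

A caller could compare the returned score against a minimum threshold before accepting the match, which keeps unrelated filenames from being reported as hits.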
Owner
- Name: Digital Grinnell
- Login: Digital-Grinnell
- Kind: user
- Location: Grinnell, Iowa
- Company: Grinnell College Libraries
- Website: https://digital.grinnell.edu
- Repositories: 30
- Profile: https://github.com/Digital-Grinnell
Dependencies
- cachetools ==5.3.0
- certifi ==2022.12.7
- charset-normalizer ==3.1.0
- colorama ==0.4.6
- fuzzywuzzy ==0.18.0
- google-auth ==2.16.3
- google-auth-oauthlib ==1.0.0
- gspread ==5.7.2
- idna ==3.4
- oauthlib ==3.2.2
- pyasn1 ==0.4.8
- pyasn1-modules ==0.2.8
- requests ==2.28.2
- requests-oauthlib ==1.3.1
- rsa ==4.9
- six ==1.16.0
- urllib3 ==1.26.15