PyLithics
PyLithics: A Python package for stone tool analysis - Published in JOSS (2022)
Science Score: 100.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in JOSS metadata
- ✓ Academic publication links: links to zenodo.org
- ○ Committers with academic emails
- ✓ Institutional organization owner: organization alan-turing-institute has institutional domain (turing.ac.uk)
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Repository for the Palaeoanalytics project.
Basic Info
- Host: GitHub
- Owner: alan-turing-institute
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://www.turing.ac.uk/research/research-projects/palaeoanalytics
- Size: 218 MB
Statistics
- Stars: 18
- Watchers: 4
- Forks: 1
- Open Issues: 10
- Releases: 1
Metadata Files
README.md
Welcome to Palaeoanalytics!
Repository for the Palaeoanalytics project. A collaboration between The Alan Turing Institute and the University of Cambridge.
Table of Contents:
- 📖 About the project
- 👥 The team
- 📦 The PyLithics package
- 🖌 Drawing style for PyLithics
- 👋 Contributing
- Development and testing of PyLithics
- Citing PyLithics
- 📝 License
📖 About the project
Archaeologists have long used stone tools (lithics) to reconstruct the behavior of prehistoric hominins. While techniques
have become more quantitative, there still remain barriers to optimizing data retrieval. Machine learning and computer
vision approaches can be developed to extract quantitative and trait data from lithics, photographs and drawings. PyLithics
has been developed to capture data from 2D line drawings, focusing on the size, shape and technological attributes of flakes.
PyLithics is an open-source, free-to-use software package for processing lithic artefact illustrations scanned from
the literature. This tool accurately identifies, outlines, and computes lithic shape and linear measures, and returns user
ready data. It has been optimized for feature extraction and measurement using a number of computer vision techniques
including pixel intensity thresholding, edge detection, contour finding, custom template matching and image kernels.
On both conventional and modern drawings, PyLithics can identify and label platform, lateral, dorsal, and ventral surfaces,
as well as individual dorsal surface scar shape, size, orientation, diversity, number, and flaking order. Complete size
and shape metrics of individual scars and whole flakes can be calculated and recorded. Orientation and flaking direction
of dorsal scars can also be calculated. The resulting data can be used for metrical analysis, extracting features indicative
of typologies and technological processes. Data output can easily be employed to explore patterns of variation within and between assemblages.
👥 The team
These are the members of the Palaeoanalytics team as updated August 2021:
| Name | Role | Email | GitHub |
| --- | --- | --- | --- |
| Dr. Jason Gellis | Senior Data Scientist (Dimensions AI) & Researcher (University of Cambridge) | jg760@cam.ac.uk | @JasonGellis |
| Dr. Camila Rangel Smith | Research Data Scientist (The Alan Turing Institute) | crangelsmith@turing.ac.uk | @crangelsmith |
| Prof. Robert Foley | Principal Investigator (University of Cambridge) | raf10@cam.ac.uk | Rob-LCHES |
📦 The PyLithics package
PyLithics: A Python package for stone tool analysis
Workflow
PyLithics is designed to work with illustrations of lithic objects common to publications in archaeology and anthropology. Lithic illustrators have established conventions regarding systems of artefact orientation and proportions. Lithics are normally drawn at a 1:1 scale, with the vertical axis orthogonal to the striking platform. A preferred method is to orient and illustrate various aspects of an artefact as a series of adjacent surfaces at 90-degree rotations from the principal view (usually the dorsal surface). Each aspect contains internal details (e.g., flake scars and cortical areas), radial lines (ripples) indicating flaking direction, and a metric scale (for more information about lithic drawings see [@Martingell:1988]). Currently, PyLithics is optimized to work with unifacial flakes and bifaces, which are relatively flat, two-dimensional objects.
The inputs for PyLithics are images of lithic objects, images of their associated scales, and a metadata CSV file linking the two and giving the scale measurement in millimeters.
PyLithics processes the images with the following steps (and as illustrated in the schema below):
- Import images and match them to their associated image ID and scale image from the CSV metadata file.
- Calculate a conversion of pixels to millimeters based on the size of the associated scale from the CSV metadata file. If no scale is present, measurements will be in pixels.
- Apply noise removal and contrast stretching to images to minimize pixel variation.
- Pixel intensity thresholding of images to prepare for contour finding.
- Apply edge detection and contour finding to thresholded images.
- Calculate metrics of lithic surface features from found contours: area, length, breadth, shape, number of vertices.
- Select contours which outline an entire lithic object's surfaces, or select contours of inner scars greater than 3% and less than 50% of the total size of its surface.
- Classify these selected surface contours as "Dorsal", "Ventral", "Lateral", and/or "Platform" depending on presence or absence. Assign scar contours to these surfaces.
- If present, find arrows using connected components and template matching, measure their angle and assign angle to associated scar.
- Plot resulting surface and scar contours on the original images for validation.
- Output data in a hierarchical JSON file detailing measurements of surface and scar contours.
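Two of the steps above, the pixel-to-millimeter conversion and the size-based selection of scar contours, can be illustrated with a short sketch. This is not PyLithics' own code: the function names and example numbers are hypothetical, and only the 3% and 50% bounds are taken from the step description.

```python
def pixels_to_mm(length_px, scale_px, scale_mm):
    """Convert a length in pixels to millimeters using a scale bar.

    scale_px is the measured length of the scale bar in pixels and
    scale_mm is the length it represents (PA_scale in the metadata).
    """
    if scale_px <= 0:
        raise ValueError("scale_px must be positive")
    return length_px * scale_mm / scale_px


def select_scars(surface_area, scar_areas, lower=0.03, upper=0.50):
    """Keep scar areas strictly between the lower and upper fractions of the surface."""
    return [a for a in scar_areas if lower * surface_area < a < upper * surface_area]


# Hypothetical numbers: a 5 mm scale bar that spans 100 pixels.
length_mm = pixels_to_mm(640, scale_px=100, scale_mm=5)  # 32.0
kept = select_scars(10_000, [150, 400, 2_500, 6_000])    # [400, 2500]
```

Areas scale with the square of the linear conversion factor, so an area in pixels would be multiplied by `(scale_mm / scale_px) ** 2`.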
Here you can find a schema of the workflow described above:

Installation
The PyLithics package requires Python 3.7 or greater. To install, start by creating a fresh virtual environment.
python3 -m venv palaeo
source palaeo/bin/activate
For Windows OS:
Set-ExecutionPolicy Unrestricted -Scope Process
.\palaeo\Scripts\activate
Clone the repository.
git clone https://github.com/alan-turing-institute/Palaeoanalytics.git
Enter the repository and check out a relevant branch if necessary. The develop branch contains the most up-to-date version of the code, but it is fast moving.
If you want a stable and static version, it is better to use the main branch.
cd Palaeoanalytics
git checkout main
Install 'PyLithics'.
pip install .
The pip install . command will call setup.py to install and configure PyLithics and its required packages listed in the requirements.txt file.
Note: For Mac users we recommend macOS version 10.14 or later to prevent build problems.
Running PyLithics
PyLithics can be run via command line. The following command displays all available options:
```bash
pylithics_run --help
```
Output:
```bash
usage: pylithics_run [-h] -c config-file [--input_dir INPUT_DIR]
                     [--output_dir OUTPUT_DIR]

Run lithics characterization pipeline

optional arguments:
  -h, --help            show this help message and exit
  -c config-file, --config config-file
                        the model config file (YAML)
  --input_dir INPUT_DIR
                        path to input directory where images are found
  --output_dir OUTPUT_DIR
                        path to output directory to save processed image outputs
  --metadata_filename METADATA_FILENAME
                        CSV file with metadata on images and scales
  --get_arrows          If a lithic contains arrows, find them and add them to the data
```
💫 Quickstart
To provide a quick start, we have included an example dataset with images, scales and metadata. You
can run a quick analysis on this dataset with:
```bash
pylithics_run -c configs/test_config.yml --input_dir data --output_dir output --metadata_filename meta_data.csv --get_arrows
```
More generally, given a set of lithic images (and their respective scales), you can run the PyLithics processing script with the following:
```bash
pylithics_run -c configs/test_config.yml --input_dir <path_to_input_dir> --output_dir <path_to_output_directory> --metadata_filename meta_data.csv
```
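When several datasets need processing, the same command can be assembled from Python. The sketch below is a minimal illustration: `build_pylithics_command` is a hypothetical helper, not part of PyLithics, and it only uses the flags shown in the commands above.

```python
import shlex


def build_pylithics_command(config, input_dir, output_dir,
                            metadata_filename=None, get_arrows=False):
    """Return the pylithics_run argument list for one dataset."""
    cmd = ["pylithics_run", "-c", config,
           "--input_dir", input_dir, "--output_dir", output_dir]
    if metadata_filename:
        cmd += ["--metadata_filename", metadata_filename]
    if get_arrows:
        cmd.append("--get_arrows")
    return cmd


cmd = build_pylithics_command("configs/test_config.yml", "data", "output",
                              metadata_filename="meta_data.csv", get_arrows=True)
print(shlex.join(cmd))
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems when directory names contain spaces.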
The images found in <path_to_input_dir> should follow this directory structure:
```bash
input_directory
├── meta_data.csv
├── images
│   ├── lithic_id1.png
│   ├── lithic_id2.png
│   ├── lithic_id3.png
│   .
│   .
│   .
│   └── lithic_idn.png
└── scales
    ├── scale_id1.png
    ├── scale_id2.png
    ├── scale_id3.png
    .
    .
    .
    └── scale_id4.png
```
where the mapping between the lithics and scale images should be available in the metadata CSV file.
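Before a run, it can be useful to confirm that an input directory follows this layout. The check below is a sketch under stated assumptions: `check_input_dir` is a hypothetical helper, it assumes `.png` images as in the tree above, and it only requires the `scales` directory when a metadata file is named.

```python
import tempfile
from pathlib import Path


def check_input_dir(input_dir, metadata_filename=None):
    """Return a list of problems with the layout of an input directory."""
    root = Path(input_dir)
    problems = []
    images = root / "images"
    if not images.is_dir():
        problems.append("missing 'images' directory")
    elif not any(images.glob("*.png")):
        problems.append("no .png files in 'images'")
    if metadata_filename is not None:
        # Scales are only needed when a metadata file links them to images.
        if not (root / metadata_filename).is_file():
            problems.append(f"missing metadata file '{metadata_filename}'")
        if not (root / "scales").is_dir():
            problems.append("missing 'scales' directory")
    return problems


# Demonstration on a throwaway directory that has images but no scales.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "images").mkdir()
    (Path(tmp) / "images" / "lithic_id1.png").touch()
    without_metadata = check_input_dir(tmp)
    with_metadata = check_input_dir(tmp, "meta_data.csv")
```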
This CSV file should have as a minimum the following 3 variables:
- PA_ID: the lithic image ID (the name of the image file).
- scale_ID: the scale ID (the name of the scale image file).
- PA_scale: the scale measurement (how many millimeters this scale represents).
An example of this table is the following (a single scale may also correspond to several images):
| PA_ID | scale_ID | PA_scale |
|------------|-----------|----------|
| lithic_id1 | scale_id1 | 5 |
| lithic_id2 | scale_id2 | 5 |
| lithic_id3 | scale_id3 | 5 |
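A metadata file of this shape can be read and validated with the standard library. The following is a minimal sketch, not the loader PyLithics uses: `read_metadata` is a hypothetical helper, and the example rows are made up to show one scale shared by two images.

```python
import csv
import io

REQUIRED_COLUMNS = ("PA_ID", "scale_ID", "PA_scale")


def read_metadata(handle):
    """Map each lithic image ID to its scale ID and scale length in mm."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"metadata is missing columns: {missing}")
    return {
        row["PA_ID"]: (row["scale_ID"], float(row["PA_scale"]))
        for row in reader
    }


# In practice, pass open("meta_data.csv", newline="") instead of StringIO.
example = io.StringIO(
    "PA_ID,scale_ID,PA_scale\n"
    "lithic_id1,scale_id1,5\n"
    "lithic_id2,scale_id1,5\n"
)
metadata = read_metadata(example)
```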
Note
If the scale images and CSV file are not available, it is possible to run the analysis using only the images
with the command:
pylithics_run -c configs/test_config.yml --input_dir <path_to_input_dir> --output_dir <path_to_output_directory>
lithics image files must still be inside the '
The test_config.yml config file contains the following options:
```yaml
threshold: 0.01
contour_parameter: 0.1
contour_fully_connected: 'low'
minimum_pixels_contour: 0.01
denoise_weight: 0.06
contrast_stretch: [4, 96]
```
The config is optimized to work with the images in the example dataset. If you want to use PyLithics with different styles of
drawing you might have to modify this configuration file. You can modify it or create your own config file and provide it to the CLI.
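The meaning of each option is not spelled out here. As an illustration of what tuning might involve, the sketch below assumes that the two `contrast_stretch` values are lower and upper intensity percentiles, a common convention for contrast stretching. That reading is an assumption, and the functions are hypothetical pure-Python stand-ins rather than PyLithics code.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in 0..100) of a list of numbers."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def contrast_stretch(pixels, low_pct=4, high_pct=96):
    """Clip intensities to the given percentiles and rescale them to 0..1."""
    p_low = percentile(pixels, low_pct)
    p_high = percentile(pixels, high_pct)
    if p_high == p_low:
        return [0.0 for _ in pixels]
    return [min(max((p - p_low) / (p_high - p_low), 0.0), 1.0) for p in pixels]


# Hypothetical grayscale intensities, flattened to one row for brevity.
stretched = contrast_stretch([0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100])
```

Under this assumption, widening the interval (for example `[1, 99]`) clips fewer pixels, while narrowing it pushes more of the drawing to pure black or white.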
Output from PyLithics
Output images
Output images are saved in the output directory for validation of the data extraction process. An example of these images is the following:
Output data
The output dataset is a JSON file with data for the lithic objects found in an image. The data is
hierarchically organized by lithic surface (ventral, dorsal, platform). For each
surface, the metrics of its scars are recorded. The data output example contains the JSON file
that results from running PyLithics on the above images, with comments that explain the feature hierarchy and variables.
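Because the output is nested, a quick way to inspect it is to flatten every leaf value into a path. The sketch below works on any JSON structure; the example document and all of its key names are hypothetical and may not match the real PyLithics output.

```python
import json


def flatten(node, prefix=""):
    """Yield (path, value) pairs for every leaf of nested JSON data."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from flatten(value, f"{prefix}[{index}]")
    else:
        yield prefix, node


# Hypothetical output; the real key names may differ.
example = json.loads("""
{"id": "lithic_id1",
 "surfaces": [{"classification": "Dorsal",
               "area": 1200.5,
               "scars": [{"area": 80.2}, {"area": 45.0}]}]}
""")
rows = dict(flatten(example))
```

Each key in `rows` is a path such as `surfaces[0].scars[1].area`, which makes it straightforward to write the values to a flat table for analysis.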
🖌 Drawing style for PyLithics
We are working hard on developing methods to cater to all styles of stone tool drawings. However, at the moment PyLithics
works best with the following styles:

If you want to help us optimize PyLithics for different drawing styles we welcome your contributions!
👋 Contributing
We welcome contributions from anyone interested in the project. There are lots of ways to contribute, not just writing code.
If you have ideas on how to extend or improve PyLithics, do get in touch with members of the team via email. See our
Contributor Guidelines to learn more about how you can contribute and how we work together as a
community on GitHub.
Because the PyLithics code changes frequently, we test and deploy current builds and updates via Travis CI.
Every time a change in the PyLithics code is pushed to the Palaeoanalytics repository, the travis.yml
file, which contains essential information about the PyLithics programming environment and version, triggers these automated
tests. Travis CI automatically creates a virtual build of PyLithics and runs the software to ensure that the integration
of new code is stable and functioning. When the tests complete, Travis CI generates a pass or fail
report for the build and notifies PyLithics team members and contributing developers of any issues. Because the process is automated,
contributors do not need to open a Travis CI account. All contributions must pass these automated
tests to be merged into the main branch.
Development and testing of PyLithics
PyLithics uses the pytest library for automated functional testing of code
development and integration. These tests are easily run from the project directory using the command:
pytest -s
Citing PyLithics
📝 License
This software is licensed under the terms of the GNU General Public License v3.0 (GNU GPLv3).
Owner
- Name: The Alan Turing Institute
- Login: alan-turing-institute
- Kind: organization
- Email: info@turing.ac.uk
- Website: https://turing.ac.uk
- Repositories: 477
- Profile: https://github.com/alan-turing-institute
The UK's national institute for data science and artificial intelligence.
JOSS Publication
PyLithics: A Python package for stone tool analysis
Authors
Tags
Human evolution, Archaeology, Lithic analysis, Prehistoric technology, Computer vision
Citation (CITATION.cff)
cff-version: v1.0.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Gellis"
    given-names: "Jason J."
    orcid: "https://orcid.org/0000-0002-9929-789X"
  - family-names: "Rangel Smith"
    given-names: "Camila"
    orcid: "https://orcid.org/0000-0002-0227-836X"
  - family-names: "Foley"
    given-names: "Robert A."
    orcid: "https://orcid.org/0000-0003-0479-3039"
title: "PyLithics: A Python package for stone tool analysis"
version: 1.0
date-released: 2021-08-27
url: "https://github.com/alan-turing-institute/Palaeoanalytics/"
GitHub Events
Total
- Watch event: 1
- Delete event: 22
- Push event: 90
- Create event: 6
Last Year
- Watch event: 1
- Delete event: 22
- Push event: 90
- Create event: 6
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| JasonGellis | j****s@g****m | 279 |
| crangelsmith | c****h@g****m | 254 |
| Rob-LCHES | 7****S | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 35
- Total pull requests: 77
- Average time to close issues: 5 months
- Average time to close pull requests: 23 days
- Total issue authors: 5
- Total pull request authors: 2
- Average comments per issue: 1.91
- Average comments per pull request: 0.38
- Merged pull requests: 72
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 2 minutes
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- crangelsmith (16)
- MichaelHoltonPrice (7)
- JasonGellis (6)
- steko (4)
- Rob-LCHES (2)
Pull Request Authors
- JasonGellis (60)
- crangelsmith (18)
Dependencies
- matplotlib *
- numpy *
- opencv-contrib-python *
- pandas *
- pytest *
- pyyaml ==5.4.1
- scikit-image *
- scipy *
- setuptools ==60.9.0
