PyLithics

PyLithics: A Python package for stone tool analysis - Published in JOSS (2022)

https://github.com/alan-turing-institute/palaeoanalytics

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
    Organization alan-turing-institute has institutional domain (turing.ac.uk)
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

hut23 hut23-530

Scientific Fields

Mathematics Computer Science - 84% confidence
Engineering Computer Science - 60% confidence
Last synced: 6 months ago

Repository

Repository for the Paleoanalytics project.

Basic Info
Statistics
  • Stars: 18
  • Watchers: 4
  • Forks: 1
  • Open Issues: 10
  • Releases: 1
Topics
hut23 hut23-530
Created over 5 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Citation

README.md

Welcome to Palaeoanalytics!

Repository for the Palaeoanalytics project. A collaboration between The Alan Turing Institute and the University of Cambridge.

License: GPL v3

📖 About the project

Archaeologists have long used stone tools (lithics) to reconstruct the behavior of prehistoric hominins. While techniques have become more quantitative, there still remain barriers to optimizing data retrieval. Machine learning and computer vision approaches can be developed to extract quantitative and trait data from lithics, photographs and drawings. PyLithics has been developed to capture data from 2D line drawings, focusing on the size, shape and technological attributes of flakes.

PyLithics is an open-source, free-to-use software package for processing lithic artefact illustrations scanned from the literature. This tool accurately identifies, outlines, and computes lithic shape and linear measures, and returns user-ready data. It has been optimized for feature extraction and measurement using a number of computer vision techniques, including pixel intensity thresholding, edge detection, contour finding, custom template matching, and image kernels. On both conventional and modern drawings, PyLithics can identify and label platform, lateral, dorsal, and ventral surfaces, as well as individual dorsal surface scar shape, size, orientation, diversity, number, and flaking order. Complete size and shape metrics of individual scars and whole flakes can be calculated and recorded. The orientation and flaking direction of dorsal scars can also be calculated. The resulting data can be used for metrical analysis, extracting features indicative of typologies and technological processes. The data output can easily be employed to explore patterns of variation within and between assemblages.

👥 The team

These are the members of the Palaeoanalytics team as of August 2021:

| Name | Role | Email | GitHub |
| --- | --- | --- | --- |
| Dr. Jason Gellis | Senior Data Scientist (Dimensions AI) & Researcher (University of Cambridge) | jg760@cam.ac.uk | @JasonGellis |
| Dr. Camila Rangel Smith | Research Data Scientist (The Alan Turing Institute) | crangelsmith@turing.ac.uk | @crangelsmith |
| Prof. Robert Foley | Principal Investigator (University of Cambridge) | raf10@cam.ac.uk | @Rob-LCHES |

📦 The PyLithics package

PyLithics: A Python package for stone tool analysis

Workflow

PyLithics is devised to work with illustrations of lithic objects common to publications in archaeology and anthropology. Lithic illustrators have established conventions regarding systems of artefact orientation and proportions. Lithics are normally drawn at a 1:1 scale, with the vertical axis orthogonal to the striking platform. A preferred method is to orient and illustrate various aspects of an artefact as a series of adjacent surfaces at 90-degree rotations from the principal view (usually the dorsal surface). Each aspect contains internal details (i.e., flake scars, cortical areas, etc.), indications of flaking direction in the form of radial lines (ripples), and a metric scale (for more information about lithic drawings see [@Martingell:1988]). Currently, PyLithics is optimized to work with unifacial flakes and bifaces, which are relatively flat, two-dimensional objects.

The inputs for PyLithics are images of lithic objects, images of their associated scales, and a metadata CSV file linking the two and giving the scale measurement in millimeters.

PyLithics processes the images with the following steps (and as illustrated in the schema below):

  1. Import and match images to associated image ID and scale image from CSV metadata file.
  2. Calculate a conversion of pixels to millimeters based on the size of the associated scale from the CSV metadata file. If no scale is present, measurements will be in pixels.
  3. Apply noise removal and contrast stretching to images to minimize pixel variation.
  4. Pixel intensity thresholding of images to prepare for contour finding.
  5. Apply edge detection and contour finding to thresholded images.
  6. Calculate metrics of lithic surface features from the found contours: area, length, breadth, shape, and number of vertices.
  7. Select contours which outline an entire lithic object's surfaces, or select contours of inner scars greater than 3% and less than 50% of the total size of its surface.
  8. Classify these selected surface contours as "Dorsal", "Ventral", "Lateral", and/or "Platform" depending on presence or absence. Assign scar contours to these surfaces.
  9. If present, find arrows using connected components and template matching, measure their angle and assign angle to associated scar.
  10. Plot resulting surface and scar contours on the original images for validation.
  11. Output data in a hierarchical json file detailing measurements of surface and scar contours.
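Steps 2 and 7 above can be sketched in a few lines of plain Python. This is an illustrative sketch only, not PyLithics' actual code; the function names and the 3%–50% thresholds as keyword defaults are hypothetical stand-ins for what the pipeline computes internally.

```python
def pixels_per_mm(scale_length_px, scale_length_mm):
    """Step 2: conversion factor derived from the scale bar (hypothetical helper)."""
    return scale_length_px / scale_length_mm

def to_mm(length_px, factor):
    """Convert a pixel measurement to millimeters using the factor above."""
    return length_px / factor

def is_scar(contour_area_px, surface_area_px, lower=0.03, upper=0.50):
    """Step 7: keep inner contours covering 3%-50% of their surface's area."""
    ratio = contour_area_px / surface_area_px
    return lower < ratio < upper

factor = pixels_per_mm(250, 5)     # a 250 px scale bar representing 5 mm
print(to_mm(500, factor))          # 10.0 (a 500 px length is 10 mm)
print(is_scar(4_000, 100_000))     # True  (4% of the surface: plausible scar)
print(is_scar(60_000, 100_000))    # False (60% is too large to be an inner scar)
```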

Here you can find a schema of the workflow described above:

Installation

The PyLithics package requires Python 3.7 or greater. To install, start by creating a fresh virtual environment:

```bash
python3 -m venv palaeo
source palaeo/bin/activate
```

For Windows OS:

```bash
Set-ExecutionPolicy Unrestricted -Scope Process
.\palaeo\Scripts\activate
```

Clone the repository:

```bash
git clone https://github.com/alan-turing-institute/Palaeoanalytics.git
```

Enter the repository and check out a relevant branch if necessary (the develop branch contains the most up-to-date version of the code, but it moves fast; if you want a stable, static version it is better to use the main branch):

```bash
cd Palaeoanalytics
git checkout main
```

Install PyLithics:

```bash
pip install .
```

The `pip install .` command calls `setup.py` to install and configure PyLithics and the required packages listed in the `requirements.txt` file.

Note: For Mac users we recommend macOS version >= 10.14 to prevent build problems.

Running PyLithics

PyLithics can be run via the command line. The following command displays all available options:

```bash
pylithics_run --help
```

Output:

```bash
usage: pylithics_run [-h] -c config-file [--input_dir INPUT_DIR]
                     [--output_dir OUTPUT_DIR]

Run lithics characterization pipeline

optional arguments:
  -h, --help            show this help message and exit
  -c config-file, --config config-file
                        the model config file (YAML)
  --input_dir INPUT_DIR
                        path to input directory where images are found
  --output_dir OUTPUT_DIR
                        path to output directory to save processed image outputs
  --metadata_filename METADATA_FILENAME
                        CSV file with metadata on images and scales
  --get_arrows          if a lithic contains arrows, find them and add them to the data
```

💫 Quickstart

To provide a quick start, we include an example dataset with images, scales, and metadata. You can run a quick analysis on this dataset with:

```bash
pylithics_run -c configs/test_config.yml --input_dir data --output_dir output --metadata_filename meta_data.csv --get_arrows
```

More generally, given a set of lithic images (and their respective scales), you can run the PyLithics processing script with:

```bash
pylithics_run -c configs/test_config.yml --input_dir <path_to_input_dir> --output_dir <path_to_output_directory> --metadata_filename meta_data.csv
```

The images found in `<path_to_input_dir>` should follow this directory structure:

```bash
input_directory
├── metadata.csv
├── images
│   ├── lithic_id1.png
│   ├── lithic_id2.png
│   ├── lithic_id3.png
│   .
│   .
│   .
│   └── lithic_idn.png
└── scales
    ├── scale_id1.png
    ├── scale_id2.png
    ├── scale_id3.png
    .
    .
    .
    └── scale_id4.png
```

where the mapping between the lithics and scale images should be available in the metadata CSV file.

This CSV file should have as a minimum the following 3 variables:

  • PA_ID: the lithic image ID (the name of the image file),
  • scale_ID: the scale ID (the name of the scale image file),
  • PA_scale: the scale measurement (how many millimeters the scale represents).

An example of this table, where one scale corresponds to several images is the following:

| PA_ID | scale_ID | PA_scale |
| --- | --- | --- |
| lithic_id1 | scale_id1 | 5 |
| lithic_id2 | scale_id2 | 5 |
| lithic_id3 | scale_id3 | 5 |
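Reading this metadata and pairing each lithic image with its scale can be sketched with the Python standard library. This is a hypothetical snippet (PyLithics performs the matching internally); the inline CSV stands in for a `meta_data.csv` on disk.

```python
import csv
import io

# Inline stand-in for the metadata CSV file described above.
metadata_csv = io.StringIO(
    "PA_ID,scale_ID,PA_scale\n"
    "lithic_id1,scale_id1,5\n"
    "lithic_id2,scale_id2,5\n"
    "lithic_id3,scale_id3,5\n"
)

# Map each lithic image ID to its scale image and scale measurement (mm).
pairs = {
    row["PA_ID"]: (row["scale_ID"], float(row["PA_scale"]))
    for row in csv.DictReader(metadata_csv)
}
print(pairs["lithic_id1"])  # ('scale_id1', 5.0)
```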

Note

In the scenario where the scale images and the CSV file are not available, it is possible to run the analysis using only the images:

```bash
pylithics_run -c configs/test_config.yml --input_dir <path_to_input_dir> --output_dir <path_to_output_directory>
```

The lithic image files must still be inside the `images/` directory. However, all measurements will then be given in number of pixels.

The test_config.yml config file contains the following options:

```yaml
threshold: 0.01
contour_parameter: 0.1
contour_fully_connected: 'low'
minimum_pixels_contour: 0.01
denoise_weight: 0.06
contrast_stretch: [4, 96]
```

The config is optimized to work with the images in the example dataset. If you want to use PyLithics with different styles of drawing, you might have to modify this configuration file. You can modify or create your own config file and pass it to the CLI.
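As a sketch of how a custom configuration relates to the example above, here are the same keys as a Python dict with one value overridden. This is illustrative only: the real pipeline reads these values from the YAML file, and the `[2, 98]` override is a made-up adjustment (clipping fewer extreme pixel intensities during contrast stretching).

```python
# Default values mirroring the example test_config.yml above.
default_config = {
    "threshold": 0.01,
    "contour_parameter": 0.1,
    "contour_fully_connected": "low",
    "minimum_pixels_contour": 0.01,
    "denoise_weight": 0.06,
    "contrast_stretch": [4, 96],
}

# Hypothetical custom config: keep all defaults but widen the
# contrast-stretch percentiles for drawings with faint line work.
custom_config = {**default_config, "contrast_stretch": [2, 98]}
print(custom_config["contrast_stretch"])  # [2, 98]
```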

Output from PyLithics

Output images

Output images are saved in the output directory for validation of the data extraction process. An example of these images is the following:

Output data

The output dataset is a JSON file with data for the lithic objects found in an image. The data is hierarchically organized by lithic surface (ventral, dorsal, platform). For each surface, the metrics of its scars are recorded. In this data output example, you can find the JSON file that results from running PyLithics on the above images, with comments to better explain the feature hierarchy and variables.
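The hierarchy can be pictured with a simplified, hypothetical example. The field names and values below are illustrative stand-ins shaped like the description above; the real PyLithics output records many more measurements per surface and scar.

```python
import json

# Made-up record: surfaces at the top level, each holding its own scars.
output = json.loads("""
{
  "id": "lithic_id1",
  "lithic_surfaces": [
    {"classification": "Dorsal",  "total_area": 235.0,
     "scars": [{"scar_area": 22.1}, {"scar_area": 42.6}]},
    {"classification": "Ventral", "total_area": 233.1, "scars": []}
  ]
}
""")

# Walk the hierarchy: one line per surface with its scar count.
for surface in output["lithic_surfaces"]:
    print(surface["classification"], len(surface["scars"]))
# Dorsal 2
# Ventral 0
```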

🖌 Drawing style for PyLithics

We are working hard to develop methods that cater to all styles of stone tool drawings. However, at the moment PyLithics works best with the following styles:

If you want to help us optimize PyLithics for different drawing styles we welcome your contributions!

👋 Contributing

We welcome contributions from anyone interested in the project. There are lots of ways to contribute, not just writing code. If you have ideas on how to extend or improve PyLithics, do get in touch with members of the team via email. See our Contributor Guidelines to learn more about how you can contribute and how we work together as a community on GitHub.

Because the PyLithics code changes frequently, we test and deploy current builds and updates via Travis CI. Every time a change to the PyLithics code is pushed to the Palaeoanalytics repository, the `.travis.yml` file, which contains essential information about the PyLithics programming environment and version, triggers automated tests. Travis CI automatically creates a virtual build of PyLithics and runs the software to ensure that the integration of new code is stable and functioning. Upon completion of the tests, Travis CI generates a pass or fail report and notifies PyLithics team members and contributing developers of any issues. Because the process is automated, there is no need for contributors to open a Travis CI account. All contributions must pass these automated tests before being merged into the main branch.

Development and testing of PyLithics

PyLithics uses the pytest library for automated functional testing of code development and integration. These tests are easily run from the project directory using the command:

```bash
pytest -s
```

Citing PyLithics

DOI

📝 License

This software is licensed under the terms of the GNU General Public License v3.0 (GNU GPLv3).

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

JOSS Publication

PyLithics: A Python package for stone tool analysis
Published
January 29, 2022
Volume 7, Issue 69, Page 3738
Authors
Jason J. Gellis ORCID
The Alan Turing Institute, University of Cambridge, Leverhulme Centre for Human Evolutionary Studies
Camila Rangel Smith ORCID
The Alan Turing Institute
Robert A. Foley ORCID
The Alan Turing Institute, University of Cambridge, Leverhulme Centre for Human Evolutionary Studies
Editor
Nikoleta Glynatsi ORCID
Tags
Human evolution Archaeology Lithic analysis Prehistoric technology Computer vision

Citation (CITATION.cff)

cff-version: v1.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Gellis"
  given-names: "Jason J."
  orcid: "https://orcid.org/0000-0002-9929-789X"
- family-names: "Rangel Smith"
  given-names: "Camila"
  orcid: "https://orcid.org/0000-0002-0227-836X"
- family-names: "Foley"
  given-names: "Robert A."
  orcid: "https://orcid.org/0000-0003-0479-3039"
title: "PyLithics: A Python package for stone tool analysis"
version: 1.0
date-released: 2021-08-27
url: "https://github.com/alan-turing-institute/Palaeoanalytics/"

GitHub Events

Total
  • Watch event: 1
  • Delete event: 22
  • Push event: 90
  • Create event: 6
Last Year
  • Watch event: 1
  • Delete event: 22
  • Push event: 90
  • Create event: 6

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 534
  • Total Committers: 3
  • Avg Commits per committer: 178.0
  • Development Distribution Score (DDS): 0.478
Past Year
  • Commits: 120
  • Committers: 1
  • Avg Commits per committer: 120.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
JasonGellis j****s@g****m 279
crangelsmith c****h@g****m 254
Rob-LCHES 7****S 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 35
  • Total pull requests: 77
  • Average time to close issues: 5 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 5
  • Total pull request authors: 2
  • Average comments per issue: 1.91
  • Average comments per pull request: 0.38
  • Merged pull requests: 72
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 2 minutes
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • crangelsmith (16)
  • MichaelHoltonPrice (7)
  • JasonGellis (6)
  • steko (4)
  • Rob-LCHES (2)
Pull Request Authors
  • JasonGellis (60)
  • crangelsmith (18)
Top Labels
Issue Labels
enhancement (4) documentation (3) bug (3) metrics (2)
Pull Request Labels

Dependencies

requirements.txt pypi
  • matplotlib *
  • numpy *
  • opencv-contrib-python *
  • pandas *
  • pytest *
  • pyyaml ==5.4.1
  • scikit-image *
  • scipy *
  • setuptools ==60.9.0