cytominer-database
[DEPRECATED] A package for storing morphological profiling data.
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 4 of 9 committers (44.4%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary
Keywords
database
microscopy
profiling
Keywords from Contributors
cellprofiler
cytomining
guide
handbook
morphological-profiling
Last synced: 6 months ago
Repository
Basic Info
Statistics
- Stars: 10
- Watchers: 2
- Forks: 11
- Open Issues: 14
- Releases: 13
Topics
database
microscopy
profiling
Created about 10 years ago · Last pushed over 1 year ago
Metadata Files
Readme
License
Citation
README.rst
This package is deprecated and will no longer be supported. Please use at your own risk!
==================
cytominer-database
==================
.. image:: https://travis-ci.org/cytomining/cytominer-database.svg?branch=master
   :target: https://travis-ci.org/cytomining/cytominer-database
   :alt: Build Status

.. image:: https://readthedocs.org/projects/cytominer-database/badge/?version=latest
   :target: http://cytominer-database.readthedocs.io/en/latest/?badge=latest
   :alt: Documentation Status
cytominer-database provides command-line tools for organizing measurements extracted from images.
Software tools such as CellProfiler can extract hundreds of measurements from millions of cells in a typical
high-throughput imaging experiment. The measurements are stored across thousands of CSV files.
cytominer-database helps you organize these data into a single database backend, such as SQLite.
Why cytominer-database?
=======================
While tools like CellProfiler can store measurements directly in databases, it is usually infeasible to create a
centralized database in which to store these measurements. A more scalable approach is to create a set of CSVs per
batch of images, and then later merge these CSVs.
cytominer-database ingest reads these CSVs, checks for errors, then ingests
them into a database backend. The default backend is `SQLite`.
.. code-block:: sh

    cytominer-database ingest source_directory sqlite:///backend.sqlite -c ingest_config.ini

will ingest the CSV files nested under source_directory into a `SQLite` backend.
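Once ingestion completes, the resulting SQLite file can be inspected with the standard `sqlite3` command-line client. The table names below are illustrative; they mirror the ingested CSV kinds:

.. code-block:: sh

    # List the tables created by ingestion.
    sqlite3 backend.sqlite ".tables"
    # Count the ingested image rows (assumes a table named Image exists).
    sqlite3 backend.sqlite "SELECT COUNT(*) FROM Image;"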
To select the `Parquet` backend, add a `--parquet` flag:

.. code-block:: sh

    cytominer-database ingest source_directory sqlite:///backend.sqlite -c ingest_config.ini --parquet
The ingest_config.ini file then needs to be adjusted to contain the `Parquet` specifications.
How to use the configuration file
=================================
The configuration file ingest_config.ini must be located in the source_directory and can be edited to control how the data are ingested.
It has three sections.
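For orientation, a source_directory might be laid out as follows; the plate/well folder names are purely illustrative, while the CSV kinds and the location of ingest_config.ini follow the description above:

.. code-block::

    source_directory/
    ├── ingest_config.ini
    ├── plate_1_well_A01/
    │   ├── Image.csv
    │   ├── Cells.csv
    │   ├── Cytoplasm.csv
    │   └── Nuclei.csv
    └── plate_1_well_A02/
        ├── Image.csv
        ├── Cells.csv
        ├── Cytoplasm.csv
        └── Nuclei.csv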
The [filenames] section
-----------------------
.. code-block::

    [filenames]
    image = image.csv #or: Image.csv
    object = object.csv #or: Object.csv
cytominer-database is currently limited to the following measurement file kinds:
Cells.csv, Cytoplasm.csv, Nuclei.csv, Image.csv, Object.csv.
The [filenames] section records the exact basenames of the measurement files that are present.
This matters when capitalization is inconsistent (e.g. image.csv vs. Image.csv).
The [ingestion_engine] section
------------------------------
.. code-block::

    [ingestion_engine]
    engine = Parquet #or: SQLite
The [ingestion_engine] section specifies the backend.
The possible key-value pairs are **engine** = *SQLite* or **engine** = *Parquet*.
The [schema] section
--------------------
.. code-block::

    [schema]
    reference_option = sample #or: path/to/reference/folder relative to source_directory
    ref_fraction = 1 #or: any decimal value in [0, 1]
    type_conversion = int2float #or: all2string
The [schema] section specifies how to handle incompatibilities between the table schemas of the files.
A Parquet file is fixed to the schema with which it was first opened, i.e. the schema of the first file written to it (the reference file).
To append the data of all .csv files of a given file kind, it is therefore important to ensure that the reference file satisfies certain compatibility requirements: for example, make sure the reference file is not missing any columns and that all existing files can be converted to the reference schema automatically.
Note: This section is used only when files are ingested to Parquet format; it was developed to handle the special cases in which tables cannot be concatenated automatically.
There are two options for the key **reference_option**:
The first option is to create a designated folder containing one .csv reference file for every kind of file ("Cytoplasm.csv", "Nuclei.csv", ...) and to set **reference_option** = *path/to/reference/folder* in the config file, where the path is relative to the source_directory passed to the ingest command.
Each reference file's schema determines the schema of the Parquet file into which all .csv files of its kind are ingested.
**This option relies on manual selection, so the chosen reference files must be checked explicitly: make sure the .csv files are complete in their number of columns and contain no NaN values.**
Alternatively, the reference files can be chosen automatically from a sampled subset of all existing files by setting **reference_option** = *sample*.
In that case, a subset of all files is sampled uniformly at random, and the first table with the maximum number of columns among the sampled .csv files is chosen as the reference table.
An additional key, **ref_fraction**, can be set to specify the fraction of files that are sampled.
The default value is **ref_fraction** = *1*, in which case all tables are compared by width.
This key is only used if **reference_option** = *sample*.
Lastly, the key **type_conversion** determines how schema types are handled when they disagree.
The default value is *int2float*, which converts all integer columns to floats.
This has proven helpful for trivial columns (e.g. an all-zero column), which may be read as "int" type and could otherwise not be written into the same table as non-trivial files whose corresponding values are non-zero floats.
Automatic type conversion can be avoided altogether by converting all values to string type, i.e. by setting **type_conversion** = *all2string*.
However, the loss of type information might be a disadvantage in downstream tasks.
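Putting the three sections together, a complete ingest_config.ini could look like the following; the values are taken from the snippets above and should be adjusted to your own data:

.. code-block::

    [filenames]
    image = image.csv
    object = object.csv

    [ingestion_engine]
    engine = Parquet

    [schema]
    reference_option = sample
    ref_fraction = 1
    type_conversion = int2float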
Owner
- Name: cytomining
- Login: cytomining
- Kind: organization
- Repositories: 27
- Profile: https://github.com/cytomining
GitHub Events
Total
- Fork event: 1
Last Year
- Fork event: 1
Committers
Last synced: 8 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Shantanu Singh | s****h@b****g | 372 |
| diskontinuum | 2****m | 145 |
| Allen Goodman | a****n@i****m | 64 |
| mcquin | m****n@b****g | 45 |
| gwaygenomics | g****y@g****m | 22 |
| Claire McQuin | m****l@g****m | 8 |
| Dave Bunten | d****n@c****u | 2 |
| Tim Becker | t****r@b****g | 1 |
| Beth Cimini | b****7 | 1 |
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 49
- Total pull requests: 54
- Average time to close issues: 3 months
- Average time to close pull requests: about 2 months
- Total issue authors: 7
- Total pull request authors: 7
- Average comments per issue: 1.49
- Average comments per pull request: 1.28
- Merged pull requests: 47
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 2
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 4.0
- Average comments per pull request: 1.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- shntnu (20)
- mcquin (15)
- bethac07 (6)
- gwaybio (3)
- diskontinuum (3)
- cells2numbers (1)
- ManonFepi (1)
Pull Request Authors
- mcquin (26)
- shntnu (13)
- gwaybio (6)
- diskontinuum (5)
- d33bs (4)
- bethac07 (2)
- cells2numbers (1)
Top Labels
Issue Labels
Bug (12)
Enhancement (11)
Feature (3)
Notes (1)
Pull Request Labels
Packages
- Total packages: 3
- Total downloads: pypi, 1,693 last-month
- Total dependent packages: 2 (may contain duplicates)
- Total dependent repositories: 1 (may contain duplicates)
- Total versions: 23
- Total maintainers: 6
proxy.golang.org: github.com/cytomining/cytominer-database
- Documentation: https://pkg.go.dev/github.com/cytomining/cytominer-database#section-documentation
- License: other
- Latest release: v0.3.5 (published over 1 year ago)
Rankings
Dependent packages count: 9.1%
Average: 9.6%
Dependent repos count: 10.2%
Last synced: 6 months ago
pypi.org: cytominer-database
- Homepage: https://github.com/cytomining/cytominer-database
- Documentation: https://cytominer-database.readthedocs.io/
- License: BSD
- Latest release: 0.3.5 (published over 1 year ago)
Rankings
Dependent packages count: 4.8%
Downloads: 10.0%
Forks count: 10.9%
Average: 13.2%
Stargazers count: 18.5%
Dependent repos count: 21.6%
Maintainers (6)
Last synced: 6 months ago
conda-forge.org: cytominer_database
Software tools such as CellProfiler can extract hundreds of measurements from millions of cells in a typical high-throughput imaging experiment. The measurements are stored across thousands of CSV files. cytominer-database helps you organize these data into a single database backend, such as SQLite.
- Homepage: https://github.com/cytomining/cytominer-database
- License: BSD-3-Clause
- Latest release: 0.3.4 (published over 3 years ago)
Rankings
Dependent packages count: 28.8%
Dependent repos count: 34.0%
Average: 39.1%
Forks count: 40.9%
Stargazers count: 52.6%
Last synced: 6 months ago
Dependencies
requirements.txt
pypi
- csvkit >=1.0.3
- numpy >=1.17.0
- pyarrow >=0.16.0
- pytest >=3.2.2
- sphinx >=1.6.4
- sphinx_rtd_theme >=0.2.5b1
setup.py
pypi
- backports.tempfile >=1.0rc1
- click >=6.7
- configparser >=3.5.0
- csvkit >=1.0.2
- pandas >=0.20.3