https://github.com/bin-cao/tcgpr
[NPJ Com Mat 2023 | Small 2024] Machine Learning Algorithm : outlier identifying, feature selection
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 1 DOI reference(s) in README
- ✓ Academic publication links: links to wiley.com, nature.com
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.9%) to scientific vocabulary
Repository
Statistics
- Stars: 14
- Watchers: 4
- Forks: 5
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
TCGPR
TCGPR is a Python library that implements a divide-and-conquer strategy for data screening and feature selection, tailored for small datasets in materials science and beyond.
📖 Citation
If you use this code in your research, please cite the following papers:
- Li T., Cao B., Su T., ... Feng L., Zhang T. Machine Learning-Engineered Nanozyme System for Synergistic Anti-Tumor Ferroptosis/Apoptosis Therapy, SMALL.
- Wei Q., Cao B., Yuan H., ... Dong Z., Zhang T. Divide and conquer: Machine learning accelerated design of lead-free solder alloys with high strength and high ductility, npj Computational Materials.
📜 Project History
- 2022: TCGPR was first proposed and implemented, in collaboration with Mr. Hao Yuan (experiments) and Mr. Qinghua Wei (experiments). It was successfully applied to the optimization of lead-free solder alloys. → Published in npj Computational Materials.
- 2024: After two years of development, TCGPR was enhanced with sequential feature selection and outlier detection. In collaboration with Mr. Tianliang Li (experiments) and Mr. Tianhao Su (computations), it was applied to anti-tumor ferroptosis studies. → Published in SMALL.
🧠 Algorithm Overview
For an in-depth explanation of the algorithm, see the TCGPR Introduction PDF.
🔧 Installation
Install TCGPR via PyPI:
```bash
pip install PyTcgpr
```
To verify the installation:
```bash
pip show PyTcgpr
```
To upgrade to the latest version:
```bash
pip install --upgrade PyTcgpr
```
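The installation check can also be done from inside Python. The snippet below is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of PyTcgpr. The only name taken from this README is the distribution name `PyTcgpr`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="PyTcgpr"):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"PyTcgpr {found}" if found else "PyTcgpr is not installed")
```

This is handy at the top of a notebook or script, where a missing package should produce a clear message rather than an `ImportError` later on.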
🚀 Getting Started
1. Data Screening | Partition Mode
```python
from PyTcgpr import TCGPR

TCGPR.fit(
    filePath = "data.csv",
    initial_set_cap = 3,
    sampling_cap = 2,
    up_search = 500,
    CV = 'LOOCV',
    Task = 'Partition'
)
```
2. Data Screening | Identification Mode
```python
from PyTcgpr import TCGPR

TCGPR.fit(
    filePath = "data.csv",
    sampling_cap = 2,
    up_search = 500,
    CV = 'LOOCV',
    Task = 'Identification'
)
```
3. Feature Selection Mode
```python
from PyTcgpr import TCGPR

TCGPR.fit(
    filePath = "data.csv",
    Mission = 'FEATURE',
    sampling_cap = 2,
    up_search = 500,
    CV = 'LOOCV'
)
```
⚙️ Parameters
```python
:param Mission: str, default='DATA'
    - 'DATA': Perform data screening
    - 'FEATURE': Perform feature selection

:param filePath: str
    Path to input dataset in CSV format

:param initial_set_cap: int or list
    Initial subset size or index list for Partition mode

:param sampling_cap: int, default=1
    Number of items selected per iteration

:param measure: str, default='Pearson'
    Correlation type: 'Pearson' or 'Determination'

:param ratio: float
    Tolerance threshold for correlation-based filtering

:param target: int, default=1
    Number of targets in regression (for feature selection)

:param weight: float, default=0.2
    Weight coefficient in GGMF score calculation

:param up_search: int, default=500
    Upper limit for search iterations

:param exploit_coef: float, default=2
    Variance constraint for EI acquisition function

:param exploit_model: bool, default=False
    If True, disables GGMF and uses only R values

:param CV: int or str, default=10
    Cross-validation: integer (e.g., 5, 10) or 'LOOCV'
```
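To make the `CV` and `measure` options more concrete, here is a minimal, standard-library sketch of what these settings conventionally mean. It is not TCGPR's internal code: the helpers `cv_splits`, `pearson_r`, and `determination` are hypothetical names, and it assumes that `'Determination'` refers to the usual coefficient of determination (1 - SS_res / SS_tot).

```python
from math import sqrt


def cv_splits(n_samples, cv):
    """Return (train_idx, test_idx) pairs for k-fold CV, or leave-one-out if cv == 'LOOCV'."""
    k = n_samples if cv == "LOOCV" else int(cv)
    if not 2 <= k <= n_samples:
        raise ValueError("fold count must be between 2 and n_samples")
    indices = list(range(n_samples))
    splits = []
    for fold in range(k):
        test_idx = indices[fold::k]  # every k-th sample, offset by `fold`
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        splits.append((train_idx, test_idx))
    return splits


def pearson_r(y_true, y_pred):
    """Pearson correlation between observed and predicted values (undefined for constant input)."""
    mean_t = sum(y_true) / len(y_true)
    mean_p = sum(y_pred) / len(y_pred)
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / sqrt(var_t * var_p)


def determination(y_true, y_pred):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot
```

With `cv='LOOCV'` every sample is held out exactly once, which is why leave-one-out is a common choice for the small datasets this library targets; an integer such as `5` or `10` trades some of that thoroughness for speed.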
📤 Output
After running, TCGPR outputs a CSV file with the remaining samples:
```bash
Dataset_remained_by_TCGPR.csv
```
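To see which samples were screened out, the output file can be compared with the input. The sketch below is an assumption-laden illustration, not part of PyTcgpr: it assumes both files have a header row and that remaining rows are written with the same columns and the same text formatting as in the input (if the output reformats numbers or adds an index column, the comparison needs adapting). The helper names `read_rows` and `dropped_rows` are hypothetical.

```python
import csv
from collections import Counter


def read_rows(path):
    """Read a CSV file and return (header, rows), with each row as a tuple of strings."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        return header, [tuple(row) for row in reader]


def dropped_rows(original_rows, remained_rows):
    """Return the rows of the input that do not appear in the screened output."""
    remaining = Counter(remained_rows)  # multiset, so duplicate samples are handled
    dropped = []
    for row in original_rows:
        if remaining[row] > 0:
            remaining[row] -= 1
        else:
            dropped.append(row)
    return dropped


if __name__ == "__main__":
    _, original = read_rows("data.csv")
    _, remained = read_rows("Dataset_remained_by_TCGPR.csv")
    for row in dropped_rows(original, remained):
        print(row)
```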
📦 Source Code
Compatible with Windows, Linux, and macOS.
🧾 Patent
👨‍🔧 Maintainer
Maintained by Bin Cao
📧 Email: bcao686@connect.hkust-gz.edu.cn
Feel free to open an issue or contact me with any questions, bugs, or collaboration opportunities.
🤝 Contributing
Contributions and suggestions are welcome!
- Report bugs or request features via GitHub Issues
- Submit a pull request with improvements or fixes
- Interested in research collaboration? Please get in touch!
Owner
- Name: 曹斌 | Bin CAO
- Login: Bin-Cao
- Kind: user
- Location: Shanghai
- Company: Shanghai University
- Repositories: 5
- Profile: https://github.com/Bin-Cao
Machine learning | Materials Informatics | Mechanics
GitHub Events
Total
- Watch event: 5
- Push event: 13
- Fork event: 1
Last Year
- Watch event: 5
- Push event: 13
- Fork event: 1
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Packages
- Total packages: 2
- Total downloads:
  - pypi 166 last-month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 18
- Total maintainers: 1
pypi.org: pytcgpr
Tree-Classifier for Gaussian process model (TCGPR) is a data preprocessing algorithm based on the Gaussian correlation among data.
- Homepage: https://github.com/Bin-Cao/TCGPR
- Documentation: https://pytcgpr.readthedocs.io/
- License: MIT License
- Latest release: 1.3.1 (published almost 2 years ago)
pypi.org: tcgpr
Tree-Classifier for Gaussian process model (TCGPR) is a data preprocessing algorithm based on the Gaussian correlation among data.
- Homepage: https://github.com/Bin-Cao/TCGPR
- Documentation: https://tcgpr.readthedocs.io/
- License: MIT License
- Latest release: 1.5.0 (published almost 3 years ago)