Repository
A cut-and-solve based feature selection algorithm for continuous data
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
CSFS
A cut-and-solve based feature selection algorithm for continuous data.
To Use
Configure the Makefile with the location of the IBM ILOG CPLEX and Open MPI libraries and binaries.
Compile with the Makefile by navigating to the root directory and entering: make
Update the configuration file.
Run the program. For example, enter: mpirun -np 4 ./csfs
Configuration
DATA_FILE - Tab-separated file where the first NUM_CASES columns are cases and the next NUM_CTRLS columns are controls. The rows indicate features.
RISK - Boolean that indicates whether risk patterns (true) or protective patterns (false) should be found.
NUM_CASES - The number of cases in DATA_FILE.
NUM_CTRLS - The number of controls in DATA_FILE.
NUM_EXPRS - The number of features in DATA_FILE.
NUM_HEAD_ROWS - The number of header rows in DATA_FILE.
NUM_HEAD_COLS - The number of header columns in DATA_FILE.
PATTERN_SIZE - The number of marker states in the pattern(s) to be found.
USE_LOWER_CUTOFF - Boolean that indicates whether STARTING_LOWER_BOUND is used. This lower bound can be updated during the search. USE_LOWER_CUTOFF and USE_SOLUTION_POOL_THRESHOLD cannot both be set to true at the same time.
USE_SOLUTION_POOL_THRESHOLD - Boolean that indicates whether SOLUTION_POOL_THRESHOLD is used. When it is used, all patterns with a better objective value are retained. USE_LOWER_CUTOFF and USE_SOLUTION_POOL_THRESHOLD cannot both be set to true at the same time.
SOLUTION_POOL_THRESHOLD - Threshold used to retain solutions in the pool. Used if USE_SOLUTION_POOL_THRESHOLD is true.
STARTING_LOWER_BOUND - Starting lower bound when USE_LOWER_CUTOFF is true.
STARTING_UPPER_BOUND - Starting upper bound when USE_LOWER_CUTOFF is true.
QUIET - Boolean that limits the output to only the most important items when true. QUIET and VERBOSE cannot both be set to true at the same time.
VERBOSE - Boolean that controls whether all outputs are displayed. QUIET and VERBOSE cannot both be set to true at the same time.
PRINT_CPLEX_OUTPUT - Boolean that controls whether the CPLEX output is displayed.
TOL - Tolerance value used for rounding decimals to integers in CPLEX.
ID_PREFIX - Prefix of the ID column in DATA_FILE.
MISSING_SYMBOL - String used to indicate missing data in DATA_FILE.
CPLEX_SEED - Seed provided to CPLEX.
USE_SPARSE_CONTRAINTS - Boolean that indicates whether additional constraints for the sparse problem are used.
NUM_BINS - The number of bins, from HIGH, NORM, LOW, NOT_HIGH, and NOT_LOW, to be used.
USE_HIGH - Set to true if HIGH variable will be used in pattern.
USE_NORM - Set to true if NORM variable will be used in pattern.
USE_LOW - Set to true if LOW variable will be used in pattern.
USE_NOT_HIGH - Set to true if NOT_HIGH variable will be used in pattern.
USE_NOT_LOW - Set to true if NOT_LOW variable will be used in pattern.
HIGH_VALUE - Value in DATA_FILE that indicates high expression.
NORM_VALUE - Value in DATA_FILE that indicates normal expression.
LOW_VALUE - Value in DATA_FILE that indicates low expression.
NOT_LOW_VALUE - Value in DATA_FILE that indicates not low expression.
NOT_HIGH_VALUE - Value in DATA_FILE that indicates not high expression.
SET_NA_TRUE - Boolean that indicates whether missing data is treated as both high and low.
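Several of the options above constrain one another, so it can help to check a configuration before launching a run. The following is a minimal Python sketch of such a check and is not part of CSFS. It assumes the options have already been loaded into a plain dict keyed by the underscore-spelled option names; the function name validate_config and the BIN_FLAGS list are hypothetical. The rule that NUM_BINS should equal the number of USE_* bin flags set to true is an assumption inferred from the descriptions, not something the program is known to enforce.

```python
# Hypothetical helper: checks the mutual-exclusion rules stated in the
# Configuration list. The config is assumed to be a dict of option -> value.
BIN_FLAGS = ["USE_HIGH", "USE_NORM", "USE_LOW", "USE_NOT_HIGH", "USE_NOT_LOW"]


def validate_config(cfg):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []

    # Stated in the option descriptions: these two cannot both be true.
    if cfg.get("USE_LOWER_CUTOFF") and cfg.get("USE_SOLUTION_POOL_THRESHOLD"):
        problems.append(
            "USE_LOWER_CUTOFF and USE_SOLUTION_POOL_THRESHOLD cannot both be true"
        )

    # Stated in the option descriptions: QUIET and VERBOSE cannot both be true.
    if cfg.get("QUIET") and cfg.get("VERBOSE"):
        problems.append("QUIET and VERBOSE cannot both be true")

    # Assumption (not confirmed by the source): NUM_BINS should match the
    # number of bin flags that are enabled.
    enabled = sum(1 for flag in BIN_FLAGS if cfg.get(flag))
    if "NUM_BINS" in cfg and cfg["NUM_BINS"] != enabled:
        problems.append(
            f"NUM_BINS is {cfg['NUM_BINS']} but {enabled} USE_* bin flags are true"
        )

    return problems


if __name__ == "__main__":
    example = {
        "USE_LOWER_CUTOFF": True,
        "USE_SOLUTION_POOL_THRESHOLD": True,
        "QUIET": False,
        "VERBOSE": True,
        "NUM_BINS": 2,
        "USE_HIGH": True,
        "USE_LOW": True,
    }
    for problem in validate_config(example):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a user fix every conflict in a single pass.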
Output
*.log - File containing the collection of patterns
Notes
Requires Open MPI and IBM ILOG CPLEX
DATA_FILE should be tab-separated; the columns represent individuals and the rows represent features
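The layout described above (header rows and header columns first, then case columns followed by control columns, one feature per row) can be illustrated with a short Python sketch. This is only an illustration of the documented layout, not the parser CSFS itself uses. The function name split_cases_controls, its parameter names, and the sample values are hypothetical, and cell values are kept as strings because the source does not say how they are typed.

```python
import io


def split_cases_controls(stream, num_cases, num_ctrls, num_head_rows, num_head_cols):
    """Read a tab-separated stream and return one (cases, controls) pair per feature row."""
    features = []
    for index, line in enumerate(stream):
        if index < num_head_rows:
            continue  # skip header rows
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        # Drop the header columns, leaving one value per individual.
        fields = line.split("\t")[num_head_cols:]
        if len(fields) != num_cases + num_ctrls:
            raise ValueError(
                f"line {index + 1}: expected {num_cases + num_ctrls} values, got {len(fields)}"
            )
        # Cases come first, controls follow.
        features.append((fields[:num_cases], fields[num_cases:]))
    return features


if __name__ == "__main__":
    # Made-up sample: one header row, one header column, two cases, one control.
    sample = io.StringIO("ID\tcase1\tcase2\tctrl1\nf1\t1\t0\t-1\nf2\t0\tNA\t1\n")
    for cases, ctrls in split_cases_controls(sample, 2, 1, 1, 1):
        print(cases, ctrls)
```

Checking the column count on every row catches a mismatch between the file and the NUM_CASES and NUM_CTRLS settings early.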
Owner
- Name: Climer Lab
- Login: ClimerLab
- Kind: organization
- Location: Saint Louis Missouri
- Website: http://www.cs.umsl.edu/~climer/
- Repositories: 1
- Profile: https://github.com/ClimerLab
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Smith"
    given-names: "Ken"
    orcid: "https://orcid.org/0000-0002-7292-8268"
title: "Cut-and-Solve Feature Selection"
version: 1.0.0
license: BSD-3-Clause
license-url: "https://github.com/ClimerLab/CSFS/blob/main/LICENSE"
repository-code: "https://github.com/ClimerLab/CSFS/"
keywords:
  - feature selection
  - youden j
  - Cut-and-Solve
type: software
url: "https://github.com/ClimerLab/CSFS/"