segmentor
Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.1%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: yujing1997
- License: mit
- Language: Python
- Default Branch: main
- Size: 7.53 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
Tumor nuclear size as a biomarker for post-radiotherapy progression and survival in gynecological malignancies
Semantic Segmentation (TIAToolbox)
How to install gdc-client for transferring TCGA data
- conda create -n gdc-client-env python=3.8 -y
- conda activate gdc-client-env
- wget https://gdc.cancer.gov/system/files/public/file/gdc-client_2.3_Ubuntu_x64-py3.8-ubuntu-20.04.zip
- unzip /home/yujing/gdc-client_2.3_Ubuntu_x64.zip
- gdc-client --version (gdc-client is now available as an executable)
- mv gdc-client /home/yujing/miniconda3/envs/gdc-client-env/bin/
- gdc-client --version
To download TCGA data (example)
- conda activate gdc-client-env
- Manifest file path: /home/yujing/dockhome/Multimodality/Segment/tmp/manifest/gdc_manifest.2024-11-03.txt
- cd to the directory where you'd like the files to be saved, then run:
- gdc-client download -m /home/yujing/dockhome/Multimodality/Segment/tmp/manifest/gdc_manifest.2024-11-03.txt
- Download the TCGA-CESC data:
- cd /Data/Yujing/Segment/tmp/tcga_cesc_svs
- gdc-client download -m /Data/Yujing/Segment/tmp/tcga_cesc_manifest/gdc_manifest.2024-11-11.txt
- Note that /Data/Yujing/Segment/tmp/tcga_cesc_manifest/gdc_manifest.2024-11-11_v2.txt is the same as /Data/Yujing/Segment/tmp/tcga_cesc_manifest/gdc_manifest.2024-11-11.txt, except that it excludes the cases already downloaded in /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_Cedar_HG_filemap.tsv, so that downloading can continue where it stopped:
- gdc-client download -m /Data/Yujing/Segment/tmp/tcga_cesc_manifest/gdc_manifest.2024-11-11_v2.txt
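The "_v2" manifest above is the full manifest minus what was already downloaded. A minimal sketch of how such a file could be produced, assuming the standard tab-separated GDC manifest layout with an `id` column, and assuming (hypothetically) that the filemap TSV stores the GDC file UUIDs in a column named `file_id`. Matching is done per file ID, which only approximates "cases" when a case has several files; the function names are illustrative and not part of this repository.

```python
from __future__ import annotations

import csv
import io


def ids_from_filemap(filemap_text: str, id_column: str = "file_id") -> set[str]:
    """Collect the file IDs listed in a tab-separated filemap.

    id_column is a hypothetical column name; use whichever filemap column
    actually holds the GDC file UUIDs.
    """
    reader = csv.DictReader(io.StringIO(filemap_text), delimiter="\t")
    return {row[id_column] for row in reader}


def filter_manifest(manifest_text: str, downloaded_ids: set[str]) -> str:
    """Return the manifest text without the rows whose 'id' was already downloaded."""
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row["id"] not in downloaded_ids:  # keep only files not yet downloaded
            writer.writerow(row)
    return out.getvalue()


def write_v2_manifest(manifest_path: str, filemap_path: str, out_path: str) -> None:
    """File-based wrapper: full manifest + filemap of finished files -> resumable manifest."""
    with open(manifest_path, newline="") as f:
        manifest_text = f.read()
    with open(filemap_path, newline="") as f:
        downloaded = ids_from_filemap(f.read())
    with open(out_path, "w", newline="") as f:
        f.write(filter_manifest(manifest_text, downloaded))
```

For example, `write_v2_manifest(".../gdc_manifest.2024-11-11.txt", ".../run_Cedar_HG_filemap.tsv", ".../gdc_manifest.2024-11-11_v2.txt")` would write a manifest that `gdc-client download -m` can consume to resume the transfer.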
Semantic segmentation
- On test slides, e.g. ./dockhome/Multimodality/Segment/tmp/blca_svs/30e4624b-6f48-429b-b1d9-6a6bc5c82c5e/TCGA-2F-A9KO-01Z-00-DX1.195576CF-B739-4BD9-B15B-4A70AE287D3E.svs
- Scripts:
- dockhome/Multimodality/Segment/tmp/scripts/semantic_segmentation.py
- dockhome/Multimodality/Segment/tmp/scripts/visualizesemanticsegmentation.py
- ./visualizesemanticsegmentation.py visualizes the segmentation outputs
- The WSI output has 5 channels of probability maps, one per class: label_dict = {"Tumour": 0, "Stroma": 1, "Inflammatory": 2, "Necrosis": 3, "Others": 4}
- The semantic segmentation output must be mapped pixel by pixel onto the Pan-Cancer-Nuclei-Seg .csv files, which each cover a 4k by 4k patch
- Merge the semantic segmentation output with the Pan-Cancer-Nuclei-Seg .csv 4k by 4k files
- Once each pixel's class is obtained, each QA'ed segmented nucleus is assigned a class by majority vote over the pixels it covers
- PixelInAreas is already reported in the Pan-Cancer-Nuclei-Seg .csv files
- Output for each patch: a PixelInAreas vector for each class
- Output for each WSI: for each class, the concatenation of the PixelInAreas vectors from every patch
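The per-pixel class assignment, majority vote, and per-class aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the code of nuclei_classify.py: it assumes the probability map is already aligned to the patch's pixel grid (same resolution, patch offset removed), that each nucleus is given as arrays of the row/column indices it covers, and that `area` is the PixelInAreas value taken from the .csv row. All function names are hypothetical.

```python
import numpy as np

# Class indices of the 5-channel semantic segmentation output (from the README).
LABEL_DICT = {"Tumour": 0, "Stroma": 1, "Inflammatory": 2, "Necrosis": 3, "Others": 4}
N_CLASSES = len(LABEL_DICT)


def pixel_classes(prob_map: np.ndarray) -> np.ndarray:
    """(H, W, 5) per-class probability map -> (H, W) map of the most probable class."""
    return np.argmax(prob_map, axis=-1)


def classify_nucleus(class_map, rows, cols):
    """Majority vote over the pixels (rows[i], cols[i]) covered by one nucleus.

    Returns the winning class index and the per-class pixel counts.
    Ties go to the lowest class index, because np.argmax returns the first maximum.
    """
    counts = np.bincount(class_map[rows, cols], minlength=N_CLASSES)
    return int(np.argmax(counts)), counts


def patch_areas_by_class(class_map, nuclei):
    """One patch: list of nucleus areas for each class.

    nuclei is an iterable of (rows, cols, area); area stands in for the
    PixelInAreas value read from the Pan-Cancer-Nuclei-Seg .csv row.
    """
    per_class = [[] for _ in range(N_CLASSES)]
    for rows, cols, area in nuclei:
        label, _ = classify_nucleus(class_map, rows, cols)
        per_class[label].append(area)
    return per_class


def wsi_areas_by_class(patch_results):
    """One WSI: concatenate each class's area list across all of its patches."""
    merged = [[] for _ in range(N_CLASSES)]
    for per_class in patch_results:
        for k in range(N_CLASSES):
            merged[k].extend(per_class[k])
    return {name: merged[idx] for name, idx in LABEL_DICT.items()}
```

Using `np.bincount` keeps the vote vectorized per nucleus; if the semantic segmentation was saved at a lower resolution than the nuclei polygons, the coordinates would first need rescaling, which is not shown here.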
nuclei_classify.py classifies individual segmented nuclei into semantic classes based on their overlap with the semantic segmentation.
- Per-patch analysis is available (~16 seconds per patch).
- Now testing on all patches in a folder; speed still needs to be optimized.
- Test folder path:
- File names need to be matched
Is it possible to directly mount, read, and write Narval files from this local machine over SSH? Yes, with sshfs:
- Template: sshfs your_username@narval.computecanada.ca:/remote/path/to/folder /local/mount/point
- Example: sshfs yujingz@narval.computecanada.ca:/home/yujingz/scratch/NUCLEISIZECODE/Pan-Cancer-Nuclei-Seg/ScientificData/svs-files/tcgacesc /Data/Yujing/Segment/tmp/tcga_cesc
Partitioning the semantic segmentation task across multiple compute clusters
- This spreadsheet shows the partitioning of the task: https://docs.google.com/spreadsheets/d/1RuG-e45JQRM5R15ijCUWbn3fyhfOJsIG9Im8_PFep5Q/edit?usp=sharing
- Clusters: Proton GPU (local), NarvalYZ, NarvalHG, CedarYZ, CedarHG, BelugaYZ, BelugaHG
- Estimate how long each one takes
- See the Narval README.md for more details
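One way to turn per-cluster time estimates into a partition is a greedy longest-processing-time-first split: sort slides from largest to smallest and give each to the cluster that would finish it earliest. The sketch below assumes slide file size is a usable cost proxy and that each cluster's throughput (size units per hour) has been measured; the numbers and the function name are hypothetical, and this is not necessarily how the spreadsheet partition was made.

```python
def partition_slides(slide_sizes, throughput):
    """Greedy longest-processing-time-first split of slides across clusters.

    slide_sizes: slide id -> cost proxy, e.g. .svs file size in GB (assumption)
    throughput:  cluster name -> cost units processed per hour (hypothetical, to be measured)
    Returns (cluster -> list of slide ids, cluster -> estimated hours).
    """
    assignment = {cluster: [] for cluster in throughput}
    hours = {cluster: 0.0 for cluster in throughput}
    # Largest slides first; each goes to the cluster whose estimated finish time is earliest.
    for slide, size in sorted(slide_sizes.items(), key=lambda kv: kv[1], reverse=True):
        best = min(throughput, key=lambda c: hours[c] + size / throughput[c])
        assignment[best].append(slide)
        hours[best] += size / throughput[best]
    return assignment, hours
```

With throughputs keyed by the cluster names above (Proton, NarvalYZ, ...), each resulting list could then be written out as a per-cluster filemap such as run_Proton_filemap.tsv, and the returned hours give the rough time estimate per cluster.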
Ran parts of tcga_cesc on Proton, the rest on Narval
- /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/run_semantic_seg_for_loop.sh
- ran for filemap: /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_Proton_filemap.tsv
- /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/run_semantic_seg_for_loop2.sh
- ran for filemap: /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_to_be_rerun.tsv
Extract or classify nuclei from the semantic segmentation of WSIs
- /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/nucleiclassifyextractjobarraysnarval.sh
- Run on Narval for the WSIs that were segmented and stored in the rrg project folder there.
- It is a job array, so it is submitted with sbatch ..script.sh. It creates one task per row of the SAMPLE_SHEET (the filemap) defined in the script.
- On Proton we can't create job arrays, so the job-array scripts are rewritten as for loops in which each iteration processes one row of the given SAMPLE_SHEET:
- ./tmp/scripts/bash_scripts/nucleiclassifyextractforloop.sh corresponds to /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_Proton_filemap.tsv, whose semantic segmentation was run with /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/run_semantic_seg_for_loop.sh
- ./tmp/scripts/bash_scripts/nucleiclassifyextractforloop2.sh corresponds to /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_to_be_rerun.tsv, whose semantic segmentation was run with /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/run_semantic_seg_for_loop2.sh
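The job-array and for-loop variants differ only in which SAMPLE_SHEET rows a run processes. A minimal sketch of a single entry point that covers both cases, assuming a tab-separated sheet and a job array submitted with `--array` starting at 1: Slurm sets `SLURM_ARRAY_TASK_ID` for each array task, and when it is absent (as on Proton) every row is processed in turn. `process_row` is a hypothetical placeholder, not a function from this repository.

```python
import csv
import os
import sys


def read_sample_sheet(path):
    """Read a tab-separated SAMPLE_SHEET / filemap into a list of row dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f, delimiter="\t"))


def rows_for_this_run(rows, env=None, first_task_id=1):
    """Under a Slurm job array, return only this task's row; otherwise all rows.

    first_task_id should match the lower bound passed to sbatch --array (assumed 1).
    """
    env = os.environ if env is None else env
    task_id = env.get("SLURM_ARRAY_TASK_ID")
    if task_id is None:  # no job array (e.g. on Proton): loop over the whole sheet
        return list(rows)
    return [rows[int(task_id) - first_task_id]]


def process_row(row):
    """Hypothetical placeholder for classifying/extracting the nuclei of one slide."""
    print("processing", row)


if __name__ == "__main__" and len(sys.argv) > 1:
    for row in rows_for_this_run(read_sample_sheet(sys.argv[1])):
        process_row(row)
```

Keeping the row selection in one function means the same script runs unchanged under sbatch on Narval and as a plain loop on Proton.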
Manuscript methodology figure visuals
Note: the following scripts generate each component of the manuscript methodology figure so that it is reproducible. For the semantic segmentation step, we first randomly generated 5 patches (experiments) of the semantic segmentation of 4k by 4k patches of a WSI whose names match the cescpolygon csv file names. We then visually selected representative patches covering a variety of tissue classes and coverage (for example, we did not select patches with a lot of white background). Each bash script can either perform the random patch selection visualization or use a manually defined XSTART, YSTART, and PATCHSIZE. Once the patches to show in the manuscript were picked, the corresponding nucleiclassify_overlay and nuclei binary segmentation visuals were generated for them.
- semanticseg: /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/ManuscriptVisualizations/semanticsegvisualprogressbar.sh
- semanticseg + cescpolygon nuclei binary seg (nucleiclassifyoverlay): /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/ManuscriptVisualizations/nucleioverlayvisual.sh
- nuclei binary segmentation mask: based on /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/polygontomasks.py
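The patch choice described in the note (random draw or manual XSTART/YSTART/PATCHSIZE) can be made reproducible by seeding the random generator. A minimal sketch, assuming the candidate tiles are given as (x, y) top-left corners of 4k by 4k tiles that have a matching polygon .csv file; how those corners are parsed from the file names is not shown, and `pick_patches` is a hypothetical name. The visual screening for white background stays a manual step.

```python
import random


def pick_patches(tile_origins, n=5, seed=0, patch_size=4096, x_start=None, y_start=None):
    """Choose patches to visualize as (x_start, y_start, patch_size) triples.

    tile_origins: (x, y) top-left corners of tiles with a matching polygon .csv file.
    If x_start and y_start are both given, that single manual patch is returned;
    otherwise n tiles are drawn at random with a fixed seed, so reruns give the same draw.
    """
    if x_start is not None and y_start is not None:
        return [(x_start, y_start, patch_size)]
    rng = random.Random(seed)  # local generator: does not disturb global random state
    chosen = rng.sample(sorted(tile_origins), k=min(n, len(tile_origins)))
    return [(x, y, patch_size) for x, y in chosen]
```

Sorting the origins before sampling makes the draw depend only on the seed, not on the order in which the file names were listed.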
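The nuclei binary segmentation mask comes from rasterizing nucleus polygons onto the patch grid. The sketch below is a dependency-free illustration of that idea using the even-odd (ray casting) point-in-polygon rule on pixel centres; it is not the contents of polygontomasks.py. It assumes the polygons are already parsed from the .csv into lists of (x, y) vertices in slide coordinates and that (x_offset, y_offset) is the patch's top-left corner; function names are hypothetical.

```python
import math


def point_in_polygon(x, y, polygon):
    """Even-odd (ray casting) rule: is point (x, y) inside polygon [(x0, y0), ...]?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray at height y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygons_to_mask(polygons, width, height, x_offset=0, y_offset=0):
    """Rasterize nucleus polygons (slide coordinates) into a height x width 0/1 mask.

    A pixel is set when its centre lies inside any polygon.
    """
    mask = [[0] * width for _ in range(height)]
    for polygon in polygons:
        local = [(px - x_offset, py - y_offset) for px, py in polygon]
        xs = [p[0] for p in local]
        ys = [p[1] for p in local]
        # Only test pixels inside the polygon's bounding box, clipped to the patch.
        for row in range(max(0, math.floor(min(ys))), min(height, math.ceil(max(ys)) + 1)):
            for col in range(max(0, math.floor(min(xs))), min(width, math.ceil(max(xs)) + 1)):
                if point_in_polygon(col + 0.5, row + 0.5, local):
                    mask[row][col] = 1
    return mask
```

For full 4k by 4k patches an image library's polygon fill would be much faster; the pure-Python version is only meant to make the pixel-centre rule explicit.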
```bibtex
@software{ZouTumornuclearsize2024,
  author = {Zou, Yujing and Glickman, Harry and Pelmus, Manuela and Maleki, Farhad and Bahoric, Boris and Lecavalier-Barsoum, Magali and Enger, Shirin A.},
  doi = {XXX},
  month = dec,
  title = {{Tumor nuclear size as a biomarker for post-radiotherapy progression and survival in gynecological malignancies: development of a multivariable prediction model}},
  url = {https://github.com/engerlab/segmentor},
  version = {1.0.0},
  year = {2024}
}
```
Owner
- Name: Yujing Zou
- Login: yujing1997
- Kind: user
- Company: McGill University
- Twitter: yujingzou
- Repositories: 1
- Profile: https://github.com/yujing1997
McGill Medical Physics Ph.D. candidate; M.Sc.: Medical Radiation Physics, B.Sc.: Mathematics and Physiology joint major, and minor in Physics.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software or research work, please consider citing it as below."
authors:
  - family-names: "Zou"
    given-names: "Yujing"
  - family-names: "Glickman"
    given-names: "Harry"
  - family-names: "Pelmus"
    given-names: "Manuela"
  - family-names: "Maleki"
    given-names: "Farhad"
  - family-names: "Bahoric"
    given-names: "Boris"
  - family-names: "Lecavalier-Barsoum"
    given-names: "Magali"
  - family-names: "Enger"
    given-names: "Shirin A."
title: "Tumor nuclear size as a biomarker for post-radiotherapy progression and survival in gynecological malignancies: development of a multivariable prediction model"
version: 1.0.0
date-released: 2024-12-17
doi: "XXX"
url: "https://github.com/engerlab/segmentor"
GitHub Events
Total
- Delete event: 2
- Push event: 3
- Public event: 1
- Create event: 2
Last Year
- Delete event: 2
- Push event: 3
- Public event: 1
- Create event: 2