
Repository

Basic Info
  • Host: GitHub
  • Owner: yujing1997
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 7.53 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 2 years ago · Last pushed 12 months ago

README.md

Tumor nuclear size as a biomarker for post-radiotherapy progression and survival in gynecological malignancies


Semantic Segmentation (TiaToolBox)

How to install gdc-client for downloading TCGA data

  • conda create -n gdc-client-env python=3.8 -y
  • conda activate gdc-client-env
  • wget https://gdc.cancer.gov/system/files/public/file/gdc-client_2.3_Ubuntu_x64-py3.8-ubuntu-20.04.zip
  • unzip /home/yujing/gdc-client_2.3_Ubuntu_x64.zip
  • ./gdc-client --version: gdc-client should now be available as an executable in the current directory
  • mv gdc-client /home/yujing/miniconda3/envs/gdc-client-env/bin/
  • gdc-client --version

Example: downloading TCGA data

  • conda activate gdc-client-env
  • manifest file path: /home/yujing/dockhome/Multimodality/Segment/tmp/manifest/gdc_manifest.2024-11-03.txt
  • cd to the directory where you'd like the data to be downloaded
  • gdc-client download -m /home/yujing/dockhome/Multimodality/Segment/tmp/manifest/gdc_manifest.2024-11-03.txt
  • download the TCGA-CESC data
    • cd /Data/Yujing/Segment/tmp/tcga_cesc_svs
    • gdc-client download -m /Data/Yujing/Segment/tmp/tcga_cesc_manifest/gdc_manifest.2024-11-11.txt
    • note that /Data/Yujing/Segment/tmp/tcga_cesc_manifest/gdc_manifest.2024-11-11_v2.txt is the same manifest as /Data/Yujing/Segment/tmp/tcga_cesc_manifest/gdc_manifest.2024-11-11.txt, except that it excludes the cases already downloaded and listed in /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_Cedar_HG_filemap.tsv; use it to resume the download
    • gdc-client download -m /Data/Yujing/Segment/tmp/tcga_cesc_manifest/gdc_manifest.2024-11-11_v2.txt
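The `_v2` manifest described above (the original manifest minus the files already downloaded) could be produced along these lines. This is a minimal sketch, not the script used in this repository: `filter_manifest` is a hypothetical helper, and it assumes the manifest is the usual tab-separated GDC manifest with an `id` column and that the IDs of the already-downloaded files have been collected into a set (how they are stored in the filemap .tsv is not specified here).

```python
import csv
import io


def filter_manifest(manifest_text, downloaded_ids):
    """Return the manifest text without the rows whose 'id' was already downloaded.

    manifest_text: contents of a tab-separated manifest with an 'id' column (assumed).
    downloaded_ids: set of file IDs to skip (assumed to be collected beforehand).
    """
    reader = csv.DictReader(io.StringIO(manifest_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        if row["id"] not in downloaded_ids:
            writer.writerow(row)  # keep only files that still need downloading
    return out.getvalue()


if __name__ == "__main__":
    # Toy manifest with made-up IDs, for illustration only.
    toy = (
        "id\tfilename\tmd5\tsize\tstate\n"
        "a1\tslide_1.svs\tm1\t10\treleased\n"
        "b2\tslide_2.svs\tm2\t20\treleased\n"
    )
    print(filter_manifest(toy, {"a1"}))
```

The filtered text can then be written to a new manifest file and passed to `gdc-client download -m`.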

Semantic segmentation

  • on test slides ./dockhome/Multimodality/Segment/tmp/blca_svs/30e4624b-6f48-429b-b1d9-6a6bc5c82c5e/TCGA-2F-A9KO-01Z-00-DX1.195576CF-B739-4BD9-B15B-4A70AE287D3E.svs
  • scripts:
    • dockhome/Multimodality/Segment/tmp/scripts/semantic_segmentation.py
    • dockhome/Multimodality/Segment/tmp/scripts/visualize_semantic_segmentation.py
      • ./visualize_semantic_segmentation.py visualizes the segmentation outputs
      • The output for each WSI has 5 channels of probability maps, one per class: label_dict = {"Tumour": 0, "Stroma": 1, "Inflammatory": 2, "Necrosis": 3, "Others": 4}
  1. Map the semantic segmentation output, pixel by pixel, to the Pan-Cancer-Nuclei-Seg .csv files (4k by 4k patches)
  2. Merge the semantic segmentation output with the Pan-Cancer-Nuclei-Seg .csv files (4k by 4k patches)
  3. Once every pixel has a class, take a majority vote over the pixels within each QAed segmented nucleus to assign the nucleus class
    • PixelInAreas is already reported in the Pan-Cancer-Nuclei-Seg .csv files
    • Output for each patch: a PixelInAreas vector for each class
    • Output for each WSI: the concatenation of the PixelInAreas vectors for each class from each patch
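The majority-vote step in the list above can be sketched as follows. This is an illustration under stated assumptions, not the code in nuclei_classify.py: `classify_nucleus` is a hypothetical function, the probability maps are assumed to be a NumPy array of shape (H, W, 5) whose channel order follows `label_dict`, and each nucleus is assumed to be available as a boolean mask on the same pixel grid.

```python
import numpy as np

# Channel order taken from label_dict in this README.
LABELS = {"Tumour": 0, "Stroma": 1, "Inflammatory": 2, "Necrosis": 3, "Others": 4}


def classify_nucleus(prob_maps, nucleus_mask):
    """Assign a class to one nucleus by majority vote over its pixels.

    prob_maps: float array of shape (H, W, 5), per-class probabilities (assumed layout).
    nucleus_mask: bool array of shape (H, W), True inside the nucleus (assumed input).
    Returns (class_id, per_class_pixel_counts); class_id is None for an empty mask.
    """
    label_map = prob_maps.argmax(axis=-1)  # most probable class for every pixel
    counts = np.bincount(label_map[nucleus_mask], minlength=len(LABELS))
    if counts.sum() == 0:
        return None, counts
    # Ties resolve to the lowest class index because argmax returns the first maximum.
    return int(counts.argmax()), counts


if __name__ == "__main__":
    probs = np.zeros((4, 4, 5))
    probs[:, :2, LABELS["Tumour"]] = 1.0  # left half: tumour
    probs[:, 2:, LABELS["Stroma"]] = 1.0  # right half: stroma
    mask = np.zeros((4, 4), dtype=bool)
    mask[0, 0] = mask[1, 0] = mask[1, 1] = mask[1, 2] = True
    print(classify_nucleus(probs, mask))
```

The per-class pixel counts returned alongside the class play the role of a per-class pixel vector for one nucleus; stacking them per patch and then per WSI would follow the output structure listed above.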

nuclei_classify.py classifies individual segmented nuclei into semantic classes based on their overlap with the semantic segmentation output

  • Per-patch analysis is available (~16 seconds per patch).
  • Now testing on all patches in a folder. To do: optimize speed
    • Test folder path:
  • Need to match file names

Is it possible to directly mount, read, and write files from ssh Narval to this local machine?

sshfs your_username@narval.computecanada.ca:/remote/path/to/folder /local/mount/point
sshfs yujingz@narval.computecanada.ca:/home/yujingz/scratch/NUCLEISIZECODE/Pan-Cancer-Nuclei-Seg/ScientificData/svs-files/tcgacesc /Data/Yujing/Segment/tmp/tcga_cesc

Partitioning the semantic segmentation task on multiple compute clusters

  • https://docs.google.com/spreadsheets/d/1RuG-e45JQRM5R15ijCUWbn3fyhfOJsIG9Im8_PFep5Q/edit?usp=sharing
    • shows this partitioning of the task
    • Proton GPU (local), NarvalYZ, NarvalHG, CedarYZ, CedarHG, BelugaYZ, BelugaHG
    • Estimate how long each one takes
    • see the Narval README.md for more details
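One way such a partition could be drawn up is a greedy assignment that always hands the next-largest slide to the least-loaded machine. This is only a sketch of a generic load-balancing heuristic, not the procedure behind the spreadsheet above: `partition_slides` is a hypothetical helper, and using file size as a proxy for processing time is an assumption.

```python
import heapq


def partition_slides(slide_sizes, workers):
    """Greedily assign slides to workers, largest slide first.

    slide_sizes: dict mapping slide name -> estimated cost (e.g. file size; assumed proxy).
    workers: list of worker names (e.g. the clusters listed above).
    Returns a dict mapping worker name -> list of assigned slide names.
    """
    heap = [(0, w) for w in workers]  # (current load, worker name)
    heapq.heapify(heap)
    assignment = {w: [] for w in workers}
    # Sort by decreasing cost; break ties by name so the result is deterministic.
    for name, size in sorted(slide_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, worker = heapq.heappop(heap)  # least-loaded worker so far
        assignment[worker].append(name)
        heapq.heappush(heap, (load + size, worker))
    return assignment


if __name__ == "__main__":
    sizes = {"a.svs": 5, "b.svs": 4, "c.svs": 3, "d.svs": 2}  # toy numbers
    print(partition_slides(sizes, ["Proton", "NarvalYZ"]))
```

Each worker's list could then be written out as its own filemap .tsv.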

Ran parts of tcga_cesc on Proton, the rest on Narval

  • /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/run_semantic_seg_for_loop.sh
    • ran for filemap: /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_Proton_filemap.tsv
  • /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/run_semantic_seg_for_loop2.sh
    • ran for filemap: /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_to_be_rerun.tsv

Extract or classify nuclei from the semantic segmentation of WSIs

  • /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/nucleiclassifyextractjobarraysnarval.sh
    • ran on Narval for the segmented WSIs that were processed and stored in the rrg project folder there.
    • It is a job array and is therefore submitted with sbatch script.sh. It creates x tasks, each of which corresponds to one row of the SAMPLE_SHEET (the filemap defined in the script).
    • Here on Proton we can't create job arrays, so the scripts need to be rewritten as for loops in which each iteration processes one row of the given SAMPLE_SHEET:
      • ./tmp/scripts/bash_scripts/nucleiclassifyextractforloop.sh corresponds to /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_Proton_filemap.tsv, whose semantic segmentation was run from /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/run_semantic_seg_for_loop.sh
      • ./tmp/scripts/bash_scripts/nucleiclassifyextractforloop2.sh corresponds to /Data/Yujing/Segment/tmp/tcga_cesc_manifest/run_partition/run_to_be_rerun.tsv, whose semantic segmentation was run from /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/run_semantic_seg_for_loop2.sh
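The difference between the job-array and for-loop variants comes down to which rows of the sample sheet a given run handles. The sketch below illustrates that selection in Python rather than bash; it is not taken from the scripts above. `rows_to_process` is a hypothetical helper, and it assumes a tab-separated sample sheet without a header row and a job array submitted with 1-based indices. `SLURM_ARRAY_TASK_ID` is the environment variable Slurm sets for each array task.

```python
import csv
import os


def rows_to_process(sample_sheet_path, env=os.environ):
    """Return the sample-sheet rows this process should handle.

    Under a Slurm job array, only the row matching SLURM_ARRAY_TASK_ID is returned
    (assuming 1-based array indices and no header row). Without the variable, e.g.
    on a machine with no scheduler, every row is returned so the caller can loop.
    """
    with open(sample_sheet_path, newline="") as f:
        rows = list(csv.reader(f, delimiter="\t"))
    task_id = env.get("SLURM_ARRAY_TASK_ID")
    if task_id is None:
        return rows
    return [rows[int(task_id) - 1]]


if __name__ == "__main__":
    import tempfile

    # Toy sample sheet with made-up entries, for illustration only.
    with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False) as tmp:
        tmp.write("slide_1.svs\tout_1\nslide_2.svs\tout_2\n")
    for row in rows_to_process(tmp.name):
        print("would process:", row)
    os.remove(tmp.name)
```

With this shape, the same per-row processing code can serve both the cluster and the local machine.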

Manuscript Methodology Figure visuals

Note: the following scripts generate each component of the manuscript methodology figure so that the figure is reproducible. For the semanticseg step, we first randomly generated 5 patches (experiments) of the semantic segmentation, each a 4k by 4k patch of a WSI that matches a cescpolygon csv file name. We visually selected representative patches covering a variety of tissue classes and coverage (for example, we did not select patches with a lot of white background). Each bash script can either visualize randomly selected patches or use a manually defined XSTART, YSTART, and PATCHSIZE. Once the patches to show in the manuscript were picked, the nucleiclassify_overlay and the nuclei binary segmentation were generated for the same patches.

  • semanticseg: /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/ManuscriptVisualizations/semanticsegvisualprogressbar.sh
  • semanticseg + cescpolygon nuclei binary seg nucleiclassifyoverlay: /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/bash_scripts/ManuscriptVisualizations/nucleioverlayvisual.sh
  • nuclei binary segmentation mask: based on /home/yujing/dockhome/Multimodality/Segment/tmp/scripts/polygontomasks.py
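The random patch selection described in the note above could look roughly like the following. This is a sketch under assumptions, not the logic of the bash scripts: `random_patch_origins` is a hypothetical function, "4k" is taken to mean a patch size of 4000 pixels, and candidate patches are assumed to lie on a regular grid, whereas the actual scripts restrict candidates to patches with a matching polygon csv file.

```python
import random


def random_patch_origins(slide_width, slide_height, patch_size=4000, n=5, seed=0):
    """Sample up to n distinct patch origins (x_start, y_start) on a regular grid.

    patch_size=4000 is an assumption for "4k by 4k"; a fixed seed keeps the
    selection reproducible between runs.
    """
    xs = range(0, slide_width - patch_size + 1, patch_size)
    ys = range(0, slide_height - patch_size + 1, patch_size)
    grid = [(x, y) for y in ys for x in xs]  # every full patch that fits in the slide
    rng = random.Random(seed)
    return rng.sample(grid, min(n, len(grid)))


if __name__ == "__main__":
    # Toy slide dimensions, for illustration only.
    for x_start, y_start in random_patch_origins(20000, 12000):
        print(f"x_start={x_start} y_start={y_start} patch_size=4000")
```

A manually chosen origin simply bypasses the sampling and supplies the start coordinates and patch size directly.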

```bibtex
@software{ZouTumornuclearsize2024,
  author = {Zou, Yujing and Glickman, Harry and Pelmus, Manuela and Maleki, Farhad and Bahoric, Boris and Lecavalier-Barsoum, Magali and Enger, Shirin A.},
  doi = {XXX},
  month = dec,
  title = {{Tumor nuclear size as a biomarker for post-radiotherapy progression and survival in gynecological malignancies: development of a multivariable prediction model}},
  url = {https://github.com/engerlab/segmentor},
  version = {1.0.0},
  year = {2024}
}
```

Owner

  • Name: Yujing Zou
  • Login: yujing1997
  • Kind: user
  • Company: McGill University

McGill Medical Physics Ph.D. candidate; M.Sc.: Medical Radiation Physics, B.Sc.: Mathematics and Physiology joint major, and minor in Physics.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software or research work, please consider citing it as below."
authors:
- family-names: "Zou"
  given-names: "Yujing"
- family-names: "Glickman"
  given-names: "Harry"
- family-names: "Pelmus"
  given-names: "Manuela"
- family-names: "Maleki"
  given-names: "Farhad"
- family-names: "Bahoric"
  given-names: "Boris"
- family-names: "Lecavalier-Barsoum"
  given-names: "Magali"
- family-names: "Enger"
  given-names: "Shirin A."
title: "Tumor nuclear size as a biomarker for post-radiotherapy progression and survival in gynecological malignancies: development of a multivariable prediction model"
version: 1.0.0
date-released: 2024-12-17
doi: "XXX"
url: "https://github.com/engerlab/segmentor"

GitHub Events

Total
  • Delete event: 2
  • Push event: 3
  • Public event: 1
  • Create event: 2
Last Year
  • Delete event: 2
  • Push event: 3
  • Public event: 1
  • Create event: 2