https://github.com/broadinstitute/import-service
Terra Import Service
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.8%) to scientific vocabulary
Repository
Terra Import Service
Basic Info
- Host: GitHub
- Owner: broadinstitute
- License: bsd-3-clause
- Language: Python
- Default Branch: develop
- Size: 523 KB
Statistics
- Stars: 3
- Watchers: 37
- Forks: 1
- Open Issues: 0
- Releases: 38
Metadata Files
README.md
| :warning: WARNING |
|:----------------------------|
| Import Service is obsolete and superseded by cWDS. |
import-service
Terra Import Service. Tech doc here.
A walkthrough of the code in this repo is available at WALKTHROUGH.md.
Developer notes
First time setup
Python version 3.9 is required for import-service to run properly.
Create and activate the Python virtual environment:
$ python3 -m venv venv
$ source venv/bin/activate
(venv) $ poetry install
Periodically re-run poetry install inside your venv to keep it up to date with dependency changes.
Troubleshooting first time setup
If you have problems running poetry install for the first time and encounter an error like:
The license_file parameter is deprecated, use license_files instead.
This is caused by an incompatibility between pyyaml 5.4.1 and Cython 3+, which you can work around by running the following from your venv:
pip install wheel
pip install "cython<3.0.0" && pip install --no-build-isolation pyyaml==5.4.1
(src for workaround)
Normal usage
Activate and deactivate the venv:
$ source venv/bin/activate
<do all your work here>
(venv) $ deactivate
To run tests:
(venv) $ poetry run pytest
To run the type linter, go to the repo root directory and run:
(venv) $ poetry run mypy ./*.py && poetry run mypy -p app
You should make mypy happy before opening a PR. Note that errors in some modules will be listed twice. This is annoying, but the good news is that you only have to fix them once.
If you'd like to run import-service locally, the following steps should help. From the root of the repository, build the Docker image, then look up the IMAGE ID of your new image and start a container from it:
```
docker build . -t <image-tag>
docker images
docker run <image-id>
```
Dependency Management
This repo uses Poetry for dependency management, but it deploys to App Engine, which requires a requirements.txt file. Therefore,
whenever you change dependencies, you must update both requirements.txt AND Poetry's poetry.lock/pyproject.toml.
To regenerate requirements.txt from Poetry:
poetry export -f requirements.txt -o requirements.txt --without-hashes
For other poetry commands, such as to add a new dependency or update an existing dependency, see https://python-poetry.org/docs/cli/.
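Because the two dependency lists can silently drift apart, a small check can help before deploying. The sketch below is a hypothetical helper, not a script that exists in this repo: it assumes `poetry export` (without `-o`) prints the requirements to stdout, and it compares only `name==version` pins, ignoring comments, pip options and environment markers.

```python
import re
import subprocess
from pathlib import Path


def normalize(name: str) -> str:
    """Normalize a distribution name (PEP 503) so 'Flask' and 'flask' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_pins(text: str) -> dict:
    """Map normalized package name -> pinned version for every 'name==version' line."""
    pins = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue  # skip blanks, comments and pip options such as --index-url
        requirement = line.split(";", 1)[0].strip()  # drop environment markers
        if "==" not in requirement:
            continue
        name, version = requirement.split("==", 1)
        pins[normalize(name.split("[", 1)[0].strip())] = version.strip()
    return pins


def diff_pins(expected: dict, actual: dict) -> list:
    """Describe every package whose pin differs (or is missing) between the two mappings."""
    problems = []
    for name in sorted(expected.keys() | actual.keys()):
        want, have = expected.get(name), actual.get(name)
        if want != have:
            problems.append(f"{name}: poetry={want} requirements.txt={have}")
    return problems


def check_requirements(path: Path = Path("requirements.txt")) -> list:
    """Compare requirements.txt against a fresh 'poetry export' captured from stdout."""
    exported = subprocess.run(
        ["poetry", "export", "-f", "requirements.txt", "--without-hashes"],
        check=True, capture_output=True, text=True,
    ).stdout
    return diff_pins(parse_pins(exported), parse_pins(path.read_text()))

# Example (from the repo root, inside the venv):
#   problems = check_requirements()
#   print("\n".join(problems) or "requirements.txt is in sync")
```

An empty result means every pinned version matches; any listed package means the export command above should be re-run.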
Deployment (for Broad only)
Deployments to non-production and production environments are performed in Jenkins. In order to access Jenkins, you will need to be on the Broad network or logged on to the Broad VPN.
Deploy to the "dev" environment
A deployment to the dev environment is triggered automatically every time there is a commit or push to the develop branch on GitHub. If you would like to deploy a different branch or tag to the dev environment, you can do so by following the instructions below, but be aware that a new deployment of the develop branch will be triggered whenever anyone commits or pushes to that branch.
Deploy to non-production environments
- Log in to Jenkins
- Navigate to the import-service-manual-deploy job
- In the left menu, click Build with Parameters and select the BRANCH_OR_TAG that you want to deploy and the TARGET environment to which you want to deploy, and enter the SLACK_CHANNEL in which you would like to receive notifications of the deploy job's success/failure
- Click the Build button
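The same job can in principle be triggered through Jenkins' remote access API, which accepts a POST to a job's buildWithParameters endpoint authenticated with a user API token. The sketch below is an assumption-laden illustration: the Jenkins base URL, user and token are hypothetical placeholders, the job is assumed to be top-level (not inside a folder), and the parameter names are taken from the form described above (the release steps later call them TAG and TIER, so check the job's actual parameter names first). You still need to be on the Broad network or VPN.

```python
import base64
import urllib.parse
import urllib.request


def build_trigger_url(jenkins_url: str, job: str, params: dict) -> str:
    """Build a Jenkins 'buildWithParameters' URL for a top-level job."""
    job_path = urllib.parse.quote(job, safe="")
    query = urllib.parse.urlencode(params)  # e.g. "#" in a Slack channel becomes "%23"
    return f"{jenkins_url.rstrip('/')}/job/{job_path}/buildWithParameters?{query}"


def trigger_build(url: str, user: str, api_token: str) -> int:
    """POST to the trigger URL using HTTP basic auth with a Jenkins API token."""
    credentials = base64.b64encode(f"{user}:{api_token}".encode()).decode()
    request = urllib.request.Request(
        url, method="POST", headers={"Authorization": f"Basic {credentials}"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status  # HTTP status returned by Jenkins when the build is queued

# Example with hypothetical values:
#   url = build_trigger_url("https://jenkins.example.org", "import-service-manual-deploy",
#                           {"BRANCH_OR_TAG": "develop", "TARGET": "alpha", "SLACK_CHANNEL": "#my-deploys"})
#   trigger_build(url, "my-user", "my-api-token")
```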
Production Deployment Checklist
When doing a production deployment, each step of the checklist must be performed.
Production Deployment Preparation
[ ] Double-check that requirements.txt is up to date with poetry; see Dependency Management.
[ ] Create and push a new version tag for the commit you want to deploy; typically this will be the head of the develop branch. Go to Releases and select 'Draft a new Release'. Create a release with a new tag. Ensure that the tag is incremented properly based on the last released version.
[ ] Create a ticket for the release and be sure to leave the 'Fix Version' field blank. Add a checklist to the ticket and select 'Load Templates' from the ... menu to the right of the checklist. Use 'Import Service Release Checklist'. You may refer to (or clone) a previous release ticket for an example. This ticket ensures that the release is recorded for compliance, and that any release notes are picked up to be published. It also helps to keep track of the steps along the way, outlined in the next section.
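Incrementing the tag "properly" mostly means comparing versions numerically rather than as strings (0.10.0 is newer than 0.9.0). The sketch below assumes, without confirmation from this README, that release tags follow a MAJOR.MINOR.PATCH pattern with an optional "v" prefix; the tag list could come from the output of git tag. The function names are hypothetical.

```python
import re

# Assumption: release tags look like MAJOR.MINOR.PATCH, optionally prefixed with "v".
TAG_PATTERN = re.compile(r"^(v?)(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> tuple:
    """Split a tag into its prefix and a numeric (major, minor, patch) tuple."""
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    prefix, major, minor, patch = match.groups()
    return prefix, (int(major), int(minor), int(patch))


def latest_tag(tags) -> str:
    """Return the highest version tag, comparing numerically (so 0.10.0 > 0.9.0)."""
    versioned = [t for t in tags if TAG_PATTERN.match(t)]
    if not versioned:
        raise ValueError("no version tags found")
    return max(versioned, key=lambda t: parse_tag(t)[1])


def bump(tag: str, part: str = "patch") -> str:
    """Return the next tag after `tag`, incrementing the requested part."""
    prefix, (major, minor, patch) = parse_tag(tag)
    if part == "major":
        return f"{prefix}{major + 1}.0.0"
    if part == "minor":
        return f"{prefix}{major}.{minor + 1}.0"
    if part == "patch":
        return f"{prefix}{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")
```

For example, bump(latest_tag(tags)) proposes the next patch release; choose "minor" or "major" when the release warrants it.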
Deploy and Test
You must deploy to each tier one by one and manually test each tier after you deploy to it. This test should consist of uploading a large-ish TSV (~2MB should suffice) to a GCP workspace and confirming that it is uploaded asynchronously, as well as testing any specific changes made in the release. You may also refer to this document, although it is now somewhat out of date. Your deployment to a tier should not be considered complete until you have successfully executed each step of the manual test on that tier. Mark each step complete on the release ticket created above.
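If you do not have a suitable file at hand, a synthetic one can be generated. This is a minimal sketch under stated assumptions: it assumes the Terra convention that an entity TSV's first column header is entity:<type>_id, and the entity type, column names and values are made-up test data, not anything this service requires.

```python
from pathlib import Path


def build_test_tsv(target_bytes: int = 2_000_000, entity_type: str = "sample") -> str:
    """Return synthetic TSV text of at least target_bytes, one entity per row."""
    # Assumption: Terra-style entity TSVs name their first column "entity:<type>_id".
    header = f"entity:{entity_type}_id\tdescription\tvalue"
    lines = [header]
    size = len(header) + 1  # +1 for the newline; ASCII, so characters == bytes
    row = 0
    while size < target_bytes:
        row += 1
        line = f"{entity_type}_{row:07d}\tsynthetic row {row} for import testing\t{row * 7}"
        lines.append(line)
        size += len(line) + 1
    return "\n".join(lines) + "\n"


def write_test_tsv(path: Path, target_bytes: int = 2_000_000) -> int:
    """Write the synthetic TSV to `path` and return its size in bytes."""
    text = build_test_tsv(target_bytes)
    path.write_text(text, encoding="ascii")
    return len(text)

# Example: write_test_tsv(Path("import-test.tsv")), then upload the file to a test workspace.
```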
To deploy the application code, navigate to the import-service-manual-deploy
job and click the "Build with Parameters" link. Select the TAG that you just created during the preparation steps and
the TIER to which you want to deploy:
- [ ] dev deploy job succeeded and manual test passed
  - (Technically, this same commit is probably already running on dev courtesy of the automatic dev deployment job. However, deploying again is an important step because someone else may have triggered a dev deployment, and we want to ensure that you understand the deployment process, that the deployment tools are working properly, and that everything is working as intended.)
- [ ] alpha deploy job succeeded and manual test passed
- [ ] staging deploy job succeeded and manual test passed
- [ ] prod deploy job succeeded and manual test passed
  - In order to deploy to prod, you must be on the DSP Suitability Roster. You will need to log into the production Jenkins instance and use the "import-service-manual-deploy" job to release the same tag to production.
NOTE:
* It is important that you deploy to all tiers. Because Import Service is an "indie service", we should strive to make sure
that all tiers other than dev are kept in sync and are running the same version of the code. This is essential so that,
as other DSP services are tested during their release process, they can ensure that their code will work properly with
the latest version of Import Service running in prod.
Deployment Maintenance & Cleanup
Import Service is a Google App Engine (GAE) application. Versions of import-service are deployed to GAE either via a merge into the develop branch (for dev apps),
or via Jenkins for all other environments. See here for specific details.
GAE allows a maximum of 210 versions of any app, so we clean up old versions as
new versions are deployed.
DSP sets the number of versions to retain at 50 (an arbitrary number) to ensure that plenty of versions remain available for rollback if a bug is introduced.
In cleanup_scripts:
The delete-old-app-engine-versions-cleanup bash script deletes old versions, oldest first, and always leaves at least 50 versions in place after cleanup.
For environments other than prod (such as alpha or staging), the script must be run manually. This ensures that deleting versions is an intentional action by an authenticated user.
For prod, we suggest performing the deletions deliberately in the Google App Engine console.
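The retention rule above (delete oldest first, keep at least 50) can be expressed as a small selection function. This is a hypothetical Python illustration of the rule, not a port of the bash script, whose exact logic is not shown here; the AppVersion type and the serving parameter (versions still receiving traffic, which are skipped) are assumptions added for the example.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class AppVersion(NamedTuple):
    version_id: str
    created: datetime


def versions_to_delete(
    versions: Iterable[AppVersion], keep: int = 50, serving: frozenset = frozenset()
) -> List[str]:
    """Return IDs to delete, oldest first, so that at least `keep` versions always remain."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    oldest_first = sorted(versions, key=lambda v: v.created)
    excess = max(len(oldest_first) - keep, 0)
    # Only the oldest `excess` versions are candidates; skipping versions that still
    # serve traffic can only leave more than `keep` versions behind, never fewer.
    return [v.version_id for v in oldest_first[:excess] if v.version_id not in serving]
```

With 55 deployed versions and the default keep=50, only the five oldest are selected; with 50 or fewer, nothing is deleted.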
Owner
- Name: Broad Institute
- Login: broadinstitute
- Kind: organization
- Location: Cambridge, MA
- Website: http://www.broadinstitute.org/
- Twitter: broadinstitute
- Repositories: 1,083
- Profile: https://github.com/broadinstitute
Broad Institute of MIT and Harvard
Dependencies
- Flask ==2.0.2
- MarkupSafe ==2.0.1
- flask-restx ==0.5.1
- gcsfs ==2021.11.1
- google-api-python-client ==2.33.0
- google-auth ==2.3.3
- google-cloud-logging ==3.0.0
- google-cloud-pubsub ==2.9.0
- jsonschema ==4.2.1
- memunit ==0.5.2
- mypy ==0.910
- pandas ==1.3.5
- pyarrow ==6.0.1
- pydantic ==1.8.2
- pyhumps ==3.0.2
- pymysql ==1.0.2
- pypfb ==0.5.0
- pytest ==6.2.5
- requests ==2.26.0
- setuptools ==59.5.0
- sqlalchemy ==1.4.28
- sqlalchemy-repr ==0.0.2
- sqlalchemy-stubs ==0.4
- types-requests ==2.26.1
- werkzeug ==2.0.3
- wheel ==0.37.0
- python 3.9.7 build
- us.gcr.io/broad-dsp-gcr-public/base/python 3.9-debian build