uniprot.ws

R Interface to UniProt Web Services

https://github.com/bioconductor/uniprot.ws

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    4 of 21 committers (19.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.6%) to scientific vocabulary

Keywords

bioconductor-package core-package

Keywords from Contributors

bioconductor bioinformatics hdf5 rhdf5 pathway-analysis mirror image-analysis rnaseq derfinder similarity-measurement
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 7
  • Watchers: 7
  • Forks: 7
  • Open Issues: 0
  • Releases: 0
Topics
bioconductor-package core-package
Created over 8 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog

README.md

UniProt.ws

    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install("UniProt.ws")

Configuring UniProt.ws

The UniProt.ws package provides a select interface to the UniProt web service.

    suppressPackageStartupMessages({
        library(UniProt.ws)
    })
    up <- UniProt.ws(taxId = 9606)

If you already know about the select interface, you can immediately learn about the various methods for this object by looking at its help page.

    help("UniProt.ws")

Calling the UniProt.ws constructor, as above, creates a UniProt.ws object. Printing the object shows some helpful information about it.

``` r
up
#> UniProt.ws interface object:
#> Taxonomy ID: 9606
#> Species name: Homo sapiens (Human)
#> List species with 'availableUniprotSpecies()'
```

By default, the UniProt.ws object is set to retrieve records from Homo sapiens, but you can change that. To do so, first look up the taxonomy ID for the species you are interested in. UniProt supports over 20 thousand species, so there are a few to choose from! To make this easier, the helper function availableUniprotSpecies lists all supported species along with their taxonomy IDs. When you call availableUniprotSpecies, it is recommended that you use the pattern argument to limit your query, like this:

``` r
availableUniprotSpecies(pattern = "musculus")
#>       kingdom Taxon Node Official (scientific) name
#> ANTMS       E     520121 Anthocoris musculus
#> ANTMU       E     208057 Anthoscopus musculus
#> APOMU       E     238007 Apomys musculus
#> BAIMU       E     213557 Baiomys musculus
#> BALMU       E       9771 Balaenoptera musculus
#> BLEMU       E     197864 Blepharisma musculus
#> MOUSE       E      10090 Mus musculus
#> MUSMB       E      35531 Mus musculus bactrianus
#> MUSMC       E      10091 Mus musculus castaneus
#> MUSMM       E      57486 Mus musculus molossinus
#> MUSMS       E     186842 Mus musculus x Mus spretus
#> MUSMX       E     477816 Mus musculus musculus x Mus musculus castaneus
#> POVM1       V    1891730 Mus musculus polyomavirus 1
```
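The taxonomy ID can also be read off the table programmatically rather than by eye. The sketch below is only an illustration. It rebuilds a few rows of the printed output by hand as a stand-in data frame, because the real call needs network access. It assumes the returned object is a data frame whose columns are named exactly as printed (`Taxon Node` and `Official (scientific) name`). The variable names `species`, `exact` and `mouse_tax_id` are hypothetical.

``` r
# Stand-in for availableUniprotSpecies(pattern = "musculus"):
# four rows copied by hand from the printed output above.
species <- data.frame(
  kingdom = c("E", "E", "E", "V"),
  `Taxon Node` = c(9771L, 10090L, 10091L, 1891730L),
  `Official (scientific) name` = c(
    "Balaenoptera musculus", "Mus musculus",
    "Mus musculus castaneus", "Mus musculus polyomavirus 1"
  ),
  row.names = c("BALMU", "MOUSE", "MUSMC", "POVM1"),
  check.names = FALSE,       # keep the column names with spaces intact
  stringsAsFactors = FALSE
)

# Keep only the exact species name, then read off its taxonomy ID.
exact <- species[species[["Official (scientific) name"]] == "Mus musculus", ]
mouse_tax_id <- exact[["Taxon Node"]]
mouse_tax_id
```

Matching on the exact scientific name avoids picking up subspecies and hybrids, which a pattern such as "musculus" also returns.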

Once you have learned the taxonomy ID for the species of interest, you can change the taxonomy ID of the UniProt.ws object using the taxId setter, or create a new object by calling the UniProt.ws constructor.

``` r
mouseUp <- UniProt.ws(10090)
mouseUp
#> UniProt.ws interface object:
#> Taxonomy ID: 10090
#> Species name: Mus musculus (Mouse)
#> List species with 'availableUniprotSpecies()'
```

As you can see, the new mouseUp object uses a different species.
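The setter route mentioned earlier is not demonstrated in this README. Here is a minimal sketch, assuming the setter follows the usual R replacement-function form `taxId(x) <- value` and that a matching `taxId(x)` getter exists. Both forms are assumptions here, so check `help("UniProt.ws")` for the actual signatures. Running it requires the package to be installed and the UniProt web service to be reachable.

``` r
library(UniProt.ws)

# Start from the human object, as earlier in this README.
up <- UniProt.ws(taxId = 9606)

# Assumed replacement-function form of the taxId setter:
# switch the existing object to Mus musculus.
taxId(up) <- 10090

# Assumed getter: report the taxonomy ID the object now uses.
taxId(up)
```

Compared with calling the constructor again, this form modifies the existing object instead of creating a second one, which may be convenient when only one species is needed at a time.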

Using UniProt.ws

Once you are satisfied that you have a UniProt.ws object that uses the appropriate organism, you can make use of the standard set of methods in a select interface. Specifically: columns, keytypes, keys and select.

You will probably notice that a large number of columns can be retrieved.

``` r
head(columns(up))
#> [1] "absorption"            "accession"
#> [3] "annotationscore"       "ccactivity_regulation"
#> [5] "ccallergen"            "ccalternative_products"
```

And most (but not all) of these fields can also be used as keytypes.

``` r
head(keytypes(up))
#> [1] "Allergome"     "ArachnoServer" "Araport"       "BioCyc"
#> [5] "BioGRID"       "BioMuta"
```
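Because the list of columns is long, it can help to check requested names against it before sending a query. The sketch below shows one way to do that in base R. The helper `unknown_columns` is hypothetical and not part of UniProt.ws. The `available` vector is a hand-copied stand-in containing only the six column names printed above, whereas in a live session it would be `columns(up)`.

``` r
# Report requested column names that are not in the available set.
unknown_columns <- function(requested, available) {
  setdiff(requested, available)
}

# Stand-in for columns(up): only the six names printed above, copied by hand.
available <- c(
  "absorption", "accession", "annotationscore",
  "ccactivity_regulation", "ccallergen", "ccalternative_products"
)

# "not_a_column" is a deliberately invalid, made-up name.
requested <- c("accession", "ccallergen", "not_a_column")
bad <- unknown_columns(requested, available)
bad
```

An empty result means every requested name is present. The same check works for keytypes by passing `keytypes(up)` as the available set.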

If necessary, you can also look up the keys of a given type. Be warned that the web service is slow at this particular kind of lookup, so if you really need it, save the result in your R session rather than repeating the query.

    egs <- keys(up, "GeneID")
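Since this lookup is slow, one option is to store the result on disk so that later sessions can reuse it. The sketch below is a generic base R pattern, not a feature of UniProt.ws. The helper `cached_value` and the stand-in `slow_lookup` function are hypothetical, and the stand-in returns made-up keys so that the example runs without network access.

``` r
# Compute a value once, store it on disk, and reuse it afterwards.
cached_value <- function(path, compute) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  value <- compute()
  saveRDS(value, path)
  value
}

# Cheap stand-in for the slow web-service lookup; it counts its own calls.
calls <- 0
slow_lookup <- function() {
  calls <<- calls + 1
  c("1", "2", "9")  # made-up keys for illustration
}

cache_file <- tempfile(fileext = ".rds")
first <- cached_value(cache_file, slow_lookup)   # runs the lookup
second <- cached_value(cache_file, slow_lookup)  # read from disk instead
```

With a real object, the call would look like `egs <- cached_value("geneid_keys.rds", function() keys(up, "GeneID"))`. Delete the file whenever a fresh lookup is wanted.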

Finally, you can look up whatever combination of columns, keytypes and keys you need with select.

Note: ‘ENTREZ_GENE’ is now ‘GeneID’.

``` r
keys <- c("1", "2")
columns <- c("xrefpdb", "xrefhgnc", "sequence")
kt <- "GeneID"
res <- select(up, keys, columns, kt)
res
#>   From  Entry PDB
#> 1    1 P04217
#> 2    1 V9HWD8
#> 3    2 P01023 1BV8;2P9R;6TAV;7O7L;7O7M;7O7N;7O7O;7O7P;7O7Q;7O7R;7O7S;7VON;7VOO;
#>   HGNC
#> 1 HGNC:5;
#> 2
#> 3 HGNC:7;
#>   Sequence
#> 1 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQLFKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPWLSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNYSCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVDFQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELILSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFELHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAVLRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTFESELSDPVELLVAES
#> 2 MSMLVVFLLLWGVTWGPVTEAAIFYETQPSLWAESESLLKPLANVTLTCQAHLETPDFQLFKNGVAQEPVHLDSPAIKHQFLLTGDTQGRYRCRSGLSTGWTQLSKLLELTGPKSLPAPWLSMAPVSWITPGLKTTAVCRGVLRGVTFLLRREGDHEFLEVPEAQEDVEATFPVHQPGNYSCSYRTDGEGALSEPSATVTIEELAAPPPPVLMHHGESSQVLHPGNKVTLTCVAPLSGVDFQLRRGEKELLVPRSSTSPDRIFFHLNAVALGDGGHYTCRYRLHDNQNGWSGDSAPVELILSDETLPAPEFSPEPESGRALRLRCLAPLEGARFALVREDRGGRRVHRFQSPAGTEALFELHNISVADSANYSCVYVDLKPPFGGSAPSERLELHVDGPPPRPQLRATWSGAVLAGRDAVLRCEGPIPDVTFELLREGETKAVKTVRTPGAAANLELIFVGPQHAGNYRCRYRSWVPHTFESELSDPVELLVAES
#> 3 MGKNKLLHPSLVLLLLVLLPTDASVSGKPQYMVLVPSLLHTETTEKGCVLLSYLNETVTVSASLESVRGNRSLFTDLEAENDVLHCVAFAVPKSSSNEEVMFLTVQVKGPTQEFKKRTTVMVKNEDSLVFVQTDKSIYKPGQTVKFRVVSMDENFHPLNELIPLVYIQDPKGNRIAQWQSFQLEGGLKQFSFPLSSEPFQGSYKVVVQKKSGGRTEHPFTVEEFVLPKFEVQVTVPKIITILEEEMNVSVCGLYTYGKPVPGHVTVSICRKYSDASDCHGEDSQAFCEKFSGQLNSHGCFYQQVKTKVFQLKRKEYEMKLHTEAQIQEEGTVVELTGRQSSEITRTITKLSFVKVDSHFRQGIPFFGQVRLVDGKGVPIPNKVIFIRGNEANYYSNATTDEHGLVQFSINTTNVMGTSLTVRVNYKDRSPCYGYQWVSEEHEEAHHTAYLVFSPSKSFVHLEPMSHELPCGHTQTVQAHYILNGGTLLGLKKLSFYYLIMAKGGIVRTGTHGLLVKQEDMKGHFSISIPVKSDIAPVARLLIYAVLPTGDVIGDSAKYDVENCLANKVDLSFSPSQSLPASHAHLRVTAAPQSVCALRAVDQSVLLMKPDAELSASSVYNLLPEKDLTGFPGPLNDQDNEDCINRHNVYINGITYTPVSSTNEKDMYSFLEDMGLKAFTNSKIRKPKMCPQLQQYEMHGPEGLRVGFYESDVMGRGHARLVHVEEPHTETVRKYFPETWIWDLVVVNSAGVAEVGVTVPDTITEWKAGAFCLSEDAGLGISSTASLRAFQPFFVELTMPYSVIRGEAFTLKATVLNYLPKCIRVSVQLEASPAFLAVPVEKEQAPHCICANGRQTVSWAVTPKSLGNVNFTVSAEALESQELCGTEVPSVPEHGRKDTVIKPLLVEPEGLEKETTFNSLLCPSGGEVSEELSLKLPPNVVEESARASVSVLGDILGSAMQNTQNLLQMPYGCGEQNMVLFAPNIYVLDYLNETQQLTPEIKSKAIGYLNTGYQRQLNYKHYDGSYSTFGERYGRNQGNTWLTAFVLKTFAQARAYIFIDEAHITQALIWLSQRQKDNGCFRSSGSLLNNAIKGGVEDEVTLSAYITIALLEIPLTVTHPVVRNALFCLESAWKTAQEGDHGSHVYTKALLAYAFALAGNQDKRKEVLKSLNEEAVKKDNSVHWERPQKPKAPVGHFYEPQAPSAEVEMTSYVLLAYLTAQPAPTSEDLTSATNIVKWITKQQNAQGGFSSTQDTVVALHALSKYGAATFTRTGKAAQVTIQSSGTFSSKFQVDNNNRLLLQQVSLPELPGEYSMKVTGEGCVYLQTSLKYNILPEKEEFPFALGVQTLPQTCDEPKAHTSFQISLSVSYTGSRSASNMAIVDVKMVSGFIPLKPTVKMLERSNHVSRTEVSSNHVLIYLDKVSNQTLSLFFTVLQDVPVRDLKPAIVKVYDYYETDEFAIAEYNAPCSKDLGNA
```
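Cross-reference columns such as PDB and HGNC come back as single strings with semicolon-delimited identifiers, so a common follow-up step is to split them into vectors. The sketch below is a minimal illustration in base R. The data frame is a hand-built stand-in for `res`, with the PDB list shortened to its first three entries and the sequence column omitted. It assumes that missing cross-references arrive as empty strings, although the real result may use `NA` instead, which the helper also tolerates. The helper `split_ids` is hypothetical.

``` r
# Stand-in for the select() result above (shortened, no sequences).
res_demo <- data.frame(
  From = c("1", "1", "2"),
  Entry = c("P04217", "V9HWD8", "P01023"),
  PDB = c("", "", "1BV8;2P9R;6TAV;"),  # assumption: empty string when absent
  HGNC = c("HGNC:5;", "", "HGNC:7;"),
  stringsAsFactors = FALSE
)

# Split a semicolon-delimited column into one character vector per row,
# dropping empty pieces and missing values.
split_ids <- function(x) {
  parts <- strsplit(x, ";", fixed = TRUE)
  lapply(parts, function(p) p[!is.na(p) & nzchar(p)])
}

pdb_ids <- setNames(split_ids(res_demo$PDB), res_demo$Entry)
pdb_ids[["P01023"]]  # individual PDB identifiers for this entry
lengths(pdb_ids)     # number of PDB identifiers per UniProt entry
```

Keeping the result as a named list preserves the link between each UniProt entry and its identifiers, including entries that have none.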

sessionInfo()

``` r
sessionInfo()
#> R Under development (unstable) (2024-11-01 r87285)
#> Platform: x86_64-pc-linux-gnu
#> Running under: Ubuntu 22.04.5 LTS
#>
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#>
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#>
#> time zone: America/New_York
#> tzcode source: system (glibc)
#>
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base
#>
#> other attached packages:
#> [1] UniProt.ws_2.47.4 BiocStyle_2.35.0
#>
#> loaded via a namespace (and not attached):
#>  [1] rappdirs_0.3.3          generics_0.1.3      RSQLite_2.3.9
#>  [4] hms_1.1.3               digest_0.6.37       magrittr_2.0.3
#>  [7] evaluate_1.0.1          fastmap_1.2.0       blob_1.2.4
#> [10] jsonlite_1.8.9          progress_1.2.3      AnnotationDbi_1.69.0
#> [13] GenomeInfoDb_1.43.2     DBI_1.2.3           BiocManager_1.30.25
#> [16] httr_1.4.7              purrr_1.0.2         UCSC.utils_1.3.0
#> [19] Biostrings_2.75.3       codetools_0.2-20    httr2_1.0.7
#> [22] cli_3.6.3               rlang_1.1.4         crayon_1.5.3
#> [25] dbplyr_2.5.0            XVector_0.47.1      Biobase_2.67.0
#> [28] bit64_4.5.2             withr_3.0.2         cachem_1.1.0
#> [31] yaml_2.3.10             BiocBaseUtils_1.9.0 tools_4.5.0
#> [34] memoise_2.0.1           dplyr_1.1.4         filelock_1.0.3
#> [37] GenomeInfoDbData_1.2.13 BiocGenerics_0.53.3 curl_6.0.1
#> [40] rjsoncons_1.3.1         vctrs_0.6.5         R6_2.5.1
#> [43] png_0.1-8               stats4_4.5.0        lifecycle_1.0.4
#> [46] BiocFileCache_2.15.0    KEGGREST_1.47.0     S4Vectors_0.45.2
#> [49] IRanges_2.41.2          bit_4.5.0.1         pkgconfig_2.0.3
#> [52] pillar_1.10.0           glue_1.8.0          tidyselect_1.2.1
#> [55] xfun_0.49               tibble_3.2.1        rstudioapi_0.17.1
#> [58] knitr_1.49              AnVILBase_1.1.0     htmltools_0.5.8.1
#> [61] rmarkdown_2.29          compiler_4.5.0      prettyunits_1.2.0
```

Owner

  • Name: Bioconductor
  • Login: Bioconductor
  • Kind: organization

Software for the analysis and comprehension of high-throughput genomic data

GitHub Events

Total
  • Issues event: 12
  • Watch event: 3
  • Issue comment event: 34
  • Push event: 13
Last Year
  • Issues event: 12
  • Watch event: 3
  • Issue comment event: 34
  • Push event: 13

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 199
  • Total Committers: 21
  • Avg Commits per committer: 9.476
  • Development Distribution Score (DDS): 0.628
Past Year
  • Commits: 45
  • Committers: 5
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.2
Top Committers
Name Email Commits
LiNk-NY m****z@r****g 74
Marc Carlson m****n@f****g 36
Martin Morgan m****n@r****g 17
Nitesh Turaga n****a@g****m 14
Dan Tenenbaum d****a@f****g 14
Valerie Obenchain v****a@f****g 7
Herve Pages h****s@f****g 6
LiNk-NY m****s@r****g 5
J Wokaty j****y@s****u 4
lshep l****d@r****g 3
James MacDonald j****n@m****u 3
Daniel Van Twisk d****k@r****g 2
vobencha v****a@g****m 2
vobencha v****n@r****g 2
LiNk-NY m****9@g****m 2
J Wokaty j****y 2
Hervé Pagès h****s@f****g 2
Sonali Arora s****a@f****g 1
Jim MacDonald j****n@v****u 1
Jim MacDonald j****n@o****u 1
Martin Morgan m****n@f****g 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 26
  • Total pull requests: 3
  • Average time to close issues: 5 months
  • Average time to close pull requests: 2 days
  • Total issue authors: 21
  • Total pull request authors: 2
  • Average comments per issue: 3.73
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: 25 days
  • Average time to close pull requests: N/A
  • Issue authors: 4
  • Pull request authors: 0
  • Average comments per issue: 5.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • lars20070 (3)
  • Arthfael (2)
  • vw087 (2)
  • toobiwankenobi (2)
  • kevinwang09 (1)
  • biodan25 (1)
  • Jo-riz (1)
  • nrnml (1)
  • Fred-White94 (1)
  • CreLox (1)
  • flying-sheep (1)
  • HarwayZ (1)
  • ghost (1)
  • buijt (1)
  • Camsid (1)
Pull Request Authors
  • LiNk-NY (2)
  • jmacdon (1)
Top Labels
Issue Labels
reprex needed (1)
Pull Request Labels

Dependencies

DESCRIPTION cran
  • BiocGenerics >= 0.13.8 depends
  • RSQLite * depends
  • methods * depends
  • utils * depends
  • AnnotationDbi * imports
  • BiocBaseUtils * imports
  • BiocFileCache * imports
  • cellxgenedp * imports
  • httpcache * imports
  • httr * imports
  • jsonlite * imports
  • progress * imports
  • BiocStyle * suggests
  • RUnit * suggests
  • knitr * suggests