https://github.com/breandan/tracelink

🔗 Trace Link Prediction from code to documentation

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ○
    CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ○
    .zenodo.json file
  • ✓
    DOI references
    Found 2 DOI reference(s) in README
  • ✓
    Academic publication links
    Links to: arxiv.org
  • ○
    Academic email domains
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (10.2%) to scientific vocabulary

Keywords

documentation link-prediction reccomender traceability
Last synced: 5 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: breandan
  • Language: TeX
  • Default Branch: master
  • Homepage:
  • Size: 16.6 MB
Statistics
  • Stars: 0
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed over 3 years ago
Metadata Files
Readme

README.md

TraceLink

The goal of this project is to link source code to documentation and other related software artifacts. We train a recommender system that suggests a list of documents sorted by their relevance to a given context or code snippet. For uncommon tokens, the results should at least include all documents that refer to the token directly (e.g. via an inverted index), as well as documents that are semantically or contextually related to the source code in non-obvious ways.
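
As a minimal sketch of the direct-reference baseline, an inverted index maps each token to the set of documents that contain it. The tokenizer, the function names, and the sample documents below are illustrative assumptions, not code from this repository.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Split text into identifier-like tokens (a simplifying assumption)."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)

def build_inverted_index(docs):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def lookup(index, token):
    """Return the sorted ids of all documents that mention the token directly."""
    return sorted(index.get(token, set()))

# Hypothetical documents, for illustration only.
docs = {
    "a.html": "QgsProcessingAlgorithm is the abstract base class for algorithms",
    "b.html": "groupId returns the unique ID of the group",
    "c.html": "Subclasses of QgsProcessingAlgorithm override groupId",
}
index = build_inverted_index(docs)
```

Calling lookup(index, "groupId") returns every document that names the token. Matching is exact and case-sensitive here, so semantically related documents that never mention the token would still need the learned model.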

Approach

We train a variational autoencoder (VAE) and use its encoder to project short sequences of text, together with their accompanying link, into link space. In the same manner, we train a second VAE on documents to learn a document-space embedding. Finally, we train a supervised model that maps link space to document space, i.e. it predicts the document(s) that a link with an unknown destination may have targeted.
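
The final retrieval step can be sketched as follows, assuming both encoders are already trained. The README does not specify the form of the supervised model or the similarity measure, so the linear map and the cosine similarity used here are stand-in assumptions, and all vectors are toy values.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def apply_linear(weights, x):
    """Stand-in for the supervised link-to-document model: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def rank_documents(link_vec, weights, doc_vecs):
    """Project a link embedding into document space and rank documents by similarity."""
    predicted = apply_linear(weights, link_vec)
    scored = [(cosine(predicted, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]

# Hypothetical embeddings: a 3-d link space and a 2-d document space.
doc_vecs = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0], "doc_c": [0.7, 0.7]}
weights = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
link_vec = [0.9, 0.1, 0.5]
ranking = rank_documents(link_vec, weights, doc_vecs)
```

The ranked list plays the role of the recommender output: the documents closest to the predicted point in document space come first.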

Datasets

The following datasets are used to extract relevant links from documentation:

StackExchange contains a large dataset of programming-related Q&A:

It may also be interesting to explore code search and suggestion in a similar manner.

Preprocessing

Links matching a simple pattern are collected from API documentation.
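
A minimal sketch of this kind of extraction is shown below. The README does not give the actual pattern, so the regular expression, the context window size, and the function name are assumptions. The record fields follow the columns of the sample excerpt in this README, and the link text is replaced by a <<LNK>> placeholder in the context.

```python
import re
from urllib.parse import urldefrag

# Hypothetical pattern: anchors with a double-quoted href and a plain-text body.
ANCHOR = re.compile(r'<a\s[^>]*?href="([^"]+)"[^>]*>([^<]*)</a>', re.IGNORECASE)
TAG = re.compile(r"<[^>]+>")

def extract_links(html, source, window=40):
    """Collect one record per anchor, with surrounding text as context."""
    records = []
    for match in ANCHOR.finditer(html):
        href, text = match.group(1), match.group(2)
        target, fragment = urldefrag(href)  # split "page.html#frag"
        # Strip markup from the text on either side of the anchor.
        before = TAG.sub("", html[:match.start()])[-window:]
        after = TAG.sub("", html[match.end():])[:window]
        records.append({
            "link": text,
            "context": before + "<<LNK>>" + after,
            "source": source,
            "target": target,
            "fragment": "#" + fragment if fragment else "",
        })
    return records

# Hypothetical input, loosely modelled on generated API documentation.
html = ('<p>Definition: <a class="el" '
        'href="qgsprocessingalgorithm_8h_source.html#l00223">'
        'qgsprocessingalgorithm.h:223</a></p>')
records = extract_links(html, source="example.html")
```

A regular expression is enough for regular, machine-generated pages like these; hand-written HTML would likely need a real parser.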

Sample

The following is an excerpt from the post-processed documentation dataset:

link context source target fragment
"qgsprocessingalgorithm.h:223" "orithm::groupIdvirtual QString groupId() constReturns the unique ID of the group this algorithm belongs to. Definition: <<LNK>> " "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsalgorithmswapxy_8h_source.html" "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsprocessingalgorithm_8h_source.html" "#l00223"
"QgsProcessingFeatureBasedAlgorithm" " <<LNK>> An abstract QgsProcessingAlgorithm base class for processing algorithms which operate "feature-by-fea...Definition: qgsp" "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsalgorithmswapxy_8h_source.html" "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/classQgsProcessingFeatureBasedAlgorithm.html" ""
"qgsprocessingalgorithm.h:867" "ithmAn abstract QgsProcessingAlgorithm base class for processing algorithms which operate "feature-by-fea...Definition: <<LNK>> " "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsalgorithmswapxy_8h_source.html" "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsprocessingalgorithm_8h_source.html" "#l00867"

Experiments

  • Compare doc2vec with keyphrase / bag-of-words extraction.
  • Compare in-vocabulary to out-of-vocabulary retrieval precision.
  • Align stack trace entities to, e.g., lines of code on GitHub.
  • Align IDE-based context to, e.g., StackOverflow issues.
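
The in-vocabulary versus out-of-vocabulary comparison could be scored along the following lines. The README does not define the precision metric, so precision@k, the query format, and the function names are assumptions made for illustration.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked documents that are relevant."""
    top = ranked[:k]
    if not top:
        return 0.0
    return sum(1 for doc in top if doc in relevant) / k

def mean_precision_by_vocabulary(queries, vocabulary, k):
    """Average precision@k separately for in-vocabulary and OOV query tokens."""
    buckets = {"in_vocab": [], "oov": []}
    for query in queries:
        key = "in_vocab" if query["token"] in vocabulary else "oov"
        buckets[key].append(precision_at_k(query["ranked"], query["relevant"], k))
    # None marks a bucket with no queries, so it is not mistaken for zero precision.
    return {name: (sum(vals) / len(vals) if vals else None)
            for name, vals in buckets.items()}

# Hypothetical evaluation data.
vocabulary = {"groupId"}
queries = [
    {"token": "groupId", "ranked": ["d1", "d2", "d3"], "relevant": {"d1", "d3"}},
    {"token": "swapXY", "ranked": ["d4", "d5", "d6"], "relevant": {"d6"}},
]
result = mean_precision_by_vocabulary(queries, vocabulary, k=2)
```

Reporting the two buckets separately keeps the presumably more numerous in-vocabulary queries from masking poor retrieval on rare tokens.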

Owner

  • Name: breandan
  • Login: breandan
  • Kind: user

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

preprocessing/build.gradle.kts maven
  • com.github.ISCAS-PMC:roll-library -SNAPSHOT implementation
  • com.github.breandan:kaliningraph 0.0.2 implementation
  • com.github.breandan:progex master-SNAPSHOT implementation
  • com.github.ghaffarian:graphs master-SNAPSHOT implementation
  • com.github.ghaffarian:nanologger master-SNAPSHOT implementation
  • com.google.guava:guava 31.1-jre implementation
  • edu.stanford.nlp:stanford-corenlp 3.9.2 implementation
  • io.github.vovak.astminer:astminer 0.6 implementation
  • me.xdrop:fuzzywuzzy 1.4.0 implementation
  • org.apache.commons:commons-compress 1.21 implementation
  • org.apache.commons:commons-vfs2 2.9.0 implementation
  • org.apache.lucene:lucene-analyzers-common 8.11.2 implementation
  • org.apache.lucene:lucene-core 9.2.0 implementation
  • org.apache.lucene:lucene-queryparser 9.2.0 implementation
  • org.jsoup:jsoup 1.15.2 implementation