https://github.com/breandan/tracelink
🔗 Trace Link Prediction from code to documentation
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (10.2%) to scientific vocabulary
Statistics
- Stars: 0
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
TraceLink
The goal of this project is to link source code to documentation and other related software artifacts. We train a recommender system that suggests a list of documents sorted by their relevance to a given context or code snippet. For uncommon tokens, this list should at least include every document that refers to the token directly (as an inverted index would return). It should also include documents that are semantically or contextually related to the source code in non-obvious ways.
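The "refers to the token directly" baseline can be illustrated with a small inverted index. This is a minimal sketch, not the project's implementation. The tokenizer, the function names, and the toy corpus are all hypothetical. Ranking by the number of shared tokens stands in for the learned relevance score described above.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Split into identifier-like tokens and lowercase them (an assumed tokenizer).
    return [t.lower() for t in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text)]


def build_inverted_index(docs):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def recommend(snippet, index):
    """Return document ids ranked by how many distinct snippet tokens they mention."""
    scores = defaultdict(int)
    for token in set(tokenize(snippet)):
        for doc_id in index.get(token, ()):
            scores[doc_id] += 1
    # Highest overlap first; ties broken by document id so the order is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical toy corpus, for illustration only.
docs = {
    "swapxy.html": "QgsProcessingFeatureBasedAlgorithm subclass that swaps x and y",
    "algorithm.html": "QgsProcessingAlgorithm::groupId returns the group id",
    "build.html": "Build instructions",
}
index = build_inverted_index(docs)
print(recommend("algo.groupId()", index))  # ['algorithm.html']
```

A lexical index like this only finds documents that share a token with the snippet. The learned embeddings described in the next section are meant to cover the non-obvious relations.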
Approach
We train a variational autoencoder (VAE) and use its encoder to project short sequences of text, together with their accompanying link, into a link space. In the same manner, we train a second VAE on documents to learn a document-space embedding. Finally, we train a supervised model that maps link space to document space, i.e. that predicts which document(s) a link with an unknown destination may have targeted.
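The README does not say what form the supervised model takes. The sketch below makes an assumption: it uses a linear map fitted by stochastic gradient descent. Hypothetical 2-d vectors stand in for the outputs of the two VAE encoders. The link vector is mapped into document space, and candidate documents are ranked by cosine similarity to the predicted point. All function names and data here are illustrative.

```python
import math


def matvec(W, x):
    """Apply a linear map W (a list of rows) to vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def cosine(a, b):
    """Cosine similarity, defined as 0.0 when either vector is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def fit_linear_map(pairs, lr=0.5, epochs=100):
    """Fit W by SGD on the half squared error 0.5 * ||W x - y||^2 over (x, y) pairs,
    where x is a link embedding and y the embedding of the document it points to."""
    dim_in, dim_out = len(pairs[0][0]), len(pairs[0][1])
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        for x, y in pairs:
            err = [p - t for p, t in zip(matvec(W, x), y)]
            # The gradient of the loss with respect to W[i][j] is err[i] * x[j].
            for i in range(dim_out):
                for j in range(dim_in):
                    W[i][j] -= lr * err[i] * x[j]
    return W


def predict_targets(link_vec, W, doc_vecs, k=3):
    """Map a link-space vector into document space and return the k most similar documents."""
    predicted = matvec(W, link_vec)
    ranked = sorted(doc_vecs, key=lambda d: -cosine(predicted, doc_vecs[d]))
    return ranked[:k]


# Hypothetical 2-d embeddings standing in for the two encoders' outputs.
doc_vecs = {"a.html": [1.0, 0.0], "b.html": [0.0, 1.0], "c.html": [0.7, 0.7]}
pairs = [([1.0, 0.0], doc_vecs["a.html"]), ([0.0, 1.0], doc_vecs["b.html"])]
W = fit_linear_map(pairs)
print(predict_targets([0.9, 0.1], W, doc_vecs, k=2))  # ['a.html', 'c.html']
```

Any regressor from link space to document space could replace the linear map. The retrieval step, a nearest-neighbour search around the predicted point, stays the same.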
Datasets
The following datasets are used to extract relevant links from documentation:
StackExchange contains a large dataset of programming-related Q&A:
It may be interesting to explore code search and suggestion in a similar manner.
Preprocessing
Links matching a simple pattern are collected from API documentation.
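The README does not specify the pattern it uses. The sketch below assumes plain HTML anchors with a double-quoted `href` and produces rows shaped like the sample that follows: link text, a context window with the link replaced by `<<LNK>>`, the source page, the resolved target, and the `#` fragment. The regex, the `width` parameter, and the function names are hypothetical. Only relative hrefs are resolved; absolute URLs are not handled.

```python
import posixpath
import re
from typing import NamedTuple

# Assumed link pattern: a plain HTML anchor with a double-quoted href.
ANCHOR = re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.S)
TAG = re.compile(r"<[^>]+>")


class LinkRow(NamedTuple):
    link: str      # anchor text
    context: str   # surrounding text with the anchor replaced by <<LNK>>
    source: str    # page containing the link
    target: str    # resolved page the link points to
    fragment: str  # "#..." part of the href, or ""


def strip_tags(html):
    return TAG.sub("", html)


def extract_links(html, source, width=40):
    """Collect one row per anchor in `html`, keeping up to `width` characters of text on each side."""
    rows = []
    for m in ANCHOR.finditer(html):
        href, text = m.group(1), strip_tags(m.group(2))
        path, sep, frag = href.partition("#")
        # Resolve a relative href against the source page's directory; a bare "#..." points back to the source.
        target = posixpath.normpath(posixpath.join(posixpath.dirname(source), path)) if path else source
        before = strip_tags(html[: m.start()])[-width:]
        after = strip_tags(html[m.end():])[:width]
        rows.append(LinkRow(text, before + "<<LNK>>" + after, source, target, sep + frag))
    return rows


page = ('Returns the group ID. Definition: '
        '<a href="qgsprocessingalgorithm_8h_source.html#l00223">qgsprocessingalgorithm.h:223</a>')
for row in extract_links(page, "QGIS.docset/Documents/qgsalgorithmswapxy_8h_source.html"):
    print(row)
```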
Sample
The following is an excerpt from the post-processed documentation dataset:
link context source target fragment
"qgsprocessingalgorithm.h:223" "orithm::groupIdvirtual QString groupId() constReturns the unique ID of the group this algorithm belongs to. Definition: <<LNK>> " "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsalgorithmswapxy_8h_source.html" "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsprocessingalgorithm_8h_source.html" "#l00223"
"QgsProcessingFeatureBasedAlgorithm" " <<LNK>> An abstract QgsProcessingAlgorithm base class for processing algorithms which operate "feature-by-fea...Definition: qgsp" "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsalgorithmswapxy_8h_source.html" "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/classQgsProcessingFeatureBasedAlgorithm.html" ""
"qgsprocessingalgorithm.h:867" "ithmAn abstract QgsProcessingAlgorithm base class for processing algorithms which operate "feature-by-fea...Definition: <<LNK>> " "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsalgorithmswapxy_8h_source.html" "QGIS.tgz!/QGIS.docset/Contents/Resources/Documents/qgsprocessingalgorithm_8h_source.html" "#l00867"
Experiments
- Compare doc2vec with keyphrase / bag-of-words extraction.
- Compare in-vocabulary to out-of-vocabulary retrieval precision.
- Stack trace entity alignment to e.g. GitHub lines of code.
- IDE-based context alignment to e.g. StackOverflow issues.
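One way to set up the in-vocabulary versus out-of-vocabulary comparison is to compute precision@k separately for two groups of queries:

- queries whose tokens were all seen during training;
- queries that contain at least one unseen token.

The sketch below assumes that a query is a list of tokens and that relevance judgments are sets of document ids. The function names and toy data are hypothetical and are not results from this project.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k ranked document ids that are in the relevant set."""
    return sum(1 for d in ranked[:k] if d in relevant) / k


def precision_by_vocabulary(results, vocabulary, k=5):
    """Mean precision@k, split by whether every query token is in the training vocabulary.

    results: iterable of (query_tokens, ranked_doc_ids, relevant_doc_ids) triples.
    """
    buckets = {"in_vocab": [], "oov": []}
    for tokens, ranked, relevant in results:
        key = "in_vocab" if all(t in vocabulary for t in tokens) else "oov"
        buckets[key].append(precision_at_k(ranked, relevant, k))
    # None marks a bucket with no queries rather than reporting a misleading 0.0.
    return {key: sum(ps) / len(ps) if ps else None for key, ps in buckets.items()}


# Hypothetical queries and rankings, for illustration only.
vocabulary = {"groupid", "algorithm"}
results = [
    (["groupid"], ["d1", "d2"], {"d1"}),
    (["algorithm"], ["d3", "d1"], {"d1", "d3"}),
    (["swapxy"], ["d2", "d4"], {"d4"}),
]
print(precision_by_vocabulary(results, vocabulary, k=2))  # {'in_vocab': 0.75, 'oov': 0.5}
```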
References
- Lancer: Your Code Tell Me What You Need, Zhou et al. (2019) [source code]
- TraceSim: A Method for Calculating Stack Trace Similarity, Vasiliev et al. (2020)
- Exploiting Code Knowledge Graph for Bug Localization via Bi-directional Attention, Zhang et al. (2020)
Owner
- Name: breandan
- Login: breandan
- Kind: user
- Website: http://brea.ndan.co
- Twitter: breandan
- Repositories: 185
- Profile: https://github.com/breandan
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Dependencies
- com.github.ISCAS-PMC:roll-library -SNAPSHOT implementation
- com.github.breandan:kaliningraph 0.0.2 implementation
- com.github.breandan:progex master-SNAPSHOT implementation
- com.github.ghaffarian:graphs master-SNAPSHOT implementation
- com.github.ghaffarian:nanologger master-SNAPSHOT implementation
- com.google.guava:guava 31.1-jre implementation
- edu.stanford.nlp:stanford-corenlp 3.9.2 implementation
- io.github.vovak.astminer:astminer 0.6 implementation
- me.xdrop:fuzzywuzzy 1.4.0 implementation
- org.apache.commons:commons-compress 1.21 implementation
- org.apache.commons:commons-vfs2 2.9.0 implementation
- org.apache.lucene:lucene-analyzers-common 8.11.2 implementation
- org.apache.lucene:lucene-core 9.2.0 implementation
- org.apache.lucene:lucene-queryparser 9.2.0 implementation
- org.jsoup:jsoup 1.15.2 implementation