https://github.com/dbpedia/extraction-framework

The software used to extract structured data from Wikipedia

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 12 of 101 committers (11.9%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.4%) to scientific vocabulary

Keywords from Contributors

rdf shacl web-ontology-language data-quality data-quality-checks data-validation schema-validation unit-testing langchain

Last synced: 10 months ago

Repository

The software used to extract structured data from Wikipedia

Basic Info

Host: GitHub
Owner: dbpedia
Language: Scala
Default Branch: master
Size: 185 MB

Statistics

Stars: 901
Watchers: 85
Forks: 283
Open Issues: 191
Releases: 6

Created over 13 years ago · Last pushed over 1 year ago

Metadata Files

Readme

DBpedia Information Extraction Framework

Homepage: http://dbpedia.org
Documentation: http://dev.dbpedia.org/Extraction
Get in touch with DBpedia: https://wiki.dbpedia.org/join/get-in-touch
Slack: join the #dev-team channel within the DBpedia Slack workspace, the main point for development updates and discussions

About DBpedia
Getting Started
The DBpedia Extraction Framework
Contribution Guidelines
- Developer's Certificate of Origin
License

About DBpedia

DBpedia is a crowd-sourced community effort to extract structured information from Wikipedia and make this information available on the Web. DBpedia allows you to ask sophisticated queries against Wikipedia, and to link the different data sets on the Web to Wikipedia data. We hope that this work will make it easier for the huge amount of information in Wikipedia to be used in some new interesting ways. Furthermore, it might inspire new mechanisms for navigating, linking, and improving the encyclopedia itself.
To check out the projects of DBpedia, visit the official DBpedia website.

Getting Started

The Easy Way - Execution using the MARVIN release bot

Running the extraction framework is a relatively complex task, which is documented in detail in the advanced QuickStart guide. To run the extraction process the same way the DBpedia core team does, you can use the MARVIN release bot. The MARVIN bot automates the overall extraction process, from downloading the ontology, mappings and Wikipedia dumps to extracting and post-processing the data.

```
git clone https://git.informatik.uni-leipzig.de/dbpedia-assoc/marvin-config
cd marvin-config
./setup-or-reset-dief.sh

# test run Romanian extraction, very small
./marvinextractionrun.sh test

# around 4-7 days
./marvinextractionrun.sh generic
```

Standalone Execution

If you plan to work on improving the codebase of the framework, you will need to run the extraction framework on its own, as described in the QuickStart guide. This is highly recommended, since you will learn a lot about the extraction framework during this process.

Extractors represent the core of the extraction framework. So far, many extractors have been developed to extract particular information from different Wikimedia projects. To learn more, check the New Extractors guide, which explains the process of writing a new extractor.
Check the Debugging Guide and learn how to debug the extraction framework.

Execution using Apache Spark

To speed up the extraction process, the extraction framework has been adapted to run on Apache Spark. Currently, more than half of the extractors can be executed using Spark. Extraction with Spark follows a slightly different process and requires a different execution setup. Check the QuickStart guide on how to run the extraction using Apache Spark.

Note: if possible, new extractors should be implemented using Apache Spark. To learn more, check the New Extractors guide, which explains the process of writing a new extractor.

The DBpedia Extraction Framework

The DBpedia community uses a flexible and extensible framework to extract different kinds of structured information from Wikipedia. The DBpedia extraction framework is written using Scala 2.8. The framework is available from the DBpedia Github repository (GNU GPL License). The change log may reveal more recent developments. More recent configuration options can be found here: https://github.com/dbpedia/extraction-framework/wiki

The DBpedia extraction framework is structured into the following modules:

Core Module : Contains the core components of the framework.
Dump extraction Module : Contains the DBpedia dump extraction application.

Core Module

Components

Source : The Source package provides an abstraction over a source of MediaWiki pages.
WikiParser : The WikiParser package specifies a parser, which transforms a MediaWiki page source into an Abstract Syntax Tree (AST).
Extractor : An Extractor is a mapping from a page node to a graph of statements about it.
Destination : The Destination package provides an abstraction over a destination of RDF statements.
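
The way these four components fit together can be sketched in a few lines of Scala. This is only a toy illustration under stated assumptions: every name below (`PipelineSketch`, `WikiPage`, `PageNode`, `Quad`, `LineParser`, `PropertyExtractor`, `MemoryDestination`, `SeqSource`) is hypothetical and much simpler than the real classes in the framework, the "AST" is reduced to a map of `| key = value` lines, and the `http://example.org` URIs are placeholders.

```scala
// Illustrative sketch only: simplified stand-ins, NOT the framework's real classes.
object PipelineSketch {
  final case class WikiPage(title: String, source: String)
  // Hypothetical, drastically reduced "AST": only the key/value lines of a page.
  final case class PageNode(title: String, properties: Map[String, String])
  // One RDF-like statement about a page.
  final case class Quad(subject: String, predicate: String, value: String)

  trait Source { def pages: Iterator[WikiPage] }               // where pages come from
  trait WikiParser { def parse(page: WikiPage): PageNode }     // page source -> AST
  trait Extractor { def extract(node: PageNode): Seq[Quad] }   // page node -> statements
  trait Destination { def write(quads: Seq[Quad]): Unit }      // where statements go

  final class SeqSource(all: Seq[WikiPage]) extends Source {
    def pages: Iterator[WikiPage] = all.iterator
  }

  // Toy parser: every line of the form "| key = value" becomes a property.
  object LineParser extends WikiParser {
    def parse(page: WikiPage): PageNode = {
      val properties = page.source.split("\n").toSeq
        .map(_.trim)
        .filter(_.startsWith("|"))
        .map(_.drop(1).split("=", 2))
        .collect { case Array(key, value) => key.trim -> value.trim }
        .toMap
      PageNode(page.title, properties)
    }
  }

  // Toy extractor: one statement per property, sorted by key for stable output.
  object PropertyExtractor extends Extractor {
    private val base = "http://example.org" // placeholder namespace (assumption)
    def extract(node: PageNode): Seq[Quad] = {
      val subject = s"$base/resource/${node.title.replace(' ', '_')}"
      node.properties.toSeq.sortBy(_._1).map { case (key, value) =>
        Quad(subject, s"$base/property/$key", value)
      }
    }
  }

  // Collects statements in memory instead of writing RDF files.
  final class MemoryDestination extends Destination {
    private val buffer = scala.collection.mutable.ListBuffer.empty[Quad]
    def write(quads: Seq[Quad]): Unit = buffer ++= quads
    def written: List[Quad] = buffer.toList
  }

  // Source -> WikiParser -> Extractor -> Destination
  def run(source: Source, parser: WikiParser, extractor: Extractor, destination: Destination): Unit =
    source.pages.map(parser.parse).map(extractor.extract).foreach(destination.write)
}
```

Keeping each stage behind its own trait mirrors the separation described above: a different page source or output format can be swapped in without touching the extractors.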

In addition to the core components, a number of utility packages offer essential functionality to be used by the extraction code:

Ontology : Classes used to represent an ontology. Methods for both reading and writing ontologies are provided. All classes are located in the namespace org.dbpedia.extraction.ontology
DataParser : Parsers to extract data from nodes in the abstract syntax tree. All classes are located in the namespace org.dbpedia.extraction.dataparser
Util : Various utility classes. All classes are located in the namespace org.dbpedia.extraction.util

Dump extraction Module

More recent configuration options can be found here: https://github.com/dbpedia/extraction-framework/wiki/Extraction-Instructions.

To know more about the extraction framework, click here

Contribution Guidelines

If you want to work on one of the issues, assign yourself to it or at least leave a comment that you are working on it and how.
If you have an idea for a new feature, make an issue first, assign yourself to it, then start working.
Please make sure you have read the Developer's Certificate of Origin, further down on this page!

1. Fork the main extraction-framework repository on GitHub.
2. Clone this fork onto your machine (git clone <your_repo_url_on_github>).
3. Switch to the dev branch (git checkout dev).
4. Make a new development branch from the latest revision of the dev branch. Name the branch something meaningful, for example fixRestApiParams (git checkout -b fixRestApiParams dev).
5. Make changes and commit them to this branch.
   - Please commit regularly in small batches of things "that go together" (for example, changing a constructor and all the instance creating calls). Putting a huge batch of changes in one commit is bad for code reviews.
   - In the commit messages, summarize the commit in the first line using not more than 70 characters. Leave one line blank and describe the details in the following lines, preferably in bullet points, like in 7776e31....
6. When you are done with a bugfix or feature, rebase your branch onto extraction-framework/dev (git pull --rebase https://github.com/dbpedia/extraction-framework.git dev). Resolve possible conflicts and commit.
7. Push your branch to GitHub (git push origin fixRestApiParams).
8. Send a pull request from your branch into extraction-framework/dev via GitHub.
   - In the description, reference the associated issue (for example, "Fixes #123 by ..." for issue number 123).
   - Your changes will be reviewed and discussed on GitHub.
   - In addition, Travis-CI will test if the merged version passes the build.
   - If there are further changes you need to make, because Travis said the build fails or because somebody caught something you overlooked, go back to step 5 (make changes and commit them). Stay on the same branch (if it is still related to the same issue). GitHub will add the new commits to the same pull request.
   - When everything is fine, your changes will be merged into extraction-framework/dev. Finally, the dev branch, together with your improvements, will be merged into the master branch.
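
The commit-message convention described above (a summary line of at most 70 characters, followed by a blank line before the details) is easy to check mechanically. The following is a minimal sketch, not part of the framework or its tooling; `CommitMessageCheck` and `problems` are hypothetical names chosen for illustration.

```scala
// Hypothetical helper, not part of the extraction framework.
object CommitMessageCheck {
  val MaxSummaryLength = 70

  // Returns a list of human-readable problems; an empty list means the message conforms.
  def problems(message: String): List[String] = {
    val lines = message.split("\n", -1).toList
    val summaryTooLong = lines.headOption.exists(_.length > MaxSummaryLength)
    // If there is more than one line, the second one must be blank.
    val missingBlankLine = lines.length > 1 && lines(1).trim.nonEmpty
    List(
      if (summaryTooLong) Some(s"summary line exceeds $MaxSummaryLength characters") else None,
      if (missingBlankLine) Some("second line should be blank") else None
    ).flatten
  }
}
```

A message such as "Fix REST API params", followed by a blank line and a bullet list of details, yields no problems, while a message whose details start directly on the second line is flagged.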

Please keep in mind:
- Try not to modify the indentation. If you want to re-format, use a separate "formatting" commit in which no functionality changes are made.
- Never rebase the master onto a development branch (i.e. never call rebase from extraction-framework/master). Only rebase your branch onto the dev branch, and only if nobody has already pulled from the development branch!
- If you already pushed a branch to GitHub, later rebased the master onto this branch and then tried to push again, GitHub will reject the push with the message "To prevent you from losing history, non-fast-forward updates were rejected". If (and only if) you are sure that nobody already pulled from this branch, add --force to the push command.
  "Don’t rebase branches you have shared with another developer."
  "Rebase is awesome, I use rebase exclusively for everything local. Never for anything that I've already pushed."
  "Never ever rebase a branch that you pushed, or that you pulled from another person"
- In general, we prefer Scala over Java.

More tips:
- Guides to set up your development environment for IntelliJ IDEA or Eclipse.
- Get help with the Maven build or another form of installation.
- Download some data to work with.
- How to run from Scala/Java or from a JAR.
- Having different troubles? Check the troubleshooting page or post on https://forum.dbpedia.org.

Important: Developer's Certificate of Origin

By sending a pull request to the extraction-framework repository on GitHub, you implicitly accept the Developer's Certificate of Origin 1.1

License

The source code is under the terms of the GNU General Public License, version 2.

Owner

Name: DBpedia
Login: dbpedia
Kind: organization
Email: dbpedia-discussion@lists.sourceforge.net

Website: http://dbpedia.org
Repositories: 119
Profile: https://github.com/dbpedia

GitHub Events

Total

Create event: 1
Commit comment event: 2
Issues event: 7
Watch event: 50
Issue comment event: 47
Push event: 6
Pull request review comment event: 5
Pull request event: 12
Pull request review event: 11
Fork event: 16

Last Year

Create event: 1
Commit comment event: 2
Issues event: 7
Watch event: 50
Issue comment event: 47
Push event: 6
Pull request review comment event: 5
Pull request event: 12
Pull request review event: 11
Fork event: 16

Committers

Last synced: over 2 years ago

All Time

Total Commits: 6,601
Total Committers: 101
Avg Commits per committer: 65.356
Development Distribution Score (DDS): 0.593

Past Year

Commits: 6
Committers: 2
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.167

Top Committers

Name	Email	Commits
Jona Christopher Sahnwaldt	j**t@g**m	2,688
Dimitris Kontokostas	j**t@g**m	1,046
chile12	m**g@g**m	377
	m**y@i**e	256
Jonas Brekle	j**e@g**m	240
Max Jakob	m**b@g**m	228
gaurav	g**v@g**m	209
Daniel Fleischhacker	d**l@i**e	132
Termilion	r**i@l**e	105
jlareck	m**7@g**m	99
Marvin Hofer	v**m@y**e	98
alismayilov	a**v@g**m	86
Andrea Di Menna	n**z@g**m	78
Sebastian Hellmann	h**n@i**e	77
Sebastian Serth	S**h@s**e	68
Jim Regan	j**n@g**m	54
wmaroy	m**r@g**m	49
Nono314	h**o@h**m	45
JJ-Author	J****r	44
kurzum	k**m@g**m	44
hady elsahar	h**r@g**m	40
Andrea Di Menna	a**m@i**m	30
wmaroy	w**y@u**e	29
Lena	l**r@p**e	27
Nilesh Chakraborty	n**h@n**m	26
Julien Cojan	c**l@t**r	22
aklakan	r**n@g**m	22
Pablo Mendes	p**s@g**m	22
feroshjacob	f**b@g**m	20
Paul	P**l@W**1	20
and 71 more...
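
The Development Distribution Score figures above are consistent with the commonly used definition DDS = 1 - (commits by the most active committer / total commits). The page itself does not state the formula, so treat that definition as an assumption; `CommitStats`, `dds` and `avgPerCommitter` are hypothetical names for this sketch.

```scala
// Sketch under an assumed definition of DDS; not taken from the page's own tooling.
object CommitStats {
  // Assumed: DDS = 1 - (commits by the most active committer / total commits).
  def dds(topCommitterCommits: Int, totalCommits: Int): Double =
    1.0 - topCommitterCommits.toDouble / totalCommits

  // Average number of commits per committer.
  def avgPerCommitter(totalCommits: Int, committers: Int): Double =
    totalCommits.toDouble / committers
}
```

With the all-time numbers listed above, `CommitStats.dds(2688, 6601)` is about 0.593 and `CommitStats.avgPerCommitter(6601, 101)` is about 65.356, matching the reported values. The past-year score of 0.167 would be consistent with one committer having made 5 of the 6 commits, although the page does not list that split.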

Committer Domains (Top 20 + Academic)

informatik.uni-leipzig.de: 3
student.hpi.de: 2
inria.fr: 2
vm116.(none): 1
fbk.eu: 1
redaction-developpez.com: 1
data.lirmm.fr: 1
inf.fu-berlin.de: 1
diffbot.com: 1
ontotext.com: 1
student.hpi.uni-potsdam.de: 1
users.sourceforge.net: 1
studserv.uni-leipzig.de: 1
fumi.me: 1
testty.unice.fr: 1
nileshc.com: 1
posteo.de: 1
ugent.be: 1
inqmobile.com: 1
github.com: 1
informatik.uni-mannheim.de: 1
akswnc7.informatik.uni-leipzig.de: 1
wifo5-03.informatik.uni-mannheim.de: 1
dbpedia.informatik.uni-leipzig.de: 1

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 79
Total pull requests: 55
Average time to close issues: about 1 year
Average time to close pull requests: 8 months
Total issue authors: 43
Total pull request authors: 18
Average comments per issue: 3.62
Average comments per pull request: 1.51
Merged pull requests: 14
Bot issues: 0
Bot pull requests: 20

Past Year

Issues: 5
Pull requests: 14
Average time to close issues: 29 days
Average time to close pull requests: 13 days
Issue authors: 4
Pull request authors: 6
Average comments per issue: 3.0
Average comments per pull request: 1.57
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

VladimirAlexiev (15)
kurzum (13)
jimkont (5)
JJ-Author (4)
ninniuz (2)
ghost-2362003 (2)
reckart (2)
s204451 (1)
LorenzBuehmann (1)
redsk (1)
desislava-hristova-ontotext (1)
mubashar1199 (1)
Nono314 (1)
uleodolter (1)
gatemezing (1)

Pull Request Authors

dependabot[bot] (20)
Meti-Adane (12)
haniyakonain (7)
datalogism (5)
deba-iitbh (2)
tech0priyanshu (2)
thoffma (2)
TallTed (1)
contact-andy (1)
Julio-Noe (1)
ninniuz (1)
adrapereira (1)
jlareck (1)
jimkont (1)
ghost-2362003 (1)

Top Labels

Issue Labels

type: data (50)
status: fix-required (33)
status: minidump-test-required (32)
status: triage-discussion-needed (22)
GSoC Warmup task (9)
status: accepted (8)
type: software-bug (8)
status: fix-provided (8)
enhancement (6)
priority (6)
type: hosting (5)
status: minidump-test-provided (4)
question (4)
status: cannot reproduce (3)
type: sofware-build (3)
status: test-method-required (3)
status: verification-discussion-needed (3)
_DBpedia Live (3)
feature-fix-required-by-community (3)
dbpedia.org/.* (1)
related: Ren & Stimpy (1)
de.dbpedia.org (1)
feature: template-test (1)
type: documentation (1)
Needs More Examples (1)
status: duplicate (1)

Pull Request Labels

dependencies (20)
java (5)

Packages

Total packages: 8
Total downloads: unknown
Total docker downloads: 53

Total dependent packages: 12
(may contain duplicates)
Total dependent repositories: 39
(may contain duplicates)
Total versions: 20

repo1.maven.org: org.dbpedia.extraction:core

DBpedia Extraction Framework is a flexible and extensible framework to extract different kinds of structured information from Wikipedia

Homepage: http://www.dbpedia.org
Documentation: https://appdoc.app/artifact/org.dbpedia.extraction/core/
License: GNU GENERAL PUBLIC LICENSE
Latest release: 3.9
published over 10 years ago

Versions: 3
Dependent Packages: 11
Dependent Repositories: 23
Docker Downloads: 53

Rankings

Dependent repos count: 5.0%

Dependent packages count: 5.6%

Docker downloads count: 5.6%

Average: 8.2%

Forks count: 11.9%

Stargazers count: 12.8%

Last synced: 10 months ago

repo1.maven.org: org.dbpedia.extraction:dump

DBpedia Extraction Framework is a flexible and extensible framework to extract different kinds of structured information from Wikipedia

Homepage: http://www.dbpedia.org
Documentation: https://appdoc.app/artifact/org.dbpedia.extraction/dump/
License: GNU GENERAL PUBLIC LICENSE
Latest release: 3.9
published over 10 years ago

Versions: 3
Dependent Packages: 1
Dependent Repositories: 12

Rankings

Dependent repos count: 7.1%

Forks count: 11.9%

Stargazers count: 12.8%

Average: 16.2%

Dependent packages count: 33.0%

Last synced: 10 months ago

repo1.maven.org: org.dbpedia.extraction:scripts

DBpedia Extraction Framework is a flexible and extensible framework to extract different kinds of structured information from Wikipedia

Homepage: http://www.dbpedia.org
Documentation: https://appdoc.app/artifact/org.dbpedia.extraction/scripts/
License: GNU GENERAL PUBLIC LICENSE
Latest release: 3.9
published over 10 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 3

Rankings

Forks count: 11.9%

Stargazers count: 12.8%

Dependent repos count: 13.8%

Average: 22.2%

Dependent packages count: 50.1%

Last synced: 11 months ago

repo1.maven.org: org.dbpedia.extraction:wiktionary

DBpedia Extraction Framework is a flexible and extensible framework to extract different kinds of structured information from Wikipedia

Homepage: http://www.dbpedia.org
Documentation: https://appdoc.app/artifact/org.dbpedia.extraction/wiktionary/
License: GNU GENERAL PUBLIC LICENSE
Latest release: 3.9
published over 10 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 1

Rankings

Forks count: 11.9%

Stargazers count: 12.8%

Dependent repos count: 20.8%

Average: 23.9%

Dependent packages count: 50.1%

Last synced: 10 months ago

repo1.maven.org: org.dbpedia:extraction

DBpedia Extraction Framework is a flexible and extensible framework to extract different kinds of structured information from Wikipedia

Homepage: http://www.dbpedia.org
Documentation: https://appdoc.app/artifact/org.dbpedia/extraction/
License: GNU GENERAL PUBLIC LICENSE
Latest release: 3.9
published over 10 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 32.0%

Average: 40.4%

Dependent packages count: 48.9%

Last synced: 10 months ago

repo1.maven.org: org.dbpedia.lookup:dbpedia-lookup

DBpedia Lookup is a web service that can be used to look up DBpedia URIs by related keywords

Homepage: http://www.dbpedia.org
Documentation: https://appdoc.app/artifact/org.dbpedia.lookup/dbpedia-lookup/
License: Apache License, Version 2.0
Latest release: 3.1
published over 10 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 32.0%

Average: 40.4%

Dependent packages count: 48.9%

Last synced: 11 months ago

repo1.maven.org: org.dbpedia.extraction:server

DBpedia Extraction Framework is a flexible and extensible framework to extract different kinds of structured information from Wikipedia

Homepage: http://www.dbpedia.org
Documentation: https://appdoc.app/artifact/org.dbpedia.extraction/server/
License: GNU GENERAL PUBLIC LICENSE
Latest release: 3.9
published over 10 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 32.0%

Average: 40.4%

Dependent packages count: 48.9%

Last synced: 10 months ago

repo1.maven.org: org.dbpedia.extraction:live

DBpedia Extraction Framework is a flexible and extensible framework to extract different kinds of structured information from Wikipedia

Homepage: http://www.dbpedia.org
Documentation: https://appdoc.app/artifact/org.dbpedia.extraction/live/
License: GNU GENERAL PUBLIC LICENSE
Latest release: 4.1
published over 10 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 32.0%

Average: 40.4%

Dependent packages count: 48.9%

Last synced: 11 months ago

Dependencies

core/pom.xml maven

com.fasterxml.jackson.core:jackson-databind
com.fasterxml.jackson.module:jackson-module-scala_${scala.compat.version}
com.google.guava:guava
org.apache.commons:commons-compress
org.apache.httpcomponents:httpclient
org.apache.jena:jena-arq
org.jsoup:jsoup
org.scala-lang:scala-library
org.scala-lang:scala-xml
org.scalaj:scalaj-http_${scala.compat.version}
org.scalatest:scalatest_${scala.compat.version}
org.sweble.wikitext:swc-engine
org.sweble.wom3:sweble-wom3 2.1.0
org.wikidata.wdtk:wdtk-datamodel
org.wikidata.wdtk:wdtk-dumpfiles
uk.ac.ed.ph.snuggletex:snuggletex-core
uk.ac.ed.ph.snuggletex:snuggletex-jeuclid
uk.ac.ed.ph.snuggletex:snuggletex-upconversion

dump/pom.xml maven

com.github.pathikrit:better-files_${scala.compat.version}
com.github.scopt:scopt_2.11
com.google.guava:guava
commons-io:commons-io 2.6
mysql:mysql-connector-java
org.aksw.rdfunit:rdfunit-validate
org.apache.hadoop:hadoop-common 3.2.1
org.apache.hadoop:hadoop-mapreduce-client-core 3.2.1
org.apache.spark:spark-core_${scala.compat.version}
org.apache.spark:spark-sql_${scala.compat.version}
org.dbpedia.extraction:core
org.dbpedia.extraction:scripts
org.scala-lang:scala-xml
org.scalatest:scalatest_${scala.compat.version} 3.0.8 test

live/pom.xml maven

org.slf4j:slf4j-api 1.7.7 compile
org.slf4j:slf4j-log4j12 1.7.7 compile
com.fasterxml.jackson.module:jackson-module-scala_2.11 2.5.2
com.google.code.gson:gson 2.2.2
com.jolbox:bonecp 0.8.0.RELEASE
com.lightbend.akka:akka-stream-alpakka-sse_2.11 0.10
com.typesafe.akka:akka-actor_2.11 2.5.19
com.typesafe.akka:akka-slf4j_2.11 2.5.19
com.typesafe.akka:akka-stream_2.11 2.5.19
io.socket:socket.io-client 0.2.1
log4j:log4j 1.2.17
mysql:mysql-connector-java 5.1.26
net.sourceforge.collections:collections-generic 4.01
org.apache.commons:commons-lang3 3.1
org.dbpedia.extraction:core
org.dspace:oclc-harvester2 0.1.12
org.ini4j:ini4j 0.5.2
org.scala-lang:scala-xml
org.scalatest:scalatest_2.11
xalan:xalan 2.7.1

pom.xml maven

com.databricks:spark-xml_2.11 0.4.1
com.fasterxml.jackson.core:jackson-core 2.6.0
com.fasterxml.jackson.core:jackson-databind 2.6.0
com.fasterxml.jackson.module:jackson-module-scala_2.11 2.6.0
com.github.pathikrit:better-files_2.11 3.8.0
com.github.scopt:scopt_2.11 3.7.1
com.google.guava:guava 26.0-jre
mysql:mysql-connector-java 5.1.20
org.apache.commons:commons-compress 1.4.1
org.apache.httpcomponents:httpclient 4.3.4
org.apache.jena:jena-arq 3.7.0
org.apache.spark:spark-core_2.11 2.2.1
org.apache.spark:spark-sql_2.11 2.2.1
org.dbpedia.extraction:core 4.2-SNAPSHOT
org.dbpedia.extraction:scripts 4.2-SNAPSHOT
org.jsoup:jsoup 1.8.3
org.scala-lang:scala-actors 2.11.4
org.scala-lang:scala-library 2.11.4
org.scala-lang:scala-reflect 2.11.4
org.scala-lang:scala-xml 2.11.0-M4
org.scalaj:scalaj-http_2.11 2.2.1
org.sweble.wikitext:swc-engine 2.1.0
org.sweble.wom3:sweble-wom3 2.1.0
org.wikidata.wdtk:wdtk-datamodel 0.11.1
org.wikidata.wdtk:wdtk-dumpfiles 0.8.0
uk.ac.ed.ph.snuggletex:snuggletex-core 1.2.2
uk.ac.ed.ph.snuggletex:snuggletex-jeuclid 1.2.2
uk.ac.ed.ph.snuggletex:snuggletex-upconversion 1.2.2
junit:junit 4.12 test
org.aksw.rdfunit:rdfunit-core 0.8.21 test
org.aksw.rdfunit:rdfunit-io 0.8.21 test
org.aksw.rdfunit:rdfunit-model 0.8.21 test
org.aksw.rdfunit:rdfunit-validate 0.8.21 test
org.scalatest:scalatest_2.11 2.2.1 test

scripts/pom.xml maven

com.fasterxml.jackson.core:jackson-core 2.5.0
commons-validator:commons-validator 1.5.1
org.dbpedia.extraction:core
org.openrdf.sesame:sesame-rio-jsonld 2.8.0
org.openrdf.sesame:sesame-rio-nquads 2.8.0
org.openrdf.sesame:sesame-rio-rdfxml 2.8.0
org.openrdf.sesame:sesame-rio-turtle 2.8.0
org.scala-lang:scala-reflect 2.11.4
org.scalaj:scalaj-http_2.11 2.2.1
org.scalatest:scalatest_2.11

server/pom.xml maven

com.sun.jersey:jersey-server 1.12
org.apache.jena:apache-jena-libs 3.7.0
org.dbpedia.extraction:core
org.scala-lang:scala-actors
org.scala-lang:scala-xml
org.scalatest:scalatest_2.11

wiktionary/pom.xml maven

org.dbpedia.extraction:core
org.dbpedia.extraction:dump ${project.version}
org.openrdf:openrdf-model 1.2.7
org.springframework:spring 2.5.6

.github/workflows/maven.yml actions

act10ns/slack v1 composite
actions/checkout v2 composite
actions/setup-java v1 composite

.github/workflows/minidumpdoc.yml actions

EndBug/add-and-commit v4.4.0 composite
actions/checkout v2 composite
actions/setup-java v1 composite

Dockerfile docker

maven 3-jdk-8 build
