Repository
A Crawler to search for keywords and compare the score
Statistics
- Stars: 0
- Watchers: 2
- Forks: 1
- Open Issues: 31
- Releases: 1
README.md
UniDisk
A Crawler to search for keywords and compare the score
Installation
Prerequisites
- Java 8
- Maven
- Tomcat 8.X
- IntelliJ Ultimate (preferred)
Installation steps for primefaces
- Build the artifact.
- Run mvn install.
- Run mvn test.
- Deploy the resulting WAR artifact to Tomcat and start the server.
Docker
We use Docker to simplify development.
Run docker compose up -d to create a development environment with MySQL, Solr, and a Tomcat server.
The API is then available at localhost:8081/unidisk/rest.
For code changes to take effect, the services must be rebuilt (docker-compose up -d --build). Because this takes some time,
it is usually faster to remove the web service container (docker rm -fv unidisk_api_1) and start the server from IntelliJ (or any other way).
If the server is started from IntelliJ, the API is available at localhost:8080/unidiskwar/rest.
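The two base URLs above can be checked with a small smoke test. The following is a minimal sketch that uses only the Java 8 standard library; the class ApiSmokeCheck and its methods are hypothetical and not part of the UniDisk code base. Since this README does not list individual endpoints, the sketch only requests the base path, so any HTTP status (even 404) merely shows that a server is answering on that port.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper for illustration; not part of the UniDisk sources.
final class ApiSmokeCheck {

    // Base URLs exactly as stated in this README.
    static final String DOCKER_BASE = "http://localhost:8081/unidisk/rest";
    static final String INTELLIJ_BASE = "http://localhost:8080/unidiskwar/rest";

    /** Picks the base URL depending on how the server was started. */
    static String baseUrl(boolean startedViaIntelliJ) {
        return startedViaIntelliJ ? INTELLIJ_BASE : DOCKER_BASE;
    }

    /** Sends a GET request and returns the HTTP status code. */
    static int statusOf(String url) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(url).openConnection();
        connection.setRequestMethod("GET");
        connection.setConnectTimeout(2000); // fail fast if nothing is listening
        connection.setReadTimeout(2000);
        try {
            return connection.getResponseCode();
        } finally {
            connection.disconnect();
        }
    }

    public static void main(String[] args) {
        // Pass "intellij" as the first argument to check the IntelliJ deployment.
        String url = baseUrl(args.length > 0 && args[0].equals("intellij"));
        try {
            System.out.println(url + " -> HTTP " + statusOf(url));
        } catch (IOException e) {
            System.out.println(url + " is not reachable: " + e.getMessage());
        }
    }
}
```

Run it without arguments for the Docker setup, or with intellij as the first argument after starting the server from the IDE.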
Admin Dashboards
Go to localhost:8082 to open Adminer (MySQL dashboard) or http://localhost:8983 for the Solr dashboard. Enter the following credentials for Adminer: Server: db, Username: user, Password: secret, Database: unidisk
Authentication
We use Firebase for authentication. Accounts must be created via the Firebase console or CLI. Self sign-up is currently not possible.
Authentication Method
The authentication method can be changed by modifying the authentication property in unidisk.properties. If the value is anything other than firebase, every incoming request that carries a bearer token in its request header is treated as authenticated, whatever the token's value.
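The rule described above can be sketched as follows. This is only an illustration of the decision logic, not the project's actual filter: the class AuthCheck, the method isAuthenticated, and the firebaseVerifier parameter (a stand-in for real Firebase token verification) are all hypothetical names.

```java
import java.util.function.Predicate;

// Hypothetical sketch; names are not taken from the UniDisk sources.
final class AuthCheck {

    private static final String BEARER_PREFIX = "Bearer ";

    /**
     * @param authenticationProperty value of the authentication property in unidisk.properties
     * @param authorizationHeader    raw Authorization header of the request, may be null
     * @param firebaseVerifier       stand-in for real Firebase token verification
     */
    static boolean isAuthenticated(String authenticationProperty,
                                   String authorizationHeader,
                                   Predicate<String> firebaseVerifier) {
        if (authorizationHeader == null || !authorizationHeader.startsWith(BEARER_PREFIX)) {
            return false; // no bearer token at all
        }
        String token = authorizationHeader.substring(BEARER_PREFIX.length()).trim();
        if (token.isEmpty()) {
            return false;
        }
        if ("firebase".equals(authenticationProperty)) {
            return firebaseVerifier.test(token); // the token itself must be valid
        }
        return true; // any other value: the mere presence of a token is enough
    }

    public static void main(String[] args) {
        Predicate<String> rejectAll = token -> false; // pretend Firebase rejects every token
        System.out.println(isAuthenticated("firebase", "Bearer abc", rejectAll));
        System.out.println(isAuthenticated("none", "Bearer abc", rejectAll));
    }
}
```

The second call in main shows why a non-firebase setting is only suitable for local development: the token is never checked.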
Database
The project contains configuration files for an in-memory database as well as a MySQL database. The configuration can be changed by replacing the content of the config variable in the HibernateUtil class.
Additional configurations can be added by placing a configuration file into the resource directory. Then reference that file in the HibernateUtil class by pointing the config variable at it:
public class HibernateUtil {
    private static final String someOtherConfig = "hibernate.foo.bar.cfg.xml";
    private static String config = someOtherConfig;
    ...
}
Mock setup
You can populate the database by changing the content of the TestSetupBean class. The init function stores predefined data from the ApplicationState in the currently configured database every time the server starts. Unless you use an in-memory database, it is therefore recommended to remove the content of the init function after the first start of the app.
In-Memory Problems
If you are on a page that requires a specific entity id (e.g. the project page) and redeploy the server, the former id might no longer exist and the page cannot load.
MySQL Development Problems
org.hibernate.id.IdentifierGenerationException: could not read a hi value - you need to populate the table: hibernate_sequence
This error occurs if you truncate the hibernate_sequence table. Delete the database instance and restart the server to fix this problem.
Solr
Follow the installation guide from https://lucene.apache.org/solr/guide/7_0/index.html.
Setup
If Solr isn't running, execute solr start -p 8983. The rest of the setup assumes port 8983.
If you choose another port, adapt the URLs and commands accordingly.
Firebase
Ask one of the maintainers for the Firebase service account file and place it into crawler/src/main/resources as firebase-sa.json. This is only necessary if you want to use Firebase authentication.
Create Core
Run solr create -c unidisc. After the command has finished, you should see the message
Created new core 'unidisc' in the console/terminal. The unidisc core section should now be
available at http://localhost:8983/solr/#/unidisc/core-overview.
Verify Setup
The test case testFieldInputAndQuery in SimpleCrawlTest should now run successfully.
You can also run shootTheMoon, which crawls websites and posts the results to Solr. The test doesn't
terminate, but the number of documents in the unidisc core should increase.
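To watch the document count grow without opening the dashboard, the core can be queried over Solr's HTTP select endpoint. The following is a minimal sketch using only the Java 8 standard library; the class SolrDocCount is hypothetical, the port and core name are the defaults used in this README, and the regular expression is a shortcut that assumes the usual JSON response containing a numFound field (a real client would use SolrJ or a JSON parser).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the UniDisk sources.
final class SolrDocCount {

    // Match-all query with rows=0: only the total count is of interest.
    static final String SELECT_URL =
            "http://localhost:8983/solr/unidisc/select?q=*:*&rows=0&wt=json";

    private static final Pattern NUM_FOUND = Pattern.compile("\"numFound\"\\s*:\\s*(\\d+)");

    /** Extracts the numFound value from a Solr JSON response body. */
    static long parseNumFound(String json) {
        Matcher matcher = NUM_FOUND.matcher(json);
        if (!matcher.find()) {
            throw new IllegalArgumentException("no numFound field in response");
        }
        return Long.parseLong(matcher.group(1));
    }

    /** Reads the whole response body of a GET request as a string. */
    static String fetch(String url) throws IOException {
        StringBuilder body = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new URL(url).openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                body.append(line).append('\n');
            }
        }
        return body.toString();
    }

    public static void main(String[] args) {
        try {
            System.out.println("Documents in unidisc: " + parseNumFound(fetch(SELECT_URL)));
        } catch (IOException e) {
            System.out.println("Solr is not reachable: " + e.getMessage());
        }
    }
}
```

Running it a few times while shootTheMoon is active should print an increasing number.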
Owner
- Name: Jan Bernoth
- Login: B3J4y
- Kind: user
- Twitter: JanBernoth
- Repositories: 19
- Profile: https://github.com/B3J4y
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: UniDisk
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Jan
    family-names: Bernoth
    email: jan.bernoth@uni-potsdam.de
    affiliation: University of Potsdam
    orcid: 'https://orcid.org/0000-0002-4127-0053'
  - given-names: Julian
    family-names: Dehne
    orcid: 'https://orcid.org/0000-0001-9265-9619'
  - given-names: Tim
    family-names: Sauvageot
Dependencies
- javax.servlet:javax.servlet-api 3.0.1 provided
- org.apache.tomcat:tomcat-catalina 7.0.40 provided
- com.fasterxml.jackson.core:jackson-annotations 2.11.1
- com.fasterxml.jackson.core:jackson-databind 2.11.1
- com.google.code.gson:gson 2.8.6
- com.google.firebase:firebase-admin 7.3.0
- com.h2database:h2 1.4.197
- com.sun.xml.bind:jaxb-impl 2.3.0
- commons-codec:commons-codec 1.9
- commons-lang:commons-lang 2.6
- edu.uci.ics:crawler4j 4.4.0
- javax.activation:activation 1.1.1
- javax.ws.rs:javax.ws.rs-api 2.0.1
- javax.xml.bind:jaxb-api 2.3.0
- junit:junit 4.13.1
- log4j:log4j 1.2.17
- mysql:mysql-connector-java 8.0.23
- org.apache.commons:commons-lang3 3.5
- org.apache.commons:commons-math3 3.6.1
- org.apache.solr:solr-solrj 8.1.0
- org.glassfish.jaxb:jaxb-runtime 2.3.0
- org.glassfish.jersey.containers:jersey-container-servlet 2.19
- org.glassfish.jersey.core:jersey-client 2.19
- org.glassfish.jersey.core:jersey-server 2.19
- org.glassfish.jersey.media:jersey-media-json-jackson 2.25
- org.hibernate:hibernate-core 5.4.30.Final
- org.hibernate:hibernate-jpamodelgen 5.4.30.Final
- org.jboss.weld.servlet:weld-servlet-shaded 3.0.2.Final
- org.jsoup:jsoup 1.12.1
- uk.co.jemos.podam:podam 7.1.1.RELEASE
- org.junit.jupiter:junit-jupiter-api 5.3.2 test
- org.junit.jupiter:junit-jupiter-engine 5.3.2 test
- org.junit.jupiter:junit-jupiter-params 5.3.2 test
- org.mockito:mockito-core 2.23.4 test
- html-to-image 0.1.1
- 1860 dependencies
- @babel/core ^7.12.13 development
- @babel/plugin-transform-runtime ^7.12.15 development
- @babel/preset-env ^7.12.13 development
- @types/jest ^24.9.1 development
- @types/node ^12.19.16 development
- @types/react ^16.9.34 development
- @types/react-dom ^16.9.6 development
- @types/react-router-dom ^5.1.4 development
- @typescript-eslint/eslint-plugin ^2.34.0 development
- @typescript-eslint/parser ^2.34.0 development
- eslint ^6.6.0 development
- eslint-config-prettier ^6.11.0 development
- eslint-plugin-import ^2.22.0 development
- eslint-plugin-prettier ^3.1.4 development
- eslint-plugin-simple-import-sort ^5.0.3 development
- faker ^4.1.0 development
- jest ^26.6.3 development
- prettier ^2.0.5 development
- ts-jest ^26.5.1 development
- tsconfig-paths ^3.9.0 development
- typescript ~3.9.7 development
- @material-ui/core ^4.9.11
- @material-ui/icons ^4.9.1
- @material-ui/lab ^4.0.0-alpha.50
- @testing-library/jest-dom ^4.2.4
- @testing-library/react ^9.5.0
- @testing-library/user-event ^7.2.1
- @tinymce/tinymce-react ^3.8.0
- axios ^0.21.1
- firebase ^8.6.8
- html-to-image ^1.3.21
- material-table ^1.69.2
- moment ^2.24.0
- ol ^6.5.0
- query-string ^6.12.1
- react ^16.13.1
- react-beautiful-dnd ^13.0.0
- react-dom ^16.13.1
- react-router-dom ^5.1.2
- react-scripts 3.4.1
- styled-components ^5.1.0
- unstated-next ^1.1.0
- unstated-typescript ^2.1.7
- matplotlib *
- pandas *
- seaborn *
- cycler ==0.10.0
- kiwisolver ==1.3.2
- matplotlib ==3.4.3
- numpy ==1.21.2
- pandas ==1.3.3
- pillow ==8.3.2
- pyparsing ==2.4.7
- python-dateutil ==2.8.2
- pytz ==2021.1
- scipy ==1.7.1
- seaborn ==0.11.2
- six ==1.16.0
- fastapi *
- numpy *
- top2vec *
- uvicorn *
- asgiref ==3.4.1
- click ==8.0.1
- cycler ==0.10.0
- cython ==0.29.24
- fastapi ==0.68.1
- gensim ==3.8.3
- h11 ==0.12.0
- hdbscan ==0.8.27
- importlib-metadata ==4.7.1
- joblib ==1.0.1
- kiwisolver ==1.3.2
- llvmlite ==0.37.0
- matplotlib ==3.4.3
- numba ==0.54.0
- numpy ==1.20.3
- pandas ==1.3.2
- pillow ==8.3.1
- pydantic ==1.8.2
- pynndescent ==0.5.4
- pyparsing ==2.4.7
- python-dateutil ==2.8.2
- pytz ==2021.1
- scikit-learn ==0.24.2
- scipy ==1.7.1
- six ==1.16.0
- smart-open ==5.2.1
- starlette ==0.14.2
- threadpoolctl ==2.2.0
- top2vec ==1.0.26
- typing-extensions ==3.10.0.0
- umap-learn ==0.5.1
- uvicorn ==0.15.0
- wordcloud ==1.8.1
- zipp ==3.5.0
- maven 3.6.3-jdk-8-slim build
- tomcat 8.5 build
- adminer latest
- mysql 8
- solr 8
- python 3.7 build