neo4j-graph-database
Modeling a high-energy physics citation network in Neo4j, importing data from CSV files, and executing Cypher queries. Tasks include designing a property graph model, importing data, and querying the database for various insights. The deliverables comprise a report detailing the graph model, import commands, Cypher queries with results and script
README.md
Neo4j Graph Database Assignment
Assignment 2 for the Mining Big Datasets Course of AUEB's MSc in Business Analytics.
Dataset
You are provided with a subset of the high-energy physics theory citation network, comprising authors, articles, journals, and citations between articles. The dataset contains:
- 29,555 articles with id, title, year, journal, and abstract
- 15,420 authors with names
- 836 journals with names
- 352,807 citations among papers
You can download the dataset (Citation Dataset) from Moodle in CSV format. The dataset files include:
- ArticleNodes.csv: Information about Article nodes (id, title, year, journal, and abstract).
- AuthorNodes.csv: Article id and the name of the author(s).
- Citations.csv: Information about citations between articles (articleId,--[Cites]->, articleId).
Property Graph Model
Model the data as a property graph by designing the appropriate entities and assigning the relevant labels, types, and properties. Include attributes that describe each node and edge type without repetitions. Ensure nodes are connected only when necessary.
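One possible shape for such a model is sketched below in Python, so that labels, relationship types, and properties are written out explicitly. The labels `Article`, `Author`, and `Journal`, the relationship types `WROTE`, `PUBLISHED_IN`, and `CITES`, and the helper `undeclared_labels` are all assumptions made for illustration, not names fixed by the assignment. Treating the journal as its own node, instead of a property copied onto every article, is one way to avoid repetition.

```python
# Hypothetical schema: labels, relationship types, and property names are
# illustrative assumptions, not names fixed by the assignment.
NODE_PROPERTIES = {
    "Article": ["id", "title", "year", "abstract"],
    "Author": ["name"],
    "Journal": ["name"],
}

# Each entry is (start label, relationship type, end label).
RELATIONSHIP_TYPES = [
    ("Author", "WROTE", "Article"),
    ("Article", "PUBLISHED_IN", "Journal"),
    ("Article", "CITES", "Article"),
]


def undeclared_labels(node_properties, relationship_types):
    """Return labels used by a relationship but missing from the node schema."""
    used = {label for start, _, end in relationship_types for label in (start, end)}
    return sorted(used - set(node_properties))


# An empty list means every relationship connects declared node types.
print(undeclared_labels(NODE_PROPERTIES, RELATIONSHIP_TYPES))
```

In this sketch no relationship carries properties, and authors are not linked to journals directly, since that connection can be derived through the articles they wrote.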
Importing the Dataset into Neo4j
Create a graph database on Neo4j and load the citation network elements using the provided CSV files. You can load the dataset directly from the CSV files using the Neo4j browser, Neo4j import tool, or any supported programming language. Consider creating proper indexes on your model properties to improve loading and query response times.
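If a programming language is used for the import, one common pattern is to create uniqueness constraints first and then send rows in batches to a parameterized `UNWIND` statement. The sketch below only prepares the statements and the batches; the driver call is left out. The constraint names, labels, row keys (`source`, `target`), and sample ids are hypothetical, and the `CREATE CONSTRAINT ... FOR ... REQUIRE` form assumes a recent Neo4j version (older versions use a different syntax).

```python
from itertools import islice

# Assumed Neo4j 5-style syntax; constraint and label names are hypothetical.
CONSTRAINTS = [
    "CREATE CONSTRAINT article_id IF NOT EXISTS FOR (a:Article) REQUIRE a.id IS UNIQUE",
    "CREATE CONSTRAINT author_name IF NOT EXISTS FOR (a:Author) REQUIRE a.name IS UNIQUE",
    "CREATE CONSTRAINT journal_name IF NOT EXISTS FOR (j:Journal) REQUIRE j.name IS UNIQUE",
]

# Parameterized statement: $rows is a list of {source, target} maps.
LOAD_CITATIONS = """
UNWIND $rows AS row
MATCH (citing:Article {id: row.source})
MATCH (cited:Article {id: row.target})
MERGE (citing)-[:CITES]->(cited)
"""


def batches(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def citation_rows(pairs):
    """Turn (citing_id, cited_id) pairs into parameter maps for LOAD_CITATIONS."""
    return [{"source": int(s), "target": int(t)} for s, t in pairs]


# Made-up ids, for illustration only. Each batch would be passed as the
# `rows` parameter of LOAD_CITATIONS, for example through a driver session.
sample = citation_rows([("1001", "1002"), ("1001", "1003"), ("1002", "1003")])
for chunk in batches(sample, 2):
    print(len(chunk), chunk[0])
```

Creating the constraints before loading lets each `MATCH` on `id` use the backing index, which tends to matter most for the citations file because it looks up two articles per row.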
Querying the Database
Execute the following queries using the Cypher language:
- Identify the top 5 authors with the most citations from other papers.
- Determine the top 5 authors with the most collaborations with different authors.
- Find the author who has written the most papers without collaborations.
- Discover the author who published the most papers in 2001.
- Identify the journal with the most papers about "gravity" in 1998.
- Find the top 5 papers with the most citations.
- Retrieve papers that mention both "holography" and "anti de sitter" in the abstract.
- Find the shortest path between two authors ('C.N. Pope' and 'M. Schweda').
- Repeat the previous query but only using edges between authors and papers.
- Find all authors with shortest path lengths > 25 from author 'Edward Witten' considering only edges between authors and articles.
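The path queries that use only author-paper edges can be cross-checked on a small sample with a breadth-first search over the bipartite author-article graph. The sketch below is not the Cypher solution; it is a plain-Python check under the assumption that authorship is available as `(article_id, author_name)` pairs. All function names and the sample data are hypothetical.

```python
from collections import defaultdict, deque


def build_authorship_graph(authorship):
    """Build an undirected bipartite graph from (article_id, author_name) pairs."""
    graph = defaultdict(set)
    for article_id, author in authorship:
        article_node = ("article", article_id)
        author_node = ("author", author)
        graph[article_node].add(author_node)
        graph[author_node].add(article_node)
    return graph


def distances_from(graph, source):
    """Breadth-first search: number of edges from `source` to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def authors_farther_than(graph, author, threshold):
    """Names of reachable authors whose shortest path from `author` exceeds `threshold`."""
    dist = distances_from(graph, ("author", author))
    return sorted(
        name for (kind, name), d in dist.items() if kind == "author" and d > threshold
    )


# Made-up sample: A and B share paper 1, B and C share paper 2.
sample = [(1, "A"), (1, "B"), (2, "B"), (2, "C")]
graph = build_authorship_graph(sample)
print(distances_from(graph, ("author", "A"))[("author", "C")])
print(authors_farther_than(graph, "A", 2))
```

Path length is counted in edges here, so direct co-authors are at distance 2 (author, paper, author). Authors with no connecting path are not reported by `authors_farther_than`, which is a design choice of this sketch.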
Assignment Handout
Your deliverable should include:
1. Report.pdf:
   - Detailed graph model description.
   - Commands used for importing files to the database.
   - Cypher code for required queries with results.
2. Program/Script: Implementations for any step of the assignment.
3. queries.cy: A text file containing the queries expressed in Cypher language.
Owner
- Name: Lefteris Souflas
- Login: Lefteris-Souflas
- Kind: user
- Location: Athens, Greece
- Repositories: 1
- Profile: https://github.com/Lefteris-Souflas
Data-Engineering / Data-Science / Business-Analytics enthusiast, holding a Master of Science in Business Analytics with a specialization in DB Administration