neo4j-graph-database

Modeling a high-energy physics citation network in Neo4j, importing data from CSV files, and executing Cypher queries. Tasks include designing a property graph model, importing data, and querying the database for various insights. The deliverables comprise a report detailing the graph model, import commands, Cypher queries with results and script

https://github.com/lefteris-souflas/neo4j-graph-database

Keywords

cypher-query-language graph-database graph-model neo4j
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: Lefteris-Souflas
  • License: mit
  • Language: Cycript
  • Default Branch: main
  • Homepage:
  • Size: 11.4 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

Neo4j Graph Database Assignment

Assignment 2 for the Mining Big Datasets Course of AUEB's MSc in Business Analytics.

Dataset

You are provided with a subset of the high energy physics theory citation network, comprising authors, articles, journals, and citations between articles. The dataset contains:

  • 29,555 articles with id, title, year, journal, and abstract
  • 15,420 authors with names
  • 836 journals with names
  • 352,807 citations among papers

You can download the dataset (Citation Dataset) from Moodle in CSV format. The dataset files include:

  • ArticleNodes.csv: Information about Article nodes (id, title, year, journal, and abstract).
  • AuthorNodes.csv: Article id and the name of the author(s).
  • Citations.csv: Information about citations between articles (articleId,--[Cites]->, articleId).

Property Graph Model

Model the data as a property graph by designing the appropriate entities and assigning the relevant labels, types, and properties. Include attributes that describe each node and edge type without repetitions. Ensure nodes are connected only when necessary.
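One possible reading of "without repetitions" is that a journal name is stored once on its own node rather than copied onto every article. The sketch below illustrates that idea with a tiny in-memory graph in Python. The labels (Article, Author, Journal), the relationship types (WROTE, PUBLISHED_IN, CITES), and the sample rows are all assumptions made for illustration; the assignment does not prescribe them.

```python
from dataclasses import dataclass, field


@dataclass
class PropertyGraph:
    """Tiny in-memory property graph: labelled nodes plus typed edges."""
    nodes: dict = field(default_factory=dict)  # (label, key) -> properties
    edges: set = field(default_factory=set)    # (source, type, target)

    def merge_node(self, label, key, **props):
        # Behaves like Cypher MERGE: create the node once, then reuse it.
        node_id = (label, key)
        self.nodes.setdefault(node_id, {}).update(props)
        return node_id

    def merge_edge(self, source, rel_type, target):
        # A set keeps each (source, type, target) triple only once.
        self.edges.add((source, rel_type, target))


def add_article(graph, article_id, title, year, journal, abstract, authors):
    # Hypothetical labels and relationship types (assumptions, not the author's model).
    article = graph.merge_node("Article", article_id,
                               title=title, year=year, abstract=abstract)
    if journal:
        # The journal name lives on one Journal node, not on every article.
        journal_node = graph.merge_node("Journal", journal, name=journal)
        graph.merge_edge(article, "PUBLISHED_IN", journal_node)
    for name in authors:
        author = graph.merge_node("Author", name, name=name)
        graph.merge_edge(author, "WROTE", article)
    return article


# Made-up sample rows, not taken from the dataset.
graph = PropertyGraph()
a1 = add_article(graph, 1, "Paper one", 1998, "Journal X", "On gravity.",
                 ["A. Author", "B. Author"])
a2 = add_article(graph, 2, "Paper two", 2001, "Journal X", "On strings.",
                 ["A. Author"])
graph.merge_edge(a2, "CITES", a1)  # article 2 cites article 1
```

With this layout, authors connect to journals only through the articles they wrote, which is one way to keep nodes connected "only when necessary".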

Importing the Dataset into Neo4j

Create a graph database on Neo4j and load the citation network elements using the provided CSV files. You can load the dataset directly from the CSV files using the Neo4j browser, Neo4j import tool, or any supported programming language. Consider creating proper indexes on your model properties to improve loading and query response times.
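As a rough sketch of the indexing advice, the Python fragment below keeps a few Cypher statements as strings and splits citation pairs into batches that could be sent as a `$rows` parameter. The labels, property names, and constraint names are assumptions, and the `FOR ... REQUIRE` constraint syntax assumes a recent Neo4j version (older releases use a different form). The helper names `batched` and `citation_batches` are hypothetical.

```python
from itertools import islice

# Uniqueness constraints also create the backing indexes that MERGE and MATCH
# lookups rely on. All names below are assumptions, not the author's schema.
SCHEMA_STATEMENTS = [
    "CREATE CONSTRAINT article_id IF NOT EXISTS "
    "FOR (a:Article) REQUIRE a.id IS UNIQUE",
    "CREATE CONSTRAINT author_name IF NOT EXISTS "
    "FOR (a:Author) REQUIRE a.name IS UNIQUE",
    "CREATE CONSTRAINT journal_name IF NOT EXISTS "
    "FOR (j:Journal) REQUIRE j.name IS UNIQUE",
]

# Parameterised statement: each batch of rows is passed in as $rows.
MERGE_CITATIONS = """
UNWIND $rows AS row
MATCH (src:Article {id: row.source})
MATCH (dst:Article {id: row.target})
MERGE (src)-[:CITES]->(dst)
"""


def batched(rows, size):
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def citation_batches(pairs, size=10_000):
    """Turn (citing_id, cited_id) pairs into parameter batches for $rows."""
    rows = ({"source": s, "target": t} for s, t in pairs)
    yield from batched(rows, size)
```

Creating the constraints before loading means every lookup during the import can use an index. Batching keeps each transaction small, which matters for the 352,807 citation rows. With the official Python driver, each batch could be sent with something like `session.run(MERGE_CITATIONS, rows=batch)`.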

Querying the Database

Execute the following queries using the Cypher language:

  1. Identify the top 5 authors with the most citations from other papers.
  2. Determine the top 5 authors with the most collaborations with different authors.
  3. Find the author who has written the most papers without collaborations.
  4. Discover the author who published the most papers in 2001.
  5. Identify the journal with the most papers about "gravity" in 1998.
  6. Find the top 5 papers with the most citations.
  7. Retrieve papers that mention both "holography" and "anti de sitter" in the abstract.
  8. Find the shortest path between two authors ('C.N. Pope' and 'M. Schweda').
  9. Repeat the previous query but only using edges between authors and papers.
  10. Find all authors with shortest path lengths > 25 from author 'Edward Witten' considering only edges between authors and articles.
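To make the intent of two kinds of query concrete, here is a small plain-Python reference sketch: a citation count (the idea behind query 6) and a breadth-first search restricted to author-article edges (the idea behind queries 9 and 10). It is not the repository's Cypher; the function names and toy data are hypothetical, and in Neo4j the path queries would more likely use Cypher's `shortestPath` function.

```python
from collections import Counter, deque


def top_cited(citations, k=5):
    """Return the k most cited article ids with their citation counts.

    citations: iterable of (citing_id, cited_id) pairs.
    """
    return Counter(cited for _, cited in citations).most_common(k)


def shortest_hops(wrote, start):
    """Breadth-first search over author-article edges only.

    wrote: iterable of (author_name, article_id) pairs.
    Returns {author_name: hops} for every author reachable from `start`.
    Each edge counts as one hop, so two co-authors of the same article
    are 2 hops apart.
    """
    neighbours = {}
    for author, article in wrote:
        a, p = ("author", author), ("article", article)
        neighbours.setdefault(a, set()).add(p)
        neighbours.setdefault(p, set()).add(a)

    source = ("author", start)
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in dist:  # first visit is the shortest distance in BFS
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {name: d for (kind, name), d in dist.items() if kind == "author"}


# Toy data: A and B share article 1, B and C share article 2, D is isolated.
wrote = [("A", 1), ("B", 1), ("B", 2), ("C", 2), ("D", 3)]
hops = shortest_hops(wrote, "A")
far_authors = [name for name, d in hops.items() if d > 2]
```

Under this hop-counting convention, query 10 amounts to keeping the authors whose distance from 'Edward Witten' exceeds 25. Authors with no connecting path, such as D in the toy data, do not appear in the result at all.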

Assignment Handout

Your deliverable should include:

  1. Report.pdf:
    • Detailed graph model description.
    • Commands used for importing files to the database.
    • Cypher code for required queries with results.
  2. Program/Script: Implementations for any step of the assignment.
  3. queries.cy: A text file containing the queries expressed in Cypher language.

Owner

  • Name: Lefteris Souflas
  • Login: Lefteris-Souflas
  • Kind: user
  • Location: Athens, Greece

Data-Engineering / Data-Science / Business-Analytics enthusiast, holding a Master of Science in Business Analytics with a specialization in DB Administration

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 7
  • Total Committers: 1
  • Avg Commits per committer: 7.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Lefteris Souflas 1****s 7

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0