tfg

Scrape the fediverse to create a social graph

https://github.com/pecuchetian/tfg

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (4.6%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: pecuchetian
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 784 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 2 years ago · Last pushed about 2 years ago
Metadata Files
Readme License Citation

README.html

1. Connections


# connect to my hetzner instance

ssh pecuchet@[your-server-address] -i .ssh/id_ed25519

# connect with neo4j and jupyter tunnels


ssh -N -L 8888:[your-server-address]:8888 -L 7474:[your-server-address]:7474 -L 7687:[your-server-address]:7687  [your-server-address]  -i ~/.ssh/id_ed25519
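
Once the tunnel command is running, it can be handy to confirm that the three forwarded ports are listening locally. The following is a minimal sketch, not part of the repository: the names `TUNNEL_PORTS`, `port_open` and `check_tunnels` are made up for illustration, and the check only shows that the local end of each tunnel accepts connections, not that Jupyter or Neo4j are healthy on the server.

```python
import socket

# Local ends of the tunnels opened by the ssh command above
# (8888 for Jupyter, 7474 and 7687 for Neo4j). The labels are illustrative.
TUNNEL_PORTS = {"jupyter": 8888, "neo4j-http": 7474, "neo4j-bolt": 7687}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_tunnels(host: str = "127.0.0.1") -> dict[str, bool]:
    """Map each tunnel label to whether its local port accepts connections."""
    return {name: port_open(host, port) for name, port in TUNNEL_PORTS.items()}


if __name__ == "__main__":
    for name, ok in check_tunnels().items():
        print(f"{name}: {'up' if ok else 'down'}")
```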

2. Python and jupyter

2.1. Create venv and install jupyter in it


cd /home/pecuchet/UOC

python3 -m venv tfg-venv

source tfg-venv/bin/activate

# python3 -m pip install jupyter  (first run only)

jupyter lab

Start Jupyter as shown above.

To launch it in the background later, keep this script ready:
# start_tfg_jupyter
source /home/pecuchet/UOC/tfg-venv/bin/activate

nohup jupyter lab &

To use Jupyter remotely without an SSH tunnel, follow the [instructions here](https://dbusteed.github.io/setup-jupyter-lab-on-remote-server/ "instructions").

Then open http://[your-server-address]:8000/lab on your local machine.

3. Neo4j

Access the Neo4j Browser running on the server through an SSH tunnel from your local machine.

# use sudo for restarting neo4j

cypher-shell

# user: neo4j, password: [PASSWORD]

# ssh tunnel for browser

ssh -N -L 7474:[your-server-address]:7474 -L 7687:[your-server-address]:7687  [your-server-address]  -i ~/.ssh/id_ed25519

# visit on local machine http://localhost:7474/browser/

3.1. Data model

A sample User node and a sample Server node look like this:

User: {
  "identity": 0,
  "labels": [
    "User"
  ],
  "properties": {
    "server": "https://mastodon.eugasser.com",
    "followers": 11,
    "following": 62,
    "name": "pecuchet",
    "uri": "https://mastodon.eugasser.com/users/pecuchet"
  },
  "elementId": "0"
}

Server: {
  "identity": 3856,
  "labels": [
    "Server"
  ],
  "properties": {
    "url": "https://mastodon.eugasser.com"
  },
  "elementId": "3856"
}

The graph uses the following relationships:

(:User)-[:FOLLOWS]->(:User)

(:User)-[:IN_COMUNITY]->(:Server)

(:User)-[:SCRAPED_ON]->(:Round)  
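
To make the model concrete, here is a hedged sketch of how a scraped user could be written to Neo4j with `MERGE`, which creates a node or relationship only when it does not exist yet. It only builds the Cypher text and parameters, so it runs without a database. The `User` dataclass and the `merge_user` and `merge_follows` helpers are illustrative names, not code from the repository, and treating `uri` as the unique key for a user is an assumption.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class User:
    """Mirrors the properties of the sample User node (illustrative)."""
    uri: str
    name: str
    server: str
    followers: int = 0
    following: int = 0


def merge_user(user: User) -> tuple[str, dict]:
    """Build Cypher and parameters that upsert a user and its home server.

    Assumption: `uri` identifies a user uniquely, so MERGE on it
    does not create duplicate nodes.
    """
    query = (
        "MERGE (u:User {uri: $uri}) "
        "SET u.name = $name, u.server = $server, "
        "u.followers = $followers, u.following = $following "
        "MERGE (s:Server {url: $server}) "
        "MERGE (u)-[:IN_COMUNITY]->(s)"
    )
    return query, asdict(user)


def merge_follows(follower_uri: str, followed_uri: str) -> tuple[str, dict]:
    """Build Cypher and parameters for a (:User)-[:FOLLOWS]->(:User) edge."""
    query = (
        "MERGE (a:User {uri: $follower}) "
        "MERGE (b:User {uri: $followed}) "
        "MERGE (a)-[:FOLLOWS]->(b)"
    )
    return query, {"follower": follower_uri, "followed": followed_uri}
```

The returned pair can be passed to whatever executes queries, for example a session's `run(query, params)` in the Neo4j Python driver.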

3.2. Concurrency/Threading
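
The TODO list in section 4 mentions a SetQueue that controls duplicates and a pool of worker threads. The repository's actual implementation is not shown here, so the following is only a sketch of the idea under stated assumptions: `SetQueue` wraps `queue.Queue` and remembers every item it has ever accepted, and `crawl` with its `neighbours` callback is a hypothetical stand-in for the scraping loop.

```python
import queue
import threading


class SetQueue:
    """Thread-safe FIFO queue that accepts each item at most once."""

    def __init__(self):
        self._queue = queue.Queue()
        self._seen = set()          # every item ever accepted
        self._lock = threading.Lock()

    def put(self, item) -> bool:
        """Enqueue item unless it was seen before; return True if enqueued."""
        with self._lock:
            if item in self._seen:
                return False
            self._seen.add(item)
        self._queue.put(item)
        return True

    def get(self, block=True, timeout=None):
        return self._queue.get(block, timeout)

    def task_done(self):
        self._queue.task_done()

    def join(self):
        self._queue.join()

    def qsize(self) -> int:
        return self._queue.qsize()


def crawl(seeds, neighbours, n_workers=4):
    """Visit every item reachable from seeds once, using worker threads."""
    q = SetQueue()
    visited = []
    visited_lock = threading.Lock()

    def worker():
        while True:
            item = q.get()
            try:
                with visited_lock:
                    visited.append(item)
                for nxt in neighbours(item):
                    q.put(nxt)      # duplicates are dropped here
            finally:
                q.task_done()

    for seed in seeds:
        q.put(seed)
    for _ in range(n_workers):
        # Daemon threads stay blocked on get() and exit with the process.
        threading.Thread(target=worker, daemon=True).start()
    q.join()                        # returns once every accepted item is done
    return visited
```

Wrapping the queue instead of overriding its internal `_put` keeps `join()` accurate, because a dropped duplicate never reaches the underlying queue's task counter.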

4. State of things / TODOs

  1. DONE Solve duplicate entries in the DB
  2. PARALLELIZE [3/4]:
    1. DONE Queue object is SetQueue, which allows control of duplicates.
    2. DONE Clean users.py and use the Mastodon API when the AP (ActivityPub) endpoint returns a 401 response
    3. TODO Implement some sort of max-retries/endless-loop control
    4. DONE Implement a done attribute in Neo4j as a FINISHEDON relationship. Create nodes of type :Timestamp with a timestamp attribute
      1. DONE Relationship is (:User)-[:SCRAPED_ON]->(:Round), where Round has attributes Round.id = 1, 2, 3 and Round.startedon, Round.finishedon set with timestamp().
      2. DONE Mark (:User) as scraped when we receive a non-200 response or 0 friends. Call verifyfriendcount or something similar.
  3. TODO Fix request/response bugs
  4. TODO Improve general scrape speed and keep the number of worker threads consistent.
  5. TODO Neo4j hangs every 24 hours. Find out why and fix it.
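
For the open max-retries item, one possible shape is a small wrapper that gives up after a fixed number of attempts instead of looping forever. This is a sketch only: `with_retries`, `RetriesExhausted`, the default of 3 attempts and the exponential backoff are all assumptions, not decisions taken in the project.

```python
import time


class RetriesExhausted(Exception):
    """Raised when every attempt has failed (hypothetical name)."""


def with_retries(fetch, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or max_retries attempts have failed.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... seconds
    between attempts, and never loops more than max_retries times.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            return fetch()
        except Exception as exc:
            last_error = exc
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise RetriesExhausted(f"gave up after {max_retries} attempts") from last_error
```

A worker could wrap each request in `with_retries` and mark the user as scraped when `RetriesExhausted` is raised, so a failing server cannot keep a thread busy indefinitely.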

Created: 2023-11-21 Tue 12:02

Owner

  • Login: pecuchetian
  • Kind: user
  • Location: Empordà

Long term website dev. Currently learning Data Science @ UOC

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Fediverse social graph
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Marc
    family-names: Deixt
    email: mdeixt@uoc.edu
    affiliation: UOC
abstract: >-
  Fediverse social graph is a project aimed at creating a
  social graph of fediverse users and their
  following/followers relationships. It is part of the final
  project in the Applied Data Science degree of Marc Deixt.
keywords:
  - networks
  - graph
  - scraping
  - fediverse
  - neo4j
  - python
license: CC-BY-NC-1.0
