networks_gephi

RISE crash course: Gephi

https://github.com/rise-unibas/networks_gephi


Keywords

gephi network-analysis

Repository

RISE crash course: Gephi

Basic Info
  • Host: GitHub
  • Owner: RISE-UNIBAS
  • Default Branch: main
  • Homepage:
  • Size: 1.45 MB
Statistics
  • Stars: 2
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 3 years ago · Last pushed 11 months ago

README.md

Network analysis in the Humanities. Gephi

Introductory course on network analysis and visualization with Gephi.

By José Luis Losada

☞ Course outline

Network analysis in the Humanities

Showcase

Networks

| network | nodes | edges |
|--|--|--|
| Theater Plays | character | co-appearance on the scene |
| Stylometry | plays | stylistic similarity |
| Scientific collaboration | authors | co-authoring |
| ... | ... | ... |

  • Method of representing connection or interaction patterns between parts of a system.

  • The concept of a network presupposes a relational structure that can be studied in two ways: (1) logically and mathematically, through graph theory (the discipline). History: Euler and the seven bridges of Königsberg.

  • (2) Through exploration by visualization.

“Networks are extraordinary calculating devices, but they are also maps, instruments of navigation and representation” (Jacomy 2017: 155).

Basic concepts. Nodes and edges

  • Network: points joined by lines.
  • points: nodes or vertices.
  • lines: edges or links.
  • Attributes: extra information about nodes or edges
  • Types of networks:

Simple Network

Bipartite Network

Multiple Network

Multiple and Directed Network

Formalization and file formats

Formalization

Edgelists, matrices, adjacency lists

Edgelist: a table of structured data with at least two columns: one for the nodes where a connection starts (source) and one for the nodes where it ends (target). Any further columns hold attributes of the edge.

| source | target | weight | lang | type |
|--|--|--|--|--|
| Juan | Elena | 4 | esp | undirected |
| Juan | Hans | 2 | de | undirected |
| Juan | Marta | 1 | eng | undirected |
| Juan | Marek | 1 | de | undirected |
| ... | ... | ... | ... | ... |

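Adjacency matrix: a square matrix (equal number of columns and rows) with one row and one column per node. A cell holds 1 if the two nodes are connected and 0 otherwise; for an undirected network the matrix is symmetric.

| | Juan | Hans | Elena | Marta | Marek |
|--|--|--|--|--|--|
| Juan | 0 | 1 | 1 | 1 | 1 |
| Hans | 1 | 0 | 0 | 1 | 1 |
| Elena | 1 | 0 | 0 | 0 | 0 |
| Marta | 1 | 1 | 0 | 0 | 0 |
| Marek | 1 | 1 | 0 | 0 | 0 |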
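The relation between the two formalizations can be sketched in a few lines of Python. This is a minimal illustration, not part of the course material: it reads the CSV edgelist used as the example in the File Formats section of this README and fills an unweighted, undirected adjacency matrix. The function name `adjacency_matrix` and the fixed node order are assumptions made for the example.

```python
import csv
import io

# Edgelist taken from the CSV example in this README (File Formats).
EDGES_CSV = """source,target,language,weight
Juan,Elena,esp,4
Juan,Hans,de,2
Juan,Marta,eng,1
Juan,Marek,de,1
Juan,Marek,esp,1
Juan,Marek,pol,5
Hans,Marta,eng,1
Hans,Marek,de,1
"""


def adjacency_matrix(rows, order):
    """Return a symmetric 0/1 matrix; rows and columns follow `order`."""
    index = {name: i for i, name in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for row in rows:
        i, j = index[row["source"]], index[row["target"]]
        # Undirected network: mark the connection in both directions.
        matrix[i][j] = 1
        matrix[j][i] = 1
    return matrix


rows = list(csv.DictReader(io.StringIO(EDGES_CSV)))
order = ["Juan", "Hans", "Elena", "Marta", "Marek"]
matrix = adjacency_matrix(rows, order)
for name, line in zip(order, matrix):
    print(name, line)
```

Repeated rows (Juan and Marek appear three times) collapse into a single 1, so this matrix records only whether a connection exists, not its weight or language.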

File Formats

  • CSV. Edgelist in CSV:

```
source,target,language,weight
Juan,Elena,esp,4
Juan,Hans,de,2
Juan,Marta,eng,1
Juan,Marek,de,1
Juan,Marek,esp,1
Juan,Marek,pol,5
Hans,Marta,eng,1
Hans,Marek,de,1
```

  • CSV. Edgelist + Nodes in CSV:

```
source,target
1,4
1,2
1,3

id,Label
1,Juan
2,Hans
3,Marta
4,Elena
```

It is recommended to save structured data in CSV, although Gephi accepts tables in Excel.

  • gexf (XML)

```xml
[...]
      <node id="Marek" label="Marek">
        <attvalues>
          <attvalue for="att1" value="2.0"/>
        </attvalues>
        <viz:size value="4.0"/>
        <viz:position x="-22.013721" y="26.080078"/>
        <viz:color r="255" g="99" b="71"/>
      </node>
    </nodes>
    <edges>
      <edge id="0" source="Juan" target="Hans" weight="2.0"/>
      <edge id="1" source="Juan" target="Elena" weight="4.0"/>
      <edge id="2" source="Juan" target="Marta"/>
      <edge id="3" source="Juan" target="Marek" weight="7.0"/>
      <edge id="4" source="Hans" target="Marta"/>
      <edge id="5" source="Hans" target="Marek"/>
    </edges>
  </graph>
</gexf>
```
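How a CSV edgelist relates to a GEXF file can be sketched with the Python standard library. The example below is an illustration under stated assumptions, not the way Gephi itself converts files: it merges rows that connect the same pair of nodes by summing their weights (so the three Juan and Marek rows become one edge of weight 7, as in the GEXF excerpt above) and writes a minimal GEXF 1.2 document without `viz` or attribute data. The function names `merge_parallel_edges` and `to_gexf` are invented for this sketch.

```python
import csv
import io
import xml.etree.ElementTree as ET

EDGES_CSV = """source,target,language,weight
Juan,Elena,esp,4
Juan,Hans,de,2
Juan,Marta,eng,1
Juan,Marek,de,1
Juan,Marek,esp,1
Juan,Marek,pol,5
Hans,Marta,eng,1
Hans,Marek,de,1
"""


def merge_parallel_edges(rows):
    """Sum the weights of rows that join the same unordered pair of nodes."""
    merged = {}
    for row in rows:
        key = frozenset((row["source"], row["target"]))
        if key not in merged:
            # Keep the orientation of the first row seen for this pair.
            merged[key] = [row["source"], row["target"], 0.0]
        merged[key][2] += float(row["weight"])
    return [tuple(edge) for edge in merged.values()]


def to_gexf(edges):
    """Serialize (source, target, weight) tuples as a minimal GEXF 1.2 string."""
    gexf = ET.Element("gexf", {"xmlns": "http://www.gexf.net/1.2draft", "version": "1.2"})
    graph = ET.SubElement(gexf, "graph", {"defaultedgetype": "undirected"})
    nodes_el = ET.SubElement(graph, "nodes")
    edges_el = ET.SubElement(graph, "edges")
    seen = []
    for source, target, _ in edges:
        for name in (source, target):
            if name not in seen:
                seen.append(name)
                ET.SubElement(nodes_el, "node", {"id": name, "label": name})
    for i, (source, target, weight) in enumerate(edges):
        ET.SubElement(
            edges_el,
            "edge",
            {"id": str(i), "source": source, "target": target, "weight": str(weight)},
        )
    return ET.tostring(gexf, encoding="unicode")


rows = list(csv.DictReader(io.StringIO(EDGES_CSV)))
merged = merge_parallel_edges(rows)
gexf_text = to_gexf(merged)
print(gexf_text)
```

Summing weights is only one possible merge strategy; the language column is dropped here because a merged edge no longer has a single language.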

Visualization (spatialization)

Same graph, different layout.

Bipartite network

Algorithms for drawing the graph

  • Common Gephi Algorithms: Force Atlas, Fruchterman Reingold,...

Metrics

  • Degree centrality: number of connections.
  • Betweenness centrality: bridge nodes.
  • Eigenvector centrality: nodes connected to well-connected nodes.
  • Modularity (Louvain, Leiden algorithms): clusters of nodes.
  • ...
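As a concrete illustration of the simplest metric, the sketch below counts the degree of each node in the small example network from the formalization section and derives the degree distribution. It is a hand-rolled example, not Gephi's implementation; the normalized variant (degree divided by n - 1) is one common convention and is included as an assumption.

```python
from collections import Counter

# Undirected example network from the formalization section (one edge per pair).
EDGES = [
    ("Juan", "Elena"),
    ("Juan", "Hans"),
    ("Juan", "Marta"),
    ("Juan", "Marek"),
    ("Hans", "Marta"),
    ("Hans", "Marek"),
]


def degree(edges):
    """Number of connections per node in an undirected network."""
    counts = Counter()
    for source, target in edges:
        counts[source] += 1
        counts[target] += 1
    return dict(counts)


degrees = degree(EDGES)
# Degree distribution: how many nodes have each degree value.
distribution = Counter(degrees.values())
# Normalized degree centrality: share of the other nodes a node is linked to.
normalized = {node: d / (len(degrees) - 1) for node, d in degrees.items()}

print(degrees)
print(dict(distribution))
```

Juan is connected to every other node, so his normalized degree centrality is 1.0.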

degree-distribution

Tools

Workflow: from data to visualization.

work flow

  • Programming languages (full workflow): R, Python, JavaScript,...
  • OpenRefine, Table2net,...
  • Tableau, Nodegoat,...
  • Gephi, Cytoscape, VOSviewer, Graphext, orange,...

Gephi. Open Graph Viz Platform

Development of Gephi has resumed in recent years. It can be downloaded from https://gephi.org or directly from the GitHub repository (gephi/releases).

One advantage of recent versions (since 0.9.3) is that Java, the programming language and runtime environment that programs such as Gephi need, is bundled with the installer. More about installation at https://gephi.org/users/install/.

New in 2023! Gephi Lite

Interface: Panel Overview

Plugins for Gephi:

Plugins are found under Tools > Plugins. They add extra functionality to Gephi (metrics, import, export, layouts, ...).

  • Multimode networks transformation: it projects a bipartite network into a simple one.

  • Sigma exporter: it exports the graph to visualize it dynamically using javascript and html.

  • Leiden algorithm: Modularity algorithm.

Data for this course

CSV and GEXF files are located in the folder /data in this repository

Theater

Character co-appearance networks in theater. The data come from https://dracor.org, where they can be downloaded; copies are kept in /data only as a backup.

  • calderon_VidaEsSueno_ezlinavis.csv
  • span000014-valle-luces.gexf

Literary awards

35 literary awards and 1325 award-winning authors: data obtained from Wikidata. CSV table with 3 variables: prizes, winners and gender (masc./fem.); bipartite network and simple networks in GEXF format.

  • authors_and_awards.csv
  • authors_and_awards.gexf
  • authors.gexf
  • awards.gexf

The dataset (plus node and edge lists) is available in editio/premios-literarios and on Zenodo: José Luis Losada (2022) DOI

Stylometry

Stylometric network of 17th-century Spanish plays. The nodes represent plays, linked according to their stylistic similarity. The analysis was performed with a consensus tree (2000-5000 MFW) and Delta distance, using the R package stylo (Eder, Rybicki and Kestemont, 2016), on a corpus of about 700 plays and 50 authors. Interactive visualization: Stylometry on Drama

  • stylometry_theater.gexf

Bibliography

Co-authoring network of 3500 publications on Stylometry. The bibliography has been compiled by Christof Schöch, Bibliography on Stylometry, 2017, DOI: 10.5281/zenodo.835190.

  • biblio_stylo.gexf

Correspondence

Correspondence network of Alexander von Humboldt (sample of 105 letters). Data obtained from edition humboldt digital (CC BY-SA 4.0). Sender, receiver, and date sent were extracted from letters encoded in TEI.

  • humboldt_edgelist.csv
  • humboldt_network.gexf

Step-by-step instructions

Character networks

☞ Practice the basics of an edgelist, how to load it into Gephi and perform the first steps of visualization and metrics.

  1. Dracor > tools > https://ezlinavis.dracor.org > Examples > Calderón de la Barca> download edge list.
  2. Gephi > File > Import spreadsheet (CSV) > next > finish.
  3. Layout: Fruchterman Reingold.
  4. Node size based on degree: Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 10 - max. 50].
  5. Node labels: "copy data to other column" (Data laboratory). Alternative: "select attributes to display as labels" (Overview).
  6. Centrality measures (Betweenness/Eigenvector): compare Segismundo and Clarín (Statistics > Network Diameter, which also computes betweenness centrality; Statistics > Eigenvector Centrality).
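What the ranking in step 4 does can be approximated as a linear rescaling of the chosen attribute into the size range. The sketch below assumes plain linear interpolation between the minimum and maximum size; Gephi also lets you change the interpolation curve, so treat this as an illustration rather than a description of its internals. The degree values are those of the small Juan/Hans example network, not of the Calderón play, and `rank_sizes` is a name invented for the example.

```python
def rank_sizes(values, min_size=10.0, max_size=50.0):
    """Map each value linearly onto the interval [min_size, max_size]."""
    low, high = min(values.values()), max(values.values())
    span = high - low
    sizes = {}
    for node, value in values.items():
        # If all values are equal there is nothing to rank: use min_size.
        fraction = (value - low) / span if span else 0.0
        sizes[node] = min_size + fraction * (max_size - min_size)
    return sizes


# Degrees of the toy example network (assumed input, for illustration only).
degrees = {"Juan": 4, "Hans": 3, "Marta": 2, "Marek": 2, "Elena": 1}
sizes = rank_sizes(degrees)
for node, size in sizes.items():
    print(f"{node}: {size:.1f}")
```

The node with the highest degree gets the maximum size and the node with the lowest degree the minimum, with the others spaced proportionally in between.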

☞ Get familiar with the GEXF file format: open a file in Gephi and explore a node attribute (male/female).

  1. Dracor > corpora > Spanish Drama Corpus > Valle Inclán, Luces de bohemia > Downloads > GEXF file.
  2. Gephi > open > [no changes] > ok.
  3. Data exploration: label, gender (Data laboratory).
  4. Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > gender
  5. Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 40] > run|stop.

From the data to the network: awards and winners

☞ Transform structured data (CSV) into a network file (GEXF)

  1. /data > authors_and_awards.csv
  2. table2net (transformation in the browser).
  3. Load table > Type of Network > Nodes > Build the network > Download.
  • 3.1 Network type: bipartite.
  • 3.2 Nodes 1: authors | attribute: masc/fem.
  • 3.3 Nodes 2: awards.
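The transformation performed in these steps can be sketched in Python: every table row becomes an edge between an author node and an award node, and the gender column is stored as an attribute of the author. This is not table2net's code. The column names `prize`, `winner` and `gender` and the placeholder rows are assumptions; check the header of authors_and_awards.csv before adapting it.

```python
import csv
import io

# Placeholder rows with assumed column names (not the real dataset).
TABLE_CSV = """prize,winner,gender
Award X,Author A,fem
Award X,Author B,masc
Award Y,Author A,fem
Award Z,Author C,masc
"""


def table_to_bipartite(rows, col_a, col_b, attr=None):
    """Build a bipartite network: one node type per column, one edge per row."""
    nodes = {}
    edges = []
    for row in rows:
        a, b = row[col_a], row[col_b]
        # The "type" attribute records which column a node came from.
        # Caveat: a value that occurs in both columns would be merged into one node.
        nodes.setdefault(a, {"type": col_a})
        nodes.setdefault(b, {"type": col_b})
        if attr:
            nodes[a][attr] = row[attr]
        edges.append((a, b))
    return nodes, edges


rows = list(csv.DictReader(io.StringIO(TABLE_CSV)))
nodes, edges = table_to_bipartite(rows, "winner", "prize", attr="gender")
print(nodes)
print(edges)
```

The `type` attribute is what allows the two kinds of nodes to be colored separately later on.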

Awards and winners network (1)

☞ Explore bipartite networks.

  1. Gephi > open authors_and_awards.gexf.
  2. Layout: Force Atlas 2 > run|stop; > Prevent overlap > run|stop; Zoom
  3. Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Type
  4. Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree min. 10 - max. 50.
  5. Nodes Labels: Show node Labels; More settings > Labels > Hide non-selected.
  6. [reset colors] > Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > gender.

Awards and winners network (2)

☞ Explore simple networks

Files are available in /data/awards.gexf and /data/authors.gexf. They can also be created from the structured data (CSV) with table2net or by transforming the bipartite network (☞ see below).

  1. Gephi > open awards.gexf
    • Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 50]
    • Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 5 - max. 30].
    • Modularity: Community detection > Modularity > run.
    • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Modularity Class.
  • Check centrality metrics:
    • Statistics > Eigenvector Centrality.
    • Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Eigenvector Centrality.
  2. Gephi > open authors.gexf
    • Layout: Fruchterman Reingold.
    • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > sexlabel.
    • Appearance > nodes > size [icon circles ] > Ranking > Choose an attribute > Degree [min. 5 - max. 30].

☞ Switching from one type of network to another (projection).

  1. Plugin: Multimode networks transformation.
  • Bipartite Network.
  • Load attributes > type:
    • Award > Author / Author > Award (Simple network of awards)
    • Author > Award / Award > Author (Simple network of authors)
  • Remove nodes, edges.
  • Run.
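The idea behind the projection can be sketched as follows: two awards become linked when at least one author has won both, and the edge weight counts the shared authors (and the other way round for authors). This is a conceptual illustration with placeholder data, not the plugin's implementation; the function name `project` and its `keep` parameter are invented for the example.

```python
from collections import defaultdict
from itertools import combinations

# Bipartite edges as (author, award) pairs; placeholder data.
EDGES = [
    ("Author A", "Award X"),
    ("Author B", "Award X"),
    ("Author A", "Award Y"),
    ("Author B", "Award Y"),
    ("Author C", "Award Z"),
]


def project(edges, keep="right"):
    """Project (left, right) pairs onto one side; weight = shared neighbours."""
    groups = defaultdict(set)
    for left, right in edges:
        if keep == "right":
            groups[left].add(right)   # awards grouped by author
        else:
            groups[right].add(left)   # authors grouped by award
    weights = defaultdict(int)
    for members in groups.values():
        # Every pair inside a group shares that group's node as a neighbour.
        for a, b in combinations(sorted(members), 2):
            weights[(a, b)] += 1
    return dict(weights)


awards_network = project(EDGES, keep="right")
authors_network = project(EDGES, keep="left")
print(awards_network)
print(authors_network)
```

Nodes without any shared neighbour (Award Z and Author C here) do not appear in the returned edge lists; in the projected network they would be isolated nodes.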

Stylometry

☞ Explore textual networks

  1. Gephi > open stylometry_theater.gexf.
  2. Layout: Force Atlas 2 [Prevent overlap, Dissuade Hubs, Scaling = 200].
  3. Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Classes (autores) > Palette > Generate [Limit number of colors: unchecked] > generate.
  4. Appearance > nodes > size [icon circles ] > Unique > size = 20.
  5. Nodes Labels: Show node Labels; More settings > Labels > Hide non-selected.

Compare with modularity algorithms:

  • Modularity: Community detection > Modularity > run.
  • Appearance > nodes > color [icon palette ] > Partition > Choose an attribute > Modularity Class.
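To make the comparison less of a black box, the sketch below evaluates the modularity score Q of a given partition for an unweighted, undirected network without self-loops, using the standard formula Q = Σ_c [L_c / m − (d_c / 2m)²], where L_c is the number of edges inside community c, d_c the sum of the degrees of its nodes and m the total number of edges. Community detection algorithms such as Louvain search for a partition with a high Q; this code only scores a partition you hand it, and it ignores the resolution parameter Gephi offers. The toy graph (two triangles joined by one edge) is invented for the example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition; `community` maps node -> community label."""
    m = len(edges)
    internal = defaultdict(int)    # L_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: sum of degrees of the nodes in c
    for source, target in edges:
        degree_sum[community[source]] += 1
        degree_sum[community[target]] += 1
        if community[source] == community[target]:
            internal[community[source]] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles (a-b-c and d-e-f) connected by the single edge c-d.
EDGES = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in two_groups}

q_two = modularity(EDGES, two_groups)
q_one = modularity(EDGES, one_group)
print(round(q_two, 3), round(q_one, 3))
```

Splitting the graph into its two triangles scores higher than putting every node in a single community, which always yields Q = 0.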

Bibliography

☞ Explore disconnected networks

  1. Gephi > open biblio_stylo.gexf.
  2. Layout: Fruchterman Reingold (compare with Force Atlas 2).
  3. Compare with modularity algorithms.

Correspondence

☞ Explore directed networks, Gephi's limits with multiple edges, filters and timelines.

  1. Gephi > File > Import spreadsheet (CSV) > next > Time representation [Intervals] > Finish > Edges merge strategy [Don't merge]
    • Layout: Fruchterman Reingold
    • Nodes labels: "copy data to other column" (Data laboratory) to allow for searching (cmd/ctrl F); (Overview): labels "Hide non-selected"; (Overview): edges "Selection color checked" (in-out).
  2. (Data laboratory) multiple edges? Humboldt -> Ehrenberg
  3. Gephi > File > Import spreadsheet (CSV) [...] Finish > Edges merge strategy [merge] > New workspace.
  4. Filters (see Using filters in Gephi)
    • Filters > Edges > Mutual Edges > Filter
    • Filters > Topology > In Degree | Out Degree > Filter
  5. Timeline
    • Use the network with multiple edges (be aware of the limitations also for the timeline)
    • (Data laboratory) Merge columns > date_sent > columns to merge > merge strategy > Create time interval > Parse dates
    • Enable timeline > Set time format (bottom left) [date format] > Set play settings (bottom left) [one bound].
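The same questions can be asked of the edgelist outside Gephi. The sketch below counts parallel edges between the same sender and receiver, finds mutual (reciprocated) pairs, computes in-degree and out-degree on the merged network, and selects letters inside a time window, which is roughly what the timeline does. The column names `source` and `target` and all dates are invented placeholders (only `date_sent` is named in the steps above); they are not taken from humboldt_edgelist.csv.

```python
import csv
import io
from collections import Counter
from datetime import date

# Placeholder letters; names of columns and all dates are assumptions.
LETTERS_CSV = """source,target,date_sent
Humboldt,Ehrenberg,1800-01-01
Humboldt,Ehrenberg,1800-02-01
Ehrenberg,Humboldt,1800-03-01
Humboldt,Correspondent X,1800-04-01
"""

rows = list(csv.DictReader(io.StringIO(LETTERS_CSV)))

# Parallel edges: number of letters per directed (sender, receiver) pair.
letters = Counter((r["source"], r["target"]) for r in rows)

# Mutual edges: pairs that wrote to each other in both directions.
mutual = sorted(
    {tuple(sorted(p)) for p in letters if p[0] != p[1] and (p[1], p[0]) in letters}
)

# Degrees on the merged network (each directed pair counted once).
out_degree = Counter(sender for sender, _ in letters)
in_degree = Counter(receiver for _, receiver in letters)


def sent_between(rows, start, end):
    """Letters whose date_sent falls inside [start, end], both bounds included."""
    return [r for r in rows if start <= date.fromisoformat(r["date_sent"]) <= end]


window = sent_between(rows, date(1800, 2, 1), date(1800, 3, 31))
print(letters[("Humboldt", "Ehrenberg")], mutual, len(window))
```

Counting the letters per pair before merging preserves the information that is lost when Gephi merges multiple edges into one.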

Out of Gephi: Publication possibilities

☞ Static and dynamic forms of graph representation outside Gephi

  1. Panel Overview: Screenshot (left), More settings (right)...
  2. Panel Preview: export SVG, PNG, PDF.
  3. Plugin: Sigma Exporter. It creates a folder with the libraries, data and files required to display the graph interactively in a browser. The folder has to be uploaded to a web server, for example using GitHub Pages. For testing purposes, it is possible to launch a local server: Instructions.
  4. Retina (Web app, beta): Visualization in the browser (offline / online) from a GEXF file.
  5. Cosmograph: Visualization in the browser from a .csv file, also timelines.

Tutorials, manuals, references

License

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

Owner

  • Name: RISE-UNIBAS
  • Login: RISE-UNIBAS
  • Kind: organization
  • Email: rise@unibas.ch
  • Location: Switzerland

The University of Basel's Research and Infrastructure Support

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Network analysis in the Humanities. Gephi
message: >-
  If you use this dataset, please cite it using the metadata
  from this file.
type: dataset
authors:
  - given-names: 'José Luis '
    family-names: Losada Palenzuela
    email: joseluis.losadapalenzuela@unibas.ch
    affiliation: University of Basel
    orcid: 'https://orcid.org/0000-0002-6530-1328'
repository-code: 'https://github.com/RISE-UNIBAS/networks_gephi'
url: 'https://rise.unibas.ch/'
abstract: >-
  This GitHub repository provides an introductory course to
  network analysis and visualization using Gephi, primarily
  focusing on applications within the humanities. It
  includes course outlines, showcases of network analysis in
  various fields such as literature, history, and cultural
  studies, and hands-on tutorials with data and step-by-step
  instructions. The course covers basic network concepts,
  file formats, metrics, and tools, along with Gephi plugins
  and data for practical exercises. It also explores static
  and dynamic graph representations outside Gephi and offers
  a list of tutorials, manuals, and references for further
  learning
keywords:
  - Network Analysis
  - Gephi
  - Visualization
  - Graphs
  - Humanities Data
license: CC-BY-4.0
