Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.3%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: juliandehne
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 80 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

Delab Trees

A library to analyze conversation trees.

Installation

pip install delab_trees

Learning Resource

To learn more about the library, including some metrics and research questions it can answer, have a look at the more detailed Jupyter notebook. The notebook is rendered with the Quarto JupyterLab extension, which is recommended for better readability.

python3 -m pip install jupyterlab-quarto

Get started

Example data for Reddit and Twitter is available at https://github.com/juliandehne/delab-trees/raw/main/delab_trees/data/dataset_[reddit|twitter]_no_text.pkl. The data contains structure only: IDs, text, links, and other information that would break the confidentiality of the academic access have been omitted.

The trees are loaded from tables like this:

|    |   tree_id |   post_id |   parent_id | author_id   | text        | created_at          |
|---:|----------:|----------:|------------:|:------------|:------------|:--------------------|
|  0 |         1 |         1 |         nan | james       | I am James  | 2017-01-01 01:00:00 |
|  1 |         1 |         2 |           1 | mark        | I am Mark   | 2017-01-01 02:00:00 |
|  2 |         1 |         3 |           2 | steven      | I am Steven | 2017-01-01 03:00:00 |
|  3 |         1 |         4 |           1 | john        | I am John   | 2017-01-01 04:00:00 |
|  4 |         2 |         1 |         nan | james       | I am James  | 2017-01-01 01:00:00 |
|  5 |         2 |         2 |           1 | mark        | I am Mark   | 2017-01-01 02:00:00 |
|  6 |         2 |         3 |           2 | steven      | I am Steven | 2017-01-01 03:00:00 |
|  7 |         2 |         4 |           3 | john        | I am John   | 2017-01-01 04:00:00 |

This dataset contains two conversational trees with four posts each.
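As a quick sanity check on a table of this shape, plain pandas is enough to count the posts per tree and to find the root posts (rows without a parent). This is only a sketch: it assumes the column names tree_id, post_id, and parent_id, and it does not use delab_trees itself.

```python
import pandas as pd

# Structure-only version of the table above (author, text, and timestamps omitted).
df = pd.DataFrame({
    "tree_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "post_id": [1, 2, 3, 4, 1, 2, 3, 4],
    "parent_id": [None, 1, 2, 1, None, 1, 2, 3],
})

# Number of posts in each conversation tree.
posts_per_tree = df.groupby("tree_id").size().to_dict()

# Root posts are the rows without a parent; each tree should have exactly one.
roots = df[df["parent_id"].isna()]
roots_per_tree = roots.groupby("tree_id").size().to_dict()

print(posts_per_tree)
print(roots_per_tree)
```

Both trees should report four posts and a single root, matching the description above.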

Currently, conversational tables need to be imported as a pandas DataFrame, as shown in the Example section below.

Explanation

The TreeManager object in delab_trees holds a dictionary of trees. The keys in this dictionary are tree_ids, and the values are the actual tree structures, each represented as a DelabTree object. This setup allows you to access individual conversation trees by their unique ID, enabling easy retrieval and manipulation of specific trees.

Each DelabTree instance contains two main attributes:

  • self.reply_graph: a NetworkX DiGraph representation of the conversation as a directed graph.
  • self.df: a pandas DataFrame representation of the conversation as a table, preserving the structure and metadata of each post.
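To make the relationship between the two representations concrete, the sketch below derives a directed reply structure from a small post table using only the standard library. It is an illustration, not how DelabTree builds its reply_graph: the build_reply_edges helper is hypothetical, and the edge direction (parent to child) is an assumption.

```python
# Rows of one conversation as (post_id, parent_id); the root has no parent.
posts = [(1, None), (2, 1), (3, 2), (4, 1)]


def build_reply_edges(rows):
    """Return a parent -> children adjacency mapping (hypothetical helper)."""
    children = {}
    for post_id, parent_id in rows:
        # Register every post as a node, even if nobody replied to it.
        children.setdefault(post_id, [])
        if parent_id is not None:
            children.setdefault(parent_id, []).append(post_id)
    return children


graph = build_reply_edges(posts)
edge_count = sum(len(kids) for kids in graph.values())

print(graph)
print(edge_count)
```

A well-formed tree with n posts has n - 1 reply edges, which gives a cheap consistency check between the table and the graph.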

Example

To demonstrate, let’s create a sample dataset and initialize the TreeManager. We'll then access an individual tree using its ID and show both representations (graph and table) of that tree.

```python
import pandas as pd
from delab_trees import TreeManager

# Sample dataset with two separate conversational trees (tree_id 1 and 2)
data = {
    'tree_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'post_id': [1, 2, 3, 4, 1, 2, 3, 4],
    'parent_id': [None, 1, 2, 1, None, 1, 2, 3],
    'author_id': ["james", "mark", "steven", "john",
                  "james", "mark", "steven", "john"],
    'text': ["I am James", "I am Mark", "I am Steven", "I am John",
             "I am James again", "I am Mark again",
             "I am Steven again", "I am John again"],
    'created_at': [pd.Timestamp('2017-01-01T01'), pd.Timestamp('2017-01-01T02'),
                   pd.Timestamp('2017-01-01T03'), pd.Timestamp('2017-01-01T04'),
                   pd.Timestamp('2018-01-01T01'), pd.Timestamp('2018-01-01T02'),
                   pd.Timestamp('2018-01-01T03'), pd.Timestamp('2018-01-01T04')]
}

# Load data into a DataFrame
df = pd.DataFrame(data)

# Initialize TreeManager with the DataFrame
manager = TreeManager(df)

# Access the dictionary of trees
trees = manager.trees

# Verify the structure of the dictionary (keys and types of values)
print("Keys in TreeManager dictionary (tree IDs):", trees.keys())  # Output: dict_keys([1, 2])

# Access a specific tree by tree_id
tree_id = 1
tree = trees[tree_id]

# Display the NetworkX graph representation of the tree
print("Graph representation of tree:", tree.reply_graph)

# Display the DataFrame (table) representation of the tree
print("Table representation of tree:\n", tree.df)
```

You can now analyze basic metrics of the reply trees:

```python
from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree: DelabTree = get_test_tree()
assert test_tree.total_number_of_posts() == 4
assert test_tree.average_branching_factor() > 0
```
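For intuition about what such metrics measure, here is a minimal standard-library sketch on the first example tree. The definitions are assumptions made for illustration (branching averaged over posts that received at least one reply, depth counted in edges from the root); the library's own average_branching_factor and depth may be defined differently. The helper names are hypothetical.

```python
# parent -> children mapping for the first example tree (post 1 is the root).
children = {1: [2, 4], 2: [3], 3: [], 4: []}


def count_posts(tree):
    """Total number of posts in the tree."""
    return len(tree)


def mean_branching(tree):
    """Mean number of replies among posts that received at least one reply."""
    reply_counts = [len(kids) for kids in tree.values() if kids]
    return sum(reply_counts) / len(reply_counts) if reply_counts else 0.0


def longest_path_edges(tree, node):
    """Number of edges on the longest path from node down to a leaf."""
    return max((1 + longest_path_edges(tree, kid) for kid in tree[node]), default=0)


print(count_posts(children))            # 4
print(mean_branching(children))         # 1.5
print(longest_path_edges(children, 1))  # 2
```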

A summary of basic author metrics can be obtained by calling:

```python
from delab_trees.test_data_manager import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree: DelabTree = get_test_tree()
print(test_tree.get_author_metrics())

# >>> removed [] and changed {} (merging subsequent posts of the same author)
# >>> {'james': , 'steven': , 'john': , 'mark': }
```
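Author-level metrics operate on relations between authors rather than between posts. As a rough illustration of that projection, the sketch below counts who answered whom in the first example tree using only the standard library. It is not the library's implementation, and who_answered_whom is a hypothetical helper name.

```python
from collections import Counter

# (post_id, parent_id, author_id) rows of the first example tree.
posts = [
    (1, None, "james"),
    (2, 1, "mark"),
    (3, 2, "steven"),
    (4, 1, "john"),
]


def who_answered_whom(rows):
    """Count (replier, replied_to) author pairs."""
    author_of = {post_id: author for post_id, _, author in rows}
    pairs = Counter()
    for _, parent_id, author in rows:
        if parent_id is not None:
            pairs[(author, author_of[parent_id])] += 1
    return pairs


interactions = who_answered_whom(posts)
for (replier, target), count in sorted(interactions.items()):
    print(f"{replier} -> {target}: {count}")
```

Centrality measures such as closeness or betweenness would then be computed on a graph whose nodes are the authors and whose edges are these pairs.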

Library Functions

| Function Name | Parameters | Return Type | Description |
|---|---|---|---|
| __init__ | df: pd.DataFrame, g: MultiDiGraph = None | None | Initializes a DelabTree instance with a pandas DataFrame and an optional graph. |
| __str__ | None | str | Returns a string representation of the DelabTree. |
| from_recursive_tree | root_node: TreeNode | DelabTree | Creates a DelabTree instance from a TreeNode recursive structure. |
| branching_weight | None | float | Calculates the branching weight of the tree. |
| average_branching_factor | None | float | Computes the average branching factor of the tree. |
| root_dominance | None | float | Computes a dominance metric for the root author in the conversation. |
| total_number_of_posts | None | int | Returns the total number of posts in the conversation. |
| depth | None | int | Calculates the depth of the reply graph. |
| as_reply_graph | None | DiGraph | Generates a directed reply graph from the DataFrame. |
| as_author_graph | None | MultiDiGraph | Creates a directed graph combining reply and author relations. |
| as_author_interaction_graph | None | DiGraph | Projects the author interaction graph to capture who answered whom. |
| as_tree | None | DiGraph | Creates a tree representation of the reply graph using BFS traversal. |
| as_post_list | None | list[DelabPost] | Converts the DataFrame to a list of DelabPost objects. |
| as_recursive_tree | None | TreeNode | Converts the DataFrame into a recursive tree structure. |
| as_biggest_connected_tree | stateless: bool = True | Union[DelabTree, Graph] | Finds and returns the largest connected component of the reply graph. |
| as_removed_cycles | as_delab_tree: bool = True, compute_arborescence: bool = False | Union[DiGraph, DelabTree] | Removes cycles in the reply graph and optionally computes a minimum spanning arborescence. |
| as_attached_orphans | as_delab_tree: bool = True | Union[DelabTree, DiGraph] | Attaches orphaned nodes to the root node to recreate a similar tree structure. |
| as_merged_self_answers_graph | as_delab_tree: bool = True, return_deleted: bool = False | Union[DiGraph, DelabTree, tuple] | Merges sequential posts by the same author into one post, returning a new DelabTree. |
| as_flow_duo | min_length_flows: int = 6, min_post_branching: int = 3, min_pre_branching: int = 3, metric: str = "sentiment", verbose: bool = False | FLowDuo | Computes the two conversation flows with the highest difference in a specified metric. |
| get_conversation_flows | as_list: bool = False | tuple[dict[str, list[DelabPost]], str] | Returns all conversation flows (paths from root to leaf). |
| get_flow_candidates | length_flow: int, filter_function: Callable[[list[DelabPost]], bool] = None | list[list[DelabPost]] | Filters conversation flows by length and an optional filter function. |
| get_author_metrics | None | dict[str, AuthorMetric] | Computes centrality metrics (closeness, betweenness, Katz) for each author. |
| get_average_author_metrics | None | AuthorMetric | Calculates average centrality metrics for all authors in the graph. |
| get_baseline_author_vision | None | dict[str, float] | Computes a baseline vision score for each author based on reply behavior. |
| get_single_author_metrics | author_id: str | AuthorMetric | Returns centrality metrics for a specific author. |
| validate_internal_structure | None | None | Validates that the DataFrame has unique post IDs and aligns with the graph structure. |
| validate | verbose: bool = True, check_for: str = "all", check_time_stamps_differ: bool = True | bool | Validates the graph structure, including cycles, connectivity, and node names. |
| paint_faulty_graph | None | None | Visualizes a faulty reply graph with truncated node labels. |
| paint_reply_graph | None | None | Draws the reply graph in a circular layout. |
| paint_author_graph | None | None | Visualizes the author interaction graph. |
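The table describes conversation flows as paths from the root to a leaf. A minimal sketch of that idea follows, using a plain parent-to-children mapping instead of the library's data structures. The root_to_leaf_paths helper is hypothetical and returns post IDs rather than DelabPost objects.

```python
# parent -> children mapping for the first example tree (post 1 is the root).
children = {1: [2, 4], 2: [3], 3: [], 4: []}


def root_to_leaf_paths(tree, node, prefix=()):
    """Yield every path from node down to a leaf as a list of post IDs."""
    path = prefix + (node,)
    if not tree[node]:
        # A post without replies ends the flow.
        yield list(path)
        return
    for kid in tree[node]:
        yield from root_to_leaf_paths(tree, kid, path)


flows = list(root_to_leaf_paths(children, 1))
print(flows)  # [[1, 2, 3], [1, 4]]

# Keeping only flows of a minimum length, loosely in the spirit of a length filter.
long_flows = [flow for flow in flows if len(flow) >= 3]
print(long_flows)  # [[1, 2, 3]]
```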

Advanced Use Cases

More complex metrics that use the full dataset for training can be obtained from the manager:

```python
import pandas as pd
from delab_trees import TreeManager

d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", "I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df)  # creates one tree
rb_vision_dictionary: dict["tree_id", dict["author_id", "vision_metric"]] = manager.get_rb_vision()
```

The following two complex metrics are implemented:

```python
from delab_trees.test_data_manager import get_test_manager

manager = get_test_manager()
rb_vision_dictionary = manager.get_rb_vision()  # predict an author having seen a post
pb_vision_dictionary = manager.get_pb_vision()  # predict an author to write the next post
```

How to cite

```latex
@article{dehnedtrees23,
  author = {Dehne, Julian},
  title = {Delab-Trees: measuring deliberation in online conversations},
  url = {https://github.com/juliandehne/delab-trees},
  year = {2023},
}
```

Owner

  • Login: juliandehne
  • Kind: user

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software in your research, please cite it using the following metadata."
title: "delab-trees, a python library to analyze conversation trees"
version: 0.3.4
date-released: 2023-09-18

authors:
  - given-names: Julian
    family-names: Dehne
    affiliation: "University of Göttingen"
    orcid: "https://orcid.org/0000-0001-9265-9619"

repository-code: "https://github.com/juliandehne/delab-trees"
license: MIT

GitHub Events

Total
  • Push event: 9
Last Year
  • Push event: 9

Committers

Last synced: 11 months ago

All Time
  • Total Commits: 83
  • Total Committers: 3
  • Avg Commits per committer: 27.667
  • Development Distribution Score (DDS): 0.024
Past Year
  • Commits: 16
  • Committers: 2
  • Avg Commits per committer: 8.0
  • Development Distribution Score (DDS): 0.063
Top Committers
Name Email Commits
Julian Dehne j****e@g****m 81
pvonderhaar a****1@s****e 1
dehnejn j****e@g****g 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: about 2 hours
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • v-gold (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 24 last-month
  • Total dependent packages: 1
  • Total dependent repositories: 0
  • Total versions: 21
  • Total maintainers: 1
pypi.org: delab-trees

a library to analyse reply trees in forums and social media

  • Versions: 21
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 24 Last month
Rankings
Dependent packages count: 7.0%
Downloads: 13.3%
Average: 16.9%
Dependent repos count: 30.5%
Maintainers (1)
Last synced: 10 months ago

Dependencies

setup.py pypi
  • numpy *
  • tensorflow ==2.11.0
requirements_old.txt pypi
  • keras *
  • matplotlib *
  • networkx *
  • numpy ==1.22.3
  • pandas *
  • pytest ==7.1.2
  • scikit-learn *
  • setuptools *
  • tensorflow *
  • tqdm *