delab-trees
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (10.3%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: juliandehne
- License: mit
- Language: Python
- Default Branch: main
- Size: 80 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Delab Trees
A library to analyze conversation trees.
Installation
pip install delab_trees
Learning Resource
To learn more about the library, including some of the metrics it provides and the research questions it can help answer, have a look at the more detailed Jupyter notebook. The notebook is rendered with the Quarto JupyterLab extension; install the extension for better readability:
python3 -m pip install jupyterlab-quarto
Get started
Example data for Reddit and Twitter is available at https://github.com/juliandehne/delab-trees/raw/main/delabtrees/data/dataset[reddit|twitter]notext.pkl. The data contains structure only: IDs, text, links, and any other information that would break the confidentiality of the academic access have been omitted.
The trees are loaded from tables like this:
|    |   tree_id |   post_id |   parent_id | author_id   | text        | created_at          |
|---:|----------:|----------:|------------:|:------------|:------------|:--------------------|
|  0 |         1 |         1 |         nan | james       | I am James  | 2017-01-01 01:00:00 |
|  1 |         1 |         2 |           1 | mark        | I am Mark   | 2017-01-01 02:00:00 |
|  2 |         1 |         3 |           2 | steven      | I am Steven | 2017-01-01 03:00:00 |
|  3 |         1 |         4 |           1 | john        | I am John   | 2017-01-01 04:00:00 |
|  4 |         2 |         1 |         nan | james       | I am James  | 2017-01-01 01:00:00 |
|  5 |         2 |         2 |           1 | mark        | I am Mark   | 2017-01-01 02:00:00 |
|  6 |         2 |         3 |           2 | steven      | I am Steven | 2017-01-01 03:00:00 |
|  7 |         2 |         4 |           3 | john        | I am John   | 2017-01-01 04:00:00 |
This dataset contains two conversational trees with four posts each.
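As a rough illustration of how such a table splits into separate conversations, the following sketch groups the rows by tree ID using only the standard library. It is not part of delab_trees; the tuple layout and the names `rows` and `posts_by_tree` are assumptions made for this example.

```python
from collections import defaultdict

# Rows mirror the example table: (tree_id, post_id, parent_id, author_id).
# A parent_id of None marks the root post of a conversation.
rows = [
    (1, 1, None, "james"), (1, 2, 1, "mark"), (1, 3, 2, "steven"), (1, 4, 1, "john"),
    (2, 1, None, "james"), (2, 2, 1, "mark"), (2, 3, 2, "steven"), (2, 4, 3, "john"),
]

# Group the posts by the conversation (tree) they belong to.
posts_by_tree = defaultdict(list)
for tree_id, post_id, parent_id, author_id in rows:
    posts_by_tree[tree_id].append((post_id, parent_id, author_id))

for tree_id, posts in posts_by_tree.items():
    roots = [post_id for post_id, parent_id, _ in posts if parent_id is None]
    print(f"tree {tree_id}: {len(posts)} posts, root post(s): {roots}")
```

Each group has exactly one post without a parent, which is what makes it a tree rather than a forest.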
Currently, conversational tables need to be imported as a pandas DataFrame, as shown in the Example section below.
Explanation
The TreeManager object in delab_trees holds a dictionary of trees. The keys in this dictionary are tree_ids, and the values are the actual tree structures, each represented as a DelabTree object. This setup allows you to access individual conversation trees by their unique ID, enabling easy retrieval and manipulation of specific trees.
Each DelabTree instance contains two main attributes:
- self.reply_graph: a NetworkX DiGraph representation of the conversation as a directed graph.
- self.df: a pandas DataFrame representation of the conversation as a table, preserving the structure and metadata of each post.
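To make the relationship between the two representations concrete, here is a minimal standard-library sketch that derives directed reply edges from table rows. The helper `reply_edges` is hypothetical, and the edge direction (parent to reply) is an assumption; the graph built by delab_trees may orient its edges differently.

```python
def reply_edges(posts):
    """Return (parent_id, post_id) pairs for every post that has a parent."""
    return [(parent_id, post_id) for post_id, parent_id in posts if parent_id is not None]

# (post_id, parent_id) pairs taken from the two example trees.
tree_1 = [(1, None), (2, 1), (3, 2), (4, 1)]
tree_2 = [(1, None), (2, 1), (3, 2), (4, 3)]

edges_1 = reply_edges(tree_1)
edges_2 = reply_edges(tree_2)
print(edges_1)  # [(1, 2), (2, 3), (1, 4)]
print(edges_2)  # [(1, 2), (2, 3), (3, 4)]
```

The two trees contain the same posts but differ in shape: in the first, post 4 answers the root, while in the second it continues the chain.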
Example
To demonstrate, let’s create a sample dataset and initialize the TreeManager. We'll then access an individual tree using its ID and show both representations (graph and table) of that tree.
```python
import pandas as pd
from delab_trees import TreeManager

# Sample dataset with two separate conversational trees (tree_id 1 and 2)
data = {
    'tree_id': [1, 1, 1, 1, 2, 2, 2, 2],
    'post_id': [1, 2, 3, 4, 1, 2, 3, 4],
    'parent_id': [None, 1, 2, 1, None, 1, 2, 3],
    'author_id': ["james", "mark", "steven", "john", "james", "mark", "steven", "john"],
    'text': ["I am James", "I am Mark", "I am Steven", "I am John",
             "I am James again", "I am Mark again", "I am Steven again", "I am John again"],
    'created_at': [pd.Timestamp('2017-01-01T01'), pd.Timestamp('2017-01-01T02'),
                   pd.Timestamp('2017-01-01T03'), pd.Timestamp('2017-01-01T04'),
                   pd.Timestamp('2018-01-01T01'), pd.Timestamp('2018-01-01T02'),
                   pd.Timestamp('2018-01-01T03'), pd.Timestamp('2018-01-01T04')]
}

# Load data into a DataFrame
df = pd.DataFrame(data)

# Initialize TreeManager with the DataFrame
manager = TreeManager(df)

# Access the dictionary of trees
trees = manager.trees

# Verify the structure of the dictionary (keys and types of values)
print("Keys in TreeManager dictionary (tree IDs):", trees.keys())  # Output: dict_keys([1, 2])

# Access a specific tree by tree_id
tree_id = 1
tree = trees[tree_id]

# Display the NetworkX graph representation of the tree
print("Graph representation of tree:", tree.reply_graph)

# Display the DataFrame (table) representation of the tree
print("Table representation of tree:\n", tree.df)
```
You can now compute basic metrics of the reply trees:
```python
from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree

# Load the bundled four-post test tree and check two basic metrics
test_tree: DelabTree = get_test_tree()
assert test_tree.total_number_of_posts() == 4
assert test_tree.average_branching_factor() > 0
```
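For intuition about what such structural metrics measure, the sketch below computes a branching factor and a depth for the two example trees with plain Python. The definitions used here (mean number of replies per post that has at least one reply, and the number of edges on the longest root-to-leaf path) are assumptions for illustration and may differ from the formulas delab_trees implements; all function names are hypothetical.

```python
from collections import defaultdict

def children_map(posts):
    """Map each parent post ID to the list of its direct replies."""
    children = defaultdict(list)
    for post_id, parent_id in posts:
        if parent_id is not None:
            children[parent_id].append(post_id)
    return children

def average_branching(posts):
    """Mean number of direct replies over all posts that received a reply."""
    children = children_map(posts)
    if not children:
        return 0.0
    return sum(len(replies) for replies in children.values()) / len(children)

def depth(posts):
    """Number of edges on the longest path from the root to a leaf."""
    children = children_map(posts)
    root = next(post_id for post_id, parent_id in posts if parent_id is None)

    def longest(node):
        return max((1 + longest(child) for child in children.get(node, [])), default=0)

    return longest(root)

# (post_id, parent_id) pairs taken from the two example trees.
tree_1 = [(1, None), (2, 1), (3, 2), (4, 1)]
tree_2 = [(1, None), (2, 1), (3, 2), (4, 3)]

print(len(tree_1), average_branching(tree_1), depth(tree_1))  # 4 1.5 2
print(len(tree_2), average_branching(tree_2), depth(tree_2))  # 4 1.0 3
```

Both trees have four posts, yet the first is broader and shallower than the second, which is a plain reply chain.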
A summary of per-author metrics can be obtained by calling:
```python
from delab_trees.test_data_manager import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree: DelabTree = get_test_tree()
print(test_tree.get_author_metrics())
# >>> removed [] and changed {} (merging subsequent posts of the same author)
# >>>{'james': , 'steven': , 'john': , 'mark': }
```
Library Functions
| Function Name | Parameters | Return Type | Description |
|--------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------|--------------------------------------------------------------------------------------------|
| __init__ | df: pd.DataFrame, g: MultiDiGraph = None | None | Initializes a DelabTree instance with a pandas DataFrame and an optional graph. |
| __str__ | None | str | Returns a string representation of the DelabTree. |
| from_recursive_tree | root_node: TreeNode | DelabTree | Creates a DelabTree instance from a TreeNode recursive structure. |
| branching_weight | None | float | Calculates the branching weight of the tree. |
| average_branching_factor | None | float | Computes the average branching factor of the tree. |
| root_dominance | None | float | Computes a dominance metric for the root author in the conversation. |
| total_number_of_posts | None | int | Returns the total number of posts in the conversation. |
| depth | None | int | Calculates the depth of the reply graph. |
| as_reply_graph | None | DiGraph | Generates a directed reply graph from the DataFrame. |
| as_author_graph | None | MultiDiGraph | Creates a directed graph combining reply and author relations. |
| as_author_interaction_graph | None | DiGraph | Projects the author interaction graph to capture who answered whom. |
| as_tree | None | DiGraph | Creates a tree representation of the reply graph using BFS traversal. |
| as_post_list | None | list[DelabPost] | Converts the DataFrame to a list of DelabPost objects. |
| as_recursive_tree | None | TreeNode | Converts the DataFrame into a recursive tree structure. |
| as_biggest_connected_tree | stateless: bool = True | Union[DelabTree, Graph] | Finds and returns the largest connected component of the reply graph. |
| as_removed_cycles | as_delab_tree: bool = True, compute_arborescence: bool = False | Union[DiGraph, DelabTree] | Removes cycles in the reply graph and optionally computes a minimum spanning arborescence. |
| as_attached_orphans | as_delab_tree: bool = True | Union[DelabTree, DiGraph] | Attaches orphaned nodes to the root node to recreate a similar tree structure. |
| as_merged_self_answers_graph | as_delab_tree: bool = True, return_deleted: bool = False | Union[DiGraph, DelabTree, tuple] | Merges sequential posts by the same author into one post, returning a new DelabTree. |
| as_flow_duo | min_length_flows: int = 6, min_post_branching: int = 3, min_pre_branching: int = 3, metric: str = "sentiment", verbose: bool = False | FLowDuo | Computes the two conversation flows with the highest difference in a specified metric. |
| get_conversation_flows | as_list: bool = False | tuple[dict[str, list[DelabPost]], str] | Returns all conversation flows (paths from root to leaf). |
| get_flow_candidates | length_flow: int, filter_function: Callable[[list[DelabPost]], bool] = None | list[list[DelabPost]] | Filters conversation flows by length and an optional filter function. |
| get_author_metrics | None | dict[str, AuthorMetric] | Computes centrality metrics (closeness, betweenness, Katz) for each author. |
| get_average_author_metrics | None | AuthorMetric | Calculates average centrality metrics for all authors in the graph. |
| get_baseline_author_vision | None | dict[str, float] | Computes a baseline vision score for each author based on reply behavior. |
| get_single_author_metrics | author_id: str | AuthorMetric | Returns centrality metrics for a specific author. |
| validate_internal_structure | None | None | Validates that the DataFrame has unique post IDs and aligns with the graph structure. |
| validate | verbose: bool = True, check_for: str = "all", check_time_stamps_differ: bool = True | bool | Validates the graph structure, including cycles, connectivity, and node names. |
| paint_faulty_graph | None | None | Visualizes a faulty reply graph with truncated node labels. |
| paint_reply_graph | None | None | Draws the reply graph in a circular layout. |
| paint_author_graph | None | None | Visualizes the author interaction graph. |
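Several of the functions above operate on conversation flows, described in the table as paths from the root to a leaf. The following standard-library sketch enumerates such paths for the first example tree. It only illustrates the concept under the assumption that a flow is a list of post IDs; the helper `root_to_leaf_paths` is hypothetical and does not reproduce the return format of get_conversation_flows.

```python
def root_to_leaf_paths(posts):
    """Enumerate every path of post IDs leading from the root to a leaf."""
    children = {}
    root = None
    for post_id, parent_id in posts:
        if parent_id is None:
            root = post_id
        else:
            children.setdefault(parent_id, []).append(post_id)

    paths = []
    stack = [[root]]  # each stack entry is a partial path starting at the root
    while stack:
        path = stack.pop()
        replies = children.get(path[-1], [])
        if not replies:
            paths.append(path)  # the last post has no replies, so the path is complete
        for reply in replies:
            stack.append(path + [reply])
    return sorted(paths)

# (post_id, parent_id) pairs of the first example tree.
tree_1 = [(1, None), (2, 1), (3, 2), (4, 1)]
flows = root_to_leaf_paths(tree_1)
print(flows)  # [[1, 2, 3], [1, 4]]
```

A filter such as the length threshold in get_flow_candidates would then simply keep the paths with enough posts.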
Advanced Use Cases
More complex metrics, which use the full dataset for training, can be obtained from the manager:
```python
import pandas as pd
from delab_trees import TreeManager

d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", " I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'), pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'), pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df)  # creates one tree
rb_vision_dictionary: dict["tree_id", dict["author_id", "vision_metric"]] = manager.get_rb_vision()
```
The following two complex metrics are implemented:
```python
from delab_trees.test_data_manager import get_test_manager

manager = get_test_manager()
rb_vision_dictionary = manager.get_rb_vision()  # predict an author having seen a post
pb_vision_dictionary = manager.get_pb_vision()  # predict an author to write the next post
```
How to cite
```latex
@article{dehnedtrees23,
author = {Dehne, Julian},
title = {Delab-Trees: measuring deliberation in online conversations},
url = {https://github.com/juliandehne/delab-trees},
year = {2023},
}
```
Owner
- Login: juliandehne
- Kind: user
- Repositories: 3
- Profile: https://github.com/juliandehne
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software in your research, please cite it using the following metadata."
title: "delab-trees, a python library to analyze conversation trees"
version: 0.3.4
date-released: 2023-09-18
authors:
  - given-names: Julian
    family-names: Dehne
    affiliation: "University of Göttingen"
    orcid: "https://orcid.org/0000-0001-9265-9619"
repository-code: "https://github.com/juliandehne/delab-trees"
license: MIT
GitHub Events
Total
- Push event: 9
Last Year
- Push event: 9
Committers
Last synced: 11 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Julian Dehne | j****e@g****m | 81 |
| pvonderhaar | a****1@s****e | 1 |
| dehnejn | j****e@g****g | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: about 2 hours
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- v-gold (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - pypi: 24 last month
- Total dependent packages: 1
- Total dependent repositories: 0
- Total versions: 21
- Total maintainers: 1
pypi.org: delab-trees
a library to analyse reply trees in forums and social media
- Homepage: https://github.com/juliandehne/delab-trees
- Documentation: https://delab-trees.readthedocs.io/
- License: MIT
- Latest release: 0.4.2 (published over 1 year ago)
Rankings
Maintainers (1)
Dependencies
- numpy *
- tensorflow ==2.11.0
- keras *
- matplotlib *
- networkx *
- numpy ==1.22.3
- pandas *
- pytest ==7.1.2
- scikit-learn *
- setuptools *
- tensorflow *
- tqdm *