coordination-network-toolkit

A small command line tool and set of functions for studying coordination networks in Twitter and other social media data.

https://github.com/qut-digital-observatory/coordination-network-toolkit

Science Score: 75.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    4 of 5 committers (80.0%) from academic institutions
  • Institutional organization owner
    Organization qut-digital-observatory has institutional domain (www.qut.edu.au)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.1%) to scientific vocabulary

Keywords

similarity-networks social-media-analysis social-science
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: QUT-Digital-Observatory
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 5 MB
Statistics
  • Stars: 79
  • Watchers: 6
  • Forks: 14
  • Open Issues: 25
  • Releases: 15
Topics
similarity-networks social-media-analysis social-science
Created over 5 years ago · Last pushed over 3 years ago
Metadata Files
Readme License Citation

README.md

Coordination Network Toolkit

A small command line tool and set of functions for studying coordination networks in Twitter and other social media data.

Rationale and Background

[Figure: Network visualisation of a co-retweet network]

Social media activity doesn't occur in a vacuum. Individuals on social media are often taking part in coordinated activities such as protest movements or interest-based communities.

Social media platforms are also used strategically to boost particular messages in line with political campaign goals or for commercial profit and scamming. This involves multiple accounts posting or reposting the same content, repeatedly and within a short time window (e.g. within 1 minute).

This software provides a toolkit to detect coordinated activity on social media and to generate networks that map the actors and their relationships. It provides a general purpose toolkit for multiple types of coordinated activity on any type of social media platform.

Fundamentally this toolkit produces networks where the nodes are accounts, and the weighted edges between these accounts are the number of messages from those accounts that meet some criterion for a type of coordination. This toolkit already implements several approaches to detecting different types of coordination, and is intended to be extensible to more cases in the future.

Firstly, it includes functionality for co-tweeting and co-retweeting (Keller et al., 2020; Schafer et al., 2017), where accounts post exactly the same text (co-tweets) or repost the same post within a short time window (co-retweets). Secondly, it includes functionality for co-link analysis, where multiple accounts repeatedly post the same URLs within a short time window of each other (Giglietto et al., 2020). Thirdly, it adds two new network types: co-reply, where accounts repeatedly reply to the same post together; and co-similarity, where accounts post similar text (but not exact duplicates), which relaxes the strict assumption of co-tweeting.
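To make the idea of weighted edges concrete, here is a minimal sketch of how a co-retweet network could be counted in plain Python. It is an illustration only, not the toolkit's implementation: the function name `co_retweet_edges`, the tuple layout, and the choice to count undirected account pairs are all assumptions made for this example, and the toolkit's own counting rules may differ.

```python
from collections import defaultdict
from itertools import combinations


def co_retweet_edges(messages, time_window=60):
    """Count, per pair of accounts, reposts of the same message that
    fall within `time_window` seconds of each other.

    `messages` is an iterable of (user_id, repost_id, timestamp) tuples.
    This helper is hypothetical and only illustrates the concept.
    """
    # Group reposts by the message being reposted.
    by_repost = defaultdict(list)
    for user_id, repost_id, timestamp in messages:
        if repost_id is not None:
            by_repost[repost_id].append((timestamp, user_id))

    weights = defaultdict(int)
    for reposts in by_repost.values():
        reposts.sort()  # earliest repost first
        for (t1, u1), (t2, u2) in combinations(reposts, 2):
            # Ignore an account paired with itself and pairs outside the window.
            if u1 != u2 and t2 - t1 <= time_window:
                weights[tuple(sorted((u1, u2)))] += 1
    return dict(weights)


messages = [
    ("alice", "m1", 0),
    ("bob", "m1", 30),
    ("carol", "m1", 500),  # too late to count as coordinated on m1
    ("alice", "m2", 1000),
    ("bob", "m2", 1010),
    ("dave", None, 1020),  # not a repost
]
edges = co_retweet_edges(messages)
print(edges)  # {('alice', 'bob'): 2}
```

Each key is a pair of accounts (the nodes) and each value is the edge weight. Comparing every pair of reposts is quadratic in the number of reposts per message, which is fine for a sketch but is one reason a database-backed approach is useful for real datasets.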

Types of Coordination Networks

Briefly, the following network types are supported. For more information, see the documentation.

  1. Co-retweet: reposting the same post
  2. Co-tweet: posting identical text
  3. Co-similarity: posting similar text (Jaccard similarity or user-defined)
  4. Co-link: posting the same link
  5. Co-reply: replying to the same post
  6. Co-post: posting any type of message within the same time window

The default time window is 60 seconds for all network types.
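For the co-similarity network, Jaccard similarity compares two messages as sets of tokens: the size of the intersection divided by the size of the union. The sketch below assumes lowercased whitespace tokenisation, which is an assumption made for illustration; the toolkit's own tokenisation and similarity threshold may differ.

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard similarity between the token sets of two messages.

    Tokenisation here (lowercase, split on whitespace) is an assumption
    for illustration, not necessarily what the toolkit does.
    """
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0  # two empty messages: treated as dissimilar in this sketch
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


similarity = jaccard_similarity(
    "Buy cheap bananas today", "buy cheap bananas now"
)
print(similarity)  # 0.6 (3 shared tokens out of 5 distinct tokens)
```

Two messages with identical token sets score 1.0, so co-tweeting can be seen as the special case of co-similarity at the maximum score.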

Reading List

Giglietto, F., Righetti, N., Rossi, L., & Marino, G. (2020). It takes a village to manipulate the media: coordinated link sharing behavior during 2018 and 2019 Italian elections. Information, Communication and Society, 1–25.

Graham, T., Bruns, A., Zhu, G., & Campbell, R. (2020). Like a virus: The coordinated spread of coronavirus disinformation. Report commissioned for the Centre for Responsible Technology.

Keller, F. B., Schoch, D., Stier, S., & Yang, J. (2020). Political Astroturfing on Twitter: How to coordinate a disinformation Campaign. Political Communication, 37(2), 256-280.

Schafer, F., Evert, S., & Heinrich, P. (2017). Japan’s 2014 General Election: Political Bots, Right-Wing Internet Activism, and Prime Minister Shinzō Abe’s Hidden Nationalist Agenda. Big Data, 5(4), 294–309.

Installation and Requirements

This tool requires a working Python 3.6 (or later) environment.

This tool can be installed with pip, which will also install the necessary dependencies.

pip install coordination_network_toolkit

Once installed, you can use the toolkit in either of two ways:

  1. As a command-line tool (run compute_networks --help to find out how)
  2. As a Python library (import coordination_network_toolkit)

Basic Usage

Using this tool requires:

  1. Collecting data from the social media platform of choice
  2. Preparing or preprocessing the data into one of the formats the tool accepts:
    • either a specific CSV format that works across all platforms OR
    • using the platform native format (currently Twitter JSON from both V1.1 and V2 APIs, including the Twitter Academic track, are supported)
  3. Preprocessing the raw data into an SQLite database to set up the data structures for efficient computation of the different networks
  4. Generating the network of choice, storing the output in a specified file.

Examples

This is just a quick example - see also the tutorial.

Worked example - CLI tool

  1. Collect Twitter data in the native Twitter JSON format using twarc, from either the V1.1 or the V2 API.
    - `twarc search '#banana' --output banana.json`
    - `twarc2 search --limit 1000 --archive '#purple' purple.json`
  2. Bring the data into a local database file called processed_bananas.db. The original data file is not modified in any way. The toolkit will handle processing the Twitter JSON formats into the necessary format, including handling things like retweets, replies, URLs, and extracting the text of the tweet. It will also handle deduplication, so if a tweet is present more than once only the first instance will be recorded.
    - `compute_networks processed_bananas.db preprocess --format twitter_json banana.json purple.json`
  3. Calculate a co-retweet network, saving the output in graphml format, which can be opened directly in a tool like [Gephi](https://gephi.org). These settings indicate that if two users have retweeted the same tweet within 60 seconds of each other, there is a potential link.
    - `compute_networks processed_bananas.db compute co_retweet --time_window 60 --output_file bananas_retweet_60s.graphml --output_format graphml`
  4. Calculate a co-link network, again saving the output in graphml format. By default this uses the plain text of the URL for matching, so shortened URLs will not be matched to the pages they point to.
    - `compute_networks processed_bananas.db compute co_link --time_window 60 --output_file bananas_colink_unresolved_60s.graphml --output_format graphml`
  5. Resolve collected URLs, to handle link shortening services. Note that this process is intentionally rate limited to resolve no more than 25 URLs per second. Once resolved, URLs will not be retried, so you can safely run this command again.
    - `compute_networks processed_bananas.db resolve_urls`
  6. Calculate the co-link network, this time using the resolved URLs.
    - `compute_networks processed_bananas.db compute co_link --time_window 60 --output_file bananas_colink_resolved_60s.graphml --output_format graphml --resolved`
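The preprocessing step above keeps only the first instance of a tweet that appears more than once. One common way to get that behaviour in SQLite is a primary key combined with `INSERT OR IGNORE`. The sketch below illustrates the idea with Python's built-in `sqlite3` module; the table and column names are hypothetical and are not the toolkit's actual schema.

```python
import sqlite3

# In-memory database for illustration; the toolkit writes to a file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: the primary key on message_id enforces uniqueness.
conn.execute(
    "CREATE TABLE message (message_id TEXT PRIMARY KEY, user_id TEXT, text TEXT)"
)

rows = [
    ("1", "alice", "first copy"),
    ("2", "bob", "another tweet"),
    ("1", "alice", "second copy of the same tweet"),  # duplicate id
]

# INSERT OR IGNORE silently skips rows whose primary key already exists,
# so only the first instance of each message_id is stored.
conn.executemany("INSERT OR IGNORE INTO message VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute(
    "SELECT message_id, text FROM message ORDER BY message_id"
).fetchall()
print(stored)  # [('1', 'first copy'), ('2', 'another tweet')]
```

Because duplicates are skipped rather than rejected with an error, the same input file can be loaded again without changing the stored data.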

Python library usage example

You can find the following example as a Jupyter notebook you can run yourself in examples/example.ipynb.

```python
import coordination_network_toolkit as coord_net_tk
import networkx as nx

# Preprocess CSV data located at '/path/to/your/csv_file.csv' into
# the database located at /path/to/your/db - if no db file
# exists here it will be created for you.
db_name = '/path/to/your/db.db'
coord_net_tk.preprocess.preprocess_csv_files(db_name, ['/path/to/your/csv_file.csv'])

# Calculate similarity network
coord_net_tk.compute_networks.compute_co_similar_tweet(db_name, 60)

# Load data as a networkx graph
similarity_graph = coord_net_tk.graph.load_networkx_graph(db_name, "co_similar_tweet")

# Play with the graph!
for g in nx.connected_components(similarity_graph):
    print(g)
```

Supported Input Formats

Twitter

JSON data from V1.1 and V2 of the Twitter API can be ingested directly.

CSV (All other platforms)

To use the CSV ingest format, construct a CSV with a header and the following columns. The names of the columns don't matter but the order does.

  • message_id: the unique identifier of the message on the platform
  • user_id: the unique identifier of the user on the platform
  • username: the text of the username (only used for display)
  • repost_id: if the message is a verbatim repost of another message (such as a retweet or reblog), this is the identifier of that other message. Empty strings will be converted to null.
  • reply_id: if the message is in reply to another message, the identifier for that other message. Empty strings will be converted to null.
  • message: the text of the message.
  • timestamp: A timestamp in seconds for the message. The absolute offset does not matter, but it needs to be consistent across all rows
  • urls: A space delimited string containing all of the URLs in the message
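As a rough illustration of the column layout above, the following sketch writes messages from another platform into a CSV with Python's standard `csv` module. The helper `write_messages_csv` and the dictionary keys are hypothetical names chosen for this example; only the column order and the conventions (empty strings for missing ids, space-delimited URLs, timestamps in seconds) come from the description above.

```python
import csv
import io

# Column order matters to the toolkit; the header names themselves do not.
COLUMNS = [
    "message_id", "user_id", "username", "repost_id",
    "reply_id", "message", "timestamp", "urls",
]


def write_messages_csv(handle, messages):
    """Write message dicts to `handle` in the expected column order."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    for m in messages:
        writer.writerow([
            m["message_id"],
            m["user_id"],
            m["username"],
            m.get("repost_id") or "",         # empty string becomes null
            m.get("reply_id") or "",          # empty string becomes null
            m["message"],
            m["timestamp"],                   # seconds, consistent across rows
            " ".join(m.get("urls", [])),      # space-delimited URLs
        ])


buffer = io.StringIO()
write_messages_csv(buffer, [
    {
        "message_id": "101",
        "user_id": "u1",
        "username": "alice",
        "message": "Two links in one post",
        "timestamp": 1600000000,
        "urls": ["https://example.com/a", "https://example.com/b"],
    },
])
csv_text = buffer.getvalue()
print(csv_text)
```

When writing to a real file instead of a `StringIO` buffer, open it with `newline=""` as the `csv` module documentation recommends.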

Cite the Coordination Network Toolkit

Graham, Timothy; QUT Digital Observatory (2020): Coordination Network Toolkit. Queensland University of Technology. (Software) https://doi.org/10.25912/RDF_1632782596538

Looking for Help?

Are you getting stuck somewhere or want to ask questions about using this toolkit? Please open an issue or bring your questions to the Digital Observatory's fortnightly office hours.

Owner

  • Name: QUT Digital Observatory
  • Login: QUT-Digital-Observatory
  • Kind: organization
  • Email: digitalobservatory@qut.edu.au
  • Location: Brisbane

The QUT Digital Observatory is a research infrastructure facility enabling understanding of the dynamic digital landscape.

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Graham
    given-names: Timothy
    orcid: "http://orcid.org/0000-0002-4053-9313"
    affiliation: "Queensland University of Technology"
  - name: "QUT Digital Observatory"
    website: https://www.qut.edu.au/research/why-qut/infrastructure/digital-observatory
title: "Coordination Network Toolkit"
doi: "10.25912/RDF_1632782596538"
repository-code: "https://github.com/QUT-Digital-Observatory/coordination-network-toolkit"
url: "https://researchdatafinder.qut.edu.au/display/n26261"
license: MIT
keywords:
  - "coordinated inauthentic behavoir"
  - "network analysis"
  - "social science"
  - "social media analysis"
  - "similarity networks"
  - "sociology"

GitHub Events

Total
  • Watch event: 8
  • Issue comment event: 3
  • Pull request event: 1
  • Fork event: 1
Last Year
  • Watch event: 8
  • Issue comment event: 3
  • Pull request event: 1
  • Fork event: 1

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 73
  • Total Committers: 5
  • Avg Commits per committer: 14.6
  • Development Distribution Score (DDS): 0.438
Top Committers
Name Email Commits
Sam Hames s****s@q****u 41
betsy e****t@q****u 19
Tim Graham t****m@q****u 11
Sam Hames s****m@h****u 1
Sam Hames s****s@u****u 1

Issues and Pull Requests

Last synced: 9 months ago

All Time
  • Total issues: 47
  • Total pull requests: 12
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 8 days
  • Total issue authors: 17
  • Total pull request authors: 2
  • Average comments per issue: 1.23
  • Average comments per pull request: 0.25
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • SamHames (17)
  • timothyjgraham (6)
  • havardl (4)
  • myrainbowandsky (3)
  • betsybookwyrm (3)
  • weiaiwayne (2)
  • bkrdmr (2)
  • patrick-lee-warren (1)
  • dandaii (1)
  • Monika-cnb (1)
  • harisbinzia (1)
  • arjun-s2 (1)
  • psalmuel19 (1)
  • mariadelmarq (1)
  • barripdmx (1)
Pull Request Authors
  • SamHames (9)
  • betsybookwyrm (3)
  • riozhu-GZ (1)
Top Labels
Issue Labels
enhancement (6) documentation (5) good first issue (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 116 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 18
  • Total maintainers: 1
pypi.org: coordination-network-toolkit

Tools for computing networks of coordinated behaviour on social media

  • Versions: 18
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 116 Last month
Rankings
Stargazers count: 8.3%
Forks count: 9.8%
Dependent packages count: 10.1%
Average: 14.0%
Downloads: 20.3%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • certifi ==2021.5.30
  • chardet ==4.0.0
  • decorator ==4.4.2
  • idna ==2.10
  • networkx ==2.5.1
  • regex ==2019.11.1
  • requests ==2.25.1
  • urllib3 ==1.26.5
setup.py pypi
  • networkx >=2.5
  • regex >=2021.3.17
  • requests *
  • twarc >=2.4.0
  • urllib3 *
.github/workflows/python-publish.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/python-test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite