coordination-network-toolkit
A small command line tool and set of functions for studying coordination networks in Twitter and other social media data.
https://github.com/qut-digital-observatory/coordination-network-toolkit
Science Score: 75.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ✓ DOI references: found 2 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 4 of 5 committers (80.0%) from academic institutions
- ✓ Institutional organization owner: organization qut-digital-observatory has institutional domain (www.qut.edu.au)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Statistics
- Stars: 79
- Watchers: 6
- Forks: 14
- Open Issues: 25
- Releases: 15
Metadata Files
README.md
Coordination Network Toolkit
A small command line tool and set of functions for studying coordination networks in Twitter and other social media data.
Rationale and Background

Social media activity doesn't occur in a vacuum. Individuals on social media often take part in coordinated activities such as protest movements or interest-based communities.
Social media platforms are also used strategically to boost particular messages in line with political campaign goals or for commercial profit and scamming. This involves multiple accounts posting or reposting the same content, repeatedly and within a short time window (e.g. within 1 minute).
This software provides a toolkit to detect coordinated activity on social media and to generate networks that map the actors and their relationships. It provides a general purpose toolkit for multiple types of coordinated activity on any type of social media platform.
Fundamentally, this toolkit produces networks in which the nodes are accounts and the weight of each edge is the number of messages from the two connected accounts that meet the criterion for a given type of coordination. The toolkit already implements several approaches to detecting different types of coordination, and is intended to be extensible to more cases in the future.
Firstly, it includes functionality for co-tweeting and co-retweeting (Keller et al., 2020; Schafer et al., 2017), where accounts post exactly the same text (co-tweets) or repost the same post (co-retweets) within a short time window. Secondly, it includes functionality for co-link analysis, where multiple accounts repeatedly post the same URLs within a short time window of each other (Giglietto et al., 2020). Thirdly, it adds two new network types: co-reply, where accounts repeatedly reply to the same post together; and co-similarity, where accounts post similar text (but not exact duplicates), which relaxes the strict assumption of co-tweeting.
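To make the edge definition concrete, here is a minimal pure-Python sketch of co-retweet counting. It is an illustration only, not the toolkit's implementation (which works from an SQLite database). The tuple layout, the function name `co_retweet_edges`, and the choice to count undirected pairs are all assumptions made for this example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_retweet_edges(reposts, time_window=60):
    """Count co-retweets between pairs of accounts.

    reposts: iterable of (user_id, repost_id, timestamp_seconds) tuples.
    Returns a Counter mapping (user_a, user_b) to the number of pairs of
    reposts of the same original message made within `time_window` seconds.
    """
    # Group repost events by the original message being reposted.
    by_original = defaultdict(list)
    for user_id, repost_id, timestamp in reposts:
        by_original[repost_id].append((timestamp, user_id))

    edges = Counter()
    for events in by_original.values():
        events.sort()  # oldest first, so t2 - t1 is never negative
        for (t1, u1), (t2, u2) in combinations(events, 2):
            if u1 != u2 and t2 - t1 <= time_window:
                # Sort the pair so (a, b) and (b, a) share one undirected edge.
                edges[tuple(sorted((u1, u2)))] += 1
    return edges


reposts = [
    ("alice", "m1", 0),
    ("bob", "m1", 30),
    ("carol", "m1", 500),
    ("alice", "m2", 1000),
    ("bob", "m2", 1010),
]
edges = co_retweet_edges(reposts)
print(edges)
```

In this toy data, alice and bob repost both `m1` and `m2` within 60 seconds of each other, so their edge has weight 2, while carol's late repost creates no edge. Comparing every pair is quadratic per original message, which is fine for a sketch but is one reason a real implementation benefits from an indexed database.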
Types of Coordination Networks
Briefly, the following network types are supported. For more information, see the documentation.
- Co-retweet: reposting the same post
- Co-tweet: posting identical text
- Co-similarity: posting similar text (Jaccard similarity or user-defined)
- Co-link: posting the same link
- Co-reply: replying to the same post
- Co-post: posting any type of message within the same time window
The default time window is 60 seconds for all network types.
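For the co-similarity network, Jaccard similarity compares the sets of tokens in two messages: the size of their intersection divided by the size of their union. The sketch below shows the idea. The tokenization (lowercasing and splitting on whitespace) and the function name are assumptions for illustration, and the toolkit's own tokenizer may differ.

```python
def jaccard_similarity(text_a, text_b):
    """Jaccard similarity of the sets of lowercased whitespace tokens."""
    tokens_a = set(text_a.lower().split())
    tokens_b = set(text_b.lower().split())
    if not tokens_a and not tokens_b:
        # Two empty messages share no tokens; treat them as dissimilar.
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


a = "Vote now for the best banana"
b = "vote now for the worst banana"
print(jaccard_similarity(a, b))
```

Here the two messages share five of seven distinct tokens, so the similarity is 5/7. Two accounts would then be linked if they post messages whose similarity exceeds a chosen threshold within the time window.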
Reading List
Giglietto, F., Righetti, N., Rossi, L., & Marino, G. (2020). It takes a village to manipulate the media: coordinated link sharing behavior during 2018 and 2019 Italian elections. Information, Communication and Society, 1–25.
Graham, T., Bruns, A., Zhu, G., & Campbell, R. (2020). Like a virus: The coordinated spread of coronavirus disinformation. Report commissioned for the Centre for Responsible Technology.
Keller, F. B., Schoch, D., Stier, S., & Yang, J. (2020). Political Astroturfing on Twitter: How to Coordinate a Disinformation Campaign. Political Communication, 37(2), 256–280.
Schafer, F., Evert, S., & Heinrich, P. (2017). Japan’s 2014 General Election: Political Bots, Right-Wing Internet Activism, and Prime Minister Shinzō Abe’s Hidden Nationalist Agenda. Big Data, 5(4), 294–309.
Installation and Requirements
This tool requires a working Python 3.6 (or later) environment.
This tool can be installed with pip, which will also install the necessary dependencies:
`pip install coordination_network_toolkit`
Once installed, you can use the toolkit in either of two ways:
- As a command-line tool (run `compute_networks --help` to find out how)
- As a Python library (`import coordination_network_toolkit`)
Basic Usage
Using this tool requires:
- Collecting data from the social media platform of choice
- Preparing or preprocessing the data into one of the formats the tool accepts:
  - either a specific CSV format that works across all platforms, OR
  - the platform-native format (currently Twitter JSON from both the V1.1 and V2 APIs, including the Twitter Academic track, is supported)
- Preprocessing the raw data into an SQLite database to set up the data structures for efficient computation of the different networks
- Generating the network of choice, storing the output in a specified file.
Examples
This is just a quick example - see also the tutorial.
Worked example - CLI tool
- Collect Twitter data in the native Twitter JSON format using twarc, from either the V1.1 or the V2 API:
  - `twarc search '#banana' --output banana.json`
  - `twarc2 search --limit 1000 --archive '#purple' purple.json`
- Bring the data into a local database file called processed_bananas.db. The original data files are not modified in any way. The toolkit will handle processing the Twitter JSON formats into the necessary format, including handling retweets, replies, and URLs, and extracting the text of the tweet. It will also handle deduplication, so if a tweet is present more than once, only the first instance will be recorded.
  - `compute_networks processed_bananas.db preprocess --format twitter_json banana.json purple.json`
- Calculate a co-retweet network, saving the output in GraphML format, which can be opened directly in a tool like [Gephi](https://gephi.org). These settings indicate that if two users have retweeted the same tweet within 60 seconds of each other, there is a potential link.
  - `compute_networks processed_bananas.db compute co_retweet --time_window 60 --output_file bananas_retweet_60s.graphml --output_format graphml`
- Calculate a co-link network, again saving the output in GraphML format. By default this uses the plain text of the URL for matching, so shortened URLs that point to the same destination will not be matched with each other or with the full URL.
  - `compute_networks processed_bananas.db compute co_link --time_window 60 --output_file bananas_colink_unresolved_60s.graphml --output_format graphml`
- Resolve collected URLs to handle link shortening services. Note that this process is intentionally rate limited to resolve no more than 25 URLs per second. Once resolved, URLs will not be retried, so you can safely run this command again.
  - `compute_networks processed_bananas.db resolve_urls`
- Calculate the co-link network, this time using the resolved URLs.
  - `compute_networks processed_bananas.db compute co_link --time_window 60 --output_file bananas_colink_resolved_60s.graphml --output_format graphml --resolved`
Python library usage example
You can find the following example as a Jupyter notebook you can run yourself in
examples/example.ipynb.
```python
import coordination_network_toolkit as coord_net_tk
import networkx as nx

# Preprocess CSV data located at '/path/to/your/csv_file.csv' into
# the database located at /path/to/your/db - if no db file
# exists here it will be created for you.
db_name = '/path/to/your/db.db'
coord_net_tk.preprocess.preprocess_csv_files(db_name, ['/path/to/your/csv_file.csv'])

# Calculate similarity network
coord_net_tk.compute_networks.compute_co_similar_tweet(db_name, 60)

# Load data as a networkx graph
similarity_graph = coord_net_tk.graph.load_networkx_graph(db_name, "co_similar_tweet")

# Play with the graph!
for g in nx.connected_components(similarity_graph):
    print(g)
```
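Coordination networks often contain many weak, incidental edges, so a common next step is to keep only edges above a weight threshold before looking at components. The sketch below uses plain networkx on a small hand-built graph rather than toolkit output. The `weight` edge attribute name, the threshold, and the helper name `strong_components` are assumptions for illustration, so check the attributes on the graph you actually load. The helper builds an undirected graph first, because `nx.connected_components` is not implemented for directed graphs.

```python
import networkx as nx


def strong_components(graph, min_weight=2, weight_attr="weight"):
    """Return connected components using only edges with weight >= min_weight."""
    undirected = nx.Graph()
    for source, target, data in graph.edges(data=True):
        if data.get(weight_attr, 0) >= min_weight:
            undirected.add_edge(source, target)
    return [set(component) for component in nx.connected_components(undirected)]


# Toy directed graph standing in for a loaded coordination network.
toy = nx.DiGraph()
toy.add_edge("alice", "bob", weight=5)
toy.add_edge("bob", "carol", weight=3)
toy.add_edge("carol", "dave", weight=1)  # below the threshold, dropped
toy.add_edge("erin", "frank", weight=2)

components = strong_components(toy, min_weight=2)
print(components)
```

Accounts whose edges all fall below the threshold (dave in this toy data) do not appear in any component, which keeps the result focused on repeatedly coordinating groups.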
Supported Input Formats
JSON data from V1.1 and V2 of the Twitter API can be ingested directly.
CSV (All other platforms)
To use the CSV ingest format, construct a CSV with a header and the following columns. The names of the columns don't matter but the order does.
- message_id: the unique identifier of the message on the platform
- user_id: the unique identifier of the user on the platform
- username: the text of the username (only used for display)
- repost_id: if the message is a verbatim repost of another message (such as a retweet or reblog), this is the identifier of that other message. Empty strings will be converted to null.
- reply_id: if the message is in reply to another message, the identifier for that other message. Empty strings will be converted to null.
- message: the text of the message.
- timestamp: A timestamp in seconds for the message. The absolute offset does not matter, but it needs to be consistent across all rows
- urls: A space delimited string containing all of the URLs in the message
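As a sketch of how data from another platform might be written into this layout, the example below uses Python's standard `csv` module. The input dictionaries, the helper name `write_messages_csv`, and the example values are hypothetical; only the column order and the conventions for empty IDs, timestamps, and URLs come from the list above.

```python
import csv
import io

# Column order expected by the CSV ingest format described above.
COLUMNS = [
    "message_id", "user_id", "username", "repost_id",
    "reply_id", "message", "timestamp", "urls",
]


def write_messages_csv(rows, handle):
    """Write message dicts to `handle` as CSV with a header, in ingest column order."""
    writer = csv.writer(handle)
    writer.writerow(COLUMNS)
    for row in rows:
        writer.writerow([
            row["message_id"],
            row["user_id"],
            row["username"],
            row.get("repost_id") or "",  # empty string is converted to null on ingest
            row.get("reply_id") or "",
            row["message"],
            int(row["timestamp"]),  # seconds, same reference point for every row
            " ".join(row.get("urls", [])),  # space-delimited URLs
        ])


rows = [
    {"message_id": "1", "user_id": "u1", "username": "alice",
     "message": "Look at this https://example.com/a", "timestamp": 1600000000,
     "urls": ["https://example.com/a"]},
    {"message_id": "2", "user_id": "u2", "username": "bob", "repost_id": "1",
     "message": "Look at this https://example.com/a", "timestamp": 1600000030,
     "urls": ["https://example.com/a"]},
]

buffer = io.StringIO()
write_messages_csv(rows, buffer)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, open the file with `open(path, "w", newline="", encoding="utf-8")` and pass the handle to the same function. The resulting file can then be given to the preprocessing step.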
Cite the Coordination Network Toolkit
Graham, Timothy; QUT Digital Observatory (2020): Coordination Network Toolkit. Queensland University of Technology. (Software) https://doi.org/10.25912/RDF_1632782596538
Looking for Help?
Are you getting stuck somewhere or want to ask questions about using this toolkit? Please open an issue or bring your questions to the Digital Observatory's fortnightly office hours.
Owner
- Name: QUT Digital Observatory
- Login: QUT-Digital-Observatory
- Kind: organization
- Email: digitalobservatory@qut.edu.au
- Location: Brisbane
- Website: https://www.qut.edu.au/digital-observatory
- Twitter: ObservatoryTeam
- Repositories: 10
- Profile: https://github.com/QUT-Digital-Observatory
The QUT Digital Observatory is a research infrastructure facility enabling understanding of the dynamic digital landscape.
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Graham
    given-names: Timothy
    orcid: "http://orcid.org/0000-0002-4053-9313"
    affiliation: "Queensland University of Technology"
  - name: "QUT Digital Observatory"
    website: https://www.qut.edu.au/research/why-qut/infrastructure/digital-observatory
title: "Coordination Network Toolkit"
doi: "10.25912/RDF_1632782596538"
repository-code: "https://github.com/QUT-Digital-Observatory/coordination-network-toolkit"
url: "https://researchdatafinder.qut.edu.au/display/n26261"
license: MIT
keywords:
- "coordinated inauthentic behavior"
- "network analysis"
- "social science"
- "social media analysis"
- "similarity networks"
- "sociology"
GitHub Events
Total
- Watch event: 8
- Issue comment event: 3
- Pull request event: 1
- Fork event: 1
Last Year
- Watch event: 8
- Issue comment event: 3
- Pull request event: 1
- Fork event: 1
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 73
- Total Committers: 5
- Avg Commits per committer: 14.6
- Development Distribution Score (DDS): 0.438
Top Committers
| Name | Email | Commits |
|---|---|---|
| Sam Hames | s****s@q****u | 41 |
| betsy | e****t@q****u | 19 |
| Tim Graham | t****m@q****u | 11 |
| Sam Hames | s****m@h****u | 1 |
| Sam Hames | s****s@u****u | 1 |
Issues and Pull Requests
Last synced: 9 months ago
All Time
- Total issues: 47
- Total pull requests: 12
- Average time to close issues: about 1 month
- Average time to close pull requests: 8 days
- Total issue authors: 17
- Total pull request authors: 2
- Average comments per issue: 1.23
- Average comments per pull request: 0.25
- Merged pull requests: 12
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- SamHames (17)
- timothyjgraham (6)
- havardl (4)
- myrainbowandsky (3)
- betsybookwyrm (3)
- weiaiwayne (2)
- bkrdmr (2)
- patrick-lee-warren (1)
- dandaii (1)
- Monika-cnb (1)
- harisbinzia (1)
- arjun-s2 (1)
- psalmuel19 (1)
- mariadelmarq (1)
- barripdmx (1)
Pull Request Authors
- SamHames (9)
- betsybookwyrm (3)
- riozhu-GZ (1)
Packages
- Total packages: 1
- Total downloads:
  - pypi: 116 last month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 18
- Total maintainers: 1
pypi.org: coordination-network-toolkit
Tools for computing networks of coordinated behaviour on social media
- Homepage: https://github.com/QUT-Digital-Observatory/coordination-network-toolkit
- Documentation: https://coordination-network-toolkit.readthedocs.io/
- License: MIT
- Latest release: 1.5.2 (published over 3 years ago)
Dependencies
- certifi ==2021.5.30
- chardet ==4.0.0
- decorator ==4.4.2
- idna ==2.10
- networkx ==2.5.1
- regex ==2019.11.1
- requests ==2.25.1
- urllib3 ==1.26.5
- networkx >=2.5
- regex >=2021.3.17
- requests *
- twarc >=2.4.0
- urllib3 *
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite