Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: JLaTondre
- License: mit
- Language: Perl
- Default Branch: main
- Homepage: https://en.wikipedia.org/wiki/User:JL-Bot
- Size: 338 KB
Statistics
- Stars: 2
- Watchers: 2
- Forks: 1
- Open Issues: 3
- Releases: 0
Metadata Files
README.md
JL-Bot
This repository contains the source code for: https://en.wikipedia.org/wiki/User:JL-Bot
The bot is primarily implemented using Perl, but some new tasks use Python.
The repository contains all of the custom code associated with the bot. The scripts use a number of standard packages, all of which are available through the usual package managers. The Citations and Content tasks also require sqlite3.
The scripts require two environment variables to be set:
* WIKI_CONFIG_DIR = location of the configuration files
* WIKI_WORKING_DIR = location where results are to be output
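A minimal Perl sketch of that check, assuming the scripts read both variables from `%ENV` the way citations-common.pl does (the names below are the ones that script tests for). `require_env` and `citation_paths` are hypothetical helper names, and only a few of the derived paths are shown.

```perl
use strict;
use warnings;

# Return the value of a required environment variable, or die with the
# same style of message citations-common.pl prints.
sub require_env {
    my ($name) = @_;
    die "ERROR: $name environment variable not set\n"
        unless exists $ENV{$name};
    return $ENV{$name};
}

# Hypothetical helper: derive some file locations from the two directories.
# bot-info.txt and db-common.sqlite3 follow citations-common.pl; the
# email-info.txt location is an assumption.
sub citation_paths {
    my $config  = require_env('WIKI_CONFIG_DIR');
    my $working = require_env('WIKI_WORKING_DIR');
    return {
        botinfo   => "$config/bot-info.txt",
        emailinfo => "$config/email-info.txt",
        dbcommon  => "$working/Citations/db-common.sqlite3",
    };
}
```

Failing fast before any database work keeps a misconfigured run from deleting or writing files in an unexpected location.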
Two configuration files are required, both located in the configuration directory:
* bot-info.txt = Wikipedia login information for the bot. Each setting is on its own <keyword> = <value> line; the keywords are USERNAME and PASSWORD.
* email-info.txt = Email address required for the crossref.org API. The format is a single <keyword> = <value> line with the keyword EMAIL.
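The `<keyword> = <value>` format can be read with a few lines of core Perl. This is a sketch only: `read_keyword_file` and `require_keys` are hypothetical names, the bot's own loader is not part of this excerpt, and skipping blank and `#` comment lines is an assumption the README does not state.

```perl
use strict;
use warnings;

# Hypothetical helper: read "<keyword> = <value>" lines into a hash reference.
# Blank lines and lines starting with '#' are skipped (an assumption).
sub read_keyword_file {
    my ($path) = @_;
    open my $fh, '<:encoding(UTF-8)', $path
        or die "ERROR: could not open $path: $!\n";
    my %config;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/ || $line =~ /^\s*#/;
        # keyword, then '=', then the rest of the line with surrounding
        # whitespace trimmed (the value itself may contain '=')
        if ($line =~ /^\s*(\w+)\s*=\s*(.*?)\s*$/) {
            $config{$1} = $2;
        }
        else {
            die "ERROR: malformed line in $path: $line\n";
        }
    }
    close $fh;
    return \%config;
}

# Die unless every listed keyword is present; returns the config unchanged.
sub require_keys {
    my ($config, @keys) = @_;
    for my $key (@keys) {
        die "ERROR: missing $key\n" unless defined $config->{$key};
    }
    return $config;
}
```

For example, `require_keys(read_keyword_file($path), qw(USERNAME PASSWORD))` would validate bot-info.txt, and `qw(EMAIL)` would do the same for email-info.txt.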
Owner
- Login: JLaTondre
- Kind: user
- Repositories: 1
- Profile: https://github.com/JLaTondre
Citation (citations/citations-common.pl)
#!/usr/bin/perl

# This script generates the most popular target results for journals (WP:JCW/TAR) and
# magazines (WP:MCW/TAR).
# This has significant overlap with citations-specified, but is kept separate because
# handling magazines would overcomplicate citations-specified.

use warnings;
use strict;

use Benchmark;
use File::Basename;
use Getopt::Std;
use Text::LevenshteinXS qw( distance );

use lib dirname(__FILE__) . '/../modules';
use citations qw(
    findCitation
    findIndividual
    findNormalizations
    findRedirectExpansions
    formatCitation
    isUppercaseMatch
    loadRedirects
    normalizeCitation
    retrieveFalsePositives
);
use citationsDB;
use mybot;

use utf8;
#
# Validate Environment Variables
#

unless (exists $ENV{'WIKI_CONFIG_DIR'}) {
    die "ERROR: WIKI_CONFIG_DIR environment variable not set\n";
}

unless (exists $ENV{'WIKI_WORKING_DIR'}) {
    die "ERROR: WIKI_WORKING_DIR environment variable not set\n";
}

#
# Configuration & Globals
#

my $DBTITLES = $ENV{'WIKI_WORKING_DIR'} . '/Citations/db-titles.sqlite3';
my $DBINDIVIDUAL = $ENV{'WIKI_WORKING_DIR'} . '/Citations/db-individual.sqlite3';
my $DBCOMMON = $ENV{'WIKI_WORKING_DIR'} . '/Citations/db-common.sqlite3';
my $BOTINFO = $ENV{'WIKI_CONFIG_DIR'} . '/bot-info.txt';

my @TYPES = qw(
    journal magazine
);

my %MAX = (
    'journal' => 3000,
    'magazine' => 500,
);

my $FALSEPOSITIVES = 'User:JL-Bot/Citations.cfg';

my @TABLES = (
    'CREATE TABLE commons(type TEXT, target TEXT, entries TEXT, entryCount INTEGER, lineCount INTEGER, articles TEXT, citations INTEGER)',
    'CREATE TABLE revisions(type TEXT, revision TEXT)',
);
#
# Subroutines
#

sub combineSpecified {
    # Combine specified targets along with redirects
    my $database = shift;
    my $type = shift;
    my $common = shift;
    print " combining $type targets and redirects ...\n";
    my $combined;
    my $allRedirects;
    for my $selected (keys %$common) {
        $combined->{$selected} = undef;
        my $redirects = loadRedirects($database, $selected);
        for my $redirect (keys %$redirects) {
            next if ($redirect =~ /^10\.\d+$/); # skip DOI redirects
            $combined->{$redirect} = undef;
            $allRedirects->{$selected}->{$redirect} = 1;
        }
    }
    return $combined, $allRedirects;
}

sub generateResult {
    # Generate the final result for a given target
    my $selected = shift;
    my $specified = shift;
    my $redirects = shift;
    my $normalizations = shift;
    my $falsePositives = shift;
    my $type = shift;
    my $dbCommon = shift;
    my $citations = {}; # citation format, counts, & articles

    # process record matches (if any)
    if (exists $specified->{$selected}->{'record'}) {
        my $record = $specified->{$selected}->{'record'};
        my $target = $record->{'target'};
        $citations->{$selected}->{'formatted'} = formatCitation($selected, $record);
        $citations->{$selected}->{'count'} = $record->{'citation-count'};
        $citations->{$selected}->{'articles'} = $record->{'articles'};
        # process normalization matches to record (if any)
        for my $normalization (keys %{$specified->{$selected}->{'record'}->{'normalizations'}}) {
            for my $match (keys %{$normalizations->{$normalization}}) {
                next if (exists $falsePositives->{$selected}->{$match});
                next if (exists $specified->{$match}); # matches self or other top level
                next if (isUppercaseMatch($match, $selected, $redirects->{$selected})); # both uppercase
                my $record = $normalizations->{$normalization}->{$match};
                $citations->{$match}->{'formatted'} = formatCitation($match, $record);
                $citations->{$match}->{'count'} = $record->{'citation-count'};
                $citations->{$match}->{'articles'} = $record->{'articles'};
            }
        }
    }

    # process normalization matches to selected (if any)
    for my $normalization (keys %{$specified->{$selected}->{'normalizations'}}) {
        for my $match (keys %{$normalizations->{$normalization}}) {
            next if (exists $falsePositives->{$selected}->{$match});
            next if (exists $specified->{$match}); # matches self or other top level
            next if (isUppercaseMatch($match, $selected, $redirects->{$selected})); # both uppercase
            my $record = $normalizations->{$normalization}->{$match};
            $citations->{$match}->{'formatted'} = formatCitation($match, $record);
            $citations->{$match}->{'count'} = $record->{'citation-count'};
            $citations->{$match}->{'articles'} = $record->{'articles'};
        }
    }

    # process redirects
    for my $redirect (keys %{$redirects->{$selected}}) {
        next if (exists $falsePositives->{$selected}->{$redirect});
        # process record matches (if any)
        if (exists $specified->{$redirect}->{'record'}) {
            my $record = $specified->{$redirect}->{'record'};
            die "selected =/= redirect! how...\n selected = $selected\n redirect = $redirect\n" if ($record->{'target'} ne $selected); # check
            $citations->{$redirect}->{'formatted'} = formatCitation($redirect, $record);
            $citations->{$redirect}->{'count'} = $record->{'citation-count'};
            $citations->{$redirect}->{'articles'} = $record->{'articles'};
            # process normalization matches to redirect records (if any)
            for my $normalization (keys %{$specified->{$redirect}->{'record'}->{'normalizations'}}) {
                for my $match (keys %{$normalizations->{$normalization}}) {
                    next if (exists $falsePositives->{$selected}->{$match});
                    next if (exists $falsePositives->{$redirect}->{$match});
                    next if (exists $specified->{$match}); # matches self or other top level
                    next if (isUppercaseMatch($match, $redirect, $redirects->{$selected})); # both uppercase
                    next if (isUppercaseMatch($match, $selected, $redirects->{$selected})); # both uppercase
                    my $record = $normalizations->{$normalization}->{$match};
                    $citations->{$match}->{'formatted'} = formatCitation($match, $record);
                    $citations->{$match}->{'count'} = $record->{'citation-count'};
                    $citations->{$match}->{'articles'} = $record->{'articles'};
                }
            }
        }
        # process normalization matches to redirect (if any)
        for my $normalization (keys %{$specified->{$redirect}->{'normalizations'}}) {
            for my $match (keys %{$normalizations->{$normalization}}) {
                next if (exists $falsePositives->{$selected}->{$match});
                next if (exists $falsePositives->{$redirect}->{$match});
                next if (exists $specified->{$match}); # matches self or other top level
                next if (isUppercaseMatch($match, $redirect, $redirects->{$selected})); # both uppercase
                next if (isUppercaseMatch($match, $selected, $redirects->{$selected})); # both uppercase
                my $record = $normalizations->{$normalization}->{$match};
                $citations->{$match}->{'formatted'} = formatCitation($match, $record);
                $citations->{$match}->{'count'} = $record->{'citation-count'};
                $citations->{$match}->{'articles'} = $record->{'articles'};
            }
        }
    }

    # process redirect expansions for redirects
    for my $redirect (keys %{$redirects->{$selected}}) {
        my $records = findRedirectExpansions($dbCommon, $type, $redirect);
        for my $citation (keys %$records) {
            next if (exists $falsePositives->{$selected}->{$citation});
            next if (exists $falsePositives->{$redirect}->{$citation});
            # can occur multiple times via different redirects so include all
            my $record = $records->{$citation};
            $citations->{$citation}->{'formatted'} = formatCitation($citation, $record);
            $citations->{$citation}->{'count'} = $record->{'citation-count'};
            $citations->{$citation}->{'articles'} = $record->{'articles'};
        }
    }

    # create final output
    my $entries;
    for my $citation (sort keys %$citations) {
        $entries .= "* $citations->{$citation}->{'formatted'}\n";
    }
    my $totalCitations = 0;
    my $allArticles = {};
    for my $citation (keys %$citations) {
        $totalCitations += $citations->{$citation}->{'count'};
        $allArticles = {%$allArticles, %{$citations->{$citation}->{'articles'}}};
    }
    my $result->{'entries'} = $entries;
    $result->{'entryCount'} = () = $entries =~ / \(\d+ in /g;
    $result->{'lineCount'} = () = $entries =~ /\n/g;
    $result->{'articles'} = scalar keys %$allArticles;
    $result->{'citations'} = $totalCitations;
    return $result;
}

sub loadCommon {
    # Retrieve the top common targets
    my $database = shift;
    my $type = shift;
    my $max = shift;
    print " retrieving $type targets ...\n";
    # SQL string literals use single quotes; SQLite accepts double-quoted
    # values only as a legacy fallback when no column of that name exists
    my $sth = $database->prepare(q{
        SELECT target, SUM(cCount)
        FROM individuals
        WHERE type = ?
          AND target NOT IN ('—', 'LANGUAGE', 'INTERWIKI', 'Invalid')
        GROUP BY target
        ORDER BY SUM(cCount) DESC
        LIMIT ?
    });
    $sth->bind_param(1, $type);
    $sth->bind_param(2, $max);
    $sth->execute();
    my $specified;
    while (my $ref = $sth->fetchrow_hashref()) {
        my $target = $ref->{'target'};
        $specified->{$target} = 1;
    }
    return $specified;
}

sub saveResult {
    # Save the result to the database
    my $database = shift;
    my $type = shift;
    my $target = shift;
    my $result = shift;
    my $entries = $result->{'entries'};
    my $entryCount = $result->{'entryCount'};
    my $lineCount = $result->{'lineCount'};
    my $articles = $result->{'articles'};
    my $citations = $result->{'citations'};
    my $sth = $database->prepare(qq{
        INSERT INTO commons (type, target, entries, entryCount, lineCount, articles, citations)
        VALUES (?, ?, ?, ?, ?, ?, ?)
    });
    $sth->execute($type, $target, $entries, $entryCount, $lineCount, $articles, $citations);
    return;
}

#
# Main
#

# handle UTF-8
binmode(STDOUT, ':utf8');
binmode(STDERR, ':utf8');

# auto-flush output
$| = 1;

# generate output
print "Generating common targets ...\n";
my $p0 = Benchmark->new;

# delete existing database & create new one
print " creating database ...\n";
if (-e $DBCOMMON) {
    unlink $DBCOMMON
        or die "ERROR: Could not delete database ($DBCOMMON)\n --> $!\n\n";
}
my $dbCommon = citationsDB->new;
$dbCommon->cloneDatabase($DBINDIVIDUAL, $DBCOMMON);
$dbCommon->openDatabase($DBCOMMON);
$dbCommon->createTables(\@TABLES);
my $dbTitles = citationsDB->new;
$dbTitles->openDatabase($DBTITLES);

# load false positives
my ($falsePositives, $fpRevision) = retrieveFalsePositives($BOTINFO, $FALSEPOSITIVES, $dbTitles);
my $sth = $dbCommon->prepare('INSERT INTO revisions VALUES (?, ?)');
$sth->execute('falsePositive', $fpRevision);
$dbCommon->commit;

# process each type
for my $type (@TYPES) {
    my $maximum = $MAX{$type} * 1.05; # do more than max as initial numbers expand
    # find top common targets and their redirects
    my $common = loadCommon($dbCommon, $type, $maximum);
    my ($specified, $redirects) = combineSpecified($dbTitles, $type, $common);

    # process specified
    print " finding citations for specified ...\n";
    my $normalizations;
    for my $selected (keys %$specified) {
        my $citation = findCitation($dbCommon, $type, $selected);
        if ($citation) {
            $specified->{$selected}->{'record'} = $citation;
            for my $normalization (keys %{$citation->{'normalizations'}}) {
                $normalizations->{$normalization} = undef;
            }
        }
        else {
            my $normalization = normalizeCitation($selected);
            $specified->{$selected}->{'normalizations'}->{$normalization} = 1;
            $normalizations->{$normalization} = undef;
        }
    }

    # process normalizations
    my $total = scalar keys %$normalizations;
    print " processing $total normalizations ...\n";
    for my $normalization (keys %$normalizations) {
        next if ($normalization eq '--');
        my $candidates = findNormalizations($dbCommon, $type, $normalization);
        for my $candidate (keys %$candidates) {
            my $result = findIndividual($dbCommon, $type, $candidate);
            $normalizations->{$normalization}->{$candidate} = $result if ($result);
        }
    }

    # put it together
    print " generating results ...\n";
    for my $target (keys %$common) {
        my $result = generateResult(
            $target,
            $specified,
            $redirects,
            $normalizations,
            $falsePositives,
            $type,
            $dbCommon,
        );
        saveResult($dbCommon, $type, $target, $result);
    }
    $dbCommon->commit;
}

$dbCommon->disconnect;
$dbTitles->disconnect;

my $p1 = Benchmark->new;
my $pd = timediff($p1, $p0);
my $ps = timestr($pd);
$ps =~ s/^\s*(\d+)\swallclock secs.*$/$1/;
print " common targets processed in $ps seconds\n";
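The core of generateResult is gathering every citation variant that belongs to one target (the target itself, its redirects, and any title sharing a normalization key with them, minus configured false positives), then summing citation counts and counting distinct articles. The standalone sketch below illustrates that idea under simplifying assumptions: `toy_normalize` and `merge_target` are hypothetical names, `toy_normalize` is a stand-in (lowercase, letters and digits only) rather than the module's normalizeCitation, variant data is passed in as a hash instead of being read from SQLite, and the uppercase and "other top-level target" exclusions are omitted.

```perl
use strict;
use warnings;

# Toy normalizer (assumption): lowercase, then keep only letters and digits.
sub toy_normalize {
    my ($title) = @_;
    my $key = lc $title;
    $key =~ s/[^a-z0-9]//g;
    return $key;
}

# Collect all citation variants that belong to one target.
#   $target    - the selected target title
#   $redirects - hashref of redirect titles that point to $target
#   $variants  - hashref: title => { count => N, articles => { name => 1, ... } }
#   $false     - hashref of titles flagged as false positives for $target
sub merge_target {
    my ($target, $redirects, $variants, $false) = @_;

    # normalization keys for the target and every redirect to it
    my %keys;
    $keys{ toy_normalize($_) } = 1 for ($target, keys %$redirects);

    my @entries;
    my %articles;
    my $citations = 0;
    for my $title (sort keys %$variants) {
        next if $false->{$title};                    # excluded by configuration
        next unless $keys{ toy_normalize($title) };  # belongs to another publication
        push @entries, $title;
        $citations += $variants->{$title}{count};
        # hash-slice assignment adds article names; duplicates collapse
        @articles{ keys %{ $variants->{$title}{articles} } } = ();
    }

    return {
        entries    => \@entries,
        entryCount => scalar @entries,
        articles   => scalar keys %articles,
        citations  => $citations,
    };
}
```

For example, with a redirect `Nature (journal)`, both `Nature` and `nature.` fall under the target `Nature`, while a variant listed in `$false` is skipped. The hash-slice merge gives the same distinct-article count as the script's `{%$allArticles, %{...}}` form but avoids copying the accumulated hash on every citation.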
GitHub Events
Total
- Issues event: 23
- Delete event: 14
- Issue comment event: 1
- Push event: 9
- Pull request event: 25
- Create event: 14
Last Year
- Issues event: 23
- Delete event: 14
- Issue comment event: 1
- Push event: 9
- Pull request event: 25
- Create event: 14
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 8
- Total pull requests: 7
- Average time to close issues: 32 minutes
- Average time to close pull requests: 1 minute
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 8
- Pull requests: 7
- Average time to close issues: 32 minutes
- Average time to close pull requests: 1 minute
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- JLaTondre (23)
- Octobot222 (1)
Pull Request Authors
- JLaTondre (23)