Ural
A python helper library full of URL-related heuristics.
Installation
You can install ural with pip with the following command:
pip install ural
How to cite?
ural is published on Zenodo.
You can cite it thusly:
Guillaume Plique, Jules Farjas, Oubine Perrin, Benjamin Ooghe-Tabanou, Martin Delabre, Pauline Breteau, Jean Descamps, Béatrice Mazoyer, Amélie Pellé, Laura Miguel, & César Pichon. Ural, a python helper library full of URL-related heuristics. (2018). Zenodo. https://doi.org/10.5281/zenodo.8160139
Usage
Generic functions
- canonicalize_url
- could_be_html
- could_be_rss
- ensure_protocol
- fingerprint_hostname
- fingerprint_url
- force_protocol
- format_url
- get_domain_name
- get_hostname
- get_fingerprinted_hostname
- get_normalized_hostname
- has_special_host
- has_valid_suffix
- has_valid_tld
- infer_redirection
- is_homepage
- is_shortened_url
- is_special_host
- is_typo_url
- is_url
- is_valid_tld
- links_from_html
- normalize_hostname
- normalize_url
- should_follow_href
- should_resolve
- split_suffix
- strip_protocol
- urlpathsplit
- urls_from_html
- urls_from_text
Utilities
Classes
LRU-related functions (What on earth is a LRU?)
- lru.url_to_lru
- lru.lru_to_url
- lru.lru_stems
- lru.canonicalized_lru_stems
- lru.normalized_lru_stems
- lru.fingerprinted_lru_stems
- lru.serialize_lru
- lru.unserialize_lru
LRU-related classes
Platform-specific functions
Differences between canonicalize_url, normalize_url & fingerprint_url
ural comes with three different url deduplication schemes, targeted to different use-cases and ordered hereafter by ascending aggressiveness:
- canonicalize_url: we clean the url by performing some light preprocessing usually done by web browsers before hitting them, e.g. lowercasing the hostname, decoding punycode, ensuring we have a protocol, dropping leading and trailing whitespace etc. The clean url is guaranteed to still lead to the same place.
- normalize_url: we apply more advanced preprocessing that will drop some parts of the url that are irrelevant to where the url leads, such as technical artifacts and SEO tricks. For instance, we will drop typical query items used by marketing campaigns, reorder the query items, infer some redirections, strip trailing slash or fragment when advisable etc. At that point, the url should be clean enough that one can perform meaningful statistical aggregation when counting them, all while ensuring with some good probability that the url still works and still leads to the same place, at least if the target server follows most common conventions.
- fingerprint_url: we go a step further and we perform destructive preprocessing that cannot guarantee that the resulting url will still be valid. But the result might be even more useful for statistical aggregation, especially when counting urls from large platforms having multiple domains (e.g. facebook.com, facebook.fr etc.)

| Function         | Use-cases                            | Url validity         | Deduplication strength |
|------------------|--------------------------------------|----------------------|------------------------|
| canonicalize_url | web crawler                          | Technically the same | +                      |
| normalize_url    | web crawler, statistical aggregation | Probably the same    | ++                     |
| fingerprint_url  | statistical aggregation              | Potentially invalid  | +++                    |
Example
```python
from ural import canonicalize_url, normalize_url, fingerprint_url

url = 'https://www.FACEBOOK.COM:80/index.html?utm_campaign=3&id=34'

canonicalize_url(url)
# 'https://www.facebook.com/index.html?utm_campaign=3&id=34'
# The same url, cleaned up a little

normalize_url(url)
# 'facebook.com?id=34'
# Still a valid url, with implicit protocol, where all the cruft has been discarded

fingerprint_url(url, strip_suffix=True)
# 'facebook?id=34'
# Not a valid url anymore, but useful to match more potential
# candidates such as: http://facebook.co.uk/index.html?id=34
```
canonicalize_url
Function returning a clean and safe version of the url by performing the same kind of preprocessing as web browsers.
For more details about this be sure to read this section of the docs.
```python
from ural import canonicalize_url

canonicalize_url('www.LEMONDE.fr')
# 'https://lemonde.fr'
```
Arguments
- url string: url to canonicalize.
- quoted ?bool [False]: by default the function will unquote the url as much as possible all while keeping the url safe. If this kwarg is set to True, the function will instead quote the url as much as possible all while ensuring nothing will be double-quoted.
- default_protocol ?str [https]: default protocol to add when the url has none.
- strip_fragment ?str [False]: whether to strip the url's fragment.
could_be_html
Function returning whether the url could return HTML.
```python
from ural import could_be_html

could_be_html('https://www.lemonde.fr')
# True

could_be_html('https://www.lemonde.fr/articles/page.php')
# True

could_be_html('https://www.lemonde.fr/data.json')
# False

could_be_html('https://www.lemonde.fr/img/figure.jpg')
# False
```
could_be_rss
Function returning whether the given url could be a rss feed url.
```python
from ural import could_be_rss

could_be_rss('https://www.lemonde.fr/cyclisme/rss_full.xml')
# True

could_be_rss('https://www.lemonde.fr/cyclisme/')
# False

could_be_rss('https://www.ecorce.org/spip.php?page=backend')
# True

could_be_rss('https://feeds.feedburner.com/helloworld')
# True
```
ensure_protocol
Function checking if the url has a protocol, and adding the given one if there is none.
```python
from ural import ensure_protocol

ensure_protocol('www.lemonde.fr', protocol='https')
# 'https://www.lemonde.fr'
```
Arguments
- url string: URL to format.
- protocol string: protocol to use if there is none in url. Is 'http' by default.
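To make the idea concrete, here is a minimal standard-library sketch of the "add a protocol only when missing" logic. It is not ural's implementation: the `ensure_protocol_sketch` name, the `SCHEME_RE` pattern and the handling of protocol-relative urls are assumptions made for illustration.

```python
import re

# Assumption: a protocol is a scheme name (letter first, then letters,
# digits, "+", "-" or ".") followed by "://". ural may use other rules.
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.\-]*://")


def ensure_protocol_sketch(url, protocol="http"):
    """Return url unchanged if it has a protocol, else prepend the given one."""
    url = url.strip()

    if SCHEME_RE.match(url):
        return url

    # Protocol-relative urls such as "//lemonde.fr" only lack the scheme name
    if url.startswith("//"):
        return protocol + ":" + url

    return protocol + "://" + url
```

Calling `ensure_protocol_sketch('www.lemonde.fr', protocol='https')` yields `'https://www.lemonde.fr'`, while a url that already carries a protocol is returned untouched.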
fingerprint_hostname
Function returning a "fingerprinted" version of the given hostname by stripping subdomains irrelevant for statistical aggregation. Be warned that this function is even more aggressive than normalize_hostname and that the resulting hostname might not be valid anymore.
For more details about this be sure to read this section of the docs.
```python
from ural import fingerprint_hostname

fingerprint_hostname('www.lemonde.fr')
# 'lemonde.fr'

fingerprint_hostname('fr-FR.facebook.com')
# 'facebook.com'

fingerprint_hostname('fr-FR.facebook.com', strip_suffix=True)
# 'facebook'
```
Arguments
- hostname string: target hostname.
- strip_suffix ?bool [False]: whether to strip the hostname suffix such as .com or .co.uk. This can be useful to aggregate different domains of the same platform.
fingerprint_url
Function returning a "fingerprinted" version of the given url that can be useful for statistical aggregation. Be warned that this function is even more aggressive than normalize_url and that the resulting url might not be valid anymore.
For more details about this be sure to read this section of the docs.
```python
from ural import fingerprint_url

fingerprint_url('www.lemonde.fr/article.html')
# 'lemonde.fr/article.html'

fingerprint_url('fr-FR.facebook.com/article.html')
# 'facebook.com/article.html'

fingerprint_url('fr-FR.facebook.com/article.html', strip_suffix=True)
# 'facebook/article.html'
```
Arguments
- url string: target url.
- strip_suffix ?bool [False]: whether to strip the hostname suffix such as .com or .co.uk. This can be useful to aggregate different domains of the same platform.
- platform_aware ?bool [False]: whether to take some well-known platforms supported by ural such as facebook, youtube etc. into account when normalizing the url.
force_protocol
Function force-replacing the protocol of the given url.
```python
from ural import force_protocol

force_protocol('https://www2.lemonde.fr', protocol='ftp')
# 'ftp://www2.lemonde.fr'
```
Arguments
- url string: URL to format.
- protocol string: protocol wanted in the output url. Is 'http' by default.
format_url
Function formatting a url given some typical parameters.
```python
from ural import format_url

format_url(
    'https://lemonde.fr',
    path='/article.html',
    args={'id': '48675'},
    fragment='title-2'
)
# 'https://lemonde.fr/article.html?id=48675#title-2'

# Path can be given as an iterable
format_url('https://lemonde.fr', path=['articles', 'one.html'])
# 'https://lemonde.fr/articles/one.html'

# Extension
format_url('https://lemonde.fr', path=['article'], ext='html')
# 'https://lemonde.fr/article.html'

# Query args are formatted/quoted and/or skipped if None/False
format_url(
    "http://lemonde.fr",
    path=["business", "articles"],
    args={
        "hello": "world",
        "number": 14,
        "boolean": True,
        "skipped": None,
        "also-skipped": False,
        "quoted": "test=ok",
    },
    fragment="#test",
)
# 'http://lemonde.fr/business/articles?boolean&hello=world&number=14&quoted=test%3Dok#test'

# Query args can also be passed as a list of (key, value) pairs
format_url("http://lemonde.fr", args=[("id", "one"), ("name", "lucy")])
# 'http://lemonde.fr?id=one&name=lucy'

# Custom argument value formatting
def format_arg_value(key, value):
    if key == 'ids':
        # Values must be strings before being joined
        return ','.join(str(v) for v in value)

    return value

format_url(
    'https://lemonde.fr',
    args={'ids': [1, 2]},
    format_arg_value=format_arg_value
)
# 'https://lemonde.fr?ids=1%2C2'

# Formatter class
from ural import URLFormatter

formatter = URLFormatter('https://lemonde.fr', args={'id': 'one'})

formatter(path='/article.html')
# 'https://lemonde.fr/article.html?id=one'

# same as:
formatter.format(path='/article.html')
# 'https://lemonde.fr/article.html?id=one'

# Query arguments are merged
formatter(path='/article.html', args={"user_id": "two"})
# 'https://lemonde.fr/article.html?id=one&user_id=two'

# Easy subclassing
class MyCustomFormatter(URLFormatter):
    BASE_URL = 'https://lemonde.fr/api'

    def format_api_call(self, token):
        return self.format(args={'token': token})

formatter = MyCustomFormatter()

formatter.format_api_call('2764753')
# 'https://lemonde.fr/api?token=2764753'
```
Arguments
- base_url str: Base url.
- path ?str|list: the url's path.
- args ?dict|list: query arguments as a dictionary or a list of (key, value) pairs.
- format_arg_value ?callable: function taking a query argument key and value and returning the formatted value.
- fragment ?str: the url's fragment.
- ext ?str: path extension such as .html.
get_domain_name
Function returning a url's domain name. This function is of course tld-aware and will return None if no valid domain name can be found.
```python
from ural import get_domain_name

get_domain_name('https://facebook.com/path')
# 'facebook.com'
```
get_hostname
Function returning the given url's full hostname. It can work on scheme-less urls.
```python
from ural import get_hostname

get_hostname('http://www.facebook.com/path')
# 'www.facebook.com'
```
get_fingerprinted_hostname
Function returning the "fingerprinted" hostname of the given url by stripping subdomains irrelevant for statistical aggregation. Be warned that this function is even more aggressive than get_normalized_hostname and that the resulting hostname might not be valid anymore.
For more details about this be sure to read this section of the docs.
```python
from ural import get_fingerprinted_hostname

get_fingerprinted_hostname('https://www.lemonde.fr/article.html')
# 'lemonde.fr'

get_fingerprinted_hostname('https://fr-FR.facebook.com/article.html')
# 'facebook.com'

get_fingerprinted_hostname('https://fr-FR.facebook.com/article.html', strip_suffix=True)
# 'facebook'
```
Arguments
- url string: target url.
- strip_suffix ?bool [False]: whether to strip the hostname suffix such as .com or .co.uk. This can be useful to aggregate different domains of the same platform.
get_normalized_hostname
Function returning the given url's normalized hostname, i.e. without usually irrelevant subdomains etc. Works a lot like normalize_url.
For more details about this be sure to read this section of the docs.
```python
from ural import get_normalized_hostname

get_normalized_hostname('http://www.facebook.com/path')
# 'facebook.com'

get_normalized_hostname('http://fr-FR.facebook.com/path')
# 'facebook.com'
```
Arguments
- url str: Target url.
- infer_redirection bool [True]: whether to attempt resolving common redirects by leveraging well-known GET parameters.
- normalize_amp ?bool [True]: Whether to attempt to normalize Google AMP subdomains.
has_special_host
Function returning whether the given url looks like it has a special host.
```python
from ural import has_special_host

has_special_host('http://104.19.154.83')
# True

has_special_host('http://youtube.com')
# False
```
has_valid_suffix
Function returning whether the given url has a valid suffix as per Mozilla's Public Suffix List.
```python
from ural import has_valid_suffix

has_valid_suffix('http://lemonde.fr')
# True

has_valid_suffix('http://lemonde.doesnotexist')
# False

# Also works with hostnames
has_valid_suffix('lemonde.fr')
# True
```
has_valid_tld
Function returning whether the given url has a valid Top Level Domain (TLD) as per IANA's list.
```python
from ural import has_valid_tld

has_valid_tld('http://lemonde.fr')
# True

has_valid_tld('http://lemonde.doesnotexist')
# False

# Also works with hostnames
has_valid_tld('lemonde.fr')
# True
```
infer_redirection
Function attempting to find obvious clues in the given url that it is in fact a redirection and resolving the redirection automatically without firing any HTTP request. If nothing is found, the given url will be returned as-is.
The function is by default recursive and will attempt to infer redirections until none is found, but you can disable this behavior if you need to.
```python
from ural import infer_redirection

infer_redirection('https://www.google.com/url?sa=t&source=web&rct=j&url=https%3A%2F%2Fm.youtube.com%2Fwatch%3Fv%3D4iJBsjHMviQ&ved=2ahUKEwiBm-TO3OvkAhUnA2MBHQRPAR4QwqsBMAB6BAgDEAQ&usg=AOvVaw0i7y2fEy3nwwdIZyo_qug')
# 'https://m.youtube.com/watch?v=4iJBsjHMviQ'

infer_redirection('https://test.com?url=http%3A%2F%2Flemonde.fr%3Fnext%3Dhttp%253A%252F%252Ftarget.fr')
# 'http://target.fr'

infer_redirection(
    'https://test.com?url=http%3A%2F%2Flemonde.fr%3Fnext%3Dhttp%253A%252F%252Ftarget.fr',
    recursive=False
)
# 'http://lemonde.fr?next=http%3A%2F%2Ftarget.fr'
```
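For readers curious about how a redirection can be inferred without any HTTP request, the following standard-library sketch looks for an absolute url hidden in a query parameter and optionally recurses. It is only an illustration: the `infer_redirection_sketch` name and the `REDIRECT_KEYS` list are assumptions, and ural's actual heuristics are likely richer.

```python
from urllib.parse import urlsplit, parse_qs

# Assumed parameter names; ural's own list of clues may differ.
REDIRECT_KEYS = ("url", "next", "u")


def infer_redirection_sketch(url, recursive=True):
    """Return the target url found in the query string, or url as-is."""
    # parse_qs percent-decodes the values once, which unwraps one level
    query = parse_qs(urlsplit(url).query)

    for key in REDIRECT_KEYS:
        for value in query.get(key, []):
            if value.startswith(("http://", "https://")):
                if recursive:
                    # The target may itself wrap another redirection
                    return infer_redirection_sketch(value)
                return value

    return url
```

Because each level of nesting is percent-encoded once more, decoding one level per recursion step is enough to peel the wrapped urls one by one.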
is_homepage
Function returning whether the given url is probably a website's homepage, based on its path.
```python
from ural import is_homepage

is_homepage('http://lemonde.fr')
# True

is_homepage('http://lemonde.fr/index.html')
# True

is_homepage('http://lemonde.fr/business/article5.html')
# False
```
is_shortened_url
Function returning whether the given url is probably a shortened url. It works by matching the given url domain against most prominent shortener domains. So the result could be a false negative.
```python
from ural import is_shortened_url

is_shortened_url('http://lemonde.fr')
# False

is_shortened_url('http://bit.ly/1sNZMwL')
# True
```
is_special_host
Function returning whether the given hostname looks like a special host.
```python
from ural import is_special_host

is_special_host('104.19.154.83')
# True

is_special_host('youtube.com')
# False
```
is_typo_url
Function returning whether the given string is probably a typo rather than an actual url. This function doesn't test if the given string is a valid url. It works by matching the given url tld against most prominent typo-like tlds or by matching the given string against most prominent inclusive language terminations. So the result could be a false negative.
```python
from ural import is_typo_url

is_typo_url('http://dirigeants.es')
# True

is_typo_url('https://www.instagram.com')
# False
```
is_url
Function returning whether the given string is a valid url.
```python
from ural import is_url

is_url('https://www2.lemonde.fr')
# True

is_url('lemonde.fr/economie/article.php', require_protocol=False)
# True

is_url('lemonde.falsetld/whatever.html', tld_aware=True)
# False
```
Arguments
- string string: string to test.
- require_protocol bool [True]: whether the argument has to have a protocol to be considered a url.
- tld_aware bool [False]: whether to check if the url's tld actually exists or not.
- allow_spaces_in_path bool [False]: whether to allow spaces in URL paths.
- only_http_https bool [True]: whether to only allow the http and https protocols.
is_valid_tld
Function returning whether the given Top Level Domain (TLD) is valid as per IANA's list.
```python
from ural import is_valid_tld

is_valid_tld('.fr')
# True

is_valid_tld('com')
# True

is_valid_tld('.doesnotexist')
# False
```
links_from_html
Function returning an iterator over the valid outgoing links present in given HTML text.
This is a variant of urls_from_html suited to web crawlers. It can deduplicate the urls, canonicalize them, join them with a base url and filter out things that should not be followed such as mailto: or javascript: href links etc. It will also skip any url equivalent to the given base url.
Note this function is able to work both on string and bytes seamlessly.
```python
from ural import links_from_html

html = """
<p>
  Hey! Check this site: <a href="https://medialab.sciencespo.fr/">médialab</a>
  And also this page: <a href="article.html">article</a>
  Or click on this: <a href="#">link</a>
</p>
""".encode('utf-8')

for link in links_from_html('http://lemonde.fr', html):
    print(link)
# 'https://medialab.sciencespo.fr/'
# 'http://lemonde.fr/article.html'
```
Arguments
- base_url string: the HTML's url.
- string string|bytes: html string or bytes.
- encoding ?string [utf-8]: if given binary, this encoding will be used to decode the found urls.
- canonicalize ?bool [False]: whether to canonicalize the urls using canonicalize_url.
- strip_fragment ?bool [False]: whether to strip the url fragments when using canonicalize.
- unique ?bool [False]: whether to deduplicate the urls.
normalize_hostname
Function normalizing the given hostname, i.e. without usually irrelevant subdomains etc. Works a lot like normalize_url.
For more details about this be sure to read this section of the docs.
```python
from ural import normalize_hostname

normalize_hostname('www.facebook.com')
# 'facebook.com'

normalize_hostname('fr-FR.facebook.com')
# 'facebook.com'
```
normalize_url
Function normalizing the given url by stripping it of usually non-discriminant parts such as irrelevant query items or sub-domains etc.
This is a very useful utility when attempting to match similar urls written slightly differently when shared on social media etc.
For more details about this be sure to read this section of the docs.
```python
from ural import normalize_url

normalize_url('https://www2.lemonde.fr/index.php?utm_source=google')
# 'lemonde.fr'
```
Arguments
- url string: URL to normalize.
- infer_redirection ?bool [True]: whether to attempt resolving common redirects by leveraging well-known GET parameters.
- fix_common_mistakes ?bool [True]: whether to attempt to fix common URL mistakes.
- normalize_amp ?bool [True]: whether to attempt to normalize Google AMP urls.
- sort_query ?bool [True]: whether to sort query items.
- strip_authentication ?bool [True]: whether to strip authentication.
- strip_fragment ?bool|str ['except-routing']: whether to strip the url's fragment. If set to except-routing, will only strip the fragment if the fragment is not deemed to be js routing (i.e. if it contains a /).
- strip_index ?bool [True]: whether to strip trailing index.
- strip_irrelevant_subdomains ?bool [False]: whether to strip irrelevant subdomains such as www etc.
- strip_protocol ?bool [True]: whether to strip the url's protocol.
- strip_trailing_slash ?bool [True]: whether to strip trailing slash.
- quoted ?bool [False]: by default the function will unquote the url as much as possible all while keeping the url safe. If this kwarg is set to True, the function will instead quote the url as much as possible all while ensuring nothing will be double-quoted.
- platform_aware ?bool [False]: whether to take some well-known platforms supported by ural such as facebook, youtube etc. into account when normalizing the url.
should_follow_href
Function returning whether the given href should be followed (usually from a crawler's context). This means it will filter out anchors and urls having a scheme which is not http/https.
```python
from ural import should_follow_href

should_follow_href('#top')
# False

should_follow_href('http://lemonde.fr')
# True

should_follow_href('/article.html')
# True
```
should_resolve
Function returning whether the given url looks like something you would want to resolve because the url will probably lead to some redirection.
It is quite similar to is_shortened_url but covers more ground since it also deals with url patterns which are not shortened per se.
```python
from ural import should_resolve

should_resolve('http://lemonde.fr')
# False

should_resolve('http://bit.ly/1sNZMwL')
# True

should_resolve('https://doi.org/10.4000/vertigo.26405')
# True
```
split_suffix
Function splitting a hostname or a url's hostname into the domain part and the suffix part (while respecting Mozilla's Public Suffix List).
```python
from ural import split_suffix

split_suffix('http://www.bbc.co.uk/article.html')
# ('www.bbc', 'co.uk')

split_suffix('http://www.bbc.idontexist')
# None

split_suffix('lemonde.fr')
# ('lemonde', 'fr')
```
strip_protocol
Function removing the protocol from the url.
```python
from ural import strip_protocol

strip_protocol('https://www2.lemonde.fr/index.php')
# 'www2.lemonde.fr/index.php'
```
Arguments
- url string: URL to format.
urlpathsplit
Function taking a url and returning its path, tokenized as a list.
```python
from ural import urlpathsplit

urlpathsplit('http://lemonde.fr/section/article.html')
# ['section', 'article.html']

urlpathsplit('http://lemonde.fr/')
# []

# If you want to split a path directly
from ural import pathsplit

pathsplit('/section/articles/')
# ['section', 'articles']
```
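The tokenization above can be approximated with the standard library alone. The sketch below is not ural's code: `urlpathsplit_sketch` is a hypothetical name, and it assumes that empty segments (leading, trailing or doubled slashes) are simply discarded.

```python
from urllib.parse import urlsplit


def urlpathsplit_sketch(url):
    """Return the non-empty segments of the url's path as a list."""
    path = urlsplit(url).path

    # Splitting on "/" leaves empty strings around leading, trailing
    # or doubled slashes, so they are filtered out here.
    return [segment for segment in path.split("/") if segment]
```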
urls_from_html
Function returning an iterator over the urls present in the links of given HTML text.
Note this function is able to work both on string and bytes seamlessly.
```python
from ural import urls_from_html

html = """
<p>Hey! Check this site: <a href="https://medialab.sciencespo.fr/">médialab</a></p>
"""

for url in urls_from_html(html):
    print(url)
# 'https://medialab.sciencespo.fr/'
```
Arguments
- string string|bytes: html string or bytes.
- encoding ?string [utf-8]: if given binary, this encoding will be used to decode the found urls.
- errors ?string [strict]: policy on decode errors.
urls_from_text
Function returning an iterator over the urls present in the string argument. Extracts only urls having a protocol.
Note that this function is somewhat markdown-aware, and punctuation-aware.
```python
from ural import urls_from_text

text = "Hey! Check this site: https://medialab.sciencespo.fr/, it looks really cool. They're developing many tools on https://github.com/"

for url in urls_from_text(text):
    print(url)
# 'https://medialab.sciencespo.fr/'
# 'https://github.com/'
```
Arguments
- string string: source string.
Upgrading suffixes and TLDs
If you want to upgrade the package's data with respect to Mozilla suffixes and IANA TLDs, you can do so either by running the following command:
```bash
python -m ural upgrade
```
or directly in your python code:
```python
from ural.tld import upgrade

upgrade()

# Or if you want to patch runtime only this time, or regularly
# (for long running programs or to avoid rights issues etc.):
upgrade(transient=True)
```
HostnameTrieSet
Class implementing a hierarchic set of hostnames so you can efficiently query whether urls match hostnames in the set.
```python
from ural import HostnameTrieSet

trie = HostnameTrieSet()

trie.add('lemonde.fr')
trie.add('business.lefigaro.fr')

trie.match('https://liberation.fr/article1.html')
# False

trie.match('https://lemonde.fr/article1.html')
# True

trie.match('https://www.lemonde.fr/article1.html')
# True

trie.match('https://lefigaro.fr/article1.html')
# False

trie.match('https://business.lefigaro.fr/article1.html')
# True
```
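The matching behavior shown above (a stored hostname also matches its subdomains, but not its parent domain) can be sketched with a small trie keyed on reversed hostname labels. This is an illustrative sketch only, not the library's implementation: the `HostnameSetSketch` class and its internals are assumptions.

```python
from urllib.parse import urlsplit


class HostnameSetSketch:
    """Toy hierarchical set of hostnames, stored as reversed labels."""

    def __init__(self):
        self.root = {}

    def add(self, hostname):
        node = self.root
        # 'business.lefigaro.fr' is stored along the path fr > lefigaro > business
        for label in reversed(hostname.lower().split(".")):
            node = node.setdefault(label, {})
        node[None] = True  # the None key marks the end of a stored hostname

    def match(self, url):
        hostname = urlsplit(url).hostname or ""
        node = self.root
        for label in reversed(hostname.lower().split(".")):
            node = node.get(label)
            if node is None:
                return False
            if None in node:
                # A stored hostname is a suffix of the queried hostname
                return True
        return False
```

Walking the labels from the TLD inward means a lookup costs at most one dictionary access per label of the queried hostname, whatever the number of hostnames stored.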
#.add
Method adding a single hostname to the set.
```python
from ural import HostnameTrieSet

trie = HostnameTrieSet()
trie.add('lemonde.fr')
```
Arguments
- hostname string: hostname to add to the set.
#.match
Method returning whether the given url matches any of the set's hostnames.
```python
from ural import HostnameTrieSet

trie = HostnameTrieSet()
trie.add('lemonde.fr')

trie.match('https://liberation.fr/article1.html')
# False

trie.match('https://lemonde.fr/article1.html')
# True
```
Arguments
- url string|urllib.parse.SplitResult: url to match.
lru.url_to_lru
Function converting the given url to a serialized lru.
```python
from ural.lru import url_to_lru

url_to_lru('http://www.lemonde.fr:8000/article/1234/index.html?field=value#2')
# 's:http|t:8000|h:fr|h:lemonde|h:www|p:article|p:1234|p:index.html|q:field=value|f:2|'
```
Arguments
- url string: url to convert.
- suffix_aware ?bool: whether to be mindful of suffixes when converting (e.g. considering "co.uk" as a single token).
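To show where each stem comes from, here is a minimal standard-library sketch that reorders a url's parts into the serialized form seen above. It is not ural's implementation: the function names are hypothetical, and the sketch ignores suffix awareness, authentication and quoting.

```python
from urllib.parse import urlsplit


def lru_stems_sketch(url):
    """Return the url's parts from most generic to most specific."""
    parsed = urlsplit(url)
    stems = []

    if parsed.scheme:
        stems.append("s:" + parsed.scheme)
    if parsed.port is not None:
        stems.append("t:%d" % parsed.port)

    # Hostname labels are reversed so that the TLD comes first
    for label in reversed((parsed.hostname or "").split(".")):
        if label:
            stems.append("h:" + label)

    for segment in parsed.path.split("/"):
        if segment:
            stems.append("p:" + segment)

    if parsed.query:
        stems.append("q:" + parsed.query)
    if parsed.fragment:
        stems.append("f:" + parsed.fragment)

    return stems


def url_to_lru_sketch(url):
    """Serialize the stems, each one terminated by a pipe."""
    return "".join(stem + "|" for stem in lru_stems_sketch(url))
```

Ordering the stems this way is what lets urls sharing a domain, then a path, also share a common string prefix.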
lru.lru_to_url
Function converting the given serialized lru or lru stems to a proper url.
```python
from ural.lru import lru_to_url

lru_to_url('s:http|t:8000|h:fr|h:lemonde|h:www|p:article|p:1234|p:index.html|')
# 'http://www.lemonde.fr:8000/article/1234/index.html'

# No 't:' stem here, hence no port in the resulting url
lru_to_url(['s:http', 'h:fr', 'h:lemonde', 'h:www', 'p:article', 'p:1234', 'p:index.html'])
# 'http://www.lemonde.fr/article/1234/index.html'
```
lru.lru_stems
Function returning url parts in hierarchical order.
```python
from ural.lru import lru_stems

lru_stems('http://www.lemonde.fr:8000/article/1234/index.html?field=value#2')
# ['s:http', 't:8000', 'h:fr', 'h:lemonde', 'h:www', 'p:article', 'p:1234', 'p:index.html', 'q:field=value', 'f:2']
```
Arguments
- url string: URL to parse.
- suffix_aware ?bool: whether to be mindful of suffixes when converting (e.g. considering "co.uk" as a single token).
lru.canonicalized_lru_stems
Function canonicalizing the url and returning its parts in hierarchical order.
```python
from ural.lru import canonicalized_lru_stems

canonicalized_lru_stems('http://www.lemonde.fr/article/1234/index.html?field=value#2')
# ['s:http', 'h:fr', 'h:lemonde', 'p:article', 'p:1234', 'q:field=value', 'f:2']
```
Arguments
This function accepts the same arguments as canonicalize_url.
lru.normalized_lru_stems
Function normalizing the url and returning its parts in hierarchical order.
```python
from ural.lru import normalized_lru_stems

normalized_lru_stems('http://www.lemonde.fr/article/1234/index.html?field=value#2')
# ['h:fr', 'h:lemonde', 'p:article', 'p:1234', 'q:field=value']
```
Arguments
This function accepts the same arguments as normalize_url.
lru.fingerprinted_lru_stems
Function fingerprinting the url and returning its parts in hierarchical order.
```python
from ural.lru import fingerprinted_lru_stems

fingerprinted_lru_stems('http://www.lemonde.fr/article/1234/index.html?field=value#2', strip_suffix=True)
# ['h:lemonde', 'p:article', 'p:1234', 'q:field=value']
```
Arguments
This function accepts the same arguments as fingerprint_url.
lru.serialize_lru
Function serializing lru stems to a string.
```python
from ural.lru import serialize_lru

serialize_lru(['s:https', 'h:fr', 'h:lemonde'])
# 's:https|h:fr|h:lemonde|'
```
lru.unserialize_lru
Function unserializing stringified lru to a list of stems.
```python
from ural.lru import unserialize_lru

unserialize_lru('s:https|h:fr|h:lemonde|')
# ['s:https', 'h:fr', 'h:lemonde']
```
LRUTrie
Class implementing a prefix tree (Trie) storing URLs hierarchically by storing them as LRUs along with some arbitrary metadata. It is very useful when needing to match URLs by longest common prefix.
Note that this class directly inherits from the phylactery library's TrieDict so you can also use any of its methods.
```python
from ural.lru import LRUTrie

trie = LRUTrie()

# To respect suffixes
trie = LRUTrie(suffix_aware=True)
```
#.set
Method storing a URL in a LRUTrie along with its metadata.
```python
from ural.lru import LRUTrie

trie = LRUTrie()
trie.set('http://www.lemonde.fr', {'type': 'general press'})

trie.match('http://www.lemonde.fr')
# {'type': 'general press'}
```
Arguments
- url string: url to store in the LRUTrie.
- metadata any: metadata of the url.
#.set_lru
Method storing a URL already represented as a LRU or LRU stems along with its metadata.
```python
from ural.lru import LRUTrie

trie = LRUTrie()

# Using stems
trie.set_lru(['s:http', 'h:fr', 'h:lemonde', 'h:www'], {'type': 'general press'})

# Using serialized lru
trie.set_lru('s:http|h:fr|h:lemonde|h:www|', {'type': 'general press'})
```
Arguments
- lru string|list: lru to store in the Trie.
- metadata any: metadata to attach to the lru.
#.match
Method returning the metadata attached to the longest prefix match of your query URL. Will return None if no common prefix can be found.
```python
from ural.lru import LRUTrie

trie = LRUTrie()
trie.set('http://www.lemonde.fr', {'media': 'lemonde'})

trie.match('http://www.lemonde.fr')
# {'media': 'lemonde'}

trie.match('http://www.lemonde.fr/politique')
# {'media': 'lemonde'}

trie.match('http://www.lefigaro.fr')
# None
```
Arguments
- url string: url to match in the LRUTrie.
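Longest prefix matching can be illustrated with a plain dictionary-based trie. The sketch below is not the LRUTrie implementation (which builds on phylactery's TrieDict): `PrefixTrieSketch` and `simple_stems` are hypothetical names, and the stems only cover scheme, hostname and path.

```python
from urllib.parse import urlsplit


def simple_stems(url):
    """Scheme, reversed hostname labels, then path segments, all prefixed."""
    parsed = urlsplit(url)
    stems = ["s:" + parsed.scheme]
    stems += ["h:" + l for l in reversed((parsed.hostname or "").split(".")) if l]
    stems += ["p:" + s for s in parsed.path.split("/") if s]
    return stems


class PrefixTrieSketch:
    _MISSING = object()  # sentinel, so that None can be stored as metadata

    def __init__(self):
        self.root = {"children": {}, "value": self._MISSING}

    def set(self, url, metadata):
        node = self.root
        for stem in simple_stems(url):
            node = node["children"].setdefault(
                stem, {"children": {}, "value": self._MISSING}
            )
        node["value"] = metadata

    def match(self, url):
        node = self.root
        best = None
        for stem in simple_stems(url):
            node = node["children"].get(stem)
            if node is None:
                break
            if node["value"] is not self._MISSING:
                # Remember the deepest stored prefix seen so far
                best = node["value"]
        return best
```

The lookup keeps descending while stems match and returns the metadata of the deepest stored node, which is why a query for a page also matches an entry stored for its site.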
#.match_lru
Method returning the metadata attached to the longest prefix match of your query LRU. Will return None if no common prefix can be found.
```python
from ural.lru import LRUTrie

trie = LRUTrie()
trie.set_lru(['s:http', 'h:fr', 'h:lemonde', 'h:www'], {'media': 'lemonde'})

trie.match_lru(['s:http', 'h:fr', 'h:lemonde', 'h:www'])
# {'media': 'lemonde'}

trie.match_lru('s:http|h:fr|h:lemonde|h:www|p:politique|')
# {'media': 'lemonde'}

trie.match_lru(['s:http', 'h:fr', 'h:lefigaro', 'h:www'])
# None
```
Arguments
- lru string|list: lru to match in the LRUTrie.
CanonicalizedLRUTrie
The CanonicalizedLRUTrie is nearly identical to the standard LRUTrie except that it canonicalizes given urls before attempting any operation using the canonicalize_url function.
Its constructor therefore takes the same arguments as the aforementioned function.
```python
from ural.lru import CanonicalizedLRUTrie

trie = CanonicalizedLRUTrie(strip_fragment=False)
```
NormalizedLRUTrie
The NormalizedLRUTrie is nearly identical to the standard LRUTrie except that it normalizes given urls before attempting any operation using the normalize_url function.
Its constructor therefore takes the same arguments as the aforementioned function.
```python
from ural.lru import NormalizedLRUTrie

trie = NormalizedLRUTrie(normalize_amp=False)
```
FingerprintedLRUTrie
The FingerprintedLRUTrie is nearly identical to the standard LRUTrie except that it fingerprints given urls before attempting any operation using the fingerprint_url function.
Its constructor therefore takes the same arguments as the aforementioned function.
```python
from ural.lru import FingerprintedLRUTrie

trie = FingerprintedLRUTrie(strip_suffix=False)
```
has_facebook_comments
Function returning whether the given url is pointing to a Facebook resource potentially having comments (such as a post, photo or video for instance).
```python
from ural.facebook import has_facebook_comments

has_facebook_comments('https://www.facebook.com/permalink.php?story_fbid=1354978971282622&id=598338556946671')
# True

has_facebook_comments('https://www.facebook.com/108824017345866/videos/311658803718223')
# True

has_facebook_comments('https://www.facebook.com/astucerie/')
# False

has_facebook_comments('https://www.lemonde.fr')
# False

has_facebook_comments('/permalink.php?story_fbid=1354978971282622&id=598338556946671', allow_relative_urls=True)
# True
```
is_facebook_id
Function returning whether the given string is a valid Facebook id or not.
```python
from ural.facebook import is_facebook_id

is_facebook_id('974583586343')
# True

is_facebook_id('whatever')
# False
```
is_facebook_full_id
Function returning whether the given string is a valid Facebook full post id or not.
```python
from ural.facebook import is_facebook_full_id

is_facebook_full_id('974583586343_9749757953')
# True

is_facebook_full_id('974583586343')
# False

is_facebook_full_id('whatever')
# False
```
is_facebook_url
Function returning whether given url is from Facebook or not.
```python
from ural.facebook import is_facebook_url

is_facebook_url('http://www.facebook.com/post/974583586343')
# True

is_facebook_url('https://fb.me/846748464')
# True

is_facebook_url('https://www.lemonde.fr')
# False
```
is_facebook_post_url
Function returning whether the given url is a Facebook post or not.
```python
from ural.facebook import is_facebook_post_url

is_facebook_post_url('http://www.facebook.com/post/974583586343')
# True

is_facebook_post_url('http://www.facebook.com')
# False

is_facebook_post_url('https://www.lemonde.fr')
# False
```
is_facebook_link
Function returning whether the given url is a Facebook redirection link.
```python
from ural.facebook import is_facebook_link

is_facebook_link('https://l.facebook.com/l.php?u=http%3A%2F%2Fwww.chaos-controle.com%2Farchives%2F2013%2F10%2F14%2F28176300.html&h=AT0iUqJpUTMzHAH8HAXwZ11p8P3Z-SrY90wIXZhcjMnxBTHMiau8Fv1hvz00ZezRegqmF86SczyUXx3GzdtMdFH-I4CwHIXKKU9L6w522xwOqkOvLAylxojGEwrp341uC-GlVyGE2N7XwTPK9cpP0mQ8PIrWh8Qj2gHIIR08Js0mUr7G8Qe9fx66uYcfnNfTTF1xi0Us8gTo4fOZxAgidGWXsdgtUOdvQqyEm97oHzKbWfXjkhsrzbtb8ZNMDwCP5099IMcKRD8Hi5H7W3vwh9hdJlRgm5Z074epDmGAeoEATEQUVNTxO0SHO4XNn2Z7LgBamvevu1ENBcuyuSOYA0BsY2cx8mPWJ9t44tQcnmyQhBlYmYmszDaQx9IfVP26PRqhsTLz-kZzv0DGMiJFU78LVWVPc9QSw2f9mA5JYWr29w12xJJ5XGQ6DhJxDMWRnLdG8Tnd7gZKCaRdqDER1jkO72u75-o4YuV3CLh4j-_4u0fnHSzHdVD8mxr9pNEgu8rvJF1E2H3-XbzA6F2wMQtFCejH8MBakzYtTGNvHSexSiKphE04Ci1Z23nBjCZFsgNXwL3wbIXWfHjh2LCKyihQauYsnvxp6fyioStJSGgyA9GGEswizHa20lucQF0S0F8H9-')
# True

is_facebook_link('https://lemonde.fr')
# False
```
convert_facebook_url_to_mobile
Function returning the mobile version of the given Facebook url. Will raise an exception if a non-Facebook url is given.
```python
from ural.facebook import convert_facebook_url_to_mobile

convert_facebook_url_to_mobile('http://www.facebook.com/post/974583586343')
# 'http://m.facebook.com/post/974583586343'
```
parse_facebook_url
Function parsing the given Facebook url.
```python
from ural.facebook import parse_facebook_url

# Importing related classes if you need to perform tests
from ural.facebook import (
    FacebookHandle,
    FacebookUser,
    FacebookGroup,
    FacebookPost,
    FacebookPhoto,
    FacebookVideo
)

parse_facebook_url('https://www.facebook.com/people/Sophia-Aman/102016783928989')
# FacebookUser(id='102016783928989')

parse_facebook_url('https://www.facebook.com/groups/159674260452951')
# FacebookGroup(id='159674260452951')

parse_facebook_url('https://www.facebook.com/groups/159674260852951/permalink/1786992671454427/')
# FacebookPost(id='1786992671454427', group_id='159674260852951')

parse_facebook_url('https://www.facebook.com/108824017345866/videos/311658803718223')
# FacebookVideo(id='311658803718223', parent_id='108824017345866')

parse_facebook_url('https://www.facebook.com/photo.php?fbid=10222721681573727')
# FacebookPhoto(id='10222721681573727')

parse_facebook_url('/annelaure.rivolu?rc=p&tn=R', allow_relative_urls=True)
# FacebookHandle(handle='annelaure.rivolu')

parse_facebook_url('https://lemonde.fr')
# None
```
extract_url_from_facebook_link
Function extracting the target url from a Facebook redirection link.
```python
from ural.facebook import extract_url_from_facebook_link

extract_url_from_facebook_link('https://l.facebook.com/l.php?u=http%3A%2F%2Fwww.chaos-controle.com%2Farchives%2F2013%2F10%2F14%2F28176300.html&h=AT0iUqJpUTMzHAH8HAXwZ11p8P3Z-SrY90wIXZhcjMnxBTHMiau8Fv1hvz00ZezRegqmF86SczyUXx3GzdtMdFH-I4CwHIXKKU9L6w522xwOqkOvLAylxojGEwrp341uC-GlVyGE2N7XwTPK9cpP0mQ8PIrWh8Qj2gHIIR08Js0mUr7G8Qe9fx66uYcfnNfTTF1xi0Us8gTo4fOZxAgidGWXsdgtUOdvQqyEm97oHzKbWfXjkhsrzbtb8ZNMDwCP5099IMcKRD8Hi5H7W3vwh9hdJlRgm5Z074epDmGAeoEATEQUVNTxO0SHO4XNn2Z7LgBamvevu1ENBcuyuSOYA0BsY2cx8mPWJ9t44tQcnmyQhBlYmYmszDaQx9IfVP26PRqhsTLz-kZzv0DGMiJFU78LVWVPc9QSw2f9mA5JYWr29w12xJJ5XGQ6DhJxDMWRnLdG8Tnd7gZKCaRdqDER1jkO72u75-o4YuV3CLh4j-_4u0fnHSzHdVD8mxr9pNEgu8rvJF1E2H3-XbzA6F2wMQtFCejH8MBakzYtTGNvHSexSiKphE04Ci1Z23nBjCZFsgNXwL3wbIXWfHjh2LCKyihQauYsnvxp6fyioStJSGgyA9GGEswizHa20lucQF0S0F8H9-')
# 'http://www.chaos-controle.com/archives/2013/10/14/28176300.html'

extract_url_from_facebook_link('http://lemonde.fr')
# None
```
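To make the idea concrete, here is a minimal standard-library sketch of how a redirection target can be read from a query parameter. It is not ural's implementation: the helper name extract_redirect_target is hypothetical, the sketch assumes the target sits in the u parameter (as in the example link above), and it does not check that the link is actually hosted on Facebook.

```python
from urllib.parse import parse_qs, urlsplit


def extract_redirect_target(link, param="u"):
    """Return the decoded value of `param` in the link's query string, or None.

    Hypothetical helper for illustration only; it does not validate the host.
    """
    query = urlsplit(link).query
    # parse_qs percent-decodes values and maps each key to a list of values
    values = parse_qs(query).get(param)
    return values[0] if values else None


print(extract_redirect_target(
    "https://l.facebook.com/l.php?u=http%3A%2F%2Fwww.chaos-controle.com%2Farchives%2F2013%2F10%2F14%2F28176300.html&h=AT0i"
))
print(extract_redirect_target("http://lemonde.fr"))
```

The same pattern would apply to any redirection link that carries its target in a query parameter, by passing a different param value.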
is_amp_url
Returns whether the given url is probably a Google AMP url.
```python
from ural.google import is_amp_url

is_amp_url('http://www.europe1.fr/sante/les-onze-vaccins.amp')
# True

is_amp_url('https://www.lemonde.fr')
# False
```
is_google_link
Returns whether the given url is a Google search link.
```python
from ural.google import is_google_link

is_google_link('https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=2ahUKEwjp8Lih_LnmAhWQlxQKHVTmCJYQFjADegQIARAB&url=http%3A%2F%2Fwww.mon-ip.com%2F&usg=AOvVaw0sfeZJyVtUS2smoyMlJmes')
# True

is_google_link('https://www.lemonde.fr')
# False
```
extract_url_from_google_link
Extracts the url from the given Google search link. This is useful to "resolve" the links scraped from Google's search results. Returns None if the given url is invalid or irrelevant.
```python
from ural.google import extract_url_from_google_link

extract_url_from_google_link('https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=1&cad=rja&uact=8&ved=2ahUKEwicu4K-rZzmAhWOEBQKHRNWA08QFjAAegQIARAB&url=https%3A%2F%2Fwww.facebook.com%2Fieff.ogbeide&usg=AOvVaw0vrBVCiIHUr5pncjeLpPUp')
# 'https://www.facebook.com/ieff.ogbeide'

extract_url_from_google_link('https://www.lemonde.fr')
# None
```
extract_id_from_google_drive_url
Extracts a file id from the given Google drive url. Returns None if the given url is invalid or irrelevant.
```python
from ural.google import extract_id_from_google_drive_url

extract_id_from_google_drive_url('https://docs.google.com/spreadsheets/d/1Q9sJtAb1BZhUMjxCLMrVASx3AoNDp5iV3VkbPjlg/edit#gid=0')
# '1Q9sJtAb1BZhUMjxCLMrVASx3AoNDp5iV3VkbPjlg'

extract_id_from_google_drive_url('https://www.lemonde.fr')
# None
```
parse_google_drive_url
Parses the given Google drive url. Returns None if the given url is invalid or irrelevant.
```python
from ural.google import (
    parse_google_drive_url,
    GoogleDriveFile,
    GoogleDrivePublicLink
)

parse_google_drive_url('https://docs.google.com/spreadsheets/d/1Q9sJtAb1BZhUMjxCLMrVASx3AoNDp5iV3VkbPjlg/edit#gid=0')
# GoogleDriveFile('spreadsheets', '1Q9sJtAb1BZhUMjxCLMrVASx3AoNDp5iV3VkbPjlg')

parse_google_drive_url('https://www.lemonde.fr')
# None
```
is_instagram_post_shortcode
Function returning whether the given string is a valid Instagram post shortcode or not.
```python
from ural.instagram import is_instagram_post_shortcode

is_instagram_post_shortcode('974583By-586343')
# True

is_instagram_post_shortcode('whatever!!')
# False
```
is_instagram_username
Function returning whether the given string is a valid Instagram username or not.
```python
from ural.instagram import is_instagram_username

is_instagram_username('97458.3By-5_86343')
# True

is_instagram_username('whatever!!')
# False
```
is_instagram_url
Returns whether the given url is from Instagram.
```python
from ural.instagram import is_instagram_url

is_instagram_url('https://lemonde.fr')
# False

is_instagram_url('https://www.instagram.com/guillaumelatorre')
# True
```
extract_username_from_instagram_url
Returns a username from the given Instagram url, or None if none could be found.
```python
from ural.instagram import extract_username_from_instagram_url

extract_username_from_instagram_url('https://www.instagram.com/martin_dupont/p/BxKRx5CHn5i/')
# 'martin_dupont'

extract_username_from_instagram_url('https://lemonde.fr')
# None
```
parse_instagram_url
Returns parsed information about the given Instagram url: either about the post, the user or the reel. If the url is not an Instagram url, or is an invalid one, the function returns None.
```python
from ural.instagram import (
    parse_instagram_url,

    # You can also import the named tuples if you need them
    InstagramPost,
    InstagramUser,
    InstagramReel
)

parse_instagram_url('https://www.instagram.com/martin_dupont/p/BxKRx5CHn5i/')
# InstagramPost(id='BxKRx5CHn5i', name='martin_dupont')

parse_instagram_url('https://lemonde.fr')
# None

parse_instagram_url('https://www.instagram.com/p/BxKRx5-Hn5i/')
# InstagramPost(id='BxKRx5-Hn5i', name=None)

parse_instagram_url('https://www.instagram.com/martin_dupont')
# InstagramUser(name='martin_dupont')

parse_instagram_url('https://www.instagram.com/reels/BxKRx5-Hn5i')
# InstagramReel(id='BxKRx5-Hn5i')
```
Arguments
- url str: Instagram url to parse.
Telegram
is_telegram_message_id
Function returning whether the given string is a valid Telegram message id or not.
```python
from ural.telegram import is_telegram_message_id

is_telegram_message_id('974583586343')
# True

is_telegram_message_id('whatever')
# False
```
is_telegram_url
Returns whether the given url is from Telegram.
```python
from ural.telegram import is_telegram_url

is_telegram_url('https://lemonde.fr')
# False

is_telegram_url('https://telegram.me/guillaumelatorre')
# True

is_telegram_url('https://t.me/s/jesstern')
# True
```
convert_telegram_url_to_public
Function returning the public version of the given Telegram url. Will raise an exception if a non-Telegram url is given.
```python
from ural.telegram import convert_telegram_url_to_public

convert_telegram_url_to_public('https://t.me/jesstern')
# 'https://t.me/s/jesstern'
```
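As a rough illustration of the transformation shown above, the public form of a channel url can be obtained by prefixing its path with /s. The sketch below uses only the standard library and is not ural's code: the helper name, the set of accepted hostnames and the use of ValueError are all assumptions made for the example.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: only the two hostnames that appear in this README are accepted
TELEGRAM_HOSTS = {"t.me", "telegram.me"}


def to_public_telegram_url(url):
    """Prefix the path with /s unless it is already there (hypothetical helper)."""
    parts = urlsplit(url)

    if (parts.hostname or "") not in TELEGRAM_HOSTS:
        # Assumption: ValueError stands in for whatever exception the library raises
        raise ValueError("not a Telegram url: %r" % url)

    path = parts.path
    if path != "/s" and not path.startswith("/s/"):
        path = "/s" + path

    return urlunsplit(parts._replace(path=path))


print(to_public_telegram_url("https://t.me/jesstern"))
```

Calling the helper on a url that is already public returns it unchanged, so the conversion is safe to apply more than once.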
extract_channel_name_from_telegram_url
Returns a channel name from the given Telegram url, or None if none could be found.
```python
from ural.telegram import extract_channel_name_from_telegram_url

extract_channel_name_from_telegram_url('https://t.me/s/jesstern/345')
# 'jesstern'

extract_channel_name_from_telegram_url('https://lemonde.fr')
# None
```
parse_telegram_url
Returns parsed information about the given Telegram url: either about the channel, message or group. If the url is not a Telegram url, or is an invalid one, the function returns None.
```python
from ural.telegram import (
    parse_telegram_url,

    # You can also import the named tuples if you need them
    TelegramMessage,
    TelegramChannel,
    TelegramGroup
)

parse_telegram_url('https://t.me/s/jesstern/76')
# TelegramMessage(name='jesstern', id='76')

parse_telegram_url('https://lemonde.fr')
# None

parse_telegram_url('https://telegram.me/rapsocialclub')
# TelegramChannel(name='rapsocialclub')

parse_telegram_url('https://t.me/joinchat/AAAAAE9B8u_wO9d4NiJp3w')
# TelegramGroup(id='AAAAAE9B8u_wO9d4NiJp3w')
```
Arguments
- url str: Telegram url to parse.
is_twitter_url
Returns whether the given url is from Twitter.
```python
from ural.twitter import is_twitter_url

is_twitter_url('https://lemonde.fr')
# False

is_twitter_url('https://www.twitter.com/Yomguithereal')
# True

is_twitter_url('https://twitter.com')
# True
```
extract_screen_name_from_twitter_url
Extracts a normalized user's screen name from a Twitter url. If given an irrelevant url, the function will return None.
```python
from ural.twitter import extract_screen_name_from_twitter_url

extract_screen_name_from_twitter_url('https://www.twitter.com/Yomguithereal')
# 'yomguithereal'

extract_screen_name_from_twitter_url('https://twitter.fr')
# None
```
parse_twitter_url
Takes a Twitter url and returns either a TwitterUser namedtuple (contains a screen_name) if the given url is a link to a Twitter user, a TwitterTweet namedtuple (contains a user_screen_name and an id) if the given url is a tweet's url, a TwitterList namedtuple (contains an id) if the given url is a list's url, or None if the given url is irrelevant.
```python
from ural.twitter import parse_twitter_url

parse_twitter_url('https://twitter.com/Yomguithereal')
# TwitterUser(screen_name='yomguithereal')

parse_twitter_url('https://twitter.com/medialab_ScPo/status/1284154793376784385')
# TwitterTweet(user_screen_name='medialab_scpo', id='1284154793376784385')

parse_twitter_url('https://twitter.com/i/lists/15512656222798157826')
# TwitterList(id='15512656222798157826')

parse_twitter_url('https://twitter.com/home')
# None
```
Youtube
is_youtube_url
Returns whether the given url is from Youtube.
```python
from ural.youtube import is_youtube_url

is_youtube_url('https://lemonde.fr')
# False

is_youtube_url('https://www.youtube.com/watch?v=otRTOE9i51o')
# True

is_youtube_url('https://youtu.be/otRTOE9i51o')
# True
```
is_youtube_channel_id
Returns whether the given string is a formally valid Youtube channel id. Note that it does not check whether this id actually refers to an existing channel. You will need to call YouTube servers for that.
```python
from ural.youtube import is_youtube_channel_id

is_youtube_channel_id('UCCCPCZNChQdGa9EkATeye4g')
# True

is_youtube_channel_id('@France24')
# False
```
is_youtube_video_id
Returns whether the given string is a formally valid YouTube video id. Note that it does not check whether this id actually refers to an existing video. You will need to call YouTube servers for that.
```python
from ural.youtube import is_youtube_video_id

is_youtube_video_id('otRTOE9i51o')
# True

is_youtube_video_id('bDYTYET')
# False
```
parse_youtube_url
Returns parsed information about the given Youtube url: either about the linked video, user or channel. If the url is not a Youtube url, or is an invalid one, the function returns None.
```python
from ural.youtube import (
    parse_youtube_url,

    # You can also import the named tuples if you need them
    YoutubeVideo,
    YoutubeUser,
    YoutubeChannel,
    YoutubeShort,
)

parse_youtube_url('https://www.youtube.com/watch?v=otRTOE9i51o')
# YoutubeVideo(id='otRTOE9i51o')

parse_youtube_url('https://www.youtube.com/shorts/GINlKobb41w')
# YoutubeShort(id='GINlKobb41w')

parse_youtube_url('https://lemonde.fr')
# None

parse_youtube_url('http://www.youtube.com/channel/UCWvUxN9LAjJ-sTc5JJ3gEyA/videos')
# YoutubeChannel(id='UCWvUxN9LAjJ-sTc5JJ3gEyA', name=None)

parse_youtube_url('http://www.youtube.com/user/ojimfrance')
# YoutubeUser(id=None, name='ojimfrance')

parse_youtube_url('https://www.youtube.com/taranisnews')
# YoutubeChannel(id=None, name='taranisnews')
```
Arguments
- url str: Youtube url to parse.
- fix_common_mistakes bool [True]: Whether to fix common mistakes found in Youtube urls when crawling the web.
extract_video_id_from_youtube_url
Returns a video id from the given Youtube url, or None if none could be found. Note that this also works with Youtube shorts.
```python
from ural.youtube import extract_video_id_from_youtube_url

extract_video_id_from_youtube_url('https://www.youtube.com/watch?v=otRTOE9i51o')
# 'otRTOE9i51o'

extract_video_id_from_youtube_url('https://lemonde.fr')
# None

extract_video_id_from_youtube_url('http://youtu.be/afa-5HQHiAs')
# 'afa-5HQHiAs'
```
normalize_youtube_url
Returns a normalized version of the given Youtube url. It will normalize video, user and channel urls so you can easily match them.
```python
from ural.youtube import normalize_youtube_url

normalize_youtube_url('https://www.youtube.com/watch?v=otRTOE9i51o')
# 'https://www.youtube.com/watch?v=otRTOE9i51o'

normalize_youtube_url('http://youtu.be/afa-5HQHiAs')
# 'https://www.youtube.com/watch?v=afa-5HQHiAs'
```
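To show why such a normalization makes matching easier, here is a small standard-library sketch that maps the two video url shapes used above to a single canonical form. It is only an illustration under stated assumptions: the helper name is hypothetical, it handles video urls only (not users or channels), and it recognizes just the youtu.be, youtube.com and www.youtube.com hostnames.

```python
from urllib.parse import parse_qs, urlsplit


def sketch_normalize_video_url(url):
    """Return a canonical watch url for a video link, or None (hypothetical helper)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()

    if host == "youtu.be":
        # Short links carry the video id as the path
        video_id = parts.path.strip("/")
    elif host in ("youtube.com", "www.youtube.com") and parts.path == "/watch":
        # Regular links carry the video id in the v query parameter
        video_id = parse_qs(parts.query).get("v", [""])[0]
    else:
        return None

    if not video_id:
        return None

    return "https://www.youtube.com/watch?v=" + video_id


a = sketch_normalize_video_url("http://youtu.be/afa-5HQHiAs")
b = sketch_normalize_video_url("https://www.youtube.com/watch?v=afa-5HQHiAs")
print(a == b)
```

Once both shapes collapse to the same string, deduplicating or joining datasets on video urls becomes a plain equality check.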
Miscellaneous
About LRUs
TL;DR: a LRU is a hierarchical reordering of a URL so that one can perform meaningful prefix queries on URLs.
If you observe many URLs, you will quickly notice that they are not written in sound hierarchical order. In this URL, for instance:
http://business.lemonde.fr/articles/money.html?id=34#content
Some parts, such as the subdomain, are written in an "incorrect order". And this is fine, really, this is how URLs always worked.
But if what you really want is to match URLs, you will need to reorder them so that their order closely reflects the hierarchy of their targeted content. And this is exactly what LRUs are (that and also a bad pun on URL, since a LRU is basically a "reversed" URL).
Now look at how the aforementioned URL could be split into LRU stems:
```python
[
    's:http',
    'h:fr',
    'h:lemonde',
    'h:business',
    'p:articles',
    'p:money.html',
    'q:id=34',
    'f:content'
]
```
And typically, this list of stems will be serialized thusly:
s:http|h:fr|h:lemonde|h:business|p:articles|p:money.html|q:id=34|f:content|
The trailing pipe character (|) is added so that serialized LRUs can be prefix-free.
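The reordering described above can be sketched with the standard library alone. This is not the code behind the lru functions of the library: the helper names sketch_lru_stems and sketch_serialize are hypothetical, and the sketch ignores ports, credentials and any normalization. It also shows one reading of why the trailing separator matters: without it, a serialized host stem could be a string prefix of an unrelated, longer one.

```python
from urllib.parse import urlsplit


def sketch_lru_stems(url):
    """Split a URL into LRU-like stems (illustrative only, ports ignored)."""
    parts = urlsplit(url)
    stems = []

    if parts.scheme:
        stems.append("s:" + parts.scheme)

    # Host labels are reversed so that the most general part comes first
    for label in reversed((parts.hostname or "").split(".")):
        if label:
            stems.append("h:" + label)

    for segment in parts.path.split("/"):
        if segment:
            stems.append("p:" + segment)

    if parts.query:
        stems.append("q:" + parts.query)

    if parts.fragment:
        stems.append("f:" + parts.fragment)

    return stems


def sketch_serialize(stems):
    """Join stems, terminating each one with the separator."""
    return "".join(stem + "|" for stem in stems)


print(sketch_serialize(sketch_lru_stems(
    "http://business.lemonde.fr/articles/money.html?id=34#content"
)))

site = sketch_serialize(sketch_lru_stems("http://lemonde.fr"))
page = sketch_serialize(sketch_lru_stems("http://business.lemonde.fr/articles"))
other = sketch_serialize(sketch_lru_stems("http://lemondeplus.fr"))

print(page.startswith(site))        # the subdomain page falls under the site
print(other.startswith(site))       # an unrelated domain does not
print(other.startswith(site[:-1]))  # but it would without the trailing separator
```

With this ordering, "everything under lemonde.fr" becomes a simple string prefix test, which is the kind of query a trie of stems answers efficiently.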
Owner
- Name: médialab Sciences Po
- Login: medialab
- Kind: organization
- Location: Paris, France
- Website: https://medialab.sciencespo.fr
- Repositories: 236
- Profile: https://github.com/medialab
SciencesPo's médialab is an interdisciplinary research laboratory gathering engineers, designers & social science researchers.
GitHub Events
Total
- Issues event: 5
- Watch event: 8
- Issue comment event: 1
- Push event: 4
- Pull request event: 1
- Fork event: 1
Last Year
- Issues event: 5
- Watch event: 8
- Issue comment event: 1
- Push event: 4
- Pull request event: 1
- Fork event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yomguithereal | g****e@g****m | 511 |
| farjasju | j****s@g****m | 61 |
| oubine | o****n@g****m | 17 |
| MiguelLaura | 1****a | 15 |
| ameliepelle | 8****e | 7 |
| elanhermi | h****n@g****m | 7 |
| Martin Delabre | d****n@g****m | 6 |
| d3scmps | j****s@g****m | 5 |
| bmaz | b****r@g****m | 4 |
| Benjamin Ooghe-Tabanou | b****e@s****r | 4 |
| César | c****n@g****m | 3 |
| AnnaCharles | 1****s | 2 |
| Kelly Christensen | 6****l | 1 |
| julienp | 1****e | 1 |
| paubre | p****u@o****r | 1 |
Packages
- Total packages: 1
- Total downloads: 3,297 last month (pypi)
- Total dependent packages: 3
- Total dependent repositories: 13
- Total versions: 89
- Total maintainers: 3
pypi.org: ural
A helper library full of URL-related heuristics.
- Homepage: http://github.com/medialab/ural
- Documentation: https://ural.readthedocs.io/
- License: GPL-3.0
- Latest release: 1.5.0 (published 11 months ago)
Dependencies
- actions/checkout v2 composite
- actions/setup-python v2 composite
- black ==22.8.0
- importchecker ==2.0
- more-itertools <8.6
- pycountry >=18.12.8,<19
- pytest ==3.5.1
- tld ==0.12.1
- tqdm ==4.31.1
- twine ==1.11.0
- wheel *
- pycountry >=18.12.8,<19