https://github.com/ctuavastlab/url2mill.jl

An extension of Mill.jl to convert URLs to Mill structure

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (3.7%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: CTUAvastLab
  • Language: Julia
  • Default Branch: main
  • Size: 17.6 KB
Statistics
  • Stars: 0
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed over 3 years ago
Metadata Files
Readme

README.md

Url2Mill.jl

An extension of Mill.jl to convert URLs to Mill structure

A simple library implementing the representation of URLs from the paper "Nested Multiple Instance Learning in Modelling of HTTP network traffic" (Tomas Pevny, Marek Dedic, 2020).

Example:
```julia
using Url2Mill

julia> ds = url2mill("st.360buyimg.com/m/css/2014/index/home20175_9.css?v=jd201705182030")
ProductNode  # 1 obs, 152 bytes
  ├── hostname: BagNode  # 1 obs, 104 bytes
  │               ╰── ArrayNode(2053×3 NGramMatrix with Int64 elements)  # 3 obs, 166 bytes
  ├────── path: BagNode  # 1 obs, 104 bytes
  │               ╰── ArrayNode(2053×5 NGramMatrix with Int64 elements)  # 5 obs, 214 bytes
  ╰───── query: BagNode  # 1 obs, 136 bytes
                  ╰── ProductNode  # 1 obs, 64 bytes
                        ├──── key: ArrayNode(2053×1 NGramMatrix with Int64 elements)  # 1 o ⋯
                        ╰── value: ArrayNode(2053×1 NGramMatrix with Int64 elements)  # 1 o ⋯
```

If you want to represent strings by ngrams directly as `SparseArrays`, use `use_sparse_arrays = true`
```julia
julia> ds = url2mill("st.360buyimg.com/m/css/2014/index/home20175_9.css?v=jd201705182030"; use_sparse_arrays = true)
ProductNode  # 1 obs, 184 bytes
  ├── hostname: BagNode  # 1 obs, 112 bytes
  │               ╰── ArrayNode(2053×3 SparseMatrixCSC with Int64 elements)  # 3 obs, 552 b ⋯
  ├────── path: BagNode  # 1 obs, 112 bytes
  │               ╰── ArrayNode(2053×5 SparseMatrixCSC with Int64 elements)  # 5 obs, 888 b ⋯
  ╰───── query: BagNode  # 1 obs, 152 bytes
                  ╰── ProductNode  # 1 obs, 80 bytes
                        ├──── key: ArrayNode(2053×1 SparseMatrixCSC with Int64 elements)  # ⋯
                        ╰── value: ArrayNode(2053×1 SparseMatrixCSC with Int64 elements)  # ⋯
```
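The observation counts in the output above (3 for `hostname`, 5 for `path`, 1 for `query`) are consistent with splitting the hostname on `.`, the path on `/`, and the query on `&` into key/value pairs. The following is a minimal sketch of such a split in plain Julia. It is not the package's code: `split_url` is a hypothetical helper, and the exact tokenization rules used by `url2mill` are an assumption inferred from those counts.

```julia
# Hypothetical helper, NOT part of Url2Mill.jl: splits a URL into the three
# bags that appear under the top-level ProductNode in the printed tree.
# The delimiters ('.', '/', '&', '=') are assumptions inferred from the
# observation counts in the example output.
function split_url(url::AbstractString)
    # Drop an optional scheme prefix such as "https://".
    rest = replace(url, r"^[A-Za-z][A-Za-z0-9+.-]*://" => "")

    # Everything after the first '?' is treated as the query string.
    parts = split(rest, '?'; limit = 2)
    location = parts[1]
    query = length(parts) == 2 ? parts[2] : ""

    # The first '/'-separated segment is the hostname, the rest is the path.
    segments = split(location, '/'; keepempty = false)
    host = isempty(segments) ? "" : segments[1]
    hostname = String.(split(host, '.'; keepempty = false))
    path = String.(segments[2:end])

    # Each '&'-separated item becomes one (key, value) pair.
    query_pairs = Tuple{String,String}[]
    for item in split(query, '&'; keepempty = false)
        kv = split(item, '='; limit = 2)
        value = length(kv) == 2 ? String(kv[2]) : ""
        push!(query_pairs, (String(kv[1]), value))
    end

    return (hostname = hostname, path = path, query = query_pairs)
end

parts = split_url("st.360buyimg.com/m/css/2014/index/home20175_9.css?v=jd201705182030")
length(parts.hostname)  # 3, matching the 3 observations under `hostname`
length(parts.path)      # 5, matching the 5 observations under `path`
length(parts.query)     # 1, matching the single key/value pair under `query`
```

Under this reading, each token becomes one instance in its bag, and each instance is then encoded as a column of the n-gram matrix shown in the tree.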

Owner

  • Name: Joint research lab of Czech Technical University in Prague and Avast
  • Login: CTUAvastLab
  • Kind: organization
  • Location: Prague

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 6
  • Total Committers: 1
  • Avg Commits per committer: 6.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
pevnak p****k@g****m 6

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0