https://github.com/ardoco/magika
Java implementation of Google's magika tool to predict file types
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.1%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: ArDoCo
- License: apache-2.0
- Language: Java
- Default Branch: main
- Size: 2.75 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
- Created: over 1 year ago
- Last pushed: over 1 year ago
Metadata Files
License
https://github.com/ArDoCo/magika/blob/main/
# Magika

This is a Java library that includes [Google's magika](https://github.com/google/magika). According to the README at [google/magika](https://github.com/google/magika):

> Magika is a novel AI powered file type detection tool that relies on the recent advance of deep learning to provide accurate detection. Under the hood, Magika employs a custom, highly optimized Keras model that only weighs about a few MBs, and enables precise file identification within milliseconds, even when running on a single CPU.
>
> In an evaluation with over 1M files and over 100 content types (covering both binary and textual file formats), Magika achieves 99%+ precision and recall. Magika is used at scale to help improve Google users safety by routing Gmail, Drive, and Safe Browsing files to the proper security and content policy scanners. Read more in our [research paper](https://arxiv.org/abs/2409.13768)!
>
> The Magika paper was accepted at IEEE/ACM International Conference on Software Engineering (ICSE) 2025!

See also [their website](https://google.github.io/magika/).

## Getting Started

This library is meant to be used as a dependency in your Java project. It is available on Maven Central. We provide a simple API to the ONNX model for classification (v0.6.0). If you need a CLI, we suggest you use the original [google/magika](https://github.com/google/magika) project.

### Maven

```xml
<dependency>
    <groupId>io.github.ardoco</groupId>
    <artifactId>magika</artifactId>
    <version>0.6.1</version>
</dependency>
```

## Known Limitations & Contributing

Magika significantly improves over the state of the art, but there's always room for improvement! More work can be done to increase detection accuracy, support for additional content types, bindings for more languages, etc.

This initial release is not targeting polyglot detection, and we're looking forward to seeing adversarial examples from the community. We would also love to hear from the community about encountered problems, misdetections, feature requests, need for support for additional content types, etc.

## License

Apache 2.0; see [LICENSE](LICENSE) for details. The model is licensed by the original authors under Apache 2.0, see their [LICENSE](./src/main/resources/magika/LICENSE).
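The upstream README quoted above reports 99%+ precision and recall. As a minimal sketch of what those two per-content-type metrics measure, the following self-contained Java example computes them from a list of true labels and a list of predicted labels. The class name `PrecisionRecall`, its methods, and the six sample labels are made up for illustration; they are not part of this library's API or of Google's evaluation.

```java
import java.util.List;

public class PrecisionRecall {

    /** Of all files predicted as {@code label}, the fraction that really have that type. */
    static double precision(List<String> truth, List<String> predicted, String label) {
        int truePositives = 0;
        int predictedAsLabel = 0;
        for (int i = 0; i < truth.size(); i++) {
            if (predicted.get(i).equals(label)) {
                predictedAsLabel++;
                if (truth.get(i).equals(label)) {
                    truePositives++;
                }
            }
        }
        // Convention chosen for this sketch: report 0.0 when the label was never predicted.
        return predictedAsLabel == 0 ? 0.0 : (double) truePositives / predictedAsLabel;
    }

    /** Of all files that really have type {@code label}, the fraction predicted as such. */
    static double recall(List<String> truth, List<String> predicted, String label) {
        int truePositives = 0;
        int actualLabel = 0;
        for (int i = 0; i < truth.size(); i++) {
            if (truth.get(i).equals(label)) {
                actualLabel++;
                if (predicted.get(i).equals(label)) {
                    truePositives++;
                }
            }
        }
        // Convention chosen for this sketch: report 0.0 when the label never occurs.
        return actualLabel == 0 ? 0.0 : (double) truePositives / actualLabel;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: the true content type of six files and a detector's guesses.
        List<String> truth = List.of("pdf", "pdf", "pdf", "png", "png", "txt");
        List<String> predicted = List.of("pdf", "pdf", "png", "png", "png", "pdf");

        for (String label : List.of("pdf", "png", "txt")) {
            System.out.printf("%s: precision=%.3f recall=%.3f%n",
                    label,
                    precision(truth, predicted, label),
                    recall(truth, predicted, label));
        }
    }
}
```

In this toy data, two of the three files predicted as `pdf` really are PDFs, so precision for `pdf` is 2/3; two of the three actual PDFs were found, so recall for `pdf` is also 2/3.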
Owner
- Name: ArDoCo
- Login: ArDoCo
- Kind: organization
- Location: Germany
- Website: https://ardoco.github.io/
- Repositories: 15
- Profile: https://github.com/ArDoCo
Architecture Documentation Consistency - Aiming to provide consistency analyses between formal models and informal (textual) documentation
GitHub Events
Total
- Release event: 1
- Watch event: 4
- Push event: 14
- Create event: 3
Last Year
- Release event: 1
- Watch event: 4
- Push event: 14
- Create event: 3
Dependencies
pom.xml
maven
- com.fasterxml.jackson.core:jackson-databind 2.16.1
- com.google.errorprone:error_prone_core 2.36.0
- com.microsoft.onnxruntime:onnxruntime 1.20.0
- org.slf4j:slf4j-api 2.0.14
- org.slf4j:slf4j-simple 2.0.14
- org.junit.jupiter:junit-jupiter-api 5.11.0 test
- org.junit.jupiter:junit-jupiter-engine 5.11.0 test