djent

A reimplementation of the Fourmilab/John Walker random number test program ent with several improvements.

https://github.com/dj-on-github/djent

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: dj-on-github
  • License: gpl-2.0
  • Language: C
  • Default Branch: master
  • Size: 514 KB
Statistics
  • Stars: 14
  • Watchers: 4
  • Forks: 2
  • Open Issues: 0
  • Releases: 1
Created almost 9 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

djent

djent is a reimplementation of the Fourmilab/John Walker random number test program ent.

The improvements are:

  • Multiple input file names can be provided at once. This works nicely with the CSV format output.
  • -h works as well as -u to get the help information.
  • The filename is present in CSV output.
  • The symbol size can be any number of bits up to 32. ent was constrained to 1 or 8.
  • The SCC test can be either wrap-around or not wrap-around.
  • The SCC result can be given a lag value to get a LAG-N correlation coefficient.
  • A list of filenames to analyze can be read from a text file using -i filename.
  • Test condition details (voltage, temperature, ID, etc.) can be parsed from the filename and included in the output.
  • MCV Min Entropy is estimated in addition to Shannon Entropy. The symbol and entropy are both reported.
  • The longest run and the symbol in the longest run are reported. For 1 bit-per-symbol analysis, a p-value is also computed: the probability that a uniform random bit sequence has a longest run length equal to or less than the measured run length.

```
djent -h
Usage: djent [-brRpcCuhds] [-l <n>] [-i <filename>] [filename] [filename2] ...

Compute statistics of random data.
  Author: David Johnston, dj@deadhat.com

  -i <filename>  --inputfilelist=<filename>  Read list of filenames from <filename>
  -p             --parsefilename             Extract CID, Process, Voltage and Temperature from filename.
                                             The values will be included in the output.
  -l <n>         --symbollength=<n>          Treat incoming data symbols as bitlength n. Default is 8.
  -b             --binary                    Treat incoming data as binary. Default bit length will be -l 1
  -r             --bytereverse               Reverse the bit order in incoming bytes
  -R             --wordreverse               Reverse the byte order in incoming 4 byte words
  -c             --occurrence                Print symbol occurrence counts
  -C             --longest                   Print symbol longest run counts
  -w             --sccwrap                   Treat data as cyclical in SCC
  -n <n>         --lagn=<n>                  Lag gap in SCC. Default=1
  -f             --fold                      Fold uppercase letters to lower case
  -t             --terse                     Terse output
  -e             --entexact                  Exactly match output format of ent
  -s             --suppress_header           Suppress the header in terse output
  -h or -u       --help                      Print this text

Notes
  * By default djent is in hex mode, where it reads ASCII hex data and converts it to binary
    to analyze. In hex mode, the symbol length defaults to 8, so normal hex files can be
    treated as a representation of bytes. The symbol length can be changed to any value
    between 1 and 32 bits using the -l option.
  * With the -b option djent switches to binary mode and reads each byte as binary, with a
    symbol length of 1.
  * To analyze ASCII text instead of hex ASCII, you need djent to treat each byte as a
    separate symbol, so use binary mode with a symbol length of 8, i.e. djent -b -l 8
  * By default djent treats the MSB of each byte as the first bit. This can be switched so
    that djent treats the LSB as the first bit in each byte using the -r option.
  * Terse output is requested using -t. This outputs in CSV format. The first line is the
    header. If multiple files are provided, there will be one line of CSV output per file
    in addition to the header. The CSV header can be suppressed with -s.
  * To analyze multiple files, just give multiple file names on the command line. To read
    data from stdin, don't provide a filename and pipe the data in:
    cat datafile.hex | djent
  * The parse filename option -p takes four patterns from the filename to include in the
    output. This makes it easy to plot test conditions that are commonly encoded in a
    filename. Fields are delimited by underscores. The four patterns for CID, process,
    voltage and temperature are: CID-, PROC-, pV and pC, where 'p' is the decimal point
    (for example CID-X23, PROC-TTFT, 1p2V and 25p0C).
  * To compute the statistics, djent builds a frequency table of the symbols. This can be
    displayed using the -c option. The size of this table is what limits the maximum symbol
    size. For each of the 2^n symbols, a 64 bit entry in a table is created. So for n=32,
    that's 32 GBytes, and the ability to handle large symbol sizes is limited by the
    available memory and the per-process allocation limit.
  * The serial correlation coefficient is not wrap around by default, meaning that it does
    not compare the last value in the data with the first. To get wrap around behaviour,
    use the -w option.
  * The Lag-N correlation coefficient can be computed by using the -n option. This causes
    the SCC computation to compare each Xth symbol with the (X+n)th symbol instead of the
    (X+1)th symbol. If you use wrap around with Lag-N, then the wrap around will reach n
    bits further into the start of the sequence.
  * The byte reverse option -r reverses the order of bits within each byte. The word
    reverse option -R reverses the order of bytes within each 32 bit word, from 3,2,1,0 to
    0,1,2,3. Both -R and -r can be used together. With -R, data that isn't a multiple of
    32 bits long will be padded with zeros, which may not be what you want. A padding
    warning will be sent to STDERR.
  * Instead of providing data file names on the command line, djent can be told to read a
    list of files from a text file. The file must have one filename per line. Lines
    beginning with # will be ignored. Use the -i <filename> option to request that djent
    reads the file list from <filename>.

Examples
  Print this help
    djent -h

  Analyze hex file from stdin
    cat datafile.hex | djent

  Analyze binary file
    djent -b datafile.bin

  Analyze several files with CSV output
    djent -t data1.hex data2.hex data3.hex

  Analyze ascii symbols - Read in binary and set symbol size to 8.
    djent -b -l 8 textfile.txt

  Analyze binary file with parsable filename.
    djent -b -t -p rawdata_CID-X23_PROC-TTFT_1p2V_25p0C_.bin
```

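The -r and -R options described in the help text above are simple data transforms applied before analysis. The sketch below illustrates both. The function names are hypothetical, and unlike djent, which pads short input with zeros and warns on STDERR, this version assumes the caller has already padded the buffer to a multiple of 4 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the order of the 8 bits in one byte (the idea behind -r). */
uint8_t reverse_bits_in_byte(uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (uint8_t)((r << 1) | (b & 1u)); /* move the lowest bit of b into r */
        b >>= 1;
    }
    return r;
}

/*
 * Reverse the byte order inside each 4-byte word, in place (the idea
 * behind -R). ASSUMPTION: len is a multiple of 4. Padding is left to
 * the caller in this sketch.
 */
void reverse_bytes_in_words(uint8_t *buf, size_t len)
{
    assert(len % 4 == 0);
    for (size_t i = 0; i < len; i += 4) {
        uint8_t outer = buf[i];
        uint8_t inner = buf[i + 1];
        buf[i] = buf[i + 3];
        buf[i + 1] = buf[i + 2];
        buf[i + 2] = inner;
        buf[i + 3] = outer;
    }
}
```

Because the two transforms act at different levels (bits within a byte, bytes within a word), applying both gives the same result in either order, which is consistent with the note that -R and -r can be used together.
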
Owner

  • Name: David Johnston
  • Login: dj-on-github
  • Kind: user
  • Location: Oregon, USA

Arch Grumpy Coder. Designer of random things. Author of "Random Number Generators, Principles and Practices" DeGruyter Press, ISBN 978-1501515132

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: djent
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: David
    family-names: Johnston
    email: dj@deadhat.com
    orcid: 'https://orcid.org/0009-0002-5149-9414'
repository-code: 'https://github.com/dj-on-github/djent'
abstract: >-
  djent is a reimplementation of the Fourmilab/John Walker
  random number test program ent.


  The improvements are:


  Multiple input file names can be provided at once. This
  works nicely with the CSV format output.

  -h works as well as -u to get the help information.

  The filename is present in CSV output

  The symbol size can be any number of bits up to 32. ent
  was constrained to 1 or 8.

  The SCC test can be either wrap-around or not wrap-around.

  The SCC result can be given a lag value to get a LAG-N
  correlation coefficient.

  A list of filenames to analyze can be read from a text
  file using -i filename.

  Test condition details (Volts, temp, id etc.) can be
  parsed from the filename and included in output.

  MCV Min Entropy is estimated in addition to Shannon
  Entropy. The symbol and entropy are both reported

  The longest run and the symbol in the longest run are
  reported. For 1 bit-per-symbol analysis, a p-value is
  computed of the probability of a uniform random bit
  sequence having a longest run length equal to or less than
  the measured run length.
license: GPL-2.0

GitHub Events

Total
  • Push event: 2
  • Fork event: 1
Last Year
  • Push event: 2
  • Fork event: 1