https://github.com/wrathematics/float

Single precision (float) matrices for R.

Keywords

float-matrix hpc linear-algebra matrix r

Keywords from Contributors

recommender-system collaborative-filtering factorization-machines matrix-completion matrix-factorization sparse-matrices svd

Last synced: 5 months ago

Repository

Single precision (float) matrices for R.

Basic Info

Host: GitHub
Owner: wrathematics
License: other
Language: Fortran
Default Branch: master
Homepage:
Size: 2.48 MB

Statistics

Stars: 45
Watchers: 6
Forks: 12
Open Issues: 17
Releases: 2

Topics

float-matrix hpc linear-algebra matrix r

Created over 8 years ago · Last pushed 12 months ago

Metadata Files

Readme Changelog License

float

Version: 0.3-1
License: BSD 2-Clause
Project home: https://github.com/wrathematics/float
Bug reports: https://github.com/wrathematics/float/issues

float is a single precision (aka float) matrix framework for R. Base R has no single precision type. Its "numeric" vectors/matrices are double precision (or possibly integer, but you know what I mean). Floats have roughly half the precision of doubles (about 7 significant decimal digits versus about 16), for a pretty obvious performance vs accuracy tradeoff.

A matrix of floats should use about half as much memory as a matrix of doubles, and your favorite matrix routines will generally compute about twice as fast on them as well. However, the results will not be as accurate, and are much more prone to roundoff error/mass cancellation issues. Statisticians have a habit of over-hyping the dangers of roundoff error in this author's opinion. If your data is well-conditioned, then using floats is "probably" fine for many applications.

⚠️ WARNING ⚠️ type promotion always defaults to the higher precision. So if a float matrix operates with an integer matrix, the integer matrix will be cast to a float first. Likewise if a float matrix operates with a double matrix, the float will be cast to a double first. Similarly, any float matrix that is explicitly converted to a "regular" matrix will be stored in double precision.
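As a small illustration of the promotion rules described in the warning, the sketch below combines a float matrix with an integer matrix and with a double matrix. The variable names are made up for the example, and it assumes the float package is installed.

```r
library(float)

x_dbl <- matrix(as.double(1:4), 2, 2)  # ordinary double precision matrix
x_int <- matrix(1:4, 2, 2)             # integer matrix
x_flt <- fl(x_dbl)                     # float (single precision) matrix

# Float with integer: the integer operand is cast to float,
# so the result should still be a float matrix.
r_int <- x_flt + x_int
is.float(r_int)

# Float with double: the float operand is cast to double,
# so the result should be a regular double matrix.
r_dbl <- x_flt + x_dbl
is.float(r_dbl)
is.double(r_dbl)

# Explicit conversion to a "regular" matrix gives double precision storage.
is.double(as.matrix(dbl(x_flt)))
```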

Installation

The package requires the single precision BLAS/LAPACK routines, which are not included in the default libRblas and libRlapack shipped from CRAN. If your BLAS/LAPACK libraries do not have what is needed, then they will be built (note that a Fortran compiler is required in this case). However, these can take a very long time to compile, and will have much worse performance than optimized libraries. The topic of which BLAS/LAPACK to use and how to use them has been written about many times.
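To see which BLAS and LAPACK libraries your current R session is linked against, something like the following base R calls can help. This is only a quick check of what R reports; it does not by itself tell you whether the single precision routines are present.

```r
# Path of the BLAS library in use (reported by R >= 3.4.0)
blas_path <- extSoftVersion()["BLAS"]
blas_path

# Path and version of the LAPACK library in use
lapack_path <- La_library()
lapack_version <- La_version()
lapack_path
lapack_version
```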

To install the R package, run:

```r
install.packages("float")
```

The development version is maintained on GitHub:

```r
remotes::install_github("wrathematics/float")
```

Windows

If you are installing on Windows and wish to get the best performance, then you will need to install from source after editing some files. After installing high-performance BLAS and LAPACK libraries, delete the text $(LAPACK_OBJS) from the line in src/Makevars.win beginning with OBJECTS =. You will also need to add the appropriate link line. This will ensure that on building, the package links with your high-performance libraries instead of compiling the reference versions. This is especially important for 32-bit Windows, where the internal LAPACK and BLAS libraries are built without compiler optimization because of a compiler bug.

Also, if you are using Windows on big endian hardware (I'm not even sure if this is possible), then you will need to change the 0 in src/windows/endianness.h to a 1. Failure to do so will cause very bizarre things to happen with the NA handlers.

Creating, Casting, and Type

Before we get to the main usage of the package and its methods, here is how to create and cast floats:

* To cast TO a float (convert an existing numeric vector/matrix), use as.float() (or its shorthand fl()).
* To cast FROM a float, use as.double() or as.integer() (or their shorthands, dbl() and int()).
* To pre-allocate a float vector of 0's (like integer(5)), use float().
* To construct a float32 object (developers only; see the vignette), use float32().

R has a generic number type "numeric" which encompasses integers and doubles. The function is.numeric() will return FALSE for float vectors/matrices. Similarly, as.numeric() will return the data cast as double.
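A minimal sketch of the creation and casting functions described above, assuming the float package is installed. The values are chosen to be exactly representable in single precision so the round trip is lossless; the variable names are only for illustration.

```r
library(float)

x <- c(1.5, 2.5, 3.5)

# Cast TO a float; fl(x) is the shorthand for as.float(x).
s <- as.float(x)
is.float(s)
is.numeric(s)   # floats are not "numeric" in R's sense

# Cast FROM a float; dbl(s) is the shorthand for as.double(s).
d <- as.double(s)
is.double(d)

# as.numeric() also returns the data as double.
is.double(as.numeric(s))

# Pre-allocate a float vector of five zeros (like integer(5)).
z <- float(5)
dbl(z)
```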

Methods

The goal of the package is to recreate the matrix algebra facilities of the base package, but with floats. So we do not include higher statistical methods (like lm() and prcomp()).

Is something missing? Please let me know.

Basic utilities

| Method | Status |
|---|---|
| [ | done |
| c() | done |
| cbind() and rbind() | done |
| diag() | done |
| is.na() | done |
| is.float() | done |
| min() and max() | done |
| na.omit(), na.exclude() | done |
| nrow(), ncol(), dim() | done |
| object.size() | done |
| print() | done |
| rep() | done |
| scale() | Available for logical center and scale |
| str() | done |
| sweep() | Available for FUN's "+", "-", "*", and "/". Others impossible(?) |
| typeof() and storage.mode() | No storage.mode<- method. |
| which.min() and which.max() | done |

Binary Operations

| Method | Status |
|---|---|
| + | done |
| * | done |
| - | done |
| / | done |
| ^ | done |
| > | done |
| >= | done |
| == | done |
| < | done |
| <= | done |

Casters and Converters

| Method | Status |
|---|---|
| dbl() | done |
| int() | done |
| fl() | done |
| as.vector() and as.matrix() | done |

Linear algebra

| Method | Status |
|---|---|
| %*% | done |
| backsolve() and forwardsolve() | done |
| chol(), chol2inv() | done |
| crossprod() and tcrossprod() | done |
| eigen() | only for symmetric inputs |
| isSymmetric() | done |
| La.svd() and svd() | done |
| norm() | done |
| qr(), qr.Q(), qr.R() | done |
| rcond() | done |
| solve() | done |
| t() | done |

Math functions

| Method | Status |
|---|---|
| abs(), sqrt() | done |
| ceiling(), floor(), trunc(), round() | done |
| exp(), expm1() | done |
| gamma(), lgamma() | done |
| is.finite(), is.infinite(), is.nan() | done |
| log(), log10(), log2() | done |
| sin(), cos(), tan(), asin(), acos(), atan() | done |
| sinh(), cosh(), tanh(), asinh(), acosh(), atanh() | done |

Misc

| Method | Status |
|---|---|
| .Machine_float | float analogue of .Machine. Everything you'd actually want is there |

Sums and Means

| Method | Status |
|---|---|
| colMeans() | done |
| colSums() | done |
| rowMeans() | done |
| rowSums() | done |
| sum() | done |

Package Use

Memory consumption is roughly half when using floats:

```r
library(float)

m = 10000
n = 2500

memuse::howbig(m, n)
# 190.735 MiB

x = matrix(rnorm(m*n), m, n)
object.size(x)
# 200000200 bytes

s = fl(x)
object.size(s)
# 100000784 bytes
```

And the runtime performance is (generally) roughly 2x better:

```r
library(rbenchmark)
cols <- c("test", "replications", "elapsed", "relative")
reps <- 5

benchmark(crossprod(x), crossprod(s), replications=reps, columns=cols)
#           test replications elapsed relative
# 2 crossprod(s)            5   3.185    1.000
# 1 crossprod(x)            5   7.163    2.249
```

However, the accuracy is better in the double precision version:

```r
cpx = crossprod(x)
cps = crossprod(s)
all.equal(cpx, dbl(cps))
# [1] "Mean relative difference: 3.478718e-07"
```

For this particular example, the difference is fairly small; but for some operations/data, the difference could be significantly larger due to roundoff error.
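To make the roundoff risk concrete, here is a small sketch of a case where single precision loses information that double precision keeps. A float has a 24-bit significand, so 2^24 + 1 cannot be represented and rounds back to 2^24. The variable names are only for illustration.

```r
library(float)

big <- 2^24   # 16777216, the last point where floats can count by ones

# In single precision, adding 1 is lost to rounding.
float_sum <- fl(big) + fl(1)
dbl(float_sum) == big    # TRUE

# In double precision, the same sum is exact.
double_sum <- big + 1
double_sum == big        # FALSE
```

Sums over many values of very different magnitude can hit the same effect repeatedly, which is one way the gap between float and double results grows.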

A Note About Memory Consumption

Because of the use of S4 for the nice syntax, there is some memory overhead which is noticeable for small vectors/matrices. This cost is amortized quickly for reasonably large vectors/matrices. But storing many very small float vectors/matrices can be surprisingly costly.

For example, consider the cost for a single float vector vs a double precision vector:

```r
object.size(fl(1))
# 632 bytes
object.size(double(1))
# 48 bytes
```

However, once we get to 147 elements, the storage is identical:

```r
object.size(fl(1:147))
# 1216 bytes
object.size(double(147))
# 1216 bytes
```

And for vectors/matrices with many elements, the size of the double precision data is roughly twice that of the float data:

```r
object.size(fl(1:10000))
# 40624 bytes
object.size(double(10000))
# 80040 bytes
```

The above analysis assumes that your float and double values conform to the IEEE-754 standard (which is required to build this package). It specifies that a float requires 4 bytes, and a double requires 8. The size of an int is actually system dependent, but is probably 4 bytes. This means that on most systems, a float matrix should always be larger than an integer matrix with the same number of elements, because the overhead for our float matrix is simply larger. However, for objects with many elements, the sizes will be roughly equal:

```r
object.size(fl(1:10000))
# 40624 bytes
object.size(1:10000)
# 40040 bytes
```

Q&A

Why would I want to do arithmetic in single precision?

It's (generally) twice as fast and uses half the RAM compared to double precision. For some data analysis tasks, that's more important than having (roughly) twice as many decimal digits.

Why does `floatmat + 1` produce a numeric (double) matrix but `floatmat + 1L` produce a float matrix?

Type promotion always defaults to the highest type available. If you want the arithmetic to be carried out in single precision, cast the 1 with fl(1) first.
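The three cases from this answer can be checked directly. This is a small sketch assuming the float package is installed; `floatmat` is just an example matrix.

```r
library(float)

floatmat <- fl(matrix(1:4, 2, 2))

# 1 is a double, so the float matrix is promoted and the result is double.
plus_double <- floatmat + 1
is.float(plus_double)

# 1L is an integer, which is cast to float, so the result stays float.
plus_integer <- floatmat + 1L
is.float(plus_integer)

# Casting the scalar first keeps the arithmetic in single precision.
plus_float <- floatmat + fl(1)
is.float(plus_float)
```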

Doesn't that make R's type system even more of a mess?

Yes.

How would I create my own methods?

If you can formulate the method in terms of existing functionality from the float package, then you're good. If not, you will likely have to write your own C/C++ code. See the For Developers section of the package vignette.
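As one example of building on existing functionality, the sketch below computes least-squares coefficients from the normal equations using only crossprod() and solve() from the methods tables. The helper `ls_coef_float()` is hypothetical and not part of the package, and the simulated data is only there to exercise it. Normal equations are a simple choice for well-conditioned data, not a recommendation for ill-conditioned problems.

```r
library(float)

# Hypothetical helper (not part of the float package): solve
# (X'X) beta = X'y using methods that already exist for floats.
ls_coef_float <- function(X, y) {
  solve(crossprod(X), crossprod(X, y))
}

# Simulated, well-conditioned data for illustration only.
set.seed(1234)
X <- matrix(rnorm(200 * 3), 200, 3)
y <- X %*% c(1, -2, 0.5) + rnorm(200, sd = 0.1)

# Float inputs, so the whole computation runs in single precision.
beta_s <- ls_coef_float(fl(X), fl(y))
is.float(beta_s)

# Same computation in double precision, for comparison.
beta_d <- solve(crossprod(X), crossprod(X, y))
max_diff <- max(abs(dbl(beta_s) - beta_d))
max_diff
```

Because both inputs are cast with fl() before the call, no operand is a double and nothing is promoted along the way.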

Owner

Name: Drew Schmidt
Login: wrathematics
Kind: user
Location: Knoxville, Tennessee

Website: https://hpcran.org
Twitter: wrathematics
Repositories: 120
Profile: https://github.com/wrathematics

I like R, C, and HPC.

GitHub Events

Total

Issues event: 2
Issue comment event: 3
Push event: 2

Last Year

Issues event: 2
Issue comment event: 3
Push event: 2

Committers

Last synced: over 2 years ago

All Time

Total Commits: 438
Total Committers: 5
Avg Commits per committer: 87.6
Development Distribution Score (DDS): 0.048

Past Year

Commits: 17
Committers: 1
Avg Commits per committer: 17.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
wrathematics	w**s@g**m	417
wccsnow	w**w@g**m	13
dselivanov	s**y@g**m	6
Cristhian Diaz	a**a@g**m	1
David Cortes	d**a@g**m	1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 36
Total pull requests: 16
Average time to close issues: 28 days
Average time to close pull requests: 3 months
Total issue authors: 13
Total pull request authors: 7
Average comments per issue: 3.86
Average comments per pull request: 2.44
Merged pull requests: 11
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: 3 days
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 3.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

dselivanov (13)
david-cortes (6)
PeteHaitch (3)
cdeterman (3)
barracuda156 (2)
wrathematics (2)
clawish (1)
Kdreval (1)
loveshack (1)
pshashk (1)
drkrynstrng (1)
ahookom (1)
sunracesuraj (1)

Pull Request Authors

snoweye (6)
dselivanov (3)
david-cortes (3)
jeroen (2)
rehbergT (1)
crissthiandi (1)
cdeterman (1)

Packages

Total packages: 2
Total downloads:
- cran 43,203 last-month
Total docker downloads: 45,278

Total dependent packages: 12
(may contain duplicates)
Total dependent repositories: 53
(may contain duplicates)
Total versions: 17
Total maintainers: 1

cran.r-project.org: float

32-Bit Floats

Homepage: https://github.com/wrathematics/float
Documentation: http://cran.r-project.org/web/packages/float/float.pdf
License: BSD 2-clause License + file LICENSE
Latest release: 0.3-3
published 12 months ago

Versions: 12
Dependent Packages: 9
Dependent Repositories: 53
Downloads: 43,203 Last month
Docker Downloads: 45,278

Rankings

Docker downloads count: 2.3%

Dependent repos count: 3.4%

Downloads: 4.4%

Average: 4.9%

Forks count: 5.8%

Dependent packages count: 5.9%

Stargazers count: 7.4%

Maintainers (1)

wrathematics@gmail.com

Last synced: 6 months ago

conda-forge.org: r-float

Homepage: https://github.com/wrathematics/float
License: BSD-2-Clause
Latest release: 0.3_0
published almost 4 years ago

Versions: 5
Dependent Packages: 3
Dependent Repositories: 0

Rankings

Dependent packages count: 15.6%

Average: 32.2%

Dependent repos count: 34.0%

Stargazers count: 38.3%

Forks count: 40.9%

Last synced: 6 months ago

https://github.com/wrathematics/float

Science Score: 26.0%
