-
Notifications
You must be signed in to change notification settings - Fork 692
Universal Functions
An UFunc (Universal Function) is conceptually a function that can operate on scalars, vectors, matrices, counters and, with a little effort, your own collection types. It's much like Numpy's Universal Functions.
UFuncs makes Breeze very extensible. With little effort, you can add a new collection type (e.g. Vector) and reuse all builtin UFuncs. Or if you want to add a new numeric type that supports operations like log
, you can define UFunc implementations for those operations and then use UFuncs on collections of your data type.
All code snippets at this page assume that you have imported necessary packages:
import breeze.linalg._
import breeze.math._
import breeze.numerics._
There are three UFunc types: element-wise UFuncs (like log
), operators UFuncs (e.g. a+b
) and reduction UFuncs (like sum
).
The most common UFuncs, usually inheriting from MappingUFunc
. They usually take one scalar or collection and apply the underlying function to the scalar or elementwise to all elements in the collection:
scala> log(1.0)
Double = 0.0
scala> log(Complex(1.0, 1.0))
breeze.math.Complex = 0.3465735902799727 + 0.7853981633974483i
scala> log(Array(1.0, 2.0))
Array[Double] = Array(0.0, 0.6931471805599453)
scala> log(DenseMatrix((1.0, 2.0), (3.0, 4.0)))
breeze.linalg.DenseMatrix[Double] =
0.0 0.6931471805599453
1.0986122886681098 1.3862943611198906
By default, these UFunc return result as a new object. If you want the UFunc to alter the argument, you can do it:
scala> log.inPlace(someMatrix)
These are UFuncs that are usually not used directly. For example, OpAdd
is an operator UFunc, but it's usually used as a + b
, rather than OpAdd(a, b)
.
scala> val a = DenseVector(1.0, 2.0)
scala> val b = DenseVector(3.0, 4.0)
scala> a + b // binary operator
breeze.linalg.DenseVector[Double] = DenseVector(4.0, 6.0)
scala> operators.OpAdd(a, b)
breeze.linalg.DenseVector[Double] = DenseVector(4.0, 6.0)
scala> -a // unary operator
breeze.linalg.DenseVector[Double] = DenseVector(-1.0, -2.0)
scala> operators.OpNeg(a)
breeze.linalg.DenseVector[Double] = DenseVector(-1.0, -2.0)
These are UFuncs that conceptually "accumulate" collections into a smaller result. For instance, the sum
UFunc predictably sums all elements together. Reduction UFuncs also work with broadcasting.
scala> val mat = DenseMatrix((1.0, 2.0, 3.0),
(4.0, 5.0, 6.0))
scala> sum(mat)
Double = 21.0
scala> sum(mat(::, *)) // sum of each column
breeze.linalg.DenseMatrix[Double] = 5.0 7.0 9.0
scala> mat(*, ::) + DenseVector(10.0, 20.0, 30.0)
breeze.linalg.DenseMatrix[Double] =
11.0 22.0 33.0
14.0 25.0 36.0
scala> val onesMat = DenseMatrix.ones[Double](3, 3)
breeze.linalg.DenseMatrix[Double] =
1.0 1.0 1.0
1.0 1.0 1.0
1.0 1.0 1.0
scala> mat(::, *) := DenseVector(10.0, 20.0)
scala> mat
breeze.linalg.DenseMatrix[Double] =
10.0 10.0 10.0
20.0 20.0 20.0
All of these live in the breeze.numerics
package object.
-
Functions from
scala.math
:-
exp
: Returnse^x
. -
log
: Returns the natural logarithm of x.log(b, x)
is the log of x in base b! -
log1p
: Returnslog(1 + x)
. This is more accurate than explicitlog(1 + x)
for x < 1. -
sqrt
: Square root. -
sin
,cos
,tan
,asin
,acos
,atan
: Trigonometric functions using radians. -
sinh
,cosh
,tanh
,asinh
,acosh
,atanh
: Hyperbolic functions using radians. -
toDegrees
: Converts radians to degrees. -
toRadians
: Converts degrees to radians. -
floor
: Rounds numbers to the next lowest integer. -
ceil
: Rounds numbers to the next highest integer. -
round
: Rounds numbers. -
rint
: Rounds numbers and returns a Double. -
signum
: Returns the sign of the value(s) as ±1, ±0 or NaN. -
abs
: Returns the absolute value of the argument. -
isOdd
: Returns true if the argument is equal to 1 modulo 2. -
isEven
: Returns true if the number is divisible by two.
-
-
Gamma function:
-
lgamma
: The natural logarithm of the Gamma function, which is the natural generalization of factorial to all real numbers. -
lgamma(a: Double, x: Double)
: returns the log of the Incomplete Gamma Function. -
digamma
: the first derivative oflgamma
.exp(digamma(x)) ≈ x - .5
forx > 10
. -
trigamma
: the second derivative oflgamma
-
gammp(a: Double, x: Double)
: returns the lower regularized incomplete gamma function
-
-
Error function and variants, commonly used as the CDF of the Gaussian distribution.
-
erf
: the error function. -
erfi
: The imaginary error function with real arguments.erfi(x) = -i erf(i * x)
. -
erfc
: Equals1 - erf
. -
erfinv
: Returns the inverseerf
. -
erfcinv
: Returns the inverseerfc
.
-
-
Bessel function:
-
Bessel.i0
: the modified Bessel function of the first kind with parameter α = 0. -
Bessel.i1
: the modified Bessel function of the first kind with parameter α = 1.
-
-
Special and utility functions:
-
sigmoid
: the sigmoid function from logistic regression/neural nets.1/(1+exp(-x))
-
I
: the indicator function (1.0 if true, 0.0 if false) -
logI
: the log of the indicator functionlogI(true) == 0.0
,logI(false) == -inf
-
lbeta
: The log of the Beta function -
clip(a, lower, upper)
: returns an array such that all elements are "clipped" at the range (lower, upper)
-
These are UFuncs that are usually called using infix syntax (in the case of a binary operators) or prefix syntax if the case is an unary operator. Some of these functions support in-place version, i.e. you can write a :+= b
instead of a := a + b
.
A := B
is a special operator named OpSet
. It alters A
in such way that it's the same as B
. Note, that this is not an usual assignment (=
) so A
still points to the same object. Only the content of A
is mutually altered.
:
at the beginning of an operator means that the operation is elementwise. For example, :*
is elementwise multiplication, like Matlab's .*
.
Not all operators are defined for all Tensors, or for all value types. (Some reasonable things are missing. We are adding them as necessary.) Undefined operators is a compile time, not runtime, error.
All operators are defined in breeze.linalg.NumericOps
trait and in breeze.linalg.operators
package.
Name | Example | In-place usage |
---|---|---|
OpAdd |
a + b , a :+ b
|
a += b , a :+= b
|
OpSub |
a - b , a :- b
|
a -= b , a :-= b
|
OpMulMatrix |
a * b |
a *= b |
OpMulScalar |
a :* b |
a :*= b |
OpDiv |
a / b , a :/ b
|
a /= b , a :/= b
|
OpSolveMatrixBy |
a \ b |
- |
OpMod |
a % b , a :% b
|
a %= b , a :%= b
|
OpPow |
a :^ b |
a :^= b |
OpMulInner |
a dot b |
- |
OpNeg |
-a |
- |
OpEq |
OpNe |
OpLT |
OpLTE |
OpGT |
OpGTE |
isClose |
---|---|---|---|---|---|---|
a :== b |
a :!= b |
a :< b |
a :<= b |
a :> b |
a :>= b |
isClose(a, b) |
name | example | in-place usage |
---|---|---|
OpAnd |
a & b , a :& b
|
a &= b , a :&= b
|
OpOr |
a | b , a :| b
|
a |= b , a :|= b
|
OpXor |
a ^^ b , a :^^ b
|
a ^^= b , a :^^= b
|
OpNot |
!a |
-
Basic UFuncs:
-
sum
: Returnsa(1) + ... + a(n)
. -
product
: Returnsa(1) * ... * a(n)
. -
softmax
: Return the softmax of the collection.log(exp(a(1)) + ... exp(a(n)))
. Also known as logsumexp when used to sum up small probabilities in the log domain -
any
: Returns true if any element is non-zero. -
all
: Returns true if all elements are non-zero. -
min
,max
: Fast implementation for performance critical loops wherearr.max
/arr.min
is too slow. -
argmin
,argmax
: Returns the index (key) of the minimum/maximum element. -
norm
: Computes the norm of an object. -
normalize
: Normalizes the argument such that its norm is 1.0. Returns value if value's norm is 0. -
argsort
: Returns a sequence of indexes (keys) sorted by value. -
argtopk
: Returns the top k indices with maximum value.
-
-
Linear algebra UFuncs:
-
det
: Returns the determinant of the given real matrix. -
eig
: Returns triple: vector of real and imaginary parts of the eigenvalues as well as matrix of the corresponding eigenvalues. -
inv
: Returns the inverse of a given real matrix. RaisesMatrixSingularException
when the matrix is not invertible. -
logdet
: Computes the log of the determinant of the given real matrix and returns a pair consisting of determinant sign (1.0 or -1.0) and the computed logarithm. -
LU
: Computes the LU factorization of the given real M-by-N matrixX
such thatX = P * L * U
whereP
is a permutation matrix (row exchanges). Returns a tuple consisting of a matrixA
and an integer arrayP
. The upper triangular portion ofA
resemblesU
whereas the lower one resemblesL
. The diagonal elements are all1
. For0 <= i < M
, each elementP(i)
denotes whether rowi
of the matrixX
was exchanged with rowP(i-1)
during computation -
svd
: Computes singular value decomposition of given M-by-N matrix and returns an M-by-M matrixU
, a vector of singular values, and a N-by-N matrixV'
. -
trace
: Returns sum of diagonal elements.
-
-
Statistical UFuncs (you have to import
breeze.stats._
!):-
mean
: Returns arithmetic mean (sum divided by number of elements) -
variance
: Returns variance -
stddev
: Returns standard deviation, i.e. square root of variance -
meanAndVariance
: Returns an object withmean
,variance
andcount
fields. The last field is the number of elements.
-
Let's start with the canonical way to make a UFunc. I'm going to use log
as an example, for now
(In Breeze, log
is actually implemented as a MappingUFunc - see below).
import breeze.generic.UFunc
object log extends UFunc
The basic idea behind a UFunc is that it's a marker for others to provide implementation via implicit parameters. Almost no functionality needs to be defined inside the object itself. UFuncs define several methods, most of which are named apply
and those differ just in how many arguments they take. There's also inPlace
, which I'll describe later. The simplest apply method takes 1 argument and an implicit implementation argument, and forwards the argument to the implicit.
trait UFunc {
// ...
def apply[Arg1, Result](arg1: Arg1)(implicit impl: Impl[Arg1, Result]):Result = impl(arg1)
def apply[Arg1, Arg2, Result](arg1: Arg1, arg2: Arg2)(implicit impl: Impl2[Arg1, Arg2, Result]):Result = {
impl(arg1, arg2)
}
}
The Impl type looks a little scary, but it's actually not too bad. It's an alias for another trait, named UFunc.UImpl
, defined inside the companion object to UFunc.
trait UFunc {
type Impl[Arg1, Result] = UFunc.UImpl[this.type, Arg1, Result]
// ...
}
object UFunc {
trait UImpl[Tag, Arg1, Result] {
def apply(arg1: Arg1):Result
}
}
There are several different implementation traits, one for each arity (currently up to 3). To define an implementation, you just need to put an implicit in scope. Anywhere that Scala will look for an implicit is fine (companion objects, etc.). One particularly convenient place to define it is inside the UFunc itself:
object log extends UFunc {
implicit object implDouble extends Impl[Double, Double] {
def apply(a: Double) = scala.math.log(a)
}
}
Now, we can use it:
log(1.0) // == 0.0
One thing that's nice about this approach is that if we get a new type, we can just define a new Impl for it. For instance, in Complex's companion object, we define an implementation for the log of a Complex number.
object Complex {
// ...
implicit object implComplex extends log.Impl[Complex, Complex] {
def apply(a: Complex) = // ...
}
}
Now, log(Complex(1.0, 1.0))
works as well.
We can use implicit functions to automatically add operations for collections of things. There is a special class of UFuncs called "MappingUFuncs" that work elementwise over their input (e.g. sin, cos), rather than doing some kind of reduction or aggregation (e.g. sum, mean).
The actual definition of log
is a member of this class so that it can also be applied to vectors. MappingUFuncs automatically allow implementations for all Vector types, Arrays, Matrices, and Counters if the underlying value type has an implementation.
object log extends UFunc with MappingUFunc {
implicit object implDouble extends Impl[Double, Double] {
def apply(a: Double) = scala.math.log(a)
}
If you would like to add a new collection type, you will need to add a CanMapValues
implicit to extend the UFunc to this collection.
There's also InPlaceImpl[Arg]
and InPlaceImpl2[Arg1AndSink, Arg2]
for mutating operations and they work analogously.
The next step is thinking more about operators. For instance, to enable addition between two types A
and B
, you define an implicit of type OpAdd.Impl2
:
implicit object addAandB extends OpAdd.Impl2[A, B, Result] {
def apply(a: A, b: B) = ...
}
Now, if A
extends NumericOps[A]
---or if A
has a conversion to NumericOps[A]
---then if we say A + B
, this implicit will be used to do the addition. To do mutating operators, you create a OpAdd.InPlaceImpl2[A, B]
implicit instead:
implicit object addAandB extends OpAdd.InPlaceImpl2[A, B] {
def apply(a: A, b: B) { ... /* alter a */ }
}
Note that if you want to support unary operators, you need to implement Impl1
instead of Impl2
.
Creating reducing UFuncs like sum
or mean
is a little trickier than creating MappingUFunc
s. In particular, you should write an implementation that uses CanTraverseValues
, which is a trait for collections that can be traversed or iterated over. It's basically like a marker that supports foreach, but it has extra methods that allow some implementations to be faster than they would be otherwise.
trait CanTraverseValues[From, A] {
/**Traverses all values from the given collection. */
def traverse(from: From, fn: ValuesVisitor[A]): Unit
def isTraversableAgain(from: From):Boolean
}
The traverse
method takes a CanTraverseValues.ValuesVisitor[A]
, which is like a Function1[A, Unit]
, but has a few extra (optional) methods:
trait ValuesVisitor[@specialized A] {
def visit(a: A)
def visitArray(arr: Array[A]):Unit = visitArray(arr, 0, arr.length, 1)
def visitArray(arr: Array[A], offset: Int, length: Int, stride: Int):Unit = {
var i = 0
while(i < length) {
visit(arr(i * stride + offset))
i += 1
}
}
// visit numZero values that all have value zeroValue
def zeros(numZero: Int, zeroValue: A)
}
All you have to define is visit
and zero
. However, if you want your UFunc to be as fast as possible, you can override visitArray
.
OK, let's say we're implementing sum
. That might look like this:
object sum extends UFunc {
implicit def sumFromTraverseDoubles[T](implicit traverse: CanTraverseValues[T, Double]): Impl[T, Double] = {
new Impl[T, Double] {
def apply(t: T): Double = {
var sum = 0.0
traverse.traverse(t, new ValuesVisitor[Double] {
def visit(a: Double) = {sum += a}
def zeros(count: Int, zeroValue: A) {}
})
sum
}
}
}
}
The actual sum
uses the Expand Macro to define sum for all primitives, and it overrides visitArray
for efficiency.
Finally, there's how to use UFunc Impls to create generic operations. This is the whole reason for why UFuncs work the way they do. Basically, we want to be able to generically define methods in terms of their pieces. So, for instance, it would be nice to be able to be able to define a method like normalize
in terms of norm
(and also division). UFuncs let us do this:
object normalize extends UFunc {
implicit def fromNormAndDivide[Arg](implicit normImpl: norm.Impl[Arg, Double], divImpl: OpDiv.Impl[Arg, Double, Arg]):Impl[Arg, Arg] = new Impl[Arg, Arg] {
def apply(a: Arg) = {
val normalizer = norm(a)
divImpl(a, normalizer) // we could instead require that Arg extend NumericOps and use /, but eh.
}
}
With those few lines of code, we can now provide implementations of normalize
for everything that can provide us with norm
and division by Doubles. Pretty slick, eh?
If you want a new collection type to work with UFuncs, you need to provide a few implicits:
-
CanMapValues[From, FromElement, ToElement, To]
(for mapping UFuncs) -
CanZipMapValues[From, FromType, ToType, ToType]
(for supporting binary UFuncs) -
CanTransformValues[From, FromElement, FromElement]
(for inPlace, if your collection is mutable) -
CanTraverseValues[From, FromElement]
(forsum
and its kin.)
Matrix-y collections will also want to implement:
CanCollapseAxis[From, Axis._0.type, ColumnType]
CanCollapseAxis[From, Axis._1.type, RowType]
These provide broadcasting and all their related goodness.
TODO Abstracting over UFuncs
Breeze is a numerical processing library for Scala. http://www.scalanlp.org