I've found that thinking of tensors in terms of graphs make Einsums much more natural.
For example, a matrix product MN, `a b, b c -> a c` is just two nodes with two edges each: `-a- M -b- N -c-`. Their `b` edges are connected, so the resulting graph has only two "free" edges `a` and `c`. That's how we know the result is another matrix.
Once you look at tensors this way, a number of things that are normally tricky with standard matrix notation become trivial. Such a higher order derivatives used in neural networks.
This is beautiful! I see we even decided on the same "vectorization tensor" and notation, which makes Kathri-Rao, and all the other "matrix products" much more intuitive!
I use a similar notation, but never quite found a satisfactory notation for elementwise operations (e.g. `-M-a + -b`, especially broadcasted ones which I end up doing as `-A-B- + -c 1-`) or for denoting what derivatives are with respect to. Using differentials gets around some of the latter, but still, I was never quite satisfied. Any chance you've found nice options there?
The broadcasting using "1-" (or the order-1 copy/kronecker tensor) is actually pretty nice, I think. It allows a lot of nice manipulations, and make the low rank of the matrices really clear etc.
Addition does feel a bit hacky, but it may just be necessary for any associative notation. At least it means that rules such as distribution works the same as with classical notation.
I also wrote this tensorgrad library to do symbolic calculations using tensor networks: https://github.com/thomasahle/tensorgrad it keeps track of what you are taking derivatives with respect to. But I agree it would be nice to show more explicitly.
My background is in quantum physics, and it is a useful tool for understanding some aspects of entanglement. Here, the idea was to show its use in machine learning in general and deep learning in particular.
There are some libraries, but the style is so not standardized that I understand your desire to write your own.
Back in 2020 I wanted to write a library based on manim to animate the contractions, but I never came around to it (and manim back then was less well-documented than it is now).
> There are some libraries, but the style is so not standardized that I understand your desire to write your own.
Do you have some recommendations? I'm currently using tikz' graphdrawing library because it supports subgraphs. This is necessary for handling addition, functions and derivatives. However, the force-based layouting doesn't work with subgraphs, which causes a lot of trouble.
> Back in 2020 I wanted to write a library based on manim to animate the contractions, but I never came around to it (and manim back then was less well-documented than it is now).
I don't have specific recommendations, I mostly use libraries to draw quantum computing circuits, but they're not great for tensor networks even though they share some features. Man, we think alike. Btw, I've seen you work with Patrick, we used to share an office at the IQC (we were both postdocs there), tell him I said hi! :)
I was pretty confused by this for a while. I think the context I was missing is that this is about a function in nympy called ‘einsum’ which is somewhat related to Einstein summation notation.
To write a little more: there are two things people mean by ‘tensor’. One is a kind of geometric object that corresponds to a multilinear map between vector spaces (or modules, I suppose), and another is a array indexed by k-tuples (ie what would be called a multidimensional array in C or Fortran programming). For a given choice basis, one can represent a degree k tensor with a k-dimensional array of scalars.
Only certain operations make sense geometrically on tensors (in the sense that they do not depend on the choice of basis) and these can be broken down into:
- tensor products, which take degree n and m tensors and output a degree (n + m) tensor
- contractions which take a degree n tensor and output a degree (n-2) tensor
- generalized transpositions which take a degree n tensor and output a degree n tensor
A matrix multiplication can be seen as a composition of a tensor product and a contraction; a matrix trace is just a contraction.
The Einstein summation convention is a notation which succinctly expresses these geometric operations by describing what one does with the ‘grid of numbers’ representation, combined with the convention that, if an index is repeated in a term twice (an odd number bigger than 1 is meaningless, an even number is equivalent to reapplying the ‘twice’ rule many times) one should implicitly sum the expression for each basis vector for that index. You get: tensor products by juxtaposition, contractions by repeated indexes, and transpositions by reordering indexes.
In numpy, it is for general computation rather than expressing something geometric so one doesn’t need the restrictions on number of times an index occurs. Instead I guess the rule is something like:
- if index is only on lhs, sum over it
- if index on lhs and rhs then don’t sum
- if index only on rhs or repeated on rhs, error
And computationally I guess it’s something like (1) figure out output shape and (2):
for output_index of output:
p = 1
for (input, input_indexes) of (inputs, lhs):
p = p * input[input_indexes(output_index)]
output[output_index] = p
And one of the interesting (and hard) computer science problems is how to turn an arbitrary einsum expression into the most efficient series of operations (transpose, matmul, etc.) available in hardware kernels. This is mentioned in the post under "How Contractions Are Actually Computed".
> an even number is equivalent to reapplying the ‘twice’ rule many times
I never heard that extension and in my opinion is a bad idea. One nice property of the Einstein summation notation is that you can reorder the tensors(with their index), and the result does no change. If you allow 4x repetitions this is not valid).
---
Also, a nice property of tensors written in paper is that you can write the index as subscripsts or superscripts to remember how they change when there is a base change. You can only colapse a subscripst with a superscript, so the result of the operation does not depend on the choice of the base. (It would be very nice that Python can remember and check this too to avoid silly error.)
Academic CS, moreso than any of the other STEM subjects, is like that whose line is it anyway meme - all the words are completely made up and absolutely none of the results matter.
So a vector isn't a vector, a tensor isn't a tensor, einsum isn't actually Einstein summation, automatic differentiation often gives derivatives for things that aren't differentiable, a neural network has nothing to do with neurons, etc etc etc
CS people think it's a crime that humanities people make up words in order to publish but they're really living in an extremely glassy glass house.
Yes, I think it comes from the kind of metaphorical language that writing programs induces. If artists "kinda lie" when describing how their art represents some emotion or reality, likewise, computer scientists are also engaged in this 'metaphorical representation' game -- unlike the science, which aim to actually represent, not metaphorically.
This also drives me mad reading many a computer science paper now -- the disrespect for the ordinary meaning of words is only so obscene in the most radical kinds of poetry. This feels wholly out of place in anything called a 'science', and leads to endless amounts of confusion.
There are none in all of academia so brazen in a poetic use of language and at the same time so incurious as to its meaning.
I wish Python had tools to support combining tensors a la einsum but with arbitrary operations instead of just multiplication and addition. The only tool I'm aware of that provides a very slickk interface for this is Tullio in Julia.
Among other things, that would make a lot of codde very convenient -- including graph algorithms. This is the idea behind GraphBLAS https://graphblas.org/
Really interesting, I have been confused about the einsum function before. As a former physicist, I would also like to see the actual tensor notation for the examples. So instead of ij,jk something like $A_i^j B_j^k$ (imagine the math here instead of the LaTeX).
Tensors used in deep learning are not the same as the definition used by Physicists - blame the DL community for this :). So DL tensors are just N-dimensional arrays of data, and there is no concept of covariance and contravariance of the dimensions. You could think of DL tensors as Cartesian tensors and they don't need to conform to the same transformation laws that Physics tensors do.
Einsum immediately clicked with me because in my past advanced classical mechanics courses such concise contractions of multi-index creatures were really the only way to make quick sense of complicated problems without the limitations of low-dimensional array notations. Physics tensors are of course different creatures than the simpler multidimensional arrays of PyTorch and co., but the simplified einsum notation still works very well. I ended up sometimes rewriting my Einsum code to plain tensor manipulation code in order to better work with collaborators who didnt like einsum, but I still experiment in my own hacks with einsum when I need to. Occasionally I felt that it would have been nice to also have the Levi-Civita symbol available in einsum, or to be able to use complex numbers and take complex conjugates, but these are all super-specialized requests and there often is a good way around them without modifying einsum.
This is a good idea, though one problem is that Einsum notation (as realized in Numpy and Pytorch) doesn't support the notion of co-contravariance, and the site is based on their Einsum notation. I could potentially add the variances for the examples, though that would move away from how the site currently works (where the information about the reduction comes only from the einsum input).
God I love einsum so much. I discovered it in grad school and it made my life so much nicer, definitely recommend learning it, you can do some wicked powerful stuff. (One of the first things I asked one of the NVIDIA guys when they were showing off cuPy a few years back was whether it had einsum.)
That's something I wish I had when I started looking at einsums. It gets interesting when you start thinking about optimum paths (like in the opt_einsum package), sharded/distributed einsums, and ML accelerators.
I've found that thinking of tensors in terms of graphs make Einsums much more natural.
For example, a matrix product MN, `a b, b c -> a c` is just two nodes with two edges each: `-a- M -b- N -c-`. Their `b` edges are connected, so the resulting graph has only two "free" edges `a` and `c`. That's how we know the result is another matrix.
Once you look at tensors this way, a number of things that are normally tricky with standard matrix notation become trivial. Such a higher order derivatives used in neural networks.
I wrote https://tensorcookbook.com/ to give a simple reference for all of this.
A few years ago I wrote a note (never published) on how many products can be seen in this way https://arxiv.org/abs/1903.01366
This is beautiful! I see we even decided on the same "vectorization tensor" and notation, which makes Kathri-Rao, and all the other "matrix products" much more intuitive!
I use a similar notation, but never quite found a satisfactory notation for elementwise operations (e.g. `-M-a + -b`, especially broadcasted ones which I end up doing as `-A-B- + -c 1-`) or for denoting what derivatives are with respect to. Using differentials gets around some of the latter, but still, I was never quite satisfied. Any chance you've found nice options there?
The broadcasting using "1-" (or the order-1 copy/kronecker tensor) is actually pretty nice, I think. It allows a lot of nice manipulations, and make the low rank of the matrices really clear etc.
Addition does feel a bit hacky, but it may just be necessary for any associative notation. At least it means that rules such as distribution works the same as with classical notation.
I also wrote this tensorgrad library to do symbolic calculations using tensor networks: https://github.com/thomasahle/tensorgrad it keeps track of what you are taking derivatives with respect to. But I agree it would be nice to show more explicitly.
I had been working on a library for visualizing tensor contractions (so-called Penrose tensor diagrams), https://github.com/Quantum-Flytrap/tensor-diagrams, with an example (a tutorial stub) https://p.migdal.pl/art-tensor-diagrams/.
My background is in quantum physics, and it is a useful tool for understanding some aspects of entanglement. Here, the idea was to show its use in machine learning in general and deep learning in particular.
Awesome! Are you able to handle sums of diagrams like (-M- + -a b-)-c etc?
Thanks! Right now, it does not.
I focus on a common case of a single product without any other parts. If there is a summation, it is managed via indices.
If you are interested in arbitrary drawings, you are probably aware of TikZ (if you like coding) or Excalidraw.
I just opened your book, very nice! I really like the derivatives. You went above and beyond with latex diagrams ^^
At this point I feel like I need to write a tikz library just for tensor diagrams...
There are some libraries, but the style is so not standardized that I understand your desire to write your own.
Back in 2020 I wanted to write a library based on manim to animate the contractions, but I never came around to it (and manim back then was less well-documented than it is now).
> There are some libraries, but the style is so not standardized that I understand your desire to write your own.
Do you have some recommendations? I'm currently using tikz' graphdrawing library because it supports subgraphs. This is necessary for handling addition, functions and derivatives. However, the force-based layouting doesn't work with subgraphs, which causes a lot of trouble.
> Back in 2020 I wanted to write a library based on manim to animate the contractions, but I never came around to it (and manim back then was less well-documented than it is now).
Manim is on the TODO list :-) https://github.com/thomasahle/tensorgrad/issues/7 If you feel like writing some code, I think it could work very well with tensorgrad's step-by-step derivations.
I don't have specific recommendations, I mostly use libraries to draw quantum computing circuits, but they're not great for tensor networks even though they share some features. Man, we think alike. Btw, I've seen you work with Patrick, we used to share an office at the IQC (we were both postdocs there), tell him I said hi! :)
I was pretty confused by this for a while. I think the context I was missing is that this is about a function in nympy called ‘einsum’ which is somewhat related to Einstein summation notation.
To write a little more: there are two things people mean by ‘tensor’. One is a kind of geometric object that corresponds to a multilinear map between vector spaces (or modules, I suppose), and another is a array indexed by k-tuples (ie what would be called a multidimensional array in C or Fortran programming). For a given choice basis, one can represent a degree k tensor with a k-dimensional array of scalars.
Only certain operations make sense geometrically on tensors (in the sense that they do not depend on the choice of basis) and these can be broken down into:
- tensor products, which take degree n and m tensors and output a degree (n + m) tensor
- contractions which take a degree n tensor and output a degree (n-2) tensor
- generalized transpositions which take a degree n tensor and output a degree n tensor
A matrix multiplication can be seen as a composition of a tensor product and a contraction; a matrix trace is just a contraction.
The Einstein summation convention is a notation which succinctly expresses these geometric operations by describing what one does with the ‘grid of numbers’ representation, combined with the convention that, if an index is repeated in a term twice (an odd number bigger than 1 is meaningless, an even number is equivalent to reapplying the ‘twice’ rule many times) one should implicitly sum the expression for each basis vector for that index. You get: tensor products by juxtaposition, contractions by repeated indexes, and transpositions by reordering indexes.
In numpy, it is for general computation rather than expressing something geometric so one doesn’t need the restrictions on number of times an index occurs. Instead I guess the rule is something like:
- if index is only on lhs, sum over it
- if index on lhs and rhs then don’t sum
- if index only on rhs or repeated on rhs, error
And computationally I guess it’s something like (1) figure out output shape and (2):
And one of the interesting (and hard) computer science problems is how to turn an arbitrary einsum expression into the most efficient series of operations (transpose, matmul, etc.) available in hardware kernels. This is mentioned in the post under "How Contractions Are Actually Computed".
I mostly agree.
> an even number is equivalent to reapplying the ‘twice’ rule many times
I never heard that extension and in my opinion is a bad idea. One nice property of the Einstein summation notation is that you can reorder the tensors(with their index), and the result does no change. If you allow 4x repetitions this is not valid).
---
Also, a nice property of tensors written in paper is that you can write the index as subscripsts or superscripts to remember how they change when there is a base change. You can only colapse a subscripst with a superscript, so the result of the operation does not depend on the choice of the base. (It would be very nice that Python can remember and check this too to avoid silly error.)
Ah I think I’m just wrong and the 4x thing shouldn’t be allowed
Academic CS, moreso than any of the other STEM subjects, is like that whose line is it anyway meme - all the words are completely made up and absolutely none of the results matter.
So a vector isn't a vector, a tensor isn't a tensor, einsum isn't actually Einstein summation, automatic differentiation often gives derivatives for things that aren't differentiable, a neural network has nothing to do with neurons, etc etc etc
CS people think it's a crime that humanities people make up words in order to publish but they're really living in an extremely glassy glass house.
Yes, I think it comes from the kind of metaphorical language that writing programs induces. If artists "kinda lie" when describing how their art represents some emotion or reality, likewise, computer scientists are also engaged in this 'metaphorical representation' game -- unlike the science, which aim to actually represent, not metaphorically.
This also drives me mad reading many a computer science paper now -- the disrespect for the ordinary meaning of words is only so obscene in the most radical kinds of poetry. This feels wholly out of place in anything called a 'science', and leads to endless amounts of confusion.
There are none in all of academia so brazen in a poetic use of language and at the same time so incurious as to its meaning.
Vec4 is a vector, but yes C++ base vec is not.
In some ways CS is more correct with the modern (abstract) algebra definition of a vector, vs the historical didactics arrow concept.
Tensor has been abused, it is a property of bilinear maps, and as matrix multiplication is a bilinear operation there is a concept overlap there.
Vectors are really just elements of a vector space, not necessarily arrows.
I wish Python had tools to support combining tensors a la einsum but with arbitrary operations instead of just multiplication and addition. The only tool I'm aware of that provides a very slickk interface for this is Tullio in Julia.
Among other things, that would make a lot of codde very convenient -- including graph algorithms. This is the idea behind GraphBLAS https://graphblas.org/
For anyone who does a lot of einsum I highly recommend the einx package because the stock single-letter syntax very quickly becomes hard to read.
In your experience, why einx over einops+einsum?
Really interesting, I have been confused about the einsum function before. As a former physicist, I would also like to see the actual tensor notation for the examples. So instead of ij,jk something like $A_i^j B_j^k$ (imagine the math here instead of the LaTeX).
Tensors used in deep learning are not the same as the definition used by Physicists - blame the DL community for this :). So DL tensors are just N-dimensional arrays of data, and there is no concept of covariance and contravariance of the dimensions. You could think of DL tensors as Cartesian tensors and they don't need to conform to the same transformation laws that Physics tensors do.
Einsum immediately clicked with me because in my past advanced classical mechanics courses such concise contractions of multi-index creatures were really the only way to make quick sense of complicated problems without the limitations of low-dimensional array notations. Physics tensors are of course different creatures than the simpler multidimensional arrays of PyTorch and co., but the simplified einsum notation still works very well. I ended up sometimes rewriting my Einsum code to plain tensor manipulation code in order to better work with collaborators who didnt like einsum, but I still experiment in my own hacks with einsum when I need to. Occasionally I felt that it would have been nice to also have the Levi-Civita symbol available in einsum, or to be able to use complex numbers and take complex conjugates, but these are all super-specialized requests and there often is a good way around them without modifying einsum.
This is a good idea, though one problem is that Einsum notation (as realized in Numpy and Pytorch) doesn't support the notion of co-contravariance, and the site is based on their Einsum notation. I could potentially add the variances for the examples, though that would move away from how the site currently works (where the information about the reduction comes only from the einsum input).
God I love einsum so much. I discovered it in grad school and it made my life so much nicer, definitely recommend learning it, you can do some wicked powerful stuff. (One of the first things I asked one of the NVIDIA guys when they were showing off cuPy a few years back was whether it had einsum.)
That's something I wish I had when I started looking at einsums. It gets interesting when you start thinking about optimum paths (like in the opt_einsum package), sharded/distributed einsums, and ML accelerators.
I found this video to be fantastic
https://m.youtube.com/watch?v=pkVwUVEHmfI