What is the difference between arrays and matrices?
Some people look for matrix solutions to array problems, so what is the difference? The main one is that matrix values are numbers, while an array can hold other kinds of information, even strings. Matrices can also represent systems of equations, and this is where most developers need them, at least when replacing NumPy.
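To make the distinction concrete, here is a small Python sketch (the variable names are just for illustration):

```python
# An array (here a plain Python list) can mix data types freely.
record = ["Alice", 42, 3.14]

# A matrix is a rectangular grid of numbers only, for example the
# coefficients of the equations 2x + y = 5 and x + 3y = 10.
coefficients = [[2, 1],
                [1, 3]]
constants = [5, 10]
```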
How do you make a matrix calculation?
The standard matrix operations are simple to implement: for addition you add the corresponding elements, for scalar multiplication you multiply each element by the scalar, and so on.
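In plain Python, with matrices stored as lists of rows, those two operations can be sketched like this (the helper names are my own):

```python
def mat_add(a, b):
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]

def mat_scale(a, k):
    """Multiply every element of a matrix by the scalar k."""
    return [[k * x for x in row] for row in a]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_add(A, B))    # [[6, 8], [10, 12]]
print(mat_scale(A, 2))  # [[2, 4], [6, 8]]
```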
Matrix multiplication is a little more complex, but only slightly. What makes it heavy is that every entry of the result requires many calculations, and this is where performance comes in. Since most of these calculations do not depend on each other, they are excellent candidates for parallel computation. GPUs are designed for exactly this kind of workload, and they are designed to be added to desktop systems easily.
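A minimal pure-Python version shows why: each entry of the result is an independent dot product, so nothing stops them from being computed in parallel.

```python
def mat_mul(a, b):
    """Matrix product: entry (i, j) is the dot product of row i
    of a with column j of b. Each entry is computed independently,
    which is what makes this operation so easy to parallelise."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m))
             for j in range(p)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(mat_mul(A, B))  # [[19, 22], [43, 50]]
```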
When you need to do matrix calculations in Python, the first solution you find is NumPy. However, NumPy is not always the most efficient system for calculating many matrices.
This post will cover what options you have in Python.
When you need alternatives, start by looking more carefully at what you need matrix operations for. Your current installation may already provide them, either through its own implementation or through an underlying library. An example is machine learning, where the need for matrix operations is paramount: TensorFlow has its own library for matrix operations. Make sure you know your current library.
In many cases, though, you need a solution that works for you. Maybe you have run into limitations in NumPy; some libraries are faster than NumPy and specially made for matrices. Often developers simply want to speed up their code, so they start looking for alternatives. One common reason is that NumPy cannot run on GPUs.
While this post is about alternatives to NumPy, one library built on top of NumPy, Theano, needs to be mentioned. Theano is tightly integrated with NumPy and enables GPU-supported matrix operations. It is a bigger library aimed at machine learning, but you can lift out only the matrix functions.
For a deeper explanation of using Theano, see this page: http://www.marekrei.com/blog/theano-tutorial/
SpPy is a library specifically for sparse arrays; it can still be used for matrices. A sparse array, by the way, is an array that contains mostly zero values. This library is small and efficient but a little limited due to its specialisation. It is built on NumPy but is more efficient than plain NumPy for sparse data.
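SpPy has its own API, but the idea behind sparse storage can be illustrated without any library at all, using a "dictionary of keys" layout that records only the nonzero entries:

```python
# A dense 4x4 matrix that is mostly zeros.
dense = [
    [0, 0, 3, 0],
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 0, 0],
]

# Sparse "dictionary of keys" storage: keep only nonzero entries,
# keyed by their (row, column) position.
sparse = {(i, j): v
          for i, row in enumerate(dense)
          for j, v in enumerate(row) if v != 0}

print(sparse)  # {(0, 2): 3, (2, 1): 5} -- 2 stored values instead of 16
```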
Eigen is an efficient C++ implementation of matrices; to use it from Python you need minieigen, available at https://pypi.org/pypi/minieigen. Eigen is actually included in many other solutions, acting as the generic matrix library for more specialized modules and frameworks. It has many modules for dense matrix and array manipulation, and it also supports linear algebra, decompositions and sparse linear algebra. The package also has a plugin mechanism so you can add your own modules.
To use Eigen from Python, install minieigen with pip and import it in your code.
PyTorch is a library for machine learning, and because of this it includes matrix operations. Importing the entire library is overkill if you only want to make a few calculations. However, if you are just starting out with a machine learning project, it is worth deciding whether this one is for you.
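As a rough sketch of what PyTorch's matrix support looks like, tensors multiply with the `@` operator, and the same code can be moved to a GPU when one is available:

```python
import torch

# Two small matrices as float tensors.
a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[5., 6.], [7., 8.]])

c = a @ b  # matrix product
print(c)   # tensor([[19., 22.], [43., 50.]])

# The same operation can run on a GPU if one is present.
if torch.cuda.is_available():
    c_gpu = a.cuda() @ b.cuda()
```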
Another alternative is to fetch any C library and use that. To make this possible, there is a solution named cffi that creates the interface for you. This approach requires that you already know C and that you write a wrapper for each function you need. The resulting code can look muddled and hard to read, but it may be worth it depending on your project.
If you just want to speed up all array and numerical functions, you can use Numba instead. Numba is a Python compiler: when you use it, it creates binary code "just in time" (JIT). The idea of JIT compilation is more commonly associated with Java, but it is very useful for heavy mathematics in Python. Since Python is interpreted, heavy mathematics can cause performance issues; Numba takes care of this by compiling for the CPU or the GPU, at your choice.
There are also parallel computing features available. By default, the compiled code still runs with a lock that stops many threads from running at the same time. You can turn this off with a flag, as long as you are aware of the potential problems associated with parallel programming.
Many times when you start programming in Python, or any other language, you run into limitations of the language, the compiler or something else. When you are in this situation, stop and think about what the limitation actually is, and consider how many others may have faced the same one. In the case of Python and NumPy, many scientists and developers have written code that needs fast execution. This legacy has created a large number of libraries that may solve your problem without forcing you to switch languages or write a new extension.