Is Matrix Multiplication Ugly?

mathenchant.wordpress.com

156 points by jamespropp 2 days ago

sfpotter 2 days ago

I think this sentence:

> But matrix multiplication, to which our civilization is now devoting so many of its marginal resources, has all the elegance of a man hammering a nail into a board.

is the most interesting one.

A man hammering a nail into a board can be both beautiful and elegant! If you've ever seen someone effortlessly hammer nail after nail into wood with hardly a thought about what they're doing, you've seen a master craftsman at work. Speaking as a numerical analyst, I'd say a well-multiplied matrix is much the same. There is much that goes into how deftly a matrix might be multiplied. And just as someone can hammer a nail poorly, so too can a matrix be multiplied poorly. I would say the matrices being multiplied in service of training LLMs are not a particularly beautiful example of what matrix multiplication has to offer. The fast Fourier transform viewed as a sparse matrix factorization of the DFT and its concomitant properties of numerical stability might be a better candidate.
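A minimal sketch of the FFT-as-factorization view, assuming NumPy and a power-of-two length. Multiplying by the dense DFT matrix costs O(n^2) multiply-adds. The recursive radix-2 Cooley-Tukey split applies the same matrix as log2(n) sparse "butterfly" stages, for O(n log n) total. The helper names are made up for illustration.

```python
import numpy as np

def dft_matrix(n):
    # Dense n x n DFT matrix F[j, k] = exp(-2*pi*i*j*k/n); applying it costs O(n^2).
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    return np.exp(-2j * np.pi * j * k / n)

def fft_radix2(x):
    # Recursive radix-2 Cooley-Tukey FFT; assumes len(x) is a power of two.
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    if n == 1:
        return x
    even = fft_radix2(x[0::2])   # half-size DFT of the even-indexed samples
    odd = fft_radix2(x[1::2])    # half-size DFT of the odd-indexed samples
    twiddle = np.exp(-2j * np.pi * np.arange(n // 2) / n)
    # One sparse butterfly stage: 2x2 blocks combine the two half-size results.
    return np.concatenate([even + twiddle * odd, even - twiddle * odd])

x = np.random.default_rng(0).standard_normal(16)
dense = dft_matrix(16) @ x   # O(n^2) matrix-vector product
fast = fft_radix2(x)         # same result via the sparse factorization
```

Both routes compute the same vector. The FFT simply never materializes the dense matrix.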

sdenton4 2 days ago

A somewhat more beautiful matmul for neural networks is given by the Monarch paper: https://arxiv.org/abs/2204.00595
Generally, low-rank and block-diagonal matrices are both great strategies for producing expressive matmuls with fewer parameters. We can view the FFT as a particularly deft example of factorizing one big matmul into a number of block-diagonal matmuls, greatly reducing the overall number of multiplications by minimizing the block size. However, on a G/TPU, we have a lot more parallelism available, so the sweet spot for size of the blocks may be larger than 2x2...
We can also mix low-rank, block diagonal, and residual connections to get the best of both worlds:
x' = (L@x + B@x + x)
The block-diagonal matrix does 'local' work, and the low-rank matrix does 'broadcast' work. I find it pretty typical to be able to replace a single dense matmul with this kind of structure and save ~90% of the params with no quality cost... (and sometimes the regularization actually helps!)
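A rough NumPy sketch of the x' = (L@x + B@x + x) structure follows. The sizes (d = 1024, rank 16, 64x64 blocks) are illustrative assumptions, not numbers from the comment. With these sizes the structured layer stores 98,304 parameters, compared with 1,048,576 for a dense d x d matrix. That is about 9.4%, which is in line with the ~90% savings mentioned above.

```python
import numpy as np

d, rank, block = 1024, 16, 64      # illustrative sizes (assumptions)
n_blocks = d // block
rng = np.random.default_rng(0)

# Low-rank term L = U @ V.T, stored as two thin factors.
U = rng.standard_normal((d, rank)) / np.sqrt(d)
V = rng.standard_normal((d, rank)) / np.sqrt(d)
# Block-diagonal term B, stored as n_blocks independent block x block matrices.
blocks = rng.standard_normal((n_blocks, block, block)) / np.sqrt(block)

def structured_layer(x):
    # x' = L@x + B@x + x, without materializing either dense d x d matrix.
    low_rank = U @ (V.T @ x)                                    # 'broadcast' mixing
    local = np.einsum("nij,nj->ni", blocks,
                      x.reshape(n_blocks, block)).reshape(d)     # 'local' mixing
    return low_rank + local + x                                 # residual connection

dense_params = d * d
structured_params = U.size + V.size + blocks.size
print(structured_params, dense_params, structured_params / dense_params)
```

The block-diagonal part runs as a single batched product over the blocks, which lets it reuse ordinary matmul kernels. The print line reports 98304, 1048576, and a ratio of 0.09375.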
- btown 2 days ago
  
  https://hazyresearch.stanford.edu/blog/2023-12-11-truly-subq... is an approachable overview of the Monarch approach, for those interested!
  There's a lot of opportunity here. Just because matrix multiplication makes for a beautiful mathematical building block, and a very reasonable one to build high-level ML logic on, doesn't mean it needs to be computed the same way, and in the same order, that we learned in linear algebra courses.
  I'm quite curious if this is being used in practice at scale, or whether it's still in the lab at the moment!
  - robot-wrangler 2 days ago
    
    > doesn't mean it needs to be computed the same way, and in the same order, that we learned in linear algebra courses.
    I think this touches on something fundamental. As a stand-alone operation, matmul is ugly because it's arbitrary. In other words, if the goal was just to entangle values, there are a bunch of ways to do it, so why this particular way, landing on ae+bg etc.? You kind of need algebra/geometry to justify matmul this way, which makes it obviously useful, but now it's still ugly, exactly because you had to invoke this other stuff.
    Compare that situation to algebra and geometry themselves, which in a real sense don't need each other. Or to things like logic, sets, categories, processes, numbers, knots, games, etc where you can build up piles of stuff based on it in a whole rich universe before you need to appeal to much that is "outside". And in those universes operations would be defined mostly in ways that were more like "natural" or "necessary" without anything feeling arbitrary.
    Traditional matmul is beautiful in the sense of "connections across and between", where all the particulars do become necessary. For those that prefer a certain amount of abstract perfection / platonism / etc or those with a taste for foundations though, it's understandable if it's not that appealing. This is related to, but not the same as the pure vs applied split.
- bee_rider 2 days ago
  
  Do low rank/block diagonal matrices come up in LLMs often? What about banded or block tridiagonal? Intuitively banded matrices seem like they ought to be good at encoding things about the world… everything is connected but not randomly so.
  - sdenton4 a day ago
    
    Yep! Think of LoRA for network fine-tuning. Monarch (linked above) uses lots of block diagonality. These ideas also make flash attention flash.
    I haven't seen banded matrices as much, though (with weight sharing) they're just convolutions. One nice feature of block diagonality is that you can express it as batched matrix multiplication, reusing all the existing matmul kernels.
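The sketch below illustrates the remark that banded matrices with weight sharing are just convolutions. It builds a banded Toeplitz matrix from a short kernel and checks it against `np.convolve`. It assumes an odd kernel length, so that `mode="same"` centers the kernel the same way the matrix does. `banded_toeplitz` is a hypothetical helper.

```python
import numpy as np

def banded_toeplitz(kernel, n):
    # Hypothetical helper: n x n banded matrix in which every row reuses the same
    # weights (weight sharing). Assumes len(kernel) is odd so the band is centered.
    k = len(kernel)
    half = k // 2
    flipped = kernel[::-1]          # convolution flips the kernel
    M = np.zeros((n, n))
    for i in range(n):
        for t, w in enumerate(flipped):
            j = i + t - half        # column touched by this tap; taps off the edge are dropped
            if 0 <= j < n:
                M[i, j] = w
    return M

kernel = np.array([1.0, 2.0, 3.0])
x = np.arange(8, dtype=float)
as_matmul = banded_toeplitz(kernel, 8) @ x            # O(n^2) dense product
as_conv = np.convolve(x, kernel, mode="same")         # O(n * k) convolution
```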
hobs 2 days ago

https://www.youtube.com/watch?v=Ruf-cLr2PZ8 I always think of this when thinking about the gracefulness of a hammer.
- nialse 2 days ago
  
  Turns out it’s the skill of the person handling the hammer that matters most. Enlightening! Appreciate the link!
- mauvehaus 2 days ago
  
  Upholstery tacks are sold sterilized for people holding them in their mouths. Now I wonder the same about drywall nails!
  Thanks for the link; that is absolutely masterful work.
- JanNash 2 days ago
  
  Wow, thank you for this gem!!!
jamespropp 2 days ago

Yes!
- gsf_emergency_6 2 days ago
  
  >The fast Fourier transform viewed as a sparse matrix factorization of the DFT
  Riffing further on the Fourier connection: are you planning to explore the link between matmul and differentiation?
  For example, via the "Pauli-Z" matrix that you introduced without a straightforward motivation.
  (I took it that you intended it to be a "backyard instance" of "dual numbers")
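One standard way to connect matrix products with differentiation is the 2x2 matrix model of dual numbers. In this model a + bε maps to [[a, b], [0, a]], with ε represented by the nilpotent matrix [[0, 1], [0, 0]], so that ε² = 0. Whether this is the construction the article had in mind is an assumption. Evaluating a polynomial at [[x, 1], [0, x]] puts f(x) on the diagonal and f'(x) in the top-right corner, because multiplying such matrices reproduces the product rule. Here is a minimal NumPy sketch:

```python
import numpy as np

def dual(a, b):
    # a + b*eps as the 2x2 matrix [[a, b], [0, a]]; eps = [[0, 1], [0, 0]] squares to zero.
    return np.array([[a, b], [0.0, a]])

def f(X):
    # f(x) = x^3 + 2x written with matrix operations so it accepts a 2x2 dual matrix.
    # (A constant term c would have to be written as c * np.eye(2).)
    return X @ X @ X + 2 * X

x0 = 2.0
result = f(dual(x0, 1.0))
value, derivative = result[0, 0], result[0, 1]   # f(2) = 12, f'(2) = 3*2^2 + 2 = 14
```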

musebox35 2 days ago

The computations in transformers are actually generalized tensor-tensor contractions implemented as matrix multiplications. Their efficient implementation in GPU hardware involves many algebraic gems and is a work of art. You can get a taste of the complexity involved in their design in this YouTube video: https://www.youtube.com/live/ufa4pmBOBT8
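The sketch below illustrates how tensor contractions get lowered to matrix multiplication. A contraction over one index can be computed by folding the other indices into rows and calling a single 2-D matmul. An attention-style score computation becomes a batched matmul. The shapes are arbitrary assumptions, and `np.einsum` states each contraction in index notation to serve as the reference.

```python
import numpy as np

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 5, 6))   # e.g. (batch, sequence, features); shapes are illustrative
W = rng.standard_normal((6, 3))      # weights contracted against the feature axis

# Index notation: out[a, b, d] = sum_c T[a, b, c] * W[c, d]
via_einsum = np.einsum("abc,cd->abd", T, W)
# Same contraction as one 2-D matmul: fold (a, b) into a single row axis, then unfold.
via_matmul = (T.reshape(4 * 5, 6) @ W).reshape(4, 5, 3)

Q = rng.standard_normal((2, 7, 8))   # (batch, queries, dim)
K = rng.standard_normal((2, 9, 8))   # (batch, keys, dim)
scores_einsum = np.einsum("bqd,bkd->bqk", Q, K)
scores_matmul = Q @ K.transpose(0, 2, 1)   # batched matmul over the leading axis
```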

janalsncm 2 days ago

> But matrix multiplication, to which our civilization is now devoting so many of its marginal resources, has all the elegance of a man hammering a nail into a board.

Elegance is a silly critique. Imagine instead we were spending trillions on floral bouquets, calligraphy, and porcelain tea sets. I would argue that would be a bad allocation of resources.

What matters to me is whether it solves the problems we have. Not how elegant we are in doing so. And to the extent AI fails to do that, I think those are valid critiques. Not how elegant it is.

card_zero 2 days ago

"Creeping elegance", I guess: https://en.wiktionary.org/wiki/creeping_elegance
But elegant can mean minimal, restrained, parsimonious, sparing. That's different from a bunch of paraphernalia and flowery nonsense.
srean 2 days ago

You mean tech-debt-ridden spaghetti code that works now is as good as elegant, correct, and efficient code?
Joker_vD 2 days ago

> Imagine instead we were spending trillions on floral bouquets, calligraphy, and porcelain tea sets. I would argue that would be a bad allocation of resources.
And I would argue it wouldn't. So? It's a value call.
> What matters to me is whether it solves the problems we have.
Again, what is and is not a problem is a value call. "Lacking tools to surveil and control the population" and "having population that demands its share of economic output" arguably are problems for someone which AI probably could solve. "The planet is literally on fire" is another problem (for, arguably, much bigger number of someones) and pouring terawatts of energy into chips that, coincidentally, do AI-related matrix multiplications, won't solve that problem.
alganet 2 days ago

You're right. It's all about solving problems.
Maybe we need a word that, when applied to mathematical concepts, describes how simple, easy to understand and generally useful a solution or idea is.
I wonder what that word could be.
almostgotcaught 2 days ago

the aesthetics of math and physics is by far the most boring discussion that can be had. i used to be utterly repulsed by such talk in undergrad - beauty this and that. it absolutely always felt affected and put on - as if you talk about it enough, you'll actually convince people outside of the major to give you the same plaudits as real artists.... yea right lol.
- chris_wot 2 days ago
  
  That’s the opinion of one person. There are many mathematicians who find beauty in the maths they are studying.
  - srean 2 days ago
    
    I will emphasize your point more forcefully. All the mathematicians I know work on what they work on because of the beauty and aesthetics in their field.
    Much like sex. Sex has reproductive utility, but that's not why most people engage in it. Those who do are missing much.
    A mathematician's notion of beauty is quite specialized. It's the difference between spaghetti code that works and elegant, efficient code that is correct. The latter is easy to build upon efficiently.
    
    almostgotcaught 2 days ago
    
    > Much like sex.
    My guy you know lots of people in here have read Feynman right? You should cite him instead of pretending you were clever enough to come up with the analogy yourself.
    
    srean 2 days ago
    
    Quite the contrary. I expect the majority of HN readers to know the quote (base 0, if you will) and not to harbor thoughts that having read it makes them part of an exclusive club.
    Channeling Good Will Hunting much huh? Most HN'ers would have watched that too.
    
    almostgotcaught 2 days ago
    
    I have no idea what you're trying to say - it is generally understood everywhere in the world (ie all forms of human culture) that it's pathetic to pass off someone else's insights as your own.
    
    srean 2 days ago
    
    Only when there is an expectation of being perceived as original. When people use differentiation, they don't cite Newton or Leibniz, do they?
    
    almostgotcaught 2 days ago
    
    .... Of course they do? You can still find citations of those papers to this day.
    For all of your "forceful" comments on math, I think probably you don't actually know much about it.
    
    srean 2 days ago
    
    > You can still find citations of those papers to this day.
    That's not what I contested. What fraction of people who use differentials in their published work still cite Newton or Leibniz was the point. You can count the number of such citations in the last 10 years of, say, the neural-net or applied-maths literature and report back. That's plenty of use.
    Citations to their differential calculus that are still made are mostly in the context of history of math.
    Seems numeracy or comprehension is not your strong point. LOL.
    > I have no idea what you're trying to say
    Now I don't doubt it. LOL
    
    almostgotcaught 2 days ago
    
    > What fraction of people who use differentials in their published work still cite Newton or Leibniz was the point
    Those papers were written in the 1600s. "The Character of Physical Law", the essay you're ripping off, was written in 1964. 100% of papers from the 1960s are cited every single time the techniques are used.
    You are as tedious as the original refrain I was complaining about (which is not at all ironic). What's most tedious is you're not actually a mathematician but presume to speak for them.
    
    1718627440 a day ago
    
    And an HN comment is not a scientific paper. When I tell people about ideas I find insightful, I don't cite a proper source either. If they like the idea I might tell them later, how I got it or where they can find more detail about the idea.
    Honestly, quite often an idea did originate in my own thoughts, but the work of putting it into well-formed words, which I use to tell others about it, was done by someone else, whose formulation of the same idea I read later.
    
    srean 2 days ago
    
    You are changing goal posts now. Your absolute claim was
    > it's pathetic to pass off someone else's insights as your own.
    To which my point was that citations are made when there is an expectation of originality. By now Feynman's anecdotes are folklore and folk wisdom.
    OK, let's go by your standards. Cooley and Tukey's FFT algorithm was "discovered" by them around 1965. How often do they get a citation when the FFT is used, especially in comments on a social site such as HN?
    LOL, even 10-year-old results do not get cited because they are considered common knowledge.
    That said, Witt's notion of beauty that Propp is critiquing in the posted article is just plain idiotic. Lack of commutativity is not lack of beauty. What a stupid idea.
    Mathematical beauty and imagination are different. One of Hilbert's grad students dropped out to become a poet. Hilbert is reported to have said: 'I never thought he had enough imagination to be a mathematician.'
    A little unsolicited advice: if you are an aspiring mathematician (I am very happy for you if you are) but do not have a sense of good taste or mathematical beauty, you will probably not have a good time.
    
    almostgotcaught 2 days ago
    
    > if you are an aspiring mathematician, and you do not have a sense of a good taste or mathematical beauty, you probably will probably not have a good time
    Lol I have a PhD from a T10 and 15 published papers. I'm pretty sure I don't need your advice on "taste" or "beauty".
    
    srean 2 days ago
    
    Yup. PhD is a start.
    My condolences though, for being in a line of work where you don't perceive beauty.
    
    almostgotcaught 2 days ago
    
    I'm a researcher in FAANG - I'll keep in mind your condolences when I get my yearly RSU re-grant lololol.
    
    srean a day ago
    
    Then thanks for your services, your work is literally paying me in my retirement.
    The 15 is on the lower side. When I used to be there, 15 would be on the uncomfortable side :) Good luck upping the numbers. Oh! Do get back to me on the ratio of Cooley-Tukey citations to FFT mentions.
    
    almostgotcaught a day ago
    
    You know you have a web presence right? You know it's very easy to verify you've never been in FAANG right? lolol
  - almostgotcaught 2 days ago
    
    > There are many mathematicians who find beauty
    Lol did you think this was clever? You just literally reiterated exactly what I said. See, if you had said "there are many pianists that find beauty in math" - you know like how many mathematicians find beauty in piano concertos - then you'd have me.
    
    inglor_cz 2 days ago
    
    Pianists don't find beauty in written maths, but mathematicians don't usually find beauty in sheet music either. It is the performance, accessible to our senses, that can convey beauty even to amateurs.
    Incidentally, in the parts of maths where the concepts can be visualized, such as fractal theory, non-mathematicians seem to love what they see.
    
    almostgotcaught 2 days ago
    
    > written maths
    Absolutely no one when they're navel gazing on this topic is discussing the aesthetics of notation.
    > seem to love what they see.
    Nor visualizations
    
    inglor_cz 2 days ago
    
    It was you who compared music to maths.
    People in general perceive music as "what is being played" vs. mathematics "what is being written on a page". This is the common concept, but it is incomplete. Music has its boring parts (notation), so does maths, but the general public is prone to confuse maths as a whole with its "sheet music".
    "when they're navel gazing"
    Maybe they're just thinking?
  - card_zero 2 days ago
    
    You're equivocating over the verdict, here. Are they right?

gweinberg 2 days ago

The commutation problem has nothing to do with matrices. Rotations in space do not commute, and that will be the case whether you represent them as matrices or in some other way.
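Here is a quick check that the non-commutativity belongs to the rotations themselves. The sketch rotates a vector with Rodrigues' rotation formula, which uses only dot and cross products and no rotation matrix. It still gets different answers for the two orders. The axes and the 90-degree angle are arbitrary choices.

```python
import numpy as np

def rotate(v, axis, theta):
    # Rodrigues' rotation formula: rotate v about the unit vector `axis` by angle theta
    # (right-hand rule), using only dot and cross products, no rotation matrix.
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)
    return (v * np.cos(theta)
            + np.cross(k, v) * np.sin(theta)
            + k * np.dot(k, v) * (1.0 - np.cos(theta)))

v = np.array([1.0, 0.0, 0.0])
quarter = np.pi / 2
x_axis, z_axis = [1, 0, 0], [0, 0, 1]

x_then_z = rotate(rotate(v, x_axis, quarter), z_axis, quarter)   # ends up along +y
z_then_x = rotate(rotate(v, z_axis, quarter), x_axis, quarter)   # ends up along +z
```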

laichzeit0 2 days ago

Well function composition f(g(x)) is not the same as g(f(x)) and when you represent f and g as matrices relative to some suitable set of basis functions then obviously AB and BA should be different. If the multiplication was defined any different, that wouldn’t work.
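A small NumPy sketch of that argument follows. It builds the matrix of a linear map by applying the map to the standard basis vectors. The matrix of f(g(x)) then comes out equal to F @ G, while g(f(x)) gives G @ F. The shear and the rotation are arbitrary example maps, and `matrix_of` is a hypothetical helper.

```python
import numpy as np

def matrix_of(linear_map, n):
    # Column j of the matrix is the image of the j-th standard basis vector.
    return np.column_stack([linear_map(e) for e in np.eye(n)])

def f(v):   # a shear: (x, y) -> (x + y, y)
    return np.array([v[0] + v[1], v[1]])

def g(v):   # a 90-degree rotation: (x, y) -> (-y, x)
    return np.array([-v[1], v[0]])

F, G = matrix_of(f, 2), matrix_of(g, 2)
FG = matrix_of(lambda v: f(g(v)), 2)   # matrix of "apply g, then f"
GF = matrix_of(lambda v: g(f(v)), 2)   # matrix of "apply f, then g"
```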

btilly 2 days ago

The way that I used to put this was, "If I put on my shoes before my socks, I'll get a different result than if I put on my socks before my shoes. Order of operations matters."
- bee_rider a day ago
  
  The ticket collector asked me why I was getting dressed for work on the train.
  I asked him, “is this not the commuter rail?”

pbhjpbhj 2 days ago

More like watching a weaving machine than watching a person hammer nails imo. Maybe like an old-time mill, with several machines if you think in terms of actual processing on an accelerator?

There's a wooden weaving machine at a heritage museum near me that gives me the same 'taste' in my brain as thinking about 'matrix' processing in a TPU or whatever.

snickerbockers 2 days ago

Maybe I'm just being AI-phobic or whatever, but I strongly suspect the original article was written by Grok, based on how it goes off on bizarre tangents describing extremely complicated metaphors that are not only inaccurate but also wouldn't in any way be insightful even if they were accurate.

algernonramone 2 days ago

I am willing to admit that I find matrix multiplication ugly, as well as non-intuitive. But, I am also willing to admit that my finding it ugly is likely a result of my relative mathematical immaturity (despite my BS in math).

znkr 2 days ago

Maybe it helps to think of matrix multiplication as a special case of the composition of linear transformations. In the finite-dimensional case, they can be expressed as matrix multiplications.

card_zero 2 days ago

So, Hardy focused on good explanations, and that was what he meant by beauty. Fair enough. The best objective definition of beauty I know of is "communication across a gap". This covers flowers, mathematics, and all kinds of art, including art I think is ugly such as Lucian Freud and Hans Giger I guess. So now I'm describing some things as beautiful and ugly at the same time, which betrays that there's a relative component to it (relative, objectively). That means I wish some things - including mathematics, which is usually tedious - communicated better, or explained things that seem to me to matter more: I feel in my gut that there's potential for this. So I don't rate mathematics as beautiful, any of it, personally.

But I'll admit it's barely beautiful. Within that context, I guess the article's lawyering for the relative beauty of a matrix was a success, but I always liked them better than calculus or group theory anyway.

amai 2 days ago

I think a much more insightful discussion would be to ask why matrix multiplication is so much more useful than the Hadamard product: https://en.wikipedia.org/wiki/Hadamard_product_(matrices)

The Hadamard product (elementwise multiplication) is also commutative.
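One common answer to that question is that the matrix product is the composition of linear maps, while the Hadamard product is not. The toy check below uses NumPy and arbitrary 2x2 matrices. It shows the Hadamard product commuting but failing to reproduce "apply A, then B", which the matrix product does by definition.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
B = np.array([[0.0, 1.0], [1.0, 0.0]])   # swaps the two coordinates
x = np.array([5.0, 7.0])

# Elementwise (Hadamard) product: commutative, but not the composition of the two maps.
hadamard_commutes = np.allclose(A * B, B * A)
hadamard_is_composition = np.allclose((B * A) @ x, B @ (A @ x))
# Matrix product: order matters, and B @ A is exactly "apply A, then B".
matmul_commutes = np.allclose(A @ B, B @ A)
composition_ok = np.allclose((B @ A) @ x, B @ (A @ x))
```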

stackghost 2 days ago

>Matrix algebra is the language of symmetry and transformation, and the fact that a followed by b differs from b followed by a is no surprise; to expect the two transformations to coincide is to seek symmetry in the wrong place — like judging a dog’s beauty by whether its tail resembles its head.

The way I've always explained this to non-algebra people is to imagine driving in a city downtown. If you're at an intersection and you turn right, then left at the next intersection, you'll end up at a completely different spot than if you were to instead turn left and then right.

countWSS 2 days ago

Beauty, symmetry, etc. are largely irrelevant; the key point is that it does not scale, and burning gigawatts to compute these matrices (even with all those tricks) will not scale or compete with more efficient/direct methods in the long term. Perhaps transformers are a very elaborate sunk-cost fallacy, where pivoting to a scalable, simpler architecture is treated as "too risky" even when the cost of a new GPU cluster dwarfs whatever it takes to bring an architecture from 0 to ChatGPT level.

sigmoid10 2 days ago

The whole issue with this industry is that it moves so fast there is no "long term." You're either all the way in, in a likely futile attempt to capture the market, or you're not in at all. So you also don't have time to really innovate at the hardware or software level, and you need to put everything into training data and training hardware.

Scene_Cast2 2 days ago

Matmuls (and GEMM) are a hardware-friendly way to stuff a lot of FLOPS into an operation. They also happen to be really useful as a constant-step discrete version of applying a mapping to a 1D scalar field.

I've mentioned it before, but I'd love for sparse operations to be more widespread in HPC hardware and software.
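One way to make "hardware-friendly" concrete is arithmetic intensity, meaning FLOPs per byte of memory traffic. The back-of-the-envelope sketch below assumes float32 (4 bytes) and idealized reuse, where each operand is read once and the output is written once; real kernels move more data. Under those assumptions a 1024x1024x1024 GEMM does about 171 FLOPs per byte, while a matrix-vector product does under 0.5. That gap is one reason dense GEMM maps so well onto accelerators.

```python
def gemm_intensity(m, n, k, bytes_per_elem=4):
    # C (m x n) = A (m x k) @ B (k x n): one multiply and one add per inner-product term.
    flops = 2 * m * n * k
    # Idealized traffic (assumption): read A and B once, write C once.
    bytes_moved = bytes_per_elem * (m * k + k * n + m * n)
    return flops / bytes_moved

def gemv_intensity(m, k, bytes_per_elem=4):
    # y (m) = A (m x k) @ x (k): every matrix element is used exactly once.
    flops = 2 * m * k
    bytes_moved = bytes_per_elem * (m * k + k + m)
    return flops / bytes_moved

print(gemm_intensity(1024, 1024, 1024))   # 1024 / 6, about 170.7 FLOPs per byte
print(gemv_intensity(1024, 1024))         # just under 0.5 FLOPs per byte
```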

koolala 2 days ago

Quaternions are beautiful too until you sit down to multiply them.
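For reference, here is the Hamilton product spelled out component by component. Quaternions are stored as (w, x, y, z) tuples, a representation chosen here for illustration. The example also shows the non-commutativity discussed upthread: i·j = k but j·i = -k.

```python
def qmul(p, q):
    # Hamilton product of quaternions stored as (w, x, y, z) = w + x*i + y*j + z*k.
    w1, x1, y1, z1 = p
    w2, x2, y2, z2 = q
    return (
        w1 * w2 - x1 * x2 - y1 * y2 - z1 * z2,
        w1 * x2 + x1 * w2 + y1 * z2 - z1 * y2,
        w1 * y2 - x1 * z2 + y1 * w2 + z1 * x2,
        w1 * z2 + x1 * y2 - y1 * x2 + z1 * w2,
    )

i, j, k = (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)
ij = qmul(i, j)   # equals k
ji = qmul(j, i)   # equals -k
```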

tonyarkles a day ago

I’ve been swimming in quaternions all week. Thank you for that :D

ComplexSystems 2 days ago

Matrices represent linear transformations. Linear transformations are very natural and "beautiful" things. They are also very clearly not commutative: f(g(x)) is not the same as g(f(x)). The matrix algebra perfectly represents all of this, and as a result, FGx is not the same as GFx. It's only not "beautiful" if you believe that matrix multiplication is a random operation that exists for no reason.

peterfirefly 2 days ago

I just finished reading lots of Stephen Witt quotes on goodreads. He comes across as a white Malcolm Gladwell, except that he actually does know what "Igon values" are so I don't know what his excuse is.

mhb 2 days ago

> white Malcolm Gladwell
I'm intrigued. How would a white Malcolm Gladwell's quotes differ from the IRL Malcolm Gladwell?
- peterfirefly 2 days ago
  
  Different hair, mostly.

jamespropp 2 days ago

Do you disagree with my take or think I’m missing Witt’s point? I’d be happy to hear from people who disagree with me.

starmole 2 days ago

I think 4x4 matrices for 3D transforms (esp. homogeneous coordinates) are very elegant. I think the intended critique is that the huge n*m matrices used in ML are not elegant, but the point is made poorly by pointing out properties of general matrices. In ML, matrices are just "data", or "weights". There are no interesting properties to these matrices. In a way, a von Neumann elephant (https://en.wikipedia.org/wiki/Von_Neumann%27s_elephant). Now, this might just be what is needed for ML to work and deal with messy real-world data! But mathematically it is not elegant.
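Here is a minimal NumPy sketch of 4x4 homogeneous transforms. A translation and a rotation about the z-axis are both 4x4 matrices, so chaining them is ordinary matrix multiplication, and the order matters. The helper names are hypothetical.

```python
import numpy as np

def translation(tx, ty, tz):
    T = np.eye(4)
    T[:3, 3] = [tx, ty, tz]    # the translation lives in the last column
    return T

def rotation_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    R = np.eye(4)
    R[:2, :2] = [[c, -s], [s, c]]
    return R

p = np.array([1.0, 0.0, 0.0, 1.0])   # point (1, 0, 0) in homogeneous coordinates (w = 1)
T, R = translation(2.0, 0.0, 0.0), rotation_z(np.pi / 2)

rotate_then_translate = T @ R @ p    # rotate about the origin, then shift: (2, 1, 0)
translate_then_rotate = R @ T @ p    # shift, then rotate about the origin: (0, 3, 0)
```

Direction vectors use w = 0, so the same matrices rotate them but leave them untouched by the translation.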
dnr 2 days ago

The inelegance to me isn't in the definition of the operation, but that it's doing a huge amount of brute-force work to mix every part of the input with every other part, when the answer really only depends on a tiny fraction of the input. If we somehow "just knew" what parts to look at, we could get the answer much more efficiently.
Of course that doesn't really make any sense at the matrix level. And (from what I understand) techniques like MoE move in that direction. So the criticism doesn't really make sense anymore, except in that brains are still much much more efficient than LLMs so we know that we could do better.
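The toy NumPy sketch below illustrates the top-k routing idea behind mixture-of-experts, using made-up sizes. A cheap router scores all experts, and only the top_k chosen expert matrices are actually multiplied by the input. Real MoE layers route batches of tokens, use MLP experts, and add load-balancing terms; none of that is modeled here, and the router weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, top_k = 8, 4, 1                      # toy sizes, chosen for illustration
experts = rng.standard_normal((n_experts, d, d))   # one weight matrix per expert
router = rng.standard_normal((d, n_experts))       # hypothetical linear gating weights

def moe_layer(x):
    # Score every expert cheaply, but only run the top_k highest-scoring ones.
    logits = x @ router
    chosen = np.argsort(logits)[-top_k:]
    gates = np.exp(logits[chosen] - logits[chosen].max())
    gates /= gates.sum()                           # softmax over the chosen experts only
    out = sum(g * (experts[e] @ x) for g, e in zip(gates, chosen))
    return out, chosen

x = rng.standard_normal(d)
y, used = moe_layer(x)   # only len(used) of the n_experts matrices were multiplied
```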
johngossman 2 days ago

I think you're right that the inelegant part is how AI seems to just consist of endless loops of multiplication. I say this as a graphics programmer who realized years ago that all those beautiful images were just lots of MxNs, and AI takes this to a whole new level. When I was in college they told us most of computing resources were used doing Linear Programming. I wonder when that crossed over to graphics or AI (or some networking operation like SSL)?
- dwaltrip 2 days ago
  
  What could any complex phenomenon possibly be other than small “mundane” components combined together in a variety of ways and in immense quantities?
  All such things are like this.
  For me, this is fascinating, mind-boggling, non-sensical, and unsurprising, all at once.
  But I wouldn’t call it inelegant.
- jiggawatts 2 days ago
  
  > When I was in college they told us most of computing resources were used doing Linear Programming.
  I seriously doubt that was ever true, except perhaps for a very brief time in the 1950s or 60s.
  Linear programming is an incredibly niche application of computing used so infrequently that I've never seen it utilised anywhere despite being a consultant that has visited hundreds of varied customers including big business.
  It's like Wolfram Mathematica. I learned to use it in University, I became proficient at it, and I've used it about once a decade "in industry" because most jobs are targeted at the median worker. The median worker is practically speaking innumerate, unable to read a graph, understand a curve fit, or if they do, their knowledge won't extend to confidence intervals or non-linear fits such as log-log graphs.
  Teachers that are exposed to the same curriculum year after year, seeing the same topic over and over assume that industry must be the same as their lived experience. I've lost count of the number of papers I've seen about Voronoi diagrams or Delaunay triangulations, neither of which I've ever seen applied anywhere outside of a tertiary education setting. I mean, seriously, who uses this stuff!?
  In the networking course in my computer science degree I had to use matrix exponentiation to calculate the maximum throughput of an arbitrary network topology. If I were to even suggest something like this at any customer, even those spending millions on their core network infrastructure, I would be either laughed at openly, or their staff would gape at me in wide-eyed horror and back away slowly.
  - aragilar 2 days ago
    
    The first two results from Google with "Voronoi astro" gave two different uses than the one I knew about (sampling fibre bundles): https://galaxyproject.org/news/2025-06-11-voronoi-astronomy/ https://arxiv.org/abs/2511.14697
    
    jiggawatts 2 days ago
    
    Astronomy is pure research and is performed almost exclusively by academics.
    I’m not saying these things have zero utility, it’s just that they’re used far less frequently in industry than academics imagine.
    
    aragilar 2 days ago
    
    And astronomy tends to throw up technology that becomes widely used (WiFi being the obvious example) or becomes of "interest" to governments. I expect that AMR code will be used/ported to nuclear simulations if it proves to be useful. Do I expect it to be used in a CRUD app? Obviously not, but use by most software shops isn't a measure of importance.
  - petters 2 days ago
    
    I have not only used linear programming in industry, I have also had to write my own solver because the existing ones (even commercial ones) were too slow. (This was possible only because I only cared about a very approximate solution.)
    The triangulations you mention are important in the current group I'm working in.
    
    jiggawatts 2 days ago
    
    I'm curious to hear what you specifically use these algorithms for!
    PS: My point is not that these things are never used, they clearly are, I'm saying that the majority of CPU cycles globally goes towards "idle", then pushing pixels around with simple bitblt-like algorithms for 2D graphics, then whatever it is that browsers do on the inside, then operating system internals, and then specialised and more interesting algorithms like Linear Programming are a vanishingly small slice of whatever is left of that pie chart.
  - srean 2 days ago
    
    3d modelers would like to have a word with you.
    Part of the reason why linear programming doesn't get used as often is that there is no industry-standard software implementation that is not atrociously priced. Same deal with Mathematica.
    
    jiggawatts 2 days ago
    
    3D modelling is mostly linear algebra, not linear programming, which is an entirely different set of algorithms.
    
    srean a day ago
    
    Oh, I mentioned it in the context of mesh geometries and tessellations.
messe 2 days ago

I think it conflates the map and the territory.
Linear transformations are a beautiful thing, but matrices are an ugly representation that nevertheless is a convenient one when we actually want to compute.
Elegant territory. Inelegant, brute-force, number crunching map.
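Here is a concrete version of the map/territory distinction. The same linear map has different matrices in different bases, related by A_new = inv(P) @ A @ P. Basis-independent quantities such as the trace, determinant, and eigenvalues do not change. The small NumPy sketch below uses an arbitrary map and an arbitrary basis.

```python
import numpy as np

A = np.array([[2.0, 1.0], [0.0, 3.0]])   # a linear map, written in the standard basis
P = np.array([[1.0, 1.0], [1.0, 2.0]])   # columns are a different (arbitrary) basis
A_new = np.linalg.inv(P) @ A @ P         # the same map, written in the new basis

x_new = np.array([0.5, -1.0])            # coordinates of some vector in the new basis
x_std = P @ x_new                        # the same vector in standard coordinates

# Different numbers in the "map", same "territory":
same_trace = np.isclose(np.trace(A_new), np.trace(A))
same_det = np.isclose(np.linalg.det(A_new), np.linalg.det(A))
```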
LegionMammal978 2 days ago

If the O(n^3) schoolbook multiplication were the best that could be done, then I'd totally agree that "it's simply the nature of matrices to have a bulky multiplication process". Yet there's a whole series of algorithms (from the Strassen algorithm onward) that use ever-more-clever ways to recursively batch things up and decrease the asymptotic complexity, most of which aren't remotely practical. And for all I know, it could go on forever down to O(n^(2+ε)). Overall, I hate not being able to get a straight answer for "how hard is it, really".
- RossBencina 2 days ago
  
  For anyone interested, there is an introductory survey of the current bounds at: https://en.wikipedia.org/wiki/Computational_complexity_of_ma...
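The first step in that series is Strassen's algorithm, sketched below in textbook form. It splits each matrix into 2x2 blocks and forms seven half-size products instead of eight, which gives O(n^log2(7)), roughly O(n^2.807), operations. This version assumes square matrices whose size is a power of two, and it falls back to the ordinary product below an arbitrary cutoff. It is meant for illustration, not as a tuned implementation.

```python
import numpy as np

def strassen(A, B, cutoff=64):
    # Strassen's algorithm; assumes square matrices whose size is a power of two.
    n = A.shape[0]
    if n <= cutoff:
        return A @ B                     # ordinary product on small blocks
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # Seven half-size products instead of the schoolbook eight.
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty((n, n), dtype=np.result_type(A, B))
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The extra additions and memory traffic are a large part of why the asymptotic gain rarely pays off in practice at typical sizes.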
amelius 2 days ago

Maybe the problem is that matrices are too general.
You can have very beautiful algorithms when you assume the matrices involved have a certain structure. You can even have that A*B == B*A, if A and B have a certain structure.
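Circulant matrices are one example of such structure. In a circulant matrix each row is a cyclic shift of the previous one, and any two circulant matrices commute because the same DFT diagonalizes them all. The NumPy sketch below builds two random circulants and checks A @ B against B @ A. Multiplying a circulant by a vector is a circular convolution, which is where the FFT connection comes from.

```python
import numpy as np

def circulant(c):
    # C[i, j] = c[(i - j) mod n]: each row is the previous row shifted right by one.
    n = len(c)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
    return np.asarray(c, dtype=float)[idx]

rng = np.random.default_rng(0)
A = circulant(rng.standard_normal(5))
B = circulant(rng.standard_normal(5))
commute = np.allclose(A @ B, B @ A)   # circulants always commute
```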
djmips 2 days ago

Ignore me then because I agree with you. :) He sounds like someone who upon first hearing jazz to complain it was ugly.
veqq 2 days ago

> sends the pair (x, y) to the pair (−x, y)
I know linear algebra, but this part seems profoundly unclear. What does "send" mean? Following with different examples in 2 by 2 notation only makes it worse. It seems like you're changing referents constantly.
- jeffhwang 2 days ago
  
  Let me try.
  In US schools during K-12, we generally learn functions in two ways:
  1. 2-d line chart with an x-axis and y-axis, like temperature over time, history of stock price, etc. The classic independent variable is on the horizontal axis, the dependent variable on the vertical axis. And even people who have forgotten almost all math can instantly understand the graphics displayed when they're watching CNBC or a TV weather report.
  2. We also think of functions like little machines that do things for us. E.g., y = f(x) means that f() is like a black box. We give the black box input 'x'; then the black box f() returns output 'y'. (Obviously very relevant to the life of programmers.)
  But one of 3Blue1Brown's excellent videos finally showed me at least a few more ways of thinking of functions. This is where a function acts as a map from one "thing" to another thing (technically from Domain X to Co-Domain Y).
  So if we think of NVIDIA stock price over time (Interpretation 1) as a graph, it's not just a picture that goes up and to the right. It's mapping each point in time on the x-axis to a price on the y-axis, sure! Let's use the example, x=November 21, 2025 maps to y=$178/share. Of course, interpretation 2 might say that the black box of the function takes in "November 21, 2025" as input and returns "$178" as output.
  But what I call Interpretation 3 maps from the domain of Time to the output Co-domain of NVDA Stock Price.
  3. This is a 1D-to-1D mapping, i.e., both x and y are scalar values. In the language that jamespropp used, we send the value "November 21, 2025" to the value "$178".
  But we need not restrict ourselves to a 1-dimensional input domain (time) and a 1-dimensional output domain (price).
  We could map from a 2-d Domain X to another 2-d Co-Domain Y. For example X could be 2-d geographical coordinates. And Y could be 2-d wind vector.
  So we would feed in, say, location (5,4) as input, and our 2D-to-2D function would output wind vector (North by 2mph, East by 7mph).
  So we are "sending" input (5,4) in the first 2d plane to output (+2,+7) in the second 2d plane.
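To make the 2D-to-2D picture concrete, here is a minimal Python sketch (not from the comment above). It assumes the simplest kind of 2D-to-2D map, a linear one written as a 2x2 matrix; a real wind field generally isn't linear. The names `apply`, `matmul`, `rotate90`, and `stretch_x` are hypothetical. Sending a point through two maps in turn gives the same result as sending it through the single map whose matrix is their product, which is one way to see why matrix multiplication is defined the way it is.

```python
# Hypothetical illustration: linear 2D-to-2D maps as 2x2 matrices.
Vec = tuple[float, float]
Mat = tuple[tuple[float, float], tuple[float, float]]

def apply(m: Mat, v: Vec) -> Vec:
    """Send the point v to the point m @ v."""
    (a, b), (c, d) = m
    x, y = v
    return (a * x + b * y, c * x + d * y)

def matmul(m: Mat, n: Mat) -> Mat:
    """Matrix of the map 'apply n first, then m'."""
    (a, b), (c, d) = m
    (e, f), (g, h) = n
    return ((a * e + b * g, a * f + b * h),
            (c * e + d * g, c * f + d * h))

rotate90: Mat = ((0, -1), (1, 0))  # quarter turn counterclockwise
stretch_x: Mat = ((2, 0), (0, 1))  # double the first coordinate

p = (5, 4)
two_steps = apply(rotate90, apply(stretch_x, p))  # stretch, then rotate
one_step = apply(matmul(rotate90, stretch_x), p)  # one combined map
print(two_steps, one_step)  # (-4, 10) (-4, 10)
```

Swapping the order, `matmul(stretch_x, rotate90)`, sends (5,4) to (-8, 5) instead, so the order in which the maps are composed matters.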
- jamespropp 2 days ago
  
  I’ve updated this passage. Let me know if the new version is clearer.
- jamespropp 2 days ago
  
  Thanks for pointing this out. I’ll work on this passage tomorrow.

woopwoop 2 days ago

Matrix multiplication is not ugly, but matrices themselves are ugly, mainly because they encode the arbitrary operation of choosing a basis. There's nothing especially nice about the pixel basis for images, or about the token basis for language. But of all the things that make up modern deep learning, matrix multiplication is surely the _least_ ugly. ReLU/GELU is not pretty! Batch normalization is vomit-inducing!! ImageNet normalization? JFC!!!
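A minimal Python sketch of the basis point, with a hypothetical map `A` and basis-change matrix `P` chosen for illustration: the same linear map written in a different basis, `P^-1 A P`, has different entries, while basis-independent quantities such as the trace and determinant don't change. Here the columns of `P` happen to be eigenvectors of `A`, so the new matrix comes out diagonal.

```python
# Hypothetical example: one linear map, two bases.
Mat = list[list[int]]

def matmul(a: Mat, b: Mat) -> Mat:
    """Standard matrix product of two nested lists."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]

def trace(m: Mat) -> int:
    return m[0][0] + m[1][1]

def det(m: Mat) -> int:
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

A = [[2, 1], [0, 3]]        # the map, written in the standard basis
P = [[1, 1], [0, 1]]        # columns are the new basis vectors (1, 0) and (1, 1)
P_inv = [[1, -1], [0, 1]]   # inverse of P, worked out by hand

B = matmul(P_inv, matmul(A, P))  # the same map, written in the new basis
print(B)                         # [[2, 0], [0, 3]]
print(trace(A), trace(B))        # 5 5
print(det(A), det(B))            # 6 6
```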

mcswell 2 days ago

I guess Stephen Witt must not like subtraction either, since a-b =/= b-a. Nor division.

1718627440 a day ago

Which is maybe why mathematicians don't define it on its own, but define addition instead and subtraction based on it.

fracus 2 days ago

I think it is just a matter of perspective. You can both be right. I don't think there is an objective answer to this question.

sswatson 2 days ago

The author has exclusive claim to their own aesthetic sensibilities, of course, but the language in the piece suggests some degree of universality. Whereas in fact, effectively no one who is knowledgeable about math would share the view that noncommutative operations are ugly by virtue of being noncommutative. It’s a completely foreign idea, like a poet saying that the only beautiful poems are the palindromic ones.

krackers 2 days ago

One could say that it depends on your basis...

ogogmad 2 days ago

Don't like matrices? Introducing: Penrose abstract index notation. Or "I can't believe it's not matrices".
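For comparison, here is ordinary matrix multiplication written with indices and the Einstein summation convention (a repeated upper/lower index pair is summed over), which abstract index notation borrows. In Penrose's notation the indices label tensor slots rather than components, but the expression looks the same:

```latex
(AB)^{a}{}_{c} = A^{a}{}_{b}\, B^{b}{}_{c}
\qquad\text{i.e., in components,}\qquad
(AB)_{ac} = \sum_{b} A_{ab}\, B_{bc}
```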

tiagod 2 days ago

Honestly, in a purely technical sense, I do find it beautiful how you can take matrix multiplication and a shit-ton of data, and get a program that can talk to you, solve problems, and generate believable speech and imagery.

There are many complications arising from such a thing existing, and from what was needed to bring it into existence (and at the cost of whom), I'll never deny that. I just can't comprehend how someone can find the technical aspects repulsive in isolation.

It feels a lot like trying to convince someone that nuclear weapons are bad by arguing that splitting an atom is akin to banging a rock against a coconut to split it in two.

dwa3592 2 days ago

No. It's not ugly.

cjfd 2 days ago

Anyone who thinks matrix multiplication is ugly has understood nothing about it.

geokon 2 days ago

Maybe the issue boils down to overloading the term "multiplication". If mathematicians had invented a new word here, people would get tripped up less (similarly for the 'dot' and 'cross' "products").

I think a lot of issues arise from using analogies. Another one is complex numbers as 2D vectors. It's an OK analogy, except complex numbers can be multiplied whereas 2D coordinates cannot. Your weird new non-vectors are now spinning and people are left confused.
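A small Python sketch of where the "spinning" comes from (illustrative only; `as_matrix` and `apply` are hypothetical helper names). Viewing w = x + yi as the point (x, y), multiplying by a fixed complex number z = a + bi acts on the plane like the 2x2 matrix [[a, -b], [b, a]], i.e. a rotation combined with a scaling. Multiplying by i is a pure quarter turn.

```python
def as_matrix(z: complex) -> tuple[tuple[float, float], tuple[float, float]]:
    """Matrix of the map w -> z * w, viewing w = x + yi as the point (x, y)."""
    a, b = z.real, z.imag
    return ((a, -b), (b, a))

def apply(m, v):
    """Multiply a 2x2 matrix by a 2-vector."""
    (p, q), (r, s) = m
    x, y = v
    return (p * x + q * y, r * x + s * y)

z = 1j       # multiplying by i ...
w = 3 + 4j   # ... applied to the point (3, 4)

print(z * w)                                  # (-4+3j)
print(apply(as_matrix(z), (w.real, w.imag)))  # (-4.0, 3.0): rotated 90 degrees
```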

o11c 2 days ago

Matrix multiplication libraries are ugly. They either give up on performance or have atrocious interfaces ... sometimes both.

Using matrix multiplication is also ugly when it's literally millions of times less efficient than a proper solution.

Filligree 2 days ago

What’s the proper solution for computing the voltage and current flows in a component network, other than modified nodal analysis?
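As a rough illustration of why modified nodal analysis is a matrix problem, here is a minimal Python sketch for a hypothetical circuit (not from the thread): a 10 V source from node 1 to ground, R1 = 1 kΩ from node 1 to node 2, and R2 = 1 kΩ from node 2 to ground. The unknowns are the node voltages v1, v2 and the current j through the source, taken from its + to its - terminal (one common MNA sign convention). The `solve` helper is a plain Gaussian elimination written for this example using exact fractions; a real circuit simulator would use a sparse LU solver instead.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Uses exact Fractions; assumes the system is nonsingular.
    """
    n = len(a)
    m = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [F(0)] * n
    for r in reversed(range(n)):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

V, R1, R2 = 10, 1000, 1000   # hypothetical component values
g1, g2 = F(1, R1), F(1, R2)  # conductances

# Rows: KCL at node 1, KCL at node 2, source constraint v1 = V.
# Columns: v1, v2, j.
A = [
    [g1, -g1, 1],
    [-g1, g1 + g2, 0],
    [1, 0, 0],
]
b = [0, 0, V]

v1, v2, j = solve(A, b)
print(v1, v2, j)  # 10 5 -1/200
```

The negative j just means the 5 mA actually flows out of the source's + terminal, i.e. the source is delivering power to the divider.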
- o11c 2 days ago
  
  I'm not familiar with that particular problem, but I did use a load-bearing "when".
  - chris_wot 2 days ago
    
    Can you give an example of when it is not appropriate?
    
    o11c 2 days ago
    
    Literally 99% of the crap they're shoving AI into these days.

jmclnx 2 days ago

IIRC, working with matrices was much easier using FORTRAN; I would expect modern Fortran kept that 'easiness'.

zkmon 2 days ago

I doubt anyone of the past or present could fully describe what a matrix is, and what its multiplication is. There are many ways people looked at it so far - as a spatial transformation, dot products and so on. I don't think the description is complete in any significant way.

That's because we don't fully understand what a number is and what multiplication is. We defined -x and 1/x as inverses (additive and multiplicative), but what is -1/x? Let's consider them as operations. Apply any one of them to any other of them and you get the third one. Thus they occupy peer status. But we hardly ever talk about -1/x.

The mathematical inquisition is in its infancy.

srean 2 days ago

This is just lowbrow, philosophical-sounding rubbish of the same variety as "what is 'is'? Nobody knows." Sounds profound, though.
Matrix is just one way to organize data. When linear operators are organized this way composition of linear operators map to matrix multiplication.
But that is just one of the ways that multiplication may be defined on matrices; the Hadamard product, the tensor product, and the Khatri-Rao product are some other examples. They all correspond to different mathematical structures one may want to explore or use. If linear-algebraic structure is what one wants to explore or use, then one gets matrix multiplication.

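To make the point concrete, a minimal Python sketch (hypothetical helper names, plain lists rather than a library) of three of the products mentioned: the standard matrix product, the Hadamard (entrywise) product, and the Kronecker product, which is the matrix form of the tensor product. Only the standard product satisfies "apply B, then A" equals "apply A @ B", which is the sense in which matrix multiplication is composition of linear operators.

```python
def matmul(a, b):
    """Standard product: (a @ b)[i][k] = sum over j of a[i][j] * b[j][k]."""
    return [[sum(a[i][j] * b[j][k] for j in range(len(b)))
             for k in range(len(b[0]))] for i in range(len(a))]

def hadamard(a, b):
    """Entrywise product; requires equal shapes."""
    return [[x * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def kron(a, b):
    """Kronecker (tensor) product: block (i, j) of the result is a[i][j] * b."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]

def apply(m, v):
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
v = [5, 6]

print(matmul(A, B))             # [[2, 1], [4, 3]]
print(hadamard(A, B))           # [[0, 2], [3, 0]]
print(apply(matmul(A, B), v))   # [16, 38]
print(apply(A, apply(B, v)))    # [16, 38]: same as composing the maps
```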
themafia 2 days ago

As someone who never got deeply into math but got deeply into programming, matrices just seemed to me like an incompletely generalized data structure with an interesting "canonical" algorithm that can be used on it. In some cases, if you arrange your data into the structure correctly, you can use it to model interesting real-world phenomena.
It feels like linear algebra tries to get at the heart of this generality, but the structure and operator are more constrained than they ultimately could be. It's a small oddball computational device that can be tersely written into papers and widely understood. I always find pseudocode easier to follow and reason about, but that's my particular bias.

youoy 2 days ago

I get your point, but i think the real issue is -(1/(-1/x)). It is the one that is being overlooked the most in our society, as if it were something normal, but it contains some of the deepest truths imho.
- iamgopal 2 days ago
  
  how about -1/(-(1/(-1/x))) ? How many roads must a man walk down before we can call him a man ?
  - zkmon 2 days ago
    
    No need for walking; they just need to be able to read a post properly before calling him a man.
- zkmon 2 days ago
  
  No you didn't get it. You missed "Let's consider them as operations. Apply any one of them on any other of them, you get the third one."
  - youoy 2 days ago
    
    So is what i wrote a third one? Fourth? Fifth? :)
    
    zkmon 2 days ago
    
    Not sure what you are talking about. What you wrote reduces to just x. What I meant was, if you substitute say, -x for x in -1/x, you get 1/x, which is the third inverse. Same is true for the other two pairs. So, if we call them functions f, g and h, then, f=g(h)=h(g); g=f(h)=h(f); h=f(g)=g(f)
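The composition rule described above can be checked mechanically. A minimal Python sketch (illustrative; `f`, `g`, `h` follow the comment, while `e`, `name_of`, and the sample values are hypothetical helpers): together with the identity, the three maps form the Klein four-group, where each map undoes itself and composing any two different non-identity maps gives the third. Functions are compared on a handful of nonzero exact fractions rather than proved equal symbolically.

```python
from fractions import Fraction
from itertools import product

def e(x): return x        # identity
def f(x): return -x       # additive inverse
def g(x): return 1 / x    # multiplicative inverse
def h(x): return -1 / x   # both at once

ops = {"e": e, "f": f, "g": g, "h": h}
samples = [Fraction(n, d) for n in (-3, -1, 2, 5) for d in (1, 2, 7)]  # all nonzero

def name_of(fn):
    """Return which of e, f, g, h agrees with fn on every sample, if any."""
    for name, op in ops.items():
        if all(fn(x) == op(x) for x in samples):
            return name
    return None

# table[(a, b)] names the map "apply b, then a".
table = {
    (a, b): name_of(lambda x, a=a, b=b: ops[a](ops[b](x)))
    for a, b in product(ops, repeat=2)
}
print(table[("g", "h")], table[("f", "g")], table[("h", "h")])  # f h e

def nested(x):
    """The expression -(1/(-1/x)) from earlier in the thread."""
    return -(1 / (-1 / x))

print(name_of(nested))  # e
```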