On Mon, Feb 03, 2025 at 07:08:32PM -0000, Donald Whittemore via cctalk wrote:
> I am an old mainframe guy. I could give you my COBOL deck of cards or
> the compile listing. You could pore through the code looking for
> nefarious/malicious code. I then hand you the object deck. You have no
> idea if it matches the code you looked at. The only way you could be
> sure is to compile the code I gave you and use your own object deck.
> So why is open source these days such a beneficial thing?
The idea of Open Source is that not only do you have the source code
(which you could get before even for closed source products, e.g. the
various DEC source distributions on microfiche - which were of course
still owned and copyrighted by DEC), but you can build your own binaries
from it, you can change it (and build updated binaries), and - depending
on the license - you are usually encouraged to share your changes.
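
That closes exactly the gap Donald describes: with a reproducible build
you compile the audited source yourself and check that it matches the
binary you were handed. A minimal sketch in Python, assuming a
bit-for-bit reproducible build and hypothetical file names:

    import hashlib

    def sha256_of(path):
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()

    # "my-build" is compiled from the source you audited;
    # "their-binary" is the object deck you were handed.
    if sha256_of("my-build") == sha256_of("their-binary"):
        print("match: the audited source is what you run")
    else:
        print("mismatch: do not trust the handed binary")

This only works when the build really is reproducible (same compiler,
same flags, timestamps stripped, etc.), which is what efforts like
Debian's reproducible-builds project are after.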
> DeepSeek may be open source but I have no way to create my own
> executable.
It's a model, not source code. There is no binary in the usual sense.
Calling it Open Source is definitely misleading, as here it only means
you can legally run the model yourself without paying whoever built it -
and even the "legally" is a "maybe": that whole LLM space is a pile of
already ongoing lawsuits, with an avalanche of more waiting to happen
thanks to the creators' approach to intellectual property and privacy.
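
For what it's worth, "running it yourself" looks roughly like this - a
sketch assuming the Hugging Face transformers library; the checkpoint
name is illustrative, and the full-size model wants far more hardware
than a desktop has:

    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "deepseek-ai/DeepSeek-R1"   # illustrative checkpoint name
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tok("Hello there", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=20)
    print(tok.decode(out[0]))

What gets downloaded is gigabytes of weights, not anything you could
audit the way you would a COBOL listing.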
> Besides, I don’t know what language it is written in but I bet I have
> no expertise in it. No way for me to identify nasty code.
It's worse than that. It's not written in any language as such. It is a
trained machine learning model (specifically, a transformer-based LLM)
that is essentially an undebuggable black box. Don't like the output the
model produces? Adjust the training and retrain.
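
You can see the black box directly if you crack open a checkpoint - a
sketch assuming PyTorch and a hypothetical checkpoint file:

    import torch

    # A checkpoint is a dictionary of named tensors: billions of
    # floats, with no human-readable logic anywhere in it.
    state = torch.load("checkpoint.pt", map_location="cpu")
    for name, tensor in list(state.items())[:3]:
        print(name, tuple(tensor.shape))
    # prints things like: layers.0.attention.weight (4096, 4096)

There is simply nothing there to review for "nasty code".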
Additionally, all the publicly available hosted LLMs (e.g. ChatGPT) are
wrapped in filter layers that screen both the input (i.e. queries
against the model) and the output of the model, to keep the worst
outrages (e.g. "I asked ChatGPT how to build a bomb and it gave me
detailed instructions!") somewhat contained.
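
Schematically it works like this - a toy sketch of the idea, not any
vendor's actual pipeline; the blocklist and model stub are hypothetical:

    BLOCKED = ["how to build a bomb"]

    def allowed(text):
        return not any(bad in text.lower() for bad in BLOCKED)

    def guarded_query(query, model_call):
        if not allowed(query):              # input filter
            return "Sorry, I can't help with that."
        answer = model_call(query)
        if not allowed(answer):             # output filter
            return "Sorry, I can't help with that."
        return answer

    # A stub stands in for the real model here:
    print(guarded_query("how to build a bomb", lambda q: "..."))

Real deployments use trained classifier models rather than string
matching, but the shape is the same: the raw model never talks to you
directly.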
On top of that: a lot of those LLMs are built on theft at an epically
large scale. Their creators hoovered up everything in sight (and then
some) without even pretending to care about intellectual property rights
- e.g. the NY Times has taken OpenAI to court because they managed to
make the OpenAI LLMs spit out long verbatim fragments of NY Times
content. The hilarious part is that DeepSeek essentially stole from
OpenAI that which OpenAI previously stole from everyone else, and OpenAI
is very angry about the lack of honor among thieves or something ;-)
> Yes, many people may have reviewed the code but that does not mean
> what I am running is the result of that code.
Aside from the classic "Reflections on Trusting Trust", you are far more
likely to get bitten by wonky compiler optimizations than by an actually
malicious compiler.
Kind regards,
Alex.
--
"Opportunity is missed by most people because it is dressed in overalls and
looks like work." -- Thomas A. Edison