On Mon, Feb 3, 2025 at 12:45 PM Alexander Schreiber via cctalk <cctalk(a)classiccmp.org> wrote:
On Mon, Feb 03, 2025 at 07:08:32PM -0000, Donald Whittemore via cctalk wrote:
On top of that: A lot of those LLMs are built on theft at an epically large
scale. They hoovered up everything in sight (and then some) without even
pretending to care about intellectual property rights - e.g. the NY Times
has taken OpenAI to court because they managed to make the OpenAI LLMs
spit out long verbatim fragments of NY Times content. The hilarious part
is that DeepSeek essentially stole from OpenAI that which OpenAI previously
stole from everyone else, and OpenAI is very angry about the lack of honor
among thieves or something ;-)
My understanding was that OpenAI accused DeepSeek of "distilling" their
model, presumably by making API queries to OpenAI's service. Normally,
though, "distillation" is the process of generating a smaller ("student")
model from a larger ("teacher") model (see the sketch below), except in
this case DeepSeek apparently created something more of a peer to the
teacher. Maybe there was some "veneer" final training, but the basic
assertion of "they stole our work" is probably more a case of OpenAI
trying to control the narrative. Now, whether DeepSeek stole N different
entities' IP, that's a different question. As you said, there is no way
to reproduce the model, so what's on GitHub isn't "open source" in most
people's understanding. Still, it's better than Microsoft/OpenAI, where
the model is "closed" behind an API.
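
For what it's worth, "classic" teacher/student distillation looks roughly
like the toy sketch below (Python/PyTorch, tiny made-up models, nothing to
do with whatever OpenAI or DeepSeek actually run): the student is trained
to match the teacher's softened output distribution. Over a public API you
never see logits, so "distillation" there usually just means collecting
the teacher's text responses and fine-tuning the student on them as
ordinary training data.

# Toy sketch of knowledge distillation: hypothetical models, not any
# vendor's actual setup.
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# A big frozen "teacher" and a small trainable "student".
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10))
student = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 10))
for p in teacher.parameters():
    p.requires_grad_(False)

opt = torch.optim.Adam(student.parameters(), lr=1e-3)
T = 2.0  # temperature: softens the teacher's output distribution

for step in range(100):
    x = torch.randn(64, 32)      # unlabeled inputs; the teacher's outputs
    with torch.no_grad():        # are the only training signal
        t_logits = teacher(x)
    s_logits = student(x)
    # KL divergence between the softened teacher and student distributions.
    loss = F.kl_div(
        F.log_softmax(s_logits / T, dim=-1),
        F.softmax(t_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    opt.zero_grad()
    loss.backward()
    opt.step()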