dvduval 43 minutes ago
tancop 6 minutes ago
deaton 43 minutes ago
storus 7 minutes ago
pluc 37 minutes ago
You can't steal or profit off of that data, but it's fine for them for whatever reason. I guess because they're a force for good in the world and are pushing humanity forward eh?
ggillas 20 minutes ago
nla: if you create content online (public repo code, blog, podcast, YouTube, publishing), the smartest thing you can do is file for US copyright registration, even if you have a hobby blog.
Anthropic paid $1.5B in a class settlement to authors because it was piracy of copyrighted works. If we as an HN community had our works protected, there would potentially be huge statutory damages for scraping by any and all LLMs. I work with hundreds of writers and publishers and am forming a coalition to protect and license what they're creating.
MontyCarloHall 19 minutes ago
andai 14 minutes ago
The pretraining (common crawl, i.e. the entire internet. Also books and papers, mostly pirated), and the realtime web scraping.
The article appears to be about the latter.
Though the two are kind of similar, since they keep updating the training data with new web pages. The difference is that, with the web search version, it's more likely to plagiarize a single article, rather than the kind of "blending" that happens if the article was just part of trillions of web pages in the training data.
There's this old quote: "If you steal from one artist, they say oh, he is the next so-and-so. If you steal from many, they say, how original!"
hparadiz 28 minutes ago
kstenerud an hour ago
I'm having a hard time understanding what's wrong here? Unless the link text is very long, why would someone linking to your article use different words for the link text?
oytmeal 15 minutes ago
tptacek an hour ago
adamzwasserman 40 minutes ago
ecommerceguy 13 minutes ago
100% creators should get compensated by AI platforms for their work.
Further, I can see a day where someone like Reddit will close off or license their data to LLMs. No doubt they are losing traffic right now.
hiroto_lemon 12 minutes ago
jorisw 7 minutes ago
Can't recall the last time a compelling argument started out like this
baq 34 minutes ago
cryptocod3 an hour ago
motbus3 35 minutes ago
an hour ago
Comment deleted
_-_-__-_-_- 30 minutes ago
kingleopold 14 minutes ago
pull_my_finger 9 minutes ago
[1]: https://www.theverge.com/news/674366/nick-clegg-uk-ai-artist...
dwa3592 41 minutes ago
mrbluecoat 43 minutes ago
Is AI plural or is that a typo?
ProllyInfamous 32 minutes ago
Bezos' admission, recently, that the bottom 50% of current taxpayers ought'a NOT pay any taxes... is just preparing us for the inevitable UBI'd masses.
: own nothing, be happy!
iloveoof 17 minutes ago
It’s deeply ironic that if you forget about LLMs and look only at the outcome (we’ve found a way to legally circumvent copyright and the siloing of coding knowledge, making it so you can build on top of (almost) the whole of human coding knowledge without needing to pay a rent or ask for permission), it sounds like the dream of open source software has been realized.
But this doesn’t feel like a win for the philosophy of OSS because a corporation broke down the gates. It turns out for a lot of people, OSS is an aesthetic and not an outcome, it’s a vibe against corporate use or control of software, not for democratized access to knowledge.
saghm 31 minutes ago
dana321 36 minutes ago
quantummagic 14 minutes ago
peterbell_nyc 37 minutes ago
I think there are real questions around motivations for creation of novel, high quality valuable content (I think they still exist but move to indirect monetization for some content and paywalls for high value materials).
I don't inherently have any problems with agents (or humans) ingesting content and using it in work product. I think we just need to accept that the landscape is changing and ensure we think through the reasons why and how content is created and monetized.
Havoc 12 minutes ago
schwartzworld 25 minutes ago
The whole AI bubble is The Emperor's New Clothes, and it feels like more people are finally admitting it.
booleandilemma 27 minutes ago
NetMageSCW an hour ago
asklq 40 minutes ago
Currently politicians don't understand this and listen to the criminals like Amodei, but it will change.
It took a while to deal with Napster etc., but the backlash will come.
Deprogrammer9 13 minutes ago
Pennoungen0 39 minutes ago
onion2k 21 minutes ago
This has been happening since Google launched in 1998. It was probably happening when we all used Hotbot and Altavista. It isn't really an AI problem, save for the fact that the automated production of copycat articles now rewords things a bit.
tayo42 14 minutes ago
I guess AI could have made a better website and done better SEO than him, but that's not really the issue.
tiahura an hour ago
bparsons 24 minutes ago
andy12_ an hour ago
metalman an hour ago
JohnHaugeland an hour ago
Ecys an hour ago
Reading a dictionary and making a sentence is not plagiarism. Cope.
lukasbm 42 minutes ago
analog8374 31 minutes ago
kristofferR 33 minutes ago
There's absolutely nothing new or interesting here that hasn't already been said better by a thousand different random HN commenters.
codepack 22 minutes ago
mapcars an hour ago
Comment deleted
drcongo an hour ago
ciconia an hour ago
Apparently yes.
39 minutes ago
Comment deleted
swader999 40 minutes ago
beej71 42 minutes ago
As someone who thinks humanity would be better off without LLMs, I want the assertion to be true, but I don't think it is.
rigonkulous an hour ago
We built it, because we as humans intrinsically know that information should be free - always - and AI is a way to accomplish this, finally.
Extrinsically, we also have a subset of humans who do not want information to be free, because they desire to profit from the divide between free/non-free information.
I have been thinking a lot about Aaron Swartz lately, and how unjust it is that he was persecuted for doing something that is so commonplace now that it is practically expected behaviour in the AI/ML realms. If he hadn't been targeted for elimination, I wonder just how well his ethos would have perpetuated into the AI age.
kolinko 14 minutes ago
There were people who learned from me, then made their own tutorials and promoted them. It hadn't crossed my mind to complain about that. AI changes very little here.
What really changes things is not people republishing my materials, but people using agents to read my materials, and to get knowledge reformatted into something that they like.
If my slides were published today, they would probably be read verbatim by a handful of humans. The rest would be agents, but I'm ok with that. The business case is the same: I want whatever reads the slides to be encouraged to use my tool. What kind of entity it is, I don't really care (again: from a purely business perspective).