Inspired by the comments on this Ars article, I’ve decided to program my website to “poison the well” when it gets a request from GPTBot.

The intuitive approach is just to generate some HTML like this:

<p>
<!-- Twenty pages of random words -->
</p>

(I also considered just hardcoding twenty megabytes of “FUCK YOU,” but that’s a little juvenile for my taste.)
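A minimal sketch of that random-word page in Python, assuming a small hardcoded word list (`WORDS`, `random_paragraph`, and `garbage_page` are hypothetical names for illustration; a real list would probably come from a dictionary file):

```python
import html
import random

# Hypothetical word list; a real one might be loaded from a dictionary file.
WORDS = ["lantern", "gravel", "seven", "orbit", "velvet", "quietly", "maple", "engine"]


def random_paragraph(word_count, rng=random):
    """Return one <p> element containing word_count randomly chosen words."""
    words = rng.choices(WORDS, k=word_count)
    return "<p>\n" + html.escape(" ".join(words)) + "\n</p>"


def garbage_page(paragraphs=20, words_per_paragraph=200, rng=random):
    """Join several random paragraphs into one HTML fragment."""
    return "\n".join(
        random_paragraph(words_per_paragraph, rng) for _ in range(paragraphs)
    )
```

Passing a seeded `random.Random` as `rng` makes the output repeatable for testing; the default uses the module-level generator, so every request gets different junk.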

Unfortunately, I’m not very familiar with ML beyond a few basic concepts, so I’m unsure if this would get me the most bang for my buck.

What do you smarter people on Lemmy think?

(I’m aware this won’t do much, but I’m petty.)

  • colonial@lemmy.world (OP)
    1 year ago

    Re: CSS and JavaScript being obvious - I’m planning to do this entirely server side, since I control the whole stack.

    Regular users (and good bots) get regular pages, but if a GPTBot user agent makes a request, they just get garbage back. (Obviously this relies on OpenAI not masking the user agent, but if they do that, hopefully bigger webmasters will notice the lack of hits and call them out.)
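A rough sketch of that server-side switch, written as a bare WSGI app. It assumes the scraper's User-Agent header contains the token "GPTBot"; `is_gptbot` and `app` are hypothetical names, and both response bodies are stand-ins for the real page lookup and the generated junk.

```python
def is_gptbot(user_agent):
    """True if the User-Agent header mentions GPTBot (case-insensitive)."""
    return "gptbot" in (user_agent or "").lower()


def app(environ, start_response):
    # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
    if is_gptbot(environ.get("HTTP_USER_AGENT")):
        body = b"<p>placeholder garbage</p>"  # stand-in for generated junk
    else:
        body = b"<p>the real page</p>"  # stand-in for the normal lookup
    headers = [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

For local testing this can be served with `wsgiref.simple_server.make_server("", 8000, app)`. The check runs before any resource lookup, so it costs one substring test per request.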

    I like your idea with the sentence fragments. Because the GPTBot check would happen before I actually look up the requested resource, I think I could combine it with fake links to lead the scraper on a wild goose chase.
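One way the fragments-plus-fake-links idea might look, as a sketch only. `FRAGMENTS`, `fake_path`, and `trap_page` are hypothetical names, and the fragment list is a placeholder. Because the bot check happens before the resource lookup, the random paths never need to exist: any URL the scraper follows just yields another trap page.

```python
import random

# Hypothetical sentence fragments; a real list would be much longer.
FRAGMENTS = [
    "because the river was",
    "seventeen of them never",
    "although nobody had",
    "which is why the lamp",
]


def fake_path(rng, segments=2):
    """Build a random URL path such as /1a2b3c4d/5e6f7a8b."""
    return "/" + "/".join("%08x" % rng.getrandbits(32) for _ in range(segments))


def trap_page(rng, fragments=5, links=3):
    """Return an HTML fragment of stitched sentence fragments plus fake links."""
    text = " ".join(rng.choice(FRAGMENTS) for _ in range(fragments))
    parts = ["<p>" + text + "</p>"]
    for _ in range(links):
        # Each link points at a path that does not exist on the real site.
        parts.append('<a href="%s">%s</a>' % (fake_path(rng), rng.choice(FRAGMENTS)))
    return "\n".join(parts)
```

Calling `trap_page(random.Random())` on every GPTBot request gives each page fresh links, so the crawl never runs out of new URLs.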

    • TootSweet@lemmy.world
      1 year ago

      Yeah, that all makes sense. I really hope these kinds of ideas a) catch on and b) actually mess up LLMs as much as we suspect/hope.