
Recently, I was watching the Film Threat YouTube channel and caught Chris Gore and Alan Ng interviewing filmmaker “Mind Wank” about his cool concept trailer for a movie called Karen, made using Google’s Veo 2 AI. During the interview, Chris made a comment that often comes up in these conversations: he said he was okay with people using AI as long as it was “ethically trained,” even adding, “Use your own models.”
Now, I get where that sentiment comes from, and it sounds reasonable on the surface. But honestly, I believe the very idea of “unethically trained AI,” and by extension “ethically trained AI,” is fundamentally flawed. And since Chris encourages different points of view, I want to unpack why.
My core argument is this: AI learns in much the same way humans do, by observing and processing the vast amount of information available in what we could call the “public square.” The internet, in this modern age, is arguably the largest public square humanity has ever created, a sprawling, dynamic, and accessible global exchange of ideas, images, texts, and sounds.
When AI models train on data scraped from the open web, from websites, databases, and publicly shared content, they are essentially performing a function analogous to how humans learn from conversations overheard in a café, books read in a library, or art seen in a gallery. The AI analyzes patterns, learns language structures, recognizes visual concepts, and identifies relationships within this massive dataset. It builds a complex statistical representation of the information it has consumed.

As famed movie director James Cameron recently put it, perhaps we humans are all just complex Large Language Models ourselves, constantly gathering and processing data sets from our environment to learn and understand the world around us. If it is not unethical for a human being to learn, absorb, and draw upon the wealth of information available in the public square, it is difficult to argue that it is inherently unethical for an AI to do the same.
Furthermore, this kind of learning, whether performed by a human mind or an AI model, operates outside the intended scope of copyright law. The U.S. Constitution grants Congress the power to protect specific creative expressions fixed in a tangible medium: a particular book, a specific song recording, a unique painting. The intent is to promote progress by giving creators temporary exclusive rights to their specific works, not to hinder the fundamental human (or now, computational) act of gaining knowledge, understanding underlying patterns, identifying styles, or extracting concepts. Copyright does not protect ideas, facts, styles, or general techniques. When an AI trains, it extracts these non-copyrightable elements at a structural level; it learns the how and the what across a vast array of examples rather than storing and reproducing exact copies of the original expressive works. This is precisely why a human artist who studies thousands of paintings to develop their own style is not committing copyright infringement. The act of learning from publicly available information respects the ethos of shared knowledge in the public square and, in the context of AI training data, falls outside the scope of copyright protection.