So, by your argument, I get a book out of the library, memorize it, and then I can sit down at my computer, type it in, and then print it. Then I can sell those printed copies. No, you can quote it, discuss it with colleagues and friends, but reproducing it in its entirety will likely end up with you in court. So there are limits to what you can do "at will".
The question is whether AI companies are exceeding fair use by taking a perfect digital copy of an existing work to train their AI, often with the intent to produce works for which the original owner would otherwise be compensated. First, if you look at US Copyright Law and the Fair Use policy (see material from the US site below), you are not legally able to appropriate a work wholesale for the purposes of reproduction and profit. The companies have cited "Fair Use" as the basis for their AI training, but given the potential impact on authors (and others, like photographers, actors, musicians, etc.), I think courts may well find that the wholesale appropriation of works is not Fair Use, especially since it can deprive individuals of compensation for their creative endeavors. The current copyright law did not anticipate AI training, and this is a gray area still to be resolved, but the issue is being taken up now in cases such as New York Times vs OpenAI, Getty Images vs Stability AI, and Andersen v. Stability AI. The list of pending cases can be seen here:
AI Fair Use litigation cases
A recent article by Enterprise AI says: "OpenAI has acknowledged that its programs are trained on publicly available data sets, including copyrighted works. This process involves making copies of the data to be analyzed. However, creating such copies without permission could infringe on copyright holders' rights. OpenAI argues that its training processes constitute fair use and do not involve infringement."
We will see what the courts decide as to whether this is Fair Use. The fact that much of the subsequent use of AI products either costs the end user money or is used to generate revenue for the AI company (e.g. subscriptions or providing other materials) may work against the AI companies, as it is commercial use.
Here is what US Copyright Law says (note: these are abridged points; the entire paragraphs would be too lengthy):
Section 107 of the US Copyright Act provides the statutory framework for determining whether something is a fair use and identifies certain types of uses—such as criticism, comment, news reporting, teaching, scholarship, and research—as examples of activities that may qualify as fair use.
Purpose and character of the use, including whether the use is of a commercial nature or is for nonprofit educational purposes: Courts look at how the party claiming fair use is using the copyrighted work, and are more likely to find that nonprofit educational and noncommercial uses are fair.
Nature of the copyrighted work: This factor analyzes the degree to which the work that was used relates to copyright’s purpose of encouraging creative expression. Thus, using a more creative or imaginative work (such as a novel, movie, or song) is less likely to support a claim of a fair use than using a factual work (such as a technical article or news item).
Amount and substantiality of the portion used in relation to the copyrighted work as a whole: Under this factor, courts look at both the quantity and quality of the copyrighted material that was used. If the use includes a large portion of the copyrighted work, fair use is less likely to be found; if the use employs only a small amount of copyrighted material, fair use is more likely.
Effect of the use upon the potential market for or value of the copyrighted work: Here, courts review whether, and to what extent, the unlicensed use harms the existing or future market for the copyright owner’s original work. In assessing this factor, courts consider whether the use is hurting the current market for the original work.