Training a model well takes a lot of time. So you need to pick a cutoff date for the data, and then focus on training and validating the model, if you don’t want it to generate absolute gibberish.
Also, a lot of stuff online after 2021 is already computer-generated, and training on that tends to compound errors and is generally bad for the model. There can be legal issues too.
I also think the uptick in misinformation during the pandemic was so bad that training a model on that data would get you an anti-vax, neo-Nazi, QAnon, wumao, Russian troll bot.