The internet is infested with bots that people deploy to hammer sites like X with traffic, whether to scrape data or simply to overload them. The puzzles exist to keep those bots out, so they are designed to be hard for computer vision models to solve. That’s why you see tasks like “rotate this object until it’s pointing the right direction”, “move this puzzle piece until it’s in the right position”, or “find all images with a motorcycle”.
You and I have a robust enough world model (and enough manual dexterity) to solve the puzzles quickly, even when the images are noisy or corrupted. Machine learning models tend to be sensitive to the pixel-level patterns of the images they were trained on, which makes them too brittle for these puzzles.
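To make the brittleness point concrete, here is a deliberately tiny sketch in Python. It is not how real vision models or real puzzle systems work. It is a toy classifier that compares raw pixels against two stored templates, and every name in it (`bar`, `TEMPLATES`, `pixel_distance`, `classify`) is made up for illustration. The idea is only to show how a matcher that relies on exact pixel positions can fail on an image a person would recognize instantly.

```python
def bar(orientation, index, size=5):
    """Return a size x size binary image containing one full-length bar."""
    if orientation == "vertical":
        return [[1 if c == index else 0 for c in range(size)] for r in range(size)]
    return [[1 if r == index else 0 for c in range(size)] for r in range(size)]


# The only two images this toy "model" has ever seen (hypothetical training set).
TEMPLATES = {
    "vertical": bar("vertical", 2),
    "horizontal": bar("horizontal", 2),
}


def pixel_distance(a, b):
    """Count the pixels that differ between two images of the same size."""
    return sum(pa != pb for row_a, row_b in zip(a, b) for pa, pb in zip(row_a, row_b))


def classify(image):
    """Pick the label whose template differs from the image in the fewest pixels."""
    return min(TEMPLATES, key=lambda label: pixel_distance(image, TEMPLATES[label]))


print(classify(bar("vertical", 2)))  # the exact training image
print(classify(bar("vertical", 3)))  # the same bar moved one pixel to the right
```

The first call prints `vertical`. The second prints `horizontal`, because the shifted bar overlaps the horizontal template in one pixel and the vertical template in none. A person sees a vertical bar in both images without thinking about it.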