You can't just say "this won't help" and act like your word is law.
Yes, needing to log in to view pages is not the optimal scenario, but the alternative may someday be to just shut down the site. And forcing a login WILL do something to combat the problem. Stop it? Of course not. But it could save the site. You clearly don't understand what you're talking about if you think otherwise.
Aren't you acting as if your word were the law too?
How sure are you that it will do something? How sure are you that bots aren't going to be better in a few weeks or months after implementing this?
Will it really reduce bot crawling, or will it make crawling more profitable? The more profitable it gets, the more incentive there is to find workarounds.
Also, the claim isn't "it could save the site" but "it will save the site". You think it can; I think it will just be a minimal hurdle that gets bypassed at some point.
And the worst part is that you won't even realize that it has been bypassed until you see that the numbers are in the red again.
See, once the bots get better, you're pretty much screwed. You're in the same situation as before, but your users are worse off, and you've already sent them away to those other sites.
And I'm saying this because it has happened before. Every time someone has tried to restrict something, or to forbid others from doing something, people have found ways around it, making the restriction useless. There are plenty of examples of that already. It's those who lack that ability who will be affected the most.
The rate at which such bots could scrape the site would be much slower than it is now, where they just have a giant pool of IPs.
Not to mention, they'd have to pretend to sleep, etc., because a user who stays logged in reading 24/7 is blatantly a bot.
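As a rough sketch of that heuristic (the function name and the threshold are mine, purely illustrative, not anything Batoto actually runs), a detector could flag accounts that never show idle hours in a day:

```python
def looks_like_bot(hourly_requests, min_idle_hours=4):
    """Flag accounts that never 'sleep'.

    hourly_requests: list of 24 page-request counts, one per hour.
    A human reader should have at least a few hours per day with
    zero requests; the threshold of 4 idle hours is made up here.
    """
    idle_hours = sum(1 for count in hourly_requests if count == 0)
    return idle_hours < min_idle_hours

human = [0] * 8 + [12] * 16   # sleeps 8 hours, reads the rest
crawler = [30] * 24           # steady traffic around the clock
```

A crawler would then have to throttle itself into a human-looking activity pattern just to avoid the flag, which is exactly the slowdown being argued for.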
Like I said, it's an arms race. They circumvent, Grumpy finds better ways to detect them. Ad infinitum.
And it's harder for the botters, and more expensive for them too. And all the little fish stop trying, because it isn't worth it anymore, and maybe the big fish are now crawling so slowly that it's not obliterating Grumpy's budget.
*shrug*
That's what an arms race is. Whether to escalate to the next phase or not is up to Grumpy.
The thing is that in an arms race, it's usually more expensive to develop armour that can resist a hit than to develop the weapon that breaks it.
At some point, the measures taken are either so expensive that they cost more than the problem they were meant to solve, or so restrictive that they interfere with normal users too.
It does not follow. Grumpy's problem is continued deep linking costing him $$$ in bandwidth, not "ZOMG Mangafox has copied all our stuff already".
And by definition it cannot be easier to keep crawling if they actually have to change their crawler to do so...
Your points(?) about it being easier for existing crawlers are barely coherent, so I won't try to address them.
However, re. sending them users... sending them users matters in that it would incentivise more crawling. We have agreed on this already.
I'm replying to multiple people here. This goes for the people who think that other sites will stop being able to rip Batoto and use its work for profit.
Finally, something we can actually discuss. I'm interested in suggestions. What do the other sites have that Batoto doesn't? What do you think could attract enough of their users to reduce the viability of mass scraping of Batoto, or at least increase Batoto's income enough that Grumpy can pay the bills for running the site?
Well, for starters, one of the disadvantages Batoto has compared to other sites is that Batoto is forced to comply with DMCA takedowns. That means fewer manga available than on those other sites, which carry more chapters than Batoto does.
That's one of the reasons you don't want your users going to other sites: they already have an incentive to do so. The problem with those other sites is that they're either full of ads or they resize the images. But start fiddling with that yourself and you may regret it.
On the other hand, another way of placing ads without bothering your users too much would be on the page images themselves.
That, of course, would be something Grumpy would have to discuss with:
- Scanlators
- Ad Networks
But the idea would be to insert, at a random position in each chapter, an extra Batoto page with an ad embedded in it, as a jpg/png or whatever.
That way, anyone who downloads that file, whether crawlers or users, would be exposed to the advertisement, even if only to delete it. Better than a blocked ad, lol.
People are already used to scanlator credit pages in their manga, so maybe an ad page wouldn't be too much. And even if it were, well, they'd have to look at the ad anyway before deleting it.
At least, it's like having a visit on that page.
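A minimal sketch of that idea, assuming each chapter is just an ordered list of image filenames (the filenames and the `ad.png` file are hypothetical):

```python
import random

def insert_ad_page(pages, ad_page="ad.png"):
    """Return a copy of the chapter's page list with one ad image
    inserted at a random position after the first page.

    `ad_page` would be a pre-rendered jpg/png with the ad baked into
    the image itself, so it can't be stripped by an ad blocker and it
    travels along with any crawl or download of the chapter.
    """
    position = random.randint(1, len(pages))
    result = list(pages)
    result.insert(position, ad_page)
    return result

chapter = ["001.png", "002.png", "003.png", "004.png"]
with_ad = insert_ad_page(chapter)
```

Randomizing the position per chapter (or per serving) would make it harder for a ripper to strip the ad page automatically, since it isn't always page N.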
Another option is to look at ways of reducing the load: if the crawlers and other people are going to deep-crawl the site anyway, isn't it better to make it easier on the server for them to do so?
Yeah, before you think I'm crazy, think about this. They are going to do it regardless, aren't they? Well, make the files available via other means (like a compressed archive; not sure how much images would actually compress, though) that put a lower load on the server. Instead of them having to load each image to crawl it, have them download a single file.
This last measure is quite risky, as making crawling easier risks increasing the traffic. It's just something I came up with now, and it would need a lot of hammering out (or even discarding); but it goes along the line of reducing the bandwidth costs that come from crawling.
That would also benefit the users, who would have the option of downloading those compressed files (which have an ad in them, btw) and thus reduce the load they put on the server when looking for previous chapters of a manga. And it would be an incentive for them to use the site.
You could also put a limit on the download speed of that compressed file. The idea is that it's an alternative to crawling for those other sites (and users). It's easier for them to use, but also easier for you to control the speed at which it's downloaded and the load on your server.
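A sketch of that throttling, assuming the archive is streamed in fixed-size chunks with a pause after each one (the chunk size and rate are made-up numbers, not any real Batoto setting):

```python
import time

def throttled_chunks(data, chunk_size=64 * 1024, max_bytes_per_sec=256 * 1024):
    """Yield `data` in chunks, sleeping after each one so the
    effective transfer rate stays at or below `max_bytes_per_sec`."""
    delay = chunk_size / max_bytes_per_sec
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]
        time.sleep(delay)

# A web framework's streaming response would consume this generator,
# sending each chunk to the client as it is produced.
```

The same cap applies to everyone fetching the archive, which is the point: a crawler gets the convenient single-file download, but only at a rate the server budget can absorb.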
Going further, you could even set up a distributed network so that users who are downloading a file contribute their bandwidth to upload it to other users (like a torrent would).
The first proposal is more serious, the latter one is pure brainstorming and probably to be discarded.
Edited by zuram, 21 October 2015 - 05:28 AM.