
Batoto becoming registered only?


  • This topic is locked
512 replies to this topic

#141
Kireas

Kireas

    Potato Sprout

  • Members
  • 7 posts

From what I see here, what you are proposing is more harmful to you, this site and 'consumers' than to the guys you're trying to deter. As Uncomfortable Truth has said, rather than using strong countermeasures which hurt more than they help, work on providing a better experience so that people will be drawn to it.

 

But if going private etc. will solve most problems in the long run, then sure.


Edited by Kireas, 21 October 2015 - 04:20 AM.


#142
Anonyan

Anonyan

    Potato Spud

  • Members
  • 16 posts

If based Cake-kun is on board with it, why not?

 

It's not the most preferable situation for the users to be in, as most of us would rather continue lurking in peace.

But if the site is at risk because of the amount of bots, and a login requirement seems like a viable deterrent, a few seconds to log in is a small price to pay for the continued service that, at this point, we often take for granted.

 

Just make sure to add a notification explaining why this change was made, so that lurkers who suddenly find they're required to make an account aren't confused.

 

And hey, if we require login, maybe we could also get around to adding a "show mature content" checkbox, and allowing some of the smuttier stuff. I ain't talking full-on porn or anything, but it's silly when something that's rated R-18 is taken down despite having less nudity/violence than some of the other stuff on here that just happens to get away with it.


From what I see here, what you are proposing is more harmful to you, this site and 'consumers' than to the guys you're trying to deter. As Uncomfortable Truth has said, rather than using strong countermeasures which hurt more than they help, work on providing a better experience so that people will be drawn to it.

Providing a better experience will do absolutely nothing to prevent bots from putting huge loads on the site, which is the issue at hand.


Edited by Anonyan, 21 October 2015 - 04:21 AM.

ayy lmao


#143
Volandum

Volandum

    Potato Spud

  • Donator
  • 18 posts

I want to clarify something. I'm not posting this now because I want to play police, for ego, or anything like that. If it were, I would've made the change long ago. I'm posting this now because my objective has changed, as someone who maintains this site: I want to shave off a few million page hits a day. With great thanks to kenshin, our bandwidth costs don't increase that much with increased traffic. I still maintain my original image source nodes, so it's not a big saving in cost, but that part is completely manageable. The biggest cost (it always has been) is the HTML of this site, processing the pages that need to be served. These run on a farm of really beefy CPU servers with SSDs, and I'm currently looking to see whether it's necessary to purchase another to handle the load. And one of my ad networks hasn't paid me in 3 months. So I'm running in the red.

 

We have over 10,000 comics and over 300,000 chapters. When 100 other crawlers each think they need this content, some of it on a few-minute basis, the numbers REALLY add up. Humans don't do this, because there are follows: you don't need to visit 10,000 comics just to see if there's something new.

 

I was talking with a Pocket Loli admin and he pointed out my ignorance, so apologies if this is stupid. My question is: if these hundred-odd bots want to know which comic/chapter/page pages have changed, what about just providing a log of changed public pages with timestamps? Then they wouldn't need to check every page of every chapter, and you would know in advance which pages they want to check (down to hundreds from millions) and could reinforce those.
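To make that concrete, here is a minimal sketch of such a feed, written as a hypothetical Flask endpoint backed by an SQLite table of changes; the route name and schema are my own invention, not anything Batoto actually exposes:

    from flask import Flask, jsonify, request
    import sqlite3
    import time

    app = Flask(__name__)

    @app.route("/changes")
    def changes():
        # A crawler passes the timestamp of its last visit and gets back
        # only the pages that changed since then, instead of re-checking
        # every page of every chapter of every comic.
        since = int(request.args.get("since", time.time() - 86400))
        db = sqlite3.connect("changes.db")
        rows = db.execute(
            "SELECT url, changed_at FROM changes"
            " WHERE changed_at > ? ORDER BY changed_at LIMIT 1000",
            (since,),
        ).fetchall()
        return jsonify([{"url": u, "changed_at": t} for u, t in rows])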



#144
Anomandaris

Anomandaris

    Potato Spud

  • Members
  • 24 posts

If the problem is only deep crawls, then what about making only old uploads member-only? Most true traffic will certainly be for recent uploads, and speed/availability are the average user's main concerns. Have all uploads within a window of a month or two, or even a year, open to anyone, and make older stuff member-only. This would let you track trawlers of the whole site but does nothing to hurt the average fan looking for their fix. Something like what Crunchyroll is doing. Delaying new uploads would only make people try to find them faster elsewhere.

Yeah, from that point of view Grumpy's original 'limited private' suggestion suddenly makes a lot more sense now that we know deep crawls are the problem.
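For illustration, here is roughly what that gate could look like, as a hypothetical Flask view with an invented 60-day window; load_chapter and render_pages are stand-ins for whatever the site's real code does:

    from datetime import datetime, timedelta
    from flask import Flask, abort, session

    app = Flask(__name__)
    app.secret_key = "change-me"        # required for login sessions
    PUBLIC_WINDOW = timedelta(days=60)  # invented cutoff; tune freely

    @app.route("/chapter/<int:chapter_id>")
    def chapter(chapter_id):
        ch = load_chapter(chapter_id)   # hypothetical DB helper
        # Recent uploads stay open to everyone; only the deep archive,
        # which is what whole-site crawls hammer, requires an account.
        if datetime.utcnow() - ch.uploaded_at > PUBLIC_WINDOW:
            if "user_id" not in session:
                abort(401)              # or redirect to the login page
        return render_pages(ch)         # hypothetical renderer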


Because I wasn't talking about it in the literal sense, but about the relative results.

The result of one is that you get pwned using something that was only supposed to be used against "the bad guys".
The result of the other is that you probably choose to go to a different site while Grumpy continues his arms race against scrapers.
They're really not the same, even in the abstract sense, except in that maybe, sort of, kind of, if you squint really hard... we could say that "backdoors will backfire" and "required logins might backfire".
At which point we have abstracted so far away that we ignore how "backfire" has completely different implications in the two sentences. It's much more useful just to say "required logins might backfire" than to try to draw bizarre analogies.
 

Also, you missed the part where the criminals wouldn't use backdoored tools, thus avoiding the backdoors themselves. That was my point; the increased vulnerability for users is a different issue.

That is not part of the typical debate about mandatory backdoors because discussion is generally centred on hardware or protocol level backdoors (i.e. almost unavoidable if you're not a state-level operator) or server backdoors (if the attacker doesn't even have a server, this is a moot point).
I also don't see it having any real analogue in the discussion about requiring a login.

Let's just accept that you think there's a useful comparison to be made, I think there isn't, and both just move on from that.

 

Yes, unless that site happens to make, let's say, 500 or 1000 users with separate IPs for the purpose of crawling. They make them emulate a human user reading (like waiting a few seconds between pages before downloading them) and so on, and there you go. You ban 1? 100 more come in.

The rate at which such bots could scrape the site would be much slower than it is now, where they just have a giant pool of IPs.
Not to mention, they'd have to pretend to sleep etc because a user that stays logged in reading 24/7 is blatantly a bot :P

Like I said, it's an arms race. They circumvent, Grumpy finds better ways to detect them. Ad infinitum.
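For what it's worth, here's a sketch of the sort of account-level heuristic the next round of detection might use; the thresholds below are invented purely for illustration:

    import time

    WINDOW = 24 * 3600        # look at the last day of requests
    MAX_PAGES_PER_DAY = 3000  # invented: no human reads this much
    MAX_ACTIVE_HOURS = 18     # invented: humans sleep; bots read 24/7

    def looks_like_bot(timestamps):
        """timestamps: request times (epoch seconds) for one account."""
        now = time.time()
        recent = [t for t in timestamps if now - t < WINDOW]
        if len(recent) > MAX_PAGES_PER_DAY:
            return True
        # Count how many distinct hours of the day the account was active.
        active_hours = {int(t // 3600) % 24 for t in recent}
        return len(active_hours) > MAX_ACTIVE_HOURS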
 

Well, imagine all those users going to mangafox... now tell me whether, by multiplying their ad revenue by, idk, 50 or 100%, they wouldn't have a good incentive to make better bots.

The thing is whether it's productive to enter into an arms race with crawlers. As things get bypassed, solutions become harder and more expensive, and put bigger burdens on your legitimate users.

And it's harder for the botters, and more expensive for them too. And all the little fish stop trying, because it isn't worth it anymore, and maybe the big fish are now crawling so slowly that it's not obliterating Grumpy's budget.

*shrug*
That's what an arms race is. Whether to escalate to the next phase or not is up to Grumpy.

Also, remember that existing sites, like mangafox, have already deep-crawled most of the database. That means that even if you set this up, those sites will have an easier time continuing to crawl.

It does not follow. Grumpy's problem is continued deep linking costing him $$$ in bandwidth, not "ZOMG Mangafox has copied all our stuff already".
And by definition it cannot be easier to keep crawling if they actually have to change their crawler to do so...
 

New crawlers may have a harder time setting up (not much harder: as you ban them, they create more users, or they just work more slowly while setting up their site),
but existing ones will have an easier time continuing to do what they do (plus you may have taken competition away from them, yourself included).
Plus, you would send thousands of users their way.

 
Your points(?) about it being easier for existing crawlers are barely coherent, so I won't try to address them.
However, re. sending them users... sending them users matters in that it would incentivise more crawling. We have agreed on this already.
 

If I'm guessing it right, they already use multiple sources so that it's harder to find them.

Of course they do... Grumpy even said so. And if they weren't doing distributed crawls already this topic would never have existed.
 

But just because you've got a problem doesn't mean that any solution you think of is a good one. Nor that it's even possible to find a good solution, or any solution at all.

Who is claiming "any solution is better than nothing" here? You're arguing with thin air.

However, if this goes on and Grumpy ends up way in the red, cyal8r Batoto. Seems like it might be worth working on a solution.

If you think doing nothing is better than requiring logins, that's fine... but doing nothing is clearly no longer a sustainable option.

 

It hurts those sites more to take users away from them by making Batoto better or easier for their users than by making it harder.

Finally, something we can actually discuss. I'm interested in suggestions. What do the other sites have that Batoto doesn't? What do you think could attract enough of their users to reduce the viability of mass scraping of Batoto, or at least increase Batoto's income enough that Grumpy can pay the bills for running the site?

Edited by Anomandaris, 21 October 2015 - 04:39 AM.

Hell is other people.


#145
sneezemonkey

sneezemonkey

    Potato

  • Members
  • 122 posts

Hi I'm the guy responsible for this: http://vatoto.com/forums/topic/22199-manga-onlineviewer-fluid-mode/

 

Just like to point out that they can just set timers to delay the bots' requests to make them look less bot-like. Considering that these bots can also use cookies, they can just use multiple IPs/accounts and delegate a list of series to each, so it wouldn't look as suspicious. Sure, making some chapters members-only will slow the rip speed, but it won't actually delay them by however long you want to delay them for, as they "look" human. Now suppose you also add a captcha; then they'd just pay for a human farm. I'm not sure that the goal of delaying the rippers by even a week is feasible, let alone stopping them outright.

 

It'd be a lot more effective to put a massive Batoto watermark on the pages, or to include troll pages for the bots to rip and, when they're done, revert to a clean version. At least then you'd get more publicity and hopefully get more people to use this site or scanlator sites instead. Obviously this is only for new releases.
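A rough sketch of the watermarking half of that, using Pillow and assuming pages are stored as JPEGs; the paths and watermark text are placeholders:

    from PIL import Image, ImageDraw, ImageFont

    def watermark_page(path_in, path_out, text="Read this on Batoto!"):
        # Stamp a large translucent watermark across the page, so any
        # ripped copy advertises the original site.
        page = Image.open(path_in).convert("RGBA")
        layer = Image.new("RGBA", page.size, (0, 0, 0, 0))
        draw = ImageDraw.Draw(layer)
        font = ImageFont.load_default()  # use a big TTF in practice
        w, h = page.size
        draw.text((w // 4, h // 2), text, fill=(255, 255, 255, 128), font=font)
        Image.alpha_composite(page, layer).convert("RGB").save(path_out, "JPEG")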

 

On the flip side, the server load would be less intense, as the bots aren't ripping as many pages at once, which is what Grumpy is trying to achieve.

 

Basically, if the goal is to lessen server load (which in my opinion is the only achievable goal), then the question becomes how this can be achieved most cost-effectively. It may be good to explore ways other than a semi-private model to limit bot rip speeds.

 

There may be an alternative method that can slow all bots, not just the deep crawls, while maintaining the openness of the current model, though it may turn out that the only feasible way is a combination of a semi-private model and other countermeasures as necessary. I'd just caution that limiting any content to members only may actually hurt the site in the long run. It may seem easy to register, but never underestimate the laziness of the average user.


Edited by sneezemonkey, 21 October 2015 - 05:06 AM.

Tired of halved double page spreads? Want to read manga like an actual tankoubon? Just want to load all pages in a chapter at once?

Try Manga OnlineViewer Fluid Mode+ Now!!!!


#146
kiigu

kiigu

    Potato Sprout

  • Members
  • 2 posts

I didn't read the whole page, so apologies if this suggestion has already been made.

 

But surely you could detect a bot and ban it from further access? I mean, if you detect a bot loading your pages every few milliseconds, then something is wrong, and you can blacklist it and redirect it to a blank page. Of course, Googlebot and other legitimate crawlers have identifiers that you can whitelist.
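That's essentially rate-based blacklisting with a user-agent whitelist. A toy sketch of the idea, with invented thresholds; note that a real deployment would verify Googlebot via reverse DNS, since user agents are trivially spoofed:

    import time
    from collections import defaultdict, deque

    WHITELISTED_AGENTS = ("Googlebot", "bingbot")  # verify via reverse DNS in practice
    MAX_HITS = 30   # invented: max requests per IP...
    WINDOW = 10     # ...per 10-second window

    hits = defaultdict(deque)
    blacklist = set()

    def allow_request(ip, user_agent):
        """Return False if this request should get the blank page."""
        if any(agent in user_agent for agent in WHITELISTED_AGENTS):
            return True
        if ip in blacklist:
            return False
        now = time.time()
        q = hits[ip]
        q.append(now)
        while q and now - q[0] > WINDOW:
            q.popleft()
        if len(q) > MAX_HITS:
            blacklist.add(ip)  # persist and expire this in real life
            return False
        return True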



#147
zuram

zuram

    Potato Spud

  • Members
  • 19 posts

True enough, Grumpy could always employ a login system that allows people to use their Google, Twitter, Facebook, etc. credentials (which I think he already does) to basically one-click it. That shouldn't be a major issue, and the profiteering isn't that much of a concern to me. What I hate seeing is people abusing a free service and then, when said free service wants to take measures to protect itself, everyone freaking out.

 

You can alternatively register using twitter.

 

Well, one of the reasons people don't register on sites is to avoid leaving a trail of what they do. So yes, those are a "single click", but in exchange, you've got a trail on you. I, for example, would never register on a site that requires it (as opposed to merely offering the option). I would use a fake account, though, but it's a pain.

 

 

People are abusing a free service, sure. But why should those who aren't abusing it suffer the consequences for those who are?

 

People don't freak because the site is trying to protect itself, but because they think it's a bad idea.

 

Just because it's free doesn't mean that all the ideas are good.

 

 

See this: it won't stop crawlers, it won't stop content appearing on other sites, and it will deter people from using this site and make things harder. Traffic will go to those other sites, which will make a bigger profit from using this site's content.

 

As I've said, the sites you hate most (mangafox, mangahere and such) have already crawled the database and they only need to keep up to date with the daily releases, something they can do without triggering any alarms.

 

Even if you, as a registered user, don't care about being required to log in, knowing that other sites that take from Batoto might be making more money should piss you off.

 

 

Now imagine this: more of your readers using mangafox to read your releases (in lower quality, at that) instead of Batoto because they don't want to register. Mangafox admins, of course, rubbing their hands in glee.

 

Sure, it's their loss. But their loss is making other people even richer.

 

With your hard work.



#148
elforte

elforte

    Potato Sprout

  • Donator
  • 2 posts

I'm curious -- what exactly would be broken by this change? I'm not too aware of what other developers may be using this site for.


Edited by elforte, 21 October 2015 - 04:52 AM.


#149
Anonyan

Anonyan

    Potato Spud

  • Members
  • 16 posts

[...]See this: it won't stop crawlers,[...]

 

[...]As I've said, the sites you hate most (mangafox, mangahere and such) have already crawled the database and they only need to keep up to date with the daily releases, something they can do without triggering any alarms.[...]

You can't just say "this won't help" and act like your word is law.

 

Yes, needing to log in to view pages is not the optimal scenario, but the alternative may someday be to just shut down the site. And forcing a login WILL do something to combat the problem. Stop it? Of course not. But it could save the site. You clearly don't understand what you're talking about if you think otherwise.


ayy lmao


#150
gigades

gigades

    Potato Sprout

  • Members
  • 5 posts
In my opinion, if you go private, you'll permanently lose a lot of readers within a few days, so it's risky to even try. The cost of upgrading your hardware may well be less than the revenue you'd lose.
If you want to prevent the deep crawls from eating up CPU, I think the better solution is to configure your server to cache the dynamic pages that consume a lot of CPU, with a cache duration that won't delay new content too much.
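A minimal sketch of that kind of short-lived caching, as a decorator on a hypothetical expensive view; the 60-second TTL is an invented trade-off between CPU savings and content freshness:

    import time
    from functools import wraps

    def ttl_cache(seconds):
        """Cache a function's rendered output for a short, fixed window."""
        def decorator(fn):
            store = {}
            @wraps(fn)
            def wrapper(*args):
                now = time.time()
                hit = store.get(args)
                if hit and now - hit[0] < seconds:
                    return hit[1]  # serve the cached render
                result = fn(*args)
                store[args] = (now, result)
                return result
            return wrapper
        return decorator

    @ttl_cache(60)  # one expensive render per comic per minute, not per hit
    def comic_page(comic_id):
        return render_comic(comic_id)  # hypothetical expensive renderer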

#151
RealHounder2014

RealHounder2014

    Potato Spud

  • Members
  • 32 posts

Too bad there isn't a way to redirect bots to a fake download that traps them into downloading the same page over and over in an infinite loop. If you could perfect bot detection, this would be something particularly nasty with which to return fire.
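There is a classic trick along these lines, the tarpit: drip-feed a detected bot an endless, useless response so the one request it made goes nowhere. A hedged sketch, with is_suspected_bot and send_real_page as hypothetical hooks; the usual caveat is that each tarpitted connection still occupies a server slot, which is why this is often done at the proxy layer instead:

    import time
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/download/<page_id>")
    def download(page_id):
        if not is_suspected_bot():          # hypothetical detection hook
            return send_real_page(page_id)  # hypothetical normal path
        def drip():
            # Feed the bot one meaningless byte per second, forever.
            while True:
                yield b"\x00"
                time.sleep(1)
        return Response(drip(), mimetype="image/jpeg")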



#152
Grumpy

Grumpy

    RawR

  • Administrators
  • 4,078 posts
  • LocationHere of course!

I think the better solution is to configure your server to cache the dynamic pages that consume a lot of CPU, with a cache duration that won't delay new content too much.

I've been doing pretty much that for 5 years now. It doesn't help that the number of aggregators in the world is on the rise, and only so much can be cached.

 

Too bad there isn't a way to cause bots to be redirected to a fake download which traps them into downloading the same page over and over in an infinite loop. If you could perfect bot detection, this would be something particularly nasty with which to return fire.

We actually did that with some degree of success against quite a few of them, like making the entire chapter's contents the Batoto insert. As for the ones who were bold enough to hotlink our images, they'd get just the Batoto logo all over their site. We made them take fake comics and fake info. Unfortunately, they notice, and it's removed within a day, so most people probably haven't seen it. It's funny, but I think funny is where it ends.



#153
zuram

zuram

    Potato Spud

  • Members
  • 19 posts

You can't just say "this won't help" and act like your word is law.

 

Yes, needing to log in to view pages is not the optimal scenario, but the alternative may someday be to just shut down the site. And forcing a login WILL do something to combat the problem. Stop it? Of course not. But it could save the site. You clearly don't understand what you're talking about if you think otherwise.

 

Aren't you acting as if your word was the law too?

 

How sure are you that it will do something? How sure are you that bots aren't going to be better in a few weeks or months after implementing this?

 

Will it really reduce bot crawling, or will it make crawling more profitable? The more profitable it gets, the more incentive there is to find workarounds.

 

 

Also, what you need isn't "it could save the site" but "it will save the site". You think it can; I think it will be just a minimal hurdle that gets bypassed at some point.

 

And the worst part is that you won't even realize that it has been bypassed until you see that the numbers are in the red again.

 

 

See, once the bots get better, you're pretty much screwed. You're in the same situation as before, but your users are worse off, and you've just sent them away to those other sites.

 

 

And I'm saying this because it has happened before. Every time someone has tried to restrict something, or to forbid others from doing something, people have found ways around it, making it useless. There are plenty of examples of that already. And it's those who don't have that ability who are affected the most.

 

 

 

 

The rate at which such bots could scrape the site would be much slower than it is now, where they just have a giant pool of IPs.
Not to mention, they'd have to pretend to sleep etc because a user that stays logged in reading 24/7 is blatantly a bot :P

Like I said, it's an arms race. They circumvent, Grumpy finds better ways to detect them. Ad infinitum.
 
And it's harder for the botters, and more expensive for them too. And all the little fish stop trying, because it isn't worth it anymore, and maybe the big fish are now crawling so slowly that it's not obliterating Grumpy's budget.

*shrug*
That's what an arms race is. Whether to escalate to the next phase or not is up to Grumpy.

 

The thing is that in an arms race, it's usually more expensive to develop armour that can resist a hit than to develop the weapon that breaks that armour.

 

At some point, the measures are either so expensive that implementing them costs more money than the problem did, or so restrictive that, well, they mess with even your normal users.

 

 

 

It does not follow. Grumpy's problem is continued deep linking costing him $$$ in bandwidth, not "ZOMG Mangafox has copied all our stuff already".
And by definition it cannot be easier to keep crawling if they actually have to change their crawler to do so...

 

Your points(?) about it being easier for existing crawlers are barely coherent, so I won't try to address them.
However, re. sending them users... sending them users matters in that it would incentivise more crawling. We have agreed on this already.

 

I'm answering multiple people here. This goes for the people who think that other sites will stop being able to rip Batoto and use its work for profit.

 

 

 

Finally, something we can actually discuss. I'm interested in suggestions. What do the other sites have that Batoto doesn't? What do you think could attract enough of their users to reduce the viability of mass scraping of Batoto, or at least increase Batoto's income enough that Grumpy can pay the bills for running the site?

 

Well, for starters, one of the disadvantages Batoto has compared to other sites is that Batoto is forced to follow DMCA takedowns. That means less manga availability: those other sites keep a higher availability of chapters than Batoto.

 

That's one of the reasons you don't want your users going to other sites: they already have an incentive to do so. The problem with those other sites is that they are either full of ads or they resize images. But start fiddling with that yourself and you may regret it.

 

 

On the other hand, another way of placing ads without bothering your users too much would be on the page itself.

 

That, of course, would be something that Grumpy would have to discuss with:

 

- Scanlators

- Ad Networks

 

But the idea would be that, at a random position, you insert a Batoto page with an ad embedded in it, as a JPG/PNG or whatever.

 

That way, anyone who downloads such a file, whether crawlers or users, would be exposed to that advertisement, even if only to delete it. Better than a blocked ad, lol.

 

People are already used to having scanlator credit pages in their manga, so maybe an ad page wouldn't be too much. And even if it were, well, they would just delete it outright, having seen the ad nonetheless.

 

At least, it's like having a visit on that page.
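A sketch of the insertion half of the idea, assuming a chapter's page list is available server-side; every name here is illustrative:

    import random

    def pages_with_ad(page_urls, ad_page_url):
        """Insert one ad page at a random position in a chapter's page list.

        Because the ad is just another page image, it travels with the
        chapter: rippers and downloaders get it too, ad blockers or not.
        """
        pages = list(page_urls)
        pages.insert(random.randint(1, len(pages)), ad_page_url)  # never before page 1
        return pages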

 

 

 

Another option to reduce the load: well, if the crawlers and other people are going to deep-crawl the site anyway, isn't it better to make it easier for them to do so?

 

Yeah, before you think I'm crazy, think about this: they are going to do it regardless, aren't they? Well, make the files available via other means (like a compressed archive; not sure how much you'd be able to compress them, though) that put a lower load on the server. Instead of them having to load each image to crawl a chapter, have them download a single file.

 

This last measure is quite risky, as it makes crawling easier and thus risks increasing traffic. It's just something I came up with now, and it would need a lot of hammering out (or even discarding), but it's in the spirit of reducing the bandwidth costs that come from crawling.

 

 

That would also benefit the users, who would have the option of downloading those compressed files (which have an ad in them, btw), thus reducing the load they put on the server when going through previous chapters of a manga. And it would be an incentive for them to use the site.

 

You could also put a limit on the download speed of that compressed file. The idea is that it's an alternative to crawling for those other sites (and users). It's easier for them to use, but it's also easier for you to control the speed at which it is downloaded and the load it puts on your server.
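A sketch of pre-building such an archive, assuming pages sit on disk as JPEGs; a CBZ is just a ZIP with a different extension, and since JPEGs barely compress further, storing them uncompressed avoids wasted CPU:

    import zipfile
    from pathlib import Path

    def build_chapter_archive(page_dir, out_path):
        # Bundle one chapter's pages (including the embedded-ad page)
        # into a single file: one request instead of dozens.
        pages = sorted(Path(page_dir).glob("*.jpg"))
        with zipfile.ZipFile(out_path, "w", zipfile.ZIP_STORED) as zf:
            for page in pages:
                zf.write(page, arcname=page.name)

    build_chapter_archive("chapters/12345", "archives/12345.cbz")  # invented paths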

 

 

For some files, you could even set up a distributed network so that the users downloading them contribute their bandwidth to upload to other users (like a torrent would).

 

 

 

The first proposal is more serious; the latter is pure brainstorming and probably to be discarded.


Edited by zuram, 21 October 2015 - 05:28 AM.


#154
Shad

Shad

    Potato Sprout

  • Members
  • 7 posts

Not entirely sure about this move towards becoming private. I certainly agree with how Batoto operates, which is the reason I made an account in the first place, but it could dissuade a lot of the guests you get from coming here. I guess you could technically do a test run for a certain period of time to see whether it works and how much traffic comes through, but all I can say is that it's more of a delaying tactic, a gamble with high risk and cost, rather than an actual countermeasure that could pay off, as you probably know. Honestly, I think the problem lies more with people than with going private. If more people were convinced to make accounts and stop visiting the sites that are abusing you, then it could work, but that is one hell of a long shot, since those sites technically offer more manga and people aren't as ethically inclined anymore. It is certainly something, but you should probably work on the idea more or come up with a better countermeasure. The things I think Batoto really has going for it right now are that the ads don't interfere as much and that you guys aren't screwing people over.



#155
Anonyan

Anonyan

    Potato Spud

  • Members
  • 16 posts

words




ayy lmao


#156
sneezemonkey

sneezemonkey

    Potato

  • Members
  • 122 posts

I'd just like to ask: is there currently a limit to how many requests an IP can send in a given period of time?


Tired of halved double page spreads? Want to read manga like an actual tankoubon? Just want to load all pages in a chapter at once?

Try Manga OnlineViewer Fluid Mode+ Now!!!!


#157
Grumpy

Grumpy

    RawR

  • Administrators
  • 4,078 posts
  • LocationHere of course!

I'd just like to ask: is there currently a limit to how many requests an IP can send in a given period of time?

Yes, there is. But as already mentioned, they use multiple IPs to stay under the restrictions.



#158
Pervy Araragi

Pervy Araragi

    Potato Spud

  • Members
  • 16 posts
  • LocationA place you don't expect

This is just a question, and nothing more:

Is it possible to make it so that downloading the pages from Batoto gives placeholder pages and not the real ones?

 

To be honest, I don't think it's possible, but if it is, then I think that's the best way to go about it. Although it would be inconvenient for users who want to download the chapters.
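The closest common approximation is Referer-based hotlink protection: serve the real image only when the request looks like it came from the reader page. A hedged sketch (route and file names invented); as the rest of the thread notes, headers are trivial to fake, so this is a speed bump rather than a wall:

    from flask import Flask, request, send_file

    app = Flask(__name__)

    @app.route("/pages/<page_id>.jpg")
    def page(page_id):
        # Browsers viewing the reader send a Referer from the reader page;
        # a naive downloader fetching image URLs directly does not.
        referer = request.headers.get("Referer", "")
        if "/reader" not in referer:
            return send_file("placeholder.jpg")
        return send_file(f"pages/{page_id}.jpg")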




#159
Sporkaganza

Sporkaganza

    Potato Sprout

  • Members
  • 3 posts

I'm not morally opposed to Batoto becoming private, even completely, but I'm not sure it's the best option for the site. This is just based on gut feeling, but I think even a figure of 90% is underestimating just how big the proportion of lurkers is - it's probably more like 95-97%. I feel you'd lose a ton of money if you were to become 100% private and wouldn't be able to run the site anymore.

 

I agree that partially private is the best option, but I understand it may not be enough. That said, before you resort to making it completely private you should try out putting either older or newer chapters behind a membership wall and see if that alleviates the problem. If it does, great, and if it doesn't, then you can consider going fully private.



#160
Nozomi

Nozomi

    Potato Spud

  • Members
  • 12 posts
It's pretty clear that the main pragmatic concerns expressed here by Grumpy are rooted in server-related costs, as others have pointed out. Various posters have suggested possible solutions, many of which seem to have already been implemented.

Feedback not immediately related to server costs has primarily touched on concerns that privatizing via required accounts, etc., could "kill the site", in the sense that casual browsers in particular would go elsewhere where it is "easier to get their fix". Furthermore, the ad revenue generated by these casual browsers' traffic is also brought up as a potential loss for bato.to and a possible occasion that might hasten bato.to's "death".

I can only agree with those who have said that they will support whatever decisions are made and will continue to stay with the site as a member, but who have encouraged the decision to be made carefully, after deliberation and with as many hard facts as possible to weigh.

In terms of suggestions... I liked the idea of continuing to develop bato.to (privatized or not) to make the "case" for why this site should be used over other sites, ideally for both users and scanlation groups. To give an example already used, I echo others' opinions of really liking the "My Follows" option. I find it really useful and convenient, and it is one of the things I like best about this place. The clean look of the site and unobtrusive ads are also pluses. Areas like these may be a good place to continue focusing attention.
"It is a joyful thing indeed to hold intimate converse with a man after one’s own heart, chatting without reserve about things of interest or the fleeting topics of the world; but such, alas, are few and far between."

- Yoshida Kenko (1283-1350), Tsurezure-Gusa (1340)