
Batoto becoming registered only?


  • This topic is locked
512 replies to this topic

#121
ZdrytchX

    Potato Spud

  • Members
  • 40 posts

In my opinion, this will reduce the views on this site, but it would most certainly prevent some people from copy-pasting to other sites.

If I recall correctly, I think FoolRulZ had some sort of GIF protector where you had to wait for it to disappear (two frames) before being able to view the image, which kind of prevents auto-copy bots I guess.



#122
draconins

    Potato Sprout

  • Donator
  • 5 posts

How difficult would it be to implement a simple checkbox to enable/disable anonymous viewing/browsing, should the scanlator choose to?




#123
Anomandaris

    Potato Spud

  • Members
  • 24 posts

This is like the debate with backdooring encryption.

How so? The backdooring encryption debate can be summed up as "The government wants backdoors, but even if we could trust them not to abuse the privilege, the backdoors would eventually be found and abused for great harm by others"

That doesn't sound anything like "Require user authentication to reduce bot traffic".
 

- Crawlers find workarounds, sooner or later they will.

They'll only do that if the payoff outweighs the cost of doing so (which can be any combination of time, money, inconvenience, technical knowledge, etc.).

Grumpy is looking for ways to make the payoff not worth the cost. Requiring logins would give him more ways to raise the cost (much easier to identify and ban bots if they have to log in, lots more options for how to do it too), although obviously it runs the risk of increasing the payoff for bot writers too (by increasing their traffic and thus their ad revenue).

Bear in mind the goal is not to completely prevent scraping, just to greatly reduce it.
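 
To illustrate the "much easier to identify and ban" part, here's my own rough sketch (not anything Grumpy has said he actually uses): once every page request is tied to an account, even a dumb sliding-window counter goes a long way, because an account that wants to keep its follows can't rotate the way an IP can.

    # Rough sketch: flag accounts whose request rate no human reader would sustain.
    # The window and threshold are made-up numbers, purely for illustration.
    import time
    from collections import defaultdict, deque

    WINDOW_SECONDS = 600        # look at the last 10 minutes
    MAX_PAGES_PER_WINDOW = 400  # ~40 pages/minute sustained is not a human reading

    recent_hits = defaultdict(deque)  # account_id -> timestamps of recent page requests

    def looks_like_bot(account_id, now=None):
        """Record one chapter-page request and report whether the account exceeds the threshold."""
        now = time.time() if now is None else now
        hits = recent_hits[account_id]
        hits.append(now)
        while hits and now - hits[0] > WINDOW_SECONDS:
            hits.popleft()
        return len(hits) > MAX_PAGES_PER_WINDOW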

Edited by Anomandaris, 21 October 2015 - 02:48 AM.

Hell is other people.


#124
zuram

    Potato Spud

  • Members
  • 19 posts

No, not that, but the part where "criminals will find workarounds around backdoors by using other tools that are even safer, while normal users will have their stuff backdoored".

 

What I meant is that your action is unlikely to have any effect on your intended targets, because those happen to be the ones that have the means to bypass it.

 

But in exchange, you just put another hurdle on your users. It's a minor one, but little by little, your users (even if they are ghosts) will get bothered by it.

 

 

And then, they will go to the crawlers' sites increasing their ad revenue, giving them more incentives to make their bots better.



#125
roro-son

    Potato Sprout

  • Members
  • 7 posts
  • Location: Canada

I think making Batoto members-only is a good idea. Making an account is so easy! If one truly wants to read scanlated manga on this site, then the least one can do is simply register... right? However, I would like to express my discontent with an invite-only system (if that was ever an option on the table for you).



#126
Harshrox3

    Potato Spud

  • Contributor
  • 20 posts
  • Location: India

If we are going the hard way, why not troll new chapters after some pages (since nobody wants an incomplete chapter)? Like, for a 40-page chapter, after 10 pages (1/4 of the whole release) it shows:

 

"For more, log in to Batoto" or "ONLY MEMBERS CAN ACCESS FROM HERE"

 

That way, those aggregators using bots will have to either follow the rules or upload half-baked chapters, and those without bots just get hit with "we need to log in to view it in good quality" or have to wait a while.

 

Since everybody is talking about a 1-2 week delay, I'll go along with that, though I think whether to delay or not should be up to the uploaders' consent when they upload a chapter.


Edited by Harshrox3, 21 October 2015 - 03:06 AM.

The opinions expressed by this user are solely their own and do not express the views of Batoto and its staff.

#127
Grumpy

    RawR

  • Administrators
  • 4,078 posts
  • Location: Here of course!

Well, this is certainly the hottest announcement topic.

 

I want to clarify something. I'm posting this now, and here, not because I want to play police, for ego, or anything like that. If that were the case, I would've made this change a long time ago. I'm posting this now because my objective as someone who maintains this site has changed: I want to shave off a few million page hits a day. With great thanks to kenshin, our bandwidth costs don't increase that much with increased traffic. I still maintain my original image source nodes, so it's not a big shave-off in cost, but that part is completely manageable. The biggest cost (it always has been) is the HTML of this site: processing the pages that need to be served. These run on a farm of really beefy CPU servers with SSDs, and I'm currently looking at whether it's necessary to purchase another to handle the load. And one of my ad networks hasn't paid me in 3 months. So I'm running in the red.

 

We have over 10,000 comics, and over 300,000 chapters. When 100 other crawlers think they need this content, some of them on a few-minute basis, the numbers REALLY add up. Humans don't do this, because there are follows. You don't need to visit 10,000 comics just to see if there's something new.
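 
As a rough back-of-envelope (the crawl frequencies below are illustrative guesses on my part; only the comic, chapter, and crawler counts are the real ones from above):

    # Back-of-envelope of what the crawls cost in daily HTML hits.
    # Only the first three numbers are real; the frequencies are illustrative guesses.
    comics = 10_000
    chapters = 300_000
    crawlers = 100

    # Say each crawler re-walks every series page twice a day looking for updates...
    series_checks_per_day = crawlers * comics * 2              # 2,000,000 hits/day

    # ...and a few of them deep-crawl every chapter (about 20 page views each) once a week.
    deep_crawlers = 3
    pages_per_chapter = 20
    deep_crawl_per_day = deep_crawlers * chapters * pages_per_chapter // 7   # ~2,570,000 hits/day

    print(series_checks_per_day + deep_crawl_per_day)          # several million HTML hits a day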

 

There are a number of anti-crawl features on this site already, all of which I tried to make so that they don't hinder normal users at all. They have caught a few real people using download scripts too. But it's insufficient; it's too lightweight. Pretty much since year 1 of Batoto, other crawlers have been using IP-distributed crawls. Without further tracking tools, they're just not possible for me to track.

 

Hitting some of the new chapters is less of a concern. It's the deep crawl that concerns me.


Grumpy is looking for ways to make the payoff not worth the cost. Requiring logins would give him more ways to raise the cost (much easier to identify and ban bots if they have to log in, lots more options for how to do it too), although obviously it runs the risk of increasing the payoff for bot writers too (by increasing their traffic and thus their ad revenue).

That's precisely it.



#128
Harshrox3

    Potato Spud

  • Contributor
  • 20 posts
  • Location: India

I get most of what grumpy is saying

 

but can anyone explain what a deep crawl is... (although I do have some idea)


The opinions expressed by this user are solely their own and do not express the views of Batoto and its staff.

#129
Anomandaris

    Potato Spud

  • Members
  • 24 posts

No, not that, but the part where "criminals will find workarounds around backdoors by
using other tools that are even safer, while normal users will have their stuff backdoored".

It does not follow. The two situations are still clearly not 'like' each other.
In the case of backdoors, it gives the criminals a method to get (harmful) access that they would not have had otherwise.
In the case of requiring a login, it's harder to write bots, and harder for the bots to avoid detection, and they will get banned more often.

In the case of backdoors, users are victims of the attackers.
In the case of required login, users suffer minor inconvenience or read elsewhere, while the scrapers suffer significant inconvenience with the possibility of increasing their monetary gains.

Anyway, clearly I don't think a comparison to mandatory encryption backdoors is a good way to discuss this topic.
 

What I meant is that your action is unlikely to have any effect on your intended targets, because those happen to be the ones that have the means to bypass it.

I disagree. Login itself can be handled fairly easily, but it makes bots much easier to detect and ban.
 

But in exchange, you just put another hurdle on your users. It's a minor one, but little by little, your users (even if they are ghosts) will get bothered by it.

Somewhat agree. 
 

And then, they will go to the crawlers' sites increasing their ad revenue, giving them more incentives to make their bots better.

I expect this would happen, yeah.

Basically what all this boils down to is whether the increased ad revenue (driving them to make better bots) outweighs the increased difficulty of writing bots or not, and if not, does that benefit outweigh the inconvenience to users caused by having to log in.


I'm not against a required login, but I'm not entirely for it either, because I think there are alternative strategies that should be looked into first.
E.g.
  • group options, so some chapters from some groups may require login while others may choose not to.
  • better bot detection and blocking without requiring login (rate limiting, anti-bot cookies, etc.) on top of whatever Grumpy already does
  • cooldown period (maybe chosen by the group) for chapters, where they require login before X days have passed (rough sketch below)
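 
A rough sketch of how the group option plus cooldown could fit together (the settings and field names are entirely made up on my part; this is obviously not how Batoto is actually built):

    # Sketch: decide whether viewing a chapter requires login, from per-group settings.
    # The settings dict and its keys are hypothetical, purely to illustrate the idea.
    from datetime import datetime, timedelta

    def login_required(chapter_uploaded_at, group_settings, now=None):
        now = now or datetime.utcnow()
        if group_settings.get("members_only"):                  # group opted into login-only chapters
            return True
        cooldown_days = group_settings.get("cooldown_days", 0)  # 0 = no cooldown, open to everyone
        return now - chapter_uploaded_at < timedelta(days=cooldown_days)

    # Example: a group with a 14-day cooldown; a 3-day-old chapter would still need a login.
    group = {"members_only": False, "cooldown_days": 14}
    print(login_required(datetime.utcnow() - timedelta(days=3), group))   # True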

Edited by Anomandaris, 21 October 2015 - 03:23 AM.

Hell is other people.


#130
Grumpy

    RawR

  • Administrators
  • 4,078 posts
  • Location: Here of course!

I get most of what grumpy is saying

 

but can anyone explain what a deep crawl is... (although I do have some idea)

Crawling all of it, not just portions of it.



#131
VigorousJammer

    Potato Spud

  • Contributor
  • 17 posts
  • Location: Ronkonkoma, NY

I like the idea of anti-crawler stuff if it will help the site overall, but I still hope there's a way for me to download the comics, as I vastly prefer reading offline, in ComicRack, using a two-page layout...
I currently do this downloading using an external app, which I'm guessing will no longer be usable once this change takes effect.

Perhaps you could offer a batch download option on the site itself which would, of course, also be private / for members only? Or is that somewhat complicated, legally?


Edited by VigorousJammer, 21 October 2015 - 03:24 AM.

::End of Transmission::


#132
Anomandaris

    Potato Spud

  • Members
  • 24 posts

There are a number of anti-crawl features on this site already, all of which I tried to make so that they don't hinder normal users at all.
They have caught a few real people using download scripts too. But it's insufficient; it's too lightweight.

Amusingly, there's a bunch of totally naive crawlers on GitHub that target Batoto and are essentially just glorified download scripts.
 

Pretty much since year 1 of Batoto, other crawlers have been using IP-distributed crawls.
Without further tracking tools, they're just not possible for me to track.

Unfortunately IP pooling/distribution tools for crawlers are now ridiculously easy to use.
Hell, you can even write a non-distributed crawler and just point it at a distribution service that will do it for you.
I'm sure you knew that already, though.
It would be much more costly for scrapers if they had to run a headless, JS-capable browser to scrape effectively, but unfortunately, like you said in your first post, one of the good things about Batoto is that it doesn't need any of that. Which I really appreciate, fwiw, but it sucks that it makes things easier for botters.

 

I get most of what grumpy is saying

but can anyone explain what a deep crawl is... (although I do have some idea)

In this context, he means that bots crawling large chunks of the entire site are the problem, as opposed to bots that just grab the latest chapters. This is because crawling the whole damn site repeatedly costs Grumpy a ton of actual money in server resources to handle the bots' requests.

Edited by Anomandaris, 21 October 2015 - 03:34 AM.

Hell is other people.


#133
arimareiji

    Fingerling Potato

  • Donator
  • 61 posts

Well, this is certainly the hottest announcement topic.

 

I want to clarify something. I'm posting this now, and here, not because I want to play police, for ego, or anything like that. If that were the case, I would've made this change a long time ago. I'm posting this now because my objective as someone who maintains this site has changed: I want to shave off a few million page hits a day. With great thanks to kenshin, our bandwidth costs don't increase that much with increased traffic. I still maintain my original image source nodes, so it's not a big shave-off in cost, but that part is completely manageable. The biggest cost (it always has been) is the HTML of this site: processing the pages that need to be served. These run on a farm of really beefy CPU servers with SSDs, and I'm currently looking at whether it's necessary to purchase another to handle the load. And one of my ad networks hasn't paid me in 3 months. So I'm running in the red.

 

We have over 10,000 comics, and over 300,000 chapters. When 100 other crawlers think they need this content, some of them on a few-minute basis, the numbers REALLY add up. Humans don't do this, because there are follows. You don't need to visit 10,000 comics just to see if there's something new.

 

There are a number of anti-crawl features on this site already, all of which I tried to make so that they don't hinder normal users at all. They have caught a few real people using download scripts too. But it's insufficient; it's too lightweight. Pretty much since year 1 of Batoto, other crawlers have been using IP-distributed crawls. Without further tracking tools, they're just not possible for me to track.

 

Hitting some of the new chapters is less of a concern. It's the deep crawl that concerns me.


That's precisely it.

 

My apologies if this is too simplistic, but it almost sounds like a layman's understanding of a DDOS attack. If a crawler is making thousands of requests an hour, is it possible (and would it do any good) to simply shut down any source that makes more than X requests per minute?

 

(To clarify: If an IP range belonging to small provider WeDontCare.com is generating 85 requests per minute, shut down all of WeDontCare.com's IP range and tell them why if you can actually get ahold of them. If an IP range belonging to large provider BigCorp.com is generating 210 requests per minute 24 hours a day, it might be worth asking BigCorp.com to investigate - they may not care about having their users locked out of Batoto, but if they're aware of their network being used to attack another site and do nothing, that puts them in a bad spot.)
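 
(Something naive like the following is roughly what I have in mind; the threshold and the /24 grouping are placeholders, since I honestly don't know what Grumpy already runs behind the scenes:)

    # Naive illustration: count requests per /24 range and cut off ranges over a threshold.
    # 120 requests/minute and the /24 grouping are placeholder choices, not recommendations.
    import time
    from collections import defaultdict

    MAX_PER_MINUTE = 120
    recent = defaultdict(list)   # "203.0.113" -> timestamps of recent requests
    blocked_ranges = set()

    def allow_request(ip, now=None):
        now = time.time() if now is None else now
        prefix = ip.rsplit(".", 1)[0]            # crude /24 grouping of IPv4 addresses
        if prefix in blocked_ranges:
            return False
        window = [t for t in recent[prefix] if now - t < 60] + [now]
        recent[prefix] = window
        if len(window) > MAX_PER_MINUTE:
            blocked_ranges.add(prefix)           # in practice: expire later, maybe notify the provider
            return False
        return True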


Edited by arimareiji, 21 October 2015 - 03:43 AM.


#134
Anomandaris

    Potato Spud

  • Members
  • 24 posts

My apologies if this is too simplistic, but it almost sounds like a layman's understanding of a DDOS attack. If a crawler is making thousands of requests an hour, is it possible (and would it do any good) to simply shut down any source that makes more than X requests per minute?


I would be very surprised if Grumpy is not already rate-limiting in some shape or form. The thing is, the source keeps switching because of the way the crawlers are written. They use a big pool of IP addresses to spread the load.

Also, FYI, the only real way to beat a DDOS is to have enough money to outlast the other guy. The deep crawling that hits Batoto is kind of like a low-intensity, very-long-duration DDOS, I guess, except that it's not trying to shut down the site, even if burning Grumpy's money essentially has the same effect.

Edited by Anomandaris, 21 October 2015 - 03:41 AM.

Hell is other people.


#135
Cake-kun

    Potato

  • Contributor
  • 160 posts
  • Location: Not on your plate, hopefully

Speaking as a group of bandits who operated on the shadier side of a certain imageboard, I am actually absolutely okay with Batoto going private.

We were doing releases for our own fun. I don't want someone else to control where we have that fun. When someone takes our stuff and does that, I get very upset. I've been around, I've done enough; I've had people steal translations from me. I want my group to enjoy what they're doing. I am not, and have never been, okay with people taking my group's work.

The only site I authorize is Batoto. Ever. It will remain that way forever. And for that, I support privatization of some of the chapters; in fact, I want my group's work to be among the few that will be strictly members-only, which will hinder crawlers and people who steal other people's hard work and try to make money from it. For 10+ years I've been in the scanlation scene; money was never really a concern for me, and I could've shoved ads on our sites too. We didn't. We don't need it. And I won't start doing it any time soon, unless $10 a month becomes too hard for me eventually, and that won't be for a long time.

 

tl;dr: it's sad it has to go that way, but we are really not losing anything in the first place. A lot of the reader sites that, quite literally, stole from Batoto also have login systems anyway to promote their activity, so it makes absolutely no difference at this point. For me, some people's fun < my group's fun. We could just stop scanlating, and that would be the end; I don't want us to end our service that way.


Edited by Cake-kun, 21 October 2015 - 03:48 AM.


#136
sunadajae

    Potato Spud

  • Members
  • 37 posts

If the problem is only deep crawls, then what about making only old uploads member-only? Most true traffic will certainly be for recent uploads, and speed/availability are the average user's main concerns. Have all new uploads open to anyone during a window of a month or two, or even a year, and make older stuff member-only. This will let you track crawlers of the whole site but does nothing to hurt the average fan looking for their fix (or the daily new-upload crawlers that aren't your main concern). Something like what Crunchyroll is doing seems appropriate: they get it first and the latest chapter is registration-free, so fans are happy to get it there (minus the JavaScript reader), and in no time things like quick scans of Attack on Titan became obsolete. Delaying new uploads will only make people try to find it faster elsewhere, and making everything member-only will kill your traffic entirely.

 

Also, I strongly second adding an optional "read this first on Batoto" cover page (as the second page, to avoid credit strippers) that uploaders and scanlators can stick into their upload with the click of a button, to keep spreading the word to new readers. That was a powerful tool in getting theCompany's fanbase redirected here.


Edited by sunadajae, 21 October 2015 - 04:20 AM.


#137
zuram

    Potato Spud

  • Members
  • 19 posts

Vigorous, legally it shouldn't be an issue; at least, not more than it already is.

 

Viewing or downloading, it's the same as far as the law is concerned.

 

 

But it would be a good incentive for registering, and it might even be lighter on HTML requests too, as a user would only have to click a single link instead of having to load 20 pages.

 

You'd hurt those crawlers more by taking the users away from them than by hindering your potential users.

 

 

And Anom:

Anyway, clearly I don't think a comparison to mandatory encryption backdoors is a good way to discuss this topic.

 

Because I wasn't talking about it in the literal sense, but about the analogous results.

 

Also, you missed the part where the criminals wouldn't use backdoored tools, thus avoiding the backdoors themselves. That was my point; the increased vulnerability for users is a different issue.

 

I disagree. Login itself can be handled fairly easily, but it makes bots much easier to detect and ban.

 

Yes, unless that site happens to make, let's say, 500 or 1,000 accounts with separate IPs for the purpose of crawling. They make them emulate a human user reading (like waiting a few seconds between pages before downloading them), add other touches, and there you go. You ban 1? 100 more come in.

 

In the short term, it will be trial and error, but in the long term, they end up finding workarounds.

 

Basically what all this boils down to is whether the increased ad revenue (driving them to make better bots) outweighs the increased difficulty of writing bots or not, and if not, does that benefit outweigh the inconvenience to users caused by having to log in

 

Well, imagine all those users going to mangafox... now tell me whether, by increasing their ad revenue by, I don't know, 50 or 100%, they wouldn't have a good incentive to make better bots.

 

 

 

 

The thing is whether it's productive to enter into an arms race with crawlers. As things get bypassed, the solutions become harder and more expensive, and they put bigger burdens on your legitimate users.

 

Also, remember that existing sites, like mangafox, already have most of the database deep-linked. That means that even if you set this up, those sites will have an easier time keeping their crawling going.

 

New crawlers may have a harder time setting up (not much harder: as you ban them, they create more accounts, or they just work more slowly to build their site), but existing ones will have an easier time keeping on doing what they do (plus you may have taken competition away from them, including you). Plus, you would send thousands of users their way.

 

 

My apologies if this is too simplistic, but it almost sounds like a layman's understanding of a DDOS attack. If a crawler is making thousands of requests an hour, is it possible (and would it do any good) to simply shut down any source that makes more than X requests per minute?

 

If I'm guessing it right, they already use multiple sources so that it's harder to find them.

 

 

 

Grumpy is looking for ways to make the payoff not worth the cost. Requiring logins would give him more ways to raise the cost (much easier to identify and ban bots if they have to log in, lots more options for how to do it too), although obviously it runs the risk of increasing the payoff for bot writers too (by increasing their traffic and thus their ad revenue).

 

I know what he's trying to do and why.

 

But just because you have a problem, it doesn't mean that any solution you think of is a good one. Nor does it mean that a good solution, or even any solution, can be found.

 

You hurt those sites more by taking users away from them, by making Batoto better or easier for its users, than by making Batoto harder to use.

 

It's better not to stir up a pit full of snakes, even if you need what's behind it.



#138
mike_art03a

    Potato Sprout

  • Contributor
  • 5 posts
  • Location: Canada

I'll chime in here as both a server admin and a scanner.

 

I can understand where Grumpy is going with this. Bots/leech scripts cause more load on servers than ordinary people do, especially if sites are linking directly to the image files themselves. I should know, because a few were pulling files directly from our reader at PocketLoli (linking to them directly, that is, instead of hosting the files themselves) until I started banning multiple TCP sessions from the same IP, as well as blacklisting domains that were hitting up the same files in excess of 5 times/minute. What a lot of people don't realize is that the constant linking causes extra work for the server. Apache has to serve multiple requests, database back-ends are being hit, and the server's hardware is getting chewed up for nothing, while we (the admins) only see x number of users loading up the whole page vs. image files being accessed at astonishing rates. And that, sadly, costs money. Servers aren't free, and increased load only serves to wear them out faster (SSDs and HDDs have a finite lifespan, CPUs running flat-out all the time will burn out quicker, etc.), and it costs money to replace parts and machines. Then there's the downtime that goes with deploying a new machine and migrating data to it... especially if you use a custom setup.
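 
(The actual blocking was done at the Apache/firewall level rather than in application code, but the logic of the check was roughly this, with the same 5-hits-a-minute threshold I mentioned above:)

    # Rough logic of the hotlink check: if requests carrying the same Referer domain hit
    # the same image more than 5 times a minute, that domain goes on the blacklist.
    # The code itself is only an illustration; our real setup lived in Apache configs.
    import time
    from collections import defaultdict
    from urllib.parse import urlparse

    blacklisted_domains = set()
    recent = defaultdict(list)   # (referer_domain, file_path) -> timestamps of recent hits

    def allow_image_request(referer, file_path, now=None):
        now = time.time() if now is None else now
        domain = urlparse(referer or "").netloc
        if domain in blacklisted_domains:
            return False
        key = (domain, file_path)
        recent[key] = [t for t in recent[key] if now - t < 60] + [now]
        if domain and len(recent[key]) > 5:
            blacklisted_domains.add(domain)      # an external site is hammering this file
            return False
        return True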

 

So, if having to spend 2 minutes to register puts people off and they move somewhere else, then it's their loss. As far as I'm concerned, bring it on. I'm tired of seeing our hard work being lifted and dumped elsewhere for a profit. Also, a smaller and more engaged audience means better interaction with the community, and it could also serve to lower Batoto's operating costs, since fewer servers are needed to serve a smaller audience.


Michael Artelle
Webmaster, Server Admin @ Doremi Fansubs

Former Group Leader for (now defunct) Pocket Loli Scans AND...

Just another server junky willing to offer hosting for small manga groups! Ask about it!


#139
zuram

    Potato Spud

  • Members
  • 19 posts
So, if having to spend 2 minutes to register puts people off and they move somewhere else, then it's their loss. As far as I'm concerned, bring it on. I'm tired of seeing our hard work being lifted and dumped elsewhere for a profit

 

The problem is this, and it goes for all scanlators complaining about their work being used for profit:

 

- Your hard work will keep being lifted and dumped elsewhere for a profit. Sites will keep crawling; that is unavoidable. They will just have to be better at it, but they will be.

- As those users that don't want to spend 2 minutes to register go to those sites, they will make a bigger profit than before.

 

It may reduce Grumpy's operating costs, but that's not certain. As bots get better, the costs will go up again, since people will keep crawling. In fact, unless it's an automated system that needs no supervision, it may increase Grumpy's staffing costs from banning all those people.

 

Direct linking, though, will be reduced. But that should be easier to prevent without needing to bother your users.


Edited by zuram, 21 October 2015 - 04:12 AM.


#140
mike_art03a

    Potato Sprout

  • Contributor
  • 5 posts
  • Location: Canada

The problem is this, and it goes for all scanlators complaining about their work being used for profit:

 

- Your hard work will keep being lifted and dumped elsewhere for a profit. Sites will keep crawling; that is unavoidable. They will just have to be better at it, but they will be.

- As those users that don't want to spend 2 minutes to register go to those sites, they will make a bigger profit than before.

 

It may reduce Grumpy's operating costs, but that's not certain. As bots get better, the costs will go up again, since people will keep crawling. In fact, unless it's an automated system that needs no supervision, it may increase Grumpy's staffing costs from banning all those people.

 

Direct linking, though, will be reduced. But that should be easier to prevent without needing to bother your users.

True enough. Grumpy could always employ a login system that lets people use their Google, Twitter, Facebook, etc. credentials (which I think he already does) to basically one-click it. That shouldn't be a major issue, and the profiteering isn't that much of a concern to me. What I hate seeing is people abusing a free service, and then when said free service wants to take measures to protect itself, everyone freaks out.


Michael Artelle
Webmaster, Server Admin @ Doremi Fansubs

Former Group Leader for (now defunct) Pocket Loli Scans AND...

Just another server junky willing to offer hosting for small manga groups! Ask about it!