Batoto becoming registered only?


  • This topic is locked
512 replies to this topic

#361
aviar

    Fingerling Potato

  • Members
  • 64 posts

You said that you didn't know what one was. If you ever decide that such a thing might be a solution to some problem, there it is.
 
 
It does. It works for them. They have a total of 51.813 Gbit/s bandwidth, of which they're using 3.891 Gbit/s right now.


I think kandaj means Java isn't as viable in terms of the end user. Java takes a bit of overhead, though it's supposedly performant for an interpreted language. Kandaj may have also been pointing out that Java may be seen less and less on end users' machines due to performance penalties, its security track record, and Google's decision to drop NPAPI. This of course doesn't mean Java's not viable (the slew of JVM-based languages is proof to the contrary), but it might be more and more relegated to enterprise-level use, meaning that more and more end users would be installing the Java JRE just to run the P2P application, something which might be seen as inconsiderate/unnecessary.

The biggest argument against a P2P solution, though, is that Grumpy doesn't want to do any unnecessary work, which is a pretty important tenet in programming. His statements (Grumpy's) seem to indicate that there isn't a bandwidth issue at the moment, but that the current action against scraping is more of a preemptive one (and a line in the sand after years of abuse). Of course, this is all just my interpretation of things, and I may be way off base.


Edited by aviar, 23 October 2015 - 07:28 AM.


#362
x5c0d3

    Potato Spud

  • Members
  • 17 posts

Few things.

  • We don't really care about SEO. The bare minimum is sufficient. It has never been a big thing for us. So, don't worry about that.
  • The easier the hurdle is to pass, the less useful it will be, but the more user-friendly.
  • This change will break a lot of other sites and apps that rely on Batoto to function.
  • Opting for partially private may also be possible, e.g. newly uploaded chapters are visible for a few days without registration.
  • Objective: No more crawly crawly on this site.

 

Being a self-employed web developer, I want to write down my opinion on your points.

  • SEO mostly just looks nice for humans. Spiders like Google's have long been able to read and parse dynamic links. SEO-friendly URLs are mostly just better for avoiding duplicate content.
  • The hurdle problem is the biggest one. Do I want to harden my system like hell, or do I want it to stay user-friendly? It's hard to find a middle way. It doesn't matter what you change; there will always be people who hate you for it and want the old system back. ;)
  • You should not care about which other sites will break, because they just sit on your back. If they break, it should lower your load (hopefully).
  • If you leave newly uploaded chapters visible, you will still have the crawlers on your site looking for new chapters 100 times a minute. But if you don't provide this, you will lose all your guests. How about a small cookie that counts the chapters read each day, so after about 4 or 5 chapters guests see a nag screen (layer) telling them to register? (A sketch of this follows the list.) It could be bypassed easily by people with a bit of knowledge, but 99% of net users don't even know what a cookie is, let alone about the developer console of a browser. ;)
  • To slow down crawlers, you could set a maximum page-view rate per minute per IP. If some IP opens too many pages in one minute, it just sees a "slow down please" page or something similar (see the second sketch after the list).
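A minimal sketch of that cookie counter, assuming a plain PHP front controller; the cookie name, the 5-chapter limit, and nag_screen.php are all made up for illustration:

```php
<?php
// Hypothetical sketch of the "nag screen after N free chapters" idea.
// The counter lives in a client-side cookie, so clearing cookies resets
// it -- as noted above, this only deters casual readers, not scrapers.

const FREE_CHAPTERS_PER_DAY = 5;   // assumed limit

function chaptersReadToday()
{
    $raw = isset($_COOKIE['chapters_read']) ? $_COOKIE['chapters_read'] : '';
    list($count, $day) = array_pad(explode(':', $raw, 2), 2, '');
    return ($day === date('Y-m-d')) ? (int) $count : 0;   // reset each day
}

function recordChapterRead()
{
    $next = chaptersReadToday() + 1;
    // expire at local midnight so the free allowance resets daily
    setcookie('chapters_read', $next . ':' . date('Y-m-d'), strtotime('tomorrow'), '/');
}

// At the top of the chapter page, before any output:
session_start();
if (empty($_SESSION['member_id'])) {                  // guests only
    if (chaptersReadToday() >= FREE_CHAPTERS_PER_DAY) {
        include 'nag_screen.php';                     // hypothetical overlay template
        exit;
    }
    recordChapterRead();
}
```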
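And a sketch of the per-IP throttle from the last point, assuming the APCu extension is available for shared counters; the 120-per-minute limit and slow_down.php are illustrative:

```php
<?php
// Hypothetical per-IP rate limiter using APCu as a shared counter store.
// Counts page views in fixed one-minute buckets keyed by client IP.

const MAX_VIEWS_PER_MINUTE = 120;   // assumed limit

$ip  = $_SERVER['REMOTE_ADDR'];
$key = 'views:' . $ip . ':' . floor(time() / 60);   // one bucket per minute

apcu_add($key, 0, 120);             // create the bucket with a short TTL
$views = apcu_inc($key);            // atomic increment, returns new count

if ($views > MAX_VIEWS_PER_MINUTE) {
    http_response_code(429);        // Too Many Requests
    include 'slow_down.php';        // hypothetical "please slow down" page
    exit;
}
```

Behind a reverse proxy you would key on the forwarded client address instead of REMOTE_ADDR, or the throttle would lump everyone together.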

Last words: I love reading manga here. So if I can help you with PHP/JS or server administration stuff, feel free to contact me so I can give a bit back.



#363
Halo

    Potato

  • Donator
  • 171 posts
I'm done with unnecessary suggestions. 
 
After reading these lovely reddit threads, I'm rather glad to see the scumbags go. If Batoto is destined to perish without them, so be it.

Edited by Halo, 23 October 2015 - 09:22 AM.


#364
aviar

    Fingerling Potato

  • Members
  • 64 posts

I didn't know there were reddit posts about this (I don't use reddit/tumblr/etc.). I find it funny, though. I imagine all these people moving to aggregators, thus spawning aggregators that harvest more proactively and grind all manga reader sites to dust, making smaller scanlators unable to release anything in a centralized form. Once the reader sites are gone, they start pulverizing scanlator sites, to the point where larger scanlators can't handle uploading their works to personal group sites due to the sheer bandwidth costs. And if scanlators suddenly decide to move to an aggregator that permits uploading manga, the other aggregators turn on it like sharks in a feeding frenzy, thus obliterating any means of content delivery and ensuring that original content becomes a set of poorly scanned jpgs with text translated in felt, uploaded onto tumblr.

 

Either way, I mostly lament the fact that web scrapers are harming the ecosystem by driving up bandwidth costs for sites. If they target smaller groups, and if Batoto is any indicator, it wouldn't surprise me if maintaining a site became unfeasible.


Edited by aviar, 23 October 2015 - 10:39 AM.

I have come to warn you of the things beyond the wall and the men behind the machines.


#365
Skunkizz

    Potato Sprout

  • Members
  • 2 posts

Hello, long-time lurker here since whenever ToG was on its 40-ish chapter.

 

I can't say I like this being implemented, but as I've grown fond of this site and have always liked the way it stands up for the scanlators, I feel like giving it a try, however much it makes me cringe inside to register. I would like to point out, though, that I wouldn't have registered if I hadn't seen the topic in the corner of my eye while hitting F5; the change would have been abrupt without any real info or warnings.

 

As this will probably be my only post, I want to offer my thanks as well as best wishes to Grumpy for his work on this site, as I have always appreciated it. After reading the massive wall of text that is this topic, I can only be saddened that you've been forced into this action by these crawlers, but I can understand how it's reasonable.

 

Then I'm off back to my lurking shadows, and I must apologize for any typos or other errors, as this isn't my comfort zone.



#366
Kurono

    Potato Spud

  • Members
  • 14 posts

I support this action against crawlers. I remember some years ago, when Batoto would load every image instantly and I could speed-read a series in a day. Right now, at most hours, no matter what server I select (some are better than others), it takes up to 15 seconds to change pages and have the image load.

 

And no, it's not my bandwidth; I have a (relatively?) fast 50 Mbit/s downstream connection.



#367
mangapaagal

    Potato Spud

  • Members
  • 34 posts
  • Location: Delhi
I don't know much about crawlers, but if bots are your problem and speed is the priority, then wouldn't the simplest solution be to use a captcha on intensive users?

I believe you have thought about and discussed it, but as I saw no mention of captchas in your post (I am sorry, but I am too lazy to read 19 pages), I decided to mention it, as I believe a CAPTCHA can serve as a very good middle ground between registered users and bots.

Edited by mangapaagal, 23 October 2015 - 02:58 PM.


#368
Grumpy

    RawR

  • Administrators
  • 4,078 posts
  • Location: Here of course!

I don't know much about crawlers, but if bots are your problem and speed is the priority, then wouldn't the simplest solution be to use a captcha on intensive users?

I believe you have thought about and discussed it, but as I saw no mention of captchas in your post (I am sorry, but I am too lazy to read 19 pages), I decided to mention it, as I believe a CAPTCHA can serve as a very good middle ground between registered users and bots.

Yes, it is a potential method.



#369
mythee

    Potato Sprout

  • Members
  • 3 posts
How about a system where the site randomly changes some factor the bots rely on every time a chapter is loaded? Like what you need to do to navigate to another page, or... I don't know @-@;; just shooting in the dark here.

#370
arimareiji

    Fingerling Potato

  • Donator
  • 61 posts

I still wonder: Is there any merit to my (modified) suggestion of engaging the aggregators' ISPs about the fact they're functionally attacking this site (or if the ISP doesn't care, range-blocking the ISP), and flat-out blocking all proxies? Many if not most ISPs would balk if asked to provide the culprit's details... but that's not necessary. I imagine any ISP that wants to avoid trouble would be willing to disconnect them for ToS violations if provided with addresses and times.

 

If there's some fatal flaw, i.e. "we have no reliable way to figure out which requests are from an aggregator bot and which are from an innocent reader", I'd genuinely want to hear it. (On the other hand: If the reason there's been no response is that the flaw is so stupidly-obvious I should have realized it myself, my apologies.)

 

To me it seems like if it's reasonable to block people who won't create an account that identifies them (and I do believe it is), it's also reasonable to block people who won't use an IP address that can be used to identify them.

 

~~~~~

 

I didn't know there were reddit posts about this (I don't use reddit/tumblr/etc.). I find it funny, though. I imagine all these people moving to aggregators, thus spawning aggregators that harvest more proactively and grind all manga reader sites to dust, making smaller scanlators unable to release anything in a centralized form. Once the reader sites are gone, they start pulverizing scanlator sites, to the point where larger scanlators can't handle uploading their works to personal group sites due to the sheer bandwidth costs. And if scanlators suddenly decide to move to an aggregator that permits uploading manga, the other aggregators turn on it like sharks in a feeding frenzy, thus obliterating any means of content delivery and ensuring that original content becomes a set of poorly scanned jpgs with text translated in felt, uploaded onto tumblr.

 

Either way, I mostly lament the fact that web scrapers are harming the ecosystem by driving up bandwidth costs for sites. If they target smaller groups, and if Batoto is any indicator, it wouldn't surprise me if maintaining a site became unfeasible.

 

At least to my understanding, this does make sense. As long as they're in the game, for-profit aggregators are going to attack* the easiest source, whoever that may be. Right now Batoto is the easiest... but if Batoto shuts down or becomes inaccessible, scanlator sites that allow you to read online will be next. If Batoto can't cope with it, I doubt they'll have a snowball's chance in hell. Once they go down, the aggregators will move on to looking for downloads and/or preying on each other. Maybe it'll be more work, but they'll adapt if they want to stay in business. Does someone who has a good understanding of the subject have a good feel for whether aggregator bots will have the same devastating effect on forums that they do on readers?

 

* - Technically it may not fit the definition of an attack... but if it looks like a duck, walks like a duck, and quacks like a duck, it's probably not a dromedary.


Edited by arimareiji, 23 October 2015 - 03:59 PM.


#371
satou123

    Potato Sprout

  • Members
  • 2 posts

I rarely log in, unless I need to post comments....

 

Most of the time, I just use things like FMD or Domdomsoft to download from here in case there's some update, as I can easily save the images to read later; the internet isn't something that's reachable 24/7 here, and it's kinda hard to right-click and save images one by one, lol.

 

Anyway, thanks for providing such a service for these several years. I'll probably still drop by regularly, though.



#372
Grumpy

    RawR

  • Administrators
  • 4,078 posts
  • Location: Here of course!

I still wonder: Is there any merit to my (modified) suggestion of engaging the aggregators' ISPs about the fact they're functionally attacking this site (or if the ISP doesn't care, range-blocking the ISP), and flat-out blocking all proxies? Many if not most ISPs would balk if asked to provide the culprit's details... but that's not necessary. I imagine any ISP that wants to avoid trouble would be willing to disconnect them for ToS violations if provided with addresses and times.

 

If there's some fatal flaw, i.e. "we have no reliable way to figure out which requests are from an aggregator bot and which are from an innocent reader", I'd genuinely want to hear it. (On the other hand: If the reason there's been no response is that the flaw is so stupidly-obvious I should have realized it myself, my apologies.)

I have tried to contact hosts/ISPs about this before, but the effort factor is just too high. Some hosts were furious and some couldn't care less. Given the amount of time it takes to provide proof of abuse and all the talk, it's not really worth it.

 

Proxies don't exactly say "I'm a proxy," so it's hard to know what is and what is not. There are also legitimate proxies that people can't avoid using, either because they're at work or due to some weird ISP thing.

 

There is no reliable way to figure out if a person is a bot or a human. It's a growing problem on the Internet as a whole, not just for us. Some commercial bot-blocking solutions claim as much as 20-30% of normal web hits are by bots. Some bots are very much like human and some humans are very much like a bot.

 

I rarely log in, unless I need to post comments....

 

Most of the time, I just use things like FMD or Domdomsoft to download from here in case there's some update, as I can easily save the images to read later; the internet isn't something that's reachable 24/7 here, and it's kinda hard to right-click and save images one by one, lol.

 

Anyway, thanks for providing such a service for these several years. I'll probably still drop by regularly, though.

The reason Batoto did not support downloads back when it started was that it gave users an incentive to visit the scanlator sites, since they almost always have a download link. After all these years, I don't know how effective that has been, but that was the reason anyway.



#373
satou123

    Potato Sprout

  • Members
  • 2 posts

Some bots are very much like human and some humans are very much like a bot.

 

lol

 


The reason Batoto did not support downloads back when it started was that it gave users an incentive to visit the scanlator sites, since they almost always have a download link. After all these years, I don't know how effective that has been, but that was the reason anyway.

 

Well, most of the time it's because scanlators need some feedback (read: attention -- I used to be in a similar business), whether readers give some comments or even chat on the IRC. Some scans like to wait n+x days before submitting to this site, though.

 

Last post; it's time to turn back into a lurker.


Edited by satou123, 23 October 2015 - 05:20 PM.


#374
Grumpy

    RawR

  • Administrators
  • 4,078 posts
  • Location: Here of course!

Btw, not using captcha?

Not yet... at least. And even if I did use a captcha, I would make it so that it's not seen by humans... if I can. Captchas are annoying; I think they should be one of the last resorts.

 

In any case, I'm less focused on security and more on keeping things actually able to run. I also can't realistically throw everything I have at it at once, for time reasons and because I'm human.



#375
Natureboy

    Baked Potato

  • Donator
  • 1,162 posts
  • Location: deep in the forest

That >50%-of-traffic-from-bots number I mentioned earlier in the thread was from a commercial provider of anti-bot software. (A real study classifying web hits actually claimed 56% of traffic to a typical/medium website was from bots, which looks like too many significant figures to me. : )  Because the vendors have a marketing incentive to highlight bot versus human traffic, and their customers are sites that have already had problems with bots and/or DDoS attacks, we should probably take their estimates with a large sack of salt.

 

Looking over those reddit threads on potential Batoto changes, I wanted to clarify that Batoto is not for profit; the site is just trying to cover costs for the non-volunteer services required. Some people seemed to be assuming otherwise. But to comment I'd have to make a Reddit account and... I can't be bothered.  ;)


Edited by Natureboy, 23 October 2015 - 05:32 PM.


#376
Bloodwork

    Potato Sprout

  • Members
  • 7 posts
  • Location: EU

If what's coming next kills all the Android readers, are there any plans to make Batoto more mobile-friendly? Something like a fullscreen mode?



#377
arimareiji

    Fingerling Potato

  • Donator
  • 61 posts

I have tried to contact hosts/ISPs about this before, but the effort factor is just too high. Some hosts were furious and some couldn't care less. Given the amount of time it takes to provide proof of abuse and all the talk, it's not really worth it.

 

Proxies don't exactly say "I'm a proxy," so it's hard to know what is and what is not. There are also legitimate proxies that people can't avoid using, either because they're at work or due to some weird ISP thing.

 

There is no reliable way to figure out if a person is a bot or a human. It's a growing problem on the Internet as a whole, not just for us. Some commercial bot-blocking solutions claim as much as 20-30% of normal web hits are by bots. Some bots are very much like human and some humans are very much like a bot.

 

Thank you for the explanation... I guess it's too good to be true that there would be some way to take them out of the game instead of playing whack-a-mole (with the moles constantly getting better at ducking out of sight before you can whack them).

 

Maybe some day we can all chip in for plane tickets so you can take the Jay and Silent Bob Strike Back approach (but to bot-wranglers instead of Internet critics). (^_~)

(nsfw language)



#378
White Cloud Pavilion

    Potato Spud

  • Contributor
  • 15 posts

Our group uses Bato.to because of how open it is to manga readers, with few restrictions involved. This announcement has only just come to my attention, and I must say I am quite worried. We're a group that wants to get our chapters out to every single manga site possible (Mangafox, Mangahere, Readmanga.Today, Mangadoom, Kissmanga, Mangatown, etc.) for maximum exposure for the series itself.

 

We use Bato.to as our first uploading choice because of how friendly it is to anon visitors, who receive unlimited, fully convenient access to all the chapters. We also use Bato.to as our very first uploading choice because of the power we have to manage/upload, which is very useful for last-minute changes, and because it spreads the chapters to other manga sites within an hour, as we want maximum exposure for the series.

 

We uploaded our chapters and noticed only the latest 3 chapters were available when logged off, and other manga sites have yet to receive our upload (likely due to the changes made to the reader), which requires us to upload to another site to get things to spread around for our fans/readers. We're of course against content being locked off to members only, as a group like ours wishes to spread the series to all readers as soon as possible.

 

Loyal fans of Bato.to who read here would likely enjoy the idea of such restrictions, but I would like to point out that it would alienate many scanlation groups who loyally upload here, such as ours. We're not in support of these content restrictions that will limit our fans' access to previous chapters, partially forcing them to sign up with your service to read. Hopefully many other groups share the interest of spreading the hard work that goes into a chapter to all manga websites, and to all fans who wish to read their favorite series on their favorite manga reading website.

 

We love Bato.to, and we love the openness and flexibility it gives scanlators to bring fans their favorite series as fast as possible. I say this as a contributor and as someone who posts links directly from social media to Bato.to with continued outside promotion. Consider this a warning that these continued future changes could potentially push our group to seek out an alternative, which we truly do not want to do, because we love it here. We want readers who click on our social links, which direct traffic to Bato.to, to have full access to both new and old chapters, which is no longer the case.

 

Bato.to is the only service that provides a superior experience to scanlator contributors compared to the many other manga websites, which is why Bato.to is our first-priority uploading choice. It would be very sad if Bato.to went backwards instead of forwards and began locking down and restricting access for anon readers who want to access chapters immediately. All chapters must be open to all of our readers on the front page of a series, which allows ease of access to the chapters.

 

- Higasho, Retired Editor of AS [QCer/Technician/Uploader]

Archangel Scanlations


Edited by archangelscans, 23 October 2015 - 09:36 PM.


#379
x5c0d3

    Potato Spud

  • Members
  • 17 posts

Not yet... at least. And even if I did use a captcha, I would make it so that it's not seen by humans... if I can. Captchas are annoying; I think they should be one of the last resorts.

 

And CAPTCHAs need to be created/checked, and this also brings higher load. ;) I guess you are thinking about something like a CSRF token.
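For what it's worth, a per-session token along those lines might look like this minimal sketch; image.php, page_token, and the URL scheme are made-up names, not Batoto's actual code:

```php
<?php
// reader.php (hypothetical) -- embed a per-session token, CSRF-style,
// in every image URL so scrapers must fetch and parse pages statefully
// instead of hitting image URLs blindly.
session_start();
if (empty($_SESSION['page_token'])) {
    $_SESSION['page_token'] = bin2hex(openssl_random_pseudo_bytes(16));
}
$src = '/image.php?id=12345&t=' . urlencode($_SESSION['page_token']);
echo '<img src="' . htmlspecialchars($src) . '">';
```

```php
<?php
// image.php (hypothetical) -- refuse requests whose token doesn't
// match the one stored in the visitor's session.
session_start();
if (!isset($_GET['t']) || $_GET['t'] !== $_SESSION['page_token']) {
    http_response_code(403);
    exit('Forbidden');
}
// ...otherwise stream the requested page image as usual.
```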

 

How about a system where the site randomly changes some factor the bots rely on every time a chapter is loaded? Like what you need to do to navigate to another page, or... I don't know @-@;; just shooting in the dark here.

 

These factors have to have something in common in every version -- like being the link to the next page. And if they have something in common, there is a regex to parse and grab them. (See the short example below.)
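To illustrate: however the markup is shuffled, the next-page link still has to be an href a client can follow, so a loose pattern recovers it (the HTML here is hypothetical):

```php
<?php
// Why randomized markup rarely stops scrapers: whatever attributes get
// shuffled, a regex (or DOM parser) keyed on the stable part -- here,
// the /read/ path -- still finds the next-page link.
$html = '<a class="x9f2" data-r="71" href="/read/some-series/ch12/3">Next</a>';

if (preg_match('#href="(/read/[^"]+)"#', $html, $m)) {
    echo 'Scraper found: ' . $m[1] . "\n";   // -> /read/some-series/ch12/3
}
```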

 

How about setting up some honeypots? For example, invisibly placed chapter links that a human does not see. Crawlers don't check visibility and get caught when they grab these pages. You could whitelist known crawler IPs, or just set a short block for a minute or two where they only get redirected via 302 to a "You are blocked" page. ;) With mod_rewrite it would be possible to randomize the links to the honeypots. (A sketch follows.)
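A minimal sketch of that honeypot idea, leaning on APCu for the shared blocklist; the paths and the five-minute block are assumptions, and real search-engine whitelisting would verify IPs rather than trust user agents:

```php
<?php
// Hypothetical honeypot. Pages embed a link humans never see, e.g.
//   <a href="/trap.php" style="display:none" rel="nofollow">latest</a>
// Anything that requests it gets its IP blocked for a few minutes.

const BLOCK_SECONDS = 300;

// trap.php calls this when the invisible link is fetched:
function honeypotTripped()
{
    apcu_store('blocked:' . $_SERVER['REMOTE_ADDR'], true, BLOCK_SECONDS);
    header('Location: /blocked.php', true, 302);
    exit;
}

// Every real page calls this first, before any output:
function bounceIfBlocked()
{
    if (apcu_exists('blocked:' . $_SERVER['REMOTE_ADDR'])) {
        header('Location: /blocked.php', true, 302);
        exit;
    }
}
```

With mod_rewrite in front, the trap URL itself could be randomized per response so crawlers can't simply blacklist one known honeypot path.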


Edited by x5c0d3, 23 October 2015 - 10:04 PM.


#380
Lata

    Potato Sprout

  • Members
  • 4 posts

This isn't a strong hindrance to people ripping stuff from sites. There are tons of applications for ripping from registration-required sites. With simple scripting and an account, within days it will be business as usual for rippers.

 

Many of these uploads are not done by the scanlators but by fans.

 

Personally, I am no fan of registration for viewing. I have a lot of Batoto bookmarks, but I also have other sites. I will most likely consider other sites that I already use for new manga. Still, let me say thank you for your site and your consideration. You cannot please everyone, and you have to do what you think is best.