This is a discussion on Bandwidth theft alert possible? in the Shared & Semi-Dedicated forum

  1. #1
    JPC Addict richardevanslee's Avatar
    Join Date
    Feb 2003
    Location
    Durham, NC
    Posts
    104

    Bandwidth theft alert possible?

    Last summer I had sites sucking up my bandwidth for a couple of weeks. I blocked them and everything went back to normal.

    Just by sheer luck I was on my CPanel today and noticed I'd already used 6 Gig this month. If I were that popular I'd charge admission.

    Webalizer showed me who was draining so I blocked a couple of IPs and one entire webhost (theplanet.com).

    Is there any way I can be alerted to unusual bandwidth volume, either generally or by a specific source? Some sort of (free, not too hard to install) software?

    Near as I can tell it is small, sleazy ecommerce sites that are doing this to me. Don't know if I'll ever understand why.
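
    One way to get that kind of alert is to total the bytes served to each client from the raw access log and flag anyone over a limit. The following is only a minimal sketch: it assumes the server writes Apache's common/combined log format, and the sample lines, the `LIMIT` value and the function names are made up for illustration.

```python
import re
from collections import Counter

# Matches the start of an Apache common/combined log line: client IP,
# timestamp, quoted request, status code, response size ("-" means none).
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (\d{3}) (\d+|-)')

def bytes_by_client(lines):
    """Sum the response bytes sent to each client IP."""
    totals = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the expected format
        ip, _status, size = match.groups()
        if size != "-":
            totals[ip] += int(size)
    return totals

def heavy_clients(totals, limit_bytes):
    """Return (ip, bytes) pairs above the limit, largest first."""
    return [(ip, n) for ip, n in totals.most_common() if n > limit_bytes]

# Made-up sample lines using documentation-only IP addresses.
sample = [
    '203.0.113.9 - - [02/Dec/2003:10:00:00 -0500] "GET /a.html HTTP/1.0" 200 700000',
    '203.0.113.9 - - [02/Dec/2003:10:00:01 -0500] "GET /b.html HTTP/1.0" 200 600000',
    '198.51.100.4 - - [02/Dec/2003:10:00:02 -0500] "GET /a.html HTTP/1.0" 304 -',
]
LIMIT = 1_000_000  # hypothetical threshold in bytes
for ip, used in heavy_clients(bytes_by_client(sample), LIMIT):
    print(f"ALERT {ip} used {used} bytes")
```

    Pointed at the real log file and run on a schedule, the printed lines could be mailed out as the alert. Where the raw log lives, and whether scheduled jobs are allowed, depends on the hosting account.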

  2. #2
    Old Hillbilly Connie's Avatar
    Join Date
    Sep 2001
    Location
    Hills of Missouri
    Posts
    2,648
    Richard,

    I don't have an answer to your problem other than check the logs daily. I am
    curious how 6 Gigs could be stolen? That is a lot of bandwidth. What was the
    time frame for this theft?

    Forum Moderators - Jag Staff

    Spam Whackers Blog - Dedicated to fighting Spam and providing General SEO Tips
    Organize your Kitchen or purchase Kitchen Accessories at Condells
    Ihelpyou Forum - Dedicated to "Best Practices" SEO

  3. #3
    JPC Addict richardevanslee's Avatar
    Join Date
    Feb 2003
    Location
    Durham, NC
    Posts
    104
    The bandwidth counter would've restarted at midnight the 1st. It was at 6 Gig this afternoon. Back in the Summer I lost about 30 Gig in a very short time.

    I've run across various examples of bandwidth and content theft over the last few months. Some examples are linked to at the bottom of:

    http://www.edifyingspectacle.org/pcs..._apocalyps.php

  4. #4
    Old Hillbilly Connie's Avatar
    Join Date
    Sep 2001
    Location
    Hills of Missouri
    Posts
    2,648
    I didn't find any more information in your link than I did in your original post.
    So in less than 2 days 6 Gigs were stolen? That is a lot. How are they doing it?
    Are they linking to your images or what? I'm not too smart about some of this
    stuff but I am interested in how a few sleazy sites could suck that much bandwidth
    in such a short period of time.

    I have a little bandwidth theft. Nothing like you describe. I have
    stopped most of it by hotlink protecting certain files.
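
    Hotlink protection usually comes down to one test: is a protected file being requested from a page on somebody else's site? The sketch below shows only that decision, in Python, so it can be run against log entries to spot hotlinkers. The host names and file suffixes are placeholders, and it is not the actual server-side rule set.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.org", "www.example.org"}  # hypothetical site names
PROTECTED_SUFFIXES = (".gif", ".jpg", ".png")       # file types to protect

def is_hotlink(path, referer):
    """True if a protected file is requested from a page on another site."""
    if not path.lower().endswith(PROTECTED_SUFFIXES):
        return False
    if not referer or referer == "-":
        # No Referer at all: a direct visit or a privacy proxy, so allow it.
        return False
    host = urlsplit(referer).hostname
    return host not in ALLOWED_HOSTS

print(is_hotlink("/img/cover.jpg", "http://shop.example.net/deals.html"))
print(is_hotlink("/img/cover.jpg", "http://www.example.org/books.html"))
```

    Letting empty referrers through is a deliberate choice here: blocking them would also lock out visitors whose browsers or proxies strip the header.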


  5. #5
    JPC Addict richardevanslee's Avatar
    Join Date
    Feb 2003
    Location
    Durham, NC
    Posts
    104
    Sorry it wasn't clear. I'd meant the links at the bottom, which mention some of my earlier experiences and my guesses as to their purposes.

    Four more Gig gone since my first post. About 4,000 Error 403 pages displayed.

  6. #6
    Loyal Client
    Join Date
    Sep 2001
    Location
    Wichita, KS
    Posts
    1,647
    holy crap dude, that's some serious leech

  7. #7
    Yeah, I know a LOT! Vin DSL's Avatar
    Join Date
    Mar 2003
    Location
    Arizona Uplands
    Posts
    10,775

    Re: Bandwidth theft alert possible?

    Originally posted by richardevanslee
    ...Don't know if I'll ever understand why...
    Don't ask, don't tell?!?!
    DISCLAIMER Any resemblance between the views expressed above and those of the owners and operators of this system is purely coincidental. Any resemblance between these views and my own are non-deterministic. The existence of Vin DSL is questionable. The existence of views in the absence of anyone to hold them is problematic. The existence of the reader is left as an exercise in the second-order coefficient.

    No Guts, No Story! VinDSL © 2010

  8. #8
    JPC Addict richardevanslee's Avatar
    Join Date
    Feb 2003
    Location
    Durham, NC
    Posts
    104

    Much better now

    Just a note to say that thanks to AWStats, Webalizer and WhoIs.sc (don’t want to block out the search engine bots) I’ve managed to get my bandwidth consumption back to normal. Guess I’ll have to remember to spend at least one day a week checking my stats.

    Haven’t worked this hard to keep people out since Google took a joke weblog entry too seriously and made me the 2nd and 3rd results in searches for “Naked Christina Aguilera.”

  9. #9
    Jag Veteran
    Join Date
    Sep 2002
    Posts
    650

    Re: Much better now

    Originally posted by richardevanslee
    Haven’t worked this hard to keep people out since Google took a joke weblog entry too seriously and made me the 2nd and 3rd results in searches for “Naked Christina Aguilera.”
    LOL! That's a good tip for all these folks who try to promote their sites

  10. #10
    Kubla Khan lookout's Avatar
    Join Date
    Aug 2002
    Location
    Orodruin
    Posts
    1,386
    Besides the weblog checks, sometimes it's interesting to see who Google thinks is linking back to you as well. This also is handy for checking how successful you've been in promoting your site to others.

    You can do so under their advanced search options. One way of doing so is to enter www.yourdomain.com under "Find the exact phrase" and yourdomain.com under "Don't return results from the site or domain" on the form and search. I usually like to set display results to 100 entries per page. Be sure to check the last page of entries, to get an actual count of unique sites linking, which is likely different than the number of links listed from the search.
    The trouble with our times is that the future is not what it used to be.
    - Paul Valery

  11. #11
    JPC Addict richardevanslee's Avatar
    Join Date
    Feb 2003
    Location
    Durham, NC
    Posts
    104
    I do that occasionally (mostly vain curiosity). And get a daily report from Technorati (weblogger’s link report). And I check my Refer logs (an easy way for me to see my ongoing hits throughout the day).

    Most of the problem domains don’t have any links to me. (I’ve even opened the page source in Mozilla and searched for my domain.) So I add them to my .htaccess even if the hits are only sporadic. I’ve come to dislike referral log spammers almost as much as the comment spammers.

    And most of the problem IPs do not have a website that I’ve been able to see or find listed at the IP that caused the bandwidth load.

    Thanks, any advice or suggestions are welcome.
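
    Keeping a growing block list in .htaccess by hand gets error-prone, so one option is to generate the lines from a plain list. This sketch assumes the older Apache `Order`/`Deny from` access syntax and the `SetEnvIfNoCase` directive (newer Apache versions use a different `Require` syntax). The addresses, the domain and the `bad_referer` variable name are invented for the example.

```python
import ipaddress
import re

def deny_rules(ips, referer_domains):
    """Build .htaccess lines in the older Apache Order/Deny access syntax."""
    lines = []
    for domain in sorted(referer_domains):
        # Flag any request whose Referer header mentions the domain.
        lines.append(f'SetEnvIfNoCase Referer "{re.escape(domain)}" bad_referer')
    lines += ["Order Allow,Deny", "Allow from all"]
    # ip_address() raises ValueError on a malformed address, so a typo in
    # the list stops the script instead of producing a broken rule.
    for ip in sorted(ips, key=ipaddress.ip_address):
        lines.append(f"Deny from {ip}")
    if referer_domains:
        lines.append("Deny from env=bad_referer")
    return lines

# Documentation-only addresses and a made-up referrer domain.
for rule in deny_rules(["203.0.113.9", "198.51.100.4"], ["spam.example.net"]):
    print(rule)
```

    The output is meant to be reviewed and pasted in by hand. A bad rule in .htaccess can take the whole site down, so it is worth keeping a copy of the working file first.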

  12. #12
    Kubla Khan lookout's Avatar
    Join Date
    Aug 2002
    Location
    Orodruin
    Posts
    1,386
    Most of the problem domains don’t have any links to me. (I’ve even opened the page source in Mozilla and searched for my domain.)
    Yes, I've seen some like that too. I suspect the referring URL is either disguised and/or hidden via the linking site's own security in such cases. It may well lie deeper within the site behind a restricted area.

    If you're willing to invest some time and money, here are a couple of commercial links for you on this subject that might be worth investigating further:

    http://www.Artistscope.net (see their Link Protect product)

    http://www.coldlink.com

    I can't vouch for how well either of them work myself. The .htaccess file methods of hotlink prevention have been suitable for my own purposes.

    I know you've said you've found some of the culprits, but what exactly are they doing to cause such massive demands on your allocated bandwidth? Downloading images, big files, databases or what? Surely the wild fluctuations aren't just from a sudden surge in downloading ordinary html pages. Any possibility that a runaway script on your site is involved in some fashion (triggered periodically)?

  13. #13
    JPC Addict richardevanslee's Avatar
    Join Date
    Feb 2003
    Location
    Durham, NC
    Posts
    104
    Oh, I’m just a hobbyist (my used bookshop is my livelihood). My web hosting account is all I can afford to spend. Mark Pilgrim posted a script for blocking anybody who violates robots.txt. I don’t have the expertise to really understand it and don’t want to implement something like that without the skill and knowledge to handle it properly. Thanks for the pointers though.

    I have very few images. Back in the summer, the first time I was hit, I had fewer than thirty on the site in total. And the stats programs don’t show image files consuming much bandwidth. I blocked hotlinking the first time I caught someone doing it even though the intent was benign.

    I check my Refer log fairly often and know when I’ve stumbled onto a topic that is temporarily popular. And I know which websites are my most regular referrers. (I couldn’t put much more care into this if it was the source of my income.)

    Dean Allen’s Refer is the only script that I run, it is well behaved.

    There are sleazoid ecommerce sites that create ‘content’ on their sites by having search bots create links to websites that are vaguely related to what they sell. So one that sells quit-smoking products will randomly link to a bunch of websites that talk about quitting smoking, including an old page of mine. That way when Google indexes them it’ll see content instead of just advertising, and be cluttered up with useless links that may bring someone to their site who’ll buy something.

    The funniest example of this is geometry.net which lists me as a celebrity. It simply grabbed three or four weblog entries in a section that just lists random people around the web as ‘celebrities.’ Like most of these sites, they don’t sell anything directly but have Amazon affiliate links.

    My sexuality and personal weblogs have a fair amount of sexually themed entries (but not porn, I’d never want the hassles of running any sort of adult site). I’ve found a couple of sites that have grabbed my headlines and generated links to sell stuff. The funniest took a headline about an article I’d cited on female orgasm. Next to it was a “Find out more about female orgasms” link. The link went to Tiger Direct, which primly announced that they didn’t stock female orgasms. But you were on Tiger’s site with the referrer as an affiliate if you did somehow buy something.

    So some people are just grabbing whatever as a way to fill up web pages rather than have to pay someone to do real work. Some folks have found their weblogs completely copied by sites overseas.

    I’ve sometimes wondered if my atheism weblog has simply generated ill will and there was an attempt to knock me offline. Without wishing to sound paranoid, that was about the only thing I could think of when several Gigs would go in less than a day. Maybe malicious kids.

    Or some of all of the above.

  14. #14
    Kubla Khan lookout's Avatar
    Join Date
    Aug 2002
    Location
    Orodruin
    Posts
    1,386
    It's hard to imagine how many times a simple text-based page must be downloaded to generate that amount of bandwidth use in so short a period. You must have quite a flair for picking hot topics. There are definitely some bad bots and other miscreants about. I still can't help thinking there's more to this, but your analysis seems reasonable.

    On robots, the damage is usually done once one's site has been spidered. Not as if the data gathered by a bot will be discarded anytime soon. Best to have a good robots.txt file with appropriate bot and site file/folder exclusions in place before going live. If you didn't, you can always add one. Even if you did, it might need some tweaking now that you've seen where the threats occur. I'd suggest biting the bullet and changing a few folder names to break the links already spidered. Naturally one should take steps to redirect site content requests to an appropriate location. (Think custom error pages, .htaccess redirects, and link rel tags in page headers.) The better bots (which follow robots.txt file directives) should pick up on this if done properly and reindex the site on their next visit. The bad bots may not ever catch on to the change.

    One other comment. I see your custom error page isn't exactly compact (this is a page that should load as quickly as possible, so keep the graphics minimal). Since you were reporting quite a few errors, you might consider paring down the images there to see if that helps at all.
    Last edited by lookout; 12-08-2003 at 02:20 PM.

  15. #15
    JPC Addict richardevanslee's Avatar
    Join Date
    Feb 2003
    Location
    Durham, NC
    Posts
    104
    The huge gulps the offenders have taken are one reason I think they may have grabbed all 3,200+ pages. I’ve had a surprising affinity for several themes that were popular with searches on Google and crew this year but I couldn’t have planned it if you put a gun to my head.
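
    To put the thread's numbers side by side, here is a rough back-of-the-envelope calculation. The 6 Gig and the 3,200 pages come from the posts above; the 20 KB average page size is purely an assumption, so the results are only an order-of-magnitude guide.

```python
GIB = 1024 ** 3  # bytes in one gigabyte (binary)

def downloads_needed(total_bytes, unit_bytes):
    """Whole downloads of the given size that fit in the transfer volume."""
    return total_bytes // unit_bytes

transferred = 6 * GIB    # the 6 Gig reported in the first post
page_bytes = 20 * 1024   # ASSUMPTION: 20 KB average page, not from the thread
site_pages = 3200        # the 3,200+ pages mentioned above

single_pages = downloads_needed(transferred, page_bytes)
full_copies = downloads_needed(transferred, page_bytes * site_pages)
print(f"{single_pages} page downloads, or about {full_copies} full copies of the site")
```

    Under that assumed page size, something on the order of a hundred complete crawls would account for the 6 Gig, which fits the idea of bots grabbing the whole site repeatedly rather than a surge of ordinary readers.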

    Not counting my cgi-bin and Refer logs, I doubt my robots.txt blocks more than 100 to 150 files (tiny hand-optimized graphics). It won’t hurt to add a third directory to put images in that is blocked from the beginning.

    I have about thirty bad bots banned. I may have to take the chance with a script that blocks an IP the moment robots.txt is violated. I’m hesitant because Apache directives aren’t anything I know much about. But I did ban another three IPs since I got home this afternoon. Only one showed signs of being abusive (a German porn site); the other two were just junk ecommerce sites that were probably hoping to create referral log spam. I hadn’t even thought about Google indexing referrer logs until I started getting hits about a silly herbal supplement from mine.

    Thanks,
    Richard
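
    A gentler first step than blocking on the spot is to list which clients have already requested paths that robots.txt disallows, then decide by hand. The sketch below is not the Mark Pilgrim script mentioned earlier; it is a separate illustration using Python's standard `urllib.robotparser`. It assumes Apache-style log lines, and the `/trap/` folder and sample entries are invented.

```python
import re
from urllib.robotparser import RobotFileParser

# Client IP and requested path from an Apache common/combined log line.
REQUEST = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD|POST) (\S+)')

def robots_violators(robots_txt, log_lines):
    """Return the client IPs that requested a path disallowed for all agents."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    offenders = set()
    for line in log_lines:
        match = REQUEST.match(line)
        if match is None:
            continue  # not a request line in the expected format
        ip, path = match.groups()
        # "*" checks the rules that apply to every user agent.
        if not rules.can_fetch("*", path):
            offenders.add(ip)
    return offenders

# Made-up robots.txt and log entries for illustration.
robots = "User-agent: *\nDisallow: /cgi-bin/\nDisallow: /trap/\n"
log = [
    '203.0.113.9 - - [08/Dec/2003:14:00:00 -0500] "GET /trap/index.html HTTP/1.0" 200 512',
    '198.51.100.4 - - [08/Dec/2003:14:00:05 -0500] "GET /index.html HTTP/1.0" 200 2048',
]
print(sorted(robots_violators(robots, log)))
```

    Ordinary visitors who follow a link into a disallowed folder would be flagged too, so this works best with a folder that nothing on the site links to visibly.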
