Welcome to the JaguarPC Community
JaguarPC
Sales: (888) 338-5261
Support: (888) 551-3050
Page 1 of 2
Results 1 to 15 of 26

This is a discussion on How long does it take FSCK to run?! in the VPS & Dedicated forum

  1. #1
    JPC Member
    Join Date
    Mar 2006
    Posts
    29

    How long does it take FSCK to run?!

    2AM last night Jaguar support sent me a ticket saying FSCK was running on my "hardware node" and that it would be completed "shortly".

    That was over eight hours ago. The FSCK runs on my own computer take about 20-30 minutes, sometimes an hour if they find something wrong.

    What the hell is going on with draco? My VPS is going on more than 8 hours of downtime, and that's unacceptable.

  2. #2
    Friendly rainboy
    Join Date
    Apr 2006
    Location
    Eindhoven, The Netherlands
    Posts
    546
    An FSCK will run as long as is needed to check and repair the filesystem, so it depends on how large the arrays on a system are, how long ago the previous FSCK was done, and how much data has changed. There is really no way to tell you how long it will take, especially not if the hardware RAID is also trying to do its job because a disk failed.

    I must say that 8 hours would be quite some time for an FSCK, but it seems that other hosting companies have exactly the same problem when a disk fails. Maybe they should make more filesystems to prevent this from happening, but then I don't know if that would be a good solution or even possible.

    Kindest regards,
    Patrick

  3. #3
    VPS Client
    Join Date
    Mar 2006
    Location
    UK
    Posts
    258
    I have to say I hope this doesn't carry on for much longer, I too have been offline for over 8.5 hours now, and that does seem a lot more than the message
    The VPS node draco.nocdirect.com is displaying some filesystem problems and is undergoing file system check FSCK at this time. The node should be back up shortly. Thank you for your patience
    leads us to believe.

    Is there an anticipated time of completion for this yet? I don't want to disturb Support, because the more times they get bothered, the longer this repair could take, and I am sorry to have to say it, but surely this has already taken too long. Whilst this may be an overnight job in the USA, in Europe (where I am) it has taken the whole day already and we are well into the evening.
    Last edited by Rebel007; 06-09-2006 at 11:54 AM.

  4. #4
    Administrator Eric
    Join Date
    Sep 2001
    Posts
    853
    Unfortunately, on these large RAID arrays, FSCKs can take 4 hours or more to fix broken inodes, etc. Les has updated the announcement thread for more details. We apologize for the problems.
    Eric E. [eric@jaguarpc.com]
    Jaguar Technologies, LLC
    JaguarPC.com * DedicatedSpace.com

  5. #5
    JPC Member
    Join Date
    Apr 2006
    Posts
    37
    Now it's been almost 11+ hours since the service went down. I thought the downtime would be minimal, but now it feels like it is taking forever... I contacted support 10 minutes ago and they said they cannot give an ETA, which doesn't sound good. Surely it means there is still more downtime to go; if there were only a little left, they would have said 1 or 2 hours or so.

    Just an update, now at http://forum.jaguarpc.com/showthread.php?t=14216

    They are running a 2nd check now, as the 1st had some problem... so should we expect another ~10 hours of downtime?

  6. #6
    Darth Admin (aka Jag) JPC-Greg
    Join Date
    Sep 1998
    Posts
    5,201
    I wish there was something we could do to reduce, or better yet eliminate, them. This is one benefit a dedicated server has over a VPS: if you had your own 40-60 GB drive or better, it could work through and repair that in a few hours at worst, usually much faster.

    As a VPS client on a node, you're just a fraction of a much larger array. If there are problems from a poor shutdown, it won't come up until it's all checked. By all means, we are open to ideas on how to approach or better deal with this type of impact on you, the client. From a client point of view you just see downtime and silence... fsck, fsck, fsck... all we see is a flying cursor fixing files. I still wouldn't believe an fsck takes this long if I hadn't seen it for myself.

    We just need to figure out what caused it to go down and try to avoid the need for fscks, but it's something that can't be escaped outright, nor should it be. This happens as inodes get lost, corrupted, etc. during use when a machine is improperly or abruptly shut down. If things aren't too bad, the machine can usually do an fsck very quickly and move on. When it looks like there could be severe problems if it allowed things to continue, it won't even boot until you take the long walk through each partition... and on a VZ node the /vz partition is a big nasty hog.
    Greg L. | Chief Executive Officer
    JaguarPC.com

    Helpful Links
    Knowledge Base | Network Status

    Need a Manager?
    (pm) | (email) David, Customer Service Manager
    (pm) | (email) Zach, Community Liaison, Sales Manager
    (pm) | (email) Masood, Chief Technical Officer
    (pm) | (email) Les, Chief Operations Officer

  7. #7
    JPC Member
    Join Date
    Mar 2006
    Posts
    29
    All that technical stuff is great, Jag, and I appreciate it because I understand it, but it doesn't get back the two clients I lost during the downtime. Downtime I wasn't responsible for, downtime I didn't cause, and downtime I could do nothing about. But who suffers? You? No.

    I do, my clients do.

  8. #8
    Ron
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    Smaller arrays

  9. #9
    Yeah, I know a LOT! Vin DSL
    Join Date
    Mar 2003
    Location
    Arizona Uplands
    Posts
    10,775
    BSD

    *edit* Check your PM for POC, Chief...
    Last edited by Vin DSL; 06-09-2006 at 03:20 PM.
    DISCLAIMER Any resemblance between the views expressed above and those of the owners and operators of this system is purely coincidental. Any resemblance between these views and my own is non-deterministic. The existence of Vin DSL is questionable. The existence of views in the absence of anyone to hold them is problematic. The existence of the reader is left as an exercise in the second-order coefficient.

    No Guts, No Story! VinDSL © 2010

  10. #10
    the Windlord Gwaihir
    Join Date
    Jun 2002
    Posts
    2,562
    Looks like that's where these big ass machines really come to a halt: these checks just don't seem to scale up.

    I think Ron's right: smaller arrays are the only way out. Guess that means either smaller machines or multiple arrays per server. In the latter case a hard shutdown could still corrupt all (two or three) arrays, but at least you would have the option of plugging one (or two) into a test machine to get the checks done at twice (or thrice) the speed. I do presume these are all hot-pluggable drives anyway.
    Regards,

    Wim Heemskerk
    ---
    Visit MeCCG.net - Cardgaming in J.R.R. Tolkien's Middle-earth
    And Gwaihir.net - The Middle-earth CCG store

  11. #11
    Yeah, I know a LOT! Vin DSL
    Join Date
    Mar 2003
    Location
    Arizona Uplands
    Posts
    10,775
    Quote Originally Posted by Gwaihir
    these checks just don't seem to scale up...
    You're getting warm...

  12. #12
    the Windlord Gwaihir
    Join Date
    Jun 2002
    Posts
    2,562
    Quote Originally Posted by Vin DSL
    You're getting warm...
    Well yeah, it is unusually hot around here and will remain so for more than a week. But ehm... I don't think you're referring to that, are you?

    In other words: care to elaborate on the enigmatic statement?

    Or should I elaborate on mine? What I meant is that generally with these multi-CPU systems with their big memory banks and all, twice as big a system gets twice as much work done in the same amount of time. So you can have twice as many users on it, etc., and peak loads on one site nicely even out with low tide on others, and everyone is happy with the awesome power at their fingertips.

    Not so with these darn arrays, it seems. Twice as big an array takes twice as long (or longer?) to check. There is nothing the bad ass RAID controller and massive CPU power can do to alleviate it. In addition to that, it would seem that twice as many users means twice as many system crashes relating to user activity, and it seems that such hard crashes of the hardware node are what make an extensive fsck necessary. And of course, twice as many users means twice as many users affected by the outage. Then, to top it off, it appears such a big VPS machine takes a lot of time (one or two hours?) to bring all VPSes (back) up after a system restart.
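A back-of-the-envelope sketch of the scaling argument above. It assumes, hypothetically, that check time grows linearly with array size at a fixed throughput (the optimistic case; the post suspects it may be worse) and that separate arrays can be checked at the same time. The 100 GB/hour rate and the 800 GB array size are made-up illustration values, not figures from this thread.

```python
def fsck_hours(size_gb: float, gb_per_hour: float) -> float:
    """Estimated check time for one array, assuming time is linear in size."""
    return size_gb / gb_per_hour


def outage_hours(array_sizes_gb: list, gb_per_hour: float) -> float:
    """Wall-clock outage when all arrays are checked in parallel.

    Assumes each array has its own disks/controller, so the outage lasts
    as long as the slowest (largest) array takes.
    """
    return max(fsck_hours(size, gb_per_hour) for size in array_sizes_gb)


if __name__ == "__main__":
    rate = 100.0  # hypothetical throughput in GB/hour; real rates vary widely
    one_big = outage_hours([800.0], rate)
    three_small = outage_hours([800.0 / 3] * 3, rate)
    print(f"one 800 GB array:     {one_big:.1f} h")
    print(f"three ~267 GB arrays: {three_small:.1f} h")
```

With these made-up numbers the single array takes 8.0 h and the three smaller arrays take about 2.7 h. If check time grows faster than linearly, the gap only widens.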

    So, perhaps over-the-top bad ass, maximum redundancy machines are NOT the best way to go with these VPSes after all?
    Regards,

    Wim Heemskerk

  13. #13
    Jag Veteran
    Join Date
    Sep 2002
    Posts
    650
    Quote Originally Posted by Jag
    I wish there was something we could do to reduce, or better yet eliminate, them. This is one benefit a dedicated server has over a VPS: if you had your own 40-60 GB drive or better, it could work through and repair that in a few hours at worst, usually much faster.

    As a VPS client on a node, you're just a fraction of a much larger array. If there are problems from a poor shutdown, it won't come up until it's all checked. By all means, we are open to ideas on how to approach or better deal with this type of impact on you, the client. From a client point of view you just see downtime and silence... fsck, fsck, fsck... all we see is a flying cursor fixing files. I still wouldn't believe an fsck takes this long if I hadn't seen it for myself.

    We just need to figure out what caused it to go down and try to avoid the need for fscks, but it's something that can't be escaped outright, nor should it be. This happens as inodes get lost, corrupted, etc. during use when a machine is improperly or abruptly shut down. If things aren't too bad, the machine can usually do an fsck very quickly and move on. When it looks like there could be severe problems if it allowed things to continue, it won't even boot until you take the long walk through each partition... and on a VZ node the /vz partition is a big nasty hog.
    There is so much you can do, I don't even know where to start.
    For one, what do you mean by "the poor shutdown"? How did that happen?
    Don't you have a UPS and dual power supplies on these servers? I guess not.

    For two, why put all your eggs in one basket? Make smaller data partitions, for example, one partition for each VPS. Have small root and /usr partitions. If you do, it's not true that the server won't boot if you skip fsck: you can boot just fine with root and /usr and mount each VPS partition as you go, after its fsck.

    For three, get a NAS or (maybe) DAS. It's not that expensive these days.

    And last, have daily backups and a spare server readily available. If a server goes down for whatever reason and is not expected to be back in 15 minutes, unplug the failed server and let it do its fsck offline. Use the spare server with the backup data instead. Most users wouldn't mind losing changes made during the last 24 hours if the alternative is losing 12-24 hours of uptime. And for those who want to recover every single bit of changes, you can grant temporary access to the failed box when it completes its fsck (some time in the next century, that is).
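A small simulation of the "one partition per VPS" suggestion, under stated assumptions: the small system partitions (/ and /usr) are checked first so the node can boot, then the VPS partitions are checked one after another, and each VPS is started as soon as its own partition is clean. The partition names (vps101 and so on) and all check durations are hypothetical, and total check time is assumed to be the same whether the data sits on one big partition or on several smaller ones.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Partition:
    name: str
    check_minutes: float  # hypothetical fsck duration for this partition


def online_times_split(system: list[Partition], vps: list[Partition]) -> dict[str, float]:
    """Minute at which each VPS comes back when every VPS has its own partition."""
    clock = sum(p.check_minutes for p in system)  # node is bootable at this point
    online = {}
    for p in vps:
        clock += p.check_minutes  # check this VPS partition...
        online[p.name] = clock    # ...then mount it and start the VPS right away
    return online


def online_times_single(system: list[Partition], vps: list[Partition]) -> dict[str, float]:
    """Same data on one big partition: nothing starts until everything is checked."""
    total = sum(p.check_minutes for p in system + vps)
    return {p.name: total for p in vps}


if __name__ == "__main__":
    system = [Partition("/", 5), Partition("/usr", 10)]
    vps = [Partition("vps101", 60), Partition("vps102", 60), Partition("vps103", 60)]
    print("split: ", online_times_split(system, vps))
    print("single:", online_times_single(system, vps))
```

The last VPS comes back at the same time either way; the gain is that every other VPS is back earlier instead of waiting for the whole node.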

  14. #14
    Ron
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,312
    Gerilya,
    Would Linux check each partition or only the suspect partitions?

  15. #15
    Jag Veteran
    Join Date
    Sep 2002
    Posts
    650
    Quote Originally Posted by Ron
    Gerilya,
    Would Linux check each partition or only the suspect partitions?
    After an "unclean" shutdown, all partitions are marked "unclean", prompting a filesystem check. However, small root and /usr partitions are checked/fixed quickly, and they are just enough to get the server started.
    You get in trouble only if you have 1 big partition, which was probably the case here.
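A minimal sketch of reading the "clean/unclean" flag described above, assuming ext2/ext3-style filesystems, where `tune2fs -l <device>` from e2fsprogs prints a "Filesystem state:" line. The sample output strings are abbreviated and hypothetical, and the rule "anything other than plain `clean` needs a check" is a simplification: e2fsck also forces checks based on mount counts and check intervals, which this sketch ignores.

```python
def filesystem_state(tune2fs_output: str) -> str:
    """Return the value of the 'Filesystem state:' line from `tune2fs -l` output."""
    for line in tune2fs_output.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem state":
            return value.strip()
    raise ValueError("no 'Filesystem state' line found")


def needs_check(tune2fs_output: str) -> bool:
    """Simplified rule: anything other than a plain 'clean' state needs fsck."""
    return filesystem_state(tune2fs_output) != "clean"


def partitions_to_check(outputs: dict) -> list:
    """Given mount point -> tune2fs output, list the partitions to check."""
    return [mount for mount, out in outputs.items() if needs_check(out)]


if __name__ == "__main__":
    # Hypothetical, abbreviated outputs for a small root and a big /vz partition.
    root_out = "Filesystem volume name:   root\nFilesystem state:         clean\n"
    vz_out = "Filesystem volume name:   vz\nFilesystem state:         not clean\n"
    print(partitions_to_check({"/": root_out, "/vz": vz_out}))
```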

