
This is a discussion on Normal load average? in the Shared & Semi-Dedicated forum

  1. #1
    JPC Member
    Join Date
    Jan 2002
    Posts
    35

    Normal load average?

    My account on xenon feels pretty slow...

    Code:
    perry@slappy perry $ time ssh user@example.com uptime
     10:23am  up 21 days, 29 min,  0 users,  load average: 21.34, 32.07, 27.90
    
    real    2m9.202s
    user    0m0.012s
    sys     0m0.003s
    A few minutes before that, I gave up waiting..

    Code:
    perry@slappy perry $ time ssh user@example.com uptime
    Killed by signal 2.
    
    real 6m14.439s
    user 0m0.009s
    sys 0m0.006s
    Lemme try from a different box a few hundred miles away on a different network.. Could be network problems on my end I suppose.

    Code:
    perry@ceg perry $ time ssh user@example.com uptime
     10:30am  up 21 days, 36 min,  0 users,  load average: 23.41, 22.59, 24.71
    
    real    0m54.858s
    user    0m0.010s
    sys     0m0.004s
    perry@ceg perry $ time ssh user@example.com uptime
     10:37am  up 21 days, 43 min,  0 users,  load average: 16.31, 20.42, 23.27
    
    real    1m20.079s
    user    0m0.011s
    sys     0m0.003s
    Anyone else on xenon having slowdowns recently? I opened a support ticket for this (2146878), they say "Your domain is working fine at our end. It would be appreciated if you could please check and confirm this once again from your end."

    Is a load average over the past 15 minutes of nearly 25 acceptable?

    Running "time wget http://example.com/page -O /dev/null" from both machines results in wildly varying response times, anywhere from 10 seconds to 4 minutes.
    Last edited by perry; 05-17-2005 at 09:38 AM.

  2. #2
    JPC Member
    Join Date
    Jan 2002
    Posts
    35
    Ticket reply says it is fixed. Web feels faster. Can get email over POP3 again. Load averages seem to be going down. This seems like it is an ongoing problem for me.. overall slow response times every now and then. I'll post again next time I notice it (don't use the site all that much...).

  3. #3
    Yeah, I know a LOT! Vin DSL
    Join Date
    Mar 2003
    Location
    Arizona Uplands
    Posts
    10,775
    Personally, I don't think the load on a dual CPU server should be over 4.00, but that's just me, evidently. Anything over 10.00 is obscene, IMHO...
    DISCLAIMER Any resemblance between the views expressed above and those of the owners and operators of this system is purely coincidental. Any resemblance between these views and my own are non-deterministic. The existence of Vin DSL is questionable. The existence of views in the absence of anyone to hold them is problematic. The existence of the reader is left as an exercise in the second-order coefficient.

    No Guts, No Story! VinDSL © 2010

  4. #4
    the Windlord Gwaihir
    Join Date
    Jun 2002
    Posts
    2,562
    Waiting for the reply to come in can be something on the line. The reported load cannot; that is the machine itself. I noticed the same extreme loads on Orion at about the same time. I didn't post a ticket, as I noticed "root" was logged in by shell, so I assumed they'd be on it already.

    AFAIK these are dual processor machines capable of handling two threads simultaneously. To me that means I don't like seeing loads above 2.00 in that 15 minute average; those should IMHO be pretty rare. At times I wish they'd start logging this number for their clients (or we could do it ourselves, but many of us doing that would obviously be a waste of resources), as it's very hard to get a good impression of this and I too have doubts at times that this is reasonably under control. (I tend to notice it mainly when my e-mail client throws a warning when pop3 times out.)
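
    If one did want to keep such a record from a shell account, a minimal Python sketch could look like the following. It assumes a Unix-like host where os.getloadavg() is available; the function names and the log path are hypothetical, and it should be run sparingly (say, from cron every few minutes) for the resource reason mentioned above.

```python
import os
import time


def format_load_record(timestamp, loads):
    """Format one log line: UTC timestamp, then the 1/5/15-minute load averages."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(timestamp))
    return "%s %.2f %.2f %.2f" % ((stamp,) + tuple(loads))


def append_load_record(path):
    """Append the current load averages to `path` (a hypothetical log file)."""
    record = format_load_record(time.time(), os.getloadavg())
    with open(path, "a") as log:
        log.write(record + "\n")
    return record


# Example call (the path is an assumption, pick any writable location):
# append_load_record(os.path.expanduser("~/loadavg.log"))
```

    Each call adds one line, so the file stays small and can be compared against the times when mail or web requests were timing out.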
    Regards,

    Wim Heemskerk
    ---
    Visit MeCCG.net - Cardgaming in J.R.R. Tolkien's Middle-earth
    And Gwaihir.net - The Middle-earth CCG store

  5. #5
    Ron
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,306
    That "load" number is the count of runnable processes on the queue and does not necessarily mean that the machine is in poor health; however, it is a good indicator most of the time.
    Our rule of thumb was 3 per CPU (i.e. 6 on a dual cpu machine). Anything above that was considered overloaded.

    That number on these machines, from the two different servers that my sites are on, usually has absolutely positively NOTHING, repeat NOTHING to do with CPU load. It has to do with an I/O bottleneck, as the CPU idle times are high when the load is high. The disk is maxed out, and a process stuck in disk wait is counted along with the runnable ones, and hence shows up in the load number.

    It's a "server load" number, not a "cpu load" number. There are perhaps hundreds of processes "running" on the server "at the same time". Of course this is a human concept of the actual timesharing process, where these dual CPU machines can never actually physically have more than 2 processes running at a given time - they hand out time slices to the hundreds of "concurrent" active processes.

    Now, when a machine is so unbalanced in terms of resources that the CPU is sitting idle while the disk is backed up, a healthy load number is probably less than 3 per CPU, perhaps as low as 2.5 or even 2 per CPU.
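
    For anyone who wants to apply that rule of thumb mechanically, here is a minimal Python sketch. It assumes the uptime line looks like the ones posted earlier in this thread; the function names and the 3-per-CPU constant are illustrative only, taken from the heuristic above rather than from any official limit.

```python
import re

# Rule of thumb quoted in this thread: about 3 per CPU.
# This is a forum heuristic, not a documented limit.
LOAD_PER_CPU_LIMIT = 3.0


def parse_load_averages(uptime_line):
    """Return the (1, 5, 15)-minute load averages from one line of `uptime` output."""
    match = re.search(
        r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)", uptime_line
    )
    if match is None:
        raise ValueError("no load average found in: %r" % uptime_line)
    return tuple(float(value) for value in match.groups())


def is_overloaded(load, cpu_count, limit_per_cpu=LOAD_PER_CPU_LIMIT):
    """Apply the per-CPU rule of thumb to a single load average figure."""
    return load > limit_per_cpu * cpu_count
```

    With the 15-minute figure of 27.90 from the first post and 2 CPUs, is_overloaded returns True, since the threshold works out to 6. As noted above, a machine that is waiting on disk may deserve a lower limit_per_cpu.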

    If you want an historical record of CPU utilization, sar is probably enabled on your machine. Simply typing
    "sar"
    will give you today's historical CPU utilization; typing
    "sar -f /var/log/sa/sadd"
    (where dd is the day of the month) will give you stats for that day, going back a month or less depending on the server config;
    "man sar"
    will give you all of the documentation.

    All that said, a number in the 20's is not a good thing.

  6. #6
    Ron
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,306
    Here's an example of sar output from today from one of my machines:

    05:40:00 CPU %user %nice %system %idle
    05:50:00 all 25.49 0.12 4.15 70.24
    06:00:01 all 16.74 0.68 3.59 79.00
    06:10:00 all 22.54 1.70 4.68 71.08
    06:20:00 all 21.28 0.12 3.57 75.03
    06:30:00 all 20.03 0.09 3.51 76.37
    06:40:00 all 14.95 0.79 3.37 80.88
    06:50:00 all 19.81 1.78 4.63 73.78
    07:00:00 all 18.02 3.35 4.57 74.06
    07:10:00 all 20.85 1.51 5.10 72.55
    07:20:00 all 31.21 1.08 5.86 61.85
    07:30:00 all 27.13 1.94 5.51 65.43
    07:40:00 all 19.52 0.73 4.20 75.54
    07:50:00 all 19.73 0.65 4.17 75.45
    08:00:00 all 18.27 3.16 4.14 74.43
    08:10:00 all 21.33 0.19 4.47 74.01
    08:20:01 all 21.34 0.25 4.17 74.24
    08:30:00 all 21.04 1.56 4.33 73.07
    08:40:00 all 26.70 1.51 5.15 66.65
    08:50:00 all 29.65 0.16 5.42 64.76
    09:00:00 all 22.19 1.45 5.26 71.11
    09:10:00 all 24.37 0.60 5.24 69.79
    09:20:00 all 24.51 1.02 4.65 69.82
    09:30:00 all 23.94 0.02 4.28 71.76
    09:40:00 all 22.27 0.04 4.55 73.13
    09:50:00 all 23.95 0.08 5.31 70.66
    10:00:00 all 22.78 0.09 5.27 71.86
    10:10:00 all 27.68 2.85 6.51 62.96
    10:20:00 all 23.40 3.28 5.96 67.37
    10:30:00 all 19.53 0.64 5.02 74.80
    10:40:00 all 20.96 0.12 5.16 73.76
    10:50:01 all 23.25 0.33 5.83 70.59
    11:00:00 all 20.36 0.20 5.28 74.16
    11:10:01 all 20.75 1.12 5.69 72.44
    11:20:00 all 26.07 2.16 6.17 65.60

    11:20:00 CPU %user %nice %system %idle
    11:30:00 all 26.70 0.90 6.20 66.20
    11:40:00 all 22.47 0.54 4.83 72.16
    11:50:00 all 23.68 0.91 4.72 70.69
    12:00:00 all 25.63 0.19 5.36 68.81
    12:10:00 all 25.82 0.00 5.01 69.17
    12:20:00 all 30.05 0.10 6.11 63.74
    12:30:02 all 25.45 0.01 4.25 70.29
    12:40:00 all 23.56 0.11 4.46 71.87
    12:50:00 all 22.13 4.27 3.96 69.65
    13:00:00 all 23.50 0.29 4.48 71.73
    13:10:00 all 28.22 0.24 5.30 66.25
    13:21:03 all 28.23 0.01 5.24 66.52
    13:30:00 all 29.95 0.26 5.46 64.33
    13:40:08 all 30.99 0.07 5.48 63.46
    13:50:01 all 26.97 0.22 5.32 67.50
    14:00:09 all 26.25 0.01 5.10 68.64
    14:10:03 all 26.98 0.01 5.55 67.46
    14:20:00 all 27.16 0.01 5.49 67.34
    14:30:00 all 27.44 0.01 5.18 67.37
    14:40:59 all 26.78 0.01 5.29 67.92
    14:50:03 all 18.21 0.38 4.32 77.09
    15:00:22 all 32.25 0.04 5.82 61.89
    15:10:00 all 28.12 0.73 6.36 64.79
    15:20:00 all 26.24 0.33 8.46 64.97
    15:30:00 all 25.26 0.01 6.96 67.77
    15:40:00 all 24.49 0.01 4.63 70.87
    15:50:00 all 22.80 0.01 4.10 73.09
    16:00:00 all 26.02 0.01 4.63 69.33
    16:10:00 all 34.30 0.17 5.70 59.83
    16:20:00 all 32.26 0.14 5.27 62.32
    16:30:00 all 31.13 0.17 5.05 63.65
    16:40:00 all 23.39 0.01 4.01 72.60
    16:50:00 all 24.15 0.16 4.64 71.05
    17:00:00 all 23.51 0.19 4.19 72.11

    17:00:00 CPU %user %nice %system %idle
    17:10:00 all 24.01 0.03 4.34 71.62
    17:20:01 all 24.34 0.27 4.07 71.32
    17:30:00 all 22.44 0.35 4.28 72.93
    17:40:00 all 24.70 0.41 4.73 70.16
    17:50:00 all 16.99 0.12 3.10 79.79
    18:00:00 all 16.91 0.17 3.69 79.23
    18:10:00 all 17.73 0.10 3.95 78.22
    18:20:00 all 21.23 0.08 4.17 74.53
    18:30:00 all 24.25 0.01 4.40 71.34
    18:40:01 all 20.16 0.11 3.84 75.89
    18:50:00 all 17.54 3.80 3.43 75.23
    19:00:00 all 12.75 0.35 2.95 83.95
    19:10:00 all 12.78 0.24 3.86 83.12
    19:20:01 all 15.58 0.01 3.48 80.94
    19:30:00 all 13.72 0.21 2.81 83.27
    19:40:01 all 10.25 0.06 2.47 87.22
    19:50:00 all 10.06 0.02 2.49 87.43
    20:00:00 all 11.91 0.16 2.47 85.46
    Average: all 20.96 0.75 4.42 73.87
    More than one CPU was sitting idle, on average, during every one of the 10-minute intervals reported. Pretty healthy CPU-wise.
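
    A minimal sketch for picking the busiest interval out of output like the above, in Python. It assumes the six-column layout shown here (time, CPU, %user, %nice, %system, %idle); other sysstat versions print additional columns, and the function names are illustrative.

```python
def parse_sar_cpu(text):
    """Parse `sar` CPU rows into dicts; header, blank, and Average rows are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows look like: HH:MM:SS all %user %nice %system %idle
        if len(fields) != 6 or fields[1] != "all":
            continue
        if fields[0].startswith("Average"):
            continue
        user, nice, system, idle = (float(field) for field in fields[2:])
        rows.append({"time": fields[0], "user": user, "nice": nice,
                     "system": system, "idle": idle})
    return rows


def busiest_interval(rows):
    """Return the row with the lowest %idle."""
    return min(rows, key=lambda row: row["idle"])


# A few rows copied from the output above
sample = """05:40:00 CPU %user %nice %system %idle
05:50:00 all 25.49 0.12 4.15 70.24
16:10:00 all 34.30 0.17 5.70 59.83
Average: all 20.96 0.75 4.42 73.87"""

worst = busiest_interval(parse_sar_cpu(sample))
print(worst["time"], worst["idle"])
```

    Feeding it a whole day of sar output, rather than these few rows, gives the interval worth quoting in a support ticket.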

  7. #7
    Jag Veteran
    Join Date
    Sep 2002
    Posts
    650
    In non-technical terms, load numbers of 20 or more usually mean the server is crawling to the point where connections (for example, mail) start to time out and your site visitors go away.

    "Could be network problems on my end I suppose."
    And no, it has nothing to do with network problems on your end.

    Also, I don't think sar is installed on shared hosting servers.

  8. #8
    JPC Member
    Join Date
    Jan 2002
    Posts
    35
    So, going by the rule of thumb, these numbers would indicate an overload? Imagine the backups could be running around now..

    Code:
    perry@slappy perry $ time ssh user@example.com uptime
      12:31am  up 21 days, 14:37,  0 users,  load average: 11.57, 9.84, 8.26
    
    real    0m17.566s
    user    0m0.012s
    sys     0m0.004s
    Kinda frustrating when I wanna go look at something on my site and it takes 10 seconds for a webpage to pop up (according to time wget http://example.com -O /dev/null). Good thing I don't really have an audience for my site or they wouldn't be my audience for long!

    Looks like sar is installed on xenon. Here's the average for the 17th:

    Code:
                      CPU     %user     %nice   %system     %idle
    Average:          all     23.57      1.59     34.96     39.88
    Never used sar before, so I'm not exactly sure what I'm looking at. If user + nice + system > 100, then there's a problem?

    Here's the sar output for when that uptime measurement was taken:
    Code:
      00:30:01          all     31.37      0.00     52.96     15.66
    What's the correlation there?
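
    As a rough illustration of how to read those rows (a sketch, not anything sar itself provides): in this four-column layout the percentages are shares of the same interval, so %user + %nice + %system + %idle comes to about 100, and the busy share is simply 100 minus %idle. The helper name below is made up for the example.

```python
def busy_percent(user, nice, system):
    """CPU share spent doing work: everything that is not %idle."""
    return user + nice + system


# Rows quoted above: (user, nice, system, idle)
daily_average = (23.57, 1.59, 34.96, 39.88)
at_0030 = (31.37, 0.00, 52.96, 15.66)

for label, (user, nice, system, idle) in (
    ("average", daily_average),
    ("00:30:01", at_0030),
):
    busy = busy_percent(user, nice, system)
    # busy + idle lands at roughly 100 (rounding in sar's output aside)
    print("%s busy=%.2f%% idle=%.2f%% total=%.2f" % (label, busy, idle, busy + idle))
```

    So the total cannot go over 100; the thing to watch is how little is left in %idle, and here also how much of the busy share is %system.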
    Last edited by perry; 05-17-2005 at 11:38 PM.

  9. #9
    Ron
    Loyal Client
    Join Date
    Aug 2002
    Posts
    7,306
    That's a busy machine... with 15.66 percent idle... that's borderline, and there are probably points in that 10 minute average where there is < 10% idle.

    That's pretty good if the cpu is at that level and load is only 8 or so... from that very limited amount of info, I'd guess that's a fairly balanced machine, but certainly one that's either right at or over the line in the amount of total work being requested. If you're seeing 10 second response times, then it's over the edge, at least while you're looking at it.

    Frequently, if you open a ticket about server load, they can see if there's a runaway process or some other problem. If you get one of those "looks fine from here" responses, persist, and show them your uptime and sar output.

    BTW, you can try "sar 5 5" and it will show you 5-second averages for 5 periods (about 25 seconds total), which is closer to real time than the historical sar output. Don't run sar too much in real time, as it does impose a penalty on the server, probably in the vicinity of 1/2 to 1% nowadays. (Back in the day, sar itself and other real-time system resource monitors (like top/c or top/d on VAX) could easily consume 10% of the machine! lol)

    Finally, don't go by initiating a Secure Shell for calculating your response time; grab a webpage, preferably one that visitors regularly see on your site. While response time SHOULD be good for EVERYONE at all times, you really want to see what it's taking for an average visitor to see a page, not how long it takes for the webserver to bring up a page that's not in any cache. SSH is most likely not running and certainly not in memory when you request it for the first time. (EDIT: ooops I see that you did exactly that -- tried to grab a page right off the bat.... never mind!)
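
    A minimal sketch of that kind of measurement in Python, using only the standard library. The helper names are made up for illustration, and the URL in the comment is a placeholder for a page your visitors actually load.

```python
import time
import urllib.request


def timed(action):
    """Run `action()` and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = action()
    return result, time.perf_counter() - start


def fetch_size(url, timeout=300):
    """Download `url` and return the number of bytes received."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return len(response.read())


# Example call (placeholder URL, needs network access):
# size, seconds = timed(lambda: fetch_size("http://example.com/page"))
# print("%d bytes in %.1f s" % (size, seconds))
```

    Repeating the call a few times gives a feel for the spread, which matters here since the reported times varied from seconds to minutes.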

    I don't want to sound like I'm saying that you're getting good service... you're not. Just trying to put it in perspective a bit. Sorry you're having the troubles... open up another ticket or reopen your first one.

  10. #10
    JPC Member
    Join Date
    Jan 2002
    Posts
    35
    Quote Originally Posted by Ron
    Finally, don't go by initiating a Secure Shell for calculating your response time; grab a webpage, preferably one that visitors regularly see on your site. While response time SHOULD be good for EVERYONE at all times, you really want to see what it's taking for an average visitor to see a page, not how long it takes for the webserver to bring up a page that's not in any cache. SSH is most likely not running and certainly not in memory when you request it for the first time. (EDIT: ooops I see that you did exactly that -- tried to grab a page right off the bat.... never mind!)

    I don't want to sound like I'm saying that you're getting good service... you're not. Just trying to put it in perspective a bit. Sorry you're having the troubles... open up another ticket or reopen your first one.
    I'll notice a slow response time when I go visit a page, then I'll go take a look at the server load to see if the server is overstressed or if there's a network problem between here and there. For instance, I just tried two pages with wget. First took 47 seconds, second (different) took 1 minute 17 seconds. The speed is fine (100KB/sec for a 40k file..), just takes a long time to get the page. So I go look at the load average.. 17.99, 16.56, 10.53 and it took 58 seconds to get those numbers.

    Load average this morning (7:30 central time) was pretty decent.. 3.00, 4.87, 5.34. Getting webpages was snappy too.

    I'll just keep opening tickets in hopes that whatever is causing the problem gets fixed. Hey, at least I'm getting quick responses to those tickets. Makes the problem a bit more bearable knowing that someone is looking into the situation.

  11. #11
    JPC Member
    Join Date
    Jan 2002
    Posts
    35
    Problem corrected, again. 3 seconds to get a page, load averages down to 3.89, 3.94, 5.80.

  12. #12
    the Windlord Gwaihir
    Join Date
    Jun 2002
    Posts
    2,562
    Quote Originally Posted by Ron
    That "load" number is the count of runnable processes on the queue and does not necessarily mean that the machine is in poor health, however it is a good indicator most of the time.
    Our rule of thumb was 3 per CPU (i.e. 6 on a dual cpu machine). Anything above that was considered overloaded.
    I have assumed so far that this 3 per CPU rule of thumb is for the short term (most minutes should stay under that), while the 1 per CPU should be met long term (the 15 minute average should stay under that).

    The 1 per CPU is something JAG said here on these boards once, so I assume that makes sense, though he didn't indicate on what time scale to think.

    That sar sounds like a most informative tool. Too bad it isn't available on Orion. Perhaps I'll ask for it in a ticket.

  13. #13
    the Windlord Gwaihir
    Join Date
    Jun 2002
    Posts
    2,562
    Masood was as quick as usual, but said it isn't (supposed to be) available on shared servers. With server load right now again in the 20's, I can't find it very convincing.

    Perhaps it would be nice if JAG was to give some clear pointers as to what is considered normal and how it is / isn't monitored. I must say that responses to server load tickets have always left me dissatisfied in the past; giving me the feeling the answers are generally a polite way of saying "mind your own business", giving me a very dualistic feeling on opening one of those.

  14. #14
    Yeah, I know a LOT! Vin DSL
    Join Date
    Mar 2003
    Location
    Arizona Uplands
    Posts
    10,775
    This is all great and fine, however...

    Before I came here, I considered 1.0 per CPU to be the benchmark. So, in the old days, I would consider 2.0 to be the limit. That's before I came here...

    Since I've been here, I watched 'my' server crash at (like) 65.0, then 100.0, then 240.0 --- whatever! All I can tell you is I've pretty much decided 6.0 is 100% here. After that, things start to slow down. All considering, I think 4.0 is normal, e.g. somewhat less than 100%...

    You guys can explain it away any way you want - like 'Tech Support' saying it's because of backups, et cetera - but 25's are NOT normal. I don't care which church you attend...

  15. #15
    Administrator Eric
    Join Date
    Sep 2001
    Posts
    853
    We are working on a plan to help reduce, if not completely eliminate, these types of occurrences in the future. I cannot really give any details on how we are going to implement this plan, but I can assure you we are in the planning stages because we do not like to see these types of posts in the forums any more than you like experiencing these types of problems. Hang in there! If you feel your account would be better off on a different machine for the time being, please open a ticket and request that your account be moved.
    Eric E. [eric@jaguarpc.com]
    Jaguar Technologies, LLC
    JaguarPC.com * DedicatedSpace.com
