
Linux Performance Tuning

April | May 2007 | by Jaqui Lynch

Note: This is the second article in a two-part series. The first installment was published in the February/March issue.

In last issue’s article I introduced basic Linux performance-tuning concepts. Now I delve into I/O and memory tools and commands that are useful for optimizing Linux performance.
Uptime

Uptime provides the analyst with the system load average. The load average can only be used as a rough indicator, as it doesn’t take scheduling priority into account and it counts as runnable all jobs waiting for disk I/O, including network file system (NFS) I/O. However, uptime is a good place to start when trying to determine whether a bottleneck is CPU- or I/O-based.

Running uptime provides three load averages: the last minute, the last five minutes and the last 15 minutes. The load average should be divided by the number of processors. If the calculated value is borderline, but has been falling over the last 15 minutes, then it would be wise to just monitor the situation. However, a value between 4 and 7 is fairly heavy and means that performance is being negatively affected. Above 7 means the system needs serious review and below 3 means the workload is relatively light. If the system is a single-user workstation, then the load average should be less than 2. The ruptime command also allows you to request uptime information remotely.
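
Running the numbers as a quick sketch (the load figures below are made up; a four-CPU server is assumed):

uptime
# 10:15:01 up 41 days, 2 users, load average: 5.80, 5.12, 4.60
grep -c ^processor /proc/cpuinfo
# 4  -- so the one-minute load per processor is 5.80/4 = 1.45: busy, but not saturated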
mpstat

The mpstat command shows a breakdown of utilization by individual processor, which is very useful on a multiprocessor system. For each CPU it shows user, system (kernel), nice, idle and I/O-wait time, plus interrupts per second. Similar data can be obtained with the sar command.
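
For example, to print per-processor statistics every two seconds, five times (the interval and count are arbitrary):

mpstat -P ALL 2 5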
sar

The sar command provides a good alternative to uptime when used with the -q option, or to mpstat when used with the -u option. This command provides statistics on the average length of the run queue, the percentage of time the run queue is occupied, the average length of the swap queue and the percentage of time the swap queue is occupied.

The run queue lists jobs that are in memory and runnable, but doesn’t include jobs that are waiting for I/O or sleeping. The run queue size should be less than 2 per processor. If the load is high and the runqocc=0, then the problem is most likely memory or I/O, not CPU. The swap queue lists jobs that are ready to run but that have been swapped out.

The powerful sar command is run by typing:

sar -options int #samples

Where the generally valid options are:

  • -g or -p: paging
  • -q: average queue length
  • -u: CPU usage
  • -w: swapping and paging
  • -y: terminal activity
  • -v: state of kernel tables

One advantage of using sar is you can write the data to a file and then post-process it when the system isn’t busy. The file is created using the -a and -o flags. An example of creating a file of 30 snapshots, each two seconds apart, would look like:

sar -a -o sardata.out 2 30

This file would then be processed using:

sar -u -f sardata.out

This would cause the CPU usage report to display from the recorded file.
iostat

After determining that the problem may well be CPU-based by using uptime and sar, it would then be necessary to move on to iostat to get more detail. Running iostat provides much information, but the values of concern are %user and %sys. If (%user + %sys) > 80 percent over a period of time, then it’s likely the bottleneck is CPU. In particular, it’s necessary to watch for average CPU being greater than 70 percent with peaks above 90 percent. It’s also possible to get similar information by running the ps -au or sar -u commands. Both commands provide information about CPU time. The sar -u command, in particular, breaks the time into user, system, time waiting for blocked I/O (i.e., NFS, disk, etc.) and idle time.
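
As a quick sketch of watching for a CPU bottleneck (the interval and count are arbitrary):

iostat -c 2 5   # CPU-only report: %user, %nice, %sys, %iowait, %idle
sar -u 2 5      # the same breakdown from sar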

Once it’s been determined that the problem is a CPU bottleneck, there are several options to consider. It’s possible to limit the CPU time a process can use with the limit command (ulimit in bash). If the problem relates to one process, it’s also possible to profile that process with a profiler such as gprof or OProfile to find out whether optimizing the program code is a possibility.

Running the time command also provides a good indication of whether the process is CPU-intensive. Compiler options can significantly affect the performance of CPU-intensive programs, so turning on optimization, if you’re able to recompile, is worthwhile.
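
A brief sketch of both checks (the program name myapp is hypothetical):

time ./myapp                 # user+sys time close to real time suggests a CPU-bound process
gcc -O2 -o myapp myapp.c     # rebuild with optimization turned on and compare the timings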

If none of the aforementioned options improves performance and the problem still appears to be CPU, three options remain:

  • Run the workload on another more powerful system
  • Reduce the priority of the process (use nice or renice; see the sketch after this list)
  • Upgrade the CPU
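
A minimal sketch of the priority option (the PID 12345 is made up):

nice -n 10 ./batchjob        # start a job at reduced priority
renice 10 -p 12345           # reduce the priority of an already-running process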

I/O Scheduling

The 2.6 kernel introduces a concept called I/O elevators and provides four I/O schedulers (CFQ, NOOP, anticipatory and deadline). The choice of I/O scheduler can greatly impact I/O performance. An I/O elevator is a queue where I/O requests are ordered based on where their sector is on the disk. I/O elevators are included in the deadline and anticipatory I/O schedulers.

The CFQ (Completely Fair Queuing) scheduler is the default on Red Hat Fedora Core 6. Check what you’re using by running dmesg and scanning through the output. The tunables for each scheduler live under /sys/block/<device>/queue/iosched (e.g., /sys/block/sda/queue/iosched holds the CFQ tunables for disk sda).
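
On kernels with this support, the active scheduler can also be read, and changed on the fly, through sysfs (sda is just an example device; changing it requires root):

cat /sys/block/sda/queue/scheduler       # the scheduler in use is shown in brackets
echo deadline > /sys/block/sda/queue/scheduler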

Using 64 internal I/O queues and a single I/O dispatch queue, the CFQ scheduler targets the fair allocation of I/O bandwidth among the initiators of I/O requests.

The NOOP scheduler is a minimal-overhead scheduler that has been shown to be useful for large I/O subsystems that use RAID controllers. It’s worth trying this out if the server is connected to a large RAID storage-area network (SAN).

The anticipatory scheduler is targeted at reducing the per-thread read response time and tries to assist synchronous sequential reads especially during streamed writes. It’s not designed for random read workloads with many seeks all over the disk; it will perform poorly if used for this kind of workload.

The deadline scheduler uses five I/O queues to keep track of I/O. It’s designed to emphasize average read-request response time for workloads that seek all over the disk.
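
To make a given scheduler the system-wide default, the 2.6 kernel can be booted with the elevator parameter. A sketch of a grub.conf kernel line (your kernel version and root device will differ):

kernel /vmlinuz-2.6.18 ro root=/dev/sda2 elevator=deadline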
I/O Problems

If the problem doesn’t appear to be CPU, then it’s necessary to investigate memory and I/O as possible causes. Again, it’s possible to use iostat or sar to get the information needed. The 2.6 kernel provides three new flags on the iostat command: -p, -k and -x. The -p flag adds per-partition statistics; the CPU report now also includes %iowait and, on virtualized systems, time spent waiting on the hypervisor (%steal). The -k flag reports KB/sec written and read instead of blocks. Additionally, you can use the tps field as an indicator of logical I/Os per second. Finally, the -x flag provides detailed performance statistics such as read and write service times.

iostat provides data on several important values for each physical disk. These values include percent of time the physical disk was busy, kilobytes per second to/from the disk, transfers per second to/from, kilobytes read and kilobytes written. This data will help determine if an imbalance of I/O exists among the physical volumes. If all appears normal, the next step is investigating the filesystems at which the I/O is directed. If the I/O is directed at the page files, then memory must be investigated.
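
A quick example of pulling the extended per-disk view (the interval and count are arbitrary):

iostat -xk 2 3   # extended statistics in KB/s: r/s, w/s, rkB/s, wkB/s, await, %util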

If the bulk of the I/O (more than 30 percent) is going to a logical volume or filesystem not used for paging, then the problem is most likely user I/O. This can be resolved by:

  • Checking fragmentation
  • Reorganizing the filesystem
  • Adding physical volumes
  • Splitting the data
  • Adding memory

Paging is investigated using vmstat, iostat, sar and swapon. Information on the page spaces available including physical volume and utilization is provided by swapon -s. The tool vmstat provides information on actual memory, free pages, processes on the I/O waitq, pageins, pageouts, etc. It doesn’t provide timestamps. It’s also possible to use the sar -r, -b or -w commands.
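
For instance (the interval and count are arbitrary):

swapon -s    # list swap areas with their size and current utilization
vmstat 2 5   # r/b queues, swpd, free, si/so swap activity, every two seconds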

The free -m command shows memory usage broken down into the following: total memory, used memory, free memory, buffers, file-cache size (cached), total size of swap space, used and free swap space. This information can also be obtained by using: more /proc/meminfo.
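
For example (both commands report the same underlying counters):

free -m              # totals, used, free, buffers, cached, and swap, in MB
more /proc/meminfo   # the raw kernel view of the same numbers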

Sequential readahead can help sequential I/O performance. Previously we would use hdparm or change the values in /proc/sys/vm. For the 2.6 kernel, use blockdev:

  1. To get the current readahead value:

blockdev --getra /dev/sda

  2. To set the readahead value temporarily:

blockdev --setra 4096 /dev/sda

(Note: 4096 is just an example; the value is in 512-byte sectors, so this sets 2 MB of readahead, and testing determines the optimal value.) With a larger readahead the OS reads further ahead on sequential access and throughput will be higher. To make the change take effect every time you boot, add blockdev --setra 4096 /dev/sda, blockdev --setra 4096 /dev/sdb, etc., to /etc/rc.d/rc.local. The blockdev command also has other options, so examining the man page is recommended.

Figure 1 shows the impact of small blocksizes on sequential performance. The move from a 4 KB blocksize to a 16 KB blocksize increased bandwidth from 18.6 MB/sec to 1 GB/sec.

If there’s no performance problem with the CPU and the system isn’t disk or memory bound, it becomes necessary to investigate the network to check whether it’s remote-file I/O bound. This is the last step before running the more resource-heavy trace command to determine what’s really happening.
The Diagnosis

A great deal of information can be gleaned from the system with relatively minimal effort. This information can then be used to determine what’s happening with the system. Additionally, free tools such as nmon and ganglia allow you to get a full view of the system on one page. Running nmon in logging mode creates a file that can be used by the nmon analyzer (an Excel spreadsheet) to produce graphs of what’s happening on the system.

To properly measure performance, it’s helpful to automate the reporting process using a scripting language (e.g., Perl) combined with scheduling commands (e.g., at or cron). These languages can also be used to create graphical views of the output from the tools. By using these tools and this methodology, it’s possible to diagnose performance problems on most UNIX systems, using non-proprietary tools that come standard with the system.
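
As a minimal sketch of such automation (the output path and schedule are arbitrary; the sysstat package also ships the sa1/sa2 scripts for exactly this purpose):

# crontab entry: record 30 two-second sar snapshots at the top of every hour
0 * * * * /usr/bin/sar -o /var/log/sardata.out 2 30 >/dev/null 2>&1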


Additional Tools

  1. Ganglia: This tool is great for clusters and grids and provides central reporting (http://ganglia.sourceforge.net).

  2. Nmon: This tool runs on AIX and Linux and can be used with the nmon analyzer to produce useful spreadsheets that analyze and summarize performance (www.ibm.com/collaboration/wiki/display/WikiPtype/nmon).

  3. Sysstat rpm: This is required to get the iostat, mpstat and sar commands.
