Changing Process CPU Affinity on Linux

Submitted on September 13, 2011 – 4:58 pm

A common real-life scenario: on a multi-CPU system, Oracle processes have taken over and the system has slowed to a crawl. The average system load is in double digits and even logging in takes several minutes. The possible root causes range from inefficient SQL queries (the most common one) to insufficient system resources. But at this point you just need to make the system a bit more responsive, so you can start troubleshooting.

And so your options are:

1. Have the DBAs shut down Oracle or at least make it stop accepting new queries. This may take some time, because the system is so overloaded. This also means a service outage, which may not be an option at this time.

2. Renice Oracle processes to a lower priority. This is a viable and simple option. All you need to do is run something like “renice 10 -u oracle”. But in many cases the improvement in system responsiveness will be marginal at best.

3. Shift CPU affinity of Oracle processes to specific CPUs/cores, leaving at least one CPU free to handle the OS activity. This particular option may make the system more responsive, even if only temporarily.

The first step is to determine the number of logical CPUs (cores, or hardware threads if hyper-threading is enabled) available on your server:

# grep processor /proc/cpuinfo
processor       : 0
processor       : 1
processor       : 2
processor       : 3

In this case we have four cores, numbered 0 through 3. The plan is to shift all Oracle processes to cores 1, 2 and 3, leaving core 0 available for system processes. The command used to perform this operation is “taskset”. On current Linux distributions it is part of the “util-linux” package; older distributions shipped it in “schedutils”. Either way, it is usually a standard part of any Linux server installation.
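Rather than hard-coding the core range, it can be derived from the CPU count. The following is a minimal sketch, not part of the original procedure: the helper name cpu_list_for is hypothetical, and getconf is used here as an alternative to grepping /proc/cpuinfo.

```shell
#!/bin/sh
# Hypothetical helper: given the number of logical CPUs, print a taskset
# CPU list that leaves core 0 free (for example "1-3" for four CPUs).
cpu_list_for() {
    total=$1
    if [ "${total}" -le 1 ]; then
        # Only one CPU: there is nothing to reserve, so use core 0.
        echo "0"
    elif [ "${total}" -eq 2 ]; then
        echo "1"
    else
        echo "1-$((total - 1))"
    fi
}

# Count the online logical CPUs; fall back to 1 if the value is unusable.
total_cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
case "${total_cpus}" in
    ''|*[!0-9]*) total_cpus=1 ;;
esac

echo "CPU list for Oracle: $(cpu_list_for "${total_cpus}")"
```

On the four-core server from the example this would yield the same 1-3 range used in the rest of the article.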

The syntax we would use to move a specific process to cores 1-3 is as follows:

taskset -cp 1-3 <pid>
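It helps to check a process's affinity before and after changing it. Running taskset with “-cp” and only a PID prints the current CPU list instead of setting it. The sketch below wraps that in two hypothetical helpers (affinity_list and current_affinity are illustrative names) and uses the current shell as a stand-in for an Oracle PID.

```shell
#!/bin/sh
# Extract just the CPU list from a line of "taskset -cp <pid>" output,
# e.g. "pid 1234's current affinity list: 0-3" becomes "0-3".
affinity_list() {
    sed -n 's/^pid [0-9]*.s current affinity list: //p'
}

# Print the CPUs a PID is allowed to run on. Prints nothing if taskset
# is not installed or the PID no longer exists.
current_affinity() {
    command -v taskset >/dev/null 2>&1 || return 0
    taskset -cp "$1" 2>/dev/null | affinity_list
}

# Example: inspect the current shell ($$) as a stand-in for an Oracle PID.
echo "PID $$ may run on CPUs: $(current_affinity $$)"
```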

Of course doing this on a process-by-process basis can get very tedious. Some basic automation is in order:

user=oracle
# pgrep -u prints one PID per line for every process whose effective user is ${user}
pgrep -u "${user}" | while read -r pid
do
     echo "Changing CPU affinity of PID ${pid}"
     taskset -cp 1-3 "${pid}"
done
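One caveat: without the “-a” (all tasks) option, taskset changes only the thread whose ID matches the PID, so other threads of a multi-threaded process keep their old affinity. The variant below is a sketch under that assumption. It wraps the loop in a hypothetical function, pin_user, adds “-a”, and reports PIDs that could not be changed (for example because the process exited in the meantime).

```shell
#!/bin/sh
# Sketch: pin every process owned by a user, including all of its threads,
# to a CPU list. pin_user is a hypothetical name, not part of any package.
pin_user() {
    user=$1
    cpus=$2
    pgrep -u "${user}" | while read -r pid
    do
        # -a applies the new affinity to all threads of the PID.
        if taskset -a -cp "${cpus}" "${pid}" >/dev/null 2>&1; then
            echo "Pinned PID ${pid} to CPUs ${cpus}"
        else
            # The process may have exited, or we lack permission.
            echo "Could not pin PID ${pid}" >&2
        fi
    done
}

# Usage (as root): pin_user oracle 1-3
```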

This script can be added to cron to run, say, every five minutes, so that any new Oracle processes spawned in the meantime are also forced onto the selected CPUs. Additionally, you can force all “root” processes to core 0 using the same script with just a couple of modifications (expect taskset to report errors for per-CPU kernel threads, whose affinity cannot be changed):

user=root
# pgrep -u prints one PID per line for every process whose effective user is ${user}
pgrep -u "${user}" | while read -r pid
do
     echo "Changing CPU affinity of PID ${pid}"
     taskset -cp 0 "${pid}"
done

Hopefully, now your server will be responsive enough for you to investigate the root cause of the problem.

A quick word of advice: if your Oracle server has been working fine in the past and there were no recent changes to SQL scripts (but don’t take your DBA’s word for that), run “top” and look at the I/O wait time (the “wa” value in the CPU summary line). If it is high, the most likely cause is a shortage of physical memory leading to heavy swap utilization. Check how much physical memory is free and how much swap space is in use. You may need to restart Oracle and user applications, or even reboot the system, to clear out any stale Oracle processes that are still holding physical memory.
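The memory and swap check can also be done straight from /proc/meminfo. The sketch below is one way to do it, assuming a Linux system. The function names are hypothetical, and the MemAvailable field is only present on kernel 3.14 and later.

```shell
#!/bin/sh
# Print swap in use (kB) from a meminfo-style file; defaults to /proc/meminfo.
swap_used_kb() {
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END           {print total - free}' "${1:-/proc/meminfo}"
}

# Print the memory the kernel estimates is available without swapping (kB).
# MemAvailable is absent on kernels older than 3.14, so this may print nothing.
mem_available_kb() {
    awk '/^MemAvailable:/ {print $2}' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    echo "Swap in use:      $(swap_used_kb) kB"
    echo "Memory available: $(mem_available_kb) kB"
fi
```

“free -m” and “vmstat 5” report the same figures in a friendlier layout; vmstat also shows I/O wait in its “wa” column.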

Another common cause of high I/O wait is poor network performance. The impact can be especially severe if your data storage is network-mounted. You can run “bonnie++” or rsync against a network-mounted filesystem to test throughput.
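If neither tool is installed, a crude write test with dd gives a first impression. This is a swapped-in technique, not the bonnie++ or rsync approach mentioned above. The function name write_probe is hypothetical, and the result is only a rough figure because of caching and one-second timer resolution.

```shell
#!/bin/sh
# Rough write-throughput probe for a mounted filesystem, using dd.
# Arguments: target directory, and size in MB (defaults to 256).
write_probe() {
    dir=$1
    mb=${2:-256}
    file="${dir}/.write_probe.$$"
    start=$(date +%s)
    # Write ${mb} blocks of 1 MiB, then flush so cached writes are counted.
    dd if=/dev/zero of="${file}" bs=1048576 count="${mb}" 2>/dev/null
    rc=$?
    sync
    end=$(date +%s)
    rm -f "${file}"
    [ "${rc}" -eq 0 ] || return 1
    secs=$((end - start))
    [ "${secs}" -gt 0 ] || secs=1      # avoid dividing by zero on fast runs
    echo "approx. $((mb / secs)) MB/s (${mb} MB in ${secs}s)"
}

# Usage: write_probe /path/to/network/mount 1024
```

Use a test size large enough to take at least several seconds; otherwise the one-second timer makes the figure meaningless.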
