Monitoring and diagnosing performance problems in CentOS

CentOS is widely used in the server world due to its stability, security, and extensibility. It’s a popular choice for developers and administrators who need a reliable system for hosting applications, websites, and databases.

However, despite its strong reputation, even CentOS can face performance issues. To avoid downtime and ensure the smooth functioning of your server, it’s crucial to identify and diagnose these issues promptly.

This guide looks at how to monitor and solve performance issues in CentOS. It will cover key tools, common issues, and practical steps you can take to keep your server in peak condition.

Why is prompt CentOS troubleshooting important?

Here are the key reasons to troubleshoot performance issues promptly in CentOS environments:

Prevent downtime and revenue loss

Server downtime can lead to serious consequences for businesses, especially those reliant on online services. For example, an e-commerce website running on CentOS may experience a sudden spike in traffic, causing performance degradation.

If the server isn't monitored and the issue isn't addressed promptly, the website could go offline during peak sales hours. This would result in lost sales and a poor customer experience.

Stop small issues from becoming big problems

Performance issues often start small but can snowball into bigger problems if left unchecked. For example, a web server may show signs of increased memory usage due to a faulty process or memory leak.

Initially, the impact may be minor — slightly slower load times or brief pauses in service. However, if the issue isn’t diagnosed early, it can escalate into a more serious problem, potentially causing the entire server to become unresponsive and requiring a reboot.

Avoid reputational damage

When performance issues affect a company's services, it can have a lasting impact on the brand's reputation. Take the example of a SaaS company whose platform is hosted on CentOS. If the platform becomes slow or unavailable due to undiagnosed server issues, users may begin to lose trust in the service.

Bad reviews, social media backlash, and poor word of mouth may harm the company's public image irreparably.

Improve long-term server health

Regular troubleshooting and monitoring aren’t just about solving immediate problems — they also help improve the long-term health of your CentOS servers. For example, you may notice frequent spikes in CPU or memory usage.

After troubleshooting, you may find the root cause to be an underlying configuration issue and fix it. Had this issue been left unchecked, it could have eventually led to hardware failure or the need for costly upgrades.

Ensure security

Performance problems can sometimes be indicators of security breaches or malicious activity. Timely troubleshooting can help identify and mitigate security threats, protecting your system from unauthorized access and data theft.

For example, a sudden spike in CPU usage, memory consumption, or network traffic could be because of malware running on the system.

Setting up monitoring tools

You must have a variety of tools in your repertoire to become an effective troubleshooter. This section will walk you through installing and setting up some of these.

top

top is a built-in command-line tool that provides real-time information on system resource usage, including CPU, memory, and processes. It’s a go-to tool for quick checks on system load.

  • Install: Pre-installed on all CentOS systems.
  • Usage: Run top to see a live view of system performance.
  • Example: To sort processes by memory usage, press M while top is running.

htop

htop is an enhanced version of top with a more user-friendly interface and additional features, such as better process management and color-coded metrics.

  • Install: htop is provided by the EPEL repository, so enable EPEL first:

sudo yum install epel-release
sudo yum install htop

  • Usage: Run htop for a detailed and interactive view of system resources.
  • Example: Use the arrow keys to select a process, then press F9 to choose a signal to send to it, for instance to stop an unresponsive process.

netstat

netstat displays detailed information about network connections, routing tables, and interface statistics. It’s especially handy for diagnosing network-related performance issues.

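  • Install: netstat is part of the net-tools package, which minimal installations of recent CentOS releases may not include. If the command is missing, install it with sudo yum install net-tools.
  • Usage: Run netstat to view active network connections.
  • Example: Use netstat -tuln to list all listening TCP and UDP ports with numeric addresses.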
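A quick way to spot connection pile-ups is to count TCP connections per state. The sketch below assumes the usual netstat -tan column layout, with the state in the sixth column; the function name count_tcp_states is made up for this example. It reads from standard input, so it also works on saved output.

```shell
# Summarize TCP connections by state, busiest state first.
# Reads `netstat -tan` output on stdin; header lines are skipped
# because only rows whose first field starts with "tcp" are counted.
count_tcp_states() {
    awk '$1 ~ /^tcp/ { states[$6]++ }
         END { for (s in states) print states[s], s }' | sort -rn
}

# Typical use on a live system:
#   netstat -tan | count_tcp_states
```

A large and growing number of connections in a state such as TIME_WAIT or CLOSE_WAIT is often worth a closer look.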

sar

sar is part of the sysstat package and provides detailed reports on CPU, memory, I/O, and network performance over time. It’s useful for tracking historical performance data.

  • Install:

sudo yum install sysstat
sudo systemctl enable --now sysstat

  • Usage: Run sar -u to display the CPU usage recorded so far today, or sar -r for memory statistics.
  • Example: Use sar -u 1 1 to take a single one-second CPU sample, or sar -r 1 1 to do the same for memory.

vmstat

vmstat reports on virtual memory, processes, CPU activity, and I/O system usage. It’s great for identifying bottlenecks related to memory or CPU.

  • Install: Pre-installed on CentOS systems.
  • Usage: Run vmstat to see system performance stats.
  • Example: vmstat 5 will show system statistics every 5 seconds.

dstat

dstat provides a combination of the functionality found in vmstat, iostat, netstat, and more. It gives a clear snapshot of several system metrics in real time.

  • Install:

sudo yum install dstat

  • Usage: Run dstat to view CPU, memory, disk, and network usage simultaneously.
  • Example: Use dstat --cpu --disk --net to focus on current CPU, disk, and network statistics.

iostat

iostat is useful for monitoring disk I/O statistics and can help you pinpoint slow or overworked disks.

  • Install:

sudo yum install sysstat

  • Usage: Run iostat to view disk performance metrics.
  • Example: Use iostat -dx 5 to get extended stats with a 5-second refresh interval.

nload

nload is a network traffic monitoring tool that provides a visual representation of incoming and outgoing traffic in real time.

  • Install (from the EPEL repository):

sudo yum install nload

  • Usage: Run nload to view traffic statistics.
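  • Example: Use nload -t 5000 to refresh the data every 5 seconds. The -t interval is given in milliseconds; the -i option sets the scale of the incoming-traffic graph, not the refresh rate.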

iftop

iftop allows you to monitor real-time bandwidth usage for network interfaces. Use it if you want to detect excessive traffic or network congestion.

  • Install (from the EPEL repository):

sudo yum install iftop

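  • Usage: Run sudo iftop to display bandwidth usage by IP address. iftop needs root privileges to capture traffic.
  • Example: Use sudo iftop -i eth0 to monitor traffic on the eth0 interface. Interface names vary between systems; ip link lists the ones available.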

atop

atop is an advanced system and process monitor that gives you a detailed overview of CPU, memory, and disk usage, along with historical performance data.

  • Install (from the EPEL repository):

sudo yum install atop

  • Usage: Run atop for a detailed report on system metrics.
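  • Example: atop -r replays historical performance data for troubleshooting past issues. With no file name it reads today's log; pass a log file, typically stored under /var/log/atop/, to replay an earlier day.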

tcpdump

tcpdump is a packet analysis tool that captures and displays network traffic. It’s useful for diagnosing network performance issues, as it lets you inspect packets at a granular level.

  • Install:

sudo yum install tcpdump

  • Usage: Run sudo tcpdump to capture network traffic. Capturing requires root privileges.
  • Example: Use sudo tcpdump -i eth0 to capture packets on the eth0 interface.

Monitoring CPU performance

This section explores some common CPU issues, explains how to detect them, and provides steps for troubleshooting.

High CPU usage

Description: High CPU usage can slow down applications, degrade server performance, or even cause services to become unresponsive.

Detection:

  • Run top and check the %CPU column to see which processes are consuming the most CPU.
  • Run sar -u 1 5 to sample CPU usage once per second, five times.

Troubleshooting:

  • Use top or htop to look for processes with unusually high CPU usage.
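  • If a process is unresponsive, ask it to exit with kill <PID>, which sends SIGTERM. Resort to kill -9 <PID> (SIGKILL) only if the process ignores the request, because SIGKILL gives it no chance to clean up. Restart the process or service afterward if needed.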
  • Sometimes high CPU usage is due to inefficient code. Check if the application can be optimized.
  • Long-running or runaway processes can spike CPU usage. Use ps aux to find and stop any unneeded processes.
  • If high CPU usage is persistent and isn’t resolved by any of the above tips, you may need to upgrade to a more powerful CPU or add additional CPUs.
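The "terminate first, force only if necessary" approach can be wrapped in a small helper. This is a minimal sketch, not a standard tool: the function name stop_gracefully and the 5-second default grace period are choices made for this example.

```shell
# Ask a process to exit with SIGTERM, and fall back to SIGKILL only if it
# is still alive after a grace period (in seconds, default 5).
stop_gracefully() {
    target=$1
    grace=${2:-5}
    # SIGTERM lets the process flush data and release locks before exiting.
    kill -TERM "$target" 2>/dev/null || return 0   # gone, or not ours to signal
    while [ "$grace" -gt 0 ]; do
        kill -0 "$target" 2>/dev/null || return 0  # it has exited
        sleep 1
        grace=$((grace - 1))
    done
    # Still alive after the grace period: force it.
    kill -KILL "$target" 2>/dev/null || true
}

# Find candidates first, busiest process at the top:
#   ps -eo pid,pcpu,etime,comm --sort=-pcpu | head -n 6
# Then stop one of them, waiting up to 10 seconds:
#   stop_gracefully <PID> 10
```

For anything managed by systemd, systemctl restart on the service is usually a better choice than signalling the process directly, since systemd would otherwise restart or lose track of it.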

CPU idle time and low utilization

Description: If CPU usage is consistently low while the server is underperforming, it could mean the server isn't fully utilizing its hardware capabilities.

Detection:

  • Run sar -u 1 5 and check the %idle column to see how much of the CPU is sitting idle.
  • Run vmstat 1 5 and look at the id (idle time) column. The first line shows averages since boot, so read the later samples.

Troubleshooting:

  • Review the server’s workload to see if tasks can be distributed more efficiently across the available CPU cores.
  • If the server is consistently underutilized, it may be oversized for its workload. Consider resizing or reallocating resources, especially in cloud environments where you're billed for CPU usage.
  • CPU underutilization could indicate bottlenecks in other areas, such as disk I/O or memory. Check disk and memory usage to ensure they are not the limiting factors.
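To tell a genuinely idle CPU from one that is waiting on disks, it helps to compare idle time with I/O wait over the same interval. The sketch below derives both from two samples of the aggregate cpu line in /proc/stat. The function name cpu_idle_iowait is made up for this example, and the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for current Linux kernels.

```shell
# Compute idle and iowait percentages between two "cpu" lines from /proc/stat.
# Fields after the "cpu" label: user nice system idle iowait irq softirq steal ...
cpu_idle_iowait() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        { total = 0; for (i = 2; i <= 9; i++) total += $i }
        NR == 1 { t1 = total; idle1 = $5; wait1 = $6 }
        NR == 2 {
            dt = total - t1
            if (dt <= 0) { print "no change between samples"; exit 1 }
            printf "idle=%.1f%% iowait=%.1f%%\n",
                   100 * ($5 - idle1) / dt, 100 * ($6 - wait1) / dt
        }'
}

# Live use: take two samples a few seconds apart.
#   a=$(head -n 1 /proc/stat); sleep 5; b=$(head -n 1 /proc/stat)
#   cpu_idle_iowait "$a" "$b"
```

A low busy percentage combined with a high iowait share suggests the CPU is mostly waiting for storage, which points toward the disk checks rather than the workload distribution.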

CPU throttling

Description: CPU throttling happens when the CPU reduces its speed to prevent overheating or power consumption issues. It can cause a noticeable slowdown in system performance.

Detection:

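  • Run cpupower frequency-info (provided by the kernel-tools package) to compare the current CPU frequency with the hardware limits.
  • Look for messages related to CPU throttling in the kernel log by running dmesg | grep -i throttl. The shorter, case-insensitive pattern matches both "throttling" and "throttled".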

Troubleshooting:

  • Throttling is often triggered by high temperatures. Use the sensors command (from the lm_sensors package) to monitor CPU temperatures and ensure that proper cooling is being provided.
  • Make sure that CPU fans and heatsinks are clean and functioning properly. You may need to improve airflow or upgrade your cooling system.
  • On some servers, you may want to disable CPU power-saving features to avoid throttling. This can be done by adjusting power management settings in the BIOS or kernel.
  • If throttling persists despite proper cooling, it may be a sign of faulty hardware that needs to be replaced.
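On many Intel-based physical servers the kernel also keeps per-core counters of thermal throttling events in sysfs. The sketch below prints them. It assumes the counters live at cpuN/thermal_throttle/core_throttle_count under /sys/devices/system/cpu, which is not the case on every platform and is typically absent inside virtual machines; the function name throttle_counts is made up for this example.

```shell
# Print per-core thermal throttle event counts from sysfs.
# The root directory is a parameter so the function can be pointed at a copy.
throttle_counts() {
    root=${1:-/sys/devices/system/cpu}
    for f in "$root"/cpu[0-9]*/thermal_throttle/core_throttle_count; do
        [ -r "$f" ] || continue          # counter not exposed on this system
        core=${f#"$root"/}               # strip the leading directory
        core=${core%%/*}                 # keep only the cpuN component
        printf '%s %s\n' "$core" "$(cat "$f")"
    done
}

# Live use:
#   throttle_counts
```

If the function prints nothing, the counters are not available and the kernel log remains the place to look. Counters that keep increasing between runs indicate that throttling is ongoing rather than a one-off event.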

Monitoring memory usage

Next, let’s go over common memory issues, how to detect them, and practical troubleshooting steps.

Unexpectedly high memory consumption

Description: High memory consumption can lead to swapping (using disk space as virtual memory) and degraded performance.

Detection:

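  • Run free -m to check the memory usage in megabytes, especially focusing on the used and available columns. The available column estimates how much memory new applications can claim without swapping; the free column alone understates this because it excludes reclaimable cache.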
  • Run vmstat 5 to monitor memory usage over 5-second intervals.

Troubleshooting:

  • Analyze the outputs of top or htop to find processes that are consuming large amounts of memory. Keep an eye out for processes with high RES (resident memory) values.
  • If a single process is consuming too much memory, you can stop it with kill <PID>, falling back to kill -9 <PID> only if it does not exit.
  • Sometimes, applications are configured to use more memory than necessary. Review application configuration files for memory limits and adjust accordingly.
  • If memory usage remains persistently high, you may need to add more RAM to your server.
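For scripts and alerts, the same information can be read directly from /proc/meminfo. The sketch below reports available memory as a percentage of total memory. It assumes a kernel that exposes the MemAvailable field (CentOS 7 and later do), and the function name mem_available_pct is made up for this example.

```shell
# Print available memory as a whole-number percentage of total memory.
# Takes an optional path so it can be tried on a saved copy of /proc/meminfo.
mem_available_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END {
             if (total == 0 || avail == "") exit 1   # field missing
             printf "%d\n", 100 * avail / total
         }' "${1:-/proc/meminfo}"
}

# Live use: warn when less than 10% of memory is available.
#   [ "$(mem_available_pct)" -lt 10 ] && echo "memory is running low"
```

The 10% threshold in the usage line is only an illustration; a sensible value depends on the workload.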

Memory leaks

Description: Memory leaks occur when a program continually allocates memory but fails to release it after the task is complete.

Detection:

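  • Run smem -t to list per-process memory usage with a totals row. Run it at intervals and compare the results: a process whose usage keeps climbing without leveling off is a likely leak. smem is packaged in the EPEL repository.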

Troubleshooting:

  • If you have control over the application's source code, use valgrind --leak-check=yes ./your_program to check for memory leaks and fix accordingly.
  • Often, memory leaks are due to bugs in software. Check for updates or patches that address memory leak issues.
  • Use control groups (cgroups) or systemd service units to set memory limits for processes. This prevents a single process from consuming all available memory.
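Leak detection comes down to watching one process over time. The sketch below samples the resident set size of a process from /proc/<pid>/status and checks whether the samples only ever go up. The function names (rss_kb, sample_rss, steadily_growing) and the default of five samples one minute apart are choices made for this example.

```shell
# Read the resident set size (VmRSS, in kB) from a /proc/<pid>/status file.
rss_kb() {
    awk '$1 == "VmRSS:" { print $2 }' "$1"
}

# Print COUNT samples of a process's RSS, INTERVAL seconds apart.
sample_rss() {
    pid=$1; count=${2:-5}; interval=${3:-60}
    while [ "$count" -gt 0 ] && [ -r "/proc/$pid/status" ]; do
        rss_kb "/proc/$pid/status"
        count=$((count - 1))
        if [ "$count" -gt 0 ]; then sleep "$interval"; fi
    done
}

# Succeed if the numbers on stdin never decrease and end higher than they began.
steadily_growing() {
    awk 'NR == 1 { first = $1 }
         NR > 1 && $1 + 0 < prev + 0 { drops++ }
         { prev = $1 }
         END { exit !(NR > 1 && drops == 0 && prev + 0 > first + 0) }'
}

# Live use:
#   sample_rss <PID> 10 60 | steadily_growing && echo "RSS keeps growing"
```

Steady growth over a short window is only a hint, since caches and warm-up phases look the same at first. Longer observation windows give a more reliable signal.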

Monitoring disk I/O performance

Poor disk performance can result in latency, slow data access, and system freezes. This section talks about common disk-related issues.

High disk latency

Description: High disk latency occurs when the system takes too long to read from or write to the disk. This can slow down application response times, cause delays in system operations, and impact overall server performance.

Detection:

  • Run iostat -x 5 to monitor extended I/O statistics, especially the await columns (r_await and w_await in recent sysstat versions), which report the average time in milliseconds for read and write requests to complete, including time spent waiting in the queue.
  • Run dstat --disk to get real-time information on disk I/O activity, including read/write operations.

Troubleshooting:

  • Identify the processes causing the most disk I/O, for example with iotop or pidstat -d, and either stop or limit them.
  • Consider upgrading your disks from HDDs to SSDs, which have significantly lower latency.
  • For servers with heavy disk I/O workloads, consider a RAID configuration such as RAID 10, which improves both performance and redundancy. RAID 5 also adds redundancy and read throughput, but its parity calculations slow down writes.
  • Run disk-intensive tasks, such as backups or database indexing, during off-peak hours to avoid I/O bottlenecks.
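Because the iostat column layout differs between sysstat versions, a script that looks for slow disks is safer when it locates columns by name rather than by position. The sketch below does that for r_await and w_await. The function name slow_disks and the 20 ms default threshold are made up for this example, and it assumes a sysstat version that prints both columns.

```shell
# Print devices whose average read or write wait exceeds LIMIT milliseconds.
# Reads `iostat -dx` output on stdin and finds the columns from the header row.
slow_disks() {
    awk -v limit="${1:-20}" '
        $1 ~ /^Device/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "r_await") r = i
                if ($i == "w_await") w = i
            }
            next
        }
        r && w && NF >= r && NF >= w &&
        ($r + 0 > limit + 0 || $w + 0 > limit + 0) {
            print $1, "r_await=" $r, "w_await=" $w
        }'
}

# Live use: three reports, five seconds apart, flagging waits above 50 ms.
#   iostat -dx 5 3 | slow_disks 50
```

What counts as slow depends on the hardware: a few milliseconds can already be high for an SSD, while spinning disks routinely show more.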

Running out of storage space

Description: Running out of disk space is a common problem on CentOS systems, especially when logs or temporary files accumulate without being rotated or cleared.

Detection:

  • Run df -h to get an overview of disk space usage for all mounted filesystems. Focus on the Use% column.
  • Run sudo du -sh /* 2>/dev/null to see the total size of each top-level directory. This helps identify which directories are using the most space; repeat the command one level deeper to narrow the search.

Troubleshooting:

  • Start by deleting unnecessary files, logs, and temporary files. For example, you can find all the old or large log files in the /var/log directory and either rotate or delete them. You can also clear application caches, such as those found under /var/cache.
  • Use logrotate to automate the compression and removal of old log files.
  • If disk space remains a recurring issue, you may need to add more physical or virtual storage to your server.
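A simple threshold check on df output is enough to catch a filling disk before it becomes an outage. The sketch below parses the POSIX output format of df -P, where the fifth column is the usage percentage and the sixth is the mount point. The function name full_filesystems and the 90% default are made up for this example, and mount points containing spaces are not handled.

```shell
# Print mount points whose usage is at or above LIMIT percent.
# Reads `df -P` output on stdin; the first line is the header.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)              # "94%" -> "94"
            if (pct + 0 >= limit + 0) print $6, $5
        }'
}

# Live use, for example from a cron job:
#   df -P | full_filesystems 85
```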

Advanced diagnostics and troubleshooting

Sometimes, when basic monitoring doesn’t reveal the root cause of performance issues, you may need to experiment with advanced tools and techniques. This next section will cover some of them.

strace

strace is a powerful tool for tracing system calls made by a process. It can come in particularly handy if you need to diagnose slow or misbehaving applications.

  • Usage: To trace a running process, use strace -p <pid>, or to trace a command directly, run strace <command>.
  • Example: If an application is hanging during file I/O, strace can show you if it's waiting for a specific system call, like read() or write().
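When strace is run with the -T option, it appends the time spent in each system call in angle brackets at the end of the line, which makes slow calls easy to filter. The sketch below keeps only the calls that took at least a given number of seconds. The function name slow_syscalls and the 0.5-second default are made up for this example.

```shell
# Keep only strace -T lines whose trailing <seconds> value is at least LIMIT.
slow_syscalls() {
    awk -v limit="${1:-0.5}" '
        match($0, /<[0-9]+\.[0-9]+>$/) {
            # Strip the surrounding angle brackets to get the duration.
            secs = substr($0, RSTART + 1, RLENGTH - 2)
            if (secs + 0 >= limit + 0) print
        }'
}

# Live use (strace writes its trace to stderr, hence the redirect):
#   sudo strace -T -p <pid> 2>&1 | slow_syscalls 0.5
```

Attaching strace slows the traced process noticeably, so it is best used briefly on production systems.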

perf

perf is a comprehensive performance monitoring tool that provides detailed statistics on CPU usage, cache hits/misses, and other low-level system events.

  • Usage: Run perf top to view real-time CPU usage for specific functions or perf record to capture performance data for later analysis.
  • Example: You can use perf to track which processes are responsible for excessive CPU usage and drill down into specific functions causing the issue.

Kernel tuning

Kernel tuning allows you to modify the behavior of the Linux kernel to better suit your server’s workload. For example, if you want to optimize performance for networking or memory management, you can adjust the relevant kernel parameters to do so.

  • Usage: Use sysctl -w <parameter>=<value> to adjust kernel parameters on the fly. Such changes are lost at reboot; to make them permanent, add them to /etc/sysctl.conf or to a file under /etc/sysctl.d/.
  • Example: To improve memory management, you can tune parameters like vm.swappiness, which controls the kernel's preference for swapping memory to disk.
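Persisting a setting can be scripted so that repeated runs do not pile up duplicate lines. The sketch below writes key/value pairs to a drop-in file under /etc/sysctl.d/. The function name persist_sysctl and the file name 99-tuning.conf are made up for this example, and the value 10 in the usage lines is an illustration rather than a recommendation.

```shell
# Record a kernel parameter in a sysctl drop-in file, replacing any
# earlier line for the same key. The directory is a parameter so the
# function can be tried outside /etc.
persist_sysctl() {
    key=$1; value=$2
    dir=${3:-/etc/sysctl.d}
    file="$dir/99-tuning.conf"
    if [ -f "$file" ]; then
        # Keep every line whose key differs from the one being set.
        awk -F '[ =]+' -v k="$key" '$1 != k' "$file" > "$file.tmp"
        mv "$file.tmp" "$file"
    fi
    printf '%s = %s\n' "$key" "$value" >> "$file"
}

# Live use, as root:
#   sysctl -w vm.swappiness=10          # apply now
#   persist_sysctl vm.swappiness 10     # keep across reboots
#   sysctl --system                     # reload all sysctl configuration files
```

Change one parameter at a time and measure the effect before moving on, so that any regression can be traced to a single setting.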

CentOS best practices

To prevent many of the aforementioned issues and ensure the general health and performance of your CentOS machine, follow these best practices:

  • Keep CentOS and all installed packages up to date so that you have the latest security patches and performance improvements.
  • Set up regular automated backups using tools like rsync to prevent data loss in case of failure.
  • Regularly check logs in /var/log/ to catch early warnings of potential issues. Tools like logwatch can summarize them for you, and rsyslog can forward them to a central log server.
  • Disable unnecessary services to free up resources. Use systemctl to manage service states.
  • Implement security best practices. For example, you should enable a firewall with firewalld or iptables, use SELinux, and enforce strong SSH configurations.
  • Use dedicated monitoring tools like Site24x7 to set up alerts for CPU, memory, disk usage, and other critical metrics. Site24x7 has purpose-built monitoring agents for all Linux operating systems, including CentOS.

Conclusion

CentOS is undeniably a robust, reliable, and feature-rich operating system. However, it can still encounter performance issues from time to time. Prompt troubleshooting of these issues will help you maintain a healthy server state.
