Table of Contents

Overview

Just today (a specific time which I cannot tell) we had a time drif on the VM which most probably caused a FS to become a Read-Only. We don't know if the time drift caused the FS to become Read-Only, but we had to investigate anyway. There are couple command which you can use.

General

General trouble shooting isn't focused at specific component of the operation system, but it gives general idea of the system if there is performance issue, corruption and so on.

Logs

You can always check the logs. Very important log is:

/var/log/messages

That log records information about structural changes and other errors

Journals

You can also inspect the journals as follows:

Check journals since date

root@k8s-master:~# journalctl --since 2020-05-01
-- Logs begin at Fri 2020-05-01 13:35:29 UTC, end at Fri 2020-05-01 14:02:20 UTC. --
May 01 13:35:29 k8s-master systemd-journald[395]: Runtime journal (/run/log/journal/) is 1.2M, max 9.9M, 8.6M free.
May 01 13:35:29 k8s-master kernel: Initializing cgroup subsys cpuset
May 01 13:35:29 k8s-master kernel: Initializing cgroup subsys cpu
May 01 13:35:29 k8s-master kernel: Initializing cgroup subsys cpuacct
May 01 13:35:29 k8s-master kernel: Linux version 4.4.0-177-generic (buildd@lgw01-amd64-035) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.12) ) #207-Ubuntu SMP Mon Mar 16 01:16:10 UTC 2020 (Ubuntu 4.4.0-177.2
May 01 13:35:29 k8s-master kernel: Command line: BOOT_IMAGE=/boot/vmlinuz-4.4.0-177-generic root=LABEL=cloudimg-rootfs ro console=tty1 console=ttyS0
May 01 13:35:29 k8s-master kernel: KERNEL supported cpus:
************************************************************************

Commands

There are several commands, which can help you identify usage with the system:

Check Corruption

[root@*********** log]# e2fsck -nv /dev/mapper/oravg-oradata
e2fsck 1.42.9 (28-Dec-2013)
Warning!  /dev/mapper/oravg-oradata is mounted.
Warning: skipping journal recovery because doing a read-only filesystem check.
/dev/mapper/oravg-oradata contains a file system with errors, check forced.
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (12969949, counted=13514969).
Fix? no

Free inodes count wrong (3931129, counted=3931397).
Fix? no


        1031 inodes used (0.03%, out of 3932160)
         398 non-contiguous files (38.6%)
           2 non-contiguous directories (0.2%)
             # of inodes with ind/dind/tind blocks: 0/0/0
             Extent depth histogram: 748/7
     2758691 blocks used (17.54%, out of 15728640)
           0 bad blocks
           3 large files

         742 regular files
          12 directories
           0 character device files
           0 block device files
           0 fifos
           0 links
           0 symbolic links (0 fast symbolic links)
           0 sockets
------------
         754 files

Specific

You can choose to trace a specific process and what commands to the kernel he is passing as follows:

Using Strace

Strace is a tool which can be installed and it is used to display what request the application is passing to the kernel to execute. Usually these commands are very low level: “Read pointer”, “Write data”, “Allocate latch” and etc.

Locate a session

First thing is to locate the session PID, in our example we want to monitor a NAGIOS client, therefore we have to minitor a process called: “xinet” which is a parent to the nagios:

[root@oraback libexec]# ps -leaf | grep -i xinet
0 S root      4622 31523  0  78   0 - 15298 pipe_w 10:33 pts/0    00:00:00 grep -i xinet
1 S root      6177     1  0  75   0 -  5416 -      Jan19 ?        00:00:16 xinetd -stayalive -pidfile /var/run/xinetd.pid

Trace

Once we have located the parent process, we can start tracing:

Start tracing

[root@oraback libexec]# strace -o /tmp/x -fp 6177
Process 6177 attached - interrupt to quit
Process 4644 attached (waiting for parent)
Process 4644 resumed (parent 6177 ready)
Process 4645 attached (waiting for parent)
Process 4645 resumed (parent 4644 ready)
Process 4644 suspended
Process 4646 attached (waiting for parent)
Process 4646 resumed (parent 4645 ready)
Process 4647 attached (waiting for parent)

This command, will trace all info, into a file called: /tmp/x

Verbose

In case of need, the trace can be extended to include more data using:

Enable Verbose mode

[root@oraback libexec]# strace -vo /tmp/x -fp 6177
Process 6177 attached - interrupt to quit
Process 5089 attached (waiting for parent)
Process 5089 resumed (parent 6177 ready)
Process 5091 attached (waiting for parent)
Process 5091 resumed (parent 5089 ready)
Process 5089 suspended
Process 5093 attached (waiting for parent)
Process 5093 resumed (parent 5091 ready)
Process 5093 detached
Process 5089 resumed
Process 5091 detached

After that we can examine the content of a trace file:

Check the log

[root@oraback libexec]# less /tmp/x
2280  <... nanosleep resumed> NULL)     = 0
2280  nanosleep({1, 0},  <unfinished ...>
2286  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
2286  futex(0x38891f8, FUTEX_WAKE_PRIVATE, 1) = 0
2286  futex(0x3889224, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 161, {1528735731, 389058000}, ffffffff <unfinished ...>
2287  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
2279  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
2287  futex(0x38d5658, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2279  futex(0x31f49f8, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2287  <... futex resumed> )             = 0
2279  <... futex resumed> )             = 0
2287  futex(0x38d5684, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 319, {1528735726, 392983000}, ffffffff <unfinished ...>
2279  futex(0x31f4a24, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 319, {1528735726, 392985000}, ffffffff <unfinished ...>
2278  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
2277  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
2275  <... futex resumed> )             = -1 ETIMEDOUT (Connection timed out)
2278  futex(0x31f4968, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2277  futex(0x3884a78, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2275  futex(0x31f4b18, FUTEX_WAKE_PRIVATE, 1 <unfinished ...>
2278  <... futex resumed> )             = 0
2277  <... futex resumed> )             = 0
2275  <... futex resumed> )             = 0
2278  futex(0x31f4994, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1589, {1528735722, 422523000}, ffffffff <unfinished ...>
2277  futex(0x3884aa4, FUTEX_WAIT_BITSET_PRIVATE|FUTEX_CLOCK_REALTIME, 1589, {1528735722, 422489000}, ffffffff <unfinished ...>