Debugging Analysis of Kernel panics and Kernel oopses using System Map:

Linux Kernel Debugging Introduction:

There are various ways to debug an kernel like debugging by printing ,Using kernel symbols,Using a kernel debugger but this page describes some tricks and techniques to help Interpreting an Oops message and Kernel panic but before going forward we should understand what is kernel OOPS and panic.

A kernel panic is an action taken by an operating system upon detecting an internal fatal error from which it cannot safely recover and force the system to do a controlled system hang / reboot due to a detected run-time system malfunction (not necessarily an OOPS). The operation of the panic kernel feature may be controllable via run-time sysconfig settings such as hung task handling. This is a kernel panic.

OOPS are due to the Kernel exception handler getting executed including macros such as BUG() which is defined as an invalid instruction. Each exception has a unique number. Some “oops”es are bad enough that the kernel decides to execute the panic feature to stop running immediately. This is a kernel crash optionally followed by invoking a panic.

When a Kernel OOPS is encountered in a running kernel an OOPS message like ([ 67.994624] Internal error: Oops: 5 [#1] XXXXXXXXXXX) is displayed on the screen. The OOPS message contains the following: the values of the CPU registers, the address of the function that invoked the failure i.e PC, the stack, and the name of the current process executing. By using this OOPS statement, you can begin to debug the specific problem in the kernel. However, sometimes this OOPS message is insufficient.

How to interpret oopses with System.map

In Linux, the System.map file is a symbol table used by the kernel.

The System.map is required when the address of a symbol name, or the symbol name of an address, is needed. It is especially useful for debugging kernel panics and kernel oopses. The kernel does the address-to-name translation itself when CONFIG_KALLSYMS is enabled so that tools like ksymoops are not required.

More detailed information can be found here System.map.

Note: Addresses inside System.map may change from one build to the next or in another word new System.map is generated for each build of the kernel however it is must to have System.map of the same Linux kernel on which Kernel panics/oopses has been reported to debug the problem.

Note in the kernel backtraces in the logs, the kernel finds the nearest symbol to the address being analysed. Not all function symbols are available because of inlining, static, and optimisation so sometimes the reported function name is not the location of the failure.

How to Debug Kernel panics and oopses:

 

67.994406] Unable to handle kernel paging request at virtual address 02120bc4
[   67.994495] pgd = 94240000
[   67.994553] [02120bc4] *pgd=00000000
[   67.994624] Internal error: Oops: 5 [#1] PREEMPT SMP ARM
[   67.994926] CPU: 0    Not tainted  (3.8.13.23-XXXXXXXX #1)
[   67.994996] PC is at add_range+0x14/0x6c
[   67.995056] LR is at XXXXXXX+0x38/0x44
[   67.995117] pc : [<80049F3C>]    lr : [<8004a1ec>]    psr: 20000013
[   67.995117] sp : 9423fd90  ip : 9423fda8  fp : 9423fda4
[   67.995176] r10: 00000000  r9 : 9423ff60  r8 : 8000da84
[   67.995233] r7 : 000041fd  r6 : 00000081  r5 : aa068088  r4 : aa068088
[   67.995290] r3 : ac8ceb80  r2 : 021ab618  r1 : 00000000  r0 : 02120bc0
[   67.995348] Flags: nzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment user
[   67.995406] Control: 10c5387d  Table: 2424004a  DAC: 00000015
[   67.995462] Process cat (pid: 1352, stack limit = 0x9423e238)
[   67.995518] Stack: (0x9423fd90 to 0x94240000)
[   67.995577] fd80:                                     aa068088 aa068088 9423fdb4 9423fda8

Here is the kernel backtrace where Kernel is crashing   at “add_range” function. (Actually nearest function symbol to the crash). Let’s analyze step by step.

1. Crash occurs at below location as per Back Trace.

PC is at add_range +0x14/0x6c

 2. Grep/find add_range() in System.map file and note down symbol name address i.e. 80049f28

#Linux-Kernel # grep add_range System.map 
80049f28 T add_range

3.Replace add_range symbol name address in “add_range+0x14” = 80049f28 + 0x14 = 80049F3C

4.“80049F3C” should be same as PC Address in the Back trace .Wow it is same so it means that Kernel version I am using and on which issue is reported is same (also depends on same .config settings). Let’s move to next step.

5.run objdump on vmlinux to get the disassembly and detail on objdump program and vmlinux can be fetched from below hyperlinks.

objdump: is a program for displaying various information about object files. For instance, it can be used as a disassembler to view executable in assembly form.

vmlinux: is a statically linked executable file that contains the Linux kernel in one of the object file formats supported by Linux, The vmlinux file might be required for kernel debugging, symbol table generation or other operations,

#objdump -D -S --show-raw-insn --prefix-addresses --line-numbers vmlinux > objdump

6.Find “add_range” in vmlinux.objdump and look for PC address calculated above i.e.   80049F3C

80049F3C <add_Range+0x14> e5903004  ldr     r3, [r0, #4]

7.The crash point can be identified below.

ldr     r3, [r0, #4] = r0+4 = 02120bc0+4 = 02120bc4 /*replace r0 with r0 register value from the Back Traces*

8.Wow ! This  is same as fault address

Unable to handle kernel paging request at virtual address 02120bc4

Conclusion: Here r0 is pointing to invalid address and from disassembly found out where r0 is pointing and found out why r0 is pointing to invalid address.

Using GDB to find the location where your kernel panicked or oopsed.

A quick and easy way to find the line of code where your kernel panicked or oopsed is to use GDB list command. You can do this as follows.

Let’s assume your panic/oops message says something like:

 

[  174.507084] Stack:
[  174.507163] 
ce0bd8ac 00000008 00000000 ce4a7e90 c039ce30 ce0bd8ac c0718b04 c07185a0
[  174.507380] 
ce4a7ea0 c0398f22 ce0bd8ac c0718b04 ce4a7eb0 c037deee ce0bd8e0 ce0bd8ac
[  174.507597] 
ce4a7ec0 c037dfe0 c07185a0 ce0bd8ac ce4a7ed4 c037d353 ce0bd8ac ce0bd8ac
[  174.507888] Call Trace:
[  174.508125] 
[<c039ce30>] ? sd_remove+0x20/0x70
[  174.508235] 
[<c0398f22>] ? scsi_bus_remove+0x32/0x40
[  174.508326] 
[<c037deee>] ? __device_release_driver+0x3e/0x70
[  174.508421] 
[<c037dfe0>] ? device_release_driver+0x20/0x40
[  174.508514] 
[<c037d353>] ? bus_remove_device+0x73/0x90
[  174.508606] 
[<c037bccf>] ? device_del+0xef/0x150
[  174.508693] 
[<c0399207>] ? __scsi_remove_device+0x47/0x80
[  174.508786] 
[<c0399262>] ? scsi_remove_device+0x22/0x40
[  174.508877] 
[<c0399324>] ? __scsi_remove_target+0x94/0xd0
[  174.508969] 
[<c03993c0>] ? __remove_child+0x0/0x20
[  174.509060] 
[<c03993d7>] ? __remove_child+0x17/0x20
[  174.509148] 
[<c037b868>] ? device_for_each_child+0x38/0x60
[  174.509241] 
[<c039938f>] ? scsi_remove_target+0x2f/0x60
[  174.509393] 
[<d0c38907>] ? __iscsi_unbind_session+0x77/0xa0
[scsi_transport_iscsi]
[  174.509699] 
[<c015272e>] ? run_workqueue+0x6e/0x140
[  174.509801] 
[<d0c38890>] ? __iscsi_unbind_session+0x0/0xa0
[scsi_transport_iscsi]
[  174.509977] 
[<c0152888>] ? worker_thread+0x88/0xe0
[  174.510047] 
[<c01566a0>] ? autoremove_wake_function+0x0/0x40

Lets say you want to know what line of code represents sd_remove+0x20/0x70. cd to your directory of your kernel tree and run gdb on the “.o” file which has the function sd_remove() in this case in sd.o, and use the gdb “list” command, (gdb) list *(function+0xoffset), in this case function is sd_remove() and offset is 0x20, and gdb should tell you the line number where you hit the panic or oops. This worked reliably for most cases.

#gdb sd.o
gdb)list *(sd_remove+0x20)
0x1650 is in sd_remove
(Kernel/linux-xxx/drivers/scsi/sd.c:2125).
2120    static int sd_remove(struct device *dev)
2121    {
2122            struct scsi_disk *sdkp;
2123    
2124            async_synchronize_full();
2125            sdkp = dev_get_drvdata(dev);
2126           
blk_queue_prep_rq(sdkp->device->request_queue, scsi_prep_fn);
2127            device_del(&sdkp->dev);
2128            del_gendisk(sdkp->disk);
2129            sd_shutdown(dev);
(gdb)

so dev_get_drvdata()is the function where crash has been happened and Lets analyze why dev_get_drvdata(dev)is crashing.

 

Disassembling the kernel

Cross tools are needed.

The objdump utility

The main utility used to do it is the objdump one.

arm-none-linux-gnueabi-objdump –dr vmlinux  /*If We have object code handy then, we can disassemble the individual object file also like objdump -S panic.o"

gdb on vmlinux

One can disassemble a built kernel using gdb on the vmlinux image. This is useful when one gets a kernel Oops message and a stack dump – one can then disassemble the object code and see where the Oops is occurring. For example:   

#arm-none-linux-gnueabi-gdb –silent vmlinux
#disassemble printk
Dump of assembler code for function printk:
0xffffffff8023dce0 <printk+0>:  sub    $0xd8,%rsp
0xffffffff8023dce7 <printk+7>:  lea    0xe0(%rsp),%rax
0xffffffff8023dcef <printk+15>: mov    %rsi,0x28(%rsp)
0xffffffff8023dcf4 <printk+20>: mov    %rsp,%rsi
0xffffffff8023dcf7 <printk+23>: mov    %rdx,0x30(%rsp)
0xffffffff8023dcfc <printk+28>: mov    %rcx,0x38(%rsp)
0xffffffff8023dd01 <printk+33>: mov    %rax,0x8(%rsp)
0xffffffff8023dd06 <printk+38>: lea    0x20(%rsp),%rax
0xffffffff8023dd0b <printk+43>: mov    %r8,0x40(%rsp)
0xffffffff8023dd10 <printk+48>: mov    %r9,0x48(%rsp)
0xffffffff8023dd15 <printk+53>: movl   $0x8,(%rsp)
0xffffffff8023dd1c <printk+60>: movl   $0x30,0x4(%rsp)
0xffffffff8023dd24 <printk+68>: mov    %rax,0x10(%rsp)
0xffffffff8023dd29 <printk+73>: callq  0xffffffff8023d980 <vprintk>
0xffffffff8023dd2e <printk+78>: add    $0xd8,%rsp
0xffffffff8023dd35 <printk+85>: retq   
End of assembler dump.

 

How to Interpret Assembly Language(EABI C function call mapping to ARM registers)

First of all we should disassemble the kernel function with either OBJDUMP utility or using gdb on vmlinux kernel image by referring above mention section in Document. For example here is the dis-assembly of add_range() kernel function where I will demonstrate how this all works.This will be different depending on how the compiler optimizes,but it should give an idea.

#gdb disassemble add_range
Dump of assembler code for function add_range:
   0x8004c4d8 <+0>:    mov    r12, sp
   0x8004c4dc <+4>:    push    {r4, r5, r6, r7, r11, r12, lr, pc}
   0x8004c4e0 <+8>:    sub    r11, r12, #4
   0x8004c4e4 <+12>:    ldrd    r6, [r11, #4]
   0x8004c4e8 <+16>:    ldrd    r4, [r11, #12]
   0x8004c4ec <+20>:    cmp    r7, r5
   0x8004c4f0 <+24>:    cmpeq    r6, r4
   0x8004c4f4 <+28>:    bcs    0x8004c510 <add_range+56>
   0x8004c4f8 <+32>:    cmp    r2, r1
   0x8004c4fc <+36>:    lsllt    r3, r2, #4
   0x8004c500 <+40>:    addlt    r2, r2, #1
   0x8004c504 <+44>:    addlt    r1, r0, r3
   0x8004c508 <+48>:    strdlt    r6, [r0, r3]
   0x8004c50c <+52>:    strdlt    r4, [r1, #8]
   0x8004c510 <+56>:    mov    r0, r2
   0x8004c514 <+60>:    ldm    sp, {r4, r5, r6, r7, r11, sp, pc}
End of assembler dump.

Corresponding Kernel C function.

int add_range(struct range *range, int az, int nr_range, u64 start, u64 end)
{
        if (start >= end)
                return nr_range;
        /* Out of slots: */
        if (nr_range >= az)
                return nr_range;
        range[nr_range].start = start;
        range[nr_range].end = end;
        nr_range++;
        return nr_range;
}

Lets analyse the first 3 lines which is more or less same/common across all functions and here r12=IP(Intra-Procedure-call scratch register),r11=FP(Frame pointer).The FP keeps track of the variables from function to function.It is a frame on the stack of the function.please explore basic frame layout for more detail.so in simple words SP is where the stack is and FP is where the stack was like PC and LR register.

 0x8004c4d8 <+0>:    mov    r12, sp   /*get a copy of sp*/
 0x8004c4dc <+4>:    push    {r4, r5, r6, r7, r11, r12, lr, pc} /*Save the frame,link register,program counter and other Register on to the stack */
 0x8004c4e0 <+8>:    sub    r11, r12, #4 /*Set the new frame pointer.*/

The next 2 instruction passing 4 bytes and 12 bytes from the Frame pointer into the r6 and r4 registers for an function call and in other words values stores at r11+#4 would be stored in r6 and values stores at r11+#12 would be stored in r4.

Note:LDRD is used to store Double Word instruction however content also will be loaded into r7 and r5 register.This function call is dealing with 64 bit data so 64 bit data is manipulated in stack only.

  0x8004c4e4 <+12>:    ldrd    r6, [r11, #4]
   0x8004c4e8 <+16>:    ldrd    r4, [r11, #12]

Note:The first four registers r0-r3 are used to pass argument values into a subroutine and to return a result value from a function.so R0=range,R1=az,R2=nr_range,R3=start,R4=end.

The next instructions  can easily mapped with c code

Note:Underlying mapping somewhat different from the normal C to Assembly conversion mapping because here 64-bit value is being passed in Function call argument which is u64 start and u64 end and to deal with 64 bit data it has to be stored in register pair and can be retrived using ldrd instruction from stack using  frame pointer.

 0x8004c4ec <+20>:    cmp    r7, r5    /*first instruction compare r7 and r5 register i.e store 32 bit LSB for start & End whose value is stored in stack.       
 0x8004c4f0 <+24>:    cmpeq    r6, r4  /*This next instruction performs an comparison only if the result of above  { cmp    r7, r5 } instruction found true(i.e r7=r5). 
 0x8004c4f4 <+28>:    bcs    0x8004c510 <add_range+56>
 0x8004c4f8 <+32>:    cmp    r2, r1   /*This instruction compare values stored in resisters r2 and r1 which are passed argument values i.e nr_range and az. 

Corresponding C code is 

if (start >= end)
                return nr_range;
/* Out of slots: */
        if (nr_range >= az)
                return nr_range;

Lets move to next instructions .

0x8004c4fc <+36>:    lsllt    r3, r2, #4 
0x8004c500 <+40>:    addlt    r2, r2, #1
0x8004c504 <+44>:    addlt    r1, r0, r3
0x8004c508 <+48>:    strdlt    r6, [r0, r3]
0x8004c50c <+52>:    strdlt    r4, [r1, #8]

Corresponding C code is 

range[nr_range].start = start;
        range[nr_range].end = end;
        nr_range++;

 

0x8004c510 <+56>:    mov    r0, r2 /*move r2 content into r0 register which can be return back and As I said R0-R3 are also used to hold return value from function.
0x8004c514 <+60>:    ldm    sp, {r4, r5, r6, r7, r11, sp, pc} /*LDM is used to load multiple instructions and similar to  POP stack instruction.

Corresponding C code is
 
    return nr_range;

Here are the ARM Register definition for your reference and please keep in mind these register when you are mapping C function to ARM registers and more detailed information can be found here.

Note:In addition to above technique’s,don’t forget to visit Tour of ARM Assembly(http://www.coranac.com/tonc/text/asm.htm ) which will help you to understand following’s in deeper detail and after going through it, I Bet you,you should be able to produce some nice ARM assembly, or at least be able to read it well enough.

  • General assembly

  • The ARM instruction set

References:

  1. Procedure Call Standard for the ARM® Architecture
  2. ARM Procedure Call Standard
  3. Arm Instruction set manual