Debugging Stack Traces

When BlueRange Mesh reboots, it tries to save the reason in the ramRetainStruct. The Stacked Program Counter (PC before e.g. the hardfaultHandler was called) will be saved in stacktrace[0]. Using this information it is possible to get the source code line where the error happened. Use the following:

<gccDir>\bin\arm-none-eabi-addr2line -e <fruitymeshDir>\_build\debug\NRF51_BOARD\BlueRange Mesh.out 0x35BA5
The address must be given as a hex value, otherwise addr2line doesn’t print a valid result. BlueRange Mesh must have been compiled with debug information using -g.

If the line is not given and you just get sth. like ??:?, you can also use readelf or any online elf parser such as e.g. this one.

<gccDir>\bin\arm-none-eabi-readelf -a <fruitymeshDir>\_build\debug\NRF51_BOARD\FruityMesh.out

This will give you a list of addresses and sizes of where different methods are located in the binary. You can try to find an address that is a bit lower than the reported error address. Then check if your address is within the size of that method and you will know that the issue happened within this method.

Reading Flash Of A Broken BlueRange Mesh Node

In order to analyze a broken node, the flash and UICR can be read back. A good idea is to first read back UICR and flash and then compare the .hex with either the last .hex flashed onto this beacon (from the fruitydeploy beacon directory) or flash the beacon again and read it back.

Read back code and UICR to a .hex file:

nrfjprog --readuicr --readcode E:\dump.hex

Convert both .hex files to a hex dump to view them with a standard text editor:

srec_cat C:\test\dump.hex -intel -o C:\test\dump.txt -hex_dump

Then, use The TortoiseDiff Tool to see the differences:

  • Page 0 is the master boot record and should not have been altered

  • The first ~120kb are the softdevice and should not have been altered

  • Afterwards comes the application that should not have any changes

  • The free space after the application can be any code or data

  • The settings are located 2 pages before the bootloader:

    • 0x3E400 (NRF51)

    • 0x76000 (NRF52)

  • The bootloader is located at:

    • #define FRUITYLOAD_ADDRESS 0x3EC00 (NRF51)

    • #define FRUITYLOAD_ADDRESS 0x78000 (NRF52)

In order to check the correctness of the settings, convert the hex dump of the settings pages to a C array using Sublime and load it into the flash of a node in the simulator. This permits to debug the startup behaviour. If only the settings are broken, debugging a real node is also an option.

Debug Running Node

When the nodes are somewhere in the building and one of them is stuck in a hard fault or another endless loop, debugging is complicated. Follow these steps:

  • If the node is connected with its USB cable, put a battery in the node, then unplug the USB cable.

  • Take the node to the workstation while it is running on battery

  • Use two USB cables that are both plugged into the docking station (do not plug one into the laptop and the other in the docking station because they might have a different ground).

  • One of the cables is plugged into the DevKit/debugger.

  • The other cable is plugged into the node. Debugger and node are now powered from the same ground level. This is important.

  • Next, connect the debug cable to the node.

  • In Eclipse, select the debug configuration "Running Target". Connect to running target must be selected and no reset must be performed.

  • The node should not have restarted during this procedure.

Finding Problems In A Live System

Often, an error only occurs sporadically and in a system that is only accessible remotely. BlueRange Mesh has some built-in functionality to find errors in these cases. First, live reports can be used (cf. StatusReporterModule). A number of live report statements may be added to the code and they will be reported through the mesh as soon as they get triggered. A drawback with live reports is that they may not be available if the trigger occurs while the mesh connection is lost. In this case, the error logging facility of the Logger class can be used. This way, the error is logged to RAM and can be requested once the node is available in the mesh again.

nRF Sniffer

The nRF sniffer is used with Wireshark. A custom dissector for BlueRange Mesh JOIN_ME packets has been written and is saved in the util/wireshark folder. Integration into Wireshark described at the beginning of file fruitymesh.lua.

The nRF Sniffer doesn’t work with an updated Segger firmware. It is necessary to flash the Segger firmeware JLink_V498c onto the beacon with JLinkConfig.exe, option: replace firmware. Afterwards, the sniffer should work together with the fixed sniffer application received by Nordic Semiconductor.

If the fixed sniffer application does not find the beacon, open the buggy official version until the beacon is found, then close it and reopen the fixed version.

Segger Virtual Com Port (VCOM)

A few words on the virtual COM port offered by some Segger debugger chips such as the on-board chips on the Nordic Development Kits: A few pins of the nRF controller are directly connected to some pins of the Segger debugger chip. The Segger chip will decode the UART signal and will then send it over USB to a host computer using a virtual com port. While this is a nice feature that means you do not have to use a dedicated UART-USB converter, such as an FTDI chip, it is very error prone because of the implementation in the Segger chip.

Problems that we have experienced so far:

  • If you send data without flow control and then switch flow control on, you must disconnect the board from power before you can send this data through. Apparently the Segger chip blocks the flow control in this case.

  • If you do not use flow control, many characters will get lost when sending a lot of data

  • If you use flow control e.g. a 1Mbit/s and send a lot of data, the Segger chip will raise the flow control lines half of the time and will severely limit the throughput

  • Even with flow control enabled, characters will be lost at high speeds. This was confirmed with a logic analyzer.