Digging Into Go Internals: Low-Level Insights for Reverse Engineers

By Jacopo Ferrigno

Sooner or later, whether in your penetration testing career or when playing a CTF, you will end up having to reverse engineer some binary executable with a limited amount of time, be it a firmware image, a mobile application or a native application running on a Linux or Windows system. Inevitably you will end up spending much of your time re-discovering library functions (“Oh, the function at 0x00412200 is a strlen”) or defining structures coming from such libraries (looking at you, std::vector and std::string).

Therefore, being able to automatically recover structures and function names would be very helpful in speeding up the analysis. Consider that we normally work on stripped binaries [1]: recovered names allow us to identify and focus on the interesting functions, like the ones defined by the author of the executable. I might not care about the string formatting functions, but I might be very interested in a function decrypting files. Knowing which functions are the encryption primitives would greatly speed up the analysis.

Luckily, when we work on compiled Go [2] binaries we have a lot of information hidden inside the executable file. Unfortunately, few tools that take advantage of it are integrated in reverse engineering platforms like IDA Pro, Ghidra or Binary Ninja. One notable exception is GoReSym from Mandiant, a standalone tool you can use to extract a range of information [3].

This led me to develop a Binary Ninja plugin to recover function names in Go binaries. Given the amount of information it was possible to obtain, I started digging more into Go internals and the initially simple plugin started growing in features and complexity. It is now able to:

- Print the list of source files which contained the original source code 

- Recover the function names and create the proper symbols

- Extract the internal Go type definition as structures, lists, maps 

- Create types in Binary Ninja to aid the analysis 

- Import GoReSym output for obfuscated binaries

The next image shows an example of a function after parsing with the plugin. We have a lot of interesting information, such as the function names, the structures, and the names and types of their fields!

Figure 1 - Function After parsing

The next section will introduce some of the theory necessary to understand the inner workings of the plugin. If you are the kind of person that likes details, keep reading. Otherwise, you can jump to the “Golang Binary Parser Plugin” section, where I cover the plugin’s capabilities!

Some Theory

The gopclntab

To fully understand the inner workings of the plugin, the first piece of information one must understand is the gopclntab. The name gives a hint about its scope: pclntab stands for “program counter to line number table”. But let’s see it in action!

Take for example the following Go program, which tries to open the file “/etc/shadow” and panics if there is any error.

Figure 2 - Example go binary

If we try to execute the program without root privileges it will rightly panic, given that a normal user is not able to open the file.

Figure 3 - Go executable panic

The interesting bit of information is the output after the panic since it contains: 

- main.main: the name of the function

- /home/dipus/go/src/panic.go: the full path of the source file containing the function

- 7: the line in the source file which raised the panic

This information all comes from the gopclntab! But the most interesting thing about the gopclntab is that it is automatically added to every Go binary during compilation and it is needed by the runtime, so it is not possible to completely remove it (although it can be obfuscated; see Garble for example).

To greatly simplify, the gopclntab is a structure used by the runtime to map the current instruction pointer to a specific line in the original source code. For example, it contains metadata about: 

- textStart: the address of the first Go function

- funcnametab: a structure containing the names of the functions in the binary

- functab: a structure containing details about the functions

Figure 4 - Extract of the LineTable from the sources
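To make the fields above more concrete, here is a minimal Python sketch that decodes the header found at the start of the gopclntab. The layout is an assumption based on the pcHeader structure in the Go runtime sources for Go 1.18 and later on little-endian targets; older versions use a different layout. The helper name parse_pc_header and the sample bytes are made up for illustration and are not taken from the plugin.

```python
import struct

# Magic values from the Go sources, keyed to the Go version that introduced them.
PCLNTAB_MAGICS = {
    0xFFFFFFFB: "1.2",
    0xFFFFFFFA: "1.16",
    0xFFFFFFF0: "1.18",
    0xFFFFFFF1: "1.20",
}


def parse_pc_header(data):
    """Decode a Go 1.18+ pcHeader (little-endian assumed)."""
    magic, pad1, pad2, min_lc, ptr_size = struct.unpack_from("<IBBBB", data, 0)
    if magic not in (0xFFFFFFF0, 0xFFFFFFF1):
        raise ValueError(f"unsupported pclntab magic {magic:#x}")
    if (pad1, pad2) != (0, 0) or ptr_size not in (4, 8):
        raise ValueError("malformed pclntab header")
    word = "<Q" if ptr_size == 8 else "<I"
    names = ("nfunc", "nfiles", "textStart", "funcnameOffset",
             "cuOffset", "filetabOffset", "pctabOffset", "pclnOffset")
    header = {"version": PCLNTAB_MAGICS[magic], "minLC": min_lc,
              "ptrSize": ptr_size}
    # The remaining fields are pointer-sized words following the 8-byte prefix.
    for i, name in enumerate(names):
        (header[name],) = struct.unpack_from(word, data, 8 + i * ptr_size)
    return header


# Synthetic 64-bit header, built only to exercise the parser.
sample = struct.pack("<IBBBB8Q", 0xFFFFFFF1, 0, 0, 1, 8,
                     3, 2, 0x401000, 0x60, 0x80, 0xA0, 0xC0, 0xE0)
hdr = parse_pc_header(sample)
print(hex(hdr["textStart"]), hdr["nfunc"], hdr["version"])
```

The offsets in the header (funcnameOffset, pclnOffset and so on) are relative to the start of the header itself, which is why locating the gopclntab is the first step of every later analysis.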

A keen eye has probably noticed the part of the LineTable which might be interesting to analyze: the functab. To properly analyze the functab structure, we need to first understand its content, which is a series of _func structures. This structure is the one containing interesting information when reverse engineering, like: 

- entryOff: the address of the function (calculated as offset from the textStart) 

- nameOff: the name of the function as offset inside the funcnametab structure
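The lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it follows the Go 1.18+ layout, where the functab is a table of (entryOff, funcOff) pairs of 32-bit values, funcOff points to a _func record relative to the start of the functab, and names are NUL-terminated strings in the funcnametab. The function resolve_functions and the synthetic tables are hypothetical.

```python
import struct


def read_cstring(blob, offset):
    """Read a NUL-terminated UTF-8 string starting at offset."""
    end = blob.index(b"\x00", offset)
    return blob[offset:end].decode("utf-8", errors="replace")


def resolve_functions(text_start, nfunc, functab, funcnametab):
    """Return {address: name} for the first nfunc entries of the functab."""
    functions = {}
    for i in range(nfunc):
        entry_off, func_off = struct.unpack_from("<II", functab, i * 8)
        # A _func record starts with entryOff (uint32) and nameOff (int32).
        _entry, name_off = struct.unpack_from("<Ii", functab, func_off)
        functions[text_start + entry_off] = read_cstring(funcnametab, name_off)
    return functions


# Synthetic tables: two functions plus the trailing end-of-text entry.
funcnametab = b"main.main\x00main.helper\x00"
table = struct.pack("<5I", 0x00, 20, 0x40, 28, 0x80)
records = struct.pack("<Ii", 0x00, 0) + struct.pack("<Ii", 0x40, 10)
functab = table + records

funcs = resolve_functions(0x401000, 2, functab, funcnametab)
for address, name in sorted(funcs.items()):
    print(hex(address), name)
```

Real _func records carry more fields after nameOff (argument size, pc tables, and so on); the sketch reads only the two fields discussed here.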

How runtime.new/runtime.newobject Works

Another interesting piece of information contained in Go binaries is the type information. Take for example the source code below for a simple Go executable. It creates two objects: one of type MyStruct and another of type TypeCollection, on lines 8 and 9 respectively.

Figure 5 - Source code with the creation of two objects

The compiled code is where it becomes interesting. In fact, the allocation of the new object calls the function runtime.new (for Go 1.2) or runtime.newobject (for later versions), passing the definition of the type.

Figure 6 - Compiled allocation of a new object

This is another treasure trove of information for a reverse engineer, because the rtype definition, the object passed to runtime.new/newobject, contains information about:

- size: the size of the object 

- kind: the base type of the object, like struct, array, boolean, etc. 

- str: the name of the type 

Figure 7 - Type definition inside go binaries
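As an illustration of how these three fields can be read, the following Python sketch decodes a type descriptor from raw bytes. It assumes the 64-bit little-endian layout used by recent Go versions (a 48-byte header), the name encoding introduced in Go 1.17 (a flag byte, a varint length, then the name), and that str is an offset from the start of the types section. The helper names and the sample bytes are invented for the example.

```python
import struct

# Kind names indexed by their numeric value in Go's reflect package.
KINDS = ["invalid", "bool", "int", "int8", "int16", "int32", "int64",
         "uint", "uint8", "uint16", "uint32", "uint64", "uintptr",
         "float32", "float64", "complex64", "complex128", "array", "chan",
         "func", "interface", "map", "ptr", "slice", "string", "struct",
         "unsafe.Pointer"]
KIND_MASK = 0x1F         # the low 5 bits hold the kind, the rest are flags
TFLAG_EXTRA_STAR = 0x02  # the stored name has a leading '*' to strip

# size, ptrdata, hash, tflag, align, fieldAlign, kind, equal, gcdata,
# str (name offset), ptrToThis (type offset)
RTYPE = struct.Struct("<QQIBBBBQQii")


def read_name(section, offset):
    """Decode a name: one flag byte, a varint length, then UTF-8 bytes."""
    length, shift, pos = 0, 0, offset + 1
    while True:
        byte = section[pos]
        pos += 1
        length |= (byte & 0x7F) << shift
        shift += 7
        if byte < 0x80:
            break
    return section[pos:pos + length].decode("utf-8", errors="replace")


def parse_rtype(section, type_offset):
    """Return the size, kind and name of the descriptor at type_offset."""
    fields = RTYPE.unpack_from(section, type_offset)
    size, tflag, kind, str_off = fields[0], fields[3], fields[6], fields[9]
    name = read_name(section, str_off)
    if tflag & TFLAG_EXTRA_STAR:
        name = name[1:]
    index = kind & KIND_MASK
    kind_name = KINDS[index] if index < len(KINDS) else "unknown"
    return {"size": size, "kind": kind_name, "name": name}


# Synthetic types section: one descriptor at 0x00, its name at 0x40.
section = bytearray(0x80)
RTYPE.pack_into(section, 0, 24, 8, 0, 0x06, 8, 8, 25, 0, 0, 0x40, 0)
section[0x40:0x50] = b"\x01\x0e*main.MyStruct"

info = parse_rtype(bytes(section), 0)
print(info)
```

The leading star is a space-saving trick of the Go toolchain: the same stored string serves both the type and the pointer to it, and a flag tells the reader whether to drop the first character.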

Even just this information might be enough to identify the purpose of a structure (imagine a structure called EncryptionData), but we can extract even more information depending on the actual type of the variable. For example, structures have additional information like:

- fields: the list of the fields of the structure, each one of type structField

- name: the name of the field (part of structField)

- typ: the type of the field (part of structField)

Figure 8 - Example of struct type definition
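A minimal Python sketch of walking the field list follows. It assumes a 64-bit little-endian binary built with Go 1.19 or later, where a struct descriptor is the 48-byte common header followed by a package path pointer and a slice (pointer, length, capacity) of 24-byte structField records holding a name pointer, a type pointer and an offset. The Memory class, the addresses and the sample image are hypothetical and exist only for the example.

```python
import struct

RTYPE_SIZE = 48                # common type header on 64-bit targets
FIELDS_SLICE = RTYPE_SIZE + 8  # skip the pkgPath pointer
STRUCT_FIELD_SIZE = 24         # name pointer, typ pointer, offset


class Memory:
    """Tiny stand-in for a loaded binary: a byte image at a base address."""

    def __init__(self, base, data):
        self.base, self.data = base, bytes(data)

    def words(self, address, count):
        fmt = f"<{count}Q"
        return struct.unpack_from(fmt, self.data, address - self.base)

    def name(self, address):
        # One flag byte, then a single-byte length (names under 128 bytes).
        pos = address - self.base
        length = self.data[pos + 1]
        return self.data[pos + 2:pos + 2 + length].decode("utf-8")


def parse_struct_fields(mem, type_address):
    """Return (name, type address, offset) for each field of a struct type."""
    fields_ptr, count, _cap = mem.words(type_address + FIELDS_SLICE, 3)
    result = []
    for i in range(count):
        record = fields_ptr + i * STRUCT_FIELD_SIZE
        name_ptr, typ_ptr, offset = mem.words(record, 3)
        result.append((mem.name(name_ptr), typ_ptr, offset))
    return result


# Synthetic image: descriptor at 0x1000, field array at 0x1050, names after.
BASE = 0x1000
image = bytearray(0x100)
struct.pack_into("<QQQ", image, FIELDS_SLICE, BASE + 0x50, 2, 2)
struct.pack_into("<QQQ", image, 0x50, BASE + 0x80, 0x2000, 0)
struct.pack_into("<QQQ", image, 0x68, BASE + 0x86, 0x2030, 16)
image[0x80:0x86] = b"\x01\x04Name"
image[0x86:0x8B] = b"\x01\x03Age"

fields = parse_struct_fields(Memory(BASE, image), BASE)
for field in fields:
    print(field)
```

Each typ pointer leads to another type descriptor, so parsing one structure naturally pulls in the types of its fields.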

It is quite evident to anyone who has performed reverse engineering that being able to extract information from the gopclntab and the type definitions could greatly speed up the process and help focus on what is important.

Golang Binary Parser Plugin

All the information collected culminated in a Binary Ninja plugin called Golang binary parser. This plugin is able to parse Go binaries starting from version 1.2. The latest version of the plugin went live as this blog post was published. It can be installed using the official plugin manager as soon as the community plugin repository updates, or you can manually install a copy from the plugin’s GitHub repository.

Let’s now showcase some of the functionalities of the plugin and the information that can be recovered. All the examples are based on nuclei version 3.3.10, which was chosen for two main reasons: it is open-source, so it is easy to compare the plugin’s output with the source code, and it is a large binary, with a size of roughly 90 MB, useful to stress-test the code.

Print file list

The first and fastest command is “Print file list”. Since the binary contains the list of all the source files of each function, it is possible to extract the full path of such files and get some information about the compilation environment. The plugin will, in order: 

  1. Parse the gopclntab 
  2. Parse each entry in the functab table and extract the information 
  3. Print the source path in the console 
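The last step can be approximated with the short Python sketch below. It assumes the Go 1.16+ layout, where the file table is a blob of NUL-terminated paths; the function name list_source_files is hypothetical, and the /usr/local/go paths in the sample are made up (only /home/dipus/go/src/panic.go comes from the earlier example).

```python
import posixpath
from collections import Counter


def list_source_files(filetab):
    """Return the sorted, de-duplicated paths stored in a filetab blob."""
    names = {raw.decode("utf-8", errors="replace")
             for raw in filetab.split(b"\x00") if raw}
    return sorted(names)


# Synthetic blob used only to exercise the function.
filetab = (b"/usr/local/go/src/runtime/panic.go\x00"
           b"/home/dipus/go/src/panic.go\x00"
           b"/usr/local/go/src/runtime/proc.go\x00")

files = list_source_files(filetab)
for path in files:
    print(path)

# Counting files per directory hints at where GOROOT and the project lived.
per_dir = Counter(posixpath.dirname(p) for p in files)
print(per_dir.most_common(1))
```

Grouping by directory is one simple way to spot the standard library root, the module cache and the author's own source tree at a glance.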

The following is an example of the output of the plugin. 

Figure 9 - Output of the "Print file list" command

This can be very helpful when triaging executables and finding similarities in their build environments. The same command, like all the others, can be called using Binary Ninja in headless mode, as shown by this simple Python script.

import sys

import binaryninja as bn


def main():
    if len(sys.argv) < 2:
        print(f"Usage: {sys.argv[0]} <binary>")
        sys.exit(1)

    bn.log.log_to_stdout()
    test_file = sys.argv[1]

    print(f"Loading {test_file}...", end='')
    with bn.load(test_file) as bv:
        print('done')
        # Build the context that plugin commands expect, then look up
        # the command by its registered name and run it.
        ctx = bn.PluginCommandContext(bv)
        plugin = bn.PluginCommand.get_valid_list(ctx)['golang\\Print file list']
        plugin.execute(ctx)


if __name__ == "__main__":
    main()

Function Renaming

As already said, knowing the name of a function can greatly speed up the reverse engineering process. By parsing the gopclntab, the plugin can:

- Rename the function that Binary Ninja identified 

- Create new functions if they were missed by standard analysis 

Basically, by taking the gopclntab as our ground truth we can enhance the analysis by adding information that would be otherwise lost or difficult to recover. By calling the command “Recover function names” the plugin will, in order: 

  1. Parse the gopclntab 
  2. Parse each entry in the functab table and extract the information 
  3. If the function does not exist in the database, it will create one 
  4. Define the function as a Binary Ninja symbol 
  5. Organize the function in the proper container 
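The last step depends on knowing which package a recovered name belongs to. The following Python sketch shows one way to split a Go symbol into its package path and local name, and to tell whether the function is exported. It is a simplified illustration, not the plugin's logic: the helper names are hypothetical, and the rule used (the package ends at the first dot after the last slash, ignoring generic type arguments) is an assumption that covers common cases only.

```python
def split_go_symbol(full_name):
    """Split 'pkg/path.Local' into (package path, local name)."""
    # Type arguments of generic functions may contain '/' and '.': skip them.
    bracket = full_name.find("[")
    head = full_name if bracket == -1 else full_name[:bracket]
    slash = head.rfind("/")
    dot = head.find(".", slash + 1)
    if dot == -1:
        return "", full_name
    return full_name[:dot], full_name[dot + 1:]


def is_exported(local_name):
    """Go exports identifiers whose first character is uppercase."""
    identifier = local_name.split("[")[0].rsplit(".", 1)[-1]
    return identifier[:1].isupper()


# Illustrative names; the tablewriter import path is an example.
for symbol in ("main.main",
               "github.com/olekukonko/tablewriter.NewWriter",
               "runtime.(*mheap).alloc",
               "main.Map[go.shape.int]"):
    package, local = split_go_symbol(symbol)
    print(f"{package!r:40} {local!r:22} exported={is_exported(local)}")
```

With the package path in hand, functions can be grouped into one container per package, which keeps runtime and third-party code away from the code written by the author of the binary.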

During execution, the plugin will print some details in the Log tab (if you want to get more details just increase the Binary Ninja log level to debug). By executing the plugin, it was possible to recover a total of 88217 function names and create 751 new functions which Binary Ninja had not even identified.

Figure 10 - Output of the function rename command

The following two images show an extract of the function main.main before and after the renaming. The first one shows the classical view with the function having the name sub_28f164, and not a lot of other information.

Figure 11 - Binary before function renaming

Now the state after the plugin execution: the function main.main has its proper name and it is organized in the correct container (on the left). The plugin was also able to recover the runtime functions like runtime.morestack_noctxt or runtime.gcWriteBarrier1, functions which we will be very happy to ignore when reverse engineering.

Figure 12 - Binary after function renaming

Recover Go Types

The next command, “Recover Go types”, builds on the previous ones and can extract the Go type definitions. This is very useful since it can give you information about the name of a type, its kind (is it a structure, a list?), its original package and its size, among others.

The plugin will, in order: 

  1. Create the base Go types like arrays, bytes, booleans 
  2. Collect cross-references to function calls like runtime.new, runtime.newobject, runtime.makechan, etc. 
  3. For each cross-reference extract the parameter passed to the function and save it 
  4. Parse each parameter and define the go type 

The following image shows the log output after the execution of the types extraction. The interesting bit is the number of Go types extracted: 35453. 

More important is how the functions gained contextual meaning after the execution of the plugin, which was the main objective when developing it. The following image shows a function before the execution of the plugin, without the proper type information or function names. I would say that the purpose of the function as-is is unclear.

Figure 13 - Function before renaming and type parsing

The data passed to the function does not make much sense either, since it is just a bunch of bytes at address 0x30e4e20.

Figure 14 - Data containing type definition before parsing

Once parsed, the same data at 0x30e4e20 makes more sense. It is a STRUCT (looking at the kind field) called tablewriter.Table_type and it has 34 fields (the Len of the fields field), each one with its own name (name field) and type (typ field).

Figure 15 - Data containing type definition after parsing

The function also became instantly more understandable: it is an exported function (the first character of the function name is uppercase), and it is creating a new table writer.

Figure 16 - Function after renaming and type parsing

Side note: if you do not want to parse the entire binary, maybe because you are only interested in a single type, there is a subcommand “Recover Go type at address”, which will start the analysis from the address under the cursor and parse only the necessary stuff.

Recover Binary Ninja Types

Now it is time to put all the analyses together and really ease the reverse engineering by letting a script do the busy work. Since we can recover the function names and then use them to recover the Go type definitions, it is only natural to automatically define the types directly in Binary Ninja!

The next command, “Recover Binary Ninja types”, does exactly this. It will:

  1. Call the “Recover Go types” command (actually, it is an inherited command, but this is a detail)
  2. Recursively parse each Go Type definition to define the proper native type in Binary Ninja 

By “native type” I mean a type you can define in the Types view, such as structures, enums, etc.
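The recursive step can be pictured with the following Python sketch, which turns already-parsed Go type descriptions into C-style declarations, emitting dependencies first and guarding against self-referential pointers. Everything in it is hypothetical: the dictionary format, the function to_c_type, the primitive mapping and the go_string base type (assumed to be predefined, in the spirit of the base Go types created earlier) are illustration choices, not the plugin's data model or the Binary Ninja API.

```python
# Assumed mapping from Go kinds to C types on a 64-bit target.
PRIMITIVES = {
    "bool": "bool",
    "int": "int64_t",
    "int32": "int32_t",
    "uint8": "uint8_t",
    "uintptr": "uintptr_t",
    "string": "struct go_string",  # assumed to be defined beforehand
}


def to_c_type(go_type, decls, seen):
    """Return the C type for go_type, appending struct declarations to decls."""
    kind = go_type["kind"]
    if kind == "ptr":
        return to_c_type(go_type["elem"], decls, seen) + "*"
    if kind == "struct":
        c_name = "struct " + go_type["name"].replace(".", "_")
        if go_type["name"] not in seen:
            # Mark first, so a pointer back to this struct stops the recursion.
            seen.add(go_type["name"])
            members = [f"    {to_c_type(ftype, decls, seen)} {fname};"
                       for fname, ftype in go_type["fields"]]
            decls.append(c_name + " {\n" + "\n".join(members) + "\n};")
        return c_name
    return PRIMITIVES[kind]


# Two made-up Go types; Node points to itself like a linked list.
point = {"kind": "struct", "name": "main.Point",
         "fields": [("X", {"kind": "int"}), ("Y", {"kind": "int"})]}
node = {"kind": "struct", "name": "main.Node",
        "fields": [("Pos", point), ("Label", {"kind": "string"})]}
node["fields"].append(("Next", {"kind": "ptr", "elem": node}))

decls = []
to_c_type(node, decls, set())
print("\n\n".join(decls))
```

Because a struct is declared only after its fields have been visited, main_Point is emitted before main_Node, which embeds it by value.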

Similarly to the other commands, the Log window will contain the messages printed by the plugin. The output is very similar to the “Recover Go types” one but it is nice to see that we were able to create 35453 native types! 

Figure 17 - Logs for the creation of the native types

Another way of seeing the plugin results is to compare the Types view before and after its execution. For example, the next figure clearly shows no types in the “User Types” section.

Figure 18 - Type view before the plugin execution

The same view after the execution of the plugin is more interesting. Now we have a multitude of types with their proper C definitions, all their fields complete with names and types.

Figure 19 - Type view after the plugin execution

If we go back to our function tablewriter.NewWriter, we can see how having the type definition changes the readability of the function. Once we assign the type tablewriter.Table (actually a pointer) to the return value of runtime.newobject, the access to the structure table_writer becomes clear and understandable. 

Figure 20 - Function readability after type definition and assignment

Some Syntactic Analysis Sugar

Every analysis that I’ve presented tends to depend on the previous analyses. So, since I do not like having to call each analysis function myself in the correct order, I’ve introduced the two commands Parse Go executable and Parse Go executable with Binary Ninja types. 

The Parse Go executable will: 

  1. Rename the functions
  2. Comment the functions with their source file
  3. Recover the Go types 

while Parse Go executable with Binary Ninja types (as the name might suggest) will: 

  1. Rename the functions
  2. Comment the functions with their source file
  3. Recover the Go types
  4. Define the native Binary Ninja types

Bonus Command: GoReSym Importer

Sometimes you end up working with obfuscated Go binaries. Obfuscation is normally applied to hide the huge amount of information you can extract from them, so most of the normal analysis will break.

One thing about me is that I do not like to reinvent the wheel, so I searched for existing solutions to perform basic analysis and recover the minimum set of information necessary to kickstart my own analyses. I found the solution in GoReSym (an open-source tool from Mandiant that is a “Go symbol parser that extracts program metadata”) and added two simple commands:

- Import GoReSym which will parse and import the tool output 

- Run and Import GoReSym which will run the tool and import the results
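To give an idea of what importing the tool output involves, here is a small Python sketch that reads a GoReSym JSON report and builds an address-to-name map. It assumes the report exposes UserFunctions and StdFunctions lists whose entries carry Start and FullName fields; check these names against the output of your GoReSym version. The function load_goresym_functions and the sample report are hypothetical and do not reflect how the plugin is implemented.

```python
import json


def load_goresym_functions(report_text):
    """Map each function start address to its full name."""
    report = json.loads(report_text)
    functions = {}
    for key in ("UserFunctions", "StdFunctions"):
        # Either list may be missing or null in a report.
        for entry in report.get(key) or []:
            functions[entry["Start"]] = entry["FullName"]
    return functions


# Made-up report fragment used only to exercise the function.
sample_report = """
{
  "UserFunctions": [
    {"Start": 4581632, "End": 4581783,
     "PackageName": "main", "FullName": "main.main"}
  ],
  "StdFunctions": [
    {"Start": 4198400, "End": 4198500,
     "PackageName": "runtime", "FullName": "runtime.newobject"}
  ]
}
"""

names = load_goresym_functions(sample_report)
for address, name in sorted(names.items()):
    print(hex(address), name)
```

Once such a map is available, the same renaming steps used for the gopclntab can be applied to it.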

The Future

The plugin has only just been published, yet there is already a series of improvements and extensions that might improve its usability.

The first one is the automatic application of the type as the return type of functions like runtime.newobject and the other allocation functions, which is currently a manual step. The plugin does not do it automatically because the Go calling convention is, let’s say, a moving target. So, to avoid messing up the database and propagating wrong information, I decided to leave the task of assigning the type to the user.

There is a good chunk of information about function types which might be interesting to propagate, like the number of parameters, their types and the return type of the function. The issue is that a typical function type definition does not contain the address of the function, and so I cannot be 100% sure where to apply such definition. The workaround might be matching the name of the function type with the name of the recovered functions in the binary. I’ve yet to explore this path, but it looks promising. 

The released version has a few quirks that I’d like to iron out when parsing types (all information was extracted from the source code), like some fields with an unclear purpose and the parsing of peculiar declarations such as function receivers.

Finally, the plugin would benefit from some memory optimizations. My initial idea was to write something which would work both in Binary Ninja and as a stand-alone library, so some of the original design choices were not memory efficient. Now most of the functionalities are deeply intertwined with the Binary Ninja API, so the original assumptions no longer hold, but the sometimes non-optimal memory usage remains (this might be a good excuse to be assigned a laptop with more RAM).


1. https://en.wikipedia.org/wiki/Strip_(Unix)
2. https://go.dev/
3. https://github.com/mandiant/GoReSym

About the Author

Jacopo is a Senior Security Engineer at Anvil Secure. He has developed a broad skillset, with experience testing wired and wireless networks, web and mobile applications, and IoT devices by performing hardware analysis and reverse engineering of executable and custom protocols.

Jacopo has been a member of the mHACKeroni CTF team since 2018 and was part of the team that won the Hack‑a‑Sat finals in 2023. He is also a member of Tower of Hanoi, the Politecnico of Milano CTF team.
