überSpark: add tool-chain support for casm blocks

This task will add next-gen tool-chain support for casm blocks. Specifically, we will be enhancing the
the überSpark vbridge found here: uberspark/src-nextgen/tools/vbridge-plugin at develop · uberspark/uberspark · GitHub

The following are the proposed subtasks:

  1. Ensure your fork of uberspark.git is up to date with the upstream uberspark.git repository. The following commands can achieve this on a local checkout of your fork.

    git remote add upstream https://github.com/uberspark/uberspark.git
    git fetch -a upstream  
    git checkout develop 
    git pull upstream develop 
    git push origin develop
    
  2. Build and install the uberspark tool-chain as detailed here: uberspark/install.rst at develop · uberspark/uberspark · GitHub

  3. Create a feature branch on your fork off the develop branch. Name it feature-casm-blocks

  4. We will use the hello_mul example uobjcoll for this task found here: uberspark/src-nextgen/coss-examples/hello_mul/ucoss_src/uobjcoll at develop · uberspark/uberspark

  5. The following command will run the vbridge for the hello_mul uobjcoll when in the above folder:

    uberspark verify -v
    
  6. Our goal is to remove the usage of CASM_FUNCDEL and CASM_FUNCCALL within: uberspark/main.c at develop · uberspark/uberspark · GitHub and CASM_FUNCDEF within: uberspark/nullfunc.cS at develop · uberspark/uberspark · GitHub

  7. The casm function will be defined now within: uberspark/nullfunc.cS at develop · uberspark/uberspark · GitHub as void casm_main_nullfunc(void) with the casm instructions as is. Note the casm_ prefix in the function definition. We will adopt this convention to prefix all casm function definitions with the casm_ prefix.

  8. Within: uberspark/main.c at develop · uberspark/uberspark · GitHub we will now declare the casm function via extern void casm_main_nullfunc(void) and invoke it via casm_main_nullfunc().

  9. Within the vbridge we will now go through the AST and before invocation of any casm_ function we will add the prologue and epilogue to the AST as currently added by the CASM_FUNCCALL macro definition within: uberspark/cpu.h at develop · uberspark/uberspark · GitHub

  10. You will modify the vbridge casm module here: uberspark/ubersparkvbridge_casm.ml at develop · uberspark/uberspark · GitHub to add new functions to accomplish the aforementioned tasks.

Working branches for testing, PRs and merge information is as below:

Working branches:
TBD

PR(s):

Merge(s):

Hi @amitvasudevan ,

Which file should I look at after run the uberspark verify -v command to make sure the modifications preserve the same assembly? Currently I only see a _staging folder. Should I look into the _triage as you mentioned here?

If so which command to use to generate the _triage folder? Thanks!

Hi @zhang-zichao ,

We only have a _staging folder; the _triage folder is deprecated now.

The uberspark verify command currently uses the pre-processor macros to inline the CPU hardware model into the AST. The idea with this task is to replace those pre-processor macros and directly plant those pre- and post- modeling instructions within the AST via the verification bridge plugin.

Hi @amitvasudevan ,

I have been trying to access the name of the function call like so

match stmt.skind with
      | Instr (Call (lv_opt, e, e_list, loc)) ->
        ...

Do you happen to know how can I get the function call name as a string? Thanks!

Can you try looking into the expression e? If I am not mistaken it will contain a var node that represents the function name…

Found it inside varinfo, thanks!

1 Like

Hi @amitvasudevan ,

Sorry for the delay. I am just wrapping up the task and have a dumb question about the epilogue. What does line 2609 here suppose to do? Is this like return location to jump to? How will it appear in the AST? Thanks!

This line basically returns the result from the CASM function per the C ABI, which on x86 is a 64-bit return value with upper 32-bits in EDX register and lower-32-bits in EAX register.

You will need to add the following statement “as-is” into the AST right after the CASM function call after popping off the formal input parameters from the stack:

(uint64_t)(((uint64_t)hwm_cpu_gprs_edx << 32) | hwm_cpu_gprs_eax)

Let me know if that helps. Thanks @zhang-zichao !

Thanks for the reply. I tried to add the statement as is but the statement doesn’t seem to belong to stmtkind (or any Cil_types.instr) in frama-c’s Cil_types. I also tried to manually place this statement inside main.c and see how it is represented in the AST but the statement is not shown is AST dump in https://github.com/uberspark/uberspark/blob/develop/src-nextgen/tools/vbridge-plugin/ubersparkvbridge_ast.ml.

Can you share your fork/branch details? Also I am assuming you are working with the hello_mul example uobjcoll and the main.c there, if so can you also post the terminal dump of the vbridge output AST so I can see what is going on?

Thanks!

I’m on develop branch working with hello_mul example. So if I manually modify main.c there to put this statement right after casm_main_nullfunc(); like so:

...
uint32_t main (uint32_t multiplicand, uint32_t multiplier){
    uint32_t result;
    char l_mybuf[5];
    char *p= &l_mybuf;

    /* CASM_FUNCCALL(main_nullfunc, CASM_NOPARAM); */
    casm_main_nullfunc();
    (uint64_t)(((uint64_t)hwm_cpu_gprs_edx << 32) | hwm_cpu_gprs_eax);
...

This is what it show in the first AST dump (before ast manipulation) looks like:
Sample AST dump

Do we need this if the return type of the casm function is void?

Good point. No we don’t, but we still need to figure out how to add it to the AST for CASM functions that do return a value…

I see a result = multiplier * multiplicand statement in the AST here: Sample AST dump output · GitHub

Is that present in main.c?

Yes it is in main:

uint32_t main (uint32_t multiplicand, uint32_t multiplier){
    uint32_t result;
    char l_mybuf[5];
    char *p= &l_mybuf;

    /* CASM_FUNCCALL(main_nullfunc, CASM_NOPARAM); */
    casm_main_nullfunc();
    (uint64_t)(((uint64_t)hwm_cpu_gprs_edx << 32) | hwm_cpu_gprs_eax);
    //whois(0);
    result = multiplicand * multiplier;

Ahh now I see the problem. You will need to surround the above statements in parantheses separated by comma (to make the entire thing an expression) as in the existing definition of #define CASM_FUNCCALL(fn_name, ...) within: uberspark/cpu.h at develop · uberspark/uberspark · GitHub

So try the following in main.c:

( casm_main_nullfunc(), 
    (uint64_t)(((uint64_t)hwm_cpu_gprs_edx << 32) | hwm_cpu_gprs_eax)
) ;

Changing main.c to

uint32_t main (uint32_t multiplicand, uint32_t multiplier){
    uint32_t result;
    char l_mybuf[5];
    char *p= &l_mybuf;

    /* CASM_FUNCCALL(main_nullfunc, CASM_NOPARAM); */
		(casm_main_nullfunc(),
     (uint64_t)(((uint64_t)hwm_cpu_gprs_edx << 32) | hwm_cpu_gprs_eax)
     );
    //whois(0);
    result = multiplicand * multiplier;

However, the statement is still not shown in AST dump
Sample AST dump 2

I think we could distinguish two cases:

  1. If return value of CASM function call is returned to some variable foo then I believe we could insert foo = (uint64_t)(((uint64_t)hwm_cpu_gprs_edx << 32) | hwm_cpu_gprs_eax) into the AST.
  2. If the return value of CASM function call is not used then we can ignore the statement ( (uint64_t)(((uint64_t)hwm_cpu_gprs_edx << 32) | hwm_cpu_gprs_eax))

What do you think?

Do you want to try this in main.c and see if it shows up in the AST?

Changing main.c to the following

#include <uberspark/include/uberspark.h>
#include <uberspark/hwm/cpu/x86/32-bit/generic/include/hwm.h>

/* CASM_FUNCDECL(void main_nullfunc(void *noparam)); */
/* extern void casm_main_nullfunc(void); */
extern uint64_t casm_main_nullfunc(void);

extern uint32_t init (void);
extern uint32_t test_call (uint32_t value);

//global variable
unsigned int g_int;
unsigned int g_total=0;
unsigned char g_char;

unsigned int g_int_address = &g_int;

//global ACSL predicate
/*@ predicate is_separated (char * x) = 
    \separated(&g_char, x) &&
    \separated(&g_total, x);
*/


/*@ 
    requires 0 <= multiplicand <= 10;
*/
uint32_t main (uint32_t multiplicand, uint32_t multiplier){
    uint32_t result;
    char l_mybuf[5];
    char *p= &l_mybuf;

    uint64_t foo;

    /* CASM_FUNCCALL(main_nullfunc, CASM_NOPARAM); */
		foo = casm_main_nullfunc();
    foo = (uint64_t)(((uint64_t)hwm_cpu_gprs_edx << 32) | hwm_cpu_gprs_eax);
     
    //whois(0);
    result = multiplicand * multiplier;

    p++;

    /*@ assert 1;  */
    *p='A';
    
    /*@ assert is_separated(p)  ; */
    *p++ = 'B';
    
    p = &g_char;

    /*@ assert is_separated(p)  ; */
    *p++ = 'C';

    g_int = 0;

 

    for(g_int = 0; g_int < 20; g_int++){
        g_total = g_total + g_int; 
    }   

    test_call(g_int++ + g_total++);

    return result;
}

and it shows up in the AST as

uberspark >> [docker] uint32_t main(uint32_t multiplicand, uint32_t multiplier)^M
uberspark >> [docker] {^M
uberspark >> [docker]   uint32_t result;^M
uberspark >> [docker]   char l_mybuf[5];^M
uberspark >> [docker]   uint64_t foo;^M
uberspark >> [docker]   char *tmp;^M
uberspark >> [docker]   char *tmp_0;^M
uberspark >> [docker]   unsigned int tmp_1;^M
uberspark >> [docker]   unsigned int tmp_2;^M
uberspark >> [docker]   char *p = (char *)(& l_mybuf);^M
uberspark >> [docker]   foo = casm_main_nullfunc();^M
uberspark >> [docker]   foo = ((unsigned long long)hwm_cpu_gprs_edx << 32) | (unsigned long long)hwm_cpu_gprs_eax;^M
uberspark >> [docker]   result = multiplicand * multiplier;^M
uberspark >> [docker]   p ++;^M
uberspark >> [docker]   /*@ assert 1 ≢ 0; */ ;^M
uberspark >> [docker]   *p = (char)'A';^M
uberspark >> [docker]   /*@ assert is_separated(p); */ ;^M
uberspark >> [docker]   tmp = p;^M
uberspark >> [docker]   p ++;^M
uberspark >> [docker]   *tmp = (char)'B';^M
uberspark >> [docker]   p = (char *)(& g_char);^M
uberspark >> [docker]   /*@ assert is_separated(p); */ ;^M
uberspark >> [docker]   tmp_0 = p;^M
uberspark >> [docker]   p ++;^M
uberspark >> [docker]   *tmp_0 = (char)'C';^M
uberspark >> [docker]   g_int = (unsigned int)0;^M
uberspark >> [docker]   g_int = (unsigned int)0;^M
uberspark >> [docker]   while (g_int < (unsigned int)20) {^M
uberspark >> [docker]     g_total += g_int;^M
uberspark >> [docker]     g_int ++;^M
uberspark >> [docker]   }^M
uberspark >> [docker]   tmp_1 = g_int;^M
uberspark >> [docker]   g_int ++;^M
uberspark >> [docker]   tmp_2 = g_total;^M
uberspark >> [docker]   g_total ++;^M
uberspark >> [docker]   ;^M
uberspark >> [docker]   test_call(tmp_1 + tmp_2);^M
uberspark >> [docker]   return result;^M
uberspark >> [docker] }^M

Great! Then let’s go with your proposed solution. Thanks @zhang-zichao !