Why do functions “have memory” in REBOL?

In Rebol, I have written this very simple function:

make-password: func[Length] [
    chars: "QWERTYUIOPASDFGHJKLZXCVBNM1234567890"
    password: ""
    loop Length [append password (pick chars random Length)]
    password
    ]

When I run this multiple times in a row things get really confusing:

loop 5 [print make-password 5]

Gives (for example) this output:

  • TWTQW
  • TWTQWWEWRT
  • TWTQWWEWRTQWWTW
  • TWTQWWEWRTQWWTWQTTQQ
  • TWTQWWEWRTQWWTWQTTQQTRRTT

It looks like the function memorised its past executions, stored the result, and then used it again!

I did not ask for this!

I would like to have output similar to the following:

  • IPS30
  • DQ6BE
  • E70IH
  • XGHBR
  • 7LMN5

How can I achieve this result?

Best answer

A good question.

Rebol code is actually best thought of as a very stylized data structure. That data structure “happens to be executable”. But you need to understand how it works.

For instance, from @WiseGenius’s suggestion:

make-password: func[Length] [
    chars: "QWERTYUIOPASDFGHJKLZXCVBNM1234567890"
    password: copy ""
    loop Length [append password (pick chars random Length)]
    password
]

Take a look at the block containing append password.... That block is “imaged” there; what it really looks like under the hood is:

chars: **pointer to string! 0xSSSSSSS1**
password: copy **pointer to string! 0xSSSSSSS2**
loop Length **pointer to block! 0xBBBBBBBB**
password

All series work this way when the interpreter loads them: strings, blocks, binaries, paths, parens, and so on. Given that it’s “turtles all the way down”, if you follow through to that pointer, the block 0xBBBBBBBB is internally:

append password **pointer to paren! 0xPPPPPPPP**

One result of this is that a series can be referenced (and hence “imaged”) in multiple places:

>> inner: [a]

>> outer: reduce [inner inner]
[[a] [a]]

>> append inner 'b

>> probe outer
[[a b] [a b]]

This can be a source of confusion for newcomers, but once you understand the data structure you begin to know when to use COPY.
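As a small sketch of what COPY changes here (written as a script rather than a console session, with the results shown as comments), the second slot below holds a copy, so only the first slot still shares the block with inner:

```rebol
inner: [a]
outer: reduce [inner copy inner]   ; first slot shares inner, second is a new block

append inner 'b

probe outer                        ; [[a b] [a]]
probe same? first outer inner      ; true  (same underlying series)
probe same? second outer inner     ; false (independent copy)
```

SAME? is used here only to show which slots refer to the identical series.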

So you’ve noticed an interesting implication of this with functions. Consider this program:

foo: func [] [
    data: []
    append data 'something
]

source foo

foo
foo

source foo

That produces a possibly-surprising result:

foo: func [][
    data: [] 
    append data 'something
]

foo: func [][
    data: [something something] 
    append data 'something
]

When we call foo a couple of times, the function’s source code appears to change as we do so. It is, in a sense, self-modifying code.

If this bothers you, there are tools in R3-Alpha for attacking it. You can use PROTECT to protect function bodies from modification, and even create your own alternatives to routines like FUNC and FUNCTION that will do it for you. (PFUNC? PFUNCTION?) In Rebol version 3 you can write:

pfunc: func [spec [block!] body [block!]] [
    make function! protect/deep copy/deep reduce [spec body]
]

foo: pfunc [] [
    data: []
    append data 'something
]

foo

When you run that you get:

*** ERROR
** Script error: protected value or series - cannot modify
** Where: append foo try do either either either -apply-
** Near: append data 'something

So that forces you to copy series. It also points out that FUNC is just a function! itself, and so is FUNCTION. You can make your own generators.

This may break your brain and you may run screaming saying “this is not any sane way to write software”. Or maybe you will say “my God, it’s full of stars.” Reactions may vary. But it is fairly fundamental to the “trick” that powers the system and gives it wild flexibility.

(Note: The Ren-C branch of Rebol3 has fundamentally made it so that function bodies–and source series in general–are locked by default. If one wants a static variable in a function, you can say foo: func [x <static> accum (copy "")] [append accum x | return accum] and the function will accumulate state in accum across calls.)

I’ll also suggest paying close attention to what is actually happening on each run. Before you’ve run the foo function, data has no value. Each time the function executes, the evaluator sees a SET-WORD! followed by a series value and assigns that value to the variable.

data: **pointer to block! 0xBBBBBBBB**

After that assignment, you’ll have two references to the block in existence. One is the reference held in the code structure that was established at LOAD time, before the function had ever been run. The second reference is the one that was stored into the data variable. It’s through this second reference that you are modifying this series.

And notice that data will be reassigned each time the function is run. But reassigned to the same value over and over again…that original block pointer! This is why you have to COPY if you want a fresh block on every run.
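For comparison, here is a minimal sketch of the same foo with COPY added. The block literal in the body is now only a template that never gets modified, so each call starts from a fresh, empty block:

```rebol
foo: func [] [
    data: copy []            ; new block on every call; the literal stays empty
    append data 'something
]

probe foo   ; [something]
probe foo   ; [something]
```

Running source foo afterwards should show the body unchanged, since nothing ever appends to the literal itself.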

Grasping the underlying simplicity in the evaluator rules is part of the giddy interesting-ness. This is how the simplicity was dressed up to make a language (in a way you could twist to your own means). For instance, there is no “multiple-assignment”:

a: b: c: 10

That’s just the evaluator hitting a: as a SET-WORD! symbol and saying “okay, let’s associate the variable a in its binding context with whatever the next complete expression produces.” b: does the same. c: does the same but hits a terminal because of the integer value 10…and then also evaluates to 10. So it looks like multiple-assignment.
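One consequence worth seeing, sketched here with a block instead of an integer: because each SET-WORD! simply receives whatever the next expression produced, all three words end up referring to the very same series.

```rebol
a: b: c: []          ; one block literal, three references to it

append a 'x

probe c              ; [x]
probe same? a c      ; true
```

Writing a: b: c: copy [] would still leave the three words sharing one block; it would just be a different block from the literal in the source.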

So just remember that the original instance of a series literal is the one hanging in the loaded source. If the evaluator ever gets around to doing this kind of SET-WORD! or SET assignment, it will borrow the pointer to that literal in the source to poke into the variable. It’s a mutable reference. You (or the abstractions you design) can make it immutable with PROTECT or PROTECT/DEEP, and you can make it not-a-reference with COPY or COPY/DEEP.


Related Note

Some argue that you should never write copy []…because (a) you might get in the habit of forgetting to write the COPY, and (b) you are making an unused series every time you do it. That “blank series template” gets allocated, has to be scanned by the garbage collector, and no one ever actually touches it.

If you write make block! 10 (or whatever size you want to preallocate the block) you avoid the issue, save a series, and offer a sizing hint.
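Applied to the original question, that advice might look like the sketch below. Two things here are my own assumptions rather than part of the answer above: the /local refinement, which keeps chars and password from leaking into the global context, and random length? chars, which replaces random Length because the latter only ever picks from the first Length characters (which is why the question’s sample output contains nothing but Q, W, E, R and T).

```rebol
make-password: func [Length [integer!] /local chars password] [
    chars: "QWERTYUIOPASDFGHJKLZXCVBNM1234567890"   ; only read, so no COPY needed
    password: make string! Length                   ; fresh string, sized up front
    loop Length [
        ; pick uniformly from the whole character set, not just the first Length
        append password pick chars random length? chars
    ]
    password
]

loop 5 [print make-password 5]
```

Each call now builds its own string, so the passwords no longer grow from one call to the next.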

Reprinted from the original text: Why do functions “have memory” in REBOL? - CodeDay