Issue66

Title write attributes of __builtin__ dont affect the same name in the global
Priority bug Status deferred
Superseder Nosy List kayhayen, nssh
Assigned To kayhayen Keywords optimization, wrong_execution

Created on 2013-03-12.22:02:19 by nssh, last changed by admin.

Files
File name Uploaded Type Edit Remove
build-config.patch nssh, 2013-03-14.05:54:12 application/octet-stream
definition.patch nssh, 2013-03-14.02:21:58 application/octet-stream
example_builtin_overwrite.py nssh, 2013-03-13.13:06:44 application/octet-stream
replace_builtin.py nssh, 2013-03-14.02:22:16 application/octet-stream
replace_test.py nssh, 2013-03-14.02:22:11 application/octet-stream
step-01-inject-builtin.patch nssh, 2013-03-15.07:44:02 application/octet-stream
test_builtin.py nssh, 2013-03-24.11:58:56 application/octet-stream
Messages
msg283 (view) Author: nssh Date: 2013-03-24.11:58:56
Thanks continue to pay attention to it.
Add a test script (test_builtin.py) for case you cant handle yet.

about site.py
move builtin type modify code to a function xxx
call function xxx in MOD_INIT_DECL( __main__ ) first

about test 01
dont convert builtin function argument error to typeerror raise code,
actually optimize exception that 100% will occur are no effect, they are just bug.

about test 02
dont included builtin function name to print statement
actually optimize function like print is not need,  they wont perform many times

about test 03
difficult to solve, i dont have any idea
msg282 (view) Author: kayhayen Date: 2013-03-23.18:24:41
I was just merging your code as a first step, and then adding code to only allow 
writes to builtin functions that we support, i.e. otherwise RuntimeError.

One issue is that I think "site.py" is still run unguarded, and before we get a 
chance. This is a separate issue (embedding it). On my system, some builtin 
values come from "site.py" mechanism.

Yes, I was curious how well it fares, and as I am ill, and don't get much done, 
this was a nice task for me in this state.

I had to change "repr" to "compiled_module", so it compares well to "<type 
'module'>" in the programs test. And I removed the original value stuff. It 
depends on strings that are not initialized yet. And it's not used yet either, 
so why have it now.

Thanks for your patch. Already I listed you in the README.txt as contributor 
btw, you are giving very well appreciates impulses. Plus, the compiled builtin 
module code is grand. I didn't think that could work so easily.

This definitely will have an impact on global module variables. I now believe 
that it should be possible to avoid dictionaries for module variables entirely, 
and we will prototype it with the builtins.
msg242 (view) Author: nssh Date: 2013-03-15.10:51:54
I think about more.
I think the root of the problem is nuitka see the attributes of __builtin__ as
constant, but user can change it at the runtime, so they are not *true constant*.
For that nuitka made a lot of work, like check the name in locals, if it already
define by args or = that dont convert it, etc..
But still can't analysis the all case at compile time.

We can keep add and modify many code to analysis and let it switch.
But that's a bad path.
I hope nuitka can do something more simple and efficient.
I will submit a wish to explain this.

Now we maybe can add the RuntimeError raise code to disallow written to
__builtin__ when attribute name in builtin_name_all.
And a option to allow user disable optimization for some attribute.
That's brings more peace.
msg241 (view) Author: kayhayen Date: 2013-03-15.10:29:43
Looking more at it, the builtin module patch is a nice optimization already as it 
is. I believe, we can do away with "refresh" calls everywhere and use this, also 
without penalty. Sweet. Later we should have a "BuiltinsStub.cpp" for it, but 
that's later.

The string comparison, I believe could be faster, based on hashes. I am going to 
provide that, as it applies to parameter parsing. For known strings, you can first 
check hash value (which normally will directly mismatch for that code) and "is" 
identity.

The "name" and "_python_string_open" will normally be equal due to interning of 
strings. If they are not "==", and only if the hash is "==", we have to call 
string comparison. But that's only optimization.
msg240 (view) Author: kayhayen Date: 2013-03-15.10:12:08
I am impressed by step 1 code, much shorter than anticipated.

As for the many cases, solving it in generator may be too tricky to do. Instead, 
we need to covert ot to *not* convert to "ExpressionBuiltinRef" but to something 
else.

I thinking alone the lines of "ExpressionComparisonIs( condition = check on 
builtins stub flag, ... )

The cases might be "compile time builtin ref (can be optimized)" and "run time 
builtin ref" (cannot). These two will need to exist. One will be "compile time 
constant". Currently builtin ref is, which it should not be, only one of these 
two should.

The one which refers to compile time would generate code to builtins at startup, 
or replace during optimization. The second generate code to value as you provide 
it from builtins stub. 

Then optimization will see (ah, a condition I cannot tell, allow both, and do 
not optimize away) and do the proper thing, and not even optimize str(open).

Doing it for builtin calls only is a mistake of mine, sorry, if that confused 
you. It needs to be pushed more forward to when the builtin is referenced.

You could make your builtin stub provide variables for each builtin current 
value, and the original value, and a bool maybe, if it's changed.

Then, type inference will have the task to determine if such change can have 
occured or not, if optimization shall achieve it. Currently, you can just live 
without optimization of that check, and have the desired behavior.

And don't get mad, except if it helps. For me it did. :-)
msg239 (view) Author: nssh Date: 2013-03-15.07:06:47
Second step are very complex much than I expect.
I got (switch open...) ? (open_file...) : (call_function...) work but there many
case need handle because nuitka see them as a constant.
such as this code
a = str(open)+str(range)+"test"
It will convert to a constant string node include "<built-in function
open><built-in function range>test"
I need convert it to (switch open... && switch range) ? (const_string...) :
(eval_object...)
and
(switch open) ? (raise exception...) : (call_function...)
call_function ( ..., (switch open ) ? (const_tuple...) : (create tuple...) )
It makes me mad.
msg238 (view) Author: nssh Date: 2013-03-15.02:04:44
Add first step patch,
written of builtin module detection are done,
but now I don't known how to merge function call node and open call node,
it will take more time,
hope you can give some hints for it.
msg237 (view) Author: kayhayen Date: 2013-03-14.13:01:46
I think it helps to shape the image of what needs to be done. I will add the 
content to the developer manual later.

Your proposal is sound. If you can implement such a trap of overloading "open", 
that would be great. This "builtins" module replacement, I suggest you prototype 
it in Python first. That way you quickly get an idea of what it takes.

Once you have it running in Python with CPython, we just need to integrate it, 
which probably will be easy then. Once you know what you need to do, I am sure 
you will find it a lot easier than you expect. It's quite accessible source code, 
but I might be biased. :-)

One minor note, it should generate code like this:

usingOriginalOpen ? CALL_FUNCTION_WITH_POSARGS( _mvar___main___open.asObject0(), 
_python_tuple_str_plain_filename_str_plain_mode_tuple ) : OPEN_FILE( 
_python_str_plain_filename,
_python_str_plain_mode, NULL )

Basically, you just need to touch "OPEN_FILE" in Generator a tiny bit for that, 
and to plug your builtins trap early on.

So yes, please go ahead, and it sounds like something I would definitely want to 
merge.
msg236 (view) Author: nssh Date: 2013-03-14.12:46:33
Thank you for the very detailed reply, that's a good plan.
In my imagine, if we can swap __builtins__ that things maybe more simple,
first step, add global value "extern char usingOrignalOpen = 1;"
second step, use it to choise the optimization at runtime
if (usingOrignalOpen)
    DECREASE_REFCOUNT( OPEN_FILE( _python_str_plain_filename,
_python_str_plain_mode, NULL ) );
else
    DECREASE_REFCOUNT( CALL_FUNCTION_WITH_POSARGS( _mvar___main___open.asObject0(), 
        _python_tuple_str_plain_filename_str_plain_mode_tuple ) );
third step, while modify write to __builtins__.open, change usingOrignalOpen to 0
both of overhead and code modifier will be very small
(you are adding a pointer of function PyObject *_python_original_builtin_xxx and
I'm imaging adding a integer for check every time)
I will try to write a patch do that, but at now I do not quite understand
nuitka's source code so it may not success.
Sorry for delay your progress on a deferred issue ...
msg235 (view) Author: kayhayen Date: 2013-03-14.10:06:08
I made an experimental change to the implementation, and it kind of works well, 
but it is not useful yet.

In "OptimizeBuiltinCalls.py" now the result is wrapped with a condition "is" 
that checks against the original value at program start of builtins, and if it 
is changed raises a runtime error.

This is about making things visible in the node tree. In Nuitka things that are 
not visible in the node tree tend to be wrong. You probably saw me pushing 
around information to the node tree a lot already.

Later versions, it will become able to determine it has to be the original at 
compilt time, then the condition will be optimized away, together with the slow 
path. Or the other path, if it won't be.  Then it will be optimized away, or if 
doubt exists, it will be correct. That is the goal.

Right now, the change would mean to effectively disable all builtin call 
optimization, which is why I will push it as an infrastructure, but it won't 
really do anything at all yet. I am going to refine it further. It would be 
sweet, if at some relatively early stage, adding "hints.volatile( range )" at 
program main would suffice for you. And at a later stage, you would be able to 
remove it, as it's finding that out by itself.

I was also thinking that "__builtins__" is a builtin value itself. We could swap 
it with something that wraps things. But I kind of recall that sometimes it's a 
dictionary only, and only sometimes the module. I think at least in "__main__" 
module, it's a dictionary, where in other modules, it's a module object. But 
that only increases chances to get away with making it an object that helps us 
track changes to builtins.

It probably would be able to tell writes to builtin values, so we could change 
code pointers around to less optimistic versions of functions.

So, this is all about having "guards". Something, I have on the roadmap. It was 
interesting to check it out. Only needs a few nodes placed around the result of 
optimization. Unmask the "if False" in OptimizeBuiltinCalls.py if you want to 
play with it. For me it worked for "__builtins__.range = len; range(5)" to at 
least detect the problem.

Making the compatible version, will also require a full listing of all builtins, 
which is typing work merely, but not needed now. And a way to stop builtin 
optimization from optimizing builtin calls that it used in a wrap. Probably just 
some flag to indicate it when it visits it to skip it. That's for later.

But should we have that both, I figure, we could not raise a runtime error, but 
just do the correct thing, and you could do it for builtins from your build-conf 
and be fine. So if you want to work on this, you are welcome to give it a shot.

Yours,
Kay
msg234 (view) Author: kayhayen Date: 2013-03-14.07:59:40
I think I got it. Forgive me if I don't merge now, as you can see, it's only deferred 
to solve this on a global level.

Not compatible is a severe bug, making it critical.

As for "open", if you merely need it to use the correct builtin, I have added a 
refresh for it specifically. The idea though is that Nuitka must become able to 
recognize itself.

But what if it fails to do that? Well, for that case, I am going to add now for "--
debug" a variant that checks if the builtin value used matches the original one. And 
if it doesn't, it will give a warning and raise a runtime error.

That way, one can easier detect the problem, even after the solution of this issue.

For the "range" builtin, you will have to use your patch until it's properly solved, 
the hope it clearly, that it won't be that long.
msg233 (view) Author: nssh Date: 2013-03-14.05:54:12
Add another patch for less error prone implementations (build-config.patch).

usage:
nuitka-python [--build-config=FILE] main_module.py

config example:
NUITKA_BUILTIN_DONT_EXTRACT open
NUITKA_BUILTIN_DONT_EXTRACT range
#NUITKA_BUILTIN_DONT_EXTRACT True
msg232 (view) Author: nssh Date: 2013-03-14.02:21:58
Maybe i didn't say clearly, that's my fault.
I want a way to do that, if you simplliy disallowed, my few project will no
longer compatible with nuitka.
I wrote a patch let optimization be default but user can off it.
It affect nuitka's framework so you may not like to merge it, but this is a
solution.

Example run result with this patch:
$ ./bin/nuitka-python --recurse-all ../replace_test.py 
1
<type 'xrange'>
xrange(5)
<compiled function open at 0xb774777c>
1 2 3 4 5
msg231 (view) Author: kayhayen Date: 2013-03-13.14:22:33
For built-ins that are called at run time, Nuitka refreshes their value indeed. It is esp. much 
needed for __import__ which people will change. For others, e.g. int(), etc. this is not done at 
all. It's a bit case by case.

Also parsing of parameters, if the replaced variant takes others, it will generate exception raises 
at compile time. An "open( 1,2,3,4,5,6 )" will e.g. not attempt to call open, give directly the 
exception, preserving side effects of open arguments.

Right now, basically the idea is that behavior changing built-in overloads are not allowed and not 
generally honored.

And the print statement, it roughly goes like this:

source code: print True
->
tree print str(value of True variable)
->
tree print str(True)
->
tree print 'True'
->
code PRINT_ITEM_TO...

That way, the built-in lookup is optimized away, as is the conversion to string.
msg230 (view) Author: nssh Date: 2013-03-13.13:06:22
The test code I post before isn't right.
write attributes of __builtin__ actually will affect a part of the same name in
global.
tests are failed by nuitka's pre-processing,
because it replace print True or print open to PRINT_ITEM_TO( NULL,
_python_str_digest_xxxx );
and replace builtin function with non default parameters to a exception raise code.

looks nuitka is using the b) way i said, it store and refresh pointer of builtin
objects.
if i set __builtin__.open then pointer _python_builtin_open also will changed.

I wrote an example to realized builtin function replace, that's the thing I want
to do.
Improve pre-processing and PythonBuiltin class will make it more compatible,
but nuitka still need analysis skip the lookup from locals and globals or not,
that's another issue.
msg229 (view) Author: kayhayen Date: 2013-03-13.10:40:04
As for "open" assignments, it needs to be the global variable. If you did it from the outside, 
or from exec, or via globals(), frames, etc. you could still fool it badly. We need whole 
program analysis for this, to be sure.

For unsure cases, optimization could introduce guards, which replace a function with a version 
that assumes less, when a guard is violated, or a notification is receiving, depending on the 
ability to detect changes.

As for giving hints, there is a "hints" module foreseen. The idea is to implement something in 
CPython as well, that asserts that the promised thing cannot happen.

Such a hint, in case of the module, could be "do not write from outside to this module", and 
then it's replaced in sys.modules, with something that guards the attribute write, while 
aliasing all the value. And then, Nuitka would merely understand that the module dictionary 
does not escape (becomes writable by unknown code).

But as you say it's severely deferred.
msg228 (view) Author: nssh Date: 2013-03-13.10:02:07
Oh I dont notice the duplicate, sorry for that.
I can see nuitka replace "open" to OPEN_FILE when it wouldn't found "open = " in
that file.
It's a little danger.
This code can make the detect fail:
import sys
sys.modules["__main__"].open = 1
open("abc", "rb")
also:
import abcde
abcde.open = 1

I think it can resolve by these way:
*a:
dont optimization unsure things
and add some syntax like from __nuitka__ import constant_open, constant_range
or #define nuitka_optimize_constant_open 1 (will be more compatible with cpython)
*b:
make property such as "open" and "range" to pointer,
script can change these pointer to other function at runtime,
but only reduce property lookup time

I think static analysis is dangerous, many way can break it out.
I have a project use a module assert all module's file accessing by overwrite
property, although it is not common.
Resolve works are heavy so now I can accept it deferred.
msg227 (view) Author: kayhayen Date: 2013-03-13.08:31:27
Hello,

if you overwrite True within the module, it will work. Otherwise it's dependent 
on type inference, i.e. we need to see that import, and the write to "True" 
should make it distrust "True" references.

It's probably not going to happen soon (and not very important) and it's a 
duplicate of http://bugs.nuitka.net/issue52

You report is nicer though.
msg226 (view) Author: nssh Date: 2013-03-12.22:02:19
In CPython script can overload builtin function such like "range" "True" by
write property under __builtin__,
and affect global name, nuitka cannot.

test code:
import sys, os, __builtin__
__builtin__.True = 1
print True
print __builtin__.True

run result:
[test@localhost Nuitka]$ python ../1.py 
1
1
[test@localhost Nuitka]$ ./bin/nuitka-python ../1.py 
True
1
History
Date User Action Args
2014-07-05 08:51:31adminsetfiles: example_builtin_overwrite.py, definition.patch, replace_test.py, replace_builtin.py, build-config.patch, step-01-inject-builtin.patch, test_builtin.py
2014-07-05 08:50:40adminsetfiles: example_builtin_overwrite.py, definition.patch, replace_test.py, replace_builtin.py, build-config.patch, step-01-inject-builtin.patch, test_builtin.py
2013-11-16 12:12:58kayhayensetpriority: critical -> bug
2013-03-24 11:58:56nsshsetfiles: + test_builtin.py
messages: + msg283
2013-03-23 18:24:41kayhayensetmessages: + msg282
2013-03-15 10:51:54nsshsetmessages: + msg242
2013-03-15 10:29:43kayhayensetmessages: + msg241
2013-03-15 10:12:08kayhayensetmessages: + msg240
2013-03-15 07:44:02nsshsetfiles: + step-01-inject-builtin.patch
2013-03-15 07:43:41nsshsetfiles: - step-01-inject-builtin.patch
2013-03-15 07:06:47nsshsetmessages: + msg239
2013-03-15 02:04:44nsshsetfiles: + step-01-inject-builtin.patch
messages: + msg238
2013-03-14 13:01:46kayhayensetmessages: + msg237
2013-03-14 12:46:33nsshsetmessages: + msg236
2013-03-14 10:06:08kayhayensetmessages: + msg235
2013-03-14 07:59:40kayhayensetpriority: bug -> critical
messages: + msg234
keyword: + optimization
2013-03-14 05:54:12nsshsetfiles: + build-config.patch
messages: + msg233
2013-03-14 02:22:16nsshsetfiles: + replace_builtin.py
2013-03-14 02:22:11nsshsetfiles: + replace_test.py
2013-03-14 02:21:58nsshsetfiles: + definition.patch
messages: + msg232
2013-03-13 14:22:33kayhayensetmessages: + msg231
2013-03-13 13:06:44nsshsetfiles: + example_builtin_overwrite.py
2013-03-13 13:06:22nsshsetmessages: + msg230
2013-03-13 10:40:04kayhayensetmessages: + msg229
2013-03-13 10:02:07nsshsetmessages: + msg228
2013-03-13 08:31:27kayhayensetstatus: unread -> deferred
assignedto: kayhayen
messages: + msg227
keyword: + wrong_execution
nosy: + kayhayen
2013-03-13 00:23:53nsshsettitle: cannot overload __builtin__ functions -> write attributes of __builtin__ dont affect the same name in the global
2013-03-12 22:02:19nsshcreate