I'm hacking this article until it's not wrong!
You are absolutely wrong. Your model is too simplistic, and will be even more so after the C++ committee meets in a few years. Your claims only hold true for very few languages, such as C.
I wrote these parts of the article from a position of experience, while your claims come from a position of theory. Therefore your theory does not consider enough variables. One phrase: symbol tables. These tables are important for many dynamic language features (such as reflection) and are stored with the bytecode, executable, or interpreted script in a great number of languages.
That makes two of my three articles that have been deleted, one of them my user page. This is extremely frustrating. I am just deleting my entire entry until I have time to revert it with deeper explanations. -- forgotten gentleman
If you look at source code, it is well structured: it is decomposed into functions for structured programming, into objects for OOP, and so on. Compilers tend to propagate this structure into compiled code. Obfuscators erase as much of it as possible. -- forgotten gentleman
There is generally little point in plain obfuscation of source code, although there are some exceptions:
When dealing with interpreted languages, it could be argued that shorter (but less understandable) variable names will keep code size down. However, this is a false economy. (OK, that wasn't NPOV.)
Obfuscated code is extremely difficult to debug. Variable names will no longer make sense, and the structure of the code itself will likely be modified beyond recognition. This generally forces developers to maintain two builds: one that can be easily debugged, and another for release. Both builds should be tested to make sure they behave identically.
Occasionally an obfuscator may be buggy in ways that are difficult to reproduce. There is little one can do except find a newer version or fiddle with the obfuscator's inputs until it works.
Well, what may happen in the future is irrelevant to an encyclopedia. My claim holds true for at least all the compiled languages I've dealt with. I'm less familiar with bytecode, so I'll have to check my facts on that, although I'm sure it's very implementation-dependent.
I'm sorry, but in my experience symbol tables are irrelevant here. If you don't want people to know the names of your functions, you strip the symbol table. It's only of use to debuggers.
I don't know what other pages have been deleted; I'm just arguing about this one. Welcome to Wikipedia!
I'm sorry, but breaking debuggers is not the issue here. People have written plenty of hairy code to try to confuse debuggers (I know, I've had to bypass some of it), but that's a whole different ball game (and completely pointless IMHO).
Unless you're talking about binary obfuscators, the type you're describing must simply wreck any code it touches. A linked list looks the same in assembly whether its nodes are called wikjn and koip or n and p. Anything that fiddles with internal structures will just break things. If the binaries generated from the source code and from its obfuscated equivalent are not identical, then it has defeated its own point.
I don't want this to become a "You're wrong" / "I'm right" argument, so I would welcome any other points of view.
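The renaming point can be checked with a minimal sketch. It assumes CPython, where bytecode refers to local variables and attribute names by index into per-function name tables rather than embedding the names in the instructions. The two linked-list walks below differ only in their identifiers (wikjn, koip, a, b are arbitrary). The instruction bytes come out the same, and only the name tables differ. The names are still stored in those tables, which is relevant to the symbol-table question raised above.

```python
import types

# Two versions of the same linked-list walk; only the identifiers differ.
ORIGINAL = """
def length(head):
    count = 0
    node = head
    while node is not None:
        count += 1
        node = node.next
    return count
"""

RENAMED = """
def wikjn(koip):
    a = 0
    b = koip
    while b is not None:
        a += 1
        b = b.p
    return a
"""


def load(source: str, name: str) -> types.FunctionType:
    """Execute `source` in a fresh namespace and return the function `name`."""
    namespace: dict = {}
    exec(source, namespace)
    return namespace[name]


original = load(ORIGINAL, "length")
renamed = load(RENAMED, "wikjn")

# Instructions refer to names by table index, so the byte streams match.
same_instructions = original.__code__.co_code == renamed.__code__.co_code

# The names themselves survive in each code object's name tables.
print(same_instructions)
print(original.__code__.co_varnames, original.__code__.co_names)
print(renamed.__code__.co_varnames, renamed.__code__.co_names)
```

In a language that needs names at run time, those tables cannot simply be stripped. Renaming changes what they contain but not the instruction stream.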
Since neither of you two care to sign your respective statements, I don't know which of yours is which, but I'm sure at least one of you is clearly not helping to write a useful article. The fact is, "obfuscators" are commonly used pieces of commercial software that perform a specific function for specific reasons, and your opinion of their worth is out of place here. This is an encyclopedia, not a chat room. If you think obfuscators are worthless (I happen to agree for different reasons), then don't write one or buy one, but don't interfere with someone trying to write a useful article on the topic. Symbol removal and substitution is one method (if you think symbol tables can just be stripped, then you obviously don't program in a language with reflection), as is code rearrangement, debugger fouling, and others. Commercial software has used these and other methods, and they should be documented here. --LDC
I think we both want a useful article. Hence I've moved the discussion into talk to gain consensus.
As you may have picked up, I don't think obfuscators are worth much at all, but I agree there should be an article about them. I just don't want it to be inaccurate.
I hope it's debate rather than chat.
I'm venturing an opinion as to why I think the article is flawed. I attempted to correct it the Wikipedia way, found that the original author disagreed and reverted the change, so I'm questioning the assumptions and statements in /Talk. Now we are debating to get a better article.
The article already states you can't obfuscate code that uses reflection. Where mangling the names is allowed, where do you need the symbol table? Answer: you don't, so you can strip it. The renamed functions aren't hiding anything in the underlying code, but the way the article is written, that's what is implied. See:
A good obfuscator is one that can generate textually different source code that is logically identical to the original and compiles to identical binaries. That is the exact opposite of what the article says.
No. They are different techniques for defeating crackers. They are different from obfuscating source code. Sure document them in Wikipedia but not in this article.
I agree that they're white elephants. To keep NPOV, I limited myself to an enumeration of advantages and disadvantages.
However, on a project where code size reduction was absolutely crucial, it worked well. (FYI, this was in Java. The size reduction was definitely nontrivial, even accounting for compression.)
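Here is a rough illustration of why renaming can shrink a distributed artifact. It uses CPython's `marshal` format, the serialization used for .pyc files, as a stand-in for Java class files. That substitution is an assumption of this sketch. Identifiers are stored as strings in the compiled code object, so shorter names give a smaller serialized object. Real savings depend on the program and on any compression applied afterwards. The function and its shortened twin are made up for the example.

```python
import marshal

VERBOSE = """
def compute_invoice_total(line_item_prices, sales_tax_rate):
    running_subtotal = 0
    for individual_price in line_item_prices:
        running_subtotal += individual_price
    return running_subtotal * (1 + sales_tax_rate)
"""

# Same logic after a hypothetical identifier-shortening pass.
SHORTENED = """
def a(b, c):
    d = 0
    for e in b:
        d += e
    return d * (1 + c)
"""


def compiled_size(source: str) -> int:
    """Size in bytes of the marshalled module code object (roughly a .pyc body)."""
    return len(marshal.dumps(compile(source, "<sketch>", "exec")))


# Load both versions so their behaviour can be compared.
verbose_ns: dict = {}
short_ns: dict = {}
exec(VERBOSE, verbose_ns)
exec(SHORTENED, short_ns)

verbose_size = compiled_size(VERBOSE)
shortened_size = compiled_size(SHORTENED)
print(verbose_size, shortened_size)
```

Both versions compute the same result. Only the stored names, and therefore the byte count, differ.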
I would suggest what you've encountered was an operation on the code which kept it to spec, but made it more difficult for you to "read and understand." From my perspective, having been on a project where they actually dedicated a programmer to parse obfuscated code to hand off to a fairly untrusted remote team... it fits all reasonable definitions of "obfuscated code" to me. In spirit and to the letter.
If your experience is mainly about the fun obfuscation contests, I can see how you would disagree.
We have different conceptions of obfuscators. I believe you have your conception from the fun contests. In my conception, an obfuscator keeps the translated program conformant to a black-box spec. Since obfuscation has all sorts of side-effects (performance, code size, etc), it is not always possible to use an obfuscator which works well on most other programs.
I thought about explaining this by referring to lexical scoping, but I thought better of it. Now I wish I had.
Here, I believe you're thinking too quickly. Static languages promote a sense of determinism. Dynamic ones don't, because code from outside the system may happily interact with its internals. So, getting rid of the symbol table is a net loss of information that may have undesirable effects. One bad effect is to destroy the whole point of having a dynamic language.
Tell me quickly how to strip the symbol table for a Java program, using javac. You can't. The symbol table is not merely a debugging aid in dynamic languages; it's the point. However, an obfuscator may do this, fully or partially, depending on how much you need the dynamism.
That's definitely a valid misunderstanding of the article's earlier versions. I was very unclear.
You can configure an obfuscator to leave parts of your code unrenamed.
Again, I am being extremely imprecise WRT reflection -- you can obfuscate code with reflection, you just have to make sure everything calls the obfuscated name. I don't know why I didn't mention this; it's always the main source of complexity when doing painfully obfuscated builds.
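The reflection issue can be sketched in a few lines of Python. A reflective lookup by a string literal survives renaming only if the renamer also rewrites that string. A name built at run time from a naming convention cannot be rewritten statically, so those members have to be left unrenamed. `obfuscate_class`, `Exporter`, and the `export_` convention are hypothetical. The renamer also works on a live class rather than on source or bytecode, as a real obfuscator would.

```python
class Exporter:
    def export_csv(self) -> str:
        return "csv"

    def export_xml(self) -> str:
        return "xml"


def export(obj, fmt: str) -> str:
    # The method name is constructed at run time from a convention.
    return getattr(obj, "export_" + fmt)()


def obfuscate_class(cls: type, renames: dict) -> type:
    """Hypothetical renamer: copy `cls`, renaming attributes listed in `renames`."""
    body = {}
    for name, value in vars(cls).items():
        if name in ("__dict__", "__weakref__"):
            continue  # per-class descriptors that type() recreates itself
        body[renames.get(name, name)] = value
    return type(cls.__name__, cls.__bases__, body)


# Full renaming: a literal lookup still works if its string is rewritten too...
Renamed = obfuscate_class(Exporter, {"export_csv": "a", "export_xml": "b"})
literal_lookup = getattr(Renamed(), "a")()

# ...but the convention-built name no longer resolves.
try:
    export(Renamed(), "csv")
    convention_survived = True
except AttributeError:
    convention_survived = False

# Leaving the export_* methods unrenamed (a "keep" list) restores the behaviour.
Kept = obfuscate_class(Exporter, {})
print(literal_lookup, convention_survived, export(Kept(), "xml"))
```

The empty rename map plays the role of the configuration option mentioned above for leaving parts of the code unrenamed.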
It just defeats the point, since oftentimes classnames are constructed according to a spec. It's not just a matter of "having a name." -- forgotten gentleman
I've moved a few bits of text around, rearranged the recreational obfuscation bit, and added an example from Usenet. I've not touched the second part, but I'm still unhappy with statements that refer to messing with the internal structure of the code. Maybe a category like "Protecting code from reverse engineering" would better cover some of these weird and wonderful binary obfuscators.
Actually, please modify as you wish. I promise to not throw a tantrum again. ;-) I was not myself; dealing with a very volatile person last week, I became irrational too.
I don't quite understand what you don't like about the internal structures part, so I'll just see what your modifications are. If you mean that breaking internal structure doesn't decrease code size much with current obfuscators, maybe I'd better look more closely to see where the size savings come from.
Funny, I just now looked at the Jax homepage http://www.research.ibm.com/jax/ and they use language similar to this article's. I definitely should have done research, because they are more precise than my off-the-cuff revisions. -- forgotten gentleman