4

To develop an online judge for ACM competitions, we need to prevent certain APIs from being called in the source code that users submit. For example, calling shutdown() or socket() is not allowed. If the source code calls such an API, we should either refuse to compile it, report an error during compilation, or raise an error at run time.

I have no idea how to do this on Linux or Windows; can you guys give me some advice?

2
  • 1
    On Linux you could use LD_PRELOAD with a self-compiled library that overrides all the forbidden functions and aborts on a call. Commented Jun 30, 2011 at 13:34
  • Of course, you have to forbid inline ASM usage, or else, everything you do is pointless. Commented Jun 30, 2011 at 16:37

7 Answers

2

First: I recommend not reinventing the wheel. Judge systems already exist, so maybe you should look at them first (e.g. here we used DomJudge as our ACM competition judge system).

Second: You could, as already suggested, use LD_PRELOAD to link to a restricted library. Another option, which also protects against some other prohibited things, is a sandbox: set up a chroot environment in which you install only those restricted libraries, so that nothing illegal is accessible.
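A minimal sketch of the LD_PRELOAD idea, assuming Linux with glibc. The file name `restrict.c`, the library name and the build commands in the comment are made up for illustration. This variant fails the call with EPERM rather than aborting; replacing each body with abort() gives the behaviour suggested in the comments on the question.

```c
/* restrict.c: hypothetical preload shim (names are illustrative).
 * Assumed build: gcc -shared -fPIC -o librestrict.so restrict.c
 * Assumed run:   LD_PRELOAD=./librestrict.so ./submission
 */
#include <errno.h>
#include <sys/socket.h>

/* Same prototype as the libc function. When the library is preloaded,
 * the dynamic linker resolves socket() to this definition first. */
int socket(int domain, int type, int protocol)
{
    (void)domain;
    (void)type;
    (void)protocol;
    errno = EPERM;   /* report "operation not permitted" */
    return -1;
}

/* Same treatment for shutdown(). */
int shutdown(int sockfd, int how)
{
    (void)sockfd;
    (void)how;
    errno = EPERM;
    return -1;
}
```

This only intercepts calls that go through the dynamic linker, so it works for dynamically linked submissions that call the libc wrappers and nothing more.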


3 Comments

DomJudge is for a specific kind of programming competition. They could be running an ACM ICPC type contest or a UVa Online Judging contest, in which case DomJudge would return the results far too slowly for the event timeframe (which is fixed).
@Edwin: there really isn't sufficient information in the question to know what the constraints are. The only mention of C is in the tags, for example.
@Jonathan, agreed. The answer might be a good one (depending on the details), but the question is overly broad.
2

You need to use a kernel-enforced sandbox, e.g. "user-mode linux" or "capabilities".

The reason is that system calls don't require a library to be linked, so LD_PRELOAD is ineffective against code that contains syscall instructions. And trying to prevent someone from putting machine code into an array and then jumping to it is incredibly difficult; there are so many ways to do that in C (function pointers, stack smashing attacks, etc.). A non-writable code segment and a non-executable data segment will help, but the only safe way is to use an unprivileged user account so that the kernel fails the call with EPERM.
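To illustrate why library-level interception is not enough, here is a small sketch (Linux with glibc assumed) that reaches the kernel through syscall() without calling the usual libc wrapper. getpid is used as a harmless stand-in for a forbidden call, and `raw_getpid` is a name made up for this example.

```c
#define _GNU_SOURCE
#include <sys/syscall.h>
#include <unistd.h>

/* Invokes the getpid system call by number. The named wrapper getpid()
 * is never called, so a preloaded library that overrides getpid() would
 * not see this. Inline assembly could avoid syscall() as well. */
static long raw_getpid(void)
{
    return syscall(SYS_getpid);
}
```

The same pattern works with SYS_socket, which is why the restriction has to be enforced by the kernel.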


0
#define verboten_api(a1, a2, a3) you may not use this verboten API 

Make sure competitors have to include the header that defines these macros for the verboten APIs.

GNU provides a 'deprecated' attribute. From the GCC 4.6.1 manual:

deprecated
deprecated (msg)
The deprecated attribute results in a warning if the function is used anywhere in the source file. This is useful when identifying functions that are expected to be removed in a future version of a program. The warning also includes the location of the declaration of the deprecated function, to enable users to easily find further information about why the function is deprecated, or what they should do instead. Note that the warnings only occurs for uses:

int old_fn () __attribute__ ((deprecated));
int old_fn ();
int (*fn_ptr)() = old_fn;

results in a warning on line 3 but not line 2. The optional msg argument, which must be a string, will be printed in the warning if present. The deprecated attribute can also be used for variables and types (see Section 6.36 [Variable Attributes], page 341, see Section 6.37 [Type Attributes], page 350.)

Note that GCC provides options to refuse to compile code using deprecated functions.

These are compile-time checks - as opposed to run-time checks. They're probably also intrusive, unless you're willing to hack the system headers used. Also, if the competitors do not use the system header, then they might get away with using the forbidden functions.

Consider creating a static library that is linked with their code that defines the functions that are forbidden, but the implementation of each function is an assertion that will always fail:

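#include <assert.h>

int verboten_api(int x, int y, char *z)
{
    /* The address of a string literal is never null, so this always fails
       (unless NDEBUG is defined, which disables assert). */
    assert("function verboten_api() called" == 0);
    return -1;
}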

Link the test programs with that library.

5 Comments

would you explain to me why you are mixing german words with english instructions?
@Constantinius: why not? I like 'verboten' as a word; it is usually understood by reasonably well-read people. @MK: it depends in part how adversarial the setting is. I would expect that the 'online judge' would analyze the source code as well as the object code. It could well provide libraries that it uses to link with. But it depends on a lot more information than we currently have.
@Constantinius: Actually, Webster even lists "verboten": merriam-webster.com/dictionary/verboten
@Jonathan online judges usually just run code on some predefined inputs and verify the outputs. Analyzing code doesn't help to figure out if it works (you can't even tell if it halts ;). Limiting libraries is not going to do you any good because you can still just invoke system calls directly.
@MK: if your (automatic, online) judges are that lax, then yes. There are GCC compiler options such as -fno-asm that can be imposed on the compilation. It depends on the rules laid down.
0

Linux specific answer:

nm -D _the_compiled_binary_ | grep ' U ' 

will list all the undefined dynamic symbols, i.e. the library functions that the binary references.
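A judge could post-process that output line by line. The sketch below is one possible way to do it; the function name and the deny-list contents are made up for this example. It assumes each undefined symbol appears as ` U name`, optionally followed by an `@VERSION` suffix.

```c
#include <string.h>

/* Hypothetical deny-list; dlopen is included because it can load
 * further symbols at run time. */
static const char *const forbidden[] = { "socket", "shutdown", "dlopen" };

/* Returns 1 if `line` (one line of `nm -D` output) names an undefined
 * symbol that is on the deny-list, 0 otherwise. */
static int nm_line_is_forbidden(const char *line)
{
    const char *name = strstr(line, " U ");
    if (name == NULL)
        return 0;                       /* not an undefined symbol */
    name += 3;                          /* skip " U " */

    /* The symbol name ends at '@' (version suffix) or at whitespace. */
    size_t len = strcspn(name, "@ \t\r\n");

    for (size_t i = 0; i < sizeof forbidden / sizeof forbidden[0]; i++) {
        if (strlen(forbidden[i]) == len &&
            strncmp(name, forbidden[i], len) == 0)
            return 1;
    }
    return 0;
}
```

Such a check only sees dynamically resolved symbols, so statically linked code or direct system calls would pass it.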

1 Comment

Be sure to blacklist dlopen then.
0

Keep in mind that you don't need the socket library to access the network; you can do it with open(), read() and write(). So you probably need some kind of sandbox, not just limitations on what's allowed in code.


0

Filtering the source code is not enough. Even if the source code appears to not call the API, it might be using tricks to call it. For instance, a simple regex-based filter could be defeated via token pasting. And that is only at the source code level; when you start thinking about the machine code, there are a lot of other possibilities, from simple inline assembly to return-oriented programming, and it can be done in a way which is hard to see when looking at the source code, as shown by the Underhanded C Contest.
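As a small illustration of the token-pasting trick, the sketch below calls a function whose name never appears as a single token in the source. getpid is a harmless stand-in for a forbidden API, and `PASTE` and `hidden_call` are made-up names.

```c
#include <unistd.h>

/* Glues two tokens together during preprocessing. */
#define PASTE(a, b) a##b

/* After preprocessing this is an ordinary call to getpid(), but a regex
 * that scans the raw source for "getpid" finds nothing in this function. */
static long hidden_call(void)
{
    return (long)PASTE(get, pid)();
}
```

Scanning the preprocessed output instead would catch this particular trick, but not inline assembly or the machine-code techniques mentioned above.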

All APIs reduce to kernel APIs in the end, since the programmer can simply copy the API implementation otherwise. There are AFAIK only two safe ways to prevent a kernel API from being called: either filter it in the kernel, or statically prove the code cannot call the kernel directly. Other ways like LD_PRELOAD can be bypassed. Bypassing LD_PRELOAD is easy; just do the system call directly.

To filter an API in the kernel, the most recent way is to use seccomp filters, which allow one to restrict system calls and their parameters. With it, you can easily forbid a process from, for instance, ever being able to call the shutdown and socket system calls. Other mechanisms (namespaces, cgroups, chroot, etc.) can be used to add other kinds of restrictions on top of the filter.
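A rough sketch of such a filter, written against the raw prctl() interface. Assumptions: a Linux kernel with seccomp filter support, x86-64 or AArch64, and the made-up helper names `install_socket_filter` and `socket_is_blocked`. It uses a deny-list to keep the example short; a real judge would more likely allow only a small set of system calls and reject everything else.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/socket.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>

#if defined(__x86_64__)
#define JUDGE_AUDIT_ARCH AUDIT_ARCH_X86_64
#elif defined(__aarch64__)
#define JUDGE_AUDIT_ARCH AUDIT_ARCH_AARCH64
#else
#error "this sketch only covers x86-64 and AArch64"
#endif

/* Installs a filter that makes socket() and shutdown() fail with EPERM.
 * Returns 0 on success, -1 on failure. The filter cannot be removed and
 * is inherited by child processes. */
static int install_socket_filter(void)
{
    struct sock_filter filter[] = {
        /* Kill the process if the syscall ABI is not the expected one. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, JUDGE_AUDIT_ARCH, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_KILL),
        /* Load the system call number and compare it with the deny-list. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_socket, 2, 0),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_shutdown, 1, 0),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Required so that an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}

/* Returns 1 if socket() now fails with EPERM, 0 otherwise. */
static int socket_is_blocked(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd >= 0) {
        close(fd);
        return 0;
    }
    return errno == EPERM;
}
```

A judge would typically call `install_socket_filter()` in the child process after fork() and just before exec'ing the submission, so that the filter applies to everything the submission does.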

The alternative approach of statically proving the code is safe is used by Google's Native Client. It restricts the generated assembly code in ways which allow for simple proofs that the flow of execution cannot escape the sandbox, except in a few well-defined ways. As an example of these kinds of rules, no instructions can cross a 32-byte boundary, all jump targets are aligned to a 32-byte boundary, and indirect jumps are only allowed via a pair of instructions which masks the lower bits of the target address before the jump, so there is no way to jump to the middle of an instruction.
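The masking rule is plain address arithmetic. The sketch below shows it in C, not as Native Client's actual code; `BUNDLE_SIZE` and `mask_jump_target` are names made up for this example.

```c
#include <stdint.h>

/* Instructions are grouped into 32-byte bundles. */
#define BUNDLE_SIZE ((uintptr_t)32)

/* Clears the low 5 bits, so whatever address is supplied, the result is
 * the start of a 32-byte bundle and never the middle of an instruction. */
static uintptr_t mask_jump_target(uintptr_t target)
{
    return target & ~(BUNDLE_SIZE - 1);
}
```

For instance, a target of 0x1005 is forced down to 0x1000, and 0x103F to 0x1020.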


-1

First you have to define which APIs you do not want to be accessed. I guess it is probably easier to perform static code analysis and raise an error if some unwanted #includes occur.

5 Comments

You do not need to #include in order to use an API: just declare whatever function you want to call.
Also, the function name never needs to appear in the program, with creative use of macros.
@Ben: not to say that this answer works, but your particular objection would be overcome by performing the analysis after preprocessing.
@Steve: Checking for #include, as this answer suggests, implies that the analysis is done before preprocessing.
@Ben: hence the conditional subjunctive mood "would".
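A short sketch of the first objection: the file below has no #include lines at all, yet it compiles and links against socket(). The prototype is written out by hand, and `smuggled_socket` is a made-up name.

```c
/* Hand-written prototype that matches the libc function; no header needed. */
int socket(int domain, int type, int protocol);

/* Taking the address is enough to create a link-time reference. */
static int (*const smuggled_socket)(int, int, int) = socket;
```

The name socket still appears after preprocessing here, so a token scan of the preprocessed output would catch this case, while a check for #include lines would not.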
