- Subject: Re: upcoming changes in Lua 5.2 [was Re: Location of a package]
- From: Mike Pall <mikelu-0802@...>
- Date: Tue, 19 Feb 2008 15:36:58 +0100
Roberto Ierusalimschy wrote:
> - ephemeron tables (tables with weak keys only visit value when
>   key is accessible)

Would this allow for weak string keys?

> - tables and strings respect __len metamethod

This makes sense for tables, sure. But I'm strictly against __len for strings. It has a negative impact on software composability.

The standard behaviour of # on a string is to return the number of bytes in the string object. This is well-defined, easily described behaviour, and it is generally relied upon by code using the # operator on strings (this is NOT just text processing). It's consistent with string.len() and lua_tolstring(). In fact, the number of bytes is the only sensible generic definition for it. If one overrides __len for strings, this impacts _all_ modules, no matter what they are using # for.

Ok, so you want to get the length in UTF-8 codepoints (or glyphs or whatever). Then by all means use a utf8.len() or glyph.len() and don't override #. Overriding the behaviour of # means that another module, trying to load an image file from disk and doing some operations on it, may fail. If you go one step further, you'll realize you'd have to change string.sub() and lots of other string.* functions to be consistent with #. This would in turn break more and more modules. This is not the way to go. Simple rule: put extra functionality for a certain type which just happens to be _represented_ by strings into an extra module.

A string is an opaque container of 8-bit quantities. The Lua core should never deal with it as if it were the representation of anything other than that (e.g. ASCII, UTF-8, UTF-32 or whatever). And it should not encourage anyone to change this basic assumption.

[Maybe you've followed the discussions about JS1/ES3, charAt, UTF-16 and the backwards-compatibility lockup. Or the story about Py3K and Unicode. For me, these are all big warning signs that you do NOT want to mess up the basic language definition with reliance upon individual character representations.
This belongs in libraries, where compatibility issues can be dealt with much more easily.]

> - arguments for function called through xpcall

I.e. xpcall(f, err, args...)?

> We are also considering the following changes:
>
> - string.pack/string.unpack (along the lines of struct/lpack)

Sure, this would be very useful. One thing to consider is the heritage of these structure definitions. Either they come from C struct definitions, in which case you want the host-specific type sizes and endianness; or they come from some network protocol definition or file format, in which case you want to specify the sizes and endianness independently of the host. A structure definition syntax needs to cater for both. So far, all attempts at this in other languages have grown ugly and inconsistent because this need was not anticipated in the design. There's also the problem of variable-length elements, where the specs for pack and unpack may need to diverge.

> - Mike Pall's implementation for yield (using longjmp), allowing yields
>   in several places not allowed currently (inside pcall, metamethods, etc.)

The current lua_yield() in LuaJIT 2.x actually never returns to the caller. It unwinds the C stack back to the last resume (i.e. a longjmp) and exits there with LUA_YIELD. So far I've not noticed any bad side effects on existing modules. None of them seem to rely on lua_yield() actually returning to the caller before passing the -1 return value to the Lua core.

> - some form of bit operations. (We are not very happy with any
>   known implementation. Maybe just incorporate bitlib?)

I've used the same names for the bit.* functions, but not the same implementation. I've also opted to put the module into package.preload rather than pollute the globals with "bit" (which seems to be a popular variable name). A local bit = require("bit") is needed before use.

Note that most implementations out there are broken in some respect.
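Returning to the pack/unpack endianness point for a moment: the two heritages can be made concrete in C. The helper names below are mine, purely for illustration, not a proposed string.pack syntax.

```c
#include <stdint.h>
#include <string.h>

/* Explicit little-endian layout: produces the same four bytes on any
   host, which is what you want for file formats and wire protocols. */
static void pack_u32_le(unsigned char out[4], uint32_t v)
{
  out[0] = (unsigned char)(v & 0xff);
  out[1] = (unsigned char)((v >> 8) & 0xff);
  out[2] = (unsigned char)((v >> 16) & 0xff);
  out[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Host layout: matches a C struct member on this machine, so the byte
   order depends on the host's endianness. */
static void pack_u32_host(unsigned char out[4], uint32_t v)
{
  memcpy(out, &v, sizeof v);
}
```

A syntax that can only express one of these two layouts forces users of the other kind to byte-swap by hand, and that is exactly where the ugliness tends to creep in later.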
While it's easy to get bit operations right when lua_Number is an integer, there are some pitfalls with doubles. You want to allow both signed and unsigned 32 bit numbers as valid inputs and produce a consistent format on output (I've opted for signed, but may revise this decision later based on user feedback). 0xffffffff parses either as 4294967295 (with lua_Number = double) or as -1 (with lua_Number = int). Conversely, you'll want bit.band(0xffffffff, -1) to return either -1 or 0xffffffff, but not an error or any other value (some implementations return 0x80000000 :-) ).

The conversions to and from double are tricky to get right. I'm always using the d+6755399441055744.0 cast. It yields correct results for all numbers in the range -2147483648 .. +4294967295 (look twice) and it's very fast. It needs to know the endianness of the host at compile time (not much of an issue), but is otherwise completely portable across IEEE 754 implementations. [And you really want to avoid going through 64 bit integers as intermediates or (worse yet) doing FP modulos (*argh*).]

> - there is already a new function luaL_tolstring (along the lines of
>   the 'tostring' function). Maybe we should define a lua_rawtostring (no
>   coercions from numbers) and then use luaL_tolstring ("full" coercion
>   from other types) when we want to allow coercions. The point is where
>   to use one and where to use the other. (The current lua_tostring behavior
>   would be deprecated in the future...)

Independent of this change, I'd welcome it if there were _fewer_ automatic coercions going on in the standard libraries. I'd ditch all of the string-to-number auto-coercions once and for all (ditto for arithmetic operators). [OTOH the number-to-string auto-coercions make sense in many cases, e.g. io.write.]

--Mike
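[To make the conversion trick above concrete: here is a sketch of the d+6755399441055744.0 cast (the constant is 2^52+2^51), assuming a little-endian IEEE 754 host. This is a reconstruction for illustration, not LuaJIT's actual code.]

```c
#include <stdint.h>
#include <string.h>

/* Convert a double in -2147483648 .. +4294967295 to a 32 bit integer.
   Adding 2^52+2^51 shifts the value into the low mantissa bits of the
   sum, so the low 32 bits of the double's bit pattern are the result.
   Assumes a little-endian IEEE 754 host; a big-endian build would read
   the other 4 bytes (e.g. via an #ifdef chosen at compile time). */
static int32_t d2i(double d)
{
  int32_t i;
  d += 6755399441055744.0;   /* 2^52 + 2^51 */
  memcpy(&i, &d, sizeof i);  /* low 32 bits of the double */
  return i;
}

/* With inputs normalized this way, band accepts 0xffffffff and -1
   interchangeably and returns a signed result. */
static double band(double a, double b)
{
  return (double)(d2i(a) & d2i(b));
}
```

[Note how the same cast makes bit.band(0xffffffff, -1) come out right: both arguments normalize to the same 32 bit pattern before the AND, with no 64 bit intermediates and no FP modulo.]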