Subject: RANDOM expectations                                   J3/01-378
From: Kurt W. Hirchert                         (Meeting 159) 19 Nov 2001

==========
Background
==========

During the development of RANDOM_NUMBER and RANDOM_SEED for Fortran 90, the PROC subgroup developed some fairly specific expectations about how conventional random number generators would be supported through this interface. They declined to make these expectations explicit in the standard, reasoning that it was "obvious" how to interface the conventional cases, and that stating explicit expectations might interfere with unconventional implementations.

As it turned out, these expectations were not obvious to many implementors, and program portability suffered as a result. Some of these expectations have since been filled in by interpretations, but not all. I believe the Fortran community would be better served if we filled in the missing expectations.

===========
Straw Votes
===========

1. Is it appropriate to work on this at this time?
   [From my point of view, this is repairing a hole in the description of existing technical content, and thus "in order" during this integration period, but if the committee disagrees, I can set this aside and bring it back during the public comment period.]

2. Should these expectations be expressed in normative text or just in notes?

3. Are the expectations in the last part of this document the right ones?
   [If your answer is "no", we'll probably need to discuss what needs to be changed, but the yes/no vote is an appropriate starting point.]

================
The Expectations
================

[The material which follows is not written as edits to the current draft, but I hope I have written it rigorously enough that it will not be subject to misinterpretation. Paragraphs prefixed with "*" are material which I believe would be normative if we choose to make the expectations normative. Paragraphs prefixed with "+" are material which would be non-normative commentary in either case. Paragraphs prefixed with "."
are internal to this document and would not appear as either normative or non-normative text in the standard.]

+ It is hoped that pseudorandom number generators other than the intrinsic pseudorandom number generator will also be packaged using this RANDOM_NUMBER / RANDOM_SEED interface, so a program can easily be switched from one pseudorandom number generator to another simply by adding or changing a USE statement to change which RANDOM_NUMBER / RANDOM_SEED pair is accessible.

* Each element of the value returned by RANDOM_NUMBER shall be greater than or equal to zero and less than or equal to one.

. Why did we make this range [0,1] instead of [0,1) or (0,1)? With most pseudorandom number generators, one can convert a random number into a random decimal digit by the formula INT(10*X), but with our specification, this really needs to be MIN(INT(10*X),9), although the probability of 1 being returned should be minuscule. If we are going to leave the specification like this, we might put some kind of warning here. [Are there any current implementations that actually return 1?]

* Collectively, the elements returned by any sequence of calls to RANDOM_NUMBER with no intervening call to RANDOM_SEED shall approximate the statistical properties of random numbers drawn from a uniform distribution.

. It would not have occurred to me that there would be any question about successive calls to RANDOM_NUMBER delivering different values, except that the question came up in a comp.lang.fortran discussion.

+ In most cases, calls to RANDOM_SEED should not degrade the statistical approximation, but in some cases, they can cause "random" sequences to repeat, so they are excluded from the general requirement.

* The generation process shall be deterministic.
The state of the pseudorandom number generator at the time of a call to RANDOM_NUMBER, together with the type, type parameters, and shape of the HARVEST argument, fully determines both the values returned in HARVEST and the state of the pseudorandom number generator following the call.

. The current description of RANDOM_SEED describes the changing state of the pseudorandom number generator as a changing seed. This implies that all internal states of the generator must have external representations as seeds. I don't think we really want to force that, so I switched to calling the state a state.

. This requirement for deterministic results has some implications for the parallel generation of random numbers. If each processor has an independent generator, processors cannot simply "race" each other to deliver random numbers to different parts of the HARVEST array. Instead, processors must be allocated their part of the HARVEST array in some predictable fashion.

. Should a note emphasize that a single precision request is not equivalent to a double precision request, that a 10x10 request is not equivalent to a 20x5 request or a one-dimensional request of length 100, and that a request of length 100 is not equivalent to two successive requests of length 50, although all of these equivalences may hold on some specific Fortran processors?

+ There are no restrictions on what integer values the program is permitted to specify as seed values, so it is the responsibility of the processor to transform whatever seed value is specified into a state suitable for the operation of the generator.

+ Many existing programs handle seeds of more than one integer value by repeating the last value as many times as necessary to get an array of the right size. It is recommended that processors take whatever steps are needed to avoid statistical anomalies in this common case.
(For example, a processor might XOR different elements of the supplied seed with different fixed values to get an effective seed, so an external seed with repeated values will not have repeated values in the effective seed.)

* The initial state of the pseudorandom number generator is processor dependent, but fixed for all executions on a given processor.

+ This consistent initial state, in conjunction with the deterministic nature of the generator, means that, by default, successive executions of a program (or of similar programs that make the same requests to RANDOM_NUMBER) will get the same numbers back. Such reproducible results facilitate program development and debugging.

* A call to RANDOM_SEED with the SIZE= argument returns the number of integer values in a seed value for this pseudorandom number generator. PUT= and GET= arguments to RANDOM_SEED shall have at least this many elements. Only that many leading elements of a PUT= argument will be used. Only that many leading elements of a GET= argument will be defined. The state of the pseudorandom number generator is unchanged.

* A call to RANDOM_SEED with the PUT= argument changes the state of the pseudorandom number generator. The state of the pseudorandom number generator following the call is fully determined by the seed value in the PUT= argument.

* A call to RANDOM_SEED with the GET= argument returns a seed value dependent on the current state of the pseudorandom number generator. The state of the pseudorandom number generator following the call is the one that would result from supplying that seed value as a PUT= argument.

+ If possible, the state following such a call should also be the same as the state prior to that call. This should always be possible for generators whose seed values are nothing more than simple transformations of the internal state, but it may not always be possible in cases where the transformation from seed to state is more complex.
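
. The SIZE= / GET= / PUT= expectations above can be illustrated with a short program. This is only a sketch: the program and variable names are invented for illustration, and the claim that the two sequences match rests on the expectations proposed in this paper (in particular, that the state after a GET= call is the state that a PUT= of the same seed value would produce), not on anything the current standard guarantees.

```fortran
program seed_roundtrip
  implicit none
  integer :: n
  integer, allocatable :: seed(:)
  real :: a(5), b(5)

  ! Ask how many integers make up a seed value; the state is unchanged.
  call random_seed(size=n)
  allocate(seed(n))

  ! Obtain a seed value that reflects the current state.
  call random_seed(get=seed)
  call random_number(a)

  ! Restore the state determined by that seed value and draw again.
  call random_seed(put=seed)
  call random_number(b)

  ! Under the expectations above, the two requests are identical
  ! (same state, same type, same shape), so the values should match.
  if (all(a == b)) then
    print *, 'sequence reproduced'
  else
    print *, 'sequence differs'
  end if
end program seed_roundtrip
```

. Note that both requests use the same type and shape for HARVEST; the determinism requirement makes no promise that a request of a different shape or kind would reproduce the same values.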

* A call to RANDOM_SEED with no arguments changes the state of the pseudorandom number generator. This change is _not_ required to be deterministic and should normally vary from execution to execution.

+ Typically, this functionality is implemented by using data such as the real-time clock or processor ID in creating a seed value.

+ Some programs execute repeated calls to RANDOM_SEED with no arguments, in the mistaken belief that this will make the numbers from RANDOM_NUMBER "more random". In practice, just the opposite may be the case. The resulting state, although not predictable from execution to execution, may be unchanging or slowly changing within a single execution, so this may result in repeated output sequences. Programs are recommended to avoid this practice. Processors are recommended to "ignore" extra no-argument RANDOM_SEED calls (i.e., when the previous RANDOM_SEED call also had no arguments).

+ Externally, a deterministic process running on non-deterministic data is indistinguishable from a non-deterministic process, so a processor could switch to a non-deterministic generation process in response to a call to RANDOM_SEED with no arguments, so long as it switches back to a deterministic process after the next call to RANDOM_SEED with an argument.

                              - end -

--
Kurt W Hirchert                          hirchert@atmos.uiuc.edu
UIUC Department of Atmospheric Sciences          +1-217-265-0327