Problems of testing 3D performance with FRAPS
In the daily work of 3D testers there are always some sore points. One of the most significant is that far from every 3D game has a performance test provided by its developers. Even fewer games are released in which this is done in a way that is really convenient for users, with the possibility of automation and of obtaining the fullest information about the game's performance at given settings. The testing capabilities of different games vary considerably: some do not offer even a basic counter of the instantaneous frame rate (FPS), while others offer every tool one could imagine.
From this point of view there are, broadly, three types of games. The first has no performance tests at all; such applications are the majority (for example Need For Speed: Most Wanted, Condemned: Criminal Origins, Ghost Recon Advanced Warfighter, Tomb Raider: Legend and so on; the list is practically endless). The second allows testing, but not conveniently in all cases; examples are F.E.A.R., where there is no automation, the test cannot be started from the command line and demos cannot be used, and the many games that only display the instantaneous frame rate, such as TES4: Oblivion. Only a small share of games has everything needed to determine the average FPS, and sometimes the minimum and maximum as well, letting you record your own demos on different levels and then play them back to calculate all the required figures. Examples of such games are Half-Life 2, DOOM 3 and Serious Sam (the last offers recording of user demos, standard demos included with the game, and output of the minimum, maximum and average frame rate, with and without peaks taken into account, and so on).

Since there are more games without built-in benchmarks, and they are far more varied, there is an understandable desire to test such 3D applications by any available means. One such method is to use third-party utilities that measure the instantaneous FPS, display its value, and analyze performance from the instantaneous frame rate over a time interval set by the user. Usually the minimum, maximum and average frame rate reached in that interval are calculated.
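As a rough illustration of the interval statistics described above, here is a minimal Python sketch. It assumes the utility yields one FPS sample per second and that the tester's start and stop moments are expressed as sample indices; the names `FpsSummary` and `summarize_fps` are hypothetical and do not come from any real tool.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FpsSummary:
    minimum: float
    maximum: float
    average: float


def summarize_fps(samples: Sequence[float], start: int, stop: int) -> FpsSummary:
    """Summarize per-second FPS samples in the half-open interval [start, stop).

    Assumption: each sample is the number of frames rendered in one second,
    so the mean of the samples equals total frames divided by total seconds.
    """
    window = samples[start:stop]
    if not window:
        raise ValueError("the measurement interval contains no samples")
    return FpsSummary(min(window), max(window), sum(window) / len(window))


# Hypothetical trace: the tester measures seconds 1 and 2 only.
summary = summarize_fps([30.0, 40.0, 50.0, 60.0], start=1, stop=3)
print(summary)
```

Everything that follows in the article hinges on `start` and `stop`: the statistics are only comparable between runs if those two values land on the same moments of the same scene.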
The best known of these utilities is FRAPS, which offers this kind of performance measurement even in its free version, which can be downloaded from the site indicated.
This method of measuring performance is perhaps the only way to run benchmarks in applications without built-in support, but it is not free of shortcomings. Moreover, these shortcomings can cancel out all its advantages. In this article we will try to work out whether it makes sense to measure performance with FRAPS in 3D games that have no built-in rendering speed tests, and to understand the main shortcomings of the method.

A theoretical view
Even looking at the question purely theoretically, some major problems are immediately visible that can (and should) arise for testers. In our opinion they can easily eclipse all the advantages of the method. One of the biggest shortcomings is the human factor. A person cannot press the hotkey that starts and stops the benchmark with absolute precision. To obtain reliable average frame rates across different tests, the moments at which the calculation of average, minimum and maximum FPS begins and ends must be identical! A person, for natural reasons, cannot achieve ideal precision; it is limited by reaction time, attentiveness and other factors. Although the author of FRAPS has taken a step in the right direction by stopping the benchmark timer after a set time, which reduces the influence of human weaknesses, this does not eliminate them. The starting point is still left to the person, and if the test duration is rigidly fixed, the end depends on the start as well. A setting that started the test at the moment the instantaneous frame rate passes through a certain value, which in many cases happens when a demo is launched from the game menu, might help here. The frame rate in the menu is usually considerably higher (there are exceptions), and that could serve as the reference point for starting the measurement. At present the program has no such setting; it is only our imagination, so let us return to reality.
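The trigger imagined above can be sketched in a few lines of Python. This is only an illustration of the idea, not a feature of FRAPS: it assumes a per-second FPS trace in which the menu runs faster than the demo, and the function names and the threshold value are hypothetical.

```python
from typing import List, Optional, Sequence


def find_trigger(samples: Sequence[float], threshold: float) -> Optional[int]:
    """Return the index of the first sample at or below `threshold`
    that directly follows a sample above it (menu -> demo transition)."""
    for i in range(1, len(samples)):
        if samples[i - 1] > threshold >= samples[i]:
            return i
    return None


def fixed_window(samples: Sequence[float], threshold: float,
                 duration: int) -> List[float]:
    """Take `duration` samples starting at the trigger point.

    Returns an empty list when the trace never crosses the threshold.
    """
    start = find_trigger(samples, threshold)
    if start is None:
        return []
    return list(samples[start:start + duration])


# Hypothetical trace: ~200 FPS in the menu, ~60 FPS in the demo, menu again.
trace = [200.0, 210.0, 205.0, 60.0, 55.0, 70.0, 65.0, 220.0]
window = fixed_window(trace, threshold=150.0, duration=4)
print(window, sum(window) / len(window))
```

Combined with a fixed test duration, such a trigger would take both the start and the end of the measurement out of the tester's hands, at least in games where the menu and the demo differ clearly in frame rate.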
The absolute need for manual work, starting and stopping the test (apart from the option described above), is rather frustrating. There is no automation at all. Imagine: to get more or less correct average frame rate data, it is desirable to run the test several times, rejecting obviously wrong values (resulting from the human shortcomings described above or from technical problems) and averaging the rest. And what if there are many tests? At different resolutions, with different settings, with different scenes, and finally with different hardware configurations! The tester has to stay attentive and make no mistakes for several hours (!), not only changing settings and resolutions but also pressing the start and stop keys at exactly the right places. Such tests can stretch over several days; who will then vouch that the tester, subject to human weaknesses, does everything precisely and the same way every time?
Let us look at clear examples of the mistakes a person can make in such testing. The figure shows schematically the possible mistakes of a tester, based on real results from F.E.A.R. Red marks the segment that should be used to measure the average, minimum and maximum frame rate (it is the one used by the game's built-in benchmark); the others are examples of segments a tester might use under the influence of the human factor.
In the first case (pink) our tester pressed the start key slightly late and the stop key much later, which distorted the results strongly upward, since the instantaneous FPS rose sharply at the end. In the second case (green) the tester started the test early, including a part that should not be used for the test and that lowers the calculated FPS. The test was also stopped early, so the final part of the animated 3D scene was not included. The third case, marked brown, shows the mistakes of starting early and stopping late. It is also far from ideal, although it may give results close to correct ones: since FPS was low at the beginning and high at the end of the demo, the two slices captured at the edges can offset each other. In the practical part we will look at such mistakes and estimate how strongly they can affect the FRAPS result in real 3D applications.

But all of this concerns games that, while lacking a performance counter (average frame rate at the very least), do have fixed scripted scenes and/or the ability to record and play back user demos. Even in such cases the figures obtained can differ considerably from real in-game performance, because actual play is neither a scripted scene nor demo playback. And what if even these options are absent... For example, some testers resort to the following trick when demos and scripted scenes are unavailable: they load a certain level of the game and simply play through it several times, with different software and hardware settings. They presumably believe the average frame rate will still be statistically reliable. We will certainly check this in the practical part of the article.
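The three windowing mistakes described above (late start with late stop, early start with early stop, early start with late stop) can be reproduced on a synthetic trace. The sketch below is a minimal Python illustration; the trace, its frame rates and the window positions are invented for the example and are not the F.E.A.R. data from the figure.

```python
from typing import Sequence


def average_fps(trace: Sequence[float], start: int, stop: int) -> float:
    """Average of the per-second FPS samples in [start, stop)."""
    window = trace[start:stop]
    return sum(window) / len(window)


def relative_error_percent(measured: float, reference: float) -> float:
    """Signed deviation of `measured` from `reference`, in percent."""
    return (measured - reference) / reference * 100.0


# Hypothetical trace: 2 s of loading at 20 FPS, a 10 s demo at 60 FPS,
# then 3 s of menu at 150 FPS.
trace = [20.0] * 2 + [60.0] * 10 + [150.0] * 3
reference = average_fps(trace, 2, 12)  # the window the built-in test would use

cases = {
    "late start, late stop": (3, 13),
    "early start, early stop": (1, 11),
    "early start, late stop": (1, 13),
}
for name, (start, stop) in cases.items():
    measured = average_fps(trace, start, stop)
    error = relative_error_percent(measured, reference)
    print(f"{name}: {measured:.1f} FPS ({error:+.1f}%)")
```

In this toy trace a single second of misplacement already moves the average by several percent, and the third case lands closer to the reference only because the slow and fast edge slices partly cancel.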
In this part of the article we will look at how far the figures obtained when testing with FRAPS can be trusted. We will try different test conditions and even recreate some of the likely tester mistakes. To begin with, let us see how well the figures from FRAPS match what we are used to measuring. For this we took the well-known benchmark 3DMark06 and ran its first game test in the mode that plots performance graphs. FRAPS was run at the same time so that the values measured by the two methods could be compared, for now only the instantaneous frame rate. FRAPS coped with this part well; its average FPS figures for each second can be trusted. For some reason FRAPS does not measure truly instantaneous frame rate values, and because of this averaging some details of the 3DMark graph are lost. These details are not essential; besides, FRAPS can measure the time spent rendering each frame (it has such an option), so there is hope that it uses these times to determine the maximum and minimum FPS.
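To see why per-second averages hide details that per-frame times reveal, consider the following minimal Python sketch. It assumes a plain list of per-frame render times in milliseconds; this is a simplification chosen for the example and not a description of the log format FRAPS actually writes.

```python
import math
from typing import List, Sequence


def instantaneous_fps(frame_times_ms: Sequence[float]) -> List[float]:
    """Frame rate implied by each individual frame time."""
    return [1000.0 / t for t in frame_times_ms]


def per_second_fps(frame_times_ms: Sequence[float]) -> List[int]:
    """Number of frames completed in each elapsed second.

    A frame finishing exactly on a second boundary is counted in the
    second that just ended. The last bucket may cover a partial second.
    """
    counts: List[int] = []
    elapsed_ms = 0.0
    for t in frame_times_ms:
        elapsed_ms += t
        second = max(math.ceil(elapsed_ms / 1000.0) - 1, 0)
        while len(counts) <= second:
            counts.append(0)
        counts[second] += 1
    return counts


# Hypothetical data: a steady 20 ms per frame with one 40 ms hitch
# in the first second.
frames = [20.0] * 48 + [40.0] + [20.0] * 50

print(per_second_fps(frames))          # per-second view
print(min(instantaneous_fps(frames)))  # per-frame view of the hitch
```

The per-second view barely registers the hitch, while the per-frame view shows the frame rate briefly halving, which is why minimum and maximum FPS are better taken from frame times when they are available.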
Now let us see how the average frame rate according to FRAPS compares with the data from benchmarks built into applications. For this we will check a couple of applications that can measure the average FPS themselves. For reliability we will take each measurement three times, at the same time estimating the contribution of the human factor under careful work (in this case the human error was small).
Test: Serious Sam   42.    42.    42.
FRAPS               42.    42.    42.
Difference          1. %   1. %   0. %

Here too FRAPS copes well: its results differ from the figure given by the built-in performance meter by less than %. On the other hand, the results are clearly less stable than those obtained from the game itself; the person still has an influence. Moreover, the relatively good FRAPS result may be due to the long duration of the test demo, which makes the influence of the beginning and end of the demo, where the error arises, insignificant. Let us see what happens in other cases...
Test: Far Cry   115.   49.    45.
FRAPS           118.   48.    44.
Difference      2. %   2. %   1. %

We tried several test demos in Far Cry, for different levels of the game, and here the difference was slightly larger than the one Serious Sam 2 showed. This is probably because the demos were noticeably shorter, so the initial and final stretches of the demos contributed more to the difference. Put simply, the difference arises because the start and stop of the average frame rate calculation are done by hand, and the shorter the demo, the greater their influence on the final result.
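The dependence on demo length can be made explicit with a little arithmetic. If a demo of length T seconds runs at an average of F frames per second and the tester accidentally captures an extra t seconds at f frames per second, the measured average is (F*T + f*t) / (T + t). The Python sketch below evaluates this for a long and a short demo; all numbers and the function name are hypothetical.

```python
def edge_bias_percent(demo_fps: float, demo_seconds: float,
                      edge_fps: float, edge_seconds: float) -> float:
    """Percent error in the measured average when `edge_seconds` of
    footage at `edge_fps` is captured in addition to the demo itself."""
    total_frames = demo_fps * demo_seconds + edge_fps * edge_seconds
    measured = total_frames / (demo_seconds + edge_seconds)
    return (measured - demo_fps) / demo_fps * 100.0


# The same one-second slip into a 150 FPS menu, for two demo lengths.
for seconds in (120.0, 20.0):
    bias = edge_bias_percent(60.0, seconds, 150.0, 1.0)
    print(f"{seconds:.0f} s demo: {bias:+.2f}%")
```

The same slip of the finger costs several times more accuracy on the short demo, which is consistent with the longer Serious Sam demo agreeing more closely with the built-in figures than the shorter Far Cry ones.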
To check the assumption made in the theoretical part of the article, let us look at the comparative figures shown by the test built into F.E.A.R. and compare them with the results of FRAPS. In one of the tests we deliberately stopped the FRAPS calculation slightly after the test demo ended, thus capturing a part that does not belong to the game demo and lies beyond its edge. In the menu that appears after the test a higher FPS is reached, so the difference should be more substantial. In this way we try to recreate real conditions, in which something much simpler (a menu, a prompt, etc.) may start being rendered after the scene used as the test.

Test: F.E.A.R.
FRAPS        62.    62.    71.
Difference   1. %   2. %   15. %

So it turned out that if the person running the tests concentrates on what is happening on the screen and watches it constantly, pressing the start and stop keys at the right time, the FRAPS result differs little from what the program itself reports (that is, the result is close to the real one), while with an emulated slow reaction the result differs noticeably. In the last test the FPS calculation was stopped only slightly later, less than a second after the usual moment, and the resulting difference is already too large, because right at the end of the measurement the frame rate rose considerably.

This demonstrates the next problem we described in the theoretical part, the difficulty of measuring with FRAPS caused by the so-called human factor: reduced attentiveness, changing reaction time, and so on. Let us check the same thing in the first game test of 3DMark06.

3DMark06/GT   13.    13.    13.
FRAPS         13.    16.    13.
Difference    3. %   22. %  2. %

The test only confirms what was written above: if the tester is careless and stops the FRAPS test literally fractions of a second after the proper moment, the result may not match the actual performance. Such results must be rejected, and to be sure the tests should be run several times (at least five, three as a last resort) and their values averaged, since the additional measurement error introduced by this method of testing is confirmed this time as well: 2- %.
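The advice above (run the test several times, reject obviously wrong values, average the rest) can be sketched as follows. The rejection rule used here, discarding runs that deviate from the median by more than a fixed percentage, is our own assumption for the example; the article does not prescribe a specific criterion, and the 5% tolerance and the sample values are hypothetical.

```python
from statistics import mean, median
from typing import List, Sequence, Tuple


def robust_average(runs: Sequence[float],
                   tolerance_percent: float = 5.0) -> Tuple[float, List[float]]:
    """Average the runs that lie within `tolerance_percent` of the median.

    Returns the average of the kept runs and the list of rejected runs.
    """
    if len(runs) < 3:
        raise ValueError("at least three runs are needed to spot an outlier")
    center = median(runs)
    kept: List[float] = []
    rejected: List[float] = []
    for value in runs:
        deviation = abs(value - center) / center * 100.0
        (kept if deviation <= tolerance_percent else rejected).append(value)
    return mean(kept), rejected


# Five hypothetical runs; in the third the stop key was pressed late.
average, rejected = robust_average([48.9, 49.4, 57.8, 49.1, 48.7])
print(f"average of kept runs: {average:.2f} FPS, rejected: {rejected}")
```

With only three runs a single bad measurement already makes up a third of the data, which is one reason to prefer five runs where time allows.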
Such an error (2- % in either direction) would be acceptable in certain cases if every game allowed recording and playback of user demos. Even demos may not reflect the real game situation, but they are at least close to what happens during play. Using scripted scenes rendered in real time on the game engine is sometimes acceptable, but such data reflect game performance even less; the figures will differ considerably from in-game FPS and make little sense. Yes, they can show the relative performance of different video cards and settings when rendering scripted scenes, but they in no way reflect the performance of the game itself.

Let us consider the most questionable approaches some testers use to measure performance in games such as Oblivion, Need For Speed: Most Wanted and the like. These games have no demo recording, and no long scripted scenes reflecting game performance that could be used for tests. So some people resort to playing through levels several times. Let us see what happens if we simply load a certain level and play through part of it three times. Naturally, the 3D scenes will differ every time; will the average frame rate in that case be even close?
We begin with Serious Sam 2 and the level named Greendale. The graph shows the per-second average frame rates reached during three different playthroughs of the large flat area of this level. The playthrough takes rather long, a matter of minutes, so the average values are unlikely to differ too much.
Indeed, the difference in frame rate only slightly exceeds %, which may seem a small error but is, in our opinion, still significant. The graph clearly shows a large difference in the maximum and minimum FPS reached in different playthroughs, as well as their different durations. This is understandable: the game situation develops differently each time, and the 3D scene cannot be made identical from run to run, especially over a fairly long time. We were specifically investigating possible tester mistakes, so the graphs do not repeat each other at all, and only the average frame rate is similar. The minimum FPS values, for example, ranged from up to 31, which is already unacceptably rough for tests. The maximum FPS also differed greatly; between and 1 there is a gulf.

Let us consider one more first-person shooter, Far Cry, at the beginning of the River level. This game differs in that repeating the game situation from run to run is even harder here; each attempt brings something new. This often happens in modern games, where opponents have some intelligence and do not appear in the same places, acting differently each time. Besides, it will be interesting to look at a shorter scene, where averaging FPS helps less.
The spread of frame rate values turned out to be very large; the maximum difference between average FPS values, almost %, makes such testing meaningless. Both the minimum and maximum values differ, and any repeatability of the graphs is seen only in a small initial part, where there are no opponents yet...

One of the most interesting recent games from the point of view of graphics technology is The Elder Scrolls IV: Oblivion. Unfortunately the game offers only a frames-per-second counter that can be enabled in the console, and there is no demo recording or playback, so some testers use the method described above: they load a saved game and play through part of a level anew each time. Naturally, it cannot be played the same way every time, so the results should carry a large error.
The Oblivion tests repeat the previous results. Even though the frame rate in this game is fairly stable and similar peaks are visible on the graphs in all runs, the maximum difference we obtained was %. In our opinion it is useless to compare video cards on the basis of such rough tests, since the difference between competing video cards from the two main manufacturers will more often be smaller than this figure. Even with ten measurements and close attention during the tests, one cannot be sure that the final figures reflect the real state of affairs. This is the main shortcoming of FRAPS tests in applications without recording and playback of user demos.

The last game considered in the practical part is Need For Speed: Most Wanted. This game was in fact the push for writing this article, as it stood out for the practical impossibility of stably recreating a game situation for FRAPS tests. In the article about the game I already noted that using utilities like FRAPS to measure average FPS over the same races with one configuration and car is seriously hampered by weather and other race conditions. The spread of results turned out to be even larger than the % stated in that article. The same track and the same cars with identical settings give a different result each time. The results of three trial races are shown on the graph.

There is some repeatability in the graphs, all of them are equally wavy, but that does not help achieve close average frame rates; the average differs by almost % between the two extreme cases!
A great many things affect the frame rate in NFS:MW: dynamic weather, the time of day of the race, traffic density, the level and mistakes of the AI opponents, the skill and luck of the player. The situation on the road is different every time, so the frame rate always differs. Such "tests" may not be helped even by repeated testing and rejection of obviously abnormal results. And imagine that testers using such methods also have to play the same game several times over, while staying attentive enough to press the stop key in time...
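The argument about run-to-run spread versus the gap between competing video cards can be illustrated with a short Python sketch. It assumes the spread is expressed relative to the lowest run, which is one of several possible definitions, and the frame rates below are hypothetical values, not measurements from this article.

```python
from typing import Sequence


def spread_percent(averages: Sequence[float]) -> float:
    """Difference between the best and worst run, relative to the worst."""
    lowest, highest = min(averages), max(averages)
    return (highest - lowest) / lowest * 100.0


def ranges_overlap(a: Sequence[float], b: Sequence[float]) -> bool:
    """True when the min..max ranges of two sets of runs intersect."""
    return min(a) <= max(b) and min(b) <= max(a)


# Hypothetical average FPS of three manual playthroughs on two cards.
card_a = [41.0, 44.5, 39.8]
card_b = [43.0, 46.2, 42.1]

print(f"card A spread: {spread_percent(card_a):.1f}%")
print(f"card B spread: {spread_percent(card_b):.1f}%")
print("ranges overlap:", ranges_overlap(card_a, card_b))
```

When the ranges overlap, as they do here, a single run of each card could rank them in either order, so the comparison says more about the playthrough than about the hardware.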
The performance tests carried out for this article have shown that measuring FPS with programs like FRAPS can, with certain reservations and restrictions, be used for games that have no built-in performance measurement (calculation of the average, minimum and maximum frame rate) but do have demo recording and playback, or at least a way to show the same 3D animation every time in real time on the game engine.
But even for games that can play back demos or scripted cutscenes in real time, the results obtained with FRAPS must be treated with caution, since even then the spread can reach 2- % in either direction. Therefore, to get anything like reliable results, the test must be repeated several times (we would advise at least five). This imposes additional restrictions on the method: the need to process the results (rejecting obviously abnormal data and averaging the rest) and a sharp increase in the labor involved, with no automation other than setting a fixed test duration, and hence an increase in the time the tests take. Add to all this a rather large error even with repeated runs, and the result is a method of measuring 3D performance with low reliability.
For games that cannot play back cutscenes (such as Need For Speed: Most Wanted, TES4: Oblivion and some others), testing with FRAPS makes no sense in our opinion, since such tests "by eye" give a very large spread of results; even % is an unacceptable figure, let alone more. Of course, our tests showed the weak points too starkly; one can devise ways to mitigate such side effects as the human factor and the varying game situation in each run, so that results for different video cards look plausible. For example, in NFS: Most Wanted one can choose an empty track, without traffic and opponents, but will the result correspond to the real state of affairs in the game? And in Oblivion one can find a deserted area, some plain, and walk across it in a straight line each time, without turning and without meeting enemies. But such testing will be too much like a synthetic test; it turns into a kind of 3DMark. Yet it is presented as a test of game performance...

One more point should be noted. Sometimes, even with fully automated tests and the human factor reduced as far as possible, performance results can be abnormal and hard to explain, for example because of rare bugs in driver or game code. What then can be expected from the rather inexact tests considered here? In that case the tester's mistakes and human weaknesses are added to the possible explanations for anomalous results, and they will certainly take a well-deserved first place in both influence and frequency.

We would like developers to understand what is wanted from them by people interested in 3D graphics and in the performance of high-tech games on different hardware platforms and with different settings.
It is highly desirable that even the most technologically advanced games get built-in testing facilities, and that everything is done in a user-friendly way with maximum scope for automation. After all, developers themselves have an interest in this. First, they need convenient performance analysis tools. Second, work with game testers at various stages of debugging and refinement can become simpler (public beta versions are in fact released; take the same F.E.A.R.). And third, the use of high-tech games as 3D performance test applications among enthusiasts can raise their popularity. Most importantly, providing convenient performance testing in games is not too hard for developers; they simply need to want it.

For now we will continue to use those applications that have built-in benchmarks and convenient ways of configuring them. Utilities like FRAPS can be used, but only in certain (isolated) cases: running the tests several times, checking the reliability of the results, using game demos whenever possible, and, when publishing data on how the performance tests were run, stating that they were carried out with FRAPS, a method that is subject to the human factor and reduces the reliability of the resulting figures. Such a method is not suitable for regular use.

Alexey Berillo aka SomeBody Else (sbe@ .com)
Published on July 26th, 20