Our first in-depth foray into the VR benchmarking scene focuses on testing the AMD R7 1700 and Intel i7-7700K CPUs on both the HTC VIVE and the Oculus Rift headsets. We’re plotting delivered frame times, dropped and synthesized frames, warp misses, and overall frame rate, and playing back hardware capture of the benchmarks in this set of tests. VR benchmarking is still new and challenging to perform, so we’ve limited the test to five games and four configurations. But even just running those 20 total configs takes significantly longer than benchmarking an entire GPU suite, and it must be preceded by an explanation of VR testing.
Before I get into that, this coverage is brought to you by the 1080 SC, which has a new MSRP that is lowered with the launch of the 1080 Ti series. The 1080 and 1060 SC cards come with For Honor or Ghost Recon: Wildlands, which you can choose at checkout. Learn more at the link in the description below.
Interpreting the data in these charts today isn’t necessarily obvious. This is a different type of chart, and the data has a different significance than our normal tests with traditional benchmarking. So part of this is going to be explaining what we’re actually looking at. The first game benchmark that shows up on the screen will explain what some of the numbers are on the lines and the bars and all of that. Going forward, we’ll be dropping a lot of the charts for the games following the first one, which is Dirt Rally, just because if we put every single chart in this video there would be 50 charts, and it would take an hour to go through them at one minute each. So instead we’re going to put only the most pertinent data sets in this video. The rest, as always, will be linked in the article in the description below, which will have pretty much every chart you could possibly want, including individual breakouts of the two CPUs, of the two CPUs overclocked, hardware and software monitoring, and then the comparative analysis, which is what we’re mostly focusing on in the video today.
Intro to VR Benchmarking
As an intro to VR benchmarking, our setup is as follows: We’re using hardware capture of the headsets to intercept the footage and send it to another machine. This is done in a way that does not impact performance of the benchmarking system, and that’s important, since it’s just splitting the data. A splitter box sits between the capture machine and the gaming machine, and that feeds into a $2,000 VisionSC-HD4 capture card, which is capable of accepting the high bandwidth from VR headsets, and also splitting things. We then use VirtualDub to capture the playback in the headset and on the gaming machine, and run color overlays that bake into the output from the FCAT VR suite. These can later be extracted to analyze delivered and dropped frames on a hardware level. Note also that you need a very fast SSD, or RAIDed SSDs, in order to keep up with the data, because one minute of testing can easily equal a 50GB file. We finally feed that file into our own compression script, which creates the files that you see in this video, compressed down to hundreds of MBs rather than tens of GBs. On the gaming machine, we use FCAT VR to intercept frame time, delivered frame rate, dropped frame, and warp miss data. Of note for today, frame times represent the time in milliseconds between each frame delivery, just like always.
VR headsets allow roughly 11 to 13 milliseconds to deliver a frame before encountering some sort of dropped frame, warp miss, or other unpleasant action. The extra two milliseconds at the end is really just an approximation, because the Rift can do some funny things with its runtime to lengthen the amount of time before the frame is required, which is generally 11 milliseconds to hit that 90Hz refresh, but there is some stretching room after that.
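That ~11 millisecond figure falls straight out of the refresh rate. Here’s a minimal sketch of the arithmetic, with the Rift’s extra runtime slack left out:

```python
def frame_budget_ms(refresh_hz):
    """Milliseconds available to deliver each frame at a fixed refresh rate."""
    return 1000.0 / refresh_hz

# 90Hz headsets leave roughly 11.1ms per frame before a drop or warp miss.
print(round(frame_budget_ms(90), 1))  # → 11.1
```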
The captured files look something like what’s on screen now, at which point we feed them into a spreadsheet and into FCAT VR, filter the data, and output the data you’ll see in the tests later. There are thousands of individual data points for each one-minute test, and the top row is several variables wide as well, so there’s plenty of data to analyze and interpret. The hard part is figuring out what to do with all of it once we’ve captured it.
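As a rough illustration of what that post-processing step does, here’s a minimal sketch that summarizes one pass of frame data. To be clear, the column names here are placeholders, not FCAT VR’s actual export schema:

```python
import csv
import io

def summarize_pass(csv_text):
    """Return (average frame time in ms, dropped frame count) for one pass.
    Column names are hypothetical stand-ins, not FCAT VR's real format."""
    frame_times, dropped = [], 0
    for row in csv.DictReader(io.StringIO(csv_text)):
        frame_times.append(float(row["frame_time_ms"]))
        if row["status"] == "dropped":
            dropped += 1
    return round(sum(frame_times) / len(frame_times), 2), dropped

sample = "frame_time_ms,status\n10.8,delivered\n11.5,dropped\n9.7,delivered\n"
print(summarize_pass(sample))  # → (10.67, 1)
```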
We have a previous video explaining dropped frames and warp misses and what those are, if you’re curious about those two specific names as they pertain to VR testing. In that video, we also talk about the VR pipeline: the 11 millisecond window, where you hit the runtime, when you need to have the frame ready and dispatched, and things like that. So that’s in a previous content piece. As for the benchmarking itself for today, VR test execution is a big challenge. VR testing in general is brand new. We’ve been working on this for a few months with FCAT VR and its early iterations, but even with that experience it’s not a perfect setup. So a few things here to note.
First of all, there’s a major human element with VR testing that cannot really be easily resolved. Not only do we have to, as usual, execute test passes that are fairly identical from one pass to the next, but we have to do that while in VR with a headset, where you’ve got head movement. So the level of accuracy diminishes compared to a standard benchmark, where you’re just using a keyboard and mouse. That’s something we acknowledge, and we use error bars in the bar graphs to show a margin of error, which right now is a bit wider than I want it to be. We’re working on tightening the variance between tests, but there’s only so much we can do before running into issues with VR in general not being a very friendly platform to benchmark compared to normal benchmarking.
Another example of this, outside of the human element: there’s also a randomized element with the games. A lot of VR games right now are King of the Hill-style. Stand in the middle, and attack enemies that spawn around you. Those enemies are normally randomly generated, and depending on your performance from one run to the next, you could see more or fewer enemies than in the previous pass, so that’s another challenge. Fortunately, we have enough data from, again, the months of working on this, to know what the variance and margin of error is from one pass to the next with these games, and so we have that margin of error bar on the charts. It’s not too wide, at plus or minus 1.5% more or less, but that will give you an idea of where the numbers fall, or where they could fall from one pass to the next.
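The error bar math itself is simple. Here’s a hedged sketch of one plausible way a ±% spread could be derived from repeated passes; the sample FPS numbers are made up for illustration:

```python
import statistics

def run_variance_pct(passes):
    """Half the min-to-max spread of repeated passes, as a percent of the mean.
    One plausible way to size +/- error bars; the inputs below are invented."""
    half_range = (max(passes) - min(passes)) / 2.0
    return 100.0 * half_range / statistics.mean(passes)

# Three hypothetical delivered-FPS passes of the same one-minute test:
print(round(run_variance_pct([88.5, 90.0, 91.2]), 1))  # → 1.5
```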
Games used for the Benchmark Tests
The games benchmarked for today include the Oculus Rift version of Dirt Rally, Oculus Rift version of Elite Dangerous, and then we have the HTC VIVE running Raw Data, Arizona Sunshine, and Everest. If you’re curious about the test platform used and the specs of the machine, you can check the article link in the description below, but the most important element is the GPU and that was our GTX 1080 Ti hybrid model that we built ourselves.
For the first test we’re starting with the Oculus Rift and then moving onto the VIVE. Our Rift games include, again, Dirt and Elite. We’re testing Dirt Rally configured to high settings with the advanced blending option enabled. This first set of charts will contain all the data we have, while subsequent games will only contain head-to-head data, so they will be simplified and faster to get through. If you want the full data for all of the games, check the article below.
Let’s start with the R7 1700 CPU at stock settings only. The left axis on this chart shows frame times in milliseconds. Lower is better, here. And we have a rough 11-millisecond window to deliver the frame in the Rift. Sometimes it can go up a bit, maybe around 13, as your max. The magenta line represents the hardware capture, while the red line represents the software capture. The hardware capture cannot see what’s going on at a software level, and so only validates findings by illustrating dropped frames never delivered to the headset. The software line is more of what we’re interested in. The lower third of the chart, meanwhile, is an interval plot. This one helps us visualize delivered synthesized frames, dropped frames, and delivered new frames. With everything explained, now we can start the data analysis.
For this chart with the 1700 stock, we can see that the 1700 encounters a few dropped frames, plus frames that were synthesized by updating head tracking and position without a full update to the scene. Right now, we’re playing video playback of the hardware capture for this run. The experience overall, as seen in the playback, is smooth, and the dropped and synthesized frames go unnoticed during use. There aren’t enough of them for a human to detect, but we can detect them with tools, and we’ll talk about this more going forward. Because, as a reminder, if you’ve got a 60-second test at 90Hz, that means you have 5,400 intervals, so a couple of dropped frames is not a big deal. We’ll show average frame times and unconstrained FPS at the end of this charted section.
Next, the R7 1700 overclocked shows mostly the same performance, with one pretty bad drop at the end of the capture, but overall nothing too critical. The time to deliver frames is now shorter, with the 3.9GHz overclock, but there’s no major difference in user experience.
This next chart shows the 1700 stock and overclocked at the same time, where we can observe that the red line representing the overclocked values, is consistently faster in frame delivery than the yellow line, or the stock 1700. We are generally around 10 milliseconds for both devices, but we’ll show that value more explicitly in a moment.
And now, here’s Intel’s i7-7700K CPU with stock settings. The Intel CPU has a few drops, but just like the R7 chip, these are not appreciable to the end user. Frame times are closer to the eight to nine millisecond mark, with the hardware frame time spikes validating the software measurements exactly at the same moment.
We’re playing some gameplay of the Intel i7-7700K benchmark now, where you’ll notice that user experience is smooth and without any significant or noticeable hitches. Both CPUs can deliver a smooth experience, which is subjectively, to the human eye, the same. But let’s look at how they compare objectively.
Here’s a chart showing the 1700 and 7700K head-to-head in stock frequencies. Because we have standardized this benchmark with the same head movements, we can see that the frame time spikes generally align between both CPUs. Intel is faster overall in frame time delivery in this test, with each CPU dropping fewer than 20 frames throughout the entire run. It’s not until dropped frames are significantly greater on one config than the other that we’d actually notice them, so for all intents and purposes, the experience on each CPU is the same. That said, Intel is faster in its frame times on this particular title, and experiences shorter spikes when the going gets tough. But let’s look at a few more charts, first. One more frame time chart, and then we’ll look at the bar graphs.
This one shows the two CPUs overclocked, with Intel at 4.9GHz and AMD at 3.9GHz. Both overclocks are achievable on the majority of the respective chips. The dropped frames are similar, again, with the experience, again, being effectively equivalent. AMD is closer to the 11 to 13 millisecond cutoff window, but still within bounds, and so the experience is the same. Interestingly, Intel loses ground in overclocking compared to the stock benchmarks, and we’ll see that this trend repeats throughout the tests.
Let’s get a bar graph on the screen for better illustration. This chart plots delivered FPS to the headset, which is the most important metric, then dropped frames as the second most important metric, and then unconstrained FPS as a calculation. Unconstrained FPS is an imperfect prediction of how many frames would have been delivered per second if the HMD did not have a fixed update interval of 90Hz. Since the headsets are fixed at 90Hz, really the most important item is delivered FPS. The unconstrained value is calculated by taking 1,000 milliseconds and dividing it by the average frame time, which is done in the new FCAT VR tool automatically. The two hard metrics are, again, delivered FPS, which we can validate with an effectively infallible hardware capture setup, and dropped frames, also validated by hardware capture. Dropped frames are an absolute measure of total frame count over the test period, which is 60 seconds. At 90Hz, a 60-second pass will produce 5,400 refresh intervals on the headset.
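The unconstrained number and the interval count both come from simple arithmetic. As a sketch, using the 90Hz, 60-second figures from above:

```python
REFRESH_HZ = 90
TEST_SECONDS = 60

def unconstrained_fps(avg_frame_time_ms):
    """Extrapolated FPS if the HMD had no fixed cap: 1,000ms / avg frame time."""
    return 1000.0 / avg_frame_time_ms

print(REFRESH_HZ * TEST_SECONDS)          # → 5400 refresh intervals per pass
print(round(unconstrained_fps(7.35), 1))  # → 136.1
```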
A final note: We currently have a test variance of roughly plus or minus 1.5%. In this chart for Dirt Rally, we immediately see that both the 1700 and 7700K are capable of delivering 90 FPS to the headset. That’s what we want. We next see that the dropped frames have a range of 8, going from 3 to 11 dropped frames per test pass. As we saw in our previous charts, the dropped frames are not clustered tightly enough to be noticeable by the user. In the absolute worst case, the R7 1700 encounters 11 dropped frames over its 5,400 refresh intervals. To put that into perspective, that yields 0.2% dropped frames for the test period. A user would not notice this, particularly when the dropped frames are spaced out over the entire period.
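That 0.2% figure works out like this, taking the 5,400 intervals from a 60-second, 90Hz pass:

```python
def dropped_pct(dropped_frames, refresh_hz=90, seconds=60):
    """Dropped frames as a share of all refresh intervals in one test pass."""
    return 100.0 * dropped_frames / (refresh_hz * seconds)

# Worst case seen in Dirt Rally: 11 drops over 5,400 intervals.
print(round(dropped_pct(11), 1))  # → 0.2
```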
We next see that unconstrained FPS lands around 135 to 137 for the 7700K, while the R7 1700 is in the 105 to 115 FPS range. Again, this is an extrapolation. We’re still seeing 90 on either device through the HMD. And let’s now move to the average frame time chart.
Here, we see where that number is calculated. This chart’s scale is set to 12 milliseconds, at which point you’ll probably start encountering drops, warped misses, or reprojection issues. The 7700K stock and overclocked CPUs perform effectively equally at 7.3 to 7.4 milliseconds. The R7 1700 shows a bigger gain from the overclock, just outside tolerances, landing at 8.7 milliseconds from 9.54 milliseconds. Comparatively, the 7700K stock experiences a 22% reduction in average frame time over the 1700 stock, and about a 14.8% reduction versus the overclocked R7 1700.
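Those reduction percentages are straightforward; here’s a sketch using the averages quoted above:

```python
def pct_reduction(baseline_ms, new_ms):
    """How much shorter the new average frame time is, relative to baseline."""
    return 100.0 * (baseline_ms - new_ms) / baseline_ms

# 7700K stock (~7.4ms) vs. R7 1700 stock (9.54ms) and overclocked (8.7ms):
print(round(pct_reduction(9.54, 7.4), 1))  # roughly the ~22% quoted above
print(round(pct_reduction(8.7, 7.4), 1))   # roughly the ~14.8% quoted above
```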
Let’s move onto the next game, finally. The next tested game is Elite Dangerous, which has hardware capture footage on the screen now. We’re going to simplify the frame time charts here and only show comparative data. If you want all of the charts, check the article linked below. Elite has some issues with the VIVE, so we’re using the Rift. This was configured with VR high settings in the game, and played in the VR training level.
This first chart shows the 1700 versus the 7700K, both at stock frequencies. The R7 1700 tends to run slower in average frame time delivery, sometimes running against the limit before we start encountering the Rift’s 11 to 13 millisecond refresh interval. Again, note that Oculus does some things in the runtime to stretch that 11 millisecond refresh a little, hence the extra buffer at the end there. As illustrated in the interval plot at the bottom, the R7 1700 and the i7-7700K are dropping and synthesizing a similar number of frames. You can see that with the green, yellow, and red colors on the lower two charts. Neither of these is appreciably worse than the other. As a user, again, this experience is equal within the confines of human perception and within what the headsets allow. That said, the difference is statistically significant. You could make an argument that the extra headroom is valuable, but we have not yet found a scenario where we begin encountering noticeable jarring or stuttering on either the 1700 or the 7700K.
This chart shows that overclocking the R7 1700 tends to close the gap versus Intel, which is also now overclocked, on the frame time and interval plot graph. That’s the same as we saw in Dirt Rally. It appears that overclocks don’t benefit Intel quite as much as they benefit AMD here, likely because of the R7’s lower starting frequency.
We’re seeing a range of six for the dropped frames, from 11 to 17 in the best and worst cases. The i7-7700K OC and 7700K stock perform equally when looking at the bar graphs. That’s within variance, and further illustrates that VR benchmarking is not yet repeatable enough to analyze with tight margins or to make hard statements about the data we’re seeing. Even in the worst case of 17 dropped frames, we’re still at 0.3% of all frames delivered as dropped, whereas 11 dropped frames would be 0.2%. Completely imperceptible to the user. The dropped frames are also dispersed enough not to matter.
With regard to actual delivered frames, we’re at 90 FPS for all four tests. Intel holds a lead in unconstrained FPS, as it has shorter frame latencies overall, as shown in the next chart for average frame times. We’re at roughly 8.6 milliseconds between the two Intel tests, and at 9.9 and 10.4 for the AMD tests. 10.4 is starting to push the limit of what we’re comfortable with, but still delivers a smooth experience in this game. The overclock keeps us reasonably distanced from the 11 to 13 millisecond mark. And again, there’s no perceptible difference between these SKUs, given that they all hit 90 FPS without any noticeable dropped frame counts.
On the screen now are test passes for the next game, Raw Data. We’re moving onto the HTC VIVE using Raw Data as a VR-specific title that’s shown some promise. This is another King of the Hill title, as you can see from the footage we’ve got on the screen. And these first results are with the game configured to high settings with zero MRS.
This frame time plot shows the 1700 versus the 7700K, both stock. Neither CPU struggles here, with the 1700 generally being faster in frame delivery. That said, both are below the six millisecond line on average. The interval plot below shows that Intel encountered zero dropped or synthesized frames. AMD encounters a few, but they are not significant enough to be perceived by the wearer. There seems to be a trend between these two CPUs, because they’re both pretty good at what they’re doing here. These two, for all intents and purposes, are again effectively equivalent in this game.
Here’s the overclocked data. Same story, here. Intel drops no frames, but tends to run average frame times a bit over the 1700. Let’s just move straight to the bar graphs.
Delivered frame rate posts 90 FPS for all devices with AMD dropping six frames in each pass, and Intel dropping zero. This is, again, not really noticeable, but it’s worth plotting, because we can measure it. The Intel CPU is, again, showing limited gains from overclocking with both line items performing effectively identically in unconstrained frame rate, and that’s both 7700Ks. The same is true for AMD, this time, where overclocking doesn’t really change the fact that we’re around 229 unconstrained FPS.
Looking at average frame times, we see that the 1700 CPUs stick around 4.36 milliseconds, with Intel around 4.96 milliseconds. Looking back to the video capture of the game, which we can play on the screen, we see that both of these are so distant from an 11 millisecond refresh interval as to be effectively equivalent in visual output. You could buy either CPU and get an identical experience in the headset for this Raw Data pass we did on high, based on our initial testing. More charts are in the article, if you want to see the rest of Raw Data’s data.
We’re now testing using Arizona Sunshine, which is another King of the Hill title on the VIVE and the Rift. We’re using the VIVE, here, and testing with advanced CPU features enabled, and very high texture quality. As you’ve seen while introducing this game, this is a zombie shooter game, and this test is conducted during the first swarm wave.
Here’s the first chart. Both CPUs perform at an effectively equal frame rate, particularly when considering the test variance in this game. The R7 1700 encounters more synthesized and dropped frames this time, but we’ve still not encountered a warp miss, which is important. We have not seen a single warp miss in any of these tests. Without any warp misses between these dropped frames, or without heavier dropped frame saturation in the interval plot, it remains fair to say that the experience between the two CPUs is really not that much different visually. Objectively, Intel does have fewer dropped frames in this title. We’re just facing a question of when that becomes noticeable to the end user. And since VR testing is still new, it’s kind of hard to say. We need more experience in the game to really make a definitive statement on that.
Overclocking helps the R7 1700, as seen here, and posts effectively zero change for the 7700K, again. And then finally, let’s move onto the next chart.
Here’s the FPS output. The R7 1700 drops 100 frames over 5,400 intervals, or 1.85%. That’s the most out of everything we’ve seen so far, but still not a big deal. And that’s reflected in the delivered frame bar, as well, where we’re seeing 87.5 FPS average versus 90 FPS delivered frames to the headset. The overclock gets us down to 38 dropped frames, but the gap between 257 FPS and 262 FPS on the R7 CPU shows that again, this test is imperfect. This is precisely why we added those error bars. 87 versus 89 FPS is also imperceptibly different, and the unconstrained frame rate does not post a statistically significant difference, either, though objectively, Intel is better in this title, at least with regard to dropped frames.
Very quickly, here’s the average frame time chart that supports that statement. Again, there’s no statistical significance in the differences between frame times with this test configuration. They’re basically all the same.
Finally, Everest is more of a tech demo than a game. That’s the one we’re showing now. It’s got some high-quality visuals that are stitched together with photogrammetry. Last we spoke to the Everest devs, the demo used more than 30,000 photos of the actual mountain to build this environment. We’re using the Khumbu Icefall for the test course, with settings configured so that the bars are two ticks down from max and all equal length in the graphics menus, which is really not a great graphics menu.
Here’s the frame time chart between the 1700 and 7700K stock. It’s also the most boring chart we’ve looked at thus far. The interval plot shows really no activity other than mostly successful frame deliveries. Frame times look about the same.
Looking at the FPS chart quickly, we see the same thing. These numbers are more or less equal.
The average frame time chart, finally, blasting through these, reinforces that. Everything is running around six milliseconds, and the two CPUs produce an equal experience in this game, with neither superior to the other.
For the most part, there’s no real significant, appreciable difference between these two CPUs in these games. Normally, that would be kind of a boring result, but because VR testing is so new, this data is good to have, because we can actually start building an understanding, as a community, of what a good experience in VR is. As we move toward testing things like R5s and i5s, we’ll probably start seeing more dropped frames and CPU limitations in VR. Of course, VR, being such a high-resolution thing, is almost natively a GPU bottleneck, but by controlling the settings in the games and lowering them to things like high, with a 1080 Ti Hybrid, we’re able to eliminate a good amount of that and still show some of the CPU limitations. It’s just that the R7 and the i7 do fine here. They don’t really have an issue. In general, for the two Rift games we tested, Intel tended to do better, but not in a way that really mattered, ultimately. The same was true when AMD posted superior frame times in the one or two games where it did better, whether the swing was big or not. The difference to a user was the same.
So it’s an interesting challenge to test these things. The headsets are locked at 90Hz, your frame rate’s locked at 90 FPS, and unconstrained metrics are really interesting and cool to have, but they’re more for GPU testing, where you might have six GPUs on the chart; seeing unconstrained FPS, in theory, helps tell you what the GPU is capable of in a more familiar metric beyond the 90 FPS limit, which is effectively a locked [inaudible 00:22:35] setting. Still, it’s an imperfect metric, because it’s trying to extrapolate something that you’re not going to see. Unconstrained FPS does kind of tell you, “Hey, this GPU or this CPU can prepare more frames for delivery,” so in theory, going forward, it might indicate that that particular product performs better, but it’s really too early to say if that’s how it’s going to play out. We’ll see.