Primate Labs has just released Geekbench 6, a new version of its popular benchmarking app. The update adds new tests and new datasets to better measure real-world performance. The new tests include background blur, akin to the tech used during video conferences; photo filters, similar to those used by modern social media apps; and object detection for AI workloads. The new datasets include higher-resolution photos to align with those captured by the best phones of today (12 to 48MP), plus bigger and more modern PDF examples.
One big change in Geekbench 6 compared to Geekbench 5 and earlier versions is the way multi-core scores are calculated. Previously, multiple independent tasks were created and timed to see how quickly they completed; the more cores you had, the faster they finished. In Geekbench 6, however, a single workload is used, and all the cores work together on that one shared objective. It is still true that the more cores you have, the quicker it will complete, but the cores now have to interact and coordinate with each other.
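To make the distinction concrete, here is a hypothetical sketch of the two models. This is not Geekbench code; the task, dataset, and function names are all made up for illustration.

```python
# Hypothetical contrast between the two multi-core scoring models.
# "work_unit" stands in for any CPU-bound task (e.g., blurring an image).
from concurrent.futures import ThreadPoolExecutor

def work_unit(data):
    # Stand-in for a CPU-intensive operation on a chunk of data.
    return sum(x * x for x in data)

dataset = list(range(100_000))

# Geekbench 5 style: each core runs its own independent copy of the task.
# No interaction between cores; more cores simply mean more copies done.
def separate_tasks(n_cores):
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        return list(pool.map(work_unit, [dataset] * n_cores))

# Geekbench 6 style: all cores cooperate on ONE shared workload,
# each taking a slice of the same dataset. The partial results must
# be combined at the end, so the cores interact.
def shared_workload(n_cores):
    chunk = len(dataset) // n_cores
    slices = [dataset[i * chunk:(i + 1) * chunk] for i in range(n_cores)]
    with ThreadPoolExecutor(max_workers=n_cores) as pool:
        partials = list(pool.map(work_unit, slices))
    return sum(partials)  # combining step: where core interaction shows up
```

The cooperative model better reflects how real apps split one job (say, exporting a video) across cores, which is why scaling is rarely perfectly linear.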
To coincide with the launch we had the chance to chat with John Poole, the CEO and Founder of Primate Labs, and the original author of the early versions of Geekbench, to discuss all things Geekbench 6.
For Geekbench 6 benchmark results, be sure to check back in with Android Authority throughout the coming days and weeks as we’ll be putting it through its paces with the latest and greatest devices. You can also check out a full transcript of our interview with John Poole at the link, or by watching the video above.
Geekbench 6: Is it a synthetic benchmark?
Back in 2003, Apple released the world’s first 64-bit desktop computer, the Power Mac G5. Poole bought one, but once he got it home he felt it wasn’t much faster than the previous generation. So he downloaded some standard benchmarks of the time, but after some testing, he realized the existing benchmarks weren’t doing a very good job. So he decided to write his own! Fast forward three years and Geekbench 1.0 was released to the public. Nowadays, Geekbench is the de facto standard for testing consumer computing devices, everything from laptops and desktops to Android and iOS phones.
Despite its popularity, some people still have a deep mistrust of benchmarks, claiming they are synthetic and don’t represent real-world use cases. I put this question to John. “So in Geekbench 6 we’ve got fifteen separate workloads that we use to measure CPU performance, and we’ve tried to pick a variety of different tasks that reflect, we think at least, what people use their computers for day-in, day-out, or what they use their smartphones for day-in, day-out,” he told me. The focus for Geekbench 6, Poole says, is to “really narrow in on what people are actually going to do with their computers.” He continues:
So we’re really trying to narrow in on what people are actually going to do with their computers. Something like compression is important because when you download apps on your smartphone, Android will unpack and then install them. Other things like HTML tests are in there because people spend so much time in their web browsers today; that’s an important metric to capture. Other things came out of the pandemic, things like video conferencing, and we have a background blur workload for that Zoom effect where your face is visible but your background’s not. That suddenly became a new workload that wasn’t even relevant three or four years ago.
He added that, “We try and look at what’s going to be interesting to users, what is actually CPU-intensive, what’s actually going to matter for the device day-in-day-out. We really don’t want Geekbench to exist in a vacuum, we want it to be representative of what people actually do.”
Can we compare Geekbench 5 scores with Geekbench 6?
Poole confirmed to me that you can’t compare Geekbench 5 scores to those of Geekbench 6, as it is a completely new benchmark. Geekbench 5 scores are calibrated against a reference score of 1,000, which is the score of an Intel Core i3-8100. A higher score indicates better performance, and a doubling of the score means a two-fold improvement in performance. The baseline has changed for Geekbench 6: it is calibrated against a baseline score of 2,500, the score of an Intel Core i7-12700.
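The calibration arithmetic works like this: scores scale linearly with how fast a device completes the workloads relative to the baseline machine. The formula below is a simplified sketch under that assumption, not Geekbench's published scoring code.

```python
# Simplified sketch of baseline-calibrated scoring. Assumption: the score
# scales linearly with workload speed relative to the baseline device.
def calibrated_score(baseline_time, device_time, baseline_score=2500):
    # Geekbench 6 baseline: Intel Core i7-12700 = 2,500 points.
    # A device finishing the workload twice as fast scores twice as high.
    return baseline_score * (baseline_time / device_time)

# A device matching the baseline scores exactly 2,500.
assert calibrated_score(10.0, 10.0) == 2500
# A device twice as fast scores 5,000.
assert calibrated_score(10.0, 5.0) == 5000
```

This is also why Geekbench 5 and 6 scores can't be compared: the 2.5x difference in baseline numbers (1,000 vs 2,500) compounds with the entirely different workloads and datasets behind them.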
Interestingly, Poole also pointed out that you can’t necessarily compare one point release (say 5.0) to another point release (like 5.1):
There is always going to be feedback that we’re going to get after we ship a benchmark. Someone’s going to point something out, and we go ‘oops,’ we made a mistake there, we should fix that. We always try to do that in the first month or two, so 6.0 to 6.1, will it be comparable? It’s hard to say. But after that point, we really try and keep the benchmark comparable across 6.1, 6.2, 6.3, and so on. Usually when we do a point release it is because we are adding support for new hardware, so if you’re benchmarking new hardware you might want to just use the newer version. For the most part it’s comparable, and we try to call out explicitly where it is or isn’t comparable in the release notes.
Can we compare desktop and mobile performance based on Geekbench scores?
I sometimes get comments on Gary Explains that Geekbench is better optimized for one system than another, leading to a disparity in scores between desktop and mobile. I asked Poole if Geekbench is equally optimized for all systems. “Absolutely, we spend a lot of time [on that],” he replied.
“Let’s say, as an example, we’ve gone and written a NEON version of a function. We don’t want to take that NEON version and try and graft it onto an SSE version,” he explains. “We try to write things in a way that’s natural for the specific instruction set, that leverages the advantages, and is mindful of the disadvantages of that instruction set, so that we get something that should be comparable across both platforms.”
Hardware acceleration, optimization, and the ‘Hardware Computer Museum’
Processors, whether in desktops, laptops, or smartphones, tend to have hardware acceleration for tasks like cryptography or video encoding/decoding. Plus there are special instruction sets like SSE and AVX on x86-64, or NEON and SVE on Arm chips. I asked Poole what Geekbench’s approach to hardware acceleration is. The first point he made was that Geekbench doesn’t include any specific video encoding tests. This isn’t because Primate Labs doesn’t want to include them, but because all the modern video encoding systems need to be licensed and have patents attached to them. So for the moment, Primate Labs has steered clear of them. But for other tests, like an Instagram-style filter test, the engineers use what a common application would use: for Arm that means NEON (with SVE coming soon, maybe in Geekbench 6.1), and for x86-64 that means SSE and AVX2.
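The reason instruction sets like NEON and AVX2 matter for a filter test is that one SIMD instruction applies the same operation to several pixels at once. Python has no real SIMD, so the 4-wide "lane" loop below is only a conceptual sketch of the idea; actual Geekbench code would use NEON or AVX intrinsics in C/C++.

```python
# Conceptual sketch of SIMD-style pixel processing. The "lanes" loop
# mimics what one NEON/AVX vector instruction does to 4 pixels at once;
# this is an illustration, not real vectorized code.
def brighten_scalar(pixels, amount):
    # One pixel per step: add brightness, clamp at 255.
    return [min(255, p + amount) for p in pixels]

def brighten_simd_style(pixels, amount, lanes=4):
    out = []
    for i in range(0, len(pixels), lanes):
        lane = pixels[i:i + lanes]           # "vector load" of 4 pixels
        lane = [p + amount for p in lane]    # one "vector add"
        lane = [min(255, p) for p in lane]   # one "saturating min"
        out.extend(lane)                     # "vector store"
    return out

# Both paths must produce identical output; only the per-step width differs.
px = [10, 250, 100, 200, 30]
assert brighten_scalar(px, 20) == brighten_simd_style(px, 20)
```

Writing this "natural" vector version separately for NEON and for SSE/AVX2, rather than mechanically translating one into the other, is exactly the cross-platform fairness Poole describes above.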
Primate Labs takes the development of Geekbench seriously — and Geekbench 6 is no exception.
“We work with hardware companies, the ones who author or implement the instructions, to make sure that what we’ve got is not necessarily the very best that it can be, but that it’s a fair and representative sampling of what the instruction usage might be,” Poole explained. “We do that with all the various instruction sets that we support. So whether it’s NEON on the Arm side or AVX on the x86 side, we try and make sure that what we have written is fair and reasonable.”
All of the big decisions are made in Primate Labs’ testing and development environment — nicknamed “The Hardware Computer Museum” — which houses over 150 test devices, from an Intel Core Duo system right up to Raptor Lake systems (i.e. Intel’s 13th-generation Core processors). I joked with Poole that I would really like to see a tour of that lab! He agreed that a tour of the lab and their development process would be useful “because I think it would assuage a lot of those fears that people have about Geekbench being a black box, ‘who knows what goes into it?’”
Tour or no tour, Poole is very clear about how seriously they take the development of Geekbench — and Geekbench 6 is no exception.