Nehemiah – new VIA C3 core and its prospects

I can’t say that the whole world was waiting with bated breath for the new processor core from VIA Technologies, the well-known and ever-diversifying Taiwanese maker of peripheral controllers, mainboards, wall and table radiators and even chipsets. Nevertheless, the event did attract a certain interest, especially from hardware reviewers tired of describing and measuring, in scores, fps and seconds, their delight at yet another few hundred megahertz added to a top processor that was hardly suffering without them. Their interest was purely sporting. You could even bet on it: will all x86 applications finally work on this nominally x86-compatible processor? And other questions of that kind.

We have satisfied our interest and now suggest that you do the same.

VIA C3 «Nehemiah»

[Photos: VIA C3 1 GHz «Nehemiah», back and front]

The new core cannot be identified from the marking on the top cover (the «A» suffix only indicates the presence of L2 cache, as it did on the first Celeron processors that came with L2 cache).

To refresh your memory of the VIA C3 architecture, look through our comparison of low-end solutions, which covers, among others, the 1 GHz VIA C3 processor based on the Ezra-T core (which also takes part in today’s tests).

The newcomer differs fundamentally from its predecessors: for the first time since Samuel2/Ezra/Ezra-T the core has undergone really significant changes. The basic characteristics of the Nehemiah are as follows:


  • Package: standard CPGA or EBGA for Socket 370
  • Compatible with mainboards of the respective processor socket
  • Full x86 compatibility (I hope so :))
  • MMX and SSE instruction sets supported
  • 64 KB 4-way set-associative L1 data and instruction caches
  • 64 KB (exclusive) 16-way set-associative L2 cache running at full core frequency
  • Two 8-way, 128-entry translation lookaside buffers (TLBs), with two 8-entry PDCs
  • 1024-entry Branch Target Address Cache
  • The unique Advanced Branch Prediction Algorithm (StepAhead trademark), traditional for VIA processors :)
  • 16-stage pipeline
  • FPU working at the full core’s frequency
  • APIC support in future core steppings
  • Bus frequency: 133 MHz (can also operate at 66 or 100 MHz)
  • 0.13-micron manufacturing process with copper interconnects
  • Core size: 52 mm²
  • Frequencies (at the time of the announcement): 1.0, 1.06, 1.13, 1.20 GHz
  • Core voltage: 1.4 V (may change in future models)
  • Max operating case temperature: 70°C for CPGA package and 85°C for EBGA package
  • Dissipated power: 5.0-5.3 W in standby mode, 15.0-18.4 W max. for CPGA (18.5-21.3 W for EBGA, because its measurements are made at the higher max operating case temperature)

[Nehemiah CPUID standard flags] [Nehemiah CPUID extended flags]
The core has a few more architectural peculiarities, but a more detailed description would hardly be of interest here.
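For readers who want to check the same kind of feature flags on their own machine, here is a minimal Python sketch. It assumes a Linux system that exposes a «flags» line in /proc/cpuinfo; the function names and the fallback sample text are ours and purely illustrative (the sample is not the Nehemiah’s actual flag dump).

```python
# Minimal sketch: look for MMX / SSE / 3DNow! in a /proc/cpuinfo-style dump.
# Assumption: Linux-style text with a "flags : ..." line. The SAMPLE string
# below is made up for illustration and is NOT a real Nehemiah flag list.

def parse_cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags, wanted=("mmx", "sse", "3dnow")):
    """Map each flag name of interest to True/False."""
    return {name: name in flags for name in wanted}


SAMPLE = "processor\t: 0\nflags\t\t: fpu de tsc msr mmx sse\n"

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = SAMPLE  # fall back to the illustrative sample
    print(report(parse_cpu_flags(text)))
```

On a system without /proc/cpuinfo the script simply falls back to the sample text, so it only demonstrates the parsing.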

Now let’s see how the Nehemiah core differs from what it can be compared with. Against the Ezra-T it looks excellent: larger and improved caches, a full-speed FPU and SSE support promise a good performance gain. (Note that the ability to execute SSE instructions guarantees nothing about how fast they are executed; still, we can count on some performance increase, otherwise such support would make no sense.) The longer pipeline implies greater frequency headroom (and hence overclocking potential).

However, there are some unpleasant points. First, the heat dissipation of the processor has risen markedly: although the 1 GHz model dissipates only 15 W (compared with 27.8 W for the Celeron 1A GHz (Tualatin) and 46.1 W for the Duron 1 GHz (Morgan)), it now requires active cooling, which kills the first key selling point of the C3 processors. Second, the Nehemiah requires Tualatin-compatible mainboards, so it is not suitable, for instance, for upgrading an i440BX-based system. In theory, adapters could be built to fit Nehemiah processors to old boards, but I doubt anyone will bother.
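To put the three power figures side by side, a small sketch like the following can help. It uses only the wattages quoted above; the table name and the helper function are ours.

```python
# Power figures for the 1 GHz parts, in watts, as quoted in the text.
MAX_POWER_W = {
    "VIA C3 (Nehemiah)": 15.0,
    "Intel Celeron (Tualatin)": 27.8,
    "AMD Duron (Morgan)": 46.1,
}


def relative_to(baseline, table):
    """Express every entry as a multiple of the baseline entry."""
    base = table[baseline]
    return {name: watts / base for name, watts in table.items()}


ratios = relative_to("VIA C3 (Nehemiah)", MAX_POWER_W)
for name, ratio in ratios.items():
    print(f"{name}: {MAX_POWER_W[name]:.1f} W, {ratio:.2f}x the C3")
```

By these numbers the Celeron dissipates a little under twice as much as the C3 and the Duron roughly three times as much, so the C3 keeps its lead even though it has lost passive cooling.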


Now, to make up for the sad news from the architectural front, let’s turn to the favorite pastime of computer enthusiasts: overclocking. Celerons have been overclocked so many times already, and the bridges on Athlons have become such a familiar part of our lives, that now it is the turn of VIA’s products to be toyed with! Forget about overclocking efficiency: we are here to tinker, not to calculate the percentage gain. :)

Apart from FSB overclocking, which mainly depends on the capabilities of the mainboard (provided other components, such as memory, do not get in the way), there is also the CPU multiplier at our disposal. In the C3 processors it is locked at the MSR level (MSRs are special internal CPU registers), not in hardware, so it can actually be unlocked in software. Such changes last only until the next reboot, which is why it makes sense to apply them through a modified BIOS or through a program that starts with the system.

But there is a better way! On the back of the CPU, right above the marked die, there are some peculiar bridges. Closing a bridge (with a pencil lead, special conductive adhesive, varnish, solder, etc.) sets the corresponding bit in the MSR to «1» at power-up, while opening it (not recommended for bridges closed at the factory, because of possible mechanical damage) sets it to «0». After some experiments with the bridges we managed to raise the multiplier: with all the bridges closed the internal multiplier became «x16», but since that value is not in use (at least for now), our sample started at 133×9=1200 MHz.

Raising the bus frequency on the ASUS TUSL2-C mainboard, we reached a maximum of 148×7.5≈1110 MHz, and 144×9≈1300 MHz with the processor unlocked to the x9 multiplier. So the overclocking potential of the new core is higher than that of the old one, but it is no match for the Tualatins, which start on a 100 MHz bus and end up at 150+ MHz, a gain of 50+%.
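The arithmetic behind these figures is simply bus frequency times multiplier. The sketch below reproduces it, using the rounded 133 MHz bus value from the text as the stock setting; the function names are ours.

```python
def core_clock_mhz(fsb_mhz, multiplier):
    """Core clock is the bus (FSB) frequency times the CPU multiplier."""
    return fsb_mhz * multiplier


def gain_percent(new_mhz, base_mhz):
    """Overclocking gain relative to the base clock, in percent."""
    return (new_mhz / base_mhz - 1.0) * 100.0


# Stock setting with the rounded 133 MHz bus value used in the text.
STOCK_MHZ = core_clock_mhz(133, 7.5)

# (label, FSB in MHz, multiplier) for the settings mentioned above.
SETTINGS = [
    ("FSB overclock, x7.5", 148, 7.5),
    ("x9 unlocked, stock FSB", 133, 9),
    ("x9 unlocked + FSB overclock", 144, 9),
]

for label, fsb, mult in SETTINGS:
    clock = core_clock_mhz(fsb, mult)
    print(f"{label}: {clock:.0f} MHz ({gain_percent(clock, STOCK_MHZ):+.1f}%)")

# A Tualatin taken from a 100 MHz to a 150 MHz bus at a fixed multiplier:
print(f"Tualatin FSB 100 -> 150: {gain_percent(150, 100):+.0f}%")
```

With these inputs 148×7.5 works out to 1110 MHz (about 11% over stock) and 144×9 to 1296 MHz (about 30%), against 50% for a Tualatin whose bus goes from 100 to 150 MHz.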

Performance estimation


  • Processors:
    • VIA C3 (Nehemiah) 1 GHz (133×7.5), Socket 370
    • VIA C3 (Ezra-T) 1 GHz (133×7.5), Socket 370
    • Intel Celeron (Tualatin) 1.3 GHz (100×13), Socket 370
  • Mainboards:
    • VIA C3M266-L (BIOS 1.03) on VIA CLE266
    • ASUS TUSL2-C (BIOS 1012 beta 008) on i815EP B-step
  • Memory:
    • 2×256 MB PC166 SDR SDRAM DIMM Tonicom, CL 2 (ran at 100 MHz with the Intel Celeron (Tualatin) and at 133 MHz with the VIA C3)
    • 2×256 MB PC3200 (DDR400) DDR SDRAM DIMM Winbond, CL 2 (used as PC2100 (DDR266) in the tests)
  • External video card: ASUS 8200T5 GeForce3 Ti500
  • Hard drive: IBM IC35L040AVER07-0, 7200 rpm


OS and drivers:

      • Windows XP Professional SP1
      • DirectX 8.1
      • VIA 4-in-1 4.45
      • VIA CLE266 VGA Drivers 01.19.01
      • Intel chipset software installation utility 4.00.1009
      • Intel Application Accelerator 2.2
      • NVIDIA Detonator v28.32 (VSync=Off)

Test applications:

    • CPU RightMark 2002 RC2
    • RazorLame 1.1.4 + Lame codec 3.92
    • VirtualDub 1.4.9 + DivX codec 5.02 Pro
    • WinAce 2.11
    • Discreet 3ds max 4.26
    • BAPCo & MadOnion SYSmark 2002
    • MadOnion 3DMark 2001 SE build 330
    • Gray Matter Studios & Nerve Software Return to Castle Wolfenstein v1.1
    • DroneZmarK

Note that we didn’t test the overclocked version of the new C3, because what is really interesting is to compare the Nehemiah with its predecessor, which was tested at 1 GHz. As for the comparison with the Celeron (Tualatin)… well, you’ll see it with your own eyes in the diagrams. We will return to this point later, in the conclusion.

It was also quite interesting to use in the tests a new board from VIA (VPSD) based on the CLE266 chipset. Our regular readers are already familiar with the performance of this chipset, and now VIA is showing a general-purpose model in the mATX format. Because of the peculiarities of the CLE266 (integrated video that is weak for games, and no support for an external AGP port), this board is destined for budget office or niche home computers, though only the tests will make everything clear.

[Photos: VIA C3M266-L mainboard]

Test results

Let’s start with CPU RightMark and see what talents the new C3 core can boast of.

Solving differential equations, which uses only ordinary x87 and MMX instructions, shows the effect of the Nehemiah’s twice-as-fast FPU perfectly clearly (a gain of almost 1.5 times), but you can also see that even the new C3 still has a long way to go to catch up with the Tualatin.

In rendering, the Nehemiah’s FPU again performs 1.5 times better; the SSE instructions are indeed executed, at a speed comparable to the Celeron’s. Sorry, but even with SSE, and even allowing for the higher frequency of Intel’s processor, the Tualatin is still twice as fast, no doubt about it.

Note that the C3 Ezra-T failed this test because the Lame codec ran into errors (according to the codec’s programmers, in standard MMX code). Here the C3 shows weak computing speed compared with the Celeron Tualatin (as we already saw in CPU RightMark), and the CLE266 chipset has a negative effect on speed, even though video is not used in this test and the DDR bandwidth on this chipset is theoretically twice as high.

The results for encoding a video stream into MPEG4 are anything but dull. Two hours on the Ezra-T, an hour on the Nehemiah and half an hour on the Tualatin paint a clear picture. One more factor affects the performance of the tested systems: both C3 cores have a very low memory transfer rate. In memory writes all the models look about equal, and the Celeron even loses (~200 MB/s against ~230), whereas in reads Intel’s processor reaches 750 MB/s, almost 2.5 times faster: the C3 gets no real benefit even from the theoretical bandwidth of CLE266+DDR266. Moreover, continuous video memory accesses under the Windows GUI, though they take less than a percent of the overall time, add to the low speed of the system. Clearly the chipset would work much better with Intel’s processors (provided, of course, that gaming or other 3D performance is not needed).
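To see how far these read speeds fall from the theoretical limits, here is a rough sketch. It assumes the standard 64-bit DIMM data bus and counts a megabyte as 10^6 bytes. For the C3 it takes ~300 MB/s, the figure implied by the 2.5x ratio. The function names are ours.

```python
def peak_bandwidth_mb_s(transfers_mhz, bus_bits=64):
    """Theoretical peak: million transfers per second times bytes per transfer."""
    return transfers_mhz * bus_bits / 8


def utilization(measured_mb_s, peak_mb_s):
    """Fraction of the theoretical peak actually reached."""
    return measured_mb_s / peak_mb_s


# Celeron (Tualatin): SDR SDRAM at 100 MHz, ~750 MB/s measured reads.
celeron_peak = peak_bandwidth_mb_s(100)
celeron_util = utilization(750, celeron_peak)

# C3 on CLE266: DDR266 (266 million transfers/s), ~300 MB/s implied reads.
c3_peak = peak_bandwidth_mb_s(266)
c3_util = utilization(300, c3_peak)

print(f"Celeron: {celeron_peak:.0f} MB/s peak, {celeron_util:.0%} used")
print(f"C3:      {c3_peak:.0f} MB/s peak, {c3_util:.0%} used")
```

Under these assumptions the Celeron comes close to the 800 MB/s its memory can deliver, while the C3 uses only about a seventh of the 2128 MB/s that DDR266 offers on paper.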

We’ll comment on the results of these two very different applications (WinAce and 3ds max) together, because one odd fact unites them. The Celeron is, as always, far ahead, though the gap is smaller in WinAce because that test depends heavily on memory speed. The CLE266 slows the Nehemiah down, as usual, though the slowdown never exceeds 5%. But the last pair behaves rather strangely: the Ezra-T is almost level with the Nehemiah. Hmm, the new core ought to be superior, if only because of its FPU speed: final rendering is hungry for the resources of the coprocessor.

Equal results mean that the processor core is not the limiting factor. I would assume that the performance of these two applications is limited precisely by the low memory read speed of the C3 processors. As we noted in our earlier reviews of the Tualatin core, the situation with rendering was very similar there, though the scores were higher. So now we see that the Tualatin is ~2.5 times (~750/~300?) faster in 3ds max (which mostly requires fast memory reads), and its lead is somewhat smaller in WinAce (where memory writes take considerable time).

In the synthetic SYSmark benchmark the Nehemiah picked up the banner of its comrade, which fell in battle against the test package’s script. The results look pretty good, but they carry little meaning, because the final scores are just scores, not real seconds of application response to the user’s actions.

Comparing the C3 and the Celeron in games will hardly give us more to discuss. Now it’s time to test and discuss the integrated video of the CLE266 (CastleRock). The lack of hardware T&L is clearly noticeable in the overall 3DMark scores, which lets us estimate the level of performance in modern games developed for hardware T&L. However, in certain games with software emulation it is possible to reach some, fairly low, degree of playability.

As there is no standard OpenGL driver for CastleRock, Return to Castle Wolfenstein and DroneZ refused to run on this system. I would also draw your attention to the fact that high resolution and heavy graphics settings in DroneZ reduce the Nehemiah’s performance only slightly, thanks to the increased computing power of the new core compared with the old one.

The VIA C3M266-L coupled with the Nehemiah managed only 11.5 fps in Serious Sam: The Second Encounter (Speed@800x600x16), yet we were able to watch any movie, DVD or MPEG4, at the highest quality without any stutter. So, taking VIA’s solutions as a whole, the C3 «Nehemiah» + VIA C3M266-L can make a good budget system or a TV console.


We did not aim for purity and academic precision in these tests, which compare processors of different architectures running at different frequencies. The point is that the VIA C3, compared with the Intel Celeron, remains «faceless» (this is noticeable even without equalizing the frequencies), although it has been significantly improved. Moreover, I don’t think it really makes sense to compare the new C3 with the old one, because in light of the Celeron’s scores I don’t think it makes any sense at all to pay attention to the performance of VIA’s processor. This is the one and only modern x86 processor that de facto lacks this attribute: performance. :)

The price of the C3 Nehemiah is expected to be about $30, which is too close to the $40 asked for the Celeron Tualatin, already available at 1.4 GHz. At the same time, the new C3 has much lower… OK, has NO performance, and only offers a chance to save electricity.

But we were really happy that VIA has finally released a processor that executes x86 code flawlessly. The SSE instructions work, and work perfectly! Had this processor entered the market a year ago, it could have become a noticeable player.

But today we can only discuss the technical advantages and disadvantages of the core, not market share and production news… OK, let’s take the added SSE execution unit and the removed (according to the flags in the MSRs) 3DNow! execution unit. We still don’t have the complete VIA documentation, but we know of the modern practice of using internal processor units that translate external instructions into quite different internal ones for execution. Linking this practice to the fact that one execution unit appears while another disappears, and considering VIA’s tendency to REuse everything over and over again, we finally want to ask: did anything appear or disappear at all? Maybe the C3 has one universal unit that executes internal RISC-like microcode, and the Nehemiah simply uses a new translator for it that differs from the Ezra’s? This is just an assumption, but still… Think about that, not about market prospects.