The Trouble With Models

BY ED SPERLING, December 20, 2017 in Semiconductor Engineering

Models are becoming more difficult to develop, integrate and utilize effectively at 10/7nm and beyond as design complexity, process variation and physical effects add to the number of variables that need to be taken into account.

Modeling is a way of abstracting the complexity in various parts of the semiconductor design, and there can be dozens of models required for complex SoCs. Some are tied to a specific IP block, while others are abstractions of an entire system. There also are models that are meant to be lightweight and very fast, such as those used to model heat in a complex design or those used to determine which architectural approach will improve performance. Models also can be cycle-accurate, meaning they are direct representations of a particular function or even an entire chip.

The problem is that there is a lot going on at 7nm—multiple power domains, physical effects such as leakage and heat, RC delay, a very large number of possible use cases, and variation of all sorts—so not all models work equally well. And perhaps even more confusing, not all models work equally well together.

“The biggest challenges are in advanced circuits on advanced nodes,” said Oliver King, CTO at Moortec. “Accuracy is a relative term there. If you take a model of a device, it may be accurate for some elements. But that same model may not be useful to another person working on a different part of the device.”

There are some fundamental changes at each new node, as well, that substantially weaken the value of certain types of models, while leaving others as relevant as ever. So at 28nm and above, it was enough to model inductance to determine how signals would move through a device. That began to change at 16/14nm, and it completely changed at 10/7nm, where various types of noise—power, thermal, electromagnetic interference—are much bigger concerns.

“You now need to re-validate models,” said Magdy Abadir, vice president of marketing at Helic. “Electromagnetic crosstalk wasn’t even on anyone’s radar at 28nm, but we’re seeing it in finFETs. The problem is that it may or may not go through the substrate. EM waves can move in many mysterious ways. How do you model that?”

There are other issues, too. Models tend to identify what works and what doesn’t, but not all models are updated frequently enough to capture the changes that arrive with each new node. In complex devices, problems may not even show up until devices are in use in the market.

“With advanced nodes there is not a lot of data, so you basically have to make your own data,” said Mike Gianfagna, vice president of marketing at eSilicon. “We see this especially with lower-level models like SPICE models. You take a design, you get the measurement data and you send it off to the fab, and when you get it back the process is not where the SPICE model said it would be. Then you have discussions about how to fix it, either by creating new models or re-centering the design. This wasn’t a problem at 28nm and above. The 1.0 PDK from the fab was considered the gold standard. Below that, it’s not that easy.”


Fig. 1: Chip thermal model. Source: ANSYS

Different types of models

There are models for almost everything in the design flow, from IP blocks to security to reliability and failure predictability.

“Each type of model has its own strengths,” said Tim Kogel, senior manager of applications engineering at Synopsys. “There are software models, but now power management is being dragged up into software. There also are system-level models, power models, and different models for emulation, which have more functional detail. One challenge everyone is facing is to figure out which models to use and how much to invest in creating those models. Then there is the problem of mixing and matching models.”

In general, models based on the most up-to-date and consistent data fare best. A model is only as good as the data used to create it.

“If a product is constantly changing, it’s difficult to build a model,” said Michael Schuldenfrei, CTO at Optimal+. “If you have high-running, long-term products, the model can’t understand and predict those changes. In the end, there’s no magic bullet to find everything that can possibly go wrong. You can’t run every possible scenario and catch every variable.”

It gets worse as the number of variables increases, too. That includes use cases, interactions with other IP blocks or chips in a system and security flaws. The need to constantly update those models increases, but so does the risk of something going awry.

And it gets worse again as the model sizes increase. “If you’re looking at the entire SoC, that’s where modeling really gets tricky because now it’s interacting with the chip, the package and the board,” said Arvind Vel, director of applications engineering at ANSYS. “It spans three different domains. And then you can add in time. Time-dependent modeling for something like power is very complex.”

At the system level, this can become cumbersome very quickly. Capturing an Android phone’s boot-up, for example, would require modeling the entire system, which is a gargantuan task.

“Every model comes with its own inherent shortcomings,” Vel said. “The thermal time constants for every system are different. For a phone, you’re typically looking at milliseconds of performance, but it actually takes seconds to reach a steady state.”
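Vel’s point about thermal time constants can be illustrated with a single-pole thermal model, in which the temperature rise after a step in power follows T(t) = P x R_th x (1 - exp(-t/tau)). The sketch below is only an illustration under that simplifying assumption. The power, thermal resistance and time constant are hypothetical values chosen for the example, not figures from ANSYS or from any real phone.

```python
import math

def temperature_rise(t_s, power_w, r_th_k_per_w, tau_s):
    """Rise above ambient (K) at time t_s after a power step, single-pole RC model."""
    return power_w * r_th_k_per_w * (1.0 - math.exp(-t_s / tau_s))

def settling_time(tau_s, fraction=0.99):
    """Time (s) for the rise to reach `fraction` of its steady-state value."""
    return -tau_s * math.log(1.0 - fraction)

# Hypothetical values, chosen only to show the difference in time scales.
POWER_W = 2.0   # dissipated power
R_TH = 10.0     # thermal resistance, K/W
TAU_S = 1.5     # thermal time constant, seconds

if __name__ == "__main__":
    steady_state = POWER_W * R_TH
    for t in (0.001, 0.010, 0.100, 1.0, 5.0):
        rise = temperature_rise(t, POWER_W, R_TH, TAU_S)
        pct = 100.0 * rise / steady_state
        print(f"t = {t:6.3f} s  rise = {rise:7.3f} K  ({pct:5.1f}% of steady state)")
    print(f"99% settling time: {settling_time(TAU_S):.2f} s")
```

With these assumed numbers, a simulation window of a few milliseconds sees only a tiny fraction of the eventual temperature rise, which is why a run sized for performance analysis says little about the thermal steady state.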

Modeling vs. brute force approaches

The result is that models are being used in a variety of ways—and not for everything.

Arm, for example, backs both cycle-accurate and fast models for analyzing how embedded processors work in complex designs. The company’s approach recognizes the pros and cons of both cycle-accurate models, which faithfully reproduce the complete functionality but which are too slow for many system-level tasks, and fast models, which are functionally correct but lack correct bus timing.

“There is always a desire for something faster,” said Bill Neifert, senior director of market development at Arm. “When you deliver something faster, customers want it to be faster again. No one is willing to trade off the functional accuracy of how the instruction set is executed, because without it a device won’t work. But if you’re just writing application software you may want to trade off certain features for speed, and lightweight models are a response to that. You will not be able to come up with an architectural decision or see other lower level activity such as caching or security based on those. So you can accelerate the execution, but not represent what the actual IP will do. To faithfully represent all of the IP behavior, a lightweight model won’t do, and for accurate bus cycles a step up to a cycle-accurate model is required.”
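The trade-off Neifert describes can be sketched with a toy example. The code below is not Arm’s Fast Models or Cycle Models, and it is far simpler than any real processor model. The instruction set, the latencies and the cache-miss penalty are all hypothetical. It shows only that a functional model and a timing-annotated model can agree on the result while just one of them says anything about cycles.

```python
# Toy three-instruction machine; every name and number here is hypothetical.
PROGRAM = [
    ("li", "r0", 5),             # load immediate
    ("li", "r1", 7),
    ("add", "r2", "r0", "r1"),   # r2 = r0 + r1
    ("load", "r3", 0x10),        # r3 = MEMORY[0x10]
    ("add", "r2", "r2", "r3"),
]
MEMORY = {0x10: 30}
LATENCY = {"li": 1, "add": 1, "load": 3}  # assumed cycles per instruction
MISS_PENALTY = 20                         # assumed extra cycles on first access

def step(regs, instr):
    """Apply one instruction's functional effect to the register file."""
    op = instr[0]
    if op == "li":
        regs[instr[1]] = instr[2]
    elif op == "add":
        regs[instr[1]] = regs[instr[2]] + regs[instr[3]]
    elif op == "load":
        regs[instr[1]] = MEMORY[instr[2]]
    else:
        raise ValueError(f"unknown instruction: {op}")

def run_fast(program):
    """Functional model: correct results, no notion of time."""
    regs = {}
    for instr in program:
        step(regs, instr)
    return regs

def run_timed(program):
    """Timing-annotated model: same results plus a cycle count."""
    regs, cycles, cached = {}, 0, set()
    for instr in program:
        step(regs, instr)
        cycles += LATENCY[instr[0]]
        if instr[0] == "load" and instr[2] not in cached:
            cycles += MISS_PENALTY   # first touch of an address misses
            cached.add(instr[2])
    return regs, cycles

if __name__ == "__main__":
    regs, cycles = run_timed(PROGRAM)
    print(run_fast(PROGRAM) == regs, regs["r2"], cycles)
```

Both functions leave the same value in `r2`, so application software would behave identically on either. Only `run_timed` exposes the cost of the cache miss, which is the kind of lower-level activity a lightweight model hides.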

Maintaining accuracy requires diligence, though, and successive refinement. “It’s a continuous integration because you want to see if your initial assumptions hold,” said Frank Schirrmeister, senior group director for product management and marketing at Cadence. “Sometimes brute force techniques are a good alternative to abstraction. With emulation, you can run one second or one minute in real time and not have to abstract the data at all. This is particularly useful at hardware boot, where extreme parallelism can be competitive.”

And this is where things also can get really fuzzy. Bart Vanthournout, senior manager for R&D at Synopsys, said companies are regularly asking whether to use a brute force approach with emulation or simulation, or whether to utilize abstractions in models.

“What we’re finding is that we need to train users to know what to model, and more importantly, what not to model,” Vanthournout said. “This is all about speed and fidelity. With fidelity, what features does a model need to provide? And what can you run in emulation, simulation and with an FPGA prototype? Often there is value in combining use cases, if it’s all in RTL or on an FPGA accelerator. And you may accelerate this, for example, by putting the host processor in a simulator and the rest on an emulator. But then what do you do with it?”

There is no single answer that fits all designs. The reality is that additional work is required no matter what route chipmakers take.

“The key is successive refinement,” said Cadence’s Schirrmeister. “It’s continuous integration. You want to see if the initial assumptions hold. But you also need more data and more accurate models within the balance of your assumptions.”

Safety-critical models

A big unknown here is how models will fare in markets such as automotive, where requirements for safety and security depend on accuracy.

“One of the big issues is degradation modeling,” said Roland Jancke, head of the department for design methodology at the Fraunhofer EAS. “This depends on electrical loads, thermal loads and mechanical loads. How do you account for vibration? If you put a PCB into a car, there are only a few ways to verify it. And with external mechanical stress, it might no longer work. In addition, there are critical parts that generate noise. You need to model for cross-coupling in circuits.”

This is something rarely understood outside of the analog world, let alone in the automotive market. But it has crossed over into the digital world at 10/7nm and beyond because of thinner insulation layers, thinner wires, and the same or higher clock frequencies.

“We’re finding companies are struggling with complexity of finFET models now,” said Moortec’s King. “This used to be something analog designers worried about because analog modeling was never very good. The concept of pure analog is long gone at these nodes because many IP vendors are using digital power at 7nm. But because process variation is decoupled, the traditional method of simulating circuits is not holding up. So for the analog portion you still do Monte Carlo models. But for timing, you do a clock tree at one process corner and timing at another.”
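A rough way to see why decoupled variation undermines a single process corner is to compare corner checks with Monte Carlo sampling on a toy setup-timing equation. The sketch below is an illustration only, not how any sign-off tool or Moortec flow works. The nominal delays, the 5% sigma and the assumption that clock and data paths vary independently are all hypothetical.

```python
import random

# Hypothetical nominal values, in picoseconds.
PERIOD_PS = 500.0
DATA_NOM_PS = 520.0    # data path delay
CLOCK_NOM_PS = 100.0   # capture clock insertion delay
SIGMA = 0.05           # assumed 1-sigma delay variation (fraction of nominal)

def slack(z_data, z_clock):
    """Setup slack when each path sits z standard deviations from nominal."""
    data = DATA_NOM_PS * (1.0 + SIGMA * z_data)
    clock = CLOCK_NOM_PS * (1.0 + SIGMA * z_clock)
    return PERIOD_PS + clock - data   # positive slack means timing is met

def monte_carlo(n, seed=0):
    """Sample slack with the two paths varying independently (decoupled)."""
    rng = random.Random(seed)
    return [slack(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]

if __name__ == "__main__":
    same_corner = slack(3, 3)    # both paths slow: about +17 ps
    mixed_corner = slack(3, -3)  # data slow, clock fast: about -13 ps
    samples = monte_carlo(10_000)
    failures = sum(s < 0 for s in samples)
    print(f"single slow corner slack: {same_corner:.1f} ps")
    print(f"mixed corner slack:       {mixed_corner:.1f} ps")
    print(f"Monte Carlo failures:     {failures} of {len(samples)}")
```

In this toy case, a single slow corner reports that timing is met, while placing the clock tree at one corner and the data path at another exposes a violation. The Monte Carlo run estimates how often that combination actually occurs under the assumed distribution.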

That’s very difficult to model, and it’s one of the issues that has cropped up at advanced nodes. It also makes it hard for other design team members to utilize those models because they don’t mesh with other models being created for a design.

“It turns into an integration nightmare,” said Kurt Shuler, vice president of marketing at ArterisIP. “Each model has to be at the right level of abstraction. There are big tradeoffs you have to make between having enough data and stripping away too much data from models, which can make them useless. So there are three things you have to worry about here. One is the speed of the model. The second is the abstraction, which is a reflection of how much data it gives you. And the third is visibility into the model. So the model gives you more visibility than the RTL, but if it’s not at the right speed, visibility and accuracy level, it’s not useful.”

On top of that, it’s becoming harder to tell exactly what is useful because it varies by node, by project and by market. And there are so many new markets cropping up these days that it’s not exactly clear what will become critical factors within these markets because there is little to no data available upon which to base some of these models.

“We’re seeing more and more segments that are new,” said Anush Mohandass, vice president of marketing and business development at NetSpeed Systems. “Think about automotive, hyperscale storage and 5G. These are all emerging segments, and there is little legacy.”

Models based on historical data tend to be more accurate. But historical data doesn’t necessarily work at new nodes with some of the various physical effects that are now first-order problems, either. “Once you have all of the parasitics, you can create better models,” said Helic’s Abadir. “Crosstalk, power, timing and thermal all depend on parasitics and models. But models are developed with assumptions, and some of the assumptions were developed in the 1990s. Those are still being used for some of the models being built today.”

This can cause issues in any design, but it’s particularly problematic in safety-critical markets, where models need to tie into the actual implementation at some level.

“In that case, a cycle-accurate model from an actual implementation is needed,” said Neifert. “There is increased software content too, so virtual models play a larger and larger role here. In many cases, this is a new methodology that is being put in place. With the design cycles associated with automotive in the past, they could take their time and still meet their deadlines, but there was much less software content there. This is the first time we’re seeing SoCs and processing tasks for some automotive functions, and that needs a whole new level of modeling.”

There seems to be a fair amount of agreement on that point, but how that actually is implemented isn’t entirely clear. “We abandoned highly accurate fault simulation years ago because while it worked on designs of several thousand gates, it became unwieldy with millions or hundreds of millions of gates,” said Mark Olen, product marketing manager at Mentor, a Siemens Business. “Models are no longer quite so accurate by definition, and when we run tests we assume they are mostly correct by construction. That’s fine if you drop a cell phone call. It’s not okay if it causes an accident in a car.”

Olen said the Portable Stimulus Standard (PSS) may help in this regard, particularly a lesser-known part of the standard. While most of the focus with PSS has been on re-using stimuli for IP blocks on a simulator, it also includes a way to intelligently solve problems across a wider range for a specification. “As it generates more test cases, it provides a mathematical way to keep track of what’s been done before. It also can adapt, because as it traverses specs it can understand the resources available at any point in time and make changes based on the number of DMAs (direct memory access engines) or interrupts or how much memory is accessible.”

Conclusion

Put this all together and it becomes harder to determine where models work, when and how to use them, and which models work well with each other. Joe Davis, product marketing director at Mentor, said DFM models have consistently been able to deal with more corners and rising complexity. But the big advantage there is that foundries are sharing data much more than in the past. “Good models are not necessarily the hardest part. It’s coming up with a model that design teams can utilize,” Davis said.

In other parts of the design, the data used to create models is much weaker, and this is particularly true at the most advanced process nodes. And to make matters worse, the siloed nature of designs tends to favor creation of models for specific purposes, without regard to how they can be integrated with other models by different groups within the design flow.

In all cases, models need to be seen as useful tools, but ones that also have some flaws. That doesn’t necessarily make them less valuable. Still, the key is understanding those flaws and working around them. As statistician George Box famously wrote in a 1978 book (with co-authors William Hunter and J. Stuart Hunter), “All models are wrong; some models are useful.”