AI Infrastructure Is Not One Trade

Product exposure, process-control exposure, and the physical bottlenecks behind the AI capex wave

AI Infrastructure Notes | Part 3

Sinclair Huang

A reader recently left a comment on my ABF substrate piece that stayed with me.

His point was simple: CoWoS and HBM get most of the attention, but substrate materials often sit below the level where many equity models even begin.

I think that observation captures a broader problem in AI infrastructure analysis.

The market usually prices the visible bottleneck first.

GPUs are visible. HBM is visible. CoWoS is visible. CPO is becoming visible. Liquid cooling is visible. Power equipment is becoming visible.

But the durable constraint often sits one layer lower.

Not always in the branded product. Not always in the headline technology. Not always in the part of the supply chain with the cleanest narrative.

Sometimes it sits in substrate material qualification.

Sometimes in CMP.

Sometimes in hybrid bonding.

Sometimes in metrology.

Sometimes in liquid-cooling reliability.

Sometimes in utility interconnection.

Sometimes in the boring field-service problem that only appears after the system leaves the factory.

That is why I do not think AI infrastructure should be analysed as one trade.

It is a sequence of constraints.

And the most important question is not:

Is this company exposed to AI?

The better question is:

Which constraint layer does it actually control?

The 60-second version

AI capex is no longer only a GPU story.

It is moving into networking, optical interconnect, advanced packaging, substrates, cooling, power delivery, grid infrastructure, and manufacturing process control.

But not all AI infrastructure exposure is the same.

I think it is useful to separate three types of exposure.

Product exposure is the easiest to see. These are the visible things the market can narrate: servers, optical modules, switch ASICs, cooling units, power shelves, racks, cables, cabinets, and equipment boxes.

Process-control exposure is harder to see. It sits inside the manufacturing windows that decide yield: ABF substrate complexity, fine-line routing, warpage control, CoWoS process stability, SiN waveguide loss, Cu-Cu hybrid bonding, CMP, inspection, metrology, and reliability qualification.

Infrastructure exposure is slower and often underestimated. It includes data-centre power, substations, transformers, grid interconnection, cooling deployment, field service, energy availability, permitting, and physical construction bottlenecks.

The market tends to price product exposure first because it is easier to explain.

But the bottleneck often migrates toward process-control exposure and infrastructure exposure, because those are the layers where time, yield, qualification, and physical deployment cannot be compressed easily.

So the due diligence question should change.

Do not only ask:

Who sells into AI?

Ask:

Who controls the bottleneck that prevents AI infrastructure from scaling?

What the “AI infrastructure trade” gets wrong

There is nothing wrong with saying that AI infrastructure is a powerful industrial and investment theme.

It is.

The problem starts when the entire theme gets compressed into one label.

A server assembler, an optical-module supplier, a substrate manufacturer, a CMP tool vendor, a liquid-cooling integrator, a transformer manufacturer, and a utility-facing data-centre developer may all be connected to AI capex.

But they are not exposed to the same constraint.

They do not have the same margin structure.

They do not face the same qualification cycle.

They do not scale on the same clock.

They do not fail in the same way.

The market likes simple categories:

AI servers.

Optical communication.

Advanced packaging.

Liquid cooling.

Power infrastructure.

Those categories are useful as a first screen.

They are not enough for real analysis.

The more useful framework is to ask what kind of exposure each company has.

Is it selling a visible product into the capex cycle?

Is it controlling a narrow process window that determines yield?

Or is it sitting inside a slow physical infrastructure constraint that cannot be solved by simply placing a larger purchase order?

Those are very different businesses.

They may all benefit from AI.

But they are not the same trade.

1. Product exposure: easiest to see, easiest to over-narrate

Product exposure is where the market usually starts.

This includes servers, racks, switch ASICs, optical modules, transceivers, cooling units, CDUs (coolant distribution units), power shelves, cables, cabinets, and other visible pieces of the AI data-centre buildout.

Product exposure is attractive because it is easy to understand.

If hyperscalers build more AI clusters, they need more hardware.

If clusters get larger, they need more networking.

If rack power rises, they need more cooling.

If data centres grow, they need more power equipment.

This is the part of the story that can be explained quickly in a chart or a news headline.

That does not make it wrong.

It just makes it incomplete.

The risk with product exposure is that the market often confuses shipping into the theme with controlling the bottleneck.

A company may sell a product into the AI infrastructure buildout and still have weak pricing power.

A company may grow revenue quickly and still compete in a modular, replaceable, or customer-concentrated part of the stack.

A company may be close to the customer but far from the real constraint.

That is why I would not stop at the question:

Does this company have AI exposure?

I would ask:

Is the product scarce because it is technically hard, operationally hard, qualification-limited, supply-constrained, or simply in temporary demand?

Those are different answers.

And they lead to different conclusions.

2. Process-control exposure: where the visible story becomes manufacturing reality

Process-control exposure is harder to see because it often sits below the product layer.

It is not always the part that appears in the headline.

It is the part that determines whether the headline can be produced at yield.

This is where I think many AI infrastructure discussions are still too shallow.

The market sees CoWoS capacity.

But beneath that are process stability, warpage, bumping, substrate availability, thermal-mechanical reliability, inspection, and yield learning.

The market sees HBM.

But beneath that are TSV formation, die stacking, thermal limits, test complexity, packaging integration, and supply-chain coordination.

The market sees CPO.

But beneath that are SiN waveguide loss, optical-grade CMP, Cu-Cu hybrid bonding, Ge photodetector dark current, metrology, reliability, and package-level yield.

The market sees ABF substrates.

But beneath that are material qualification, fine-line / fine-space capability, layer count, warpage control, laser drilling, plating uniformity, inspection, and long customer qualification cycles.

These are not cosmetic details.

They are the places where volume plans either become production or remain roadmap language.

This is why process-control exposure can be more valuable than it looks.

It does not always announce itself with a clean AI product name.

It may appear as equipment, materials, consumables, inspection, metrology, process integration, reliability testing, or a supplier with deep manufacturing history.

The market often underprices this layer because it is difficult to narrate.

It is easier to say:

This company sells optical modules.

It is harder to say:

This company controls the process window that allows an optical engine to yield at scale.

But the second statement may matter more.

ABF is a good example of the hidden layer

ABF substrates are a useful example because they sit below several more visible AI narratives.

When investors discuss AI accelerators, they usually talk about GPUs, HBM, advanced packaging, and sometimes CoWoS.

That is reasonable.

Those are visible bottlenecks.

But the substrate layer is where the package becomes physically routable, mechanically stable, and manufacturable.

As package complexity rises, substrate requirements do not just scale linearly.

Routing density increases.

Layer count rises.

Warpage becomes harder to control.

Thermal-mechanical stress matters more.

Reliability windows tighten.

Inspection and yield become more important.

In other words, the substrate does not become important because it is a fashionable AI component.

It becomes important because the rest of the system is asking it to absorb more complexity.

That is a different kind of importance.

It is not the most visible layer.

But it can become one of the layers that decides whether the visible layer scales.
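A rough way to see why substrate requirements do not scale linearly is the textbook Poisson yield model, Y = exp(-D·A), extended here by assuming that every build-up layer independently sees the same defect density. This is only an illustrative sketch: the defect density, areas, and layer counts are hypothetical round numbers, not data from any supplier, and real substrate yield also depends on warpage, drilling, plating, and defect clustering, none of which this toy model captures.

```python
import math

def substrate_yield(defects_per_cm2_per_layer: float, area_cm2: float, layers: int) -> float:
    """Poisson yield, assuming independent, uniformly distributed defects on every layer.

    Y = exp(-D * A * L): the exponent grows with both area and layer count,
    so yield falls multiplicatively, not linearly, as the package grows.
    """
    return math.exp(-defects_per_cm2_per_layer * area_cm2 * layers)

# Hypothetical inputs for three successively larger packages (not real data).
D = 0.001  # assumed defects per cm^2 per layer
for area, layers in [(25, 10), (50, 14), (100, 20)]:
    print(f"area {area:>3} cm^2, {layers} layers -> yield {substrate_yield(D, area, layers):.3f}")
```

In this toy model, doubling both area and layer count multiplies the exponent by four, so a package that yields roughly 78% drops to roughly 37%.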

This is the pattern I think repeats across AI infrastructure.

The market first finds the named product.

Then the constraint moves into the process layer underneath it.

CPO is another example: not optics alone, but a manufacturing window

CPO (co-packaged optics) is often framed as optics moving closer to the switch ASIC because copper electrical links are running out of reach and power headroom as lane rates rise.

That framing is directionally right.

But it is incomplete.

The harder question is whether the manufacturing window survives.

SiN waveguide loss is not just an optical-design problem.

It depends on deposition, film quality, sidewall roughness, top-surface roughness, CMP, annealing, and wafer-scale variation.

Cu-Cu hybrid bonding is not just a packaging slogan.

It depends on BEOL copper, Cu recess control, surface oxidation, void suppression, bonding yield, and interface reliability.

Ge photodetector dark current is not just a detector-spec problem.

It depends on epitaxy, implant damage, threading dislocations, anneal history, and BEOL thermal budget.

Each module may look mature by itself.

The difficulty appears when all of them have to coexist inside one manufacturable package.
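The coexistence problem can be put in numbers under the simplest possible assumption: a package ships only if every module passes, and module failures are independent. Under that assumption, package yield is the product of module yields. The module names below follow the paragraphs above; the percentages are hypothetical placeholders, and correlated failures (for example, one thermal excursion damaging several modules at once) would change the result.

```python
from math import prod

# Hypothetical per-module yields for a CPO package (placeholders, not measured data).
MODULE_YIELDS = {
    "SiN waveguide loss within spec": 0.97,
    "optical-grade CMP": 0.96,
    "Cu-Cu hybrid bond, void-free": 0.95,
    "Ge photodetector dark current within spec": 0.97,
    "package-level assembly and test": 0.95,
}

def compound_yield(module_yields) -> float:
    """Package yield when every module must pass and failures are independent."""
    return prod(module_yields)

package_yield = compound_yield(MODULE_YIELDS.values())
print(f"lowest single-module yield: {min(MODULE_YIELDS.values()):.2f}, "
      f"package yield: {package_yield:.3f}")
```

With every module individually at 95% to 97%, the compound yield in this sketch is only about 82%. That is the sense in which modules that each look mature can still add up to an immature package.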

That is why I think CPO should not be analysed only as:

Who has optical exposure?

The better question is:

Who controls the process window that the optical story depends on?

This is the difference between product exposure and process-control exposure.

Product exposure tells you who is near the narrative.

Process-control exposure tells you who may control the yield.

3. Infrastructure exposure: the slow clock

The third category is infrastructure exposure.

This is the slowest and often the most misunderstood.

AI data centres do not scale only by ordering more chips.

They require power, land, grid connection, substations, transformers, cooling systems, construction labour, water or liquid-cooling infrastructure, permits, and operational reliability.

Some of these constraints move on a semiconductor clock.

Many do not.

A chip roadmap may move in quarters.

A packaging capacity expansion may move in years.

A grid interconnection, substation buildout, transformer procurement cycle, or site-level power upgrade may move on an even slower clock.

This matters because AI infrastructure is increasingly a power-delivery and heat-removal problem.

The bottleneck does not always remain inside the chip.

It can migrate outward.

From GPU allocation to advanced packaging.

From advanced packaging to substrates.

From substrates to networking.

From networking to cooling.

From cooling to power delivery.

From power delivery to grid connection.

At each stage, the market looks for a new product category.

But the real question is again:

Which constraint layer is hardest to expand?

That is where the value may concentrate.
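One way to make "which constraint layer is hardest to expand" concrete is to treat deployable capacity as the minimum across layers, a weakest-link model, and let each layer grow at its own rate. The sketch below uses a simplified subset of the layers listed above. The starting capacities and growth rates are hypothetical, chosen only to show how the binding layer can migrate even while every layer is growing.

```python
# Hypothetical (starting capacity, annual growth multiplier) per layer.
# Units are arbitrary "deployable AI capacity" units; none of this is real data.
LAYERS = {
    "advanced packaging": (100.0, 1.80),
    "substrates": (130.0, 1.45),
    "cooling": (200.0, 1.20),
    "grid connection": (300.0, 1.05),
}

def binding_constraint(layers: dict[str, tuple[float, float]], year: int) -> tuple[str, float]:
    """Return the tightest layer in a given year and its capacity.

    System capacity is modelled as the minimum across layers: the system can
    only deploy as much as its tightest layer allows.
    """
    capacities = {name: start * growth**year for name, (start, growth) in layers.items()}
    tightest = min(capacities, key=capacities.get)
    return tightest, capacities[tightest]

for year in range(6):
    layer, capacity = binding_constraint(LAYERS, year)
    print(f"year {year}: binding layer = {layer:<18} system capacity = {capacity:7.1f}")
```

In this toy setup the binding layer moves from advanced packaging to substrates, then to cooling, then to grid connection. Once the slowest-growing layer binds, faster growth in the other layers no longer lifts system capacity much.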

Cooling is not just a product category

Liquid cooling is another example where the first-order market narrative can be too simple.

The product story is easy.

Higher rack power creates more heat.

More heat requires liquid cooling.

Therefore, cooling suppliers benefit.

That is directionally right.

But it is not enough.

The harder questions are operational.

How reliable are the connectors?

How is leakage detected?

How is field service handled?

Who owns maintenance risk?

How does the cooling loop behave under partial load, failure, or maintenance?

How does the cooling architecture interact with facility design, rack layout, power density, and customer qualification?

A cold plate or CDU can be a product.

A deployed liquid-cooling system is an operating environment.

Those are not the same thing.

This is why I would separate product exposure from execution exposure.

The market may first reward the product label.

But long-term value may depend on reliability, serviceability, integration know-how, and the ability to support large-scale deployments without unacceptable field failure.

Again, the bottleneck moves from the visible object to the system that makes the object usable.

Power infrastructure is where AI becomes least abstract

Power is the least abstract layer of AI infrastructure.

A model can be discussed in parameters.

A GPU cluster can be discussed in tokens or FLOPS.

A data centre eventually has to be discussed in megawatts.

Once the conversation reaches power, software language stops being enough.

The constraints become physical.

Transformers.

Substations.

Switchgear.

Cables.

UPS systems.

Power distribution.

Grid interconnection.

Generation availability.

Utility planning.

Construction timelines.

This is why I think the AI infrastructure story is gradually moving from compute scarcity to delivered-power scarcity.

That does not mean chips stop mattering.

It means the system constraint expands.

A data centre full of accelerators is not useful if it cannot receive enough reliable power, remove enough heat, and connect enough network capacity.

At this layer, the relevant clock is slower.

The market can revise a capex narrative in one day.

A utility interconnection queue cannot always move on that same schedule.

That time mismatch is part of the bottleneck.
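The time mismatch can be sketched as a critical-path calculation: if every input to a site is ordered on the same day and the work runs in parallel, the site is ready only when the slowest input arrives. The lead times below are hypothetical placeholders, not quotes from any utility or vendor. The point of the sketch is structural: shortening a fast item does not move the go-live date.

```python
# Hypothetical lead times in months, for illustration only (not vendor or utility data).
LEAD_TIMES_MONTHS = {
    "accelerator allocation": 6,
    "servers and racks": 4,
    "liquid-cooling plant": 12,
    "transformers and switchgear": 30,
    "grid interconnection": 48,
}

def earliest_go_live(lead_times: dict[str, int]) -> tuple[str, int]:
    """Assume all items start on day one and proceed in parallel.

    The site is ready only when the slowest item is ready, so the go-live month
    is the maximum lead time, and that item sits on the critical path.
    """
    critical = max(lead_times, key=lead_times.get)
    return critical, lead_times[critical]

baseline = earliest_go_live(LEAD_TIMES_MONTHS)
# Halve the chip lead time: the go-live date does not move.
faster_chips = earliest_go_live({**LEAD_TIMES_MONTHS, "accelerator allocation": 3})
print(baseline, faster_chips)
```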

A practical due diligence framework

When I look at any AI infrastructure company, I now try to ask five questions.

The first is:

What kind of exposure is this?

Is it product exposure, process-control exposure, infrastructure exposure, or some combination of the three?

The second is:

What exactly becomes scarce?

Is it units, yield, qualification capacity, engineering talent, equipment availability, material supply, permitting, field-service capacity, or power connection?

The third is:

What is the clock speed of the constraint?

Can it be expanded in quarters, years, or decades?

The fourth is:

Where does failure show up?

Does the failure appear as device performance, package yield, field reliability, customer qualification delay, power unavailability, or margin compression?

The fifth is:

Who has the right to say no?

This is the most important question.

A true bottleneck is not merely a supplier that benefits from demand.

A true bottleneck is a layer where the system cannot move faster unless that layer works.

That is where pricing power, strategic importance, and durability can appear.
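The five questions are easier to keep honest when they are recorded as structured fields rather than as a narrative. The sketch below shows one hypothetical way to do that; the type names, the example company, and its answers are all invented for illustration. Exposure is modelled as a Flag so that a single company can carry the combination of types the first question allows for.

```python
from dataclasses import dataclass
from enum import Enum, Flag, auto

class Exposure(Flag):
    PRODUCT = auto()
    PROCESS_CONTROL = auto()
    INFRASTRUCTURE = auto()

class Clock(Enum):
    QUARTERS = auto()
    YEARS = auto()
    DECADES = auto()

@dataclass
class DueDiligence:
    name: str
    exposure: Exposure          # Q1: what kind of exposure (flags can be combined)
    scarce: str                 # Q2: what exactly becomes scarce
    clock: Clock                # Q3: how fast the constraint can be expanded
    failure_shows_up_as: str    # Q4: where failure shows up
    right_to_say_no: bool       # Q5: can the system not move faster unless this layer works?

    def summary(self) -> str:
        kinds = ", ".join(m.name.lower() for m in Exposure if m in self.exposure)
        verdict = "bottleneck candidate" if self.right_to_say_no else "beneficiary, not gatekeeper"
        return (f"{self.name}: {kinds} exposure; scarce: {self.scarce}; "
                f"clock: {self.clock.name.lower()}; failure: {self.failure_shows_up_as}; {verdict}")

# Entirely hypothetical example company and answers.
example = DueDiligence(
    name="HypotheticalSubstrateCo",
    exposure=Exposure.PRODUCT | Exposure.PROCESS_CONTROL,
    scarce="customer qualification capacity",
    clock=Clock.YEARS,
    failure_shows_up_as="customer qualification delay",
    right_to_say_no=True,
)
print(example.summary())
```

Keeping the fifth answer as an explicit field makes it harder to let a strong product story stand in for an answer to "who has the right to say no?"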

What this means for investors

For investors, the temptation is to sort companies into AI winners and non-AI companies.

I think that is too crude.

The better segmentation is:

Narrative exposure: the company is close to a market theme and may move when the theme is hot.

Revenue exposure: the company actually ships products into AI-related demand.

Bottleneck exposure: the company controls a constraint that limits how fast the system can scale.

These are not the same.

A company can have strong narrative exposure but weak bottleneck control.

A company can have visible revenue exposure but limited pricing power.

A company can have modest visibility but deep process-control importance.

The market usually finds narrative exposure first.

Then it looks for revenue exposure.

Only later does it understand bottleneck exposure.

That sequence creates both opportunity and risk.

It can overprice companies that are easy to narrate.

It can underprice companies that sit inside boring but necessary process or infrastructure layers.

So the question is not:

Which stock is an AI infrastructure stock?

The question is:

Which company controls a constraint that the AI buildout cannot bypass?

That is a much harder question.

But it is also the more useful one.

What this means for operators

For operators, the lesson is different.

The AI infrastructure stack is becoming too interconnected for single-module optimisation.

A packaging decision can affect substrate stress.

A substrate decision can affect signal integrity and reliability.

A networking architecture can affect optical requirements.

An optical integration decision can affect thermal budget and metrology.

A rack-power decision can affect cooling and facility design.

A cooling decision can affect reliability and service operations.

A data-centre site decision can affect power availability and deployment schedule.

The old boundaries are still useful for organising teams.

They are less useful for understanding where the system breaks.

This is why cross-module thinking is becoming more valuable.

The most important people in the AI infrastructure stack may not be the ones who know one module perfectly.

They may be the ones who can see how a small tolerance problem in one layer becomes a system-level constraint somewhere else.

What would make this framework wrong?

A useful framework should have failure conditions.

The first would be a broad AI capex slowdown.

If hyperscaler spending slows materially, many product-exposure layers would feel the impact quickly. In that case, the market may stop caring about downstream bottlenecks and return to a simpler demand-cycle view.

The second would be faster standardisation.

If certain AI infrastructure components become modular, commoditised, and interchangeable faster than expected, product exposure may matter less and margin pressure may arrive earlier.

The third would be process-window expansion.

If manufacturing learning, equipment improvement, design simplification, or architectural shifts make certain process-control problems easier, then the bottleneck may move elsewhere.

The fourth would be substitution.

If one architecture avoids a constraint that another architecture depends on, the bottleneck may not disappear; it may migrate.

The fifth would be policy or power-market changes.

Grid access, energy availability, permitting, and regional incentives can shift the infrastructure layer in ways that are not purely technical.

That is why I do not think this framework should be used as a static map.

It is a way to ask better questions as the bottleneck moves.

The boring questions are usually the useful ones

The more I study AI infrastructure, the more I think the least glamorous questions are often the most revealing.

What does the substrate yield map look like?

How long does material qualification take?

Where does warpage show up?

Which process step controls the defect Pareto?

What does the CMP distribution look like across the wafer?

How is void inspection done after bonding?

What happens when the cooling system fails in the field?

Who owns the service call?

How long is the transformer lead time?

Which site can actually receive power?

These questions are not as exciting as asking which model will win or which chip is fastest.

But they are closer to the physical truth of the AI buildout.

AI infrastructure is not one trade.

It is not one product category.

It is not one supply chain.

It is a sequence of bottlenecks that migrate as the system scales.

The market prices the visible layer first.

The edge is often in finding the layer beneath it.


Author note

I write about AI infrastructure, semiconductor manufacturing, and supply-chain bottlenecks from the perspective of process integration and value distribution. My goal is not to turn every technical constraint into an investment thesis. It is to understand where the system loses the ability to move faster. That usually requires looking below the most visible product layer. All views are my own.

Disclaimer

This article is for research and educational purposes only. It is not investment advice, financial advice, technical qualification advice, legal advice, or a recommendation to buy or sell any security.

The analysis is based on publicly available information, non-confidential technical context, and my personal analysis and opinions. It does not rely on proprietary, confidential, or material non-public information.

Any company, technology, or process mentioned may have internal capabilities, production data, or design choices that are not publicly disclosed. Actual production specifications may differ across vendors, process generations, and customer programs.

Errors or omissions are my own. Corrections with public references are welcome.

Disclosure

No company mentioned in this article has reviewed, sponsored, approved, or compensated me for this work.

I may currently hold, and may buy or sell, securities of companies mentioned in this article or in the broader AI infrastructure and semiconductor ecosystem at any time without further notice.

Nothing in this article constitutes investment advice, a financial recommendation, or a solicitation to buy or sell any security. Readers should conduct their own due diligence.

Hashtags

#AIInfrastructure #Semiconductors #AdvancedPackaging #DataCenters #PowerInfrastructure
