CMU physicist challenges what we know about particle physics with machine learning

September 22, 2025

6 minutes

By Mackenzie Devereux

The Standard Model, depicting the fundamental building blocks of the universe, including top quarks. **Courtesy of Cush via Wikimedia Commons**

Sitong An, a physicist at Carnegie Mellon University, has made a finding that could change how we understand the universe. Using machine learning techniques, he has found evidence that four top quarks (4TQs), which are among the heaviest and rarest subatomic particles, can be produced simultaneously at rates far higher than current physics models predict. His research could point to new particles or forces, and it could also signal a fundamental shift in how we explore and interpret reality.

Of the six members of the quark family, the top quark is by far the heaviest and the rarest. Imagine squeezing the mass of an entire gold atom into a point roughly 300 trillion times smaller than the atom, and you’d get something like a top quark. Because top quarks have so much mass, they require more energy to produce and decay into other, lighter particles almost immediately. Their mass also makes them the most sensitive to ‘new physics,’ physical processes that are happening but that we don’t yet know about or understand: of all the quarks, the top quark is the most likely to interact with these mystery particles and forces. That’s why we care so much about four top quark production: this process opens a window onto an unexplored area of the universe like no other particle can.

These particles, and the forces that govern them, are best described by the Standard Model (SM), the foundation of modern particle physics. The culmination of decades of work by countless physicists, the SM is currently our best framework for understanding how particles smaller than atoms act as the building blocks of the universe. The SM describes 17 subatomic particles, including the top quark, which like every quark has an antimatter counterpart (here, the anti-top quark). While the SM has been remarkably successful in describing particles and predicting their interactions, it also has limitations, especially when it comes to rare events like 4TQ production.

The production of 4TQs is significantly rarer than the production of a single top-quark pair, happening only once in one trillion particle collisions. Why? Because instead of creating one top quark/anti-top quark pair, the collision has to create two such pairs at once, four heavy particles in total.

The SM makes statistical predictions about how often we should see this special quartet produced, based on what we know about proton collisions and how other particles interact. When experimental results differ significantly from what the SM predicts, many scientists interpret this as a clue that there may be more to physics than what we currently understand, and that the SM is not a complete representation of the whole physical picture.

So, if these particles are so important to our understanding of physics and beyond, why haven’t we already figured them out? That’s because looking for 4TQs in our currently available data is like looking for four needles in a haystack made of needles, i.e. extremely difficult.

Most of the data that physicists rely on to study particle collisions, including 4TQ production, comes from the Large Hadron Collider (LHC), the world’s most powerful particle accelerator. Since becoming operational in 2008, the LHC has produced petabytes of data by accelerating two beams of protons in opposite directions to 99.9999991 percent of the speed of light, only a few meters per second short of light’s 299,792,458 meters per second, and smashing them together. Yet even in such a vast dataset, 4TQ events remained elusive.
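As a rough check on the speed figure, the sketch below converts the quoted fraction of light speed into meters per second, a Lorentz factor, and an energy per proton. The proton rest energy (about 0.938 GeV) is a standard value that does not appear in the article, and all variable names are illustrative.

```python
import math

C = 299_792_458.0            # speed of light in m/s (exact by definition)
PROTON_REST_GEV = 0.938272   # proton rest energy in GeV (assumed standard value)

# 99.9999991 percent of c leaves a fractional shortfall of 9e-9.
shortfall = 9e-9

speed = C * (1.0 - shortfall)

# gamma = 1 / sqrt(1 - beta^2). With beta = 1 - shortfall,
# 1 - beta^2 = shortfall * (2 - shortfall), which avoids subtracting
# two nearly equal floating-point numbers.
gamma = 1.0 / math.sqrt(shortfall * (2.0 - shortfall))

energy_tev = gamma * PROTON_REST_GEV / 1000.0

print(f"speed: {speed:,.1f} m/s ({C - speed:.2f} m/s below c)")
print(f"Lorentz factor: {gamma:,.0f}")
print(f"energy per proton: {energy_tev:.2f} TeV")
```

Under these assumptions, each proton travels less than 3 meters per second slower than light and carries roughly 7 TeV of energy, which is in line with the LHC's design beam energy.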

Because top quarks decay almost instantly, we don’t observe them directly. Instead, we look for them by tracing the sprays of particles that come from their decay, called jets. However, many far more common processes also produce jets, so the 4TQ signature blends into the background noise of the data. This makes finding and distinguishing the already rare signal events far more challenging.

This may seem like a dead end for our study of 4TQs, and it was. After all, how can we study something we can’t even detect? That was until Sitong An recently published his Ph.D. dissertation in Experimental High Energy Physics at Carnegie Mellon University on this subject. An’s solution to the impasse? Bridge machine learning and particle physics to go where we’ve never gone before. Machine learning and decision-making models had previously been used to identify the production of single top quarks; An adapted this strategy to pick the 4TQ signal out of all the background noise. This approach also allowed An to predict not only how often background events occur, but also the shape of their distributions, giving him a more precise way to recognize noise and distinguish it from real quark production. Through a multi-phase training and implementation process, An expanded the applications of AI in physics to solve this nearly impossible problem.

So, after all of this work, was he able to detect 4TQs? In fact, he was able to do much more than that. Using his machine learning strategies, An found the 4TQ signal in Runs 2 and 3 of the LHC, two major data collection periods spanning from 2015 to 2018 and from 2022 onward.

During these runs, An found that 4TQ production occurred at a rate 2.7 times higher than the SM prediction, suggesting that something else is happening in this process that can’t be explained with our current model. But how do we know this isn’t just a fluke measurement? To measure significance, physicists use a scale called the sigma (σ) scale. Scoring a 5 on this scale is like scoring 1600 on the SAT; you’ve hit the jackpot and are officially allowed to declare a discovery. Physicists can make such lofty claims at 5σ because it represents only a 1 in 3.5 million chance of the result coming from random variation or statistical noise. At 4σ, you can claim strong evidence but not quite a discovery, with roughly a 1 in 32,000 chance that your result is a fluke. An’s sigma value was 3.95σ, just shy of that strong-evidence mark, though still short of the 5σ discovery threshold. Especially when compared to the predicted value of 1.65σ, An’s results point strongly to something big.
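The link between a sigma value and the chance of a fluke can be sketched in a few lines. This is an illustration rather than An's analysis: it assumes a Gaussian test statistic and the one-sided tail convention commonly used in particle physics, and the function names are hypothetical.

```python
import math

def one_sided_p(sigma: float) -> float:
    """Probability that a standard normal variable exceeds `sigma`."""
    return 0.5 * math.erfc(sigma / math.sqrt(2.0))

def one_in(p: float) -> float:
    """Express a probability as 'one chance in N'."""
    return 1.0 / p

for sigma in (1.65, 3.95, 4.0, 5.0):
    p = one_sided_p(sigma)
    print(
        f"{sigma:>4}σ  p = {p:.3g}  "
        f"one-sided: 1 in {one_in(p):,.0f}  "
        f"two-sided: 1 in {one_in(2 * p):,.0f}"
    )
```

Under the one-sided convention, 5σ corresponds to about 1 in 3.5 million and 4σ to about 1 in 32,000. Counting fluctuations in both directions (two-sided) halves those odds, so 4σ becomes roughly 1 in 16,000.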

What is happening beyond the SM that could cause experimental results to deviate so significantly from the predicted values? In short, physicists haven’t reached a consensus about what lies beyond the current theories. Some physicists hypothesize that there is an undiscovered particle lurking below our detection threshold. While this would challenge the foundations of particle physics, string theory, and how we understand the universe, an increasing number of physicists are suggesting this mystery particle might actually exist.

Evidence of 4TQ production at a rate far exceeding predictions may be more than a statistical anomaly; it could be a deep clue that points towards a new chapter in particle physics. An’s results not only allow us to better detect and study four top-quark production, but also exemplify how artificial intelligence is reshaping the way we conduct science. The growing use of machine learning in medical imaging, renewable energy, drug discovery, and now particle physics is just the beginning of applying it to further scientific discovery. So, what’s next for quarks and particle physics? Nobody knows. Is there a mysterious phantom particle we have yet to discover, an unknown force interaction, or something beyond what we can imagine? Thanks to physicists like An, we just might find out.
