# Introduction to Parton Distribution Functions

Joël Feltesse (2010), Scholarpedia, 5(11):10160. | doi:10.4249/scholarpedia.10160 | revision #91386

The momentum distribution functions of the partons within the proton are called **Parton Distribution Functions**.


## Definition of PDFs

The name **parton** was proposed by Richard Feynman in 1969 as a generic term for any constituent of the proton, neutron and other hadrons. Today these particles are identified as **quarks** and **gluons**.

There are six types of quarks, known as flavours: up (u), down (d), charm (c), strange (s), top (t) and bottom (b). The antiparticles of quarks are antiquarks. Quarks have various intrinsic properties, including electric charge, spin, mass and colour charge.

Quarks carry a fractional electric charge of either -1/3 or +2/3 times the elementary charge (in units where the electron has charge -1), depending on flavour. Up, charm and top quarks have charge +2/3, while down, strange and bottom quarks have charge -1/3. The quarks that determine the quantum numbers of hadrons are called **constituent or valence quarks**. For example, the proton is composed of two up quarks (denoted below \(u_v\)) and one down quark (denoted below \(d_v\)), and the neutron of two down quarks and one up quark.
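As a quick arithmetic check of these charge assignments, the Python sketch below sums the valence-quark charges of a hadron with exact fractions. The dictionary and function names are illustrative choices, and antiquarks are deliberately not handled.

```python
from fractions import Fraction

# Electric charge of each quark flavour, in units of the elementary charge.
QUARK_CHARGE = {
    "u": Fraction(2, 3), "c": Fraction(2, 3), "t": Fraction(2, 3),
    "d": Fraction(-1, 3), "s": Fraction(-1, 3), "b": Fraction(-1, 3),
}


def hadron_charge(valence: str) -> Fraction:
    """Total charge of a hadron given its valence quarks, e.g. "uud"."""
    return sum((QUARK_CHARGE[q] for q in valence), Fraction(0))


print(hadron_charge("uud"))  # proton: 1
print(hadron_charge("udd"))  # neutron: 0
```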

Quarks are spin-1/2 particles. The direction of a quark's spin is called its **polarisation**.

Quarks possess a property called colour charge, which comes in three types; each quark carries one colour. The attraction and repulsion between coloured quarks is called the strong interaction, and it is mediated by force-carrying particles known as gluons. Gluons, like photons, are massless, have spin 1 and no electric charge; unlike photons, however, they carry colour charge. The theory that describes strong interactions is called **Quantum Chromodynamics (QCD).**

The three-quark model, which assumes that a proton or a neutron is made of three free, non-interacting quarks in a bag, is too simple. It cannot account for processes such as the inelastic scattering of electrons off protons. The valence quarks are embedded in a **sea** of virtual quark-antiquark pairs generated by the gluons that hold the quarks together in the proton. All of these particles (valence quarks, sea quarks and gluons) are partons.

The partonic structure of a nucleon is best probed in scattering processes such as Deep Inelastic Scattering (**DIS**) of leptons (electrons, muons or neutrinos) off nucleons, in which the lepton acts as a probe that transfers a four-momentum of modulus **\(q\)** to the nucleon in the collision. The 1990 Nobel Prize in Physics was awarded to Jerome Friedman, Henry Kendall and Richard Taylor for their pioneering electron-proton DIS experiments at SLAC in the late 1960s, which provided the first evidence for a partonic structure of the nucleon.

In DIS the resolving power of the probe is approximately \(\hbar/q\) (that is, \(\hbar c/q\) with \(q\) expressed in energy units), so finer structure is revealed as \(q\) increases. For \(q = 100\) GeV the resolution is roughly 0.002 fm, sufficient to probe the internal structure of the nucleon.

It is convenient to consider a frame in which the target nucleon has a very large momentum. In such a frame the momentum of each parton is almost collinear with the nucleon momentum, so the target can be seen as a stream of partons, each carrying a fraction **\(x\)** of the longitudinal momentum. When the spin direction of the partons is not considered, the momentum distribution functions of the partons within the proton are simply called **Parton Distribution Functions** (**PDFs**). They represent the **probability densities** of finding a parton carrying a momentum fraction \(x\) at a squared energy scale **\(Q^2\)** (\(=-q^2\)); strictly speaking they are number densities, since they are normalised to the number of partons.

DIS experiments have shown that, as \(Q^2\) increases, the number of partons rises at low \(x\) and falls at high \(x\). At low \(Q^2\) the three valence quarks become increasingly dominant in the nucleon. At high \(Q^2\) there are more and more quark-antiquark pairs carrying a small momentum fraction \(x\); they constitute the sea quarks. A salient finding of the DIS experiments is that quarks and antiquarks carry only about half of the nucleon momentum, the remainder being carried by gluons. The fraction carried by gluons increases with increasing \(Q^2\).
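Converting the resolving power into a length only requires the constant \(\hbar c \approx 0.1973\) GeV fm. The short Python sketch below does this for a few values of \(q\); the function name is illustrative.

```python
HBAR_C_GEV_FM = 0.1973269804  # hbar * c in GeV * fm


def resolution_fm(q_gev: float) -> float:
    """Approximate spatial resolution hbar*c/q, in fm, for a momentum transfer q in GeV."""
    return HBAR_C_GEV_FM / q_gev


for q in (1.0, 10.0, 100.0):
    print(f"q = {q:6.1f} GeV -> resolution ~ {resolution_fm(q):.4f} fm")
```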

The central feature of QCD is **asymptotic freedom**, discovered in 1973 by David Gross, David Politzer and Frank Wilczek (Nobel Prize in Physics 2004). It implies that the interactions between partons within a nucleon become arbitrarily weak at short distances. QCD gives quantitative predictions for the rate of change of the parton distributions as the energy scale \(Q^2\) varies. This rate is governed by the QCD evolution equations for parton densities (Gribov and Lipatov 1972; Altarelli and Parisi 1977; Dokshitzer 1977), known as **DGLAP**, in the domain where perturbative calculations can be applied, that is, where the running coupling constant \(\alpha_s(Q^2)\) of QCD is much smaller than one (\(\alpha_s(Q^2)\ll 1\)). The equations have been formulated at different levels of approximation, corresponding to successive powers of \(\alpha_s(Q^2)\) in the perturbative expansion, usually called Leading Order (**LO**), i.e. first order in \(\alpha_s(Q^2)\), Next-to-Leading Order (**NLO**) and Next-to-Next-to-Leading Order (**NNLO**). In the following we consider parton distribution functions obtained with evolution equations at NLO, the most widely used order.
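Both ingredients of this paragraph, a coupling that decreases as \(Q^2\) grows and a perturbative evolution of parton densities, can be illustrated at LO with a minimal Python sketch. It assumes the one-loop running coupling at a fixed number of flavours \(n_f = 5\) (flavour thresholds are ignored), and it evolves Mellin moments \(M_N = \int_0^1 x^{N-1} q_{NS}(x)\,dx\) of a non-singlet quark combination such as \(u_v\), for which the LO DGLAP equation becomes multiplicative in each moment. The reference value \(\alpha_s(M_Z^2) = 0.118\), the starting moment and all function names are illustrative assumptions, not taken from any PDF group's code.

```python
import math

C_F = 4.0 / 3.0  # QCD colour factor


def beta0(nf: int) -> float:
    """One-loop beta-function coefficient, beta0 = 11 - 2 nf / 3."""
    return 11.0 - 2.0 * nf / 3.0


def alpha_s(q2: float, alpha_ref: float, q2_ref: float, nf: int = 5) -> float:
    """One-loop running coupling with alpha_s(q2_ref) = alpha_ref.

    Exact solution of d alpha_s / d ln Q^2 = -beta0 alpha_s^2 / (4 pi)
    at fixed nf (no flavour thresholds in this sketch).
    """
    b = beta0(nf) / (4.0 * math.pi)
    return alpha_ref / (1.0 + alpha_ref * b * math.log(q2 / q2_ref))


def gamma_ns(n: int) -> float:
    """N-th Mellin moment of the LO splitting function P_qq(x):
    C_F [3/2 + 1/(N(N+1)) - 2 S_1(N)], with S_1(N) = sum_{k=1}^{N} 1/k.
    It vanishes for N = 1, which expresses quark-number conservation."""
    s1 = sum(1.0 / k for k in range(1, n + 1))
    return C_F * (1.5 + 1.0 / (n * (n + 1)) - 2.0 * s1)


def evolve_ns_moment(m0, n, q2, q2_0, alpha_ref, q2_ref, nf=5):
    """LO evolution of a non-singlet moment, solving
    dM_N/d ln Q^2 = alpha_s/(2 pi) * gamma_N * M_N:
    M_N(Q^2) = M_N(Q0^2) * [alpha_s(Q^2) / alpha_s(Q0^2)]^(-2 gamma_N / beta0)."""
    ratio = alpha_s(q2, alpha_ref, q2_ref, nf) / alpha_s(q2_0, alpha_ref, q2_ref, nf)
    return m0 * ratio ** (-2.0 * gamma_ns(n) / beta0(nf))


MZ2 = 91.1876 ** 2   # Z mass squared in GeV^2
ALPHA_MZ = 0.118     # illustrative alpha_s(M_Z^2)

# Asymptotic freedom: the coupling decreases as Q grows.
for q in (2.0, 10.0, 100.0):
    print(f"Q = {q:5.1f} GeV: alpha_s = {alpha_s(q * q, ALPHA_MZ, MZ2):.3f}")

# Number (N=1) and momentum (N=2) moments of a valence distribution evolved
# from Q0^2 = 4 GeV^2 to Q^2 = 10^4 GeV^2; 0.30 is a made-up starting value.
print(evolve_ns_moment(2.0, 1, 1.0e4, 4.0, ALPHA_MZ, MZ2))   # 2.0: number conserved
print(evolve_ns_moment(0.30, 2, 1.0e4, 4.0, ALPHA_MZ, MZ2))  # momentum fraction decreases
```

The \(N=1\) moment staying fixed while the \(N=2\) moment decreases mirrors the statement above that valence quarks lose momentum fraction to the sea and gluons as \(Q^2\) grows.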

The DGLAP differential equations give the \(Q^2\) dependence but cannot predict the \(x\) dependence of the parton distributions at a given \(Q^2\); this has to be extracted from the data. The parton distributions are related to the observable cross sections by the QCD factorisation theorems (see, for example, Collins, Soper and Sterman 2009). The cross section of a hard process can be written as a calculable partonic interaction convoluted with the parton densities. The factorisation theorems are the basis both for extracting PDFs from some processes and for applying perturbative calculations to many important processes involving hadrons. In DIS the factorised cross section reads
\[
\sigma(x,Q^2) \approx \sum_{a} C_a \otimes f_{a/A}(x,Q^2) + \text{remainder}
\]
Here \(C_a\) is the calculable part and \(f_{a/A}\) is the distribution of parton \(a\) in a hadron of type \(A\); the sum runs over all types of partons \(a\). It is conventional to call the first term on the right-hand side of the above equation the leading-twist contribution. The remainder is called the higher-twist correction; it is formally of order \(1/Q^2\) but not precisely known, and it is often neglected when extracting PDFs from the cross sections. The convolution of the coefficient functions \(C_a\) with the parton distributions is not uniquely defined at NLO. Usually the coefficient functions \(C_a\) and the parton distributions are defined in the modified minimal subtraction factorisation scheme, called **MS-bar** (\(\overline{\mathrm{MS}}\)) (see MS-bar definition of parton distribution functions).
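The symbol \(\otimes\) denotes the standard convolution in momentum fraction, \((C \otimes f)(x) = \int_x^1 \frac{dz}{z}\, C(z)\, f(x/z)\). The Python sketch below evaluates it numerically for smooth, user-supplied functions and checks it against two closed-form cases; the function name `convolve` and the test inputs are illustrative. Real coefficient functions also contain distributions such as \(\delta(1-z)\) and plus-distributions, which this simple quadrature does not handle.

```python
import math
from typing import Callable


def convolve(c: Callable[[float], float], f: Callable[[float], float],
             x: float, n_steps: int = 2000) -> float:
    """(C ⊗ f)(x) = integral_x^1 dz/z C(z) f(x/z) for smooth C and f.

    With z = exp(t), dz/z becomes dt on [ln x, 0]; the integral is then
    evaluated with the composite midpoint rule.
    """
    t_min = math.log(x)
    h = -t_min / n_steps
    total = 0.0
    for k in range(n_steps):
        z = math.exp(t_min + (k + 0.5) * h)
        total += c(z) * f(x / z)
    return total * h


# Checks against closed forms:
#   C(z) = 1, f(y) = 1  ->  -ln x
#   C(z) = 1, f(y) = y  ->  1 - x
x = 0.1
print(convolve(lambda z: 1.0, lambda y: 1.0, x), -math.log(x))
print(convolve(lambda z: 1.0, lambda y: y, x), 1.0 - x)
```

At LO the DIS quark coefficient function is proportional to \(\delta(1-z)\), so the convolution collapses to the parton density itself; numerical integration of this kind is only needed for the regular parts of higher-order coefficient functions.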

## Method of determination

PDF sets are obtained by fitting a large number of cross-section data points, covering a wide grid of \(Q^2\) and \(x\) values, from many experiments. The most commonly used procedure is to parameterise the \(x\) dependence of the parton distributions (quarks, antiquarks, gluon) at some low scale \(Q^2=Q^2_0\), chosen large enough that the terms neglected in the perturbative equations can be assumed to be negligible, and to evolve these input distributions up in \(Q^2\) with the DGLAP equations. The number of free parameters is typically between 10 and 30. The factorisation theorems are then used to derive predictions for the cross sections, and these predictions are fitted simultaneously to as much of the experimental data as possible, to determine the parameters and hence the parton distributions.
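Not all input parameters are free: the valence normalisations are fixed by the quark-number sum rules \(\int_0^1 u_v\,dx = 2\) and \(\int_0^1 d_v\,dx = 1\), which follow from the valence content of the proton, and the momentum sum rule \(\sum_a \int_0^1 x f_a\,dx = 1\) constrains the remaining normalisations (often that of the gluon). The Python sketch below assumes a deliberately simple toy input form \(x q_v(x) = A\,x^{a}(1-x)^{b}\); real fits use more flexible forms, and the shape parameters here are hypothetical. For this form both integrals reduce to Euler beta functions: \(\int_0^1 q_v\,dx = A\,B(a, b+1)\) and \(\int_0^1 x q_v\,dx = A\,B(a+1, b+1)\).

```python
import math


def beta_fn(p: float, q: float) -> float:
    """Euler beta function B(p, q) = Gamma(p) Gamma(q) / Gamma(p + q), for p, q > 0."""
    return math.exp(math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q))


def valence_norm(n_valence: float, a: float, b: float) -> float:
    """Normalisation A of the toy input x*q_v(x) = A x^a (1-x)^b chosen so that
    the number sum rule integral_0^1 q_v(x) dx = n_valence holds (needs a > 0)."""
    return n_valence / beta_fn(a, b + 1.0)


def momentum_fraction(norm: float, a: float, b: float) -> float:
    """Momentum fraction integral_0^1 x q_v(x) dx carried by the toy distribution."""
    return norm * beta_fn(a + 1.0, b + 1.0)


# Hypothetical shape parameters at the input scale Q0^2, for illustration only.
a_u, b_u = 0.7, 3.0
a_d, b_d = 0.7, 4.0
A_u = valence_norm(2.0, a_u, b_u)   # two u_v quarks
A_d = valence_norm(1.0, a_d, b_d)   # one d_v quark

x_uv = momentum_fraction(A_u, a_u, b_u)
x_dv = momentum_fraction(A_d, a_d, b_d)
print(f"u_v carries {x_uv:.3f}, d_v carries {x_dv:.3f}")
# The momentum sum rule leaves the rest for the sea quarks and the gluon.
print(f"left for sea + gluon: {1.0 - x_uv - x_dv:.3f}")
```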

## Examples of PDFs

An overview of parton distributions in the proton is shown at two scales, \(Q = 2\) GeV (Figure 1) and \(Q = 100\) GeV (Figure 2).

As naively expected, at small \(Q^2\) and large \(x\) (above 0.1) the u quarks are the dominant distribution, more than twice as large as the d quarks at high \(x\) and much larger than the heavy quarks. At low \(x\) the sea is not flavour symmetric: there are significantly fewer strange quarks than up and down quarks. The charm density vanishes below the charm threshold (\(Q \approx m_c \approx 1.5\) GeV) and increases slowly with energy. At higher \(Q^2\) (Figure 2) the shape of the quark and gluon distributions changes rapidly at very low \(x\). The sea becomes more flavour symmetric, since at low \(x\) the evolution is flavour independent, and there are more and more sea quarks and gluons. The rise of the parton densities at low \(x\) and high \(Q^2\) is a key prediction of QCD (DeRujula et al., 1974), which was clearly verified at the HERA electron-proton collider at DESY in 1993.

There is, however, no single commonly accepted set of parton distribution functions. Several groups compete to provide the best parametrisation of parton distributions. The groups do not use the same input data, and they differ mainly in the way the PDFs are parameterised, in the treatment of heavy quarks and in the value of the coupling constant \(\alpha_s\), as well as in the way the experimental errors are treated and the theoretical errors are estimated.

## Data sets in fit

The very extensive and precise DIS data from fixed-target lepton-nucleon scattering experiments at SLAC, FNAL and CERN, and from the electron-proton HERA collider at DESY, provide the backbone of parton distribution analyses. The lepton-nucleon data include electron, muon and neutrino DIS measurements on hydrogen, deuterium and nuclear targets. DIS data, however, are insufficient to determine accurately the flavour decomposition of the quark and antiquark sea or the gluon distribution at large \(x\). In inclusive DIS the gluon is probed only indirectly, through the rate of \(Q^2\) evolution. The additional physical processes used in the fits are:

- Inclusive single-jet production in nucleon-nucleon interactions, selecting jets with large transverse energy; this process is sensitive to the gluon distribution.
- Dilepton production in the virtual-photon Drell-Yan process \(pN \rightarrow \mu^+ \mu^- + X\), which probes the sea-quark distribution.
- Electroweak Z and W boson production \(p \bar p \rightarrow W^+ (W^-) + X\) at the Tevatron collider, which is sensitive to the up and down quark and antiquark distributions.

The most global fits, such as CTEQ6.6 (Nadolsky et al. 2008), MSTW08 (Martin et al. 2008), GJR08 (Glück et al. 2008) and, more recently, NNPDF2.0 (Ball et al. 2010), are based on data from DIS and proton-nucleon fixed-target experiments as well as on results from the HERA and Tevatron colliders. The four groups do not fit exactly the same data sets; for example, GJR uses no W and Z production data. The ABKM09 fit (Alekhin et al. 2009) combines DIS and fixed-target Drell-Yan data, while the HERAPDF1.0 fit (Aaron et al. 2010) uses only HERA data.

## Treatment of experimental errors and estimation of uncertainties

The experimental errors fall into two classes: those due to the limited statistics of the measurement and those due to systematic effects, such as an incorrect energy calibration. Modern DIS experiments have very small statistical uncertainties, so the systematic uncertainties become dominant. Many systematic uncertainties are correlated between the data points used in the fit, and almost all up-to-date analyses include a full treatment of the correlated experimental errors. Two main methods have been used to propagate the uncertainties on the fitted data points to the PDFs: the Hessian method, based on linear error propagation through the covariance matrix, and the Monte Carlo sampling method, which has been used in conjunction with neural networks. In both methods the global quality of the fit is measured by a \(\chi^2\) summed over all data sets. In fits that do not include correlated errors, one minimises a simple \(\chi^2\) function defined as

\[ \chi^2=\sum_{e}\sum_{i=1}^{N_e}\frac{(D_i-T_i)^2}{\sigma_i^2} \]

where \(N_e\) is the number of cross-section data points in experiment \(e\), \(D_i\) is a data value, \(T_i\) is the corresponding theoretical value of the cross section calculated from the parameterised parton distribution functions, and \(\sigma_i\) is its experimental error. Ideally the errors would be given by the tolerance \(\Delta \chi^2 = 1\) for the 68 % (one-sigma) confidence level and \(\Delta \chi^2 = 2.71\) for the 90 % confidence level, which are the standard statistical expectations. This is appropriate when fitting consistent data sets with well-defined theory, as in the HERAPDF analysis, where only DIS data from the HERA experiments are used. In the global CTEQ and MSTW analyses, which include both DIS and hadron-hadron data, this procedure tends to give unrealistically small uncertainties.

There are discrepancies between the fitted cross sections and individual experiments that are much larger than the error band. This is likely due to some failure of the theoretical and model approximations over the whole range of the data, and may also be due to some sources of experimental error not being properly quantified. Instead, a much weaker “hypothesis-testing” criterion has been adopted: the tolerance \(\Delta \chi^2\) is chosen so that each individual data set is described within its 68 (or 90) % confidence limit. In MSTW, one of the most commonly used PDF extractions, the tolerances are tuned for each data set and lie in the region \(\Delta \chi^2 \leq 50\) for a 90 % confidence limit. In CTEQ, the other most commonly used global analysis, the 90 % confidence-limit errors correspond to the choice \(\Delta \chi^2 = 100\), an average over all data sets. Recently, the NNPDF global analysis, in which the input distributions are parameterised with neural networks as very flexible functions with about 200 parameters, and which uses the same input data as CTEQ and MSTW, has obtained an error band consistent with standard statistical expectations for all data sets. This may indicate that the large tolerances in the CTEQ and MSTW analyses are partly due to the limited flexibility of their parameterisations, and that they reflect not only the experimental errors of the input data but also some model and theory uncertainties.
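The statistical ingredients of this section can be sketched in a few lines of Python using only the standard library: the uncorrelated \(\chi^2\) defined above; the \(\Delta\chi^2\) for a given confidence level on a single parameter, computed as the square of the two-sided Gaussian quantile (which reproduces the values 1 and 2.71); the symmetric formula \(\Delta X = \tfrac{1}{2}\sqrt{\sum_k (X_k^+ - X_k^-)^2}\) commonly used with Hessian eigenvector PDF sets; and the Monte Carlo prescription of taking the mean and standard deviation of an observable over PDF replicas. Function names and inputs are hypothetical, correlated systematics (which need the full covariance matrix) are not included, and the exact prescriptions of individual groups may differ.

```python
import math
from statistics import NormalDist, fmean, stdev


def chi2_uncorrelated(data, theory, errors) -> float:
    """chi^2 = sum_i (D_i - T_i)^2 / sigma_i^2 over all data points.

    Correlated systematic errors are ignored; a full treatment would use
    (D - T)^T V^{-1} (D - T) with the experimental covariance matrix V.
    """
    return sum((d - t) ** 2 / s ** 2 for d, t, s in zip(data, theory, errors))


def delta_chi2_for_cl(cl: float) -> float:
    """Delta chi^2 for confidence level `cl` on one parameter: the square of the
    two-sided Gaussian quantile (about 1 for 68.27 %, about 2.71 for 90 %)."""
    z = NormalDist().inv_cdf(0.5 * (1.0 + cl))
    return z * z


def hessian_uncertainty(plus, minus) -> float:
    """Symmetric Hessian uncertainty of an observable X from pairs of eigenvector
    sets: Delta X = 1/2 * sqrt(sum_k (X_k^+ - X_k^-)^2). The tolerance enters
    through how far along each eigenvector the sets were generated."""
    return 0.5 * math.sqrt(sum((p - m) ** 2 for p, m in zip(plus, minus)))


def mc_replica_summary(values):
    """Monte Carlo method: mean and standard deviation over PDF replicas."""
    return fmean(values), stdev(values)


print(chi2_uncorrelated([1.0, 2.0], [1.0, 1.0], [1.0, 0.5]))                   # 4.0
print(round(delta_chi2_for_cl(0.6827), 2), round(delta_chi2_for_cl(0.90), 2))  # 1.0 2.71

# Hypothetical predictions of one observable from two eigenvector pairs ...
print(hessian_uncertainty([10.2, 10.05], [9.9, 9.97]))
# ... and from five Monte Carlo replicas.
print(mc_replica_summary([9.8, 10.1, 10.0, 10.3, 9.9]))
```

A tolerance such as \(\Delta\chi^2 = 50\) or \(100\) does not change these formulas; it changes how far the eigenvector sets are displaced from the best fit, and hence how large \(X_k^\pm - X_0\) become.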

## Model uncertainties

Model uncertainties are related to the assumptions made in the PDF extraction. There is no unique way to estimate them, and they may contribute implicitly to the large tolerances defined by CTEQ and MSTW. The groups often prefer to illustrate model uncertainties with variant PDF sets rather than adding them to the experimental errors. Some examples of sources of model uncertainty are:

- Input distributions

The usual choice of the parametric form of the input distributions is arbitrary. The effect of varying the analytic form of the input distributions can be sizeable, and even dominant over all other errors (see Aaron et al., 2010). This error is estimated by comparing various parameterisations that give a good description of the DIS data with the smallest number of parameters. It is, however, difficult to quantify this uncertainty fully, since the analytic shape of the parton distributions is not known. The GJR group has used a dynamical model to limit the uncertainty from this source. In this model the evolution of the input distributions starts at a very low scale, \(Q^2_0 \approx 0.4\) GeV\(^2\), where the nucleon consists only of valence quarks; sea quarks and gluons are generated through the DGLAP equations. This gives a much narrower uncertainty at small \(x\), although it seems a very low scale at which to apply the DGLAP equations. All other groups have preferred not to make this additional physical assumption. The opposite approach is the very flexible neural-network parameterisation of the NNPDF analysis. Its procedure for determining the central values and uncertainties involves some ambiguity, and it gives larger errors, but it is free of debatable physical assumptions.

- Flavour symmetry

There is no reason to assume that the sea is flavour symmetric at low \(x\). It has been commonly assumed that \(s = \bar s = (\bar u + \bar d)/4\) at the input scale \(Q_0^2\); the suppression of the strange sea is attributed to the larger mass of the strange quark. More sophisticated assumptions, which do not require the strange sea to have the same shape as the up and down quarks at high \(x\), have also been used (CTEQ and MSTW). DIS and Drell-Yan experiments have shown that the densities of \(\bar u\) and \(\bar d\) antiquarks are not equal for \(x\) above 0.01. There is, however, no unique description of the difference \(\bar u - \bar d\). NNPDF has a very flexible parametrisation of the strange sea at small \(x\).

- Heavy quark treatment

It is usually assumed that all heavy quarks (charm, beauty and top) are generated radiatively from the gluon and light quarks by QCD evolution to large \(Q^2\), starting from a vanishing distribution below a threshold scale of approximately the relevant quark mass. It has become clear in recent years that the treatment of the heavy-quark (charm and beauty) thresholds is a delicate issue, and several heavy-quark treatments (commonly called schemes) have been considered. In all schemes the choice of the heavy-quark masses is arbitrary. So far, the scheme dependence has not been systematically taken into account in the fits; see, for example, the detailed discussion in MSTW08 (Martin et al. 2008). The scheme choice has little impact on the valence-quark distributions, but directly affects the gluon and heavy-quark distributions.

- Coupling constant \( \alpha_s(Q^2) \)

A fundamental uncertainty is the value of the QCD coupling constant \(\alpha_s(Q^2)\). The PDF fits are sensitive to the value of \(\alpha_s\) and can be used as a means of determining it. Much more information on \(\alpha_s\) exists, however, than is contained in the data sets used in the PDF fits, for example from the precision measurements at LEP (Bethke 2009). Some groups have chosen to fix \(\alpha_s\) in the fit to a value close to the world average, but they have not yet agreed on a common value or on its uncertainty. Other groups prefer to treat \(\alpha_s\) as an additional free parameter of the fit. The value of \(\alpha_s\) has a strong impact on the gluon distribution. The values of \(\alpha_s(M^2_{Z^0})\), the coupling at the energy scale of the \(Z^0\) mass, and their uncertainties, as used in the most up-to-date fits, are shown in Figure 3.

## Theory uncertainties

Genuine theory uncertainties arise from the approximations used to write the cross sections, namely the truncation of the perturbative expansion both in the DGLAP evolution equations and in the coefficient functions. Theory uncertainties are commonly not included in the published uncertainties, because they are difficult to quantify a priori until a better calculation or prescription is available. Although NLO QCD fits give a good description of the data down to \(Q^2\) values of 1 to 3 GeV\(^2\), such fits neglect higher-order QCD terms in powers of \(\alpha_s\), including enhanced \(\log(1/x)\) or \(\log(1-x)\) terms, as well as higher-twist corrections. Complete NNLO calculations are now available for DIS, but NNLO fits are usually provided only as variants, since the corresponding calculations have not been fully worked out for all non-DIS processes.

## Results

Over about four decades, a great deal of effort has been put into measuring DIS processes and extracting PDFs from the data, and this effort is still ongoing. It is remarkable that all PDF sets and their uncertainties broadly agree despite many differences in input data, methods of analysis and model assumptions. There are, however, discrepancies between the predictions of the PDF groups that can locally be larger than the uncertainty of each. A working group, PDF4LHC, has been set up at CERN to understand the commonalities and differences between the predictions and uncertainties of the PDF groups. A benchmarking exercise was carried out to evaluate important cross sections, including well-established processes such as W, Z and top production, which will be measured in the coming months at the LHC. This exercise was very instructive. Consider, for example, two production processes in proton-proton collisions at a centre-of-mass energy of 7 TeV:

- Z production is mainly sensitive to the quark and antiquark densities in the proton. The impact of varying \(\alpha_s\) is relatively small (Figure 4). The overall spread of the central values of the predictions is about nine percent, reduced to an impressive four percent if only the most global fits (CTEQ, MSTW, NNPDF) are considered.
- Production of a \(t\bar t\) pair is mainly sensitive to the gluon density and is also very sensitive to the value of \(\alpha_s\) (Figure 5). The overall spread of the central values is about eighteen percent, and eight percent for the three most global fits. Clearly the agreement could be even better if the cross sections were evaluated at a common value of \(\alpha_s\).
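Part of the difference in \(\alpha_s\) sensitivity between these two processes can be read off from leading-order power counting: the partonic processes \(q\bar q \to t\bar t\) and \(gg \to t\bar t\) are of order \(\alpha_s^2\), whereas Drell-Yan-type Z production is of order \(\alpha_s^0\) at LO. The Python sketch below rescales a cross section by \((\alpha_s'/\alpha_s)^n\) under the simplifying assumption that the PDFs and all higher-order corrections are held fixed, which is not how the benchmark predictions were obtained; the \(\alpha_s\) values and the function name are hypothetical.

```python
def lo_alpha_s_rescaling(sigma_ref: float, alpha_ref: float,
                         alpha_new: float, power: int) -> float:
    """Rescale a cross section whose LO partonic part scales as alpha_s**power,
    keeping the PDFs and all higher-order corrections fixed (toy assumption)."""
    return sigma_ref * (alpha_new / alpha_ref) ** power


ALPHA_A, ALPHA_B = 0.118, 0.120   # two hypothetical values of alpha_s(M_Z^2)

for name, power in (("Z (LO ~ alpha_s^0)", 0), ("ttbar (LO ~ alpha_s^2)", 2)):
    ratio = lo_alpha_s_rescaling(1.0, ALPHA_A, ALPHA_B, power)
    print(f"{name}: relative change {100.0 * (ratio - 1.0):+.1f} %")
```

For these inputs the toy \(t\bar t\) cross section shifts by about 3.4 % while the LO Z estimate is unchanged; in a full calculation the Z cross section still depends on \(\alpha_s\) through higher-order corrections and through the PDFs themselves.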

## Applications and prospects

In the near future, new data from the Tevatron and final data from HERA, together with model and theoretical developments, will allow more precise determinations of PDFs and a better understanding of their uncertainties. Further standardisation is planned for future updates. At present, the main application of the new parton distribution functions is to make predictions for physical cross sections at the LHC proton-proton collider, such as Higgs boson production or searches for new physics in jet production at large transverse momentum. Conversely, measurements at the LHC should improve knowledge of parton distributions over a large part of the \(x\) range. Further contributions could come from the Relativistic Heavy Ion Collider (RHIC) at Brookhaven. In a few years, DIS measurements at JLab with its 12 GeV upgrade should bring important new constraints at large \(x\). Future DIS machines, such as the LHeC at CERN and the EIC in the US, are being studied.

## References

- F.D. Aaron et al. [H1 and ZEUS Collaborations], JHEP 1001 (2010) 109, [arXiv:0911.0884 [hep-ph]].
- S. Alekhin, J. Blümlein, S. Klein and S. Moch, Phys. Rev. D81, 014032 (2010), [arXiv:0908.2766 [hep-ph]].
- G. Altarelli and G. Parisi, Nucl. Phys. B126, 298 (1977).
- R.D. Ball et al. [NNPDF Collaboration], [arXiv:1002.4407 [hep-ph]].
- S. Bethke, Eur. Phys. J. C64, 689 (2009), [arXiv:0908.1135 [hep-ph]].
- J.C. Collins, D.E. Soper and G. Sterman, [arXiv:hep-ph/0409313].
- A. DeRujula et al., Phys. Rev. D10, 1649 (1974).
- Yu.L. Dokshitzer, Sov. Phys. JETP 46, 641 (1977).
- M. Glück, P. Jimenez-Delgado and E. Reya, Eur. Phys. J. C53, 355 (2008), [arXiv:0709.0614 [hep-ph]].
- V.N. Gribov and L.N. Lipatov, Sov. J. Nucl. Phys. 15, 438 (1972).
- A.D. Martin, W.J. Stirling, R.S. Thorne and G. Watt, Eur. Phys. J. C63, 189 (2009), [arXiv:0901.0002 [hep-ph]].
- P.M. Nadolsky et al., Phys. Rev. D78, 013004 (2008), [arXiv:0802.0007 [hep-ph]].
- J. Pumplin et al., JHEP 0207 (2002) 012, [arXiv:hep-ph/0201195].


## See also

Bjorken scaling, Asymptotic freedom, Longitudinal polarization functions, Generalized parton distributions, Transverse polarization functions