DØ Note 3612
Singularities of Feynman Diagrams and
Optimal Kinematic Variables for Neural
Networks.
E. E. Boos and L. V. Dudko
Skobeltsyn Institute of Nuclear Physics, Moscow State University
119899 Moscow, Russian Federation
boos@theory.npi.msu.su, dudko@npi.msu.su
Abstract
Neural Networks (NN) are often used in experimental and phenomenological analyses of a signal and its backgrounds in high energy physics. Making the best choice of kinematic variables is one of the main problems in using these networks. In this note, we propose a step towards solving the problem, based on an analysis of the singularities in the Feynman diagrams which contribute to the signal and backgrounds. As an example, we present the NN results for the simplest process of single top quark production at the Tevatron, $u\bar{d} \to t\bar{b}$ with $t \to W^+ b$, and for the corresponding W + 2 jets background.

1 The basic idea
The Neural Networks method is becoming more and more widely used in various data analyses, where it is trained on Monte Carlo (MC) events and then used on data [1, 2, 3]. The
discrimination between a signal and its corresponding backgrounds by the NN is especially
remarkable when the data statistics are limited [4].
One of the main questions which has arisen in the use of NNs for high energy physics searches for rare processes is which variables, and how many of them, should be chosen for network training [5] in order to extract a signal from the backgrounds in an optimal way. The general problem is rather complicated, and finding a solution depends on having a concrete procedure for making the choice, because it usually takes a lot of time to compare results from different sets of variables.
One observation which helps in making the best choice of the most sensitive variables is
to study the singularities in the Feynman diagrams of the processes. In fact, mapping and
smoothing of singularities are very important steps for any MC calculation to get stable
results, especially when many particles are involved. These procedures [7] are applied in
the CompHEP package [6] which we use in our calculations. However, one might consider
singularities from a different point of view, namely for discrimination between a signal and
its backgrounds. Let us call those kinematic variables in which singularities occur "singular variables". One can compare the positions of these singularities and their corresponding singular variables in the Feynman diagrams which contribute to the signal process under consideration, and to both reducible and irreducible backgrounds. What is important to stress here is that most of the rate for both the signal and the backgrounds comes from the integration over the phase space region close to these singularities. Because of that, it is obvious that if some of the singularities are different for the signal and for the backgrounds, then we may expect that the distributions for the corresponding singular variables will
differ most strongly. Therefore, if one uses all such singular variables in one's analysis, then
the largest part of the phase space where the signal and backgrounds differ most will be
taken into account. Such a set of variables will be the most sensitive set with which to
discriminate between signal and background.
One might think that it is not a simple task to list all the singular variables when
the phase space is very complex, for instance for reactions with many particles involved.
However, in general, all the many singular variables can be of only two types, either s-channel or t-channel. The s-channel variables are the invariant masses of pairs or clusters of final particles $f_1$ and $f_2$, where:
$M^2_{f_1,f_2} = (p_{f_1} + p_{f_2})^2 \qquad (1)$
here $p_{f_1}$ and $p_{f_2}$ are the four-momenta of $f_1$ and $f_2$. The t-channel variables are the momenta transferred between the initial particle (parton) and a final particle or cluster of particles:
$\hat{t}_{i,f} = (p_f - p_i)^2 \qquad (2)$

where $p_f$ and $p_i$ are the momenta of the final particle (or cluster) and the initial parton. The position of the singularity in each of the singular variables is determined by the denominator of the corresponding Feynman propagator.
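As a concrete illustration, the sketch below (our own, in Python; the helper names are hypothetical) evaluates the two types of singular variables from four-momenta given as (E, px, py, pz) tuples:

def minkowski_sq(p):
    """Minkowski square p^2 = E^2 - px^2 - py^2 - pz^2 of a four-vector."""
    E, px, py, pz = p
    return E**2 - px**2 - py**2 - pz**2

def m2_s_channel(p_f1, p_f2):
    """s-channel singular variable M^2_{f1,f2} = (p_f1 + p_f2)^2, Eq. (1)."""
    return minkowski_sq(tuple(a + b for a, b in zip(p_f1, p_f2)))

def t_hat(p_f, p_i):
    """t-channel singular variable t_{i,f} = (p_f - p_i)^2, Eq. (2)."""
    return minkowski_sq(tuple(a - b for a, b in zip(p_f, p_i)))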
If, in a real experiment, we could directly measure the singular variables $M^2_{f_1,f_2}$ and $\hat{t}_{i,f}$, then choosing the set of these variables for the NN analysis would be the most sensitive method for signal/background discrimination. However, in many cases we are not able to measure these invariant variables directly, due to, say, the presence of the initial partons' energy spectrum or to neutrino production in the final state. In fact, this situation is very common at hadron colliders. In this case, it is better to use some other variable closely related to the corresponding invariant singular variable. For example, in the production of any light particle (or jet) $f$, the momentum transfer is expressed in terms of $p_T^f$ and rapidity $y_f$:

$\hat{t}_{i,f} = -\sqrt{\hat{s}}\, e^{Y} p_T^f\, e^{-|y_f|} \qquad (3)$
where $\hat{s}$ is the squared total invariant mass of the produced system, and $Y$ is the total system rapidity (the rapidity of the partons' center of mass).
It is clear from the above equation that there is a singularity at zero in the $\hat{t}_{i,f}$ variable, which can occur if either $p_T^f$ goes to zero, or if $y_f$ goes to $\pm\infty$.
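Taking the reconstructed Eq. (3) at face value, a short numerical check (our own sketch, with arbitrary example values in GeV) makes both limits explicit:

import math

def t_hat_approx(s_hat, Y, pt_f, y_f):
    """Eq. (3): t_{i,f} = -sqrt(s_hat) * exp(Y) * pT_f * exp(-|y_f|)."""
    return -math.sqrt(s_hat) * math.exp(Y) * pt_f * math.exp(-abs(y_f))

# t -> 0 as pT -> 0 at fixed rapidity ...
print(t_hat_approx(s_hat=200.0**2, Y=0.0, pt_f=0.1, y_f=0.5))   # about -12 GeV^2
# ... and as |y_f| -> infinity at fixed pT:
print(t_hat_approx(s_hat=200.0**2, Y=0.0, pt_f=20.0, y_f=6.0))  # about -10 GeV^2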
To conclude this section we make a general recommendation. In order to discriminate
between a signal and its backgrounds, one should choose those singular variables which are
different for the signal and backgrounds. These variables will be the most efficient ones for
a NN analysis. For a standard analysis using cuts on variables, one should cut hard against
the regions of singularities in the background singular variables while keeping the regions
with singularities for the signal singular variables (see e.g. [8]).
Of course, analyzing singularities in the denominators of Feynman diagrams does not
produce a complete list of sensitive variables. Other variables, such as spin correlations, could be related to the numerators of the diagrams, or, like $\hat{s}$ or the rapidity of a parton in its center-of-mass frame, to different thresholds for the signal and backgrounds. If it
is possible to find such variables for the particular process under consideration, then they
should also be added into the NN set of variables.
2 Single top in $W^*$ mode as an example
In this section, we demonstrate how the above method works with an example. Let us
consider the simplest single top quark production process at the Tevatron:
$p\bar{p} \to t\bar{b} + X \qquad (4)$
with the subsequent decay of the top quark to a W boson and a b quark, and the W decaying to a lepton and a neutrino.^1 One of the main backgrounds to this process is:
$p\bar{p} \to Wjj + X \qquad (5)$
^1 Single top production has been intensively studied in the past (see e.g. [11]).

where j denotes a jet from the fragmentation of a light quark or gluon. Typical Feynman
diagrams for one of the background subprocesses and for the single top signal are shown in
Fig. 1.
[Figure 1: Feynman diagram sketches appear here.]
Figure 1: Typical Feynman diagrams for the subprocesses for the production of a W boson with two light jets (1.1, 1.2, 1.3) and for the single top signal in the s-channel mode (2.1).
As explained in the previous section, one should compare the singularities in the diagrams for the background and for the signal. There are two singularities in the first diagram (1.1):
$M^2_{g_1,g_2} = (p_{g_1} + p_{g_2})^2 \to 0 \qquad (6)$
$\hat{t}_{u,(g_1 g_2)} = (p_{g_1} + p_{g_2} - p_u)^2 \to 0 \qquad (7)$
In the second diagram (1.2) there are three singularities, but one of them ($\hat{t}_{u,(g_1 g_2)}$) is the same as for the first diagram:
$\hat{t}_{u,g_1} = (p_{g_1} - p_u)^2 \to 0 \qquad (8)$
$\hat{t}_{u,g_2} = (p_{g_2} - p_u)^2 \to 0 \qquad (9)$
$\hat{t}_{u,(g_1 g_2)} = (p_{g_1} + p_{g_2} - p_u)^2 \to 0 \qquad (10)$
The signal diagram (2.1) has only one singularity, at the top quark mass:
$M^2_t = (p_b + p_W)^2 \to m^2_t \qquad (11)$
All of the above singular variables are candidates for inclusion in a NN analysis. Here it is also obvious that the single top production threshold starts at much higher $\hat{s}$ than the background, and therefore the variable
$\hat{s} = (p_{g_1} + p_{g_2} + p_W)^2 \qquad (12)$

discriminates between the signal and background and should be added to the NN. One can also add the rapidity of the system produced (the rapidity of the parton center of mass), which is also significantly different for signal and background: in the region close to the top production threshold the parton center of mass is practically at rest for the signal, while for the background it has a significant boost.
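As a rough numerical aside (our own, taking $m_t \approx 175$ GeV and $m_b \approx 5$ GeV, typical values at the time), the two thresholds are far apart:
$\hat{s}_{signal} \ge (m_t + m_b)^2 \approx (180\ \mathrm{GeV})^2, \qquad \hat{s}_{Wjj} \gtrsim M_W^2 \approx (80\ \mathrm{GeV})^2,$
more than a factor of two in $\sqrt{\hat{s}}$, before any jet $p_T$ cuts are applied to the background.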
The above variables are at the parton level, however, and we need to modify them for a real experimental situation. The processes under consideration have been calculated using CompHEP [6] at the parton level, then decayed and processed with PYTHIA [9] in order to include initial-state and final-state radiation and to fragment the final-state partons into jets; for the NN training we use the JETNET package [10]. Detector smearing of the jet energies has been included in our model. The $M^2_{g_1,g_2}$ parton variable transforms into two actually observed variables, namely, the jet width $w_{jet}$ and the invariant mass of the two most energetic jets. The variables $M_t$ and $\hat{s}$ are reconstructed using the measured missing transverse energy, which corresponds to the $p_T$ of the neutrino, and the longitudinal momentum of the neutrino, chosen as the smallest-absolute-value solution of the equation $M^2_W = (p_\nu + p_{lepton})^2$. Each of the singular momentum transfer variables $\hat{t}_{i1}$, $\hat{t}_{i2}$, $\hat{t}_{i12}$ is represented by two variables, $P_T^f$ and $y_f$, as shown in Equation 3.
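The longitudinal-momentum choice just described amounts to solving a quadratic equation. The following is a minimal sketch (our own, assuming massless leptons and a fixed $M_W = 80.4$ GeV; the function and argument names are hypothetical):

import math

M_W = 80.4  # GeV; an assumed value for this sketch

def neutrino_pz(lep, met_x, met_y):
    """Solve M_W^2 = (p_nu + p_lep)^2 for the neutrino longitudinal momentum
    and return the smallest-|pz| solution, as described in the text.
    `lep` is the charged-lepton four-momentum (E, px, py, pz); the neutrino
    pT is identified with the missing transverse energy components."""
    E, px, py, pz = lep
    pt2_lep = px**2 + py**2
    mu = 0.5 * M_W**2 + px * met_x + py * met_y
    disc = mu**2 - pt2_lep * (met_x**2 + met_y**2)
    if disc < 0.0:
        return mu * pz / pt2_lep  # no real solution: keep the real part
    root = E * math.sqrt(disc)
    sols = ((mu * pz + root) / pt2_lep, (mu * pz - root) / pt2_lep)
    return min(sols, key=abs)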
Finally, the set of singular variables, which we call the "basic set", is:
Set 1: $M_{j_1,j_2}$, $M_t$, $\hat{s}$, $Y_{tot}$, $P_T^{j_1}$, $y_{j_1}$, $P_T^{j_2}$, $y_{j_2}$, $P_T^{j_{12}}$, $Y_{j_{12}}$
where $Y_{tot}$ is the total rapidity of the center of mass of the initial partons, reconstructed from the final-state particles with the use of the reconstructed neutrino momentum as mentioned above. The result of NN training using these variables is shown in Fig. 2. The line width is proportional to the weight $w_{ij}$ between two nodes, and the circle diameter is proportional to the threshold of the next layer. The line thicknesses and node thresholds correspond to the significance of each input node. In order to show how good our choice of kinematic
variables is, we construct two other, simpler sets of variables for the NN training which are
often used, and compare the results.
The first set of variables is:
Set 2: $P_T^{j_1}$, $P_T^{j_2}$, $H_{all}$, $H_{T\,all}$
Here $H_{all} = \sum E_f$ and $H_{T\,all} = \sum P_{T_f}$, where the sums are over all final-state particles.
For the next set, we include one more signal variable, $M_t$:
Set 3: $P_T^{j_1}$, $P_T^{j_2}$, $H_{all}$, $H_{T\,all}$, $M_t$
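For bookkeeping, the three sets can be kept as plain lists of feature names, and the two global sums computed directly from the final-state four-momenta (a hypothetical sketch; these identifiers are ours, not JETNET's):

import math

VARIABLE_SETS = {
    "Set1": ["M_j1j2", "M_t", "s_hat", "Y_tot",
             "pT_j1", "y_j1", "pT_j2", "y_j2", "pT_j12", "Y_j12"],
    "Set2": ["pT_j1", "pT_j2", "H_all", "HT_all"],
    "Set3": ["pT_j1", "pT_j2", "H_all", "HT_all", "M_t"],
}

def h_all(particles):
    """H_all: scalar sum of final-state energies; (E, px, py, pz) tuples."""
    return sum(p[0] for p in particles)

def ht_all(particles):
    """HT_all: scalar sum of final-state transverse momenta."""
    return sum(math.hypot(p[1], p[2]) for p in particles)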
We chose the standard training parameter:
$\chi^2 = \frac{1}{N_{test}} \sum_{i=1}^{N_{test}} (d_i - o_i)^2 \qquad (13)$

to determine whether our NN training has led to an improvement in its discrimination, where $N_{test}$ is the number of test patterns, $d_i$ is the desired NN output (1 for the signal and 0 for the background), and $o_i$ is the actual NN output.
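In code, the criterion of Eq. (13) is simply the mean squared error over the test patterns (a minimal sketch with hypothetical names):

def chi2(desired, outputs):
    """Eq. (13): chi^2 = (1/N_test) * sum_i (d_i - o_i)^2."""
    assert len(desired) == len(outputs)
    return sum((d - o) ** 2 for d, o in zip(desired, outputs)) / len(desired)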
Using this criterion, we can check each set of input variables described above. The results for the three input sets are shown in Fig. 3. The best network has the lowest $\chi^2$, because the actual output from such a network is closest to the desired output (for example, NN(signal) = 1 and NN(background) = 0). From this plot, one can see that the $\chi^2$ for the complete Set 1 of optimal singular variables is lower than for the others, and therefore the corresponding NN is a better analysis tool.
We tried to check Set 1 for completeness by adding a few more standard kinematic variables to it, and then looking to see if there was any improvement. We added the scalar sum of the final particles' energies, $H_{all}$, and the scalar sum of their transverse energies, $H_{T\,all}$, and we call this set Set 4. The two upper curves in Fig. 4 show what happens: the NN gets worse relative to the middle curve for the original network without these two additional variables. This means that the additional kinematic variables do not add nontrivial kinematic information, while the number of degrees of freedom was increased, and therefore the total $\chi^2$ became worse. But we can still try to find other possible variables which contain other, nonkinematic information that will be helpful for separating signal from background. It could be that some topological variables will reflect the numerators of the Feynman diagrams, or maybe some other kind of variable will help. In our case, where the signal is single top quark production, the efficiency to reconstruct a tagging muon in a jet from a b decay differs from the rate at which a light jet is misidentified as a b jet by a muon in the jet; in this way the NN method in some sense plays the role of b-tagging. We have modeled the fragmentation, and have realistic detector smearing for our jets in the MC, and so we can use this information in the NN.
We introduce the information into the NN by using the transverse momentum of the tagging muon, $P_T^{tag\,\mu}$, which is zero for untagged events. In addition, we include one more useful kind of variable, the widths of the two highest-$E_T$ jets. The final set of variables is Set 1 together with three additional variables:
Set 5: Set 1 + $P_T^{tag\,\mu}$, $w_{jet_1}$, $w_{jet_2}$
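A sketch of how the three extra inputs could be appended to the Set 1 vector (a hypothetical event interface of our own; the zero for untagged events acts as a sentinel the network can exploit):

import math

def tag_muon_pt(event):
    """pT of the tagging muon, or 0.0 for untagged events."""
    mu = event.get("tag_muon")  # None when the event carries no muon tag
    return math.hypot(mu["px"], mu["py"]) if mu else 0.0

def set5_inputs(event):
    """Set 5 inputs: the Set 1 variables plus the tagging-muon pT and the
    widths of the two highest-E_T jets."""
    return event["set1"] + [tag_muon_pt(event),
                            event["w_jet1"], event["w_jet2"]]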
The $\chi^2$ curve for this final set is shown in Fig. 4; it is the lowest curve of the three. Therefore we can choose this set of variables for our analysis. Finally, we present a comparison of the outputs from two networks trained with two different input vectors. Fig. 5 shows the NN output distributions for variable Set 2 (top plot) and for the final variable Set 5 (bottom plot). We can see that there is a very good improvement in the signal/background separation between these two networks.
3 Conclusions
We have demonstrated a method which can help in choosing an optimal set of input variables
for use with Neural Network analysis in high energy physics. The method is rather general

and could be applied for any type of background, reducible or irreducible. The complete set of kinematic input variables corresponds to the singularities in the Feynman diagrams. Therefore, the minimal set of input variables which reflects the differences between signal and background kinematics is equal in size to, or greater than, the number of different singularities in the phase space of the signal and background processes. We note that when we increase the number of input nodes of a neural network, its training becomes more difficult, and the result could become worse if care is not taken. One must balance the value of the information which is being used to reflect the main differences between signal and background against having too many inputs to be able to train the network successfully.
Acknowledgments
We are grateful to members of the Single Top Group of the DØ Collaboration for their
interest in the study presented, and especially to A. Heinson for careful reading of the
manuscript and important comments.
We would like to thank Thorsten Ohl and Alexander Pukhov for useful discussions. We
acknowledge the financial support of the Russian Ministry of Science and Technologies, and the Sankt-Petersburg Grant Center.
References
[1] http://www1.cern.ch/NeuralNets/nnwInHep.html;
M. Nelson and W. Illingworth, "A Practical Guide to Neural Networks", Addison-Wesley, Reading, Mass. (1991);
P.C. Bhat, "Search for the Top Quark at DØ Using Multivariate Methods", FERMILAB-Conf-95/211-E.
[2] H. Prosper, "Some mathematical comments on feed-forward neural nets", DØ Note 1606.
[3] "Measurement of the Top Quark Pair Production Cross Section in $p\bar{p}$ Collisions using Multijet Final States", Phys. Rev. Dxx ppp (1998), FERMILAB-PUB-98/130-E, hep-ex/9808034.
[4] P. Bhat et al., "DØ Optimized Search for First Generation Leptoquarks in the $e\nu jj$ Channel with Run I Data", DØ Note 3308 (1998).
[5] P. Bhat and A. Mirles, "Studies of the effect of architecture and training parameters on the performance of neural networks", DØ Note 2669 (1995).
[6] E.E. Boos et al., hep-ph/9503280, SNUTP-94-116;
P. Baikov et al., in Proc. of the Xth Int. Workshop on High Energy Physics and Quantum Field Theory, QFTHEP-95, ed. by B. Levtchenko and V. Savrin (Moscow, 1995), p. 101.

[7] V.A. Ilyin, D.N. Kovalenko, and A.E. Pukhov, Int. J. Mod. Phys. C7, 761 (1996);
D.N. Kovalenko and A.E. Pukhov, Nucl. Instrum. and Methods A 389, 299 (1997).
[8] E. Boos, L. Dudko, and T. Ohl, Preprint INP-MSU 99-4/562, IKDA-99/02, p. 23.
[9] T. Sjöstrand, Comp. Phys. Comm. 82, 74 (1994).
[10] C. Peterson, T. Rögnvaldsson, and L. Lönnblad, "JETNET 3.0, A Versatile Artificial Neural Network Package", LU TP 93-29 (1993).
[11] D. Dicus and S. Willenbrock, Phys. Rev. D 34, 155 (1986);
C.-P. Yuan, Phys. Rev. D 41, 42 (1990);
S. Cortese and R. Petronzio, Phys. Lett. B253, 494 (1991);
G.V. Jikia and S.R. Slabospitsky, Phys. Lett. B295, 136 (1992);
R.K. Ellis and S. Parke, Phys. Rev. D 46, 3785 (1992);
G. Bordes and B. van Eijk, Z. Phys. C57, 81 (1993);
D.O. Carlson and C.-P. Yuan, Phys. Lett. B306, 386 (1993);
G. Bordes and B. van Eijk, Nucl. Phys. B435, 23 (1995);
D.O. Carlson, E. Malkawi, and C.-P. Yuan, Phys. Lett. B337, 145 (1994);
T. Stelzer and S. Willenbrock, Phys. Lett. B357, 125 (1995);
R. Pittau, Phys. Lett. B386, 397 (1996);
M. Smith and S. Willenbrock, Phys. Rev. D 54, 6696 (1996);
D. Atwood, S. Bar-Shalom, G. Eilam, and A. Soni, Phys. Rev. D 54, 5412 (1996);
C.S. Li, R.J. Oakes, and J.M. Yang, Phys. Rev. D55, 1672 (1997);
C.S. Li, R.J. Oakes, and J.M. Yang, Phys. Rev. D55, 5780 (1997);
G. Mahlon and S. Parke, Phys. Rev. D55, 7249 (1997);
A.P. Heinson, A.S. Belyaev, and E.E. Boos, Phys. Rev. D56, 3114 (1997);
T. Stelzer, Z. Sullivan, and S. Willenbrock, Phys. Rev. D56, 5919 (1997);
T. Tait and C.-P. Yuan, hep-ph/9710372;
D. Atwood, S. Bar-Shalom, G. Eilam, and A. Soni, Phys. Rev. D 57, 2957 (1998);
T. Stelzer, Z. Sullivan, and S. Willenbrock, Phys. Rev. D58, 094021 (1998);
A. Belyaev, E. Boos, and L. Dudko, INP-MSU-98-24-525, Jun 1998, 15pp., hep-ph/9806332, to appear in Phys. Rev. D.

Figure 2: The Neural Network architecture for the input Set 1.

[Figure 3 appears here: $\chi^2$ versus Ncycle for training with the different sets of input variables.]
Figure 3: The $\chi^2$ for the different sets of input variables. Ncycle is the number of Neural Net training cycles; it is proportional to the training time.

[Figure 4 appears here: $\chi^2$ versus Ncycle for training with the different sets of input variables.]
Figure 4: The $\chi^2$ for the different sets of input variables.

[Figure 5 appears here: two NN output distributions, panels jn_tbg1 (top) and jn_tbg4 (bottom); x-axis: NN output, y-axis: dN/N.]
Figure 5: The NN output for the two sets of input variables.