New approaches in user-centric job monitoring on the LHC Computing Grid [Electronic resource] : Application of remote debugging and real time data selection techniques / Tim dos Santos

152 pages


Published: 1 January 2011
Language: English
File size: 3 MB

FACHBEREICH C - FACHGRUPPE PHYSIK
BERGISCHE UNIVERSITÄT
WUPPERTAL
New approaches in
user-centric job monitoring
on the LHC Computing Grid
Application of remote debugging and
real time data selection techniques
Dissertation
by
Tim dos Santos
July 25, 2011

This dissertation can be cited as follows:

urn:nbn:de:hbz:468-20110727-113510-6
[http://nbn-resolving.de/urn/resolver.pl?urn=urn:nbn:de:hbz:468-20110727-113510-6]

Contents
I Introduction
1 Context: On High Energy Physics (HEP)
1.1 Current research in HEP
1.1.1 The Standard Model
1.1.2 Examples for open questions
1.2 CERN and the LHC
1.2.1 The Large Hadron Collider
1.2.2 The ATLAS Experiment
1.2.3 Data flow in ATLAS
1.2.4 Real-time data reduction: Triggers
1.3 Software in HEP
1.3.1 High-performance maths and core services: ROOT
1.3.2 Event generators and detector simulation tools
1.3.3 ATLAS’ main physics analysis framework: Athena
2 Grid Computing
2.1 Overview
2.1.1 Definition of the term "Grid Computing"
2.1.2 Virtual Organisations
2.1.3 Components and services of a Grid
2.1.4 Security in the Grid
2.2 The WLCG
2.2.1 Overview
2.2.2 The middleware: gLite
2.2.3 Computing model
2.2.4 Data storage and distribution
2.3 gLite Grid jobs
2.3.1 Input and output data
2.3.2 Grid job life cycle
2.3.3 Job failures
2.4 WLCG software
2.4.1 Pilot jobs and the pilot factory
2.4.2 The user interfaces: pAthena and Ganga
3 Conclusion
II Job monitoring
4 Overview
4.1 Site monitoring
4.2 User-centric monitoring of Grid jobs
5 The Job Execution Monitor
5.1 Concept
5.2 Architecture
5.2.1 User interface component
5.2.2 Worker node component
5.2.3 Data transmission
5.2.4 Inter-process communication
5.3 Acquisition of monitoring data
5.3.1 System metrics monitor ("Watchdog")
5.3.2 Script wrappers
5.4 User interface
5.4.1 Command-line usage
5.4.2 Built-in interface
5.4.3 Integration into Ganga
5.5 Deployment strategy
5.6 Shortcomings of this version of the software
6 Conclusion
III Tracing the execution of binaries
7 Concept and requirements
7.1 Event notification
7.2 Symbol resolving and identifier lookup
7.3 Application memory inspection
7.4 Publishing of the gathered data
7.5 User code prerequisites
8 Architecture and implementation
8.1 Event notification
8.2 Symbol and value resolving
8.3 A victim-thread for safe memory inspection
8.3.1 Concept and architecture
8.3.2 Usage by the CTracer
8.4 Resulting monitoring data
9 Usage
9.1 Stand-alone execution for custom binaries
9.2 Integration into JEM
9.2.1 Configuration and invocation
9.2.2 Insertion of CTracer-data into JEMs data stream
9.2.3 Augmentation of the JEM-Ganga-Integration
9.3 Application for HEP Grid jobs
9.3.1 Preparation of the user application
9.3.2 Activation and configuration in Ganga
9.3.3 Results and interpretation in an example run
9.4 Performance impact
10 Conclusion
IV A real time trigger mechanism
11 Concept and requirements
11.1 Extendible chunk format for monitoring data
11.2 Chunk backlog and tagging
11.3 Inter-process communication in JEM revised
12 Architecture and implementation
12.1 General JEM architecture changes
12.2 High-throughput shared ring buffer
12.2.1 Working principle
12.2.2 Ring buffer operations
12.3 Triggers and event handling
12.3.1 Trigger architecture
12.3.2 Trigger scripting APIs
12.3.3 Example trigger scripts
12.4 Memory management
12.4.1 Management of shared memory
12.4.2 Shared identifier cache
13 Application in JEM
13.1 Changes in JEM execution
13.2 Refactored Ganga-JEM integration
13.3 CTracer
14 Testing
14.1 Functional tests
14.2 Performance tests
15 Conclusion
V Summary
16 Use cases and testing
16.1 Testing framework
16.1.1 Unit tests
16.1.2 User tests
16.2 Use cases
16.2.1 User perspective: Hanging Grid job
16.2.2 Admin perspective: Excess dCache mover usage
17 Outlook
17.1 Open questions
17.2 Further development
18 Conclusion
VI Appendices
A Module structure
B Example trigger implementations
C List of Figures
D List of Tables
E List of Listings
F Acronyms and abbreviations
G References

Part I
Introduction
There has been one development in science in the last decade that has affected
almost all fields of knowledge, ranging from the humanities, economics and the
social sciences to the natural sciences. This development is the huge rise in
the amounts of data that are created and have to be handled, and in the
complexity of the operations and analyses the scientist has to perform on that
data. Both issues lead to the conclusion that the single most important aspect
of modern science is probably computing.

Computer-assisted analysis and data processing is now an integral component of
the scientific process. The need for computing power and data storage capacity
rises exponentially, without a conceivable limit. New methods and forms of
computing are developed regularly to cope with the increasing requirements of
scientists. This process is paralleled by a similar growth of computing in
industry; both areas depend on each other, and new developments are usually
shared between them eventually.

In addition to the need for raw computing power and data storage, an equally
growing requirement of computing nowadays is communication, or data transfer.
The large amounts of data generated and stored have to be distributed
all over the world to facilitate the international
