LB and JP cleanup

Motivation

  • to reduce the number of module dependencies (both internal and external)
  • to make the gLite really light :)
  • get rid of legacy pieces of LB code that are worth rewriting

Dependency Challenge

As part of the gLite restructuring process, an official Dependency Challenge was introduced.

For its purposes we created two reports:

See also status of last phases of gLite restructuring.

Workplan

Phase 1 (conservative)

  1. set up functional ETICS branch configurations (currently named module_branch_RC31_3)
  2. start removing redundant dependencies, in the following order of preference:
    1. in configurations
    2. in Makefiles
    3. in the code
  3. be able to build all modules separately as well as to build the whole subsystem, both LB and JP
  4. define procedures for cloning current branch configuration into release configuration
    1. we hoped for a working etics-tag configuration for a long time
    2. finally we use a "poor etics" replacement

Timeline

  • Feb 6, 2007: started
  • Mar 7-9, 2007: first report at the JRA1 All-hands meeting in Catania, see slides
    • long time spent on distinguishing ETICS bugs from features
    • several release configurations prepared manually (something had to be delivered)
  • May 15, 2007: checkpoint, phase 1 mostly done
    • branch configurations mostly done, final check wrt. last release ones scheduled
    • poor-etics-tag basically working, to be extended with CVS tagging
    • static dependencies should be minimized to lb.build
    • one more overall check of dependencies scheduled
  • Phase 1 FINISHED, Jan Pospisil 23:16, 25 May 2007 (CEST)

Phase 2 (aggressive)

Code rearrangements targeted at the overall gLite restructuring goals:

  • less complicated inter-component dependencies
  • thin client
  • cleaner code structure

Timeline

See also Status of last phases of gLite restructuring.

TODO phase 1

To be done on RC31_3 branch & configurations

poor-etics environment

  • replacement of etics-tag
  • full functionality for a single module only
  • some support for hierarchies welcome
  • DONE, Ales Krenek 19:00, 18 May 2007 (CEST)

cleanup static dependencies

  • the only acceptable one is lb.build
  • should be solved by re-introducing glite-properties configuration
  • depends on bug 25331 - solved May 22.
  • DONE, Jan Pospisil 19:16, 22 May 2007 (CEST)

new JP branches & configurations

  • DONE, created all glite-jp*_branch_RC31_3 Ales Krenek 19:00, 18 May 2007 (CEST)

merge RC31_3 -> HEAD

TODO phase 2

to be done on HEAD

handle *.T files and supported perl scripts

  • create a new module org.glite.lb.types (tentative name)
  • change *.pm to be able to 'install' (stage) them somewhere into the lib/perl directory structure
  • at3 goes to sbin, rename to glite-lb-at3 (tentative)
  • *.T staged to share/lb/at3 (tentative)
  • DONE, Ales Krenek 19:00, 25 May 2007 (CEST)
    • created org.glite.lb.types, including HEAD configuration
    • modified Makefile's and configurations of other lb modules to use it instead of lb.build
    • removed lb.build from glite-lb_HEAD configuration
  • DONE, remove all these files from org.glite.lb/project (on HEAD) to avoid misleading duplicity failures, Jan Pospisil 12:03, 9 August 2007 (CEST)

check_versions.pl

  • leftover from lb.build elimination
  • client version now follows the version of common (it used to be the other way round)
  • to be done after client/client-interface/common cleanup
  • honik
  • Remaining TODOs and FIXMEs
    • DONE: set it properly both in common and client; the script is now part of org.glite.lb.types and is staged as glite-lb-check_version.pl to ${stagedir}/sbin; in common we create common_version.h and in client we check against it, Jan Pospisil 11:59, 9 August 2007 (CEST)
    • DONE: build & test - enable by default, Jan Pospisil 11:59, 9 August 2007 (CEST)

Remove globus dependencies

  • the only allowed one is probably the B(uild)-time dep. for org.glite.security.gsoap-plugin, nowhere else
  • (salvet): may not be so easy as it looks
  • no need to have working legacy build here but don't remove what is already in place now
  • DONE: split modules: May 25, valtri
  • wrap globus calls and remove direct dependencies: later, salvet+kouril
  • maybe substitute gridsite by something else?
    • to be decided
  • Currently used globus headers/calls:
    • globus_common.h (common, client, logger, server)
    • globus_config.h (common, client)
    • globus_libc_gethostname() (common/src/param.c, server/src/bkserverd.c)
    • globus_module_activate() (client, logger, server)
    • gss_cred_id_t in structure _edg_wll_ConnPool in common/interface/context-int.h
  • Remaining TODOs and FIXMEs:
    • DONE: create org.glite.security.gss module both in CVS (Frantisek Dvorak 20:25, 17 May 2007 (CEST)) and in ETICS (Jan Pospisil 21:30, 25 May 2007 (CEST))
    • DONE: move the GSS related stuff from org.glite.security.gsoap-plugin to org.glite.security.gss, Frantisek Dvorak 17:18, 11 June 2007 (CEST)
    • DONE: wrap globus calls and remove direct dependencies - both in the code (Daniel Kouril 11:45, 7 August 2007 (CEST)) and in the ETICS configurations (Jan Pospisil 16:39, 9 August 2007 (CEST))
    • TODO: remove the dependency from server - now required due to gridsite and VOMS
      • TODO: wrap VOMS calls in org.glite.gss Daniel Kouril
      • TODO: ask gridsite people to provide ACL-only package Ales Krenek
    • TODO: build & test - so far so good, Jan Pospisil 16:39, 9 August 2007 (CEST)

resolve wms-utils.jobid and exception dependence

  • first reaction from our IT friends is promising, details to be negotiated
  • org.glite.jobid module -- why new subsystem? (org.glite.common seems to be dead)
  • splitting C and C++ code in different modules -- we are not against if someone insists but we won't do it actively (currently only in RGMA)
  • no need to depend on "ITCZ common" exception base (never happened actually)
  • exception independence status
  • ASAP, Michal
  • Remaining TODOs and FIXMEs:
    • DONE, new LoggingExceptions.h (remove exception dependency on org.glite.wms-utils.exceptions), Michal Vocu 11:40, 4 June 2007 (CEST)
    • DONE create org.glite.jobid module both in CVS (Michal Vocu 18:18, 30 July 2007 (CEST)) and in ETICS (Jan Pospisil 15:09, 2 August 2007 (CEST))
    • DONE, move all code from org.glite.wms-utils.jobid there, Michal Vocu 17:02, 9 August 2007 (CEST)
    • TODO, switch LB and JP to use it
      • done: lb.common, lb.client, lb.logger, lb.server, lb.proxy, lb.utils
    • TODO, solve the problem with GLITE_WMS_LOGGING_ERROR_BASE used in lb.common/interface/context.h
    • TODO, switch WMS to use it - Francesco Giacomini
    • TODO, build & test

cleanup LB client/client-interface/common

  • .T's are separate (see above)
  • event + jobstate specific code (including .h's) go to common
    • resolves common -> client-interface dependence
  • producer + consumer go to client
  • completely remove client-interface
  • June 10, honik
  • Remaining TODOs and FIXMEs:
    • DONE, CLIENT-INTERFACE: completely removed (removed all files from it on HEAD), Jan Pospisil 19:16, 4 June 2007 (CEST)
    • DONE, CLIENT-INTERFACE: remove the glite-lb-client-interface_HEAD configuration, Jan Pospisil 09:46, 9 November 2007 (CET)
    • DONE: COMMON: lb_perftest - builds fine, Jan Pospisil 15:30, 5 June 2007 (CEST)
    • DONE: COMMON: xml_conversions, xml_parse - depends on purge/dump/load/notification + CLIENT: dump.c, purge.c, load.c, notification.c - solved by moving the contents of purge.h, dump.h and load.h into common/query_rec.h and client/query.h, ditto for notification.h -> common/notif_rec.h, Jan Pospisil 16:27, 7 June 2007 (CEST)
    • DONE: CLIENT: logevent.c, builds fine, Jan Pospisil 16:27, 7 June 2007 (CEST)
    • DONE: CLIENT: PLUSLIB - CPP does not build at all - fixed (wrong header file defines), Jan Pospisil 15:00, 5 June 2007 (CEST)
    • DONE: CLIENT: FAKE_HDR - do we need them? They build fine, kept as is. Jan Pospisil 16:27, 7 June 2007 (CEST)
    • DONE: COMMON: param.c - default values for SetParamTime - now in timeouts.h, probably put all possible timeouts there?, Jan Pospisil 15:30, 5 June 2007 (CEST)
    • DONE: ETICS: update the configurations, Jan Pospisil 13:05, 19 June 2007 (CEST)
    • DONE: COMMON + CLIENT: strictly use "#include <glite/lb/...h>", solved by creating symlinks in the build directory, Jan Pospisil 13:55, 2 August 2007 (CEST)
    • DONE: COMMON + CLIENT: do not use "__GLITE_LB__" symbols, avoid double-underscore at the beginning, Jan Pospisil 13:19, 9 August 2007 (CEST)
    • DONE: COMMON + CLIENT: check_version.pl - reenabled, see here, Jan Pospisil 12:16, 9 August 2007 (CEST)
    • TODO: COMMON + SERVER: split the context
      • not critical due to padding (API & ABI won't change)
      • to be done after 3.1->HEAD merge
  • revise common code, move client and server specific code to these
    • split context to client context and server context
    • long term issue
  • TODO: review producer, remove duplicated code Jan Pospisil Zdenek Salvet
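
The item above about avoiding "__GLITE_LB__"-style symbols can be checked mechanically. The following is only a sketch in Python (the LB code itself is C); the function name and the sample header are hypothetical. Besides the leading double underscore named above, it also flags an underscore followed by an uppercase letter, since the C standard reserves both forms; that second check is our addition.

```python
import re

# Macro names starting with "__" or with "_" + uppercase letter are reserved in C.
RESERVED_MACRO = re.compile(r'^\s*#\s*(?:ifndef|ifdef|define)\s+(__\w*|_[A-Z]\w*)')


def reserved_macros(header_text):
    """Return (line number, macro name) for every reserved-looking macro."""
    found = []
    for lineno, line in enumerate(header_text.splitlines(), start=1):
        match = RESERVED_MACRO.match(line)
        if match:
            found.append((lineno, match.group(1)))
    return found


if __name__ == "__main__":
    # Hypothetical header content, for illustration only.
    sample = (
        "#ifndef __GLITE_LB_CONTEXT_H__\n"
        "#define __GLITE_LB_CONTEXT_H__\n"
        "#define GLITE_LB_OK 0\n"
        "#endif\n"
    )
    for lineno, name in reserved_macros(sample):
        print(f"line {lineno}: reserved identifier {name}")
```

Run over the staged headers of common and client, an empty result would indicate that no such guards remain.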

lbjp-common

  • formerly known as lb-utils
  • common stuff for LB and JP
  • included components:
    • trio -- include escaping functions, should resolve dependence on LB
    • db -- common DB access functions, unification and cleanup needed, mostly done in lb-utils
    • server-bones -- take as is (hopefully)
    • maildir from lb.common -- it's used by JP too
    • logging library -- after the approach is agreed at AH in Helsinki
    • interlogger NT -- later
    • communication layer (plain, gss etc.) -- later, after ILNT requirements are clear
  • common context handling -- no sense, too little really common functionality
  • setup and populate the subsystem: June 1, valtri
  • switch LB and JP to use it: June 8, valtri
  • move ILNT here: michal
  • Remaining TODOs and FIXMEs:
    • DONE: create org.glite.lbjp-common.db module both in CVS (Frantisek Dvorak 20:25, 17 May 2007 (CEST)) and in ETICS (Jan Pospisil 10:57, 2 August 2007 (CEST))
    • DONE: move related code from LB/JP to org.glite.lbjp-common.db
    • DONE: wrap mysql calls under RTLD_LOCAL to avoid openssl clash with globus (Frantisek Dvorak)
    • DONE: switch LB/JP to use org.glite.lbjp-common.db (Frantisek Dvorak)
    • DONE: create org.glite.lbjp-common.maildir module both in CVS (Frantisek Dvorak 20:25, 17 May 2007 (CEST)) and in ETICS (Jan Pospisil 10:57, 2 August 2007 (CEST))
    • DONE: move related code from LB/JP to org.glite.lbjp-common.maildir
    • DONE: switch LB/JP to use org.glite.lbjp-common.maildir
      • DONE: restructure the mill and purge scripts a little too (the same purging in both)
    • DONE: create org.glite.lbjp-common.trio module both in CVS (Frantisek Dvorak 20:25, 17 May 2007 (CEST)) and in ETICS (Jan Pospisil 10:57, 2 August 2007 (CEST))
    • DONE: move related code from LB/JP to org.glite.lbjp-common.trio
    • DONE: switch LB/JP to use org.glite.lbjp-common.trio
    • DONE: create org.glite.lbjp-common.server-bones module both in CVS (Frantisek Dvorak 20:25, 17 May 2007 (CEST)) and in ETICS (Jan Pospisil 10:57, 2 August 2007 (CEST))
    • DONE: move related code from LB/JP to org.glite.lbjp-common.server-bones
    • DONE: switch LB/JP to use org.glite.lbjp-common.server-bones
    • TODO: add ILNT
    • TODO: build & test

Standalone module for LB state-machine

LB Plugin is:

  • not good to be provided by lb.server -- run-time dependence is too heavy
  • to be moved to a new module, including the whole state machine
    • state machine library, mostly processEvent() and what it needs
    • JP plugin -- relatively thin wrapper on the library
  • both lb.server and jp.primary depend on this module
  • the only contra is adding another module
  • building jp.primary because of plugin interface needed by lb.server (or elsewhere) is too heavy

TODOs:

  • DONE create a module org.glite.lb.state-machine in CVS, Jan Pospisil 16:58, 29 Jan 2008 (CEST)
  • DONE move all LB state machine relevant code there, Jan Pospisil and Ales Krenek 18:34, 1 Feb 2008 (CEST)
    • provides static-only library with the state machine, and dynamic plugin for JP -- intended B-only dependence for lb.server.
    • sequence code manipulation extracted to seqcode_aux.c
    • internal jobstat
      • interface exported from this module via intjobstat.h
      • init, free etc. included in process_event.c
      • database enc/dec retained in jobstat_supp.c in lb.server (not needed here)
    • XSD interface (JP attributes) moved here
  • DONE "stolen" files from lb.server tagged with copy_to_state_machine to transfer patches later eventually, Ales Krenek 18:34, 1 Feb 2008 (CEST)
  • DONE create Etics configuration, Jan Pospisil 10:37, 26 Feb 2008 (CEST)
  • DONE Etics-build, Jan Pospisil 20:38, 28 Feb 2008 (CEST)
  • DONE convert lb.server to use it, Jan Pospisil 15:43, 29 Feb 2008 (CEST)
    • test functionality there :-)
    • get rid of files or their parts that were moved to lb.state-machine
  • TODO update lb.plugin for the new set of attributes (AKA attr2)
  • TODO test functionality in JP
  • TODO extract plugin interface from jp.primary, i.e. make B-dependence less heavy

lb.utils

  • dependence on lb.server eliminated by extracting the state machine (above)
  • mon-db -- logically part of server, move it there
    • used to be standalone due to JRA2 remote access to database, seems not to be used anymore
  • honik
  • Remaining TODOs and FIXMEs:
    • DONE: move mon-db to server, Jan Pospisil 00:15, 10 June 2007 (CEST)
    • TODO: remove the dependency on server - not possible at the moment, statistics need jp_job_attrs.h, could be solved once the problem with LB plugin is solved
    • TODO: build & test

lb.ws-interface

  • good reasons to include it with both client and server
  • keep it as is for the time being

notifications

  • notif-il built in lb.logger, packaged with lb.server
    • implies build-time dependence, verify configs
    • shared code, splitting would imply either code duplication or non-trivial restructuring
    • to be obsoleted by ILNT (soon :-)
    • DONE, glite-lb-1.6.0-1
      • built and installed in lb.logger
      • lb.server startup script prints a warning if notif-il is not available
      • dependencies adjusted accordingly
  • TODO evaluate and pass on notifications after committing event insert and job state update

unify lb.proxy + server

  • most of the code is shared (or resides in server) anyway
  • provide "proxy access" to full server on WMS machine for smaller setups
    • avoid the overhead of running both on the same machine
  • --proxy-mode (server, proxy, both) switch to select functionality, interfaces to listen, purging strategy etc.
  • only one database - new flag in the 'jobs' table:
    • lb: local jobs (part HOST of jobid correspond to the hostname of running LB/proxy server)
    • proxy: jobs through local socket (likewise in the current lbproxy)
    • lb+proxy: intersection
  • general concepts still to be agreed, implementation to be done somewhat later (August 2007)
  • mulac
  • Remaining TODOs and FIXMEs:
    • DONE, announce and discuss the changes at AH in Helsinki, Jan Pospisil 10:47, 20 June 2007 (CEST)
    • DONE, move all code from org.glite.lb.proxy to org.glite.lb.server, new server option --proxy-mode
    • TODO, cleanup and unify the code - reduce the *Proxy functions, get rid of misleading names (rename if necessary) - add comments/description what they are supposed to do, etc., switch to one db only, etc.
    • DONE, build
    • TODO, test - new algorithm for storing records in jobs table was implemented, however further testing needs to be done
      • review of algorithm in store_jobs_server_proxy()
      • test server in all 3 modes (proxy only/server only/proxy+server)
      • WORKED AROUND: force at least 2 slaves for proxy&server mode? (deadlock in pseudoparallel registration)
      • check locking of store_jobs_server_proxy() - is it correctly locked?
      • is calling from db_store() correct or should it be called from edg_wll_StoreEvents()
      • do we want greyjobs mechanism for proxy jobs?
      • check proxy events resending to distant LB server
      • when testing 'locality' of a job - is comparing srvPort to jobidPort what we want, or is strcmp(srvHost,jobidHost) enough
    • DONE, org.glite.lb.proxy module removal
    • DONE, better implementation of removal trigger conditions (not event dependent but state dependent)
    • DONE, increase robustness of DB migration/upgrade (glite-lb-migrate_* scripts), solution for current proxy and server DBs unification into one DB
    • TODO, IMPORTANT: change the corresponding deployment module (with integration team - ljocha?)
    • TODO: documentation -- describe 3 deployment scenarios (honik in admin guide)
      • philosophy
        • single binary, shared DB, easier deployment
      • new parameters of glite-lb-bkserver
      • connection to transactions
    • DONE: check behaviour for parent jobs (especially collection state change events)
    • DONE: subjobs - does not work correctly for only embryonically registered subjobs on server+proxy (proxy flag not set)
    • DONE: job_reg -X -C -S registers only parent job to JP, not subjobs...
    • DONE: rewrite db_store_finalize
      • edg_wll_EventSendProxy() - do not send events if job is local
      • send registration to JP also for proxy jobs (with seq=1 && !EDG_WLL_LOGFLAG_DIRECT)
      • consider the same for notification match
    • TODO: check whether notifications are triggered correctly, add some test table below
    • DONE: register_subjobs_embryonic() is not double callable (always using (buffered)insert, no update)
      • it is called from edg_wll_StoreEvent(), which is called only once from db_store()
      • it is necessary to make register_subjobs_embryonic() double-callable (from server and from proxy), or add proxy/server flag update capability into store_job_server_proxy
    • DONE: it seems that edg_wll_UnlockJob should call edg_wll_Commit() to enforce changes to be stored in DB immediately
      • mysql update hangs if called before commit from other process (which modified the same table?)
    • TODO: test whether DB is USP(united server+proxy) ready at the start of LB (somewhere close to glite_lbu_DBQueryCaps() call)

Tests

  • TODO: Rerun all the tests after all other TODO's are resolved
  • TODO: Run all the tests on one binary, on two binaries on one machine (one DB), and on two machines

1) register jobs to bkserver running as proxy and server (-B) and check DB columns proxy and server of jobs table (-x means register to proxy)

job_reg            proxy   server   STATUS
-x local_jobid     1       1        WORKING
local_jobid        0       1        WORKING
-x remote_jobid    1       0        WORKING
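
The expected values in the table above are consistent with a simple rule, sketched here in Python for illustration only (the real server is written in C). The function name, the JobFlags type and the host names are hypothetical. The sketch assumes that "local" means the HOST part of the jobid equals the server's hostname, and it leaves out the port comparison that is still listed as an open question.

```python
from typing import NamedTuple


class JobFlags(NamedTuple):
    proxy: int   # 1 if the job was registered through the local proxy socket (-x)
    server: int  # 1 if the HOST part of the jobid names this LB server


def registration_flags(jobid_host, server_host, via_proxy_socket):
    """Compute the proxy/server columns stored in the 'jobs' table."""
    # Hostnames are case-insensitive, so compare them in lower case.
    is_local = jobid_host.lower() == server_host.lower()
    return JobFlags(proxy=int(via_proxy_socket), server=int(is_local))


if __name__ == "__main__":
    me = "lb.example.org"  # hypothetical hostname of the combined server
    print(registration_flags(me, me, True))                   # -x local_jobid
    print(registration_flags(me, me, False))                  # local_jobid
    print(registration_flags("other.example.org", me, True))  # -x remote_jobid
```

A remote jobid registered without -x would give (0, 0) under this rule; the table does not cover that case.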


2) let one job of each category switch to a terminal state and check DB columns proxy and server of jobs table (job type = XY, X=proxy, Y=server)

job type   proxy    server   STATE
11         0        1        WORKING
01         0        1        WORKING
10         erased   erased   WORKING

3) purge one of each job category and check DB columns proxy and server of jobs table

job type   proxy    server   STATE
11         1        0        WORKING
01         erased   erased   WORKING
10         1        0        WORKING
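
Read together, the terminal-state and purge tables above suggest the following rule: reaching a terminal state clears the proxy flag, a purge clears the server flag, and the row is erased once both flags are zero. This is our reading of the expected values, not a description of the actual implementation; the function names below are hypothetical.

```python
def _keep_or_erase(proxy, server):
    """Return the remaining (proxy, server) flags, or None if the row is erased."""
    return (proxy, server) if (proxy or server) else None


def on_terminal_state(flags):
    """A job reaching a terminal state no longer needs its proxy flag."""
    _proxy, server = flags
    return _keep_or_erase(0, server)


def on_purge(flags):
    """Purging on the server side clears the server flag."""
    proxy, _server = flags
    return _keep_or_erase(proxy, 0)


if __name__ == "__main__":
    for flags in [(1, 1), (0, 1), (1, 0)]:
        print(flags, "terminal ->", on_terminal_state(flags),
              "| purge ->", on_purge(flags))
```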

4) query one of each job category on proxy socket and server port and look what is returned

job_status   socket   port                     STATE
11           state    state                    WORKING
01           NA       state                    WORKING
10           state    state (from remote LB)   WORKING

5) register collection with and without subjob re-registration and check DB columns proxy and server of jobs table

job_reg                        parent proxy   parent server   subjob proxy   subjob server   STATUS
-x -C -n 1 local_jobid         1              1               0              1               WORKING
-C -n 1 local_jobid            0              1               0              1               WORKING
-x -C -n 1 remote_jobid        1              0               1              0               WORKING
-x -C -S -n 1 local_jobid      1              1               1              1               WORKING
-C -S -n 1 local_jobid         0              1               0              1               WORKING
-x -S -C -n 1 remote_jobid     1              0               1              0               WORKING


switch to transactions

  • GOAL DONE: wipe out all POSIX locks and use the DB transaction locking capability of the InnoDB engine
  • test whether basic things work
    • DONE job_log
    • DONE job_status
    • DONE user_jobs
    • DONE notifications
      • test, bind
    • DONE purge
    • DONE bkindex
      • adding/removing indices
    • DONE resending proxy events to LB
    • DONE stress test
      • storing/querying server from multiple clients for 1h
  • TODO measure performance
  • Merge to HEAD DONE
  • thorough tests
    • DONE job_log
      • proxy/server query
    • DONE job_reg
      • basic, -x, -C -n, -C -n S, combinations
    • DONE job_status
      • server/proxy query, -fullhist, -fasthist, -all
    • DONE user_jobs
      • server/proxy query, owner parameter
    • DONE notifications
      • new/receive/refresh/drop
    • DONE purge
      • bunch/per jobid
    • DONE bkindex
      • adding/removing indices
    • DONE DB migration tools
      • slightly tested
  • Documentation TODO
    • POSIX locking vs. transaction DB locking
    • new DB schema
    • describe migration process
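
As a rough illustration of the transaction approach, the sketch below stores an event and the resulting job state in one transaction, so that a failure in the second step also undoes the first. It uses Python's built-in sqlite3 purely as a stand-in for MySQL/InnoDB. The table layout, column names, state values and function names are all hypothetical and are not the LB schema.

```python
import sqlite3


def open_db():
    """Create an in-memory database with two illustrative tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE events ("
        " jobid TEXT, seqno INTEGER, name TEXT,"
        " PRIMARY KEY (jobid, seqno))"
    )
    conn.execute(
        "CREATE TABLE states ("
        " jobid TEXT PRIMARY KEY,"
        " state TEXT CHECK (state IN ('submitted', 'running', 'done')))"
    )
    conn.commit()
    return conn


def store_event(conn, jobid, seqno, name, new_state):
    """Insert the event and update the job state atomically.

    Returns True if both statements were committed, False if the
    transaction was rolled back.
    """
    try:
        # 'with conn' commits on success and rolls back on an exception.
        with conn:
            conn.execute("INSERT INTO events VALUES (?, ?, ?)",
                         (jobid, seqno, name))
            conn.execute("INSERT OR REPLACE INTO states VALUES (?, ?)",
                         (jobid, new_state))
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    db = open_db()
    print(store_event(db, "job1", 1, "RegJob", "submitted"))
    # The state is rejected by the CHECK constraint, so the event insert
    # made in the same transaction is rolled back as well.
    print(store_event(db, "job1", 2, "Bogus", "no-such-state"))
    print(db.execute("SELECT COUNT(*) FROM events").fetchone()[0])
```

No explicit lock is taken anywhere in the sketch; consistency relies only on the commit/rollback boundary, which is the property the POSIX locks were replaced for.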

changes in the database schema

  • general cleanup
  • Remaining TODOs and FIXMEs:
    • TODO, add new field 'proxy' to table jobs, see above
    • TODO, add new field 'priority' to table events and add the code to store priority flags there
    • TODO, add new flag 'seqno' to table events and change the code to store sequence numbers there instead of in the short_fields

lb.server

  • TODO: transactions: general review, handle rollbacks, locking etc.
  • TODO: lowercased user tags
    • Due to SQL case insensitivity, all user tag names are lowercased in order to generate unique column names. Not sustainable anymore (JP namespace prefixes)
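
One visible side effect of lowercasing is that tag names differing only in case become indistinguishable. The short Python sketch below lists such collisions for a set of tag names; the function name and the sample tags are hypothetical, and the way LB actually derives column names from tags is not shown here.

```python
from collections import defaultdict


def colliding_tags(tag_names):
    """Group tag names that map to the same lowercased key."""
    groups = defaultdict(list)
    for name in tag_names:
        groups[name.lower()].append(name)
    # Keep only keys shared by more than one original tag name.
    return {key: names for key, names in groups.items() if len(names) > 1}


if __name__ == "__main__":
    print(colliding_tags(["Color", "color", "Size"]))
```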

Documentation and examples

  • Really BIG trouble :(
  • See for example GGUS Ticket 19469, especially the initial comments
  • move examples to a separate module org.glite.lb.examples?
    • probably not, examples belong to its module (most of them are already in lb.client)
    • source code of the examples is REQUIRED to be distributed together with the RPMs (esp. lb-client)
    • an extra module can be created for documentation - org.glite.lb.doc - will depend (B-uild time) on almost all other LB modules, major parts of the documentation generated automatically, a separate module is not necessarily required, but it could simplify merging of all documentations together
  • everybody should contribute
  • Remaining TODOs and FIXMEs
    • MERGED INTO THE FOLLOWING TASK, first update the LB-GUIDE w.r.t. the comments above, they are ALL appropriate, Jan Pospisil 23:54, 15 December 2007 (CET)
    • WORK IN PROGRESS, create a comprehensive documentation for LB in org.glite.lb.doc and split it into several parts:
      • TODO User's Guide (job life cycle, CLI tools description: logging, changing ACL, querying, notifications, etc.), Jan Pospisil
      • TODO Administrator's Guide (deployment scenarios, installation, configuration, daemons description, CLI tools description: purge/dump/load, complete RPM description, etc.), Zdenek Salvet
      • TODO Developer's Guide (complete API and WS documentation, programming and WS examples, etc.) - some parts should be generated automatically - avoid text duplicities, Michal Vocu
    • WORK IN PROGRESS, create a comprehensive documentation for JP in org.glite.jp.doc and also split it into
      • TODO User's Guide,
      • TODO Administrator's Guide
      • TODO Developer's Guide
    • TODO, create org.glite.lbjp-common.doc and put common parts of both org.glite.lb.doc and org.glite.jp.doc there (namely egee.cls, copyright.tex, definitions.tex, lbjp.bib, etc.)
    • DONE, install the source code of examples in lb-client (include them in the RPM) - a suitable Makefile to install is also needed and documentation of the examples - at least one README describing all the examples, Jan Pospisil 23:54, 15 December 2007 (CET)
    • TODO, create man pages for all executables we distribute with LB/JP (doxygen probably does not help much, some sort of automatic generator necessary) - Jiri Filipovic
    • TODO, create man pages for all important API functions (mainly automatically by doxygen) - Michal Vocu
    • TODO, create plain text documents to be included in all RPMs we distribute describing each module, add description to RPMs (fill in description fields in ETICS) - Ales Krenek
    • DONE, create plain text ChangeLog document to be included in all RPMs we distribute, it could be created automatically with poor-etics-tag when pushing versions - requires an extra description of all changes since previous version push-up, Ales Krenek 15:40, 26 November 2007 (CET)
    • TODO, update the web, documents and links at http://egee.cesnet.cz/en/JRA1, http://glite.web.cern.ch/glite/documentation/default.asp, https://edms.cern.ch/document/571273/2, http://egee-jra1-wm.mi.infn.it/egee-jra1-wm/, https://twiki.cern.ch/twiki/pub/LCG/GridServiceMonitoringDescriptions/0702-Grid_Service_Monitoring_LB.txt etc., Ales Krenek

merge RC31_3 -> HEAD

  • ljocha (virtually everybody contributed)
  • Remaining TODOs and FIXMEs:
    • DONE, merge all branch changes (esp. those after major API++) to HEAD
    • TODO build, test, and fix

megajob

  • TODO: rerun the stress tests to check overall throughput after restructuring

general build cleanup

  • DONE: remove old glite build system artefacts, Jan Pospisil 20:19, 24 September 2007 (CEST)
  • TODO: review all Makefiles, especially after the merge from RC31_3

Dropped tasks

  • Completely remove project directory in all modules *
    • (ljocha, May 15) not to be done yet, legacy build has to be kept, and we need at least version.properties anyway
  • ditto for expat: it should probably be used only in org.glite.lb.common
    • light dependence, no need to do so
  • unify the use of version: it is now taken both from the ETICS configuration and from the module/project/version file -> at the moment we prefer to use the version from CVS that appears in project/version.properties (but it will be removed, see above)
    • included in "poor-etics" scripts

ETICS issues

Feel free to share your opinions about ETICS in their questionnaire.

Our general feeling about ETICS is the following:

Which features are useful in ETICS?

  • Automatic RPM builds (no need to write spec files at all)
  • Quite good interoperability with CVS
  • Idea of remote builds (though they are far from being perfect at the moment)

Which features are missing in ETICS?

  • It is currently impossible to track changes in configurations. We would appreciate being able to add comments similar to CVS commit messages and to read them again later. The ability to go back in history and revert changes is less important but would be nice.
  • We would appreciate being able to run etics-build offline.
  • We would like to be able to specify that ETICS should not download a particular dependency into the repository but use a locally installed one, as was possible in the old (ant) build system.

Which features MUST be improved in ETICS?

  • Correctness of packages in the repository for all available platforms. It is inappropriate to have many packages incorrect (missing libraries or links to them, wrong SONAMEs, different names or directory structures on different platforms, etc.).
  • Speed of all etics commands, especially etics-checkout and etics-build (etics-configuration is quite good, but it also has hidden limits :). Our measurements show that during etics-build less than 30% of the time is spent in the real make; the rest (more than 300%!!) is overhead.
  • Moreover, it would be nice if etics-build --nodeps could build the corresponding module immediately and not after several minutes of "Skipping dependency...". In this case the overhead is sometimes up to 3000%!!
  • Reliability of remote builds: they often behave differently from local builds. This is probably caused by inconsistency among the remote build machines (in particular there was an issue with libtool).
  • Consistency of the local configuration store: we frequently get errors "configuration id xx-yy-... not found", and the workspace is not usable anymore. Moreover, if one issues "configuration update" from such a workspace, it results in a corrupted configuration.
  • The binaries in the RPMs (built remotely) have stripped debug symbols, which makes the process of error identification in a production environment very difficult or sometimes impossible.
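
For reference, the overhead percentages in the list above can be related to the share of time spent in make. The sketch assumes that "overhead" means non-make time divided by make time; this is our reading, since the definition is not stated. The function names are hypothetical.

```python
def overhead_percent(make_fraction):
    """Overhead relative to make time, given make's share of the total time."""
    if not 0 < make_fraction <= 1:
        raise ValueError("make_fraction must be in (0, 1]")
    return (1 - make_fraction) / make_fraction * 100


def make_fraction_for(overhead_pct):
    """Inverse: make's share of total time for a given overhead percentage."""
    return 1 / (1 + overhead_pct / 100)


if __name__ == "__main__":
    print(overhead_percent(0.25))              # 300.0
    print(round(make_fraction_for(3000), 3))   # 0.032
```

Under this reading, an overhead of 300% corresponds to make taking 25% of the total time, and 3000% to roughly 3%.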

How do we consider the handling of security in ETICS?

  • The ETICS Admin web interface is really far from being user friendly. Maybe you should also create a command-line version of it.
  • You should replace etics-certificate-server with standard proxy certificates; it is a potential security hole, quite apart from the annoyance of typing pass phrases again and again.

Opened bugs

  • Dependencies problem: bug 24897 - partially solved, --force required for etics-build command
  • Malfunction of clone on the web: bug 25834
  • Cannot have two configurations with the same name, though promised by Alberto that it should work, bug 25836
  • Web interface not functional 31955, Doesn't work with Firefox3 36062
  • Submitting of "big" remote build jobs fails 35635

Closed bugs

  • Build status "processing" in the build-report is not consistent/correct: bug 28573 - solved Aug 28.
  • etics-checkout failure - error in SQL syntax, bug 28110 - solved July 19.
  • there is no glite_properties_3_1_0 configuration for org.glite, bug 25331 - solved May 22.
  • etics-tag not functional, bug 24686 - solved May 17.