Thursday, August 27, 2009

Fixes in GFAL and lcg_util

lcg_util: should use -1 length for gridftp/CKSM

In GFAL, we always calculate checksums for the whole file (when a checksum is needed). However, the corresponding Globus API function (globus_ftp_client_cksm) was called with the wrong length parameter: 0 (calculate the checksum of zero-length data) instead of -1 (calculate until the end of the file). This error also exposed an inconsistency in DPM, which interpreted this value slightly differently from the API specification.
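The difference between the two length values can be illustrated with a small Python sketch. This is not the Globus API: range_checksum is a hypothetical helper that only mimics the convention described above (length 0 covers no data, length -1 reads until the end of the file), and Adler-32 is chosen arbitrarily as the algorithm.

```python
import zlib


def range_checksum(path, offset=0, length=-1):
    """Adler-32 of `length` bytes starting at `offset`; -1 means "until EOF".

    Hypothetical helper, it only mimics the length convention.
    """
    value = 1  # Adler-32 of empty data
    with open(path, "rb") as f:
        f.seek(offset)
        remaining = length
        while remaining != 0:
            # Negative length: keep reading fixed-size chunks until EOF.
            size = 65536 if remaining < 0 else min(65536, remaining)
            chunk = f.read(size)
            if not chunk:
                break
            value = zlib.adler32(chunk, value)
            if remaining > 0:
                remaining -= len(chunk)
    return value & 0xFFFFFFFF
```

With length 0 the loop never runs and the result is the checksum of empty data, which is what the wrong call effectively asked for.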

GFAL: shall handle abbreviated checksum names


This is a workaround for the way DPM names the checksum algorithms. The endpoints should follow the GridFTP specification, but DPM has already implemented a different set of names. This will be changed in DPM as well, but the fix may not be deployed everywhere soon. So, internally and temporarily, GFAL detects the DPM names and converts them to the GridFTP conventions.
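The conversion itself is just a lookup. Here is a minimal Python sketch of the idea (the real code is in C inside GFAL). The abbreviations in the table are assumptions for illustration and are not taken from the GFAL or DPM sources; normalize_checksum_name is a hypothetical name as well.

```python
# Hypothetical mapping from abbreviated names to GridFTP-style names.
# The abbreviations are assumptions for illustration only.
ABBREVIATED_TO_GRIDFTP = {
    "AD": "ADLER32",
    "CS": "CRC32",
    "MD": "MD5",
}


def normalize_checksum_name(name):
    """Return the GridFTP-style algorithm name for `name`.

    Names that are not in the table are passed through (upper-cased),
    so endpoints that already follow the convention are unaffected.
    """
    key = name.strip().upper()
    return ABBREVIATED_TO_GRIDFTP.get(key, key)
```

Passing unknown names through unchanged means the workaround can stay in place after the endpoints are fixed, and can be removed later without a flag day.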

The fixes have been sent for integration and certification.

Monday, August 24, 2009

GFAL activities in Savannah

The following GFAL bugs have been certified:

Unable to get a tURL in a full space
See the description and the solution here.

GFAL: Problems with LDAP queries on SL5
The LDAP filters contained spaces between the filter parts and the operators. This does not seem to be allowed, although some GFAL releases (probably on different platforms) have worked with the spaces.
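To show what "without spaces" means in practice, here is a rough Python sketch that compacts such a filter. It is not the GFAL fix (that one simply edits the hard-wired C filter templates); strip_filter_padding is a hypothetical helper, and the sample filter is the one printed in the BDII log of the August 7 entry.

```python
import re


def strip_filter_padding(ldap_filter):
    """Remove spaces between filter components and after the &, |, ! operators.

    Simplification: a value that itself contains ") (" would be altered too.
    """
    compact = ldap_filter.strip()
    # ") (" and ") )" become ")(" and "))"
    compact = re.sub(r"\)\s+(?=[()])", ")", compact)
    # "(| " becomes "(|" (same for & and !)
    compact = re.sub(r"\(([&|!])\s+", r"(\1", compact)
    return compact


padded = ("(| (GlueSEUniqueID=lxb7608v1.cern.ch) "
          "(& (GlueServiceType=srm*) "
          "(GlueServiceEndpoint=*://lxb7608v1.cern.ch:*)))")
print(strip_filter_padding(padded))
```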

At the same time, we introduced a new activity:
updating deprecated LDAP calls

OpenLDAP still supports the deprecated functions, so the priority is low.

Thursday, August 13, 2009

playing with git

I have set up a CVS-to-git export cron job for the gLite data management and CASTOR CVS modules so that we can try git.

On RedHat derivatives you can install from the DAG repository

yum install git git-cvs git-svn qgit

On Ubuntu you can install the dependencies as

sudo apt-get install git-core git-svn git-cvs qgit giggle

And then you can check out a module:

$ time git clone git://lxtank02.cern.ch/org.glite.data.srm-util-cpp
...
real 0m0.803s
user 0m0.340s
sys 0m0.044s

At this point my checkout was a bit lost among the branches,
so I needed a bit of help to find the way back to the true path:

$ cd org.glite.data.srm-util-cpp
$ git merge origin/origin

At this point you can start branching at will to play with the GFAL code.

You can sync up later with CVS using
$ git pull

Once we move from CVS to SVN, committing through git will also become feasible.


To see how efficient the storage is, here is a small experiment:

$ git clone git://lxtank02.cern.ch/CASTOR2
$ du -sh CASTOR2/.git
37M CASTOR2/.git
$ cd CASTOR2; time git pull
...
real 0m0.273s
user 0m0.092s
sys 0m0.124s

$ rm -rf CASTOR2
$ cvs -d ':pserver:anonymous@isscvs.cern.ch:/local/reps/castor' co CASTOR2
$ du -sh CASTOR2
58M CASTOR2
$ cd CASTOR2; time cvs up
...
real 0m2.700s
user 0m0.156s
sys 0m0.136s

In plain words, the stored history of all the versions going back to 1999 (37 MB) is smaller than the CVS workspace (58 MB).

Monday, August 10, 2009

Wrong LDAP search filters?

Today, I tried to replace the deprecated LDAP functions (related to this post). As I had never done anything with LDAP before (new skill ;) ), I first wanted to get familiar with it. Google Code Search helped a lot. However, it turned out that the problem might not be related to the deprecated functions, because the LDAP API developers still maintain them for backward compatibility. What I did in this context was:
  • Created the ldap_facade module, to hide the calling details of the deprecated functions (which function is called in fact, some of the always-the-same LDAP API function parameters, etc.).
  • Added only the ldap_facade_init and ldap_facade_search functions, and left the rest for later (they can be added if a change is needed there).
To check whether I could connect to the LDAP server and execute the search that appeared in the log attached to the bug, I installed the Apache LDAP browser plugin for Eclipse, created a connection to the server, and copied the search filter. Here I made an observation: the plugin did not accept the search filter, because it started with a space...

I checked the code: the hard-wired template filters really did start with a single space. When I removed it, the test command got further and tried to do some SRM operations.

So, the questions:
  • Is it true that the LDAP search filter string cannot start with a white space?
  • If it is true, then is it a bug in the code? Other filters start with space as well.
  • If it is a bug, why have we not seen it so far?

Friday, August 7, 2009

Deprecated LDAP API functions in GFAL

On SLC5, certification of the following patch failed:

https://savannah.cern.ch/patch/?3119

The error can be reproduced in the following way:

  1. Check out and build org.glite.data with GFAL (instructions here). The next steps continue the linked build process.
  2. Build the org.glite.data.dm-util package:

    cd ~/org.glite.data.dm-util/build
    make install

  3. Execute the following commands:

    cd src
    export LCG_GFAL_INFOSYS=lcg-bdii.cern.ch:2170
    ./lcg-cr -d srm://lxb7608v1.cern.ch/dpm/cern.ch/home/dteam/test_rm_02 -D srmv2 -vv /etc/redhat-release
(About LCG_GFAL_INFOSYS: see this.) The command prints:

Using grid catalog type: lfc
Using grid catalog : (null)
Checksum type: None
[INFO] BDII server: lcg-bdii.cern.ch:2170/o=grid
[INFO] BDII filter: (| (GlueSEUniqueID=lxb7608v1.cern.ch) (& (GlueServiceType=srm*) (GlueServiceEndpoint=*://lxb7608v1.cern.ch:*)))
[INFO] Trying to use BDII: lcg-bdii.cern.ch:2170/o=grid (timeout 60)
[BDII][ldap_search_st][] lcg-bdii.cern.ch:2170 > Bad search filter
[GFAL][bdii_query_send][EINVAL] No accessible BDII
lcg_cr: Invalid argument

That is the same result that can be seen in the bug. After analysis, we found that several LDAP C API functions have been deprecated on this platform. Ideas at this point:

- We may as well create a facade to the LDAP API.
Pros: unit-testability without real LDAP servers (mock functionality). Future LDAP API changes are isolated.
Cons: The LDAP API does not change frequently, so the effort is probably not worth it.
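To make the "unit-testability without real LDAP servers" point concrete, here is a minimal Python sketch of the facade idea. Everything in it is hypothetical (LdapFacade, FakeDirectory, find_srm_endpoints, the example host); the real facade would be C code wrapping the LDAP C API.

```python
class FakeDirectory:
    """In-memory stand-in for an LDAP server, for unit tests only."""

    def __init__(self, entries):
        self.entries = entries    # list of attribute dicts to return
        self.seen_filters = []    # filters received, for assertions

    def search(self, base, ldap_filter):
        self.seen_filters.append(ldap_filter)
        return list(self.entries)


class LdapFacade:
    """Single entry point that hides which backend call is actually made."""

    def __init__(self, backend):
        self._backend = backend

    def search(self, base, ldap_filter):
        return self._backend.search(base, ldap_filter)


def find_srm_endpoints(facade, host):
    """Caller code: it only knows the facade, never the LDAP library."""
    ldap_filter = ("(&(GlueServiceType=srm*)"
                   "(GlueServiceEndpoint=*://%s:*))" % host)
    entries = facade.search("o=grid", ldap_filter)
    return [entry["GlueServiceEndpoint"] for entry in entries]
```

A test can hand a FakeDirectory to the facade and then check both the returned endpoints and the exact filter string that was sent, with no BDII involved.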

TODO-s:

- Change the deprecated LDAP functions
- Create a regression test for the bug
- Re-send the patch for certification.

small fixes: transfer-agents and transfer-cli

There were a couple of other updates:

glite-data-transfer-agents v3.4.1-1

Really fixing #47507: SRMv2.2 as default.
This is a two-character fix, which finally made it into the release.

If you cannot wait then there are some workarounds. The original idea of adding
FTA_GLOBAL_ACTIONS_SRMVERSION="2.2"
worked only for the VO agents, so one also has to add the following lines to the Yaim config:
FTA_TYPEDEFAULT_SRMCOPY_ACTIONS_SRMVERSION="2.2"
FTA_TYPEDEFAULT_URLCOPY_ACTIONS_SRMVERSION="2.2"

... or simply submit a full SURL to FTS including the endpoint of the SRMv2 server.


glite-data-transfer-cli v3.7.1-1
  • Fixing a regression introduced in March 2008: the overwrite flag (-o) should not require an argument.
  • Updated the test suite to the latest FTS service.
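The difference between a flag that takes an argument and a pure switch is easy to show with Python's argparse. This is only an illustration of the option semantics, not the transfer-cli code; the program name and the positional arguments are made up.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="transfer-submit-sketch")
    # store_true makes -o a pure switch: it consumes no following argument.
    parser.add_argument("-o", dest="overwrite", action="store_true",
                        help="overwrite the destination file if it exists")
    parser.add_argument("source")
    parser.add_argument("destination")
    return parser


args = build_parser().parse_args(
    ["-o", "srm://src.example.org/f", "srm://dst.example.org/f"])
print(args.overwrite, args.source, args.destination)
```

If -o were declared as an option with a value, it would swallow the source URL in the call above, which is the kind of behaviour the regression caused.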

transfer-url-copy version 3.2.1-rel2 released

The affected module is: org.glite.data.transfer-url-copy.

The changes are:
  • Warning removal
  • The result of the code review has been partially implemented: descriptive enum-s signal the actual checksum checking use case.
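As an illustration of the bool-to-enum idea, here is a Python sketch. The use case names are hypothetical, chosen only to show the pattern; the real enum-s in transfer-url-copy are C++ and may look quite different.

```python
from enum import Enum


class ChecksumCheck(Enum):
    """Hypothetical use cases; the real names in the component may differ."""
    NONE = "no checksum verification"
    COMPARE_SOURCE_AND_DESTINATION = "compare source and destination checksums"
    COMPARE_WITH_USER_VALUE = "compare with a checksum supplied by the user"


def select_check(compare_enabled, user_checksum):
    """Turn a flag and an optional value into one descriptive use case."""
    if not compare_enabled:
        return ChecksumCheck.NONE
    if user_checksum:
        return ChecksumCheck.COMPARE_WITH_USER_VALUE
    return ChecksumCheck.COMPARE_SOURCE_AND_DESTINATION
```

The rest of the code can then branch on a single named value instead of on a combination of booleans.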
The new release tag is:

glite-data-transfer-url-copy_R_3_2_1_2

The functionality and the behaviour have not been changed.

See the component in the CVS.

Building GFAL

OK, it's time to build this package. The selected platform is SLC5 (Scientific Linux CERN 5), because the world will soon move to this platform, and there are some problems there.

The build is done using ETICS, a software configuration management system funded by the European Commission, CERN, etc. These are the steps I took (ETICS had already been installed):

mkdir GFAL
cd GFAL
etics-workspace-setup
etics-checkout --ignorelocking --continueonerror --project-config glite_branch_3_2_0_dev org.glite.data
etics-build --continueonerror org.glite.data

GFAL is the (initially empty) project directory; I will refer to it as $WORKSPACE. This series of commands builds the whole org.glite.data suite.

Great, the suite build was successful, so let's see the GFAL build. It is best to work under e-env:

cd $WORKSPACE
org.glite.data/bin/e-env

Then:

cd $WORKSPACE/org.glite.data.gfal/build
make install

We should have a couple of executables with names gfal_test*. Let's execute the tests. During the tests, I had a valid proxy credential in the LCG Deployment Team Virtual Organization (dteam).

./gfal_version

Returned:

GFAL-client-1.11.8-1

touch a;
./gfal_teststat a

Returned:

stat successful
mode = 100644
nlink = 1
uid = 1000
gid = 1000
size = 0
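The mode value is the octal form of st_mode: 100644 is a regular file with rw-r--r-- permissions. For comparison, here is the same information read with plain POSIX stat from Python. This is a local-file sketch only, it does not go through GFAL; describe is a hypothetical helper.

```python
import os
import stat
import tempfile


def describe(path):
    """Return some stat fields, with the mode in octal as in the output above."""
    st = os.stat(path)
    return {
        "mode": "%o" % st.st_mode,
        "kind": "regular file" if stat.S_ISREG(st.st_mode) else "other",
        "permissions": stat.filemode(st.st_mode),
        "nlink": st.st_nlink,
        "size": st.st_size,
    }


with tempfile.NamedTemporaryFile() as f:
    os.chmod(f.name, 0o644)
    print(describe(f.name))
```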

COOL! And then the other test commands.

Thursday, August 6, 2009

Checksum code review

The latest FTS development added checksum support to verify that the data has been transferred properly and that the source/destination files have not been altered. The related requirement specification can be found in the wiki:

FtsChecksums

The feature has been transferred to the package certification process.

Today, we had a code review with Rosa and Ákos, covering the checksum-related code. After a discussion about some fancy C++, Boost, STL features + some potential Google interview questions :), we had two findings that will lead to changes:

- The system determines the actual checksum use case and stores it in bool variables. Enum-s with descriptive names should be used instead.
- The asynchronous SRM operations are called synchronously, so the same send/poll pairs always go together in the code. They should be merged into one function that also encapsulates the new exponential backoff functionality.
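The second finding can be sketched in a few lines of Python. This is only an outline of the pattern, not the FTS code (which is C++): send_and_wait, its parameters and the default delays are all assumptions.

```python
import time


def send_and_wait(send, poll, initial_delay=1.0, factor=2.0,
                  max_delay=60.0, timeout=600.0, sleep=time.sleep):
    """Send a request, then poll with exponentially growing delays.

    `send()` returns a request token; `poll(token)` returns None while the
    request is pending and the result once it has finished.
    """
    token = send()
    delay = initial_delay
    waited = 0.0
    while True:
        result = poll(token)
        if result is not None:
            return result
        if waited >= timeout:
            raise TimeoutError(
                "request %r still pending after %.0f s" % (token, waited))
        sleep(delay)
        waited += delay
        # Grow the delay, but never beyond max_delay.
        delay = min(delay * factor, max_delay)
```

Callers then see one synchronous call, and the backoff policy lives in a single place. Passing sleep as a parameter keeps the function testable without real waiting.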

We found no bugs and the changes above will not modify the behaviour, so we do not need a new release now.

Wednesday, August 5, 2009

GFAL

Today, I started to get familiar with the GFAL package. I had a discussion with Rémi Mollon, the current package owner, who will stop maintaining the package in October.

GFAL provides a POSIX-compliant C and Java API to access data in a grid environment. An important package is built on GFAL: the LCG Util package, which is in fact maintained together with GFAL.

We had an idea to merge some components of FTS and GFAL, especially the SRM access layer. In FTS, it is the org.glite.data.srm-utils-cpp component, written in C++. In GFAL, it is written in C. Re-implementing org.glite.data.srm-utils-cpp on top of the GFAL SRM access layer would be beneficial:

- Maintenance would be much easier
- There are several features that are under development now, and must be implemented in both packages (for instance, exponential backoff for failed or long requests, see FTS Request; there is a similar GFAL Request).
- org.glite.data.srm-utils-cpp has good unit test coverage -> the appropriate GFAL code would also be tested better if we execute the FTS test suite.

TODO: I need to check the feasibility of the above idea before proposing anything. So, in the coming days, I will do the following on the GFAL side:

- Organize a coffee with the responsible people from the experiments :)
- Set up a GFAL development environment
- Compile, run the tests, solve the complications coming from certificates, security, etc.
- Re-implement SrmLs on top of the GFAL SRM access layer (this is the most heavily used SRM call in FTS)
- Execute the FTS tests.

LCG Util


Today, I started to get familiar with the LCG Utils package. I had a discussion with Rémi Mollon, the current package owner, who will stop maintaining the package in October.

First, I wanted to find out who the users of the project are: the ATLAS and LHCb experiments at CERN.

Then, we went through the dependencies between LCG Utils and the rest of the gLite project. The most important is:

LCG Utils is the main end-user command line tool for data management provided by LCG. It implements high-level file management tools, such as copying files.

It depends on GFAL.

TODO: identify the main users and stakeholders on the experiment side. Have a coffee with them :)