Showing posts with label gfal. Show all posts

Tuesday, December 21, 2010

GFAL / LCG_Util 1.11.16 release


There has been no blog post for almost half a year. That does not mean that nothing has happened since then. We devoted enormous effort to background work (automated test bed, nightly builds and test runs, the move from the EGEE era to EMI, etc.). We will test the tools and procedures in the first months of 2011, and analyze whether they add value and how they could be improved. As for the visible part, we (finally) released GFAL/LCG_Util 1.11.16 in November; see the release notes. Better late than never!

Thursday, February 4, 2010

Thanks for everything, Ákos!

Ákos has left the LHC Computing Grid projects. He has coordinated the development and support of the EGEE/LCG grid data management systems, and led the FTS project since 2002. As he was always short, straight and to-the-point, following his style, I simply should not write more than

that :)

Your comments here are always welcome!

Friday, December 18, 2009

To wrap up the year, we have GFAL/lcg_util v1.11.13, FTS 2.2.4 and DPM/LFC v1.7.4.

Details are in the data management release notes.

This FTS snapshot already has a secure preview of the administrative web interface:

Friday, December 11, 2009

GFAL/lcg_util in SVN

GFAL and lcg_util are the first candidates for migration to SVN.

The new repository is available and I have managed to run remote ETICS builds on all platforms using the glite-data-dm-util_R_1_11_12_3 and glite-data-gfal_R_1_11_12_3 configurations.

While SVN feels much better when renaming files, tagging involves a certain degree of complexity:

    svn copy https://svn.cern.ch/reps/glitedm/trunk/gfal \
        https://svn.cern.ch/reps/glitedm/tags/glite-data-gfal_R_1_11_12_3 \
        -m glite-data-gfal_R_1_11_12_3

compared to CVS

    cvs tag glite-data-gfal_R_1_11_12_3

or git

    git tag glite-data-gfal_R_1_11_12_3

Wednesday, November 25, 2009

releases

My magic URL for upcoming data management releases points into Savannah.

We usually create a 'patch' (i.e. a release) when we have a draft idea of what should
go into a release. For example, FTS 2.2.3 and GFAL 1.11.13 were created with all the
bugs attached that we intend to fix by the release date.

The first noteworthy state is 'Ready for certification', when the developers have
finished their work and there are already RPMs created. At this point we usually
upload the packages into our Release Candidate repository for the convenience of
early testers.

The next noteworthy state is 'Certified', when the release has passed all regression
tests and the new features seem to be working.

After this state, there are a few weeks of testing (i.e. waiting to see whether there is any unexpected
behaviour) in the pre-production testbed (PPS), and then comes the gLite release.

Thursday, November 19, 2009

Unifying LCG_Util and GFAL version numbers

A usual source of confusion: which LCG_Util version requires which GFAL library version? After almost every release, somebody installed the wrong packages somewhere. Now the confusion is over: from the next release on, we will always release the two components together, under the same version number (but with different tag prefixes, of course). We will create the first such release pair this week, with version number 1.11.12-1 (the next GFAL version number). This means there will be a gap for LCG_Util: its version will jump from 1.7.8-1 to 1.11.12-1. Stay tuned.

Wednesday, November 4, 2009

Debugging tricks

When you want to debug the command-line tools of the projects, you immediately find that the commands are in fact shell scripts. They are actually libtool wrapper scripts that set up several things before calling the binaries themselves. You need to invoke gdb in the following way:

libtool gdb _command_

From this point, everything should work as usual.

Next, you may run into the following trouble when debugging:
[Thread debugging using libthread_db enabled]
Error while reading shared library symbols:
Cannot find new threads: generic error
For me, it occurred on SLC5, when the code used the dynamic loading API (dlopen, etc.). You can eliminate the problem by linking libpthread directly to your executable. For example:

lcg_del_LDADD = $(COMMON_LIBS) -lpthread

Good luck!

Tuesday, November 3, 2009

GFAL and LCG_Util test bed developments

Currently, the GFAL test suite contains integration and regression tests only. Those tests are developed and executed by the certification process. We need something more flexible: basically, unit/white-box tests that check the validity of the GFAL code up to the boundaries of its dependencies. We have started to create a unit test suite for both GFAL and LCG_Util, for debugging and internal validation purposes. The unit test suite requires some redesign of the code (redesign for testability). The pattern behind it is dependency injection. Code covered by unit tests will never call external library code directly (except for standard C library functions); it will do so through replaceable function pointers. In production, the pointers point to the original functions; a white-box test, however, can replace a set of functions with dummy ones that simulate a scenario.
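To make the pattern concrete, here is a minimal sketch of dependency injection through function pointers. It is not GFAL code: the names (`struct deps`, `external_srm_ping`, `endpoint_is_alive`, the example.org endpoints) are all hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical external dependency: in production this would live in a
 * third-party library; here it is a stand-in so the sketch compiles. */
static int external_srm_ping(const char *endpoint)
{
    (void)endpoint;
    return -1; /* pretend that no service is reachable in this sketch */
}

/* Table of replaceable dependencies (illustrative names). */
struct deps {
    int (*srm_ping)(const char *endpoint);
};

/* Production wiring: the pointer refers to the original function. */
static struct deps g_deps = { external_srm_ping };

/* Code under test: it reaches the dependency only through the pointer. */
int endpoint_is_alive(const char *endpoint)
{
    assert(endpoint != NULL);
    return g_deps.srm_ping(endpoint) == 0;
}

/* Dummy replacement simulating a scenario: only one endpoint answers. */
static int fake_srm_ping(const char *endpoint)
{
    return strcmp(endpoint, "srm://good.example.org") == 0 ? 0 : -1;
}

/* A white-box test swaps the pointer, runs the scenario, restores it. */
int run_white_box_test(void)
{
    struct deps saved = g_deps;
    int ok;

    g_deps.srm_ping = fake_srm_ping;
    ok = endpoint_is_alive("srm://good.example.org")
         && !endpoint_is_alive("srm://bad.example.org");
    g_deps = saved;
    return ok;
}
```

The test never touches the network: the scenario lives entirely in `fake_srm_ping`, and the production wiring is restored afterwards.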

We do not change the whole GFAL code at once, to avoid regressions. We change the code gradually as we solve the Savannah tasks. All the appropriate Savannah tasks will come with unit tests as well, and only the code affected by a task will be changed. What we will get is a "hybrid": in some cases, functions will be called directly, in other cases indirectly. Ideally, we reach full unit test coverage when the function call methods get unified.

We will demonstrate the power of the unit test suite by solving the RFE "Extra parameter in lcg-cp for a better TURL construction".

After unit tests, we have to have a controllable regression/integration test suite. As we cannot control the certification test bed, and it is tightly bound to the certification test environment, we are creating our own test bed that is better integrated into the development environment. We do it by copying the certification tests into our source tree and adapting them to our environment; then we start adding tests covering our tasks and purposes.

Monday, September 14, 2009

GFAL release 1.11.11-1

The release contains the following minor fixes:

- IPv6 compliance
- Manual page update

The release tag is:

glite-data-gfal_R_1_11_11_1

IPv6 compliance in FTS and GFAL

There were several Savannah tasks targeting IPv6 compliance; they have now been resolved. The list of the related tasks:

#41844: IPv6 bug; LCG-utils client functionality immediately broken by IPv6

#41278: IPv6 bug: non compliant address in source code (hard coded IPv4: 127.0.0.1)
#41585: [FTA] IPv6 bug: non compliant name resolving function (gethostbyname_r)
#41586: [FTA] IPv6 bug: non compliant name resolving function in source code (gethostbyname_r)

See the resolution details in the comments of the individual tasks. Basically, the general solution was:

- remove dependency on the pre-compiled gSoap library
- take stdsoap2.c directly from the gSoap sources
- compile the above file with WITH_IPV6 defined

The release tags including the IPv6 compliance are:

glite-data-srm-api-c_R_1_1_0_12
glite-data-srm2-api-c_R_2_2_0_6
glite-data-transfer-cli_R_3_7_2_1
glite-data-transfer-agents_R_3_4_2_1
glite-data-gfal_R_1_11_11_1

The list also shows the affected components: they are the ones that implement SOAP communication with gSoap. The IPv6 functionality is completely encapsulated in gSoap, so we did not have to change the implementation; it was only a configuration issue.
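For readers wondering what "non compliant name resolving function" means in the task titles: gethostbyname_r is IPv4-oriented, while the POSIX getaddrinfo call is protocol independent. The sketch below only illustrates that difference; it is not the change made in these releases, which was a matter of recompiling gSoap. The function name `first_address_family` is made up for the example.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Returns the address family (AF_INET or AF_INET6) of the first address
 * found for `host`, or -1 if the resolution fails. */
int first_address_family(const char *host)
{
    struct addrinfo hints;
    struct addrinfo *results = NULL;
    int family;

    assert(host != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 and IPv6 alike */
    hints.ai_socktype = SOCK_STREAM;

    if (getaddrinfo(host, NULL, &hints, &results) != 0)
        return -1;

    family = results->ai_family;
    freeaddrinfo(results);
    return family;
}
```

Called with a numeric literal such as "::1" or "127.0.0.1", the function needs no name server; with a host name it returns whichever family the resolver lists first.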

Thursday, August 27, 2009

Fixes in GFAL and lcg_util

lcg_util: should use -1 length for gridftp/CKSM

In GFAL, we always calculate checksums for the whole file (when a checksum is needed). However, the corresponding Globus API function (globus_ftp_client_cksm) was called with the wrong length parameter: it was 0 (calculate the checksum of 0-length data) instead of -1 (calculate until the end of the file). This error also pointed out an inconsistency in DPM, which interpreted this value a bit differently from the API specification above.
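The effect of the length argument can be shown with a small self-contained sketch. It does not call the Globus API; `adler32_range` is a hypothetical helper that computes Adler-32 over an in-memory buffer and follows the same convention: a length of -1 means "until the end of the data".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adler-32 over data[offset .. offset+length). A negative length (the
 * convention discussed here is -1) means "until the end of the data". */
uint32_t adler32_range(const unsigned char *data, size_t size,
                       long offset, long length)
{
    uint32_t a = 1, b = 0;
    size_t begin, end, i;

    assert(data != NULL && offset >= 0 && (size_t)offset <= size);
    begin = (size_t)offset;
    end = (length < 0 || begin + (size_t)length > size)
              ? size
              : begin + (size_t)length;

    for (i = begin; i < end; i++) {
        a = (a + data[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}
```

With length 0 the loop never runs and the result is the checksum of empty data, whatever the file contains; that is the kind of mismatch the fix addresses.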

GFAL: shall handle abbreviated checksum names


This is a workaround for the way DPM names the checksum algorithms. Strictly speaking, the endpoints should follow the GridFTP specification, but DPM has already implemented a different set of names. This will be changed in DPM as well; however, the change may not be deployed everywhere soon. So, internally and temporarily, GFAL detects the DPM names and converts them to the GridFTP conventions.
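A conversion of this kind can be sketched as a small lookup table. This is not the GFAL implementation, and the abbreviations in the table are an assumption made for the example; the post does not list the actual names. The function name `normalize_checksum_name` is hypothetical too.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative mapping only: the abbreviated names on the left are an
 * assumption for this sketch, not taken from the post. */
static const struct {
    const char *abbreviated;
    const char *canonical;
} checksum_names[] = {
    { "AD", "ADLER32" },
    { "CS", "CRC32" },
    { "MD", "MD5" },
};

/* Returns the long name for a known abbreviation; any other name is
 * passed through unchanged. */
const char *normalize_checksum_name(const char *name)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof checksum_names / sizeof checksum_names[0]; i++) {
        if (strcmp(name, checksum_names[i].abbreviated) == 0)
            return checksum_names[i].canonical;
    }
    return name;
}
```

Passing unknown names through unchanged keeps the workaround harmless once the endpoints start sending the standard names themselves.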

The fixes have been sent for integration and certification.

Monday, August 24, 2009

GFAL activities in Savannah

The following GFAL bugs have been certified:

Unable to get a tURL in a full space
See the description and the solution here.

GFAL: Problems with LDAP queries on SL5
The LDAP filters contained spaces between the filter parts and the operators. It seems that it is not allowed, but some GFAL releases (on different platforms probably) have worked with spaces.

At the same time, we introduced a new activity:
updating deprecated LDAP calls

OpenLDAP still continues supporting deprecated functions, so the priority is low.

Monday, August 10, 2009

Wrong LDAP search filters?

Today, I tried to replace the deprecated LDAP functions, related to this post. As I had never done anything with LDAP before (new skill ;) ), I first wanted to get familiar with it. Google Code Search helped a lot; however, it turned out that the problem might not be related to the deprecated functions, because the LDAP API developers still maintain them for backward compatibility. What I did in this context was:
  • Created the ldap_facade module, to hide the calling details of the deprecated functions (which function is actually called, some of the always-the-same LDAP API function parameters, etc.).
  • Added only the ldap_facade_init and ldap_facade_search functions, and left the rest (they may be added later, if a change is needed there).
To check whether I could connect to the LDAP server and execute the search that appeared in the log attached to the bug, I installed the Apache LDAP browser plugin for Eclipse, created a connection to the server, and copied the search filter. Here, I made an observation: the plugin did not accept the search filter, because it started with a space...

I checked the code: the hard-wired template filters really did start with a single space. When I removed it, the test command got further and tried to do some SRM operations.

So, the questions:
  • Is it true that the LDAP search filter string cannot start with a white space?
  • If it is true, then is it a bug in the code? Other filters start with a space as well.
  • If it is a bug, why have we not seen it so far?
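One way to experiment with this hypothesis is to normalize a filter before handing it to the search call. The sketch below is an illustration, not the fix that went into GFAL: `strip_filter_spaces` is a hypothetical helper that drops whitespace only where it separates filter components, and keeps whitespace inside attribute values.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies `filter` into `out` (capacity `out_size`), dropping whitespace at
 * the start, after '(' or ')', and after an operator (&, |, !) that directly
 * follows '('. Returns 0 on success, -1 if `out` is too small. */
int strip_filter_spaces(const char *filter, char *out, size_t out_size)
{
    size_t n = 0;

    assert(filter != NULL && out != NULL);
    for (; *filter != '\0'; filter++) {
        if (isspace((unsigned char)*filter)) {
            /* Treat the start of the string like "just after '('". */
            char prev = n > 0 ? out[n - 1] : '(';
            int after_operator = n > 1 && out[n - 2] == '('
                                 && strchr("&|!", prev) != NULL;
            if (prev == '(' || prev == ')' || after_operator)
                continue; /* structural whitespace: skip it */
        }
        if (n + 1 >= out_size)
            return -1;
        out[n++] = *filter;
    }
    if (out_size == 0)
        return -1;
    out[n] = '\0';
    return 0;
}
```

A filter such as " (| (a=1) (b=2))" comes out as "(|(a=1)(b=2))", while the space in "(cn=John Smith)" is left alone.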

Friday, August 7, 2009

Deprecated LDAP API functions in GFAL

On SLC5, certification of the following patch failed:

https://savannah.cern.ch/patch/?3119

The error can be reproduced in the following way:

  1. Check out and build org.glite.data with GFAL (instructions here). The next steps continue the linked build process.
  2. Build the org.glite.data.dm-util package:

    cd ~/org.glite.data.dm-util/build
    make install

  3. Execute the following commands:

    cd src
    export LCG_GFAL_INFOSYS=lcg-bdii.cern.ch:2170
    ./lcg-cr -d srm://lxb7608v1.cern.ch/dpm/cern.ch/home/dteam/test_rm_02 -D srmv2 -vv /etc/redhat-release
(About LCG_GFAL_INFOSYS: see this.) The command results in:

Using grid catalog type: lfc
Using grid catalog : (null)
Checksum type: None
[INFO] BDII server: lcg-bdii.cern.ch:2170/o=grid
[INFO] BDII filter: (| (GlueSEUniqueID=lxb7608v1.cern.ch) (& (GlueServiceType=srm*) (GlueServiceEndpoint=*://lxb7608v1.cern.ch:*)))
[INFO] Trying to use BDII: lcg-bdii.cern.ch:2170/o=grid (timeout 60)
[BDII][ldap_search_st][] lcg-bdii.cern.ch:2170 > Bad search filter
[GFAL][bdii_query_send][EINVAL] No accessible BDII
lcg_cr: Invalid argument

That is the same result that can be seen in the bug. After analysis, we found that several LDAP C API functions got deprecated on this platform. Ideas at this point:

- We may as well create a facade to the LDAP API.
Pros: unit-testability without real LDAP servers (mock functionality). Future LDAP API changes are isolated.
Cons: The LDAP API does not change frequently, so the effort is probably not worth it.
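To show what such a facade could look like, here is a minimal sketch. All names and signatures (`struct bdii_facade`, `bdii_facade_init`, `bdii_facade_search`, `mock_search`) are hypothetical. The backend is a function pointer, so the sketch runs without an LDAP library; in a real facade it would wrap the actual LDAP search call.

```c
#include <assert.h>
#include <stddef.h>

/* Backend signature: stands in for the real LDAP search call. */
typedef int (*search_fn)(const char *server, const char *base,
                         int timeout_sec, const char *filter);

/* Facade state: the parameters that never change are stored once. */
struct bdii_facade {
    const char *server;
    const char *base;
    int timeout_sec;
    search_fn search;
};

void bdii_facade_init(struct bdii_facade *f, const char *server,
                      const char *base, int timeout_sec, search_fn backend)
{
    assert(f != NULL && backend != NULL);
    f->server = server;
    f->base = base;
    f->timeout_sec = timeout_sec;
    f->search = backend;
}

/* Callers supply only what varies between queries: the filter. */
int bdii_facade_search(const struct bdii_facade *f, const char *filter)
{
    assert(f != NULL && filter != NULL);
    return f->search(f->server, f->base, f->timeout_sec, filter);
}

/* Mock backend for unit tests: no LDAP server is needed. As a
 * simplification, it treats any filter not starting with '(' as bad. */
int mock_search(const char *server, const char *base,
                int timeout_sec, const char *filter)
{
    (void)server;
    (void)base;
    (void)timeout_sec;
    return filter[0] == '(' ? 0 : -1;
}
```

A test would initialize the facade with `mock_search` and the values seen in the log above (server lcg-bdii.cern.ch:2170, base o=grid, timeout 60), then check how the calling code reacts to a rejected filter.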

TODO-s:

- Change the deprecated LDAP functions
- Create a regression test for the bug
- Re-send the patch for certification.

Building GFAL

OK, it's time to build this package. The selected platform is SLC5 (Scientific Linux @ CERN 5), because the world will soon move to this platform, and there are some problems there.

The build is done using ETICS, a software configuration management system funded by the European Commission, CERN, etc. The steps I took were (ETICS had already been installed):

mkdir GFAL
cd GFAL
etics-workspace-setup
etics-checkout --ignorelocking --continueonerror --project-config glite_branch_3_2_0_dev org.glite.data
etics-build --continueonerror org.glite.data

GFAL is the (empty) project directory, I will refer to it as $WORKSPACE. This series of commands builds the whole org.glite.data suite.

Great, the suite build was successful; let's look at the GFAL build. It is best to start working under e-env:

cd $WORKSPACE
org.glite.data/bin/e-env

Then:

cd $WORKSPACE/org.glite.data.gfal/build
make install

We should have a couple of executables with names gfal_test*. Let's execute the tests. During the tests, I had a valid proxy credential in the LCG Deployment Team Virtual Organization (dteam).

./gfal_version

Returned:

GFAL-client-1.11.8-1

touch a;
./gfal_teststat a

Returned:

stat successful
mode = 100644
nlink = 1
uid = 1000
gid = 1000
size = 0

COOL! And then the other test commands.

Wednesday, August 5, 2009

GFAL

Today, I started to get familiar with the GFAL package. I had a discussion with Rémi Mollon, who is the current package owner and will stop maintaining the package in October.

GFAL provides a POSIX-compliant C and Java API to access data in a grid environment. An important package is based on GFAL: the LCG Util package, which is in fact maintained together with GFAL.

We had an idea to merge some components of FTS and GFAL, especially the SRM access layer. In FTS, it is the org.glite.data.srm-utils-cpp component, written in C++. In GFAL, it is written in C. Re-implementing org.glite.data.srm-utils-cpp on top of the GFAL SRM access layer would be beneficial:

- Maintaining would be much easier
- There are several features that are under development now, and must be implemented in both packages (for instance, exponential backoff for failed or long requests, see FTS Request; there is a similar GFAL Request).
- org.glite.data.srm-utils-cpp has good unit test coverage -> the appropriate GFAL code would also be tested better if we execute the FTS test suite.

TODO: I need to check the feasibility of the above idea before proposing anything. So, in the next days, I will do the following on the GFAL side:

- Organize a coffee with the responsible people from the experiments :)
- Set up a GFAL development environment
- Compile, run the tests, solve the complications coming from certificates, security, etc.
- Re-implement SrmLs on top of the GFAL SRM access layer (this is the most heavily used SRM call in FTS)
- Execute the FTS tests.