jPHPsetistats project

About

..the project

jPHPsetistats is a tool written in PHP to get SETI@home user statistics and pretty print them on websites. For example, it's used to generate the following image:

Seti@home statistics for Jo[zeRezo]

Data is read from the SETI@home website as an XML file and cached, then the image is generated.

..the goal

We were using Royale's script to show our SETI@home stats in our online signatures, but it was too slow, for three reasons:

The script retrieves the HTML page, then parses it with regular expressions. This is wasteful both in page size (wasted bandwidth) and in CPU load.
Results are not cached. Very disappointing, considering some forums request the same image several times on the same page.
The original server is naturally under load and its internet connection is limited.

At that time, I had just learned how to use XML in PHP, and I was looking for an application to hack on. The old script was annoying because it slowed down some forums for every reader. Then I found that the SETI@home team provides XML access to user statistics, so I thought it was a good idea to rewrite the script from scratch, using the XML and handling its own cache, then put it on a fast server, resolving all three issues in one step.

Usage

If you are a simple user, get your up-to-date image from the URL http://iodream.free.fr/projects/jPHPsetistats/setipng.php?email=your@email, replacing your@email with your real email address (the one you registered with as a SETI@home user).
If you are a PHP website author, extract the files from the download section into a directory of your website, then test the setipng.php script in this directory. Have a look at the parameters at the top of the main script class: several features can be controlled from there. You can also use the copy I put online (see the link just above), but if it generates too much traffic I'll have to remove it.
If you are a PHP programmer, just include the project files in your own project. The examples are simple and clear, and the code is IMHO well commented. So feel free to hack around, but I take no responsibility for the results.
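
For website authors who build the image link from PHP, here is a minimal sketch of composing the URL. The function name setiImageUrl is hypothetical and not part of jPHPsetistats; only the base URL and the email parameter come from the usage notes above. Encoding the address is a precaution for characters such as "+" that are legal in email addresses but have a special meaning in query strings.

```php
<?php
// Hypothetical helper, not part of jPHPsetistats: builds the image URL
// for a given SETI@home account email.
function setiImageUrl(string $email): string
{
    $base = 'http://iodream.free.fr/projects/jPHPsetistats/setipng.php';
    // rawurlencode() percent-encodes characters such as "@" and "+".
    return $base . '?email=' . rawurlencode($email);
}

// Emit an <img> tag; htmlspecialchars() keeps the attribute well-formed.
$url = setiImageUrl('your@email');
echo '<img src="' . htmlspecialchars($url) . '" alt="SETI@home statistics">', "\n";
```

PHP decodes query parameters automatically on the receiving side, so the script should see the original address.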

Implemented features

Image generation.
Simple error reporting.
Efficient and configurable data caching.
Access control by email checking.
…look into the changelog (in the header of the main class file) for more…

Next things to do

I think I successfully resolved the issues described in the “About ..the goal” paragraph. However, I can still add some interesting features and improvements. I won't code them all; this is only a wishlist.

Generate an HTML table with statistics, like SieGeL's pstats does, but only with user stats; I don't want to reimplement the SETI@home website.
Compression (gzipping) of local data files, trading less disk usage for higher server load. I don't think it's worth it, because the XML files are not really huge.
Gather access stats from the setipng.php script, so we can know who is using the script, how often, and whether it is being abused.
Rewrite the process to support multiple simultaneous requests for the same user stats (see the section on multiple simultaneous requests below).
…your suggestions…
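
The compression trade-off in the wishlist can be measured rather than guessed. The sketch below uses gzencode() and gzdecode() from PHP's zlib extension (assumed to be installed) to compare sizes and check the round trip. The sample XML is made up for illustration and is not the real SETI@home format.

```php
<?php
// Made-up sample data: NOT the real SETI@home XML format.
$xml = '<userstats>'
     . str_repeat('<result><id>1</id><cpu>12.5</cpu></result>', 50)
     . '</userstats>';

// Compress at the highest level (9), as a cache write would.
$packed = gzencode($xml, 9);

// Decompress, as a cache read would.
$unpacked = gzdecode($packed);

printf("raw: %d bytes, gzipped: %d bytes\n", strlen($xml), strlen($packed));
```

Real files will compress less dramatically than this repetitive sample, so the numbers are only worth trusting when taken on actual cache files.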

Download

jPHPsetistats in Tar/Gz format (6 125 bytes).
jPHPsetistats in Zip format (6 626 bytes).

License

This project is subject to the GNU Lesser General Public License, whose text can be found at http://www.gnu.org/licenses/lgpl.txt. Feel free to propose your ideas and contributions (send me an email); I'll try to incorporate them.

XML DTD

I worked with the XML DTD provided by the folks at the SETI@home website. Unfortunately, when you compare their XML output with the DTD, you can see something is wrong. First, the rankinfo element, though declared in the header comment, is not defined in the DTD body. Second, the groupinfo founder element is defined, but seems to be missing from the XML output. I can't add the missing groupinfo founder element to their output myself, but I can correct the DTD to include the rankinfo element: the modified DTD is here (userstats-corrected.dtd).

Support for multiple simultaneous requests

In the current release, the scripts run pretty well, astonishingly fast compared to Royale's original script. The bottlenecks were retrieving the data (HTML is larger than XML) and then processing it (parsing XML is faster than scraping HTML with regular expressions).

But when you look at the way it's used (inside forum signatures), it's not unusual for the same image to be requested many times on the same page. In this case, the script does the same work for every image; it is multithread-safe, but it isn't the most efficient way to do it.

We can save server resources by doing the work on the first request, then handing the ready-to-go result directly to the simultaneous requests that follow. So I tried to formalise (in a non-formalised language, but it should be easy to understand) how the current process works, and I crafted a new process to be multirequest-safe (MR-safe). I think the new process is well designed (I hand-tested it to verify safety, liveness, fairness, deadlock freedom and starvation freedom; I'm used to this kind of work, look at my SemServ project), but there is no formal proof of it, and I'll have to code it to see whether I'm right.

========
= Current process (not MR-safe):
========

START.

READ.
READ1. If file found in cache and not too old (min wait interval), go to PARSE.
READ2. If web is available and data has been retrieved, go to PARSE.
READ3. If file found in cache and not too old (flush cache interval), go to PARSE.
READ4. FAILED.

PARSE. Parse XML data.

WRITE. If XML data is not from cache (i.e. it comes from the web), write it into the cache.

END.
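
The current process can be sketched in PHP roughly as follows. This is an illustration under assumptions, not the project's actual code: the function name loadStatsXml, its parameters, and the callable $fetch (standing in for the web request) are all hypothetical. The PARSE step is left out, and the WRITE step is folded in right after a successful fetch for brevity.

```php
<?php
// Hypothetical sketch of the READ and WRITE steps; names are assumptions.
// $fetch is any callable returning the XML string, or false on failure.
function loadStatsXml(string $cacheFile, callable $fetch, int $minWait, int $flushAfter)
{
    clearstatcache(true, $cacheFile);
    $age = is_file($cacheFile) ? time() - filemtime($cacheFile) : null;

    // READ1: a fresh cache entry is used without touching the network.
    if ($age !== null && $age < $minWait) {
        return file_get_contents($cacheFile);
    }

    // READ2: otherwise try the web.
    $xml = $fetch();
    if ($xml !== false) {
        // WRITE: data came from the web, so refresh the cache.
        file_put_contents($cacheFile, $xml);
        return $xml;
    }

    // READ3: web unavailable; accept a stale entry up to the flush interval.
    if ($age !== null && $age < $flushAfter) {
        return file_get_contents($cacheFile);
    }

    // READ4: nothing usable.
    return false;
}

// Usage: two requests in a row, with a fake fetcher that counts its calls.
$cache = tempnam(sys_get_temp_dir(), 'seti');
unlink($cache); // start without a cache entry
$calls = 0;
$fetch = function () use (&$calls) { $calls++; return '<userstats/>'; };
$a = loadStatsXml($cache, $fetch, 3600, 86400); // goes to the "web"
$b = loadStatsXml($cache, $fetch, 3600, 86400); // served from cache
unlink($cache);
```

Note that nothing here stops two simultaneous requests from both reaching READ2 and fetching the same data, which is the inefficiency the new process addresses.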


========
= New process (MR-safe):
========

START.

READ.
READ1. If named temporary file not found, go to READ5.
READ2. If named temporary file too old, remove it then go to READ5.
READ3. Wait (yielding CPU) while named temporary file is still there (and time not elapsed).
READ4. If time elapsed, remove named temporary file.
READ5. If file found in cache and not too old (min wait interval), go to PARSE.
READ6. If web is not available or data has not been retrieved, go to READ8.
READ7. Create exclusive named temporary file; if that fails, treat the data as coming from cache and go to PARSE.
READ8. If file found in cache and not too old (flush cache interval), go to PARSE.
READ9. FAILED.

PARSE. Parse XML data.

WRITE.
WRITE1. If XML data is from cache (or exclusive named temporary file has not been created by us), go to END.
WRITE2. Fill the named temporary file with XML data.
WRITE3. Rename (or copy then remove) the named temporary file to the cache file.

END.
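
Two building blocks of the new process map directly onto PHP file functions, and the sketch below shows them in isolation: exclusive creation of the named temporary file (READ7) and the fill-then-rename write (WRITE2 and WRITE3). The function names acquireLock and publish and the file names are hypothetical, and the waiting steps READ1 to READ4 are not shown. This is one possible way to code it, not a tested implementation of the whole process.

```php
<?php
// Hypothetical sketch of READ7 and WRITE2-WRITE3; all names are assumptions.

// READ7: try to become the single writer. Mode "x" makes fopen() fail
// if the file already exists, so only one request can succeed.
function acquireLock(string $tmpFile)
{
    return @fopen($tmpFile, 'x'); // stream handle on success, false otherwise
}

// WRITE2 + WRITE3: fill the temporary file, then move it over the cache file.
// On POSIX systems, rename() within one filesystem replaces the target in a
// single step, so readers see either the old or the new cache file.
function publish($handle, string $tmpFile, string $cacheFile, string $xml): bool
{
    $ok = fwrite($handle, $xml) === strlen($xml);
    fclose($handle);
    return $ok && rename($tmpFile, $cacheFile);
}

// Usage: two "simultaneous" requests competing for the same user stats.
$stem = sys_get_temp_dir() . '/' . uniqid('seti-demo-', true);
$tmpFile = $stem . '.tmp';
$cacheFile = $stem . '.xml';

$first = acquireLock($tmpFile);   // this request wins
$second = acquireLock($tmpFile);  // the other request loses
$done = publish($first, $tmpFile, $cacheFile, '<userstats/>');
$cached = file_get_contents($cacheFile);
unlink($cacheFile);
```

Because the rename removes the temporary file, its disappearance doubles as the signal that waiting requests (READ3) can proceed to the cache.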

Thanks

The SETI Institute hosts the SETI project started by NASA.
The SETI@home team provides the first real distributed computing software.
My friend Royale, administrator of the zeRezo server, had the original idea for this script and is my primary tester.
