jPHPsetistats project

About

..the project

jPHPsetistats is a tool written in PHP that fetches SETI@home user statistics and pretty-prints them on websites. For example, it is used to generate the following image:

[Image: SETI@home statistics for Jo[zeRezo]]

Data is read from the SETI@home website as an XML file and cached; the image is then generated from it.

..the goal

We were using Royale's script to show our SETI@home stats in our online signatures, but it was too slow, for three reasons:

  1. The script retrieves the whole HTML page, then parses it with regular expressions. This is wasteful both in page size (bandwidth) and in CPU load.
  2. Results are not cached. This is very disappointing, considering that some forums request the same image several times on a single page.
  3. The original server is naturally under heavy load, and its Internet connection is limited.

At that time, I had just learned how to use XML in PHP, and I was looking for an application to hack on. The old script was annoying because it slowed some forums down for every reader. Then I found that the SETI@home team had added XML access to user statistics, so I thought it was a good idea to rewrite the script from scratch, using the XML and handling its own cache, then put it on a fast server, resolving all three issues in one step.

Usage

Implemented features

Next things to do

I think I successfully resolved the issues described in the “About ..the goal” section. However, I think I can still add some interesting features and improvements. I won't code them all; this is only a wishlist.

Download

jPHPsetistats in Tar/Gz format (6 125 bytes).
jPHPsetistats in Zip format (6 626 bytes).

License

This project is subject to the GNU Lesser General Public License, whose text can be found at http://www.gnu.org/licenses/lgpl.txt. Feel free to propose your ideas and contributions (send me an email); I'll try to incorporate them.

XML DTD

I worked with the XML DTD provided by the folks at the SETI@home website. Unfortunately, when you check their XML output against the DTD, you can see that something is wrong. First, the rankinfo element, though mentioned in the comment at the top of the DTD, is not defined in the DTD body. Second, the groupinfo founder element is defined, but seems to be missing from the XML output. I can't add the missing groupinfo founder element to their output myself, but I can correct the DTD to include the rankinfo element: the modified DTD is here (userstats-corrected.dtd).

Support for multiple simultaneous requests

In the current release, the scripts run pretty well, and astonishingly fast compared to Royale's original script. The bottlenecks there were retrieving the data (HTML is larger than XML) and then processing it (parsing XML is faster than scraping HTML with regular expressions).
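To illustrate why the XML route is cheap to process, here is a minimal sketch in Python (the project itself is written in PHP). The document layout and the element names `userstats`, `userinfo`, `name` and `numresults` are hypothetical, and the values are invented sample data; the real SETI@home XML may look different.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: element names and values are invented
# for illustration and are NOT taken from the real SETI@home output.
SAMPLE = """<userstats>
  <userinfo>
    <name>Jo[zeRezo]</name>
    <numresults>1234</numresults>
  </userinfo>
</userstats>"""


def parse_user(xml_text):
    """Extract a few fields with a real XML parser instead of regular expressions."""
    root = ET.fromstring(xml_text)
    info = root.find("userinfo")
    return {
        "name": info.findtext("name"),
        "results": int(info.findtext("numresults")),
    }


stats = parse_user(SAMPLE)
```

The parser walks the tree once and hands back named fields, so there is no pattern to maintain when the surrounding page layout changes.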

But when you look at the way it's used (inside forum signatures), it's not unusual for the same image to be requested many times on the same page. In this case, the script does the same work for every image; it is multithread-safe, but it isn't the most efficient approach.

We can save server resources by doing the work on the first request, then handing the ready-made result directly to the simultaneous requests that follow. So I tried to formalise (in a non-formal language, but it should be easy to understand) how the current process works, and I designed a new process that is multirequest-safe (MR-safe). I think the new process is well designed (I hand-tested it for safety, liveness, fairness, deadlock freedom and starvation freedom; I'm used to this kind of work, see my SemServ project), but there is no formal proof of it, and I'll have to code it to see whether I'm right.

========
= Current process (not MR-safe):
========

START.

READ.
READ1. If file found in cache and not too old (min wait interval), go to PARSE.
READ2. If web is available and data has been retrieved, go to PARSE.
READ3. If file found in cache and not too old (flush cache interval), go to PARSE.
READ4. FAILED.

PARSE. Parse XML data.

WRITE. If XML data is not from cache (i.e. it is from the web), write it into the cache.

END.
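
The steps above can be sketched as follows. This is an illustrative Python translation, not the project's PHP code: the function names, the `fetch` callback (which returns the XML bytes, or None when the web is unavailable) and the two interval values are all assumptions.

```python
import os
import time
import xml.etree.ElementTree as ET

MIN_WAIT = 3600          # assumed "min wait interval", in seconds
FLUSH_AFTER = 7 * 86400  # assumed "flush cache interval", in seconds


def cache_age(path):
    """Return the age of the cache file in seconds, or None if it is missing."""
    try:
        return time.time() - os.path.getmtime(path)
    except OSError:
        return None


def read_file(path):
    with open(path, "rb") as handle:
        return handle.read()


def get_stats(cache_path, fetch):
    """Follow READ1..READ4, PARSE and WRITE; return the parsed XML root."""
    age = cache_age(cache_path)
    from_cache = True
    if age is not None and age < MIN_WAIT:            # READ1: fresh cache
        data = read_file(cache_path)
    else:
        data = fetch()                                # READ2: try the web
        if data is not None:
            from_cache = False
        elif age is not None and age < FLUSH_AFTER:   # READ3: stale but usable
            data = read_file(cache_path)
        else:
            raise RuntimeError("no data available")   # READ4: FAILED
    root = ET.fromstring(data)                        # PARSE
    if not from_cache:                                # WRITE: only fresh web data
        with open(cache_path, "wb") as handle:
            handle.write(data)
    return root
```

Nothing here stops two simultaneous requests from both fetching and both writing the cache file, which is the gap the new process addresses.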


========
= New process (MR-safe):
========

START.

READ.
READ1. If named temporary file not found, go to READ5.
READ2. If named temporary file too old, remove it then go to READ5.
READ3. Wait (yielding CPU) while named temporary file is still there (and time not elapsed).
READ4. If time elapsed, remove named temporary file.
READ5. If file found in cache and not too old (min wait interval), go to PARSE.
READ6. If web is not available or data has not been retrieved, go to READ8.
READ7. Create exclusive named temporary file; if that fails, treat the data as coming from cache. In either case, go to PARSE.
READ8. If file found in cache and not too old (flush cache interval), go to PARSE.
READ9. FAILED.

PARSE. Parse XML data.

WRITE.
WRITE1. If XML data is from cache (or exclusive named temporary file has not been created by us), go to END.
WRITE2. Fill the named temporary file with XML data.
WRITE3. Rename (or copy then remove) the named temporary file to the cache file.

END.
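
A minimal Python sketch of the MR-safe process follows, again as an illustration and not the project's PHP code. The `.tmp` suffix, the `fetch` callback, the polling delay and all interval values are assumptions. It relies on `os.open` with `O_CREAT | O_EXCL` for the exclusive creation in READ7 and on `os.replace` for the rename in WRITE3.

```python
import os
import time
import xml.etree.ElementTree as ET

MIN_WAIT = 3600          # assumed "min wait interval", in seconds
FLUSH_AFTER = 7 * 86400  # assumed "flush cache interval", in seconds
LOCK_TIMEOUT = 30        # assumed maximum age/wait for the temporary file


def age_of(path):
    """Return the age of a file in seconds, or None if it is missing."""
    try:
        return time.time() - os.path.getmtime(path)
    except OSError:
        return None


def remove_quietly(path):
    try:
        os.remove(path)
    except OSError:
        pass  # another request already removed it


def read_file(path):
    with open(path, "rb") as handle:
        return handle.read()


def get_stats_mr_safe(cache_path, fetch, lock_timeout=LOCK_TIMEOUT, poll=0.05):
    tmp_path = cache_path + ".tmp"  # hypothetical name for the temporary file
    lock_age = age_of(tmp_path)
    if lock_age is not None:                           # READ1: file is present
        if lock_age > lock_timeout:                    # READ2: too old
            remove_quietly(tmp_path)
        else:
            deadline = time.time() + lock_timeout
            while os.path.exists(tmp_path) and time.time() < deadline:
                time.sleep(poll)                       # READ3: wait, yielding CPU
            if os.path.exists(tmp_path):               # READ4: time elapsed
                remove_quietly(tmp_path)
    age = age_of(cache_path)
    tmp_fd = None
    if age is not None and age < MIN_WAIT:             # READ5: fresh cache
        data = read_file(cache_path)
    else:
        data = fetch()                                 # READ6: try the web
        if data is not None:
            try:                                       # READ7: exclusive create
                tmp_fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL)
            except FileExistsError:
                tmp_fd = None                          # treat data as cached
        elif age is not None and age < FLUSH_AFTER:    # READ8: stale but usable
            data = read_file(cache_path)
        else:
            raise RuntimeError("no data available")    # READ9: FAILED
    try:
        root = ET.fromstring(data)                     # PARSE
    except ET.ParseError:
        if tmp_fd is not None:                         # do not leave a stale lock
            os.close(tmp_fd)
            remove_quietly(tmp_path)
        raise
    if tmp_fd is not None:                             # WRITE1: we own the file
        with os.fdopen(tmp_fd, "wb") as handle:        # WRITE2: fill it
            handle.write(data)
        os.replace(tmp_path, cache_path)               # WRITE3: rename to cache
    return root
```

Because the cache file only ever appears through the final rename, a request that reads it during an update sees either the old content or the new one, never a half-written file.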

Thanks



Current page : http://jo.zerezo.com/projects/jPHPsetistats.php .