W3Perl development

Next release: 3.02
Release 3.01
23.11.2007 | 3.01 | Author: Domisse

New release! 3.01 is out! Almost three months since 3.00; time flies... time to update the website!

Only a few changes have been applied over the previous days:
- A search function has been added to the directory stats.
- Windows VirusScan flagged fixperlpath.pl as a virus, so the installation was incomplete! I've changed a few lines and now it works fine (the line with open(FILE,$0) was seen as a virus signature).

New features since v3.00:

- Yearly stats
- RSS stats
- RSS feed available
- Display day of week
- Added graphs in daily, weekly and monthly summaries
- List of hosts for each page
- Added page count and last date for each host
- An AJAX toggle menu for hourly and proxy stats
- An AJAX search tool to find hosts/pages/users and proxies with ease
- A JavaScript popup on graphics to get instant summaries

Last improvements
19.11.2007 | Last improvements | Author: Domisse

Last few minor changes before releasing 3.01.

- There was a shift between requests/accesses and hosts in the yearly graphics.
- Wrong CSV link in hourly stats.
- Konqueror doesn't work with sortable tables. I don't know why, as there is no JavaScript error console :(
- Use 127.0.0.1 instead of a domain name in the default configuration file. This prevents links to the domainname.com website!
- The daily traffic table did not display from the beginning of the logfile.
- Added the year to the frame menu.
- Cron-w3perl now displays the right CPU time if the run spans two days.
- I've added the number of accesses to the selected directory graphs.
- Finally added the yearly script to the master script loop.
- Popup links can now have a target frame.
- A default configuration for Apache is provided.
- The popup page graph was wrong if a hole was found in the date interval.
- The list of hosts which have read a page was not updated in incremental mode.
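The popup page graph bug (a hole in the date interval) is a common one: a graph built only from the days that have hits silently drops the empty days. Below is a minimal Python sketch of the fix, not W3Perl's own Perl code, assuming the per-day counts are kept in a dictionary keyed by date; `fill_date_gaps` is a hypothetical helper name.

```python
from datetime import date, timedelta


def fill_date_gaps(counts):
    """Return (day, count) pairs for every day from the first to the last
    day found in `counts`, using 0 for days without any entry."""
    if not counts:
        return []
    day, end = min(counts), max(counts)
    series = []
    while day <= end:
        series.append((day, counts.get(day, 0)))
        day += timedelta(days=1)
    return series


# Hits on 1 and 4 November only: the graph still needs points for the 2nd and 3rd.
hits = {date(2007, 11, 1): 120, date(2007, 11, 4): 95}
for day, n in fill_date_gaps(hits):
    print(day.isoformat(), n)
```

Filling the gaps before drawing keeps the x axis linear in time, so each bar or point lands on its real date.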

Things to do:
- Add popups over directory graphics.

Ferran Sunyer, a Spanish student working in Munich, contacted me to get help with session stats. He needs to know how long people spend reading pages and which paths they use to navigate. Faced with different websites, session stats will show him which one is the most efficient.
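Session stats usually start by cutting each host's hits into sessions at a period of inactivity. The Python sketch below shows that idea under stated assumptions: hits are (host, unix time, page) tuples, and the 30-minute timeout is a common convention, not a W3Perl setting. `build_sessions` is a hypothetical name. Note that a logfile cannot tell how long the last page of a session was read, so the duration only covers first hit to last hit.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # assumption: 30 minutes without a hit ends a session


def _close(host, events):
    """Turn a list of (time, page) events into a session record."""
    return {
        "host": host,
        "path": [page for _, page in events],
        "duration": events[-1][0] - events[0][0],  # first to last hit, in seconds
    }


def build_sessions(hits, timeout=SESSION_TIMEOUT):
    """Group (host, unix_time, page) hits into per-host sessions."""
    by_host = defaultdict(list)
    for host, ts, page in hits:
        by_host[host].append((ts, page))

    sessions = []
    for host, events in by_host.items():
        events.sort()
        current = [events[0]]
        for ts, page in events[1:]:
            if ts - current[-1][0] > timeout:  # gap too long: start a new session
                sessions.append(_close(host, current))
                current = []
            current.append((ts, page))
        sessions.append(_close(host, current))
    return sessions
```

The "path" list gives the navigation order, and sorting sessions by duration or by first page is then straightforward.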

3.007 and Planning
16.11.2007 | 3.007 and Planning | Author: Domisse

3.007 will be the last development package before the official 3.01 release. I hope to make the package before the end of the month.
Yearly stats have been improved and will also display the current year.

Time to make some plans for the next release:
- A tags script to allow people without a logfile to use W3Perl. It should be based on the same method as 'tag counter': a piece of JavaScript code inside some of your pages, but it will write to a logfile which can be read by W3Perl.
- Add back support for Domino logfiles.
- Improved RSS stats
- Try to detect hackers and email spam.
- Use a better tool to build PDF files.
- Speed up reverse DNS by making multiple DNS queries in parallel
- Add a list of bounced mails
- Faster save with the highest level of precision
- Solve problem with wget
- Remove those frames
- Sessions sorted by referer
- Search engine on strings, not only on words
- More AJAX tools
- Robot daily graph
- Show current month

Tooltips
14.11.2007 | Tooltips | Author: Domisse

I've added a piece of JavaScript from Jarodxxx which displays a popup over an image. This lets you view a bigger image without opening a new web page.

Small improvements:
- A new script is available to compute yearly stats.
- The session script now reports an error if you try to compute stats older than the available data.
- Last visit has been added to the robot stats.
- Browser graphs also show a popup window which displays the three most successful browsers.
- Robot stats are now split into two pages, one with the full list, the other with the top ten (to avoid overly long lists).

3.006 bugs
11.11.2007 | 3.006 bugs | Author: Domisse

First bug report from Hans-Ruodi on 3.006:
- Running install.pl -x runs the installer instead of displaying the default values and quitting.
- Running cron-w3perl.pl -a -e works! ... I've added a warning, because you have to choose between init and update; both can't be run at the same time!
- Hans-Ruodi used config-ftp as a template to build his own web server configuration file. Config-ftp is meant for FTP servers, not web servers... I will provide a default configuration file for Apache. The scripts now check the date format of the logfile and exit with an error if it is incorrect.
- Robots were listed in the browser stats although they had been excluded in the configuration file. Reverse DNS had been forcibly disabled, and there was a mismatch between IP and hostname.
- 'Useless files' shows index.php files but shouldn't. This part of the script is disabled by default (still beta); it needs more testing.
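The date format check amounts to parsing the date field of the first lines and stopping if it does not match. A minimal Python sketch, assuming the Apache common/combined `%t` format (e.g. `[10/Oct/2000:13:55:36 -0700]`) and an arbitrary sample of 10 lines; `parse_log_date` and `check_logfile` are hypothetical names, not W3Perl functions.

```python
import re
import sys
from datetime import datetime
from itertools import islice

# First bracketed field of an Apache common/combined line, e.g. [10/Oct/2000:13:55:36 -0700]
DATE_FIELD = re.compile(r"\[([^\]]+)\]")
# %b expects English month names, i.e. the default C locale.
APACHE_DATE = "%d/%b/%Y:%H:%M:%S %z"


def parse_log_date(line):
    """Return the timestamp of an Apache log line, or None if the date
    field is missing or not in the Apache %t format."""
    match = DATE_FIELD.search(line)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), APACHE_DATE)
    except ValueError:
        return None


def check_logfile(lines, sample=10):
    """Exit with an error if any of the first `sample` lines has a bad date."""
    for line in islice(lines, sample):
        if parse_log_date(line) is None:
            sys.exit("Unexpected date format in logfile: " + line.strip())
```

Checking only a sample keeps the test cheap; a file object can be passed directly since `islice` reads just the first lines.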

3.006
10.11.2007 | 3.006 | Author: Domisse

v3.01 is closer... 3.006 includes support for the input/output bandwidth fields of the Apache combinedio logfile format. A bug in saving the robots file has also been fixed. Popup windows over the page requests graph have been added.

Time to test mail stats now... Emails are now grouped by domain name rather than by country, allowing smaller HTML pages. Thinking about spammer detection... the signature of a spammer address: lots of unique recipients and nothing else (of course, mailing lists should be taken into account).
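Both ideas, grouping by domain and flagging addresses that only send to many distinct recipients, can be sketched in a few lines of Python. This is an illustration, not W3Perl code: it assumes messages are (sender, recipient) pairs, reads "nothing else" as "never receives mail", and the threshold of 50 recipients is arbitrary. All function names are hypothetical.

```python
from collections import defaultdict


def domain_of(address):
    """Return the lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].lower()


def group_by_domain(messages):
    """Count messages per sender domain; `messages` holds (sender, recipient) pairs."""
    counts = defaultdict(int)
    for sender, _ in messages:
        counts[domain_of(sender)] += 1
    return dict(counts)


def suspected_spammers(messages, min_recipients=50, mailing_lists=()):
    """Flag senders writing to many distinct recipients but never receiving mail.

    `min_recipients` is an arbitrary threshold; `mailing_lists` lists addresses
    to skip, since legitimate lists also write to many recipients."""
    recipients = defaultdict(set)
    received = set()
    for sender, recipient in messages:
        recipients[sender.lower()].add(recipient.lower())
        received.add(recipient.lower())
    skip = {a.lower() for a in mailing_lists}
    return sorted(
        sender for sender, rcpts in recipients.items()
        if len(rcpts) >= min_recipients and sender not in received and sender not in skip
    )
```

The mailing-list exclusion is the weak point of such a heuristic: a list address looks exactly like a spammer unless it is whitelisted.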

Tests
06.11.2007 | Tests | Author: Domisse

Hans-Ruodi Burch sent me an email about a problem with PDF building. The scripts reported errors and stopped, but the PDF was there. In fact, he uses the latest HTMLDOC 1.9 (still an alpha), which has very limited CSS rendering. It is better to use 1.8.27.

The reverse DNS file is now saved each time a day has been processed. If you stop the processing before completion, reverse DNS does not need to restart from scratch.
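Saving the cache after each processed day is a simple checkpoint: an interrupted run loses at most one day of lookups. A Python sketch of the idea, assuming a JSON cache file mapping IP to hostname and a `resolve` callable supplied by the caller; the file format and names are hypothetical, not W3Perl's actual cache.

```python
import json
import os


def load_cache(path):
    """Load the reverse-DNS cache (IP -> hostname) saved by a previous run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_cache(cache, path):
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(cache, fh)


def resolve_days(days, path, resolve):
    """Resolve the IPs of each day (a list of IP lists), saving the cache
    after every day so an interrupted run restarts from the last saved day."""
    cache = load_cache(path)
    for ips in days:
        for ip in ips:
            if ip not in cache:  # only unknown IPs cost a DNS query
                cache[ip] = resolve(ip)
        save_cache(cache, path)  # checkpoint at the end of each day
    return cache
```

Restarting with the same data then only hits the cache, which is why a second run is so much faster than the first.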

Support for %I and %O has been added to the scripts. If they are available in the logfile (combinedio format), the input/output traffic will be printed (this includes headers, the status line and the request itself, not only the size of the requested file).
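For reference, Apache defines the combinedio format (mod_logio) as the combined format followed by `%I %O`. The Python sketch below parses such a line with a regular expression; it is an illustration, not W3Perl's parser, and it assumes no escaped quotes inside the request, referer or user-agent fields. `parse_combinedio` is a hypothetical name.

```python
import re

# Apache "combinedio" (needs mod_logio):
# %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i" %I %O
COMBINEDIO = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<bytes_in>\d+) (?P<bytes_out>\d+)$'
)


def parse_combinedio(line):
    """Return a dict of fields, or None if the line is not combinedio."""
    m = COMBINEDIO.match(line.rstrip("\n"))
    if m is None:
        return None
    rec = m.groupdict()
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])  # %b: body only
    rec["bytes_in"] = int(rec["bytes_in"])    # %I: bytes received, incl. request and headers
    rec["bytes_out"] = int(rec["bytes_out"])  # %O: bytes sent, incl. headers
    return rec
```

A plain combined line has no trailing `%I %O` fields and returns None, which is one way to tell the two formats apart.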

W3Perl download
03.11.2007 | W3Perl download | Author: Domisse

October is over... let's see how many times W3Perl has been downloaded since January.
Well, let's start the stats...

Of course, you need to take care to reject robots which try to download the package. Also, many people download big files with a download manager, which opens many requests on small parts of the file. If you don't filter out the download manager requests, the numbers will be wrong (too large by a large factor). 'Commercial' packages often 'forget' this basic rule and so are able to promote their software with a great number of downloads!
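Both corrections fit in one small counting function. The Python sketch below is an illustration under stated assumptions: records are dicts with hypothetical `host`, `day`, `path`, `status` and `agent` keys, robots are recognized by a hypothetical list of user-agent fragments, and a download manager's many partial (206) requests are collapsed into one download per host and day.

```python
# Hypothetical user-agent fragments used to recognize robots.
ROBOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def count_downloads(records, filename):
    """Count downloads of `filename`, skipping robots and counting a host
    at most once per day, whatever the number of partial requests."""
    seen = set()
    for r in records:
        if r["path"] != filename or r["status"] not in (200, 206):
            continue
        agent = r["agent"].lower()
        if any(marker in agent for marker in ROBOT_MARKERS):
            continue
        seen.add((r["host"], r["day"]))  # one download per host and day
    return len(seen)
```

The trade-off is that a host genuinely downloading the package twice on the same day is counted once, which errs on the side of under-counting.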

About the numbers: nearly a fivefold increase since January... not too bad. But most of it comes from new ways to distribute the package (rpm/debian/apache-windows).

[Figure: W3Perl download stats]

I forgot to remove a test in the code... so it failed to load reverse DNS hosts. Also, I've added a new 'Percentage' column in the Host area, which makes it easier to see how many IPs have not been resolved.
The combinedio log format from Apache has been added to the admin interface... but not yet to the scripts.

New idea:
- To increase reverse DNS speed, first read the log lines, then use JDresolve to run multiple DNS queries simultaneously (5-10 times faster).
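The speed-up comes from collecting the distinct IPs first and keeping several lookups in flight instead of waiting on each one in turn. JDresolve is a separate Perl tool; the Python sketch below only illustrates the same principle with a thread pool and `socket.gethostbyaddr`. The worker count of 10 and the function names are assumptions.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(ip):
    """Return the hostname for `ip`, or the IP itself if it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip


def resolve_all(ips, resolver=reverse_lookup, workers=10):
    """Resolve each distinct IP once, with up to `workers` queries at a time."""
    unique = sorted(set(ips))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        names = list(pool.map(resolver, unique))
    return dict(zip(unique, names))
```

Deduplicating before resolving matters as much as the parallelism: a busy log repeats the same IPs thousands of times.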

Debug
02.11.2007 | Debug | Author: Domisse

Today is dedicated to bug fixes...

- The first one was the use of rss.gif instead of rss.png in 3.005.
- The second was sorting everything in ascending order, whereas strings have to be sorted in descending order.
- The third was removing the rss.xml file when starting from scratch. It is better to overwrite the file, avoiding an error on requests while the stats are being computed.
- The fourth was a bug in displaying the weekly graph legend.
- The next one was about displaying the number of items in the traffic area. It was set to 0.1 and now uses the maximum value/100.
- Page occurrence in traffic was only set for non-HTML files, preventing the table from being truly sortable.
- Daily filetype occurrence for HTML files was wrong when running from scratch (only the last parsed day was counted).
- I forgot to delete old PDFs (older than $nbdays) to free some disk space.
- When computing page occurrence trends, "Same over the previous days" now applies if all values are the same!
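Overwriting rather than deleting rss.xml is safest when the new file is written next to the old one and then renamed over it, so readers always see either the old or the new feed. A Python sketch of that pattern, not W3Perl's code; `write_atomically` is a hypothetical name, and the rename is atomic on POSIX filesystems when both files are in the same directory.

```python
import os
import tempfile


def write_atomically(path, content):
    """Replace `path` with `content` without leaving it missing or half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(content)
        os.replace(tmp_path, path)  # swap in the new file in one step
    except BaseException:
        os.unlink(tmp_path)  # clean up the temporary file on any failure
        raise
```

During a full recomputation the old feed stays online until the new one is complete.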

A small test on a 516 MB logfile (2.4 million lines):

Configuration | CPU time | Lines/sec
Base | 6 min 42 s | 7100
Robot enabled | 14 min 15 s | 2880
Spammer enabled | 26 min 06 s | 1500
Precision 4 | 1 h 34 min | 7000
Reverse DNS enabled | 6 h 25 min | 6280
Detecting referer spammers can be a waste of time if you don't have any! There is a long list in /resources/referrer-standard.txt which takes ages to compare against. Maybe I should ship a minimal list so people can add entries when needed.
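One way to make a long blacklist cheap is to load it once into a set and check only the referer's host and its parent domains, instead of comparing each referer with every entry. The Python sketch below assumes a hypothetical file format (one domain per line, `#` starts a comment), which is not necessarily how referrer-standard.txt is written; the function names are also hypothetical.

```python
from urllib.parse import urlsplit


def load_spam_domains(lines):
    """Build a set from a blacklist with one domain per line ('#' starts a comment)."""
    domains = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            domains.add(entry)
    return domains


def is_spam_referer(referer, spam_domains):
    """True if the referer host, or any parent domain of it, is blacklisted."""
    host = (urlsplit(referer).hostname or "").lower()
    labels = host.split(".")
    # For www.a.example this checks "www.a.example", "a.example" and "example".
    return any(".".join(labels[i:]) in spam_domains for i in range(len(labels)))
```

Each referer then costs a handful of set lookups, whatever the length of the list.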

Setting precision to 4 takes too much time building HTML pages; I need to improve this. It takes less time if you use a logfile with domain-name hosts.

Using reverse DNS is quite slow... about 60 times slower. But once the reverse DNS cache has been filled, if you restart the run with the same data, it takes only 6 minutes. To increase speed, you can use Geo::IP, which finds the country of an IP address from a local database file instead of making DNS queries.

New ideas:
- Run other logfile analyzers and compare the results.
- Modify the display threshold with AJAX.
- Modify the graphs with AJAX (selecting a different date, a different output graph).
- A tag script (inside HTML code) which will output a logfile (like tag stats).

3.005
01.11.2007 | 3.005 | Author: Domisse

Let's start...

Today was a busy day :) I fixed the problem with the logo. I was trying to read the width and height of an image given as a URL by opening it as a local file. Sadly, I had used an empty warning message, so there was no way to find out what was wrong. David Rose allowed me to connect to his host... thanks to him!

I've completed the RSS stats, which should show you how many visitors have subscribed to your RSS feed. A new entry has been added to the configuration file, where you enter the name of your RSS file. Note that the RSS stats work with any file, so you can monitor files other than RSS feeds.
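Since feed readers poll the feed file regularly, the number of distinct hosts fetching it per day is a rough proxy for the number of subscribers. A Python sketch of that count, not W3Perl's implementation, assuming records are dicts with hypothetical `day`, `host`, `path` and `status` keys; 304 responses are included because many readers use conditional requests. Hosts behind one proxy are counted once, so the figure is a lower bound.

```python
from collections import defaultdict


def feed_readers(records, feed_path):
    """Return {day: number of distinct hosts that fetched `feed_path`}."""
    readers = defaultdict(set)
    for r in records:
        # 200 = feed sent, 304 = reader's cached copy still valid
        if r["path"] == feed_path and r["status"] in (200, 304):
            readers[r["day"]].add(r["host"])
    return {day: len(hosts) for day, hosts in sorted(readers.items())}
```

Because `feed_path` is just a parameter, the same count works for any monitored file, as with the configuration entry described above.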

In the administration area, I've removed the NIS section; too many people were puzzled by it. The language section is now checked, so you can't unselect all languages!

I've downloaded a small PAD generator for Windows because most Windows software repositories ask for one. I will feed them the PAD for the release.

Table sorting now starts in ascending order; people are more interested in the top ten files than in the 10 worst files.

Hans-Ruodi Burch suggested using PDF::Template instead of HTMLDOC. It seems not to have been updated since 2006 and lacks table support... but I will give it a try.

I had some problems guessing the website URL from pathserver and path, because some users have very different paths (not to mention symlinks!). Maybe it would be safer to add an extra field, $link_weburl?

New ideas:
- On a fatal error, the ability to send the configuration file by email if the user agrees
- Also an email when the stats are completed?
- A web page showing all RSS feeds related to logfile analysis?
- RSS/email alert when unusual traffic appears?

Things to do:
- Use XSLT to render RSS output
- Stats with Apache %I and %O