Release 3.01
23.11.2007 | 3.01 | Author: Domisse
|
|
|
|
|
|
|
Last improvements
19.11.2007 | Last improvements | Author: Domisse
|
|
Last few minor changes before releasing 3.01.
- There was a shift between requests/accesses and hosts in the yearly
graphs.
- Wrong CSV link in the hourly stats.
- Konqueror doesn't work with the sortable tables. I don't know why, as there
is no JavaScript error console :(
- Use 127.0.0.1 instead of domainname in the default configuration
file. This prevents links to the domainname.com website!
- The daily traffic table did not display from the beginning of the logfile.
- Added the year to the frame menu.
- Cron-w3perl now displays the right CPU time when a run spans two days.
- I've added the number of accesses to the selected directory graphs.
- Finally added the year script to the script master loop.
- Popup links can now have a target frame.
- A default configuration for Apache is provided.
- The popup page graph was wrong if there was a gap in the date interval.
- The list of hosts which have read a page was not updated in incremental mode.
Things to do:
- Add popups over the directory graphs.
Ferran Sunyer, a Spanish student working in Munich, contacted me to
get help with session stats. He needs to know how long
people spend reading pages and which paths they use to navigate.
Given several websites, session stats will be able to show him
which one is the most efficient.
|
|
|
|
|
3.007 and Planning
16.11.2007 | 3.007 and Planning | Author: Domisse
|
|
3.007 will be the last development package before the official 3.01
release. I hope to build the package before the end of the month.
Yearly stats have been improved and now also display the current year.
Time to make some plans for the next release:
- A tag script to allow people without a logfile to use W3Perl. It should be
based on the same method as a 'tag counter': a piece of JavaScript code
inside some of your pages, but one which writes to a log file that can be
read by W3Perl.
- Add back support for Domino logfiles.
- Improved RSS stats
- Try to detect hackers and email spam.
- Use a better tool to build PDF files.
- Speed up reverse DNS by making multiple DNS queries
- Add a list of bounced mails
- Faster save with the highest level of precision
- Solve the problem with wget
- Remove those frames
- Sessions sorted by referer
- Search engine on strings, not only on words
- More AJAX tools
- Robot daily graph
- Show the current month
|
|
|
|
|
Tooltip
14.11.2007 | Tooltip | Author: Domisse
|
|
I've added a piece of JavaScript from Jarodxxx which displays a popup
over an image. This lets you view a bigger image without having to
open a new web page.
Small improvements:
- A new script is available to compute yearly stats.
- The session script now reports an error if you try to compute stats older
than those available.
- The last visit has been added to the robot stats.
- Browser graphs also show a popup window which displays the 3
most successful ones.
- Robot stats are now split into two pages, one with the full list, the
other with the top ten (to avoid overly long lists).
|
|
|
|
|
3.006 bugs
11.11.2007 | 3.006 bugs | Author: Domisse
|
|
First bug report from Hans-Ruodi on 3.006:
- Running install.pl -x runs the installer instead of displaying the
default values and quitting.
- Running cron-w3perl.pl -a -e works! ... I've added a warning
because you have to choose between init and update; both can't be run
at the same time!
- Hans-Ruodi uses config-ftp as a template to build his own web server
configuration file. Config-ftp is about FTP servers, not web
servers... I will provide a default configuration file for Apache.
I now check the date format of the logfile; if it is incorrect, the script
exits with an error.
- Robots were being listed in the browser stats although they had
been excluded in the configuration file. Reverse DNS was forced to be disabled and
there was a mismatch between IP and hostname.
- 'Useless files' shows index.php files but shouldn't. Well, this part
of the script is disabled by default (still beta); I need to run more
tests.
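The date format check mentioned above could look roughly like the following. This is only a sketch in Python, not the W3Perl code: the function names are made up for illustration, and it assumes the usual Apache timestamp layout (day/month/year:time zone) between square brackets.

```python
from datetime import datetime


def parse_log_date(stamp):
    """Parse an Apache-style timestamp such as 11/Nov/2007:09:30:00 +0100."""
    # %b expects English month abbreviations under the default C locale.
    return datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")


def check_first_line(line):
    """Return the parsed date of a log line, or exit when it cannot be parsed."""
    start, end = line.find("["), line.find("]")
    if start == -1 or end < start:
        raise SystemExit("no [date] field found in logfile line")
    try:
        return parse_log_date(line[start + 1:end])
    except ValueError as err:
        raise SystemExit(f"unexpected date format in logfile: {err}")
```

Checking only the first line keeps the test cheap; a stricter variant could sample a few more lines before giving up.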
|
|
|
|
|
3.006
10.10.2007 | 3.006 | Author: Domisse
|
|
v3.01 is getting closer... 3.006 includes support for the input/output bandwidth
fields of the Apache combinedio logfile format. A bug in saving the
robots file has also been fixed. Popup windows over the page requests graph
have been added.
Time to test mail stats now... Emails are now grouped by domain
name rather than by country, which gives smaller HTML pages. I'm thinking
about spammer detection: for an email address, lots of unique
recipients and nothing else (of course, mailing lists should be taken
into account).
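The spammer idea above can be sketched as follows. This is an illustration in Python, not what W3Perl does: the function name, the threshold and the reading of "nothing else" (the address never shows up as a recipient) are all assumptions.

```python
from collections import defaultdict


def suspected_spammers(messages, min_recipients=50, mailing_lists=frozenset()):
    """messages: iterable of (sender, recipient) pairs taken from a mail log."""
    recipients = defaultdict(set)   # sender -> set of distinct recipients
    received = set()                # every address that received mail
    for sender, recipient in messages:
        recipients[sender].add(recipient)
        received.add(recipient)
    return sorted(
        sender
        for sender, rcpts in recipients.items()
        if len(rcpts) >= min_recipients
        and sender not in received        # assumed meaning of "nothing else"
        and sender not in mailing_lists   # known lists are exempted
    )
```

The mailing_lists set is where legitimate bulk senders would be declared, which is the exception mentioned in the text.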
|
|
|
|
|
Tests
06.11.2007 | Tests | Author: Domisse
|
|
Hans-Ruodi Burch sent me an email about a problem with PDF building. The
scripts reported errors and stopped, but the PDF was there. In fact, he uses
the latest HTMLDOC 1.9 (which is still an alpha), which has very
limited CSS rendering. It is best to use 1.8.27.
The reverse DNS file is now saved whenever a day has been processed. If
you stop the processing before it completes, reverse DNS does not need to
restart from scratch.
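A minimal sketch of this kind of per-day checkpoint, in Python rather than in the W3Perl scripts. The JSON cache file, the function names and the resolver argument are assumptions made for the example.

```python
import json
import os


def load_cache(path):
    """Return the saved ip -> hostname mapping, or an empty one on first run."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def save_cache(cache, path):
    """Write to a temporary file, then rename it over the old cache."""
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as fh:
        json.dump(cache, fh)
    os.replace(tmp, path)  # an interrupted save leaves the previous file intact


def process_days(days, cache_path, resolver):
    """days: iterable of lists of IP addresses, one list per day of log."""
    cache = load_cache(cache_path)
    for ips in days:
        for ip in ips:
            if ip not in cache:
                cache[ip] = resolver(ip)
        save_cache(cache, cache_path)  # checkpoint after each completed day
    return cache
```

If the run is stopped on day N, the next run reloads everything resolved up to day N-1 and only queries the addresses it has not seen yet.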
Support for %I and %O has been added to the scripts. If they are available in
the logfile (combinedio format), the input/output traffic will be
printed (this includes the headers, the status line and the request itself,
not only the size of the requested file).
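For illustration, here is how a combinedio line could be parsed. This is a Python sketch, not the W3Perl parser: it assumes the standard Apache combinedio layout (the combined format followed by %I and %O) and ignores escaped quotes inside fields. The field names are invented for the example.

```python
import re

# combined log format followed by the two mod_logio counters (%I %O)
COMBINEDIO = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<bytes_in>\d+) (?P<bytes_out>\d+)$'
)


def parse_combinedio(line):
    """Return a dict of fields, or None when the line is not combinedio."""
    m = COMBINEDIO.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["bytes_in"] = int(entry["bytes_in"])    # %I: request line and headers included
    entry["bytes_out"] = int(entry["bytes_out"])  # %O: status line and headers included
    return entry
```

Since bytes_out counts the headers as well, it is normally larger than size for the same request.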
|
|
|
|
|
W3Perl download
03.11.2007 | W3Perl download | Author: Domisse
|
|
October is over... let's see how many times W3Perl has been
downloaded since January.
Well, let's start the stats...
Of course, you need to take care to reject robots which try to download
the package. Also, many people download big files with a download
manager, which opens many requests, each for a small part of the file. If you
don't filter out the download managers, the numbers will be wrong (too
large by a large factor). 'Commercial' packages often 'forget' this basic
rule and so are able to promote their software with a great number of
downloads!
About the numbers: well, nearly a fivefold increase since January... not
too bad. But most of it comes from new ways of distributing the package
(rpm/debian/apache-windows).
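One way to keep download managers from inflating the count is sketched below in Python. It is an assumption about how the filtering could be done, not the method used for the numbers above: all partial requests from one host on one day are counted as a single download. The tuple layout and the function name are hypothetical.

```python
def count_downloads(requests, package_path, robot_hosts=frozenset()):
    """requests: iterable of (host, day, path, status) tuples from the access log."""
    seen = set()
    for host, day, path, status in requests:
        if path != package_path or host in robot_hosts:
            continue
        if status not in (200, 206):   # 206 = partial content (range request)
            continue
        seen.add((host, day))          # many partial requests collapse into one
    return len(seen)
```

Counting raw requests instead of (host, day) pairs is exactly what produces the oversized figures described above.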
|
|
I forgot to remove a test in the code... so it failed to load the reverse DNS
hosts. Also, I've added a new 'Percentage' column in the Host area; it is now
easier to know how many IPs have not been resolved.
The combinedio log format from Apache has been added to the admin
interface... but not yet to the scripts.
New idea:
- To increase reverse DNS speed, process the log lines first, then use JDresolve to run multiple DNS queries simultaneously (5-10 times faster).
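The principle of running several lookups at once can be sketched like this. Note that it does not use JDresolve: it is a Python illustration with a thread pool from the standard library, and the function names and the default of 10 workers are assumptions.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def reverse_lookup(ip):
    """Return the hostname for ip, or the ip itself when it does not resolve."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:   # covers socket.herror and socket.gaierror
        return ip


def resolve_all(ips, resolver=reverse_lookup, workers=10):
    """Resolve each distinct address once, with up to `workers` lookups in flight."""
    unique = sorted(set(ips))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        names = pool.map(resolver, unique)
        return dict(zip(unique, names))
```

Collecting the addresses from the log lines first, as the idea suggests, means each address is queried only once however often it appears.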
|
|
|
|
|
Debug
02.11.2007 | Debug | Author: Domisse
|
|
Today is dedicated to bug fixes...
- The first one was the use of rss.gif instead of rss.png in 3.005.
- The second was about sorting everything in ascending order; strings have to be
sorted in descending order.
- The third was the removal of the rss.xml file when starting from
scratch. It is better to overwrite the file, which avoids errors on requests
while the stats are being completed.
- The fourth was a bug in displaying the weekly graph legend.
- The next one was about displaying the number of items in the traffic
area. It was set to 0.1; it now uses the maximum value/100.
- Page occurrence in traffic was only set for non-HTML files, which prevented
the table from being truly sortable.
- Daily filetype occurrence for HTML files was wrong when running from
scratch (only for the last day parsed).
- I forgot to delete old PDFs (older than $nbdays) to free some disk
space.
- When computing the page occurrence trend, "Same over the previous days"
now applies if all values are the same!
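As an illustration of the PDF cleanup fix in the list above, here is a short Python sketch. It is not the W3Perl code: the function name is hypothetical, and it assumes the age of a PDF is judged by its modification time.

```python
import time
from pathlib import Path


def remove_old_pdfs(directory, nbdays, now=None):
    """Delete *.pdf files in directory last modified more than nbdays ago."""
    now = time.time() if now is None else now
    cutoff = now - nbdays * 86400   # 86400 seconds per day
    removed = []
    for pdf in Path(directory).glob("*.pdf"):
        if pdf.stat().st_mtime < cutoff:
            pdf.unlink()
            removed.append(pdf.name)
    return sorted(removed)
```

The function returns the names it removed, which makes it easy to log what was freed.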
A small test on a 516 MB logfile (2.4 million lines).
Configuration | CPU | Lines/sec
Base | 6'42 | 7100
Robot enabled | 14'15 | 2880
Spammer enabled | 26'06 | 1500
Precision 4 | 1h 34' | 7000
ReverseDNS enabled | 6h 25' | 6280
Detecting referer spammers can be a waste of time if you don't have
any! There is a long list available in /resources/referrer-standard.txt, which takes ages to compare against.
Maybe I should ship a minimal list so people can add entries when needed.
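One possible way to make the comparison cheaper, whatever the length of the list, is to load it into a hash set and look up only the referrer's host name. The Python sketch below is an assumption, not the current W3Perl behaviour: it supposes the list holds one domain per line, and the function names are invented.

```python
from urllib.parse import urlsplit


def load_spam_domains(lines):
    """Build a set of domains, skipping blank lines and # comments."""
    return {
        ln.strip().lower()
        for ln in lines
        if ln.strip() and not ln.lstrip().startswith("#")
    }


def is_spam_referrer(referrer, spam_domains):
    """True when the referrer host, or one of its parent domains, is listed."""
    host = (urlsplit(referrer).hostname or "").lower()
    parts = host.split(".")
    # Try a.b.example, then b.example, then example.
    return any(".".join(parts[i:]) in spam_domains for i in range(len(parts)))
```

Each referrer then costs a handful of set lookups instead of one comparison per list entry.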
Setting precision to 4 takes too much time building the HTML pages; I need to
improve this. It takes less time if you use a logfile with domain name hosts.
Using reverse DNS is quite slow... about 60 times slower... but once the
reverse DNS cache has been filled, if you restart the run with the
same data, it will take only 6 minutes. To increase speed, you can use
Geo::IP, which finds the country of an IP address from a local file instead of querying DNS.
|
New ideas:
- Run other logfile analyzers and compare the results.
- Modify the display threshold with AJAX
- Modify the graphs with AJAX (selecting a different date, a different output graph)
- A tag script (inside the HTML code) which outputs a logfile (like tag
stats).
|
|
|
|
|
3.005
01.11.2007 | 3.005 | Author: Domisse
|
|
Let's start...
Today was a busy day :) I fixed the problem with the logo. I tried to
extract the width and height from a URL file with a local open. Sadly, I had
used an empty warning message, so there was no way to find out what was wrong. David
Rose allowed me to connect to his host... thanks to him!
I've completed the RSS stats, which should let you see how many visitors
have subscribed to your RSS feed. A new entry has been added to the
configuration file, where you have to enter your RSS file. Note that
the RSS stats will work with any file, allowing you to monitor
files other than RSS.
In the administration area, I've removed the NIS section; too many
people were puzzled by it. The language section is now checked so you
can't unselect all languages!
I've downloaded a small PAD generator for Windows because most
Windows software repositories ask for one. I will feed them the PAD for
the release.
Table sorting is now ascending first; people are more interested in
the top ten files than in the 10 worst files.
Hans-Ruodi Burch suggested that I use PDF::Template instead of HTMLDOC.
It seems not to have been updated since 2006 and lacks table support... but
I will give it a try.
I had some problems guessing the website URL from pathserver and path
because some users use very different paths (not to mention symlinks!).
Maybe it would be safer to add an extra field $link_weburl?
New ideas:
- On a fatal error, the ability to send the configuration file by email if the
user agrees
- Also an email when the stats are completed?
- A web page showing all RSS feeds related to logfile analysis?
- RSS/email alert when unusual traffic appears?
Things to do:
- Use XSLT to render the RSS output
- Stats with Apache %I and %O
|
|