|
TinyStats
|
Written by Datagod
|
|
Friday, 13 June 2008 |
This is a real-time display of the hits per hour that this website is currently getting. The chart updates every 5 seconds. It is rendered via JavaScript/Flash and gets its data from a MySQL query. The query results are translated into XML before being fed to the Flash file, and the whole thing is wrapped in PHP. Talk about complicated!
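To give a rough idea of the PHP-to-XML step, here is a minimal sketch. It is not the code running on this site: the function name buildChartXml, the sample rows and the caption are made up for illustration, and the chart/set element layout is an assumption about the XML the chart expects, so check it against the FusionCharts documentation before relying on it.

```php
<?php
// Hypothetical helper: turns hourly hit counts into chart XML.
// The <chart>/<set> layout is an assumption about what the chart expects.
function buildChartXml(array $rows, $caption)
{
    $xml = "<chart caption='" . htmlspecialchars($caption, ENT_QUOTES) . "'>";
    foreach ($rows as $row) {
        // Escape the label and force the value to an integer.
        $xml .= "<set label='" . htmlspecialchars($row['hour'], ENT_QUOTES) . "'"
              . " value='" . (int) $row['hits'] . "' />";
    }
    return $xml . "</chart>";
}

// Sample rows standing in for a MySQL result set.
$rows = array(
    array('hour' => '13:00', 'hits' => 42),
    array('hour' => '14:00', 'hits' => 57),
);
$chartXml = buildChartXml($rows, 'Hits per hour');
```

In a real page the rows would come from the MySQL query, and the resulting string would be handed to the chart as its data.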
|
|
Last Updated ( Friday, 13 June 2008 )
|
|
|
Written by Datagod
|
|
Thursday, 12 June 2008 |
For the past 6 months I have been fairly quiet on this site. The reason for this is that I have been focusing my attention on learning FusionCharts and FusionWidgets charting tools for my masterpiece MyTinyStats.com.
I love this product; they really did a great job of putting it all together. Considering how many charts and gadgets they provide, the documentation is incredible. My favorite part, however, is writing PHP code to translate MySQL query results into XML.
I have spent countless hours on this project and am very pleased to say that the hard work is paying off. Not only is MyTinyStats starting to look really great, but I am also able to start selling my services to other clients. The money I make all goes back into paying for the mucho hardware/software being used to hold my little web empire together.
So, if you are looking for some help with PHP, MySQL and FusionCharts, I am the man to talk to. 
|
|
Last Updated ( Thursday, 12 June 2008 )
|
|
|
Written by Datagod
|
|
Thursday, 15 May 2008 |
My sites routinely get hammered by bots and crawlers. Some are malicious, attempting to find vulnerabilities in unpatched software. Some are link spammers that send fake referrer information containing URLs back to some crappy site selling cheap drugs.
Other crawlers (Google, Yahoo, etc.) are totally legitimate, indexing the pages as they crawl them.
In any case, I want to slow down the page loads for the bots and crawlers to prevent them from overwhelming the webserver. In particular, MyTinyStats has thousands of webpages that are generated via a combination of PHP and complex MySQL queries.
I do this with the following code:
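// Load the list of known bot user agents.
$BotList = file_get_contents("lookup/BotList.txt");
// Some clients send no user agent at all, so default to an empty string.
$UserAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
// Slow the page down if the user agent appears anywhere in the list.
if ($BotList !== false && $UserAgent !== "" && strpos($BotList, $UserAgent) !== false)
{
    sleep(30);
}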
The BotList.txt file contains a list of 350+ known bots and crawlers that I have identified over the past year. This file gets loaded into a local variable using the "file_get_contents()" function.
I then examine the BotList, searching for the current UserAgent. If found, I slow the page down by sleeping for 30 seconds. If not found, the rest of the page loads.
The file is approximately 23 KB and takes only a few milliseconds to load and parse.
While using this approach, I made an interesting discovery: file_get_contents() accepts a URL as a file parameter.
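// Same check, but loading the list over HTTP instead of from disk.
$BotList = file_get_contents("http://mytinystats.com/lookup/BotList.txt");
$UserAgent = isset($_SERVER["HTTP_USER_AGENT"]) ? $_SERVER["HTTP_USER_AGENT"] : "";
if ($BotList !== false && $UserAgent !== "" && strpos($BotList, $UserAgent) !== false)
{
    sleep(30);
}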
Specifying a URL, however, introduces a 3-second overhead, because every page load then has to make an HTTP request just to fetch the list!
I highly recommend using a relative file name and not a URL, if at all possible. |
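One side effect of the substring check is worth knowing about: a visitor whose user agent happens to sit inside a longer list entry is also treated as a bot. A plain IE6 user agent, for example, is contained in the "via translate.google.com" entries of the master list. Here is a minimal sketch of a stricter check, assuming the list file holds one user agent per line. The function name isKnownBot and the two-line sample list are made up for illustration.

```php
<?php
// Hypothetical helper: true only when the user agent equals one full
// line of the bot list, rather than appearing anywhere inside it.
function isKnownBot($botListText, $userAgent)
{
    $userAgent = trim($userAgent);
    if ($userAgent === '') {
        return false; // no user agent sent: nothing to compare
    }
    // Split on any line ending and drop empty lines.
    $lines = preg_split('/\R/', $botListText, -1, PREG_SPLIT_NO_EMPTY);
    return in_array($userAgent, array_map('trim', $lines), true);
}

// Two sample entries standing in for lookup/BotList.txt.
$sampleList = "Googlebot/2.1\n"
    . "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1),gzip(gfe) (via translate.google.com)\n";
```

With this sample list, isKnownBot($sampleList, "Googlebot/2.1") matches, while the plain IE6 string "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)" does not, even though the substring check would have caught it.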
|
Last Updated ( Thursday, 20 August 2009 )
|
|
|
Written by Datagod
|
|
Wednesday, 14 May 2008 |
Here is a master list of bots that have been crawling my websites over the past year. I examine the UserAgent and apply a few rules in order to classify the visitors as Bots. This list may contain some false positives, but it is interesting to read nonetheless.
Enjoy!
8484 Boston Project v 1.0
Acoon-Robot 4.0.2.17 (http://www.acoon.de)
AdsBot-Google ( http://www.google.com/adsbot.html)
AdsBot-Google (+http://www.google.com/adsbot.html)
Agent Anonymous/3G bot 1.0 (anything...anything you want.; http://anything.you.want; [e-mail address hidden])
AnotherBot 1.0
Apexoo Spider 1.1
Attentio/Nutch-0.9-dev (Attentio's beta blog crawler; www.attentio.com; [e-mail address hidden])
Azureus 3.0.3.4;Mac OS X;Java 1.5.0_13
Baiduspider ( http://www.baidu.com/search/spider.htm)
Baiduspider ( http://www.baidu.com/search/spider_jp.html)
Baiduspider+(+http://www.baidu.com/search/spider.htm)
Baiduspider+(+http://www.baidu.com/search/spider_jp.html)
Balihoo/Nutch-0.9 (Crawler for Balihoo.com search engine - obeys robots.txt and robots meta tags ; http://balihoo.com/index.aspx; robot at balihoo dot com)
Balihoo/Nutch-1.0-dev (Crawler for Balihoo.com search engine - obeys robots.txt and robots meta tags ; http://balihoo.com/index.aspx; robot at balihoo dot com)
Bigsearch.ca/Nutch-1.0-dev (Bigsearch.ca Internet Spider; http://www.bigsearch.ca/; [e-mail address hidden])
blackspider
Bloodhound/Nutch-0.9 (Testing Crawler for Research - obeys robots.txt and robots meta tags ; http://balihoo.com/index.aspx; robot at balihoo dot com)
boitho.com-dc/0.82 ( http://www.boitho.com/dcbot.html )
boitho.com-dc/0.85 ( http://www.boitho.com/dcbot.html )
boitho.com-dc/0.86 ( http://www.boitho.com/dcbot.html )
bot/1.0 (bot; http://; [e-mail address hidden])
BrightCrawler (http://www.brightcloud.com/brightcrawler.asp)
Canookle/1.1 ( http://www.canookle.ca/spider.html)
CazoodleBot/CazoodleBot-0.1 (CazoodleBot Crawler; http://www.cazoodle.com/cazoodlebot; [e-mail address hidden])
CazoodleBot/Nutch-0.9-dev (CazoodleBot Crawler; http://www.cazoodle.com/cazoodlebot; [e-mail address hidden])
CazoodleBot/Nutch-0.9-dev (CazoodleBot Crawler; http://www.cazoodle.com; [e-mail address hidden])
CCBot/1.0 ( http://www.commoncrawl.org/bot.html)
CentiverseBot
CentiverseBot - investigator
CentiverseBot/3.0 (http://www.centiverse-project.net)
ContactBot/0.2
ConveraCrawler/0.9e ( http://www.authoritativeweb.com/crawl)
ConveraCrawler/2.0 ( http://ews.converasearch.com/crawl.htm)
cytqqrmxxpthobottvljckjgx0j0sqqtts
dgwkexnqfsogisyhbotfi2vjum2kdmbbsqebrun
DiGi-RSSBot
disco/Nutch-1.0-dev (experimental crawler; www.discoveryengine.com; [e-mail address hidden])
DNSRight.com WebBot Link Ckeck Tool. Report abuse to: [e-mail address hidden]
Dot TK robot -- spidering your website. Rename your website at www.dot.tk
DotTK-Robot/1.0 (Rename The Internet - http://www.dot.tk)
DuckDuckBot/1.0; ( http://duckduckgo.com/duckduckbot.html)
Dzesqclj qmjpbot nsexc
e-SocietyRobot(http://www.yama.info.waseda.ac.jp/~yamana/es/)
EnaBot/1.2 (http://www.enaball.com/crawler.html)
Eurobot/Nutch-1.0-dev (1.0)
FAST Enterprise Crawler 6 used by fast (jorgent@fast)
FAST Enterprise Crawler 6 used by Singapore Press Holdings ([e-mail address hidden])
FAST MetaWeb Crawler (helpdesk at fastsearch dot com)
favorstarbot/1.0 ( http://favorstar.com/bot.html)
Feedfetcher-Google; ( http://www.google.com/feedfetcher.html)
Feedfetcher-Google; ( http://www.google.com/feedfetcher.html; 1 subscribers; feed-id=12476056983529841647)
Feedfetcher-Google; ( http://www.google.com/feedfetcher.html; 2 subscribers; feed-id=12476056983529841647)
Feedfetcher-Google; (+http://www.google.com/feedfetcher.html)
Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 1 subscribers; feed-id=12476056983529841647)
Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 2 subscribers; feed-id=12476056983529841647)
froGgle/0.0 BETA! ( http://www.jabea.net/bot.php)
Gaisbot/3.0 ([e-mail address hidden]; http://www.widelab.org/robot.html)
Gaisbot/3.0 ([e-mail address hidden]; http://gais.cs.ccu.edu.tw/robot.php)
Gaisbot/3.0+([e-mail address hidden];+http://gais.cs.ccu.edu.tw/robot.php)
genieBot [e-mail address hidden]
GeonaBot/1.2; http://www.geona.com/
Gigabot/1.0
Gigabot/2.0 (http://www.gigablast.com/spider.html)
Gigabot/2.0att
Gigabot/3.0 (http://www.gigablast.com/spider.html)
Google-Sitemaps/1.0
googlebot
Googlebot-Image/1.0
Googlebot/2.0 (http://www.googlebot.com/bot.html)
Googlebot/2.1
Googlebot/2.1 ( http://www.google.com/bot.html)
Googlebot/2.1 ( http://www.googlebot.com/bot.html)
Googlebot/2.1 ( http://www.googlebot.com/bot.html; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322; TheWorld)
Googlebot/2.1 (http://www.googlebot.com/bot.html)
great-plains-web-spider/gpws (Flatland Industries Web Spider; http://www.flatlandindustries.com/flatlandbot.php; [e-mail address hidden])
gsa-crawler (Enterprise; S4-EHN4YSQ2XLJAB; [e-mail address hidden], [e-mail address hidden])
GurujiBot/1.0 ( http://www.guruji.com/en/WebmasterFAQ.html)
GurujiBot/1.0 (+http://www.guruji.com/en/WebmasterFAQ.html)
gzip(gfe) (via translate.google.com)
hdgjacfbqnupyrjavaqrngnxptq
HMSE_Robot
Holiday Shopping, Inc.
Hotakew perljwsi nudlg
HouxouCrawler/Nutch-0.8.2-dev
I411Crawling/Nutch-0.9 (CrawlData; www.i411.com; [e-mail address hidden])
iaskspider/2.0( http://iask.com/help/help_index.html)
Ibotyze riucktv dmjy
ICC-Crawler(Mozilla-compatible; http://kc.nict.go.jp/icc/crawl.html; icc-crawl(at)ml(dot)nict(dot)go(dot)jp)
ICC-Crawler(Mozilla-compatible; http://kc.nict.go.jp/icc/crawl.html; icc-crawl-contact(at)ml(dot)nict(dot)go(dot)jp)
ichiro/2.0 (http://help.goo.ne.jp/door/crawler.html)
ichiro/3.0 (http://help.goo.ne.jp/door/crawler.html)
IIITBOT/1.1 (Indian Language Web Search Engine; http://webkhoj.iiit.net; pvvpr at iiit dot ac dot in)
ilial/Nutch-0.9 (Ilial, Inc. is a Los Angeles based Internet startup company. For more information please visit http://www.ilial.com/crawler; http://www.ilial.com/crawler; [e-mail address hidden])
ilial/Nutch-0.9 (Ilial, Inc. is a Los Angeles based Internet startup company.; http://www.ilial.com/crawler; [e-mail address hidden])
IlseBot/1.1
inelibotTqopauiuehccihTeenhkau
IRLbot/3.0 (compatible; MSIE 6.0; http://irl.cs.tamu.edu/crawler)
IRLbot/3.0 (compatible; MSIE 6.0; http://irl.cs.tamu.edu/crawler/)
IRLbot/3.0_space_(compatible;_space_MSIE_space_6.0;_space_http://irl.cs.tamu.edu/crawler)
IUPUI Research Bot v 1.9a
Java/1.4.2
Java/1.4.2_01
Java/1.4.2_03
Java/1.4.2_04
Java/1.4.2_07
Java/1.4.2_11
Java/1.4.2_12
Java/1.4.2_13
Java/1.5.0-p3
Java/1.5.0_01
Java/1.5.0_02
Java/1.5.0_09
Java/1.5.0_12
Java/1.5.0_13
Java/1.6.0-oem
Java/1.6.0-rc
Java/1.6.0_02
Java/1.6.0_03
Java/1.6.0_04
Java/1.6.0_05
John's SEO (http://www.familieboers.eu/seo/seoinfo.aspx - [e-mail address hidden])
KAIST AITrc Crawler
Krugle/Krugle,Nutch/0.8 (Krugle web crawler; http://corp.krugle.com/crawler/info.html; [e-mail address hidden])
Krugle/Krugle,Nutch/0.8+ (Krugle web crawler; http://corp.krugle.com/crawler/info.html; [e-mail address hidden])
ktbxewuqetybotfo5aevfhg
LapozzBot/1.5 ( http://robot.lapozz.hu)
LarbinWebCrawler [e-mail address hidden]
LargeSmall Crawler
Lexxe/Robot
libwww-perl/5.36
libwww-perl/5.43
libwww-perl/5.47
libwww-perl/5.48
libwww-perl/5.49
libwww-perl/5.50
libwww-perl/5.51
libwww-perl/5.53
libwww-perl/5.60
libwww-perl/5.62
libwww-perl/5.63
libwww-perl/5.64
libwww-perl/5.65
libwww-perl/5.68
libwww-perl/5.69
libwww-perl/5.70
libwww-perl/5.73
libwww-perl/5.75
libwww-perl/5.76
libwww-perl/5.77
libwww-perl/5.78
libwww-perl/5.79
libwww-perl/5.800
libwww-perl/5.801
libwww-perl/5.802
libwww-perl/5.803
libwww-perl/5.804
libwww-perl/5.805
libwww-perl/5.806
libwww-perl/5.807
libwww-perl/5.808
libwww-perl/5.810
libwww-perl/5.811
libwww-perl/5.812
LocalcomBot/1.3.0 ( http://www.local.com/bot.htm)
LTI/LemurProject Nutch Spider/Nutch-1.0-dev (lti crawler for CMU; http://www.lti.cs.cmu.edu; changkuk at cmu dot edu)
LTI/LemurProject Nutch Spider/Nutch-1.0-dev (Research spider using Nutch; http://lucene.apache.org/nutch/bot.html; [e-mail address hidden])
LTI/LemurProject Nutch Spider/Nutch-1.0-dev (Research spider using Nutch; http://www.lemurproject.org; [e-mail address hidden])
lwbotuTafkqpjfpwhjlvmukjaolmamcnk
Lycos_Spider_(modspider)
McBot/5.001 (windows; U; NT4.0; en-us)
Mediapartners-Google
Mediapartners-Google/2.1
medrabbit/0.1beta (Crawling for health research; http://www.medrapid.info/bot.htm; [e-mail address hidden])
Missigua Locator 1.9
MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php? )
MJ12bot/v1.0.8 (http://majestic12.co.uk/bot.php?+)
MJ12bot/v1.2.0 (http://majestic12.co.uk/bot.php? )
MJ12bot/v1.2.0 (http://majestic12.co.uk/bot.php?+)
MLBot (www.metadatalabs.com)
MLBot (www.metadatalabs.com/mlbot)
Moreoverbot/5.00 ( http://www.moreover.com)
Mozilla/2.0 (compatible; Ask Jeeves/Teoma)
Mozilla/2.0 (compatible; Ask Jeeves/Teoma; http://about.ask.com/en/docs/about/webmasters.shtml)
Mozilla/2.0 (compatible; Ask Jeeves/Teoma; http://sp.ask.com/docs/about/tech_crawling.html)
Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)
Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html)
Mozilla/4.0 (compatible; Vagabondo/4.0Beta; webcrawler at wise-guys dot nl; http://webagent.wise-guys.nl/; http://www.wise-guys.nl/)
Mozilla/4.0 (compatible; GoogleBot 1.2)
Mozilla/4.0 (compatible; MSIE 5.01; Windows 95; MSIECrawler)
Mozilla/4.0 (compatible; MSIE 5.0; YANDEX)
Mozilla/4.0 (compatible; MSIE 6.0 compatible; Asterias Crawler v4; http://www.singingfish.com/help/spider.html; [e-mail address hidden]); SpiderThread Revision: 3.11
Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; Win 9x 4.90; MSIECrawler)
Mozilla/4.0 (compatible; MSIE 6.0; Windows CE; IEMobile m.n) libwww-perl/5.805
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; MSIECrawler
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1),gzip(gfe) (via translate.google.com)
Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ),gzip(gfe) (via translate.google.com)
Mozilla/4.0 (compatible; MSIE is not me; DAUMOA/1.0.1; DAUM Web Robot; Daum Communications Corp., Korea) libwww-perl/5.808
Mozilla/4.0 (compatible; piCrawler)
Mozilla/4.5 RPT-HTTPClient/0.3-2,gzip(gfe) (via translate.google.com)
Mozilla/5.0 (compatible; archive.org_bot/1.13.1x +http://crawler.archive.org)
Mozilla/5.0 (compatible; Ask Jeeves/Teoma; http://about.ask.com/en/docs/about/webmasters.shtml)
Mozilla/5.0 (compatible; Ask Jeeves/Teoma; +http://about.ask.com/en/docs/about/webmasters.shtml)
Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Mozilla/5.0 (compatible; Googlebot/2.1; http://www.google.com/bot.html)
Mozilla/5.0 (compatible; MSIE 6.0; Podtech Network; [e-mail address hidden])
Mozilla/5.0 (compatible; Yahoo! DE Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html)
Mozilla/5.0 (compatible; Yahoo! Slurp/3.0; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp),gzip(gfe) (via translate.google.com)
Mozilla/5.0 (compatible; Yahoo! Slurp;http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0 (compatible; YodaoBot/1.0; http://www.yodao.com/help/webmaster/spider/; )
Mozilla/5.0 (compatible; zermelo; http://www.powerset.com) [email: [e-mail address hidden], [e-mail address hidden]]
Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)
Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12,gzip(gfe) (via translate.google.com)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12,gzip(gfe) (via translate.google.com)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13; Google-TR-5.1.707.23222-en) Gecko/20080311 Firefox/2.0.0.13
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.4) Gecko/20070515 Googlebot 2,1
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9b5) Gecko/2008032620 Firefox/3.0b5,gzip(gfe) (via translate.google.com)
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:2.0.0) Gecko/20050511 Firefox/2.0.0.1,gzip(gfe) (via translate.google.com)
Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.8.1.5) Gecko/20070713 Firefox/2.0.0.5,gzip(gfe) (via translate.google.com)
Mozilla/5.0 (Windows; U; Windows NT 5.1; pt-BR; rv:1.8.1.12; Google-TR-5.1.706.29690-en-GT) Gecko/20080201 Firefox/2.0.0.12
Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12,gzip(gfe) (via translate.google.com)
Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9b5) Gecko/2008032620 Firefox/3.0b5,gzip(gfe) (via translate.google.com)
Mozilla/5.0 (Windows;) NimbleCrawler 2.0.1 obeys UserAgent NimbleCrawler For problems contact: [e-mail address hidden]
Mozilla/5.0 (Yahoo-Test/4.0 mailto:[e-mail address hidden])
Mozilla/5.0 (Yahoo-Test/4.0; mailto:[e-mail address hidden])
Mozilla/5.0_space_(compatible;_space_Yahoo!_space_Slurp;_space_http://help.yahoo.com/help/us/ysearch/slurp)
Mozilla/5.0_space_(Twiceler-0.9_space_http://www.cuill.com/twiceler/robot.html)
Mplfsqik yjatgh mcavperlo
MQBOT/Nutch-0.9-dev (MQBOT Nutch Crawler; http://vwbot.cs.uiuc.edu; [e-mail address hidden])
Mrwhebot bmlk yfkcrhsw
msnbot-media/1.0 ( http://search.msn.com/msnbot.htm)
msnbot-media/1.0 (+http://search.msn.com/msnbot.htm)
msnbot/0.9 ( http://search.msn.com/msnbot.htm)
msnbot/0.9 (+http://search.msn.com/msnbot.htm)
msnbot/1.0 ( http://search.msn.com/msnbot.htm)
msnbot/1.0 (+http://search.msn.com/msnbot.htm)
msnbot/1.0+(+http://search.msn.com/msnbot.htm)
msnbot/1.1 ( http://search.msn.com/msnbot.htm)
MSR-ISRCCrawler
MSRBOT (http://research.microsoft.com/research/sv/msrbot)
MSRBOT (http://research.microsoft.com/research/sv/msrbot/
multicrawler ( http://sw.deri.org/2006/04/multicrawler/robots.html)
mxdw ijboTgfrnTghmTwudn
my-heritrix-crawler( http://mywebsite.com)
NameOfAgent (CMS Spider)
NASA Search 1.0
NetResearchServer/4.0(loopimprovements.com/robot.html)
NextGenSearchBot 1 (for information visit http://about.zoominfo.com/About/NextGenSearchBot.aspx)
NextGenSearchBot 1 (for information visit http://www.zoominfo.com/About/misc/NextGenSearchBot.aspx)
nicebot
nKolcutqchh4Klyi4afbotvKsxK4ons cc
Nokia6610/1.0 (3.09) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html [e-mail address hidden]) )
Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html [e-mail address hidden]) )
Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (botmobi find.mobi/bot.html) )
Nokia6680/1.0 ((4.04.07) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 Configuration/CLDC-1.1 (for mobile crawler) )
Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; http://www.google.com/bot.html)
Nokia6820/2.0 (4.83) Profile/MIDP-1.0 Configuration/CLDC-1.0 (compatible; Googlebot-Mobile/2.1; +http://www.google.com/bot.html)
noxtrumbot/1.0 ([e-mail address hidden])
nrsbot/5.0(loopimprovements.com/robot.html)
nrsbot/5.0(loopip.com/robot.html)
Nutch/Nutch-0.9 (Eurobot; http://www.ayell.eu )
nutchbot/Nutch-1.0-dev
NutchCVS/0.06-dev (Nutch; http://www.nutch.org/docs/en/bot.html; [e-mail address hidden])
NutchCVS/0.7 (Nutch; http://lucene.apache.org/nutch/bot.html; [e-mail address hidden])
NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; [e-mail address hidden])
NutchCVS/0.7.2 (Nutch; http://lucene.apache.org/nutch/bot.html; [e-mail address hidden])
nyac libmhq afJproJnnpcbotircgukoepykhc
nyjfbbotery dpvhcctvrplgfxykEgccxpx
oBot
Openfind data gatherer, Openbot/3.0 ([e-mail address hidden]; http://www.openfind.com.tw/robot.html
Openfind data gatherer, Openbot/3.0+([e-mail address hidden];+http://www.openfind.com.tw/robot.html
OpenPlaces/Nutch-1.0-dev (OpenPlaces Content Crawler; http://www.openplace.com; dnadeau at openplaces.com)
Os xsxbotjpnbfbn wtm oq
owsBot/0.1 (Nutch; www.oneworldstreet.com; [e-mail address hidden])
Oxford_AI_BOT_V1.0 (compatible; MSIE 6.0;)
PCbot/3.0
Pete-Spider Light/1.2
pgyxeXbgqin8vbotkophhoXpnptgfvl8ulolryv
PHP version tracker (http://www.nexen.net/phpversion/bot.php)
Pingdom.com_bot_version_1.4_(http://www.pingdom.com/)
PlantyNet_WebRobot_V1.9 [e-mail address hidden]
POE-Component-Client-HTTP/0.65 (perl; N; POE; en; rv:0.650000)
psbot/0.1 ( http://www.picsearch.com/bot.html)
psbot/0.1 (+http://www.picsearch.com/bot.html)
pythonic-crawler ([e-mail address hidden])
qi8asakoakbotfmvi mltoR8xav
Rankivabot/3.2 (www.rankiva.com; 3.2; vzmxikn)
RedBot/redbot-1.0 (Rediff.com Crawler; redbot at rediff dot com)
Retriever/Nutch-0.9 (Crawler for Balihoo.com search engine - obeys robots.txt and robots meta tags ; http://balihoo.com/index.aspx; robot at balihoo dot com)
Robotgenius crawler/Nutch-1.0-dev (http://robotgenius.net; misc at robotgenius dot net)
RSS-SPIDER (http://www.rss-spider.com/submit200709.php)
sbotcpkwujpdumgnbugnLgqljdxbicvmsx
schibstedsokbot (compatible; Mozilla/5.0; MSIE 5.0; FAST FreshCrawler 6; +http://www.schibstedsok.no/bot/)
ScSpider/0.2
Seekbot/1.0 (http://www.seekbot.net/bot.html) HTTPFetcher/2.2
Sensis Web Crawler (search_comments\at\sensis\dot\com\dot\au)
ShablastBot 1.0
ShopWiki/1.0 ( http://www.shopwiki.com/wiki/Help:Bot)
Snapbot/1.0 ( http://www.snap.com)
Snapbot/1.0 (+http://www.snap.com)
Snapbot/1.0 (Snap Shots, http://www.snap.com)
Snapbot/1.0 (Snap Shots, +http://www.snap.com)
sogou develop spider
Sogou web spider/3.0( http://www.sogou.com/docs/help/webmasters.htm
SouFZ-Spider[SouFZ.COM]
Speedy Spider (Entireweb; Beta/1.1; http://www.entireweb.com/about/search_tech/speedyspider/)
Speedy Spider (Entireweb; Beta/1.2; http://www.entireweb.com/about/search_tech/speedyspider/)
Speedy Spider (Entireweb; Beta/1.3; http://www.entireweb.com/about/search_tech/speedyspider/)
Speedy Spider (http://www.entireweb.com/about/search_tech/speedyspider/)
Speedy Spider (http://www.entireweb.com/about/search_tech/speedy_spider/)
Spock Crawler (http://www.spock.com/crawler)
Steeler/3.3 (http://www.tkl.iis.u-tokyo.ac.jp/~crawler/)
Steeler/3.4 (http://www.tkl.iis.u-tokyo.ac.jp/~crawler/)
SurveyBot/2.3 (Whois Source)
Technoratibot/0.7
Teemer (NetSeer, Inc. is a Los Angeles based Internet startup company.; http://www.netseer.com/crawler.html; [e-mail address hidden])
Teemer (NetSeer, Inc.; http://www.netseer.com/crawler.html; [e-mail address hidden])
TestCrawler/Nutch-0.9 (Testing Crawler for Research ; http://chitchit.org/TestCrawler.html; amitjain at spro dot net)
ThaiSUE-Crawler
Theme Spider ( http://www.themespider.com/spider.html)
TinEye/1.1 (http://tineye.com/crawler.html)
TMCrawler
tmvmbmObsojgfamqqbOthlvdibxObugyvy
Trailfire-bot/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; [e-mail address hidden])
Trailfire-bot/0.7.1 (Trailfire page content analyzer; http://trailfire.com; [e-mail address hidden])
Trailfire-bot/Nutch-1.0-dev (Trailfire page content analyzer; http://trailfire.com; [e-mail address hidden])
Trailfire/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; [e-mail address hidden])
travel-spider/Nutch-0.9
TRAVEL.com - travel-search engine (see http://www.travel.com/spiders.html)
TurnitinBot/2.1 (http://www.turnitin.com/robot/crawlerinfo.html)
TwengaBot/1.1 ( http://www.twenga.com/bot.html)
Twiceler www.cuill.com/twiceler/robot.html
Twiceler-0.9 http://www.cuill.com/twiceler/robot.html
UogU0bxemmuf fyaprbot0 vbvhajwnUbaa
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1),gzip(gfe) (via translate.google.com)
VadixBot
VadixBot_Exp
VisBot/2.0 (Visvo.com Crawler; http://www.visvo.com/bot.html; [e-mail address hidden])
voyager-hc/1.0 ( http://www.kosmix.com/crawler.html)
voyager-hc/1.0 (+http://www.kosmix.com/crawler.html)
vspider
VSynCrawler/1.0
VWBOT/Nutch-0.9-dev (VWBOT Nutch Crawler; http://vwbot.cs.uiuc.edu; [e-mail address hidden])
w5vbotjovngrklftksnNvsxlimb srslooygel5
WebAlta Crawler/1.3.33 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)
WebAlta Crawler/1.3.34 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)
WebAlta Crawler/2.0 (http://www.webalta.net/ru/about_webmaster.html) (Windows; U; Windows NT 5.1; ru-RU)
webbot( http://webbot.com/bot.htm)
WebRankSpider/1.37 ( http://ulm191.server4you.de/crawler/)
WebRankSpider/1.37 (+http://ulm191.server4you.de/crawler/)
Wells Search II
WEP Search 00
WinHTTP Robot/1.0
woriobot ( http://www.worio.com/)
woriobot (+http://www.worio.com)
woriobot (+http://www.worio.com/)
Wujugo/Nutch-1.0-dev (Wujugo Content Crawler; http://www.wujugo.org; {dnadeau,srainville} at brightspark.com)
Xgpbotvu umkbwtq hmvykj
XwSpider
yacybot (i386 Linux 2.6.15-29-386; java 1.6.0; Europe/de) http://yacy.net/bot.html
yacybot (i386 Linux 2.6.9-023stab044.12-smp; java 1.6.0_03; Europe/en) http://yacy.net/yacy/bot.html
yacybot (i386 Linux 2.6.9-023stab044.12-smp; java 1.6.0_04; Europe/en) http://yacy.net/yacy/bot.html
yacybot (i386 Linux 2.6.9-023stab044.4-smp; java 1.6.0_02; Europe/en) http://yacy.net/yacy/bot.html
yacybot (x86 Windows 2003 5.2; java 1.6.0_03; Europe/de) http://yacy.net/bot.html
yacybot (x86 Windows 2003 5.2; java 1.6.0_04; Europe/de) http://yacy.net/bot.html
yacybot (x86 Windows XP 5.1; java 1.6.0_01; Europe/de) http://yacy.net/yacy/bot.html
Yahoo Pipes 1.0
Yahoo! Mindset
Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
YahooFeedSeeker/2.0 (compatible; Mozilla 4.0; MSIE 5.5; http://publisher.yahoo.com/rssguide)
Yandex/1.01.001 (compatible; Win16; H)
Yandex/1.01.001 (compatible; Win16; I)
Yeti/0.01 (nhn/1noon, [e-mail address hidden], check robots.txt daily and follow it)
Yeti/1.0 ( http://help.naver.com/robots/)
yoofind/yoofind-0.1-dev (yoono webcrawler; http://www.yoono.com ; [e-mail address hidden])
YooName/Nutch-1.0-dev (YooName Content Crawler; http://www.yooname.com; pythonner-at gmail . com)
|
|
|
Written by Datagod
|
|
Tuesday, 01 January 2008 |
We need beta testers!
There are currently 5 people taking part in the initial beta test. We need more!
What is TinyStats, anyway?
TinyStats is a website visitor tracking system. Every time one of your website's pages is accessed, a call is made to a central server which logs visitor information such as HostMachine, IPAddress, UserAgent, DomainName, etc. This information is stored in a MySQL database in a series of highly optimized tables.
Every five minutes a process runs that will analyze and summarize the incoming data, making it available for reporting purposes. 
How is TinyStats different from all the other traffic monitors?
The philosophy behind TinyStats has always been "Keep it simple, keep it Tiny!".
TinyStats is a lightweight, straightforward, easy-access system. You can get at your data when you need it without having to jump through hoops, sign on to special areas, etc.
The features that make TinyStats stand out from the crowd are as follows:
- TinyStats tracks ALL visitors to your site, not just humans with JavaScript enabled.
- The reports are "drillable" via hyperlinks to more reports. You can spend hours drilling into your data, increasing or decreasing the granularity as you see fit.
- Reports are open for all to see, driving traffic back to your website. Web crawlers (Google, Yahoo, etc.) absolutely love crawling the reports.
- TinyStats is a community, not a product. We have an active forum with many excellent ideas being bounced back and forth. The main developer (me) is passionate about TinyStats, and thoroughly enjoys responding to users' feature requests.
How can I join the party?
All you need to do is cut and paste the following PHP code into each page you want to actively track. You can download the code here.
Where can I learn more?
Visit our forum, or see TinyStats in action!
Thanks! |
|
Last Updated ( Wednesday, 09 April 2008 )
|
|
|
Written by Datagod
|
|
Friday, 30 November 2007 |
I have been working diligently on improving the whole TinyStats experience. The reports have evolved significantly, I have made architectural changes on the back end, I have registered a domain, etc.
I have been experimenting with various methods to communicate back to the central server, one of which is using the ever popular embedded image. My method is special, though, because inside the img tag I refer to a PHP script instead of a standard image file. The user's browser does not know the difference and merely waits for the image to load. The PHP script records the visitor's information, then serves up an image. It works great. I will post more at a later date...too much to do tonight...
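The image-tag approach described above can be sketched roughly as follows. This is only a minimal illustration, not the actual TinyStats script: the endpoint name `track.php`, the functions `collectVisitorInfo` and `transparentGif`, and the field names are all assumptions made for the example, and the database insert is left out.

```php
<?php
// Hypothetical sketch of a tracking "image". A page would embed something like
//   <img src="http://example.com/track.php" width="1" height="1" alt="">
// and this script would record the request before returning a real image.

// Pull the fields of interest out of a $_SERVER-style array.
function collectVisitorInfo(array $server): array
{
    return [
        'IPAddress' => $server['REMOTE_ADDR'] ?? '',
        'UserAgent' => $server['HTTP_USER_AGENT'] ?? '',
        'Referrer'  => $server['HTTP_REFERER'] ?? '',
        'URL'       => $server['REQUEST_URI'] ?? '',
    ];
}

// A 1x1 transparent GIF, stored base64-encoded.
function transparentGif(): string
{
    return base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7');
}

// Only emit the image when served over HTTP (not from the command line).
if (PHP_SAPI !== 'cli') {
    $visitor = collectVisitorInfo($_SERVER);
    // ... insert $visitor into the queue table here (omitted in this sketch) ...
    header('Content-Type: image/gif');
    header('Cache-Control: no-store');
    echo transparentGif();
}
```

Because the browser only sees a valid image response, the page renders normally while the request itself carries the visitor details to the server.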
I am also very excited about the future home of TinyStats: MyTinyStats.com. I am slowly moving pieces over to the new domain, cleaning things up a bit as I go. I will be treating the TinyStats report pages hosted on CookingWithSQL as a testing/development area. When I am happy with the results of a modification, I will migrate it to MyTinyStats.com.
I hope to announce a general public beta before Christmas of this year. Stay tuned! |
|
Last Updated ( Friday, 30 November 2007 )
|
|
|
Written by Datagod
|
|
Thursday, 08 November 2007 |
Over the past few weeks I have been thinking about how to further crunch the TinyStats weblog data in order to increase performance of my Grand Summary reports.
Currently I have about 500,000 records in my WebLog table. This translates into 150,000 WebLogFact records. The fact table works great for detailed reports, but for grand total reports it can be a bit cumbersome.
My new report table summarizes the number of hits per domain per day. This information is broken down by visitor type. The table does NOT contain URL information, IP Address information or Machine Address information. This new table contains approximately 2500 records which is considerably smaller than the WebLogFact table.
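The roll-up described above (hits per domain, per day, per visitor type) can be illustrated with a small sketch. The real summarizing is done in SQL on the server; this in-memory PHP version only shows the grouping idea. The row layout (`Domain`, `HitTime`, `VisitorType`, `Hits`) and the function name `summarizeHits` are assumptions for the example, not the actual TinyStats schema.

```php
<?php
// Roll detailed fact rows up to one count per (domain, day, visitor type).
// URL, IP address and machine details are deliberately dropped, which is
// what keeps the summary so much smaller than the fact data.
function summarizeHits(array $factRows): array
{
    $summary = [];
    foreach ($factRows as $row) {
        $day = substr($row['HitTime'], 0, 10); // keep only 'YYYY-MM-DD'
        $key = $row['Domain'] . '|' . $day . '|' . $row['VisitorType'];
        $summary[$key] = ($summary[$key] ?? 0) + $row['Hits'];
    }
    return $summary;
}

// Example input: three hourly rows collapse into two summary rows.
$rows = [
    ['Domain' => 'example.com', 'HitTime' => '2007-11-08 10:00:00', 'VisitorType' => 'Human', 'Hits' => 3],
    ['Domain' => 'example.com', 'HitTime' => '2007-11-08 11:00:00', 'VisitorType' => 'Human', 'Hits' => 2],
    ['Domain' => 'example.com', 'HitTime' => '2007-11-08 11:00:00', 'VisitorType' => 'Bot',   'Hits' => 7],
];
$summary = summarizeHits($rows);
```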
Needless to say, the Grand Summary report is an order of magnitude faster than it was before.
I am hoping to be able to devote more time to my TinyStats project during the Christmas Holidays. I want to go public but I must be certain my database schema and reporting structure can handle large volumes.
Until then, stay tuned. If you want to take part in the beta test, send me an e-mail. I would love to have more testers! |
|
Last Updated ( Sunday, 18 November 2007 )
|
|
|
Written by Datagod
|
|
Saturday, 21 April 2007 |
I spent the past two weeks tinkering with PHP/SWF Charts, which allows you to build some very impressive graphs, charts, etc. I followed the examples given on the website and in no time at all I was retrieving data from MySQL and charting it. At first I found the examples on the website helpful, but they really were not flexible enough for my needs. I want the chart configuration to be data driven and not have to rely on hard-coded values.
For example, I created a report that shows the traffic history for a domain over the last 24 hours. The domain name is dynamic, as well as the date range. I can change the 24 hours to 12 hours with an input parameter. The chart object gets all the data it needs from the MySQL result set, not from hard-coded values.
I then took some time to build a bit of a reporting architecture. I now have generic blocks of PHP that I use to execute a MySQL query and pass the info along to the PHP/SWF module. I spent hours and hours figuring out what I wanted and trying to build the PHP scripts to be as flexible as possible. The result was that in under 30 minutes I was able to put together a basic dashboard that consists of 3 graphic charts and one tabular report. Now I can spend some time on the prettiness factor. |
|
Last Updated ( Monday, 23 April 2007 )
|
|
|
Written by Datagod
|
|
Thursday, 12 April 2007 |
|
I deleted all the TinyStats data gathered for the past two weeks. I hated doing that, but I am still in Alpha development after all.
I inadvertently introduced a serious error in the portion of the codebase that inserts new UserAgent records. I translate spaces in the UserAgent to "_space_" prior to sending it to the TinyStats collection script. I then use the replace function to change the data back. The bug was that I forgot to include the replace command in the correlated subquery, so the subquery compared the still-encoded value against the decoded values already stored. The example below shows the correct method.
insert into UserAgent (UserAgent)
select distinct replace(UserAgent,'_space_',' ')
from WebLogQueue wlq
where not exists (select 1 from UserAgent ua where ua.UserAgent = replace(wlq.UserAgent,'_space_',' '))
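To see why the missing replace matters, here is a small in-memory sketch of the same "insert only what is not already there" check. It is an illustration only; the function names `decodeUserAgent` and `newUserAgents` and the sample agent strings are made up for the example, and the real check runs in SQL as shown above.

```php
<?php
// Undo the "_space_" encoding applied before the data is sent in.
function decodeUserAgent(string $encoded): string
{
    return str_replace('_space_', ' ', $encoded);
}

// Return queue agents that are not yet in the master list.
// Both sides of the comparison use the decoded form.
function newUserAgents(array $queueAgents, array $knownAgents): array
{
    $decoded = array_map('decodeUserAgent', $queueAgents);
    return array_values(array_unique(array_diff($decoded, $knownAgents)));
}

$queue = ['Mozilla/5.0_space_(X11)', 'Mozilla/5.0_space_(X11)', 'Wget/1.10'];
$known = ['Mozilla/5.0 (X11)'];

// Correct: only the genuinely new agent is reported.
$correct = newUserAgents($queue, $known);

// Buggy variant: comparing the still-encoded values never matches the stored
// (decoded) ones, so an already-known agent is treated as new on every run.
$buggy = array_values(array_unique(array_diff($queue, $known)));
```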
To make a long story short, I am too time pressed to spend the hour it would take to resolve the data inconsistencies. Hitting the reset switch is the easiest solution at this point. For those participating in the testing, I apologize. It hopefully will not happen again.  |
|
Last Updated ( Friday, 13 April 2007 )
|
|
|
Written by Datagod
|
|
Saturday, 31 March 2007 |
I re-architected the reports today. They still look as ugly as ever, but the underlying structure is much easier to work with. The reports are now broken down into separate files to handle the header, footer, db connection string, look and feel (will soon be CSS!), optional variables, etc.
I came up with something pretty clever. All where clauses in my reports say "where (1 = 1)" followed by an optional clause with a variable that may or may not be populated. This allows me to use the same SQL code to create a report for all domains, or for a specific domain if the optional variable is set. I also control the number of rows returned by using the Limit clause. Here is a simple example:
// Default both optional clauses to empty so the query is still valid when nothing is set.
$MyDomainClause = "";
$MyLimitClause = "";
IF (isset($MyDomain)) $MyDomainClause = " and d.Domain = '$MyDomain' ";
IF (isset($MyLimit)) $MyLimitClause = " limit $MyLimit ";
$MyQuery = " select d.Domain, sum(hits) as 'HITS', URL
    from WebLogFact wlf
    join Domain d on d.DomainID = wlf.DomainID
    join URL u on u.URLID = wlf.URLID
    where (1 = 1) $MyDomainClause
    group by d.Domain, u.URL
    order by 2 desc $MyLimitClause ;";
I am pretty happy with the results so far.
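One thing to watch with this pattern is that the optional values end up inside the SQL text, so they should be escaped or validated first. The sketch below wraps the same "where (1 = 1)" idea in a function. The name `buildHitsQuery` and the `$escape` parameter are assumptions for illustration; `addslashes` is only a stand-in default so the example runs without a database connection, and in practice the escaping function provided by the database driver in use would be passed in instead.

```php
<?php
// Build the report query with an optional domain filter and row limit.
// $escape is any function that escapes a string for the database in use.
function buildHitsQuery(?string $domain = null, ?int $limit = null, ?callable $escape = null): string
{
    $escape = $escape ?? 'addslashes'; // stand-in only, see note above

    // Each optional clause is either empty or a complete " and ..." fragment,
    // so it can always be appended after "where (1 = 1)".
    $domainClause = $domain !== null
        ? " and d.Domain = '" . $escape($domain) . "'"
        : '';
    // Casting through ?int and max() keeps the limit a non-negative integer.
    $limitClause = $limit !== null ? ' limit ' . max(0, $limit) : '';

    return "select d.Domain, u.URL, sum(hits) as HITS"
         . " from WebLogFact wlf"
         . " join Domain d on d.DomainID = wlf.DomainID"
         . " join URL u on u.URLID = wlf.URLID"
         . " where (1 = 1)" . $domainClause
         . " group by d.Domain, u.URL"
         . " order by HITS desc" . $limitClause;
}

$allDomains = buildHitsQuery();
$oneDomain  = buildHitsQuery('example.com', 10);
```

The same function serves both the "all domains" report and the per-domain report, which is the point of the (1 = 1) trick.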
|
|
Last Updated ( Saturday, 31 March 2007 )
|
|
|
Written by Datagod
|
|
Tuesday, 27 March 2007 |
I found a small typo in my TinyStats PHP script. I was wondering why I was not able to capture any "Reverse DNS" information like Steve Gibson does with his Shields Up! product. Well, now I know why. I had an uppercase A instead of a lowercase a in one of my PHP statements. PHP variable names are case-sensitive, so the looked-up host name was being assigned to a different variable:
IF (!isset($HostMachine)) $HostMAchine = gethostbyaddr($_SERVER["REMOTE_ADDR"]);
It was a small error, but the result was that I was never getting any data into my WebLogQueue table. I am pleased as punch now that I am gathering more information than ever on all 4 of my regular visitors. Just kidding...there are at least 5! |
|
Last Updated ( Tuesday, 27 March 2007 )
|
|
|
Written by Datagod
|
|
Monday, 26 March 2007 |
I bit the bullet last night and re-architected my fact table. It now contains summary counts of the HEADs and POSTs as well as incoming links per hour, per day. I spent some time today rewriting the reports that use this data. They now run MUCH faster and are far easier to understand.
The vast majority of the work takes place every minute when the data in the WebLogQueue table is crunched. The reports merely have to report the data; they no longer have to do funky correlated subqueries to get answers. It was a lot of work, but I will sleep better tonight for sure. |
|
Last Updated ( Tuesday, 27 March 2007 )
|
|
|
Written by Datagod
|
|
Monday, 26 March 2007 |
|
I have been around long enough to know that when you are just starting to use a new product you should not let your initial experiences cloud your judgement of the product.
For example, today I caused MySQL processes to hang because I was issuing select statements at the same time as an insert was taking place. In SQL Server, this is no big deal. You just use "with (nolock)", as long as you don't care about performing dirty reads. In this case I don't. I also don't know the syntax to perform the same dirty reads in MySQL.
I am also getting very annoyed that the MySQL Query Browser keeps crashing on me every time I type in a minus sign. You have no idea how frustrating it is to think "oops, I must be blocking again" only to realize that the browser has simply crashed because I hit the minus key.
To give some insight into the recent difficulties I have been having, here are some obstacles that I have had to overcome:
- No stored procedures allowed
- No views allowed
- Can't figure out how to set/use variables and make them persist past the delimiter ";"
- Can't figure out how to change delimiter even though following instructions found on MySQL forums in vain attempt to get variables to work
- Limit of 10 connections per User
- Do not type the minus sign in MySQL Query Browser or the GUI will crash and you lose all your work
There are more, but it gets depressing. Some obstacles are the fault of my web hoster, some are the result of me being such a MySQL noob. It is frustrating.
Fortunately I excel at getting around obstacles like these. My TinyStats processes are still gathering data and crunching them into Fact and Dimension tables. I even managed to throw a few reports together.
Don't get me wrong, I think MySQL is one cool product, and you can't go wrong with the price tag. I am incredibly pleased with what I have been able to accomplish so far. Well, that's enough for one day. I have to get some sleep. |
|
Last Updated ( Monday, 26 March 2007 )
|
|
|
Written by Datagod
|
|
Friday, 23 March 2007 |
|
I finally found some time today to fix up the web-side processing of the WebLogQueue. My CRON job now runs every minute on the webserver, crunches the data, and stores it in a highly normalized series of tables. I no longer have to rely on outside servers/processes to crunch the data and upload reports. All reports can be generated on the fly.
This is a HUGE breakthrough for me. I can now go crazy and generate all sorts of interesting reports without having to worry about moving data/files back and forth.
Over the next week or so I will be sending special invitations to people whom I feel would want to join the stats gathering party. I need to limit the numbers while I work out the various kinks in the data gathering PHP scripts. I want my scripts to be as flexible as possible before going to a wider release.
Over the next week or so, I will also be adding another table which will hold summary information much like a Fact table in a data warehouse. I plan on having at least one month of detailed information, and summary information for all time.
Of course my old remote server will still connect and suck back data on a regular basis and store it in SQL Server. This is a form of offsite backup, and it will also allow me to explore the data on my own without having to sweat through too many MySQL Query Browser crashes. |
|
Last Updated ( Monday, 26 March 2007 )
|
|
|
Written by Datagod
|
|
Saturday, 17 March 2007 |
|
Instead of watching Heroes, Lost, or 24, I decided to work on the TinyStats database component.
I moved all my TinyStats tables over to MySQL. I also created a SQL script to extract the pertinent data from the WebLogQueue table, such as IPAddress, URL, Referrer, etc. This script runs every minute as a CRON job. It is my workaround for not having stored procedures, and it works great!
I opened my gigantic SQL Server stored procedure that I was using to process the WebLogQueue remotely, and I was able to port the main code over with no conversion necessary whatsoever! The reason the procedure is so big is that I am a firm believer in full error handling and commenting. I am not comfortable enough with MySQL yet to bother with that, so I am just sticking to the absolute basics.
Because I am now crunching the data on the web-based MySQL box, I don't have to worry about FTP'ing static reports to a webserver. All reports will be PHP/MySQL based, and will hit the new table structures on the webserver.
I am also pleased with my hoster because they upgraded the MySQL box I am using to a much faster one. I didn't even have to pay extra, I just sent about 5 beg-mails asking for help.
Well, it's late. I have had enough fun for tonight. Tomorrow I will put up some cool reports and then start inviting people to join the party. All it takes is a snippet of PHP code, and you too can use TinyStats. |
|
|
Written by Datagod
|
|
Sunday, 11 March 2007 |
|
I have been itching to throw together some reports now for the past week, but due to work commitments I just have not had the time.
Well, tonight I built the SummaryByDay report. This report shows the number and type of page requests, unique IPs, etc. per domain, per day.
The report accepts a domain name as a parameter and displays all the summary information for that domain since collecting began. I may eventually limit the number of days on the report to 90, but for now it will just continue to auto-grow. |
|
Last Updated ( Sunday, 18 November 2007 )
|
|
|
Written by Datagod
|
|
Saturday, 10 March 2007 |
|
My WebLogQueue table is getting around 50,000 records a day which causes a slight performance problem with my linked server queries. Ideally I would execute a local stored procedure to remove the records once they have been processed, but as I described before my hoster does not support MySQL stored procedures at this time.
I spent a couple of minutes figuring out how to run a MySQL script from the command line, then I put a delete statement in a query file. I created a cron job that connects to MySQL and runs the query file. Bingo. I was surprised at how easy it was. |
|
|
Written by Datagod
|
|
Saturday, 03 March 2007 |
|

It is all starting to come together now. I have multiple websites that have embedded a PHP snippet which gathers data and sends it to a central PHP script, which records it in a MySQL database. Because my hoster does not allow me to run stored procedures (how lame), I have a scheduled job that connects from one of my home machines, gathers the data into a local database, crunches it, and stores it in a highly normalized fashion. Once the new data has been successfully crunched, a summary report is uploaded to a table back on the webserver.
I found some cool PHP reporting code, and put together a little summary report. It's not pretty, but it is functional. The data is updated every minute or so.
I was getting horrible performance by trying to perform remote deletes of the data that I had already processed, so I decided to do things a little differently. I keep track of processing ranges in a table on the webserver: the min / max dates of the WebLogQueue records at the time of processing. I will have a cron job that will run a delete statement to delete all data that is older than the maximum date in the BatchControl table.
Now THAT is an example of "Cooking with SQL". I am getting very tired, I have been at this non-stop for hours. Time to get some shut-eye. |
|
Last Updated ( Wednesday, 07 March 2007 )
|
|
|
Written by Datagod
|
|
Thursday, 01 March 2007 |
|
The development of TinyStats is in full swing. I have a PHP snippet that you can paste into a webpage, and the tracking begins immediately. The snippet captures some basic info, and passes it to another larger PHP script hosted elsewhere. That PHP script fills in whatever details were missing and inserts the data into a MySQL database.
My hoster does not yet support stored procedures, so I have my local database connect to the remote database, suck in whatever records are in the "queue", and process them locally via a series of stored procedures. Once the data is crunched, the remote "queue" is flushed.
An example of crunching is taking an IP address, and storing it in a master IPAddress table, unless it already exists. I have over 5000 records tracked, but only 100 unique IPAddress records. That is a saving of 50 to 1. The space saving is incredible, especially when I crunch URLs down. Some URLs are over 500 characters in length. I store the master copy once, and refer to its ID (4 byte int) later. Major savings!!
My battery is about to die, so I gotta go ...
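The "store the master copy once, refer to its ID later" idea can be sketched in a few lines. This is an in-memory stand-in for illustration only; the class name `DimensionTable` and its methods are assumptions, and the real TinyStats crunching happens in database tables rather than a PHP array.

```php
<?php
// Minimal in-memory stand-in for a master table such as IPAddress or URL:
// each distinct value is stored once and handed a small integer ID.
class DimensionTable
{
    /** @var array<string,int> value => ID */
    private array $idsByValue = [];

    // Return the existing ID for a value, or assign the next one if it is new.
    public function idFor(string $value): int
    {
        if (!isset($this->idsByValue[$value])) {
            $this->idsByValue[$value] = count($this->idsByValue) + 1;
        }
        return $this->idsByValue[$value];
    }

    // Number of distinct values stored so far.
    public function size(): int
    {
        return count($this->idsByValue);
    }
}

// Three log records, but only two distinct addresses get stored.
$ips = new DimensionTable();
$logIds = [
    $ips->idFor('10.0.0.1'),
    $ips->idFor('10.0.0.2'),
    $ips->idFor('10.0.0.1'),
];
```

Each log record then carries only the integer ID, which is where the space saving on long values such as URLs comes from.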
|
|
Last Updated ( Thursday, 01 March 2007 )
|
|
|
Written by Datagod
|
|
Sunday, 25 February 2007 |
|
 After 2 hours of fiddling around, I was finally able to create a linked server definition between SQL Server and MySQL. If I can't create stored procedures in MySQL (I have not heard back yet from the hosting tech support regarding that issue, surprise surprise...) maybe I can suck the data into SQL Server, process it, and load the results back into MySQL. I found some instructions on the web, but they either were incomplete, or simply did not apply to my particular situation. However, I did find a detailed set of instructions that did work. Here they are in full text glory:
1. Download the MySQL ODBC driver from mysql.com
2. Install MySQL ODBC driver on Server where SQL Server resides
-Double Click Windows Installer file and follow directions.
3. Create a DSN using the MySQL ODBC driver
Start -> Settings -> Control Panel -> Administrative Tools -> Data Sources (ODBC)
-Click on the System DSN tab
-Click Add
-Select the MySQL ODBC Driver
-Click Finish
On the Login Tab:
-Type a descriptive name for your DSN.
-Type the server name or IP Address into the Server text box.
-Type the username needed to connect to the MySQL database into the user text box.
-Type the password needed to connect to the MySQL database into the password text box.
-Select the database you'd like to start in.
On the Advanced Tab:
Under Flags 1:
-Check Don't Optimize column width.
-Check Return Matching Rows
-Check Allow Big Results
-Check Use Compressed protocol
-Check BIGINT columns to INT
-Check Safe
Under Flags 2:
-Check Don't Prompt Upon Connect
-Check Ignore # in Table Name
Under Flags 3:
-Check Return Table Names for SQLDescribeCol
-Check Disable Transactions
Now Test your DSN by Clicking the Test button
4. Create a Linked Server in SSMS for the MySQL database
SSMS (SQL Server Management Studio) -> Expand Server Objects
-Right Click Linked Servers -> Select New Linked Server
On the General Page:
-Linked Server: Type the Name for your Linked Server
-Server Type: Select Other Data Source
-Provider: Select Microsoft OLE DB Provider for ODBC Drivers
-Product name: Type MySQLDatabase
-Data Source: Type the name of the DSN you created
On the Security Page:
-Map a login to the Remote User and provide the Remote User's Password
-Click Add under Local server login to remote server login mappings:
-Select a Local Login From the drop down box
-Type the name of the Remote User
-Type the password for the Remote User
5. Change the Properties of the Provider MSDASQL
Expand Providers -> Right Click MSDASQL -> Select Properties
-Enable Nested queries
-Enable Level zero only (this one's the kicker)
-Enable Allow inprocess
-Enable Supports 'Like' operator
6. Change settings in SQL Server Surface Area Configuration for Features
-Enable OPENROWSET and OPENDATASOURCE support.
7. Change settings in SQL Server Surface Area Configuration for Services and Connections
-Enable Local and Remote connections via TCP/IP and named pipes
8. Stop SQL Server and SQL Server Agent
9. Start SQL Server and SQL Server Agent |
|
Last Updated ( Wednesday, 07 March 2007 )
|
|
|
Written by Datagod
|
|
Sunday, 25 February 2007 |
|
I created the TinyStats database last night and got a rudimentary PHP script working to send in various bits of information.
I am finding the MySQL Query Browser a little awkward to work with merely because I am so used to SQL Server's Management Studio / Query Analyzer. Connecting remotely to the database to work is incredible compared to mucking around with the php/edit/mysqladmin/whatever it is called.
I put in a call to my site hoster, because it seems I cannot create stored procedures in my database. This is very frustrating. I don't want all my logic to be in PHP scripts. I want to shove data into a queue table as fast as possible, then have a scheduled job run a stored procedure to crunch that data into a more normalized form. I suppose I could schedule a PHP script, but that is not my style.
I like doing database work inside the database. I am a fan of centrally managed code, and procs certainly help with that. I also like to minimize client/server round trips. I prefer to call one procedure that calls many other procedures to accomplish a hundred tasks rather than issue a hundred SQL command statements from a GUI. Cuts down on network congestion. |
|
Last Updated ( Wednesday, 07 March 2007 )
|
|
| |
|