[Gluster-infra] [Bug 1367588] Improve the redirection for specific URL for RTD coming from old website

bugzilla at redhat.com bugzilla at redhat.com
Wed Aug 17 12:10:13 UTC 2016


https://bugzilla.redhat.com/show_bug.cgi?id=1367588



--- Comment #3 from M. Scherer <mscherer at redhat.com> ---
I did a quick check across the whole set of logs: since 26 July we have had
around 22,000 hits.

# grep /community/documentation/index.php www.gluster.org-access_log* |wc -l
22708

Around 90% of the traffic is bots:
# grep /community/documentation/index.php www.gluster.org-access_log* |
    grep -v g2reader-bot/ | grep -v Slurp\; | grep -vi bingbot |
    grep -vi googlebot | grep -v Baiduspider/ | grep -v AhrefsBot/ |
    grep -v MJ12bot/ | grep -v 'Sogou web' | grep -v SeznamBot/ |
    grep -v electricmonk/ | grep -v 'HaosouSpider;' | grep -v archive.org_bot |
    grep -v Feedly/1.0 | grep -v SputnikBot/ | grep -v yoozBot | wc -l
2598
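
The long chain of exclusions can be folded into a reusable filter. This is only a sketch: the function name filter_bots is hypothetical, and the pattern list simply mirrors the grep -v chain used in this comment (case-sensitive for most bots, case-insensitive for bingbot and googlebot, dots left unescaped as in the original).

```shell
# Hypothetical helper: read access-log lines on stdin and drop those
# whose user agent matches one of the bot patterns from the chain above.
filter_bots() {
    # Case-sensitive patterns first, then the two that used -i originally.
    grep -Ev 'g2reader-bot/|Slurp;|Baiduspider/|AhrefsBot/|MJ12bot/|Sogou web|SeznamBot/|electricmonk/|HaosouSpider;|archive.org_bot|Feedly/1.0|SputnikBot/|yoozBot' |
        grep -Evi 'bingbot|googlebot'
}
```

It could then be used as: grep /community/documentation/index.php www.gluster.org-access_log* | filter_bots | wc -l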

On top of that, I suspect there are many refreshes and duplicate IPs:

# grep /community/documentation/index.php www.gluster.org-access_log* |
    grep -v g2reader-bot/ | grep -v Slurp\; | grep -vi bingbot |
    grep -vi googlebot | grep -v Baiduspider/ | grep -v AhrefsBot/ |
    grep -v MJ12bot/ | grep -v 'Sogou web' | grep -v SeznamBot/ |
    grep -v electricmonk/ | grep -v 'HaosouSpider;' | grep -v archive.org_bot |
    grep -v Feedly/1.0 | grep -v SputnikBot/ | grep -v yoozBot |
    awk '{print $1}' | awk -F: '{print $2}' | sort -u | wc -l
649

Grouping those by network then shows only around 600 hits. That's roughly 2 to
3 visitors per day on the wiki.
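
The grouping method is not spelled out here. One possible approach, assuming IPv4 addresses only and a /24 granularity (both are assumptions, as is the function name count_networks), is to truncate each address to its first three octets and count the distinct prefixes:

```shell
# Hypothetical helper: read one IPv4 address per line on stdin and print
# the number of distinct /24 networks. Lines that do not split into four
# dot-separated fields (for example IPv6 addresses) are skipped.
count_networks() {
    awk -F. 'NF == 4 { print $1 "." $2 "." $3 ".0/24" }' |
        sort -u |
        awk 'END { print NR }'
}
```

It would be fed with the address column produced by the pipeline above, that is, the output of the two awk steps before sort -u.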

After removing the various hacking attempts (aimed at Joomla), the hits on the
redirect page itself, the spam login attempts, and the favicon requests, we are
down to about 1,500 hits (without deduplication):

# grep /community/documentation/index.php www.gluster.org-access_log* |
    grep -v g2reader-bot/ | grep -v Slurp\; | grep -vi bingbot |
    grep -vi googlebot | grep -v Baiduspider/ | grep -v AhrefsBot/ |
    grep -v MJ12bot/ | grep -v 'Sogou web' | grep -v SeznamBot/ |
    grep -v electricmonk/ | grep -v 'HaosouSpider;' | grep -v archive.org_bot |
    grep -v Feedly/1.0 | grep -v SputnikBot/ | grep -v yoozBot |
    grep -v docs-redirect | awk '{print $7}' | grep -v 'Special:UserLogin' |
    grep -v '&action=history' | grep -v '%22%20h=/' | grep -v /favicon.ico |
    wc -l
1524
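
To put the counts in proportion, the share of the 22708 raw hits that survives each filtering stage can be computed directly. A minimal sketch follows; the helper name pct is hypothetical, and the numbers are the ones reported in this comment.

```shell
# Hypothetical helper: print part/total as a percentage with one decimal.
pct() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

pct 2598 22708   # share left after the bot filter: prints 11.4
pct 1524 22708   # share left after also dropping attack, login and favicon noise: prints 6.7
```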


Then the 30 most popular URLs are:

[root at supercolony httpd]# grep /community/documentation/index.php \
    www.gluster.org-access_log* |
    grep -v g2reader-bot/ | grep -v Slurp\; | grep -vi bingbot |
    grep -vi googlebot | grep -v Baiduspider/ | grep -v AhrefsBot/ |
    grep -v MJ12bot/ | grep -v 'Sogou web' | grep -v SeznamBot/ |
    grep -v electricmonk/ | grep -v 'HaosouSpider;' | grep -v archive.org_bot |
    grep -v Feedly/1.0 | grep -v SputnikBot/ | grep -v yoozBot |
    grep -v docs-redirect | awk '{print $7}' | grep -v 'Special:UserLogin' |
    grep -v '&action=history' | grep -v '%22%20h=/' | grep -v /favicon.ico |
    sort | uniq -c | sort -rn | head -n 30
    206 /community/documentation/index.php/Gluster_3.1:_Manually_Mounting_Volumes
    143 /community/documentation/index.php/Gluster_3.2:_Setting_Volume_Options
     87 /community/documentation/index.php/QuickStart
     69 /community/documentation/index.php/Gluster_3.2:_Starting_Gluster_Geo-replication
     52 /community/documentation/index.php/Gluster_3.2:_gluster_Command
     43 /community/documentation/index.php/Main_Page
     37 /community/documentation/index.php/Translators/storage/bdb
     37 /community/documentation/index.php/Gluster_3.2:_Monitoring_your_GlusterFS_Workload
     36 /community/documentation/index.php/Gluster_3.2:_Terminology
     35 /community/documentation/index.php/Gluster_3.2:_Displaying_Volume_Information
     29 /community/documentation/index.php/Gluster_3.2:_Expanding_Volumes
     24 /community/documentation/index.php/Gluster_3.2:_Manually_Mounting_Volumes
     22 /community/documentation/index.php/GlusterFS_Concepts
     21 /community/documentation/index.php/Gluster_3.2:_Configuring_Distributed_Striped_Volumes
     16 /community/documentation/index.php/User_Guide
     16 /community/documentation/index.php/Gluster_3.2:_Tuning_Volume_Options
     16 /community/documentation/index.php/Getting_started_overview
     15 /community/documentation/index.php/Gluster_3.4:_Brick_Restoration_-_Replace_Crashed_Server
     14 /community/documentation/index.php/Gluster_3.1:_Understanding_the_GlusterFS_License
     12 /community/documentation/index.php/Translators/performance
     12 /community/documentation/index.php/Gluster_Translators
     12 /community/documentation/index.php/GlusterHPC_FAQ
     12 /community/documentation/index.php/Gluster_3.2:_Manually_Mounting_Volumes_Using_NFS
     12 /community/documentation/index.php/Getting_started_test_it_out
     10 /community/documentation/index.php/About_GlusterFS_3.3
      9 /community/documentation/index.php/Gluster_3.2:_Installing_GlusterFS_on_Red_Hat_Package_Manager_(RPM)_Distributions
      9 /community/documentation/index.php/Gluster_3.2:_GlusterFS_Geo-replication_Deployment_Overview
      9 /community/documentation/index.php/Documenting_the_undocumented
      8 /community/documentation/index.php/MediaWiki:Userlogin
      8 /community/documentation/index.php/Gluster_3.2:_Updating_Memory_Cache_Size
