Bad Bots and Protection Against Load

April 2012

There is such a thing as "bad bots": spiders (robots) that can harm a site by creating extra load. Such bots can also steal information from a site, for example by downloading the whole site for offline viewing, whereas what we want is visitors.

Such bots can, and indeed should, be blocked. To do this, upload an .htaccess file with the following contents to the site root (this relies on Apache with mod_rewrite enabled):

RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} 360Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^attach [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^CICC [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^DA [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^lwp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^Memo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^RMA [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^VCI [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Aboundex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} aesop_com_spiderman [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AhrefsBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} AIBOT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Alexibot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} almaden [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Anarchie [NC,OR]
RewriteCond %{HTTP_USER_AGENT} anonymouse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Aport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Art-Online [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ASPSeek [NC,OR]
RewriteCond %{HTTP_USER_AGENT} asterias [NC,OR]
RewriteCond %{HTTP_USER_AGENT} autoemailspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BackDoorbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BackWeb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Baiduspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BatchFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bingbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Birubot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Black.Hole [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlackWidow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BLEXBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BlowFish [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bot\ mailto:craftbot@yahoo.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} botALot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Buddy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BuiltbotTough [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Bullseye [NC,OR]
RewriteCond %{HTTP_USER_AGENT} bumblebee [NC,OR]
RewriteCond %{HTTP_USER_AGENT} BunnySlippers [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Butterfly [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CamontSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Cegbfeieh [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Cheesebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CherryPicker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ChinaClaw [NC,OR]
RewriteCond %{HTTP_USER_AGENT} clip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} clshttp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Cogentbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Copier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} CopyRightCheck [NC,OR]
RewriteCond %{HTTP_USER_AGENT} cosmos [NC,OR]
RewriteCond %{HTTP_USER_AGENT} craftbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Crescent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Custo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Demon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Density [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Devil [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DIIbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DISCo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DittoSpyder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} dlman [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DotBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} download [NC,OR]
RewriteCond %{HTTP_USER_AGENT} dragonfly [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Drip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} DSurf15a [NC,OR]
RewriteCond %{HTTP_USER_AGENT} easydl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EasyDL/2.99 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ebingbong [NC,OR]
RewriteCond %{HTTP_USER_AGENT} eCatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EirGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} email [NC,OR]
RewriteCond %{HTTP_USER_AGENT} enhancer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EroCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} exabot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Express\ WebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ExpresssWebPictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} extract [NC,OR]
RewriteCond %{HTTP_USER_AGENT} eXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ExtractorPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} EyeNetIE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ezooms [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FairShare [NC,OR]
RewriteCond %{HTTP_USER_AGENT} fetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FileHound [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FlashGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} flunky [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Foobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} FrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetRight [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetSmart [NC,OR]
RewriteCond %{HTTP_USER_AGENT} getweb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GetWeb! [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gigabaz [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Gigabot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go-Ahead-Got-It [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go.?is [NC,OR]
RewriteCond %{HTTP_USER_AGENT} go.?zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Go\!Zilla [NC,OR]
RewriteCond %{HTTP_USER_AGENT} gotit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Grabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} GrabNet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Grafula [NC,OR]
RewriteCond %{HTTP_USER_AGENT} grub-client [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Harvest [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Harvest-NG [NC,OR]
RewriteCond %{HTTP_USER_AGENT} hloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HMView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} httpdown [NC,OR]
RewriteCond %{HTTP_USER_AGENT} httplib [NC,OR]
RewriteCond %{HTTP_USER_AGENT} HTTrack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} humanlinks [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ia_archiver [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ilsebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Image\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} image\.coccoc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ImagesStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ImagesSucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ind[\sy]*Library [NC,OR]
RewriteCond %{HTTP_USER_AGENT} IndysLibrary [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InfonaviRobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} infotekies [NC,OR]
RewriteCond %{HTTP_USER_AGENT} intelliseek [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InterGET [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Internet\ Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetLinkagent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetSeer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} InternetsNinja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Iria [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ISC\ Systems\ iRc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Jakarta [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Java [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JBH.*agent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Jennybot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JetCar [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JOC [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JS-Kit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} JustView [NC,OR]
RewriteCond %{HTTP_USER_AGENT} jyxobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} kenjin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Kenjin\ Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Kenjin.Spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} keyword [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Keyword.Density [NC,OR]
RewriteCond %{HTTP_USER_AGENT} kmSearchBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} larbin [NC,OR]
RewriteCond %{HTTP_USER_AGENT} leacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LeechFTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LexiBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lftp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libweb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libWeb/clsHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww [NC,OR]
RewriteCond %{HTTP_USER_AGENT} libwww-perl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} likse [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Linguee [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Link [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lnspiderguy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} lwp-trivial [NC,OR]
RewriteCond %{HTTP_USER_AGENT} LWP::Simple [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mag[-]*Net [NC,OR]
RewriteCond %{HTTP_USER_AGENT} markwatch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mass[s\s]*Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mata.Hari [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MegaIndex [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Metasearch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Microsoft.URL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MIDown[\.\s]*tool [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MIIxpc [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} missigua [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Missigua\ Locator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mister.*PiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MistersPiX [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MJ12bot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MLBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} moget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla.*MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla.*Indy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla.*NEWT [NC,OR]
RewriteCond %{HTTP_USER_AGENT} mozilla.newt [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Mozilla/3.Mozilla/2.01 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MS\ FrontPage* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSFrontPage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSIECrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} msnbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} MSProxy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} musobot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nameprotect [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Navroad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NearSite [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Net.*Vampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetAnts [NC,OR]
RewriteCond %{HTTP_USER_AGENT} netcraft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetMechanic [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NetsVampire [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nextgensearchbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NICErsPRO [NC,OR]
RewriteCond %{HTTP_USER_AGENT} nimblecrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ninja [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NjuiceBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} NPbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Nutch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Octopus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline.*Explorer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Offline.*Navigator [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Openfind [NC,OR]
RewriteCond %{HTTP_USER_AGENT} outfoxbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PageGrabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pagerabber [NC,OR]
RewriteCond %{HTTP_USER_AGENT} papa [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pavuk [NC,OR]
RewriteCond %{HTTP_USER_AGENT} pcBrowser [NC,OR]
RewriteCond %{HTTP_USER_AGENT} php.?version.?tracker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PHPCrawl [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Ping [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PingALink [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pockey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PostRank [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ProgramsSharewares1 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ProPowerbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ProWebWalker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} psbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ptd-crawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Pump [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Purebot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} PycURL [NC,OR]
RewriteCond %{HTTP_USER_AGENT} QRVA [NC,OR]
RewriteCond %{HTTP_USER_AGENT} queryn [NC,OR]
RewriteCond %{HTTP_USER_AGENT} QueryN.Metasearch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RealDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Reaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Recorder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ReGet [NC,OR]
RewriteCond %{HTTP_USER_AGENT} RepoMonkey [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Scooter [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Seeker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SemrushBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Siphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} site.quester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SiteBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sitecheck.internetseer.com [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SiteSnagger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Slurp [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SlySearch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SmartDownload [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Snake [NC,OR]
RewriteCond %{HTTP_USER_AGENT} snapbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} snoopy [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sogou [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Sogou\ web\ spider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Sosospider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Soup [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SpaceBison [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Spankbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} spanner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} spbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sproose [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SputnikBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SputnikImageBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} sqworm [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Stripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} suggybot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SuperHTTP [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Surfbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SurveyBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} suzuran [NC,OR]
RewriteCond %{HTTP_USER_AGENT} SWeb [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Szukacz [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Szukacz/1.4 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} tAkeOut [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Teleport [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Telesoft [NC,OR]
RewriteCond %{HTTP_USER_AGENT} The.Intraformant [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TheNomad [NC,OR]
RewriteCond %{HTTP_USER_AGENT} TightTwatbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Titan [NC,OR]
RewriteCond %{HTTP_USER_AGENT} toCrawl/UrlDispatcher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} True_Robot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ttCrawler [NC,OR]
RewriteCond %{HTTP_USER_AGENT} turingos [NC,OR]
RewriteCond %{HTTP_USER_AGENT} turnitinbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Turnitinbot/1.5 [NC,OR]
RewriteCond %{HTTP_USER_AGENT} urldispatcher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} URLSpiderPro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} URLy.Warning [NC,OR]
RewriteCond %{HTTP_USER_AGENT} User-Agent [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Vacuum [NC,OR]
RewriteCond %{HTTP_USER_AGENT} VoidEYE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Voyager [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web[\.\s]Image[\.\s]Collector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Downloader [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Web\ Sucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebAuto [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebBandit [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Webclipping [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webcollage [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebCopier [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebEMailExtrac.* [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebEnhancer [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebFetch [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebGo [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebHook [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebLeacher [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebmasterWorldForumBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebMiner [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebMirror [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webpictures [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebReaper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebSauger [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebsImagesCollector [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Website [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebsiteseXtractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebsitesQuester [NC,OR]
RewriteCond %{HTTP_USER_AGENT} webspider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebsSucker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Webster [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Webster.Pro [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebStripper [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WebWhacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WEP\ Search [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Wget [NC,OR]
RewriteCond %{HTTP_USER_AGENT} whack [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Whacker [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Widow [NC,OR]
RewriteCond %{HTTP_USER_AGENT} wisenutbot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Wonder [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WordPress [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WWW-Collector-E [NC,OR]
RewriteCond %{HTTP_USER_AGENT} WWWOFFLE [NC,OR]
RewriteCond %{HTTP_USER_AGENT} x-Tractor [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xaldon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} XaldonsWebSpider [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Xenu [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Yeti [NC,OR]
RewriteCond %{HTTP_USER_AGENT} YottosBot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Zeus [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zip [NC,OR]
RewriteCond %{HTTP_USER_AGENT} zyborg [NC]
RewriteRule .* - [F,L]
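
Before uploading the file, it can be useful to check locally which User-Agent strings a set of patterns would catch. Below is a minimal Python sketch, not part of the original recipe. It only approximates what the conditions above do: an unanchored, case-insensitive search (like [NC]) where a single match is enough (like [OR]). The function name is_blocked, the shortened pattern list, and the sample User-Agent strings are assumptions made for illustration, and Python's re module is not the same engine as the PCRE library Apache uses, so treat the result as a rough check only.

```python
import re

# Illustrative subset of the patterns from the list above (not the full list).
PATTERNS = [r"^lwp", r"AhrefsBot", r"HTTrack", r"Wget", r"Offline.*Explorer"]


def is_blocked(user_agent, patterns=PATTERNS):
    """Return True if any pattern matches the User-Agent, ignoring case."""
    return any(re.search(p, user_agent, re.IGNORECASE) for p in patterns)


if __name__ == "__main__":
    samples = [
        "Wget/1.21.3",
        "lwp-trivial/1.41",
        "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)",
        "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
    ]
    for ua in samples:
        # 403 is what the [F] flag returns; 200 stands for "request passes through".
        print("403" if is_blocked(ua) else "200", ua)
```

Running the same check against User-Agent strings taken from your own access log should show whether any pattern is broader than intended.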

An alternative approach with a single regular expression (it is important to write the list of bots on one line!):

RewriteEngine On
RewriteCond %{HTTP:User-Agent} (?:360Spider|^attach|^CICC|^DA|^lwp|^Memo|^RMA|^VCI|Aboundex|aesop_com_spiderman|AhrefsBot|AIBOT|Alexibot|almaden|Anarchie|anonymouse|Aport|Art-Online|ASPSeek|asterias|autoemailspider|BackDoorbot|BackWeb|Baiduspider|Bandit|BatchFTP|bingbot|Birubot|Black.Hole|BlackWidow|BLEXBot|BlowFish|Bot\ mailto:craftbot@yahoo.com|botALot|Buddy|BuiltbotTough|Bullseye|bumblebee|BunnySlippers|Butterfly|CamontSpider|Cegbfeieh|Cheesebot|CherryPicker|ChinaClaw|clip|clshttp|Cogentbot|Collector|Copier|CopyRightCheck|cosmos|craftbot|Crescent|Custo|Demon|Density|Devil|DIIbot|DISCo|DittoSpyder|dlman|DotBot|download|dragonfly|Drip|DSurf15a|easydl|EasyDL/2.99|ebingbong|eCatch|EirGrabber|email|enhancer|EroCrawler|exabot|Express\ WebPictures|ExpresssWebPictures|extract|eXtractor|ExtractorPro|EyeNetIE|Ezooms|FairShare|fetch|FileHound|FlashGet|flunky|Foobot|FrontPage|GetRight|GetSmart|getweb|GetWeb!|gigabaz|Gigabot|Go!Zilla|Go-Ahead-Got-It|go.?is|go.?zilla|Go\!Zilla|gotit|Grabber|GrabNet|Grafula|grub-client|Harvest|Harvest-NG|hloader|HMView|httpdown|httplib|HTTrack|humanlinks|ia_archiver|ilsebot|Image\ Stripper|Image\ Sucker|image\.coccoc|ImagesStripper|ImagesSucker|Ind[\sy]*Library|IndysLibrary|InfonaviRobot|infotekies|intelliseek|InterGET|Internet\ Ninja|InternetLinkagent|InternetSeer|InternetSeer.com|InternetsNinja|Iria|ISC\ Systems\ iRc|Jakarta|Java|JBH.*agent|Jennybot|JetCar|JOC|JS-Kit|JustView|jyxobot|kenjin|Kenjin\ Spider|Kenjin.Spider|keyword|Keyword.Density|kmSearchBot|larbin|leacher|LeechFTP|LexiBot|lftp|libweb|libWeb/clsHTTP|libwww|libwww-perl|likse|Linguee|Link|lnspiderguy|lwp-trivial|LWP::Simple|Mag[-]*Net|markwatch|Mass[s\s]*Downloader|Mata.Hari|MegaIndex|Metasearch|Microsoft.URL|MIDown[\.\s]*tool|MIIxpc|Mirror|missigua|Missigua\ Locator|Mister.*PiX|MistersPiX|MJ12bot|MLBot|moget|Mozilla.*MSIECrawler|Mozilla.*Indy|Mozilla.*NEWT|mozilla.newt|Mozilla/3.Mozilla/2.01|MS\ FrontPage*|MSFrontPage|MSIECrawler|msnbot|MSProxy|musobot|nameprotect|Navroad|NearSite|Net.*Vampire|NetAnts|netcraft|NetMechanic|NetSpider|NetsVampire|nextgensearchbot|NICErsPRO|nimblecrawler|Ninja|NjuiceBot|NPbot|Nutch|Octopus|Offline.*Explorer|Offline.*Navigator|Openfind|outfoxbot|PageGrabber|Pagerabber|papa|pavuk|pcBrowser|php.?version.?tracker|PHPCrawl|Ping|PingALink|Pockey|PostRank|ProgramsSharewares1|ProPowerbot|ProWebWalker|psbot|ptd-crawler|Pump|Purebot|PycURL|QRVA|queryn|QueryN.Metasearch|RealDownload|Reaper|Recorder|ReGet|RepoMonkey|sauger|Scooter|Seeker|SemrushBot|Siphon|site.quester|SiteBot|sitecheck.internetseer.com|SiteSnagger|Slurp|SlySearch|SmartDownload|Snake|snapbot|snoopy|sogou|Sogou\ web\ spider|Sosospider|Soup|SpaceBison|Spankbot|spanner|spbot|sproose|SputnikBot|SputnikImageBot|sqworm|Stripper|Sucker|suggybot|SuperBot|SuperHTTP|Surfbot|SurveyBot|suzuran|SWeb|Szukacz|Szukacz/1.4|tAkeOut|Teleport|Telesoft|The.Intraformant|TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|True_Robot|ttCrawler|turingos|turnitinbot|Turnitinbot/1.5|urldispatcher|URLSpiderPro|URLy.Warning|User-Agent|Vacuum|VoidEYE|Voyager|Web[\.\s]Image[\.\s]Collector|Web\ Downloader|Web\ Sucker|WebAuto|WebBandit|Webclipping|webcollage|WebCopier|WebEMailExtrac.*|WebEnhancer|WebFetch|WebGo|WebHook|WebLeacher|WebmasterWorldForumBot|WebMiner|WebMirror|webpictures|WebReaper|WebSauger|WebsImagesCollector|Website|WebsiteseXtractor|WebsitesQuester|webspider|WebsSucker|Webster|Webster.Pro|WebStripper|WebWhacker|WEP\ Search|Wget|whack|Whacker|Widow|wisenutbot|Wonder|WordPress|WWW-Collector-E|WWWOFFLE|x-Tractor|Xaldon|XaldonsWebSpider|Xenu|Yeti|YottosBot|Zeus|zip|zyborg) [NC]
RewriteRule .* - [F,L]
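
Since the whole list has to sit on one line, assembling it by hand is error-prone. A rough Python sketch of generating the rule from a list of patterns could look like the following. The helper name build_rules and the short sample list are hypothetical. The check for unescaped whitespace is there because Apache splits directive arguments on spaces, so a space inside a pattern has to be written with a backslash in front of it.

```python
import re


def build_rules(names):
    """Build .htaccess lines with all patterns joined into one single-line group."""
    cleaned = [n.strip() for n in names if n.strip()]
    if not cleaned:
        raise ValueError("the pattern list is empty")
    pattern = "(?:" + "|".join(cleaned) + ")"
    # A bare space or newline would end the pattern argument early in Apache.
    if re.search(r"(?<!\\)\s", pattern):
        raise ValueError("unescaped whitespace in pattern; write a space as '\\ '")
    return [
        "RewriteEngine On",
        "RewriteCond %{HTTP_USER_AGENT} " + pattern + " [NC]",
        "RewriteRule .* - [F,L]",
    ]


if __name__ == "__main__":
    # Hypothetical short list; in practice it would be read from a file, one per line.
    sample = ["Wget", "HTTrack", r"Image\ Stripper", r"Offline.*Explorer"]
    print("\n".join(build_rules(sample)))
```

Keeping the patterns one per line in a separate file and regenerating the rule makes it easier to add or remove a bot without breaking the single-line requirement.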

The list is certainly not complete, but it is everything I have managed to find so far.

The most complete list of "bad" bots that I know of: Bad Bots List

A list of User-Agent strings for browsers, search engine robots and spiders, web directories, download managers, spam bots, and bad bots can be found on the List of User-Agents site.

 

May the load on your sites stay light!
