without "href=" attributes. Then, JavaScript code added to the
end of the page goes back and fills in the "href=" attributes of
the anchor tags with the hyperlink targets, thus enabling the hyperlinks.
This extra step of using JavaScript to enable the hyperlink targets
is a security measure against spiders that forge a human-looking
UserAgent string. Most spiders do not bother to run JavaScript, so
to such a spider the anchor tags without "href=" are useless. But all modern
web browsers implement JavaScript, so hyperlinks work
normally for human users.
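The fill-in step described above might look roughly like the following. This is only a minimal sketch, not the script Fossil actually emits: the function name <tt>fillInHrefs</tt>, the use of anchor ids as lookup keys, and the sample targets are all assumptions made for illustration.

```javascript
// Hypothetical sketch; not Fossil's real script.
// `anchors` is any iterable of anchor-like objects with an `id` and an
// optional `href`. `targets` is a Map from anchor id to hyperlink target.
function fillInHrefs(anchors, targets) {
  let enabled = 0;
  for (const a of anchors) {
    // Only touch anchors that were emitted without an href.
    if (!a.href && targets.has(a.id)) {
      a.href = targets.get(a.id);
      enabled += 1;
    }
  }
  return enabled; // number of hyperlinks that were enabled
}

// Plain objects stand in for <a> elements so the sketch runs outside a
// browser. In a page, the first argument could instead be
// document.querySelectorAll("a[id]").
const demoAnchors = [{ id: "a1" }, { id: "a2" }, { id: "a3", href: "/home" }];
const demoTargets = new Map([["a1", "/timeline"], ["a2", "/dir"]]);
const demoCount = fillInHrefs(demoAnchors, demoTargets);
```

A client that never runs the script sees only anchors without targets, which is the whole point of the technique.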
<h2>Further defenses</h2>
Recently (as of this writing, in the spring of 2013) the Fossil server
on the SQLite website ([http://www.sqlite.org/src/]) has been hit repeatedly
by Chinese spiders that use forged UserAgent strings to make them look
...
delay is 10 milliseconds. The idea here is that a spider will try to
render the page immediately and will not wait for delayed scripts
to run, and thus will never enable the hyperlinks.
These two subsettings can be used separately or together. If used together,
then the delay timer does not start until after the first mouse movement
is detected.
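How the two subsettings interact can be sketched as follows. This is an illustrative outline only, under the assumption that the page has some <tt>enable</tt> callback that fills in the hyperlinks. The names <tt>scheduleEnable</tt>, <tt>delayMs</tt>, <tt>waitForMouse</tt>, and the <tt>env</tt> wrapper are hypothetical and do not come from Fossil.

```javascript
// Hypothetical sketch; option and function names are assumptions.
// opts.delayMs      - delay before enabling, in milliseconds (0 = none)
// opts.waitForMouse - if true, wait for the first mouse movement
// env               - thin wrapper over the timer and event facilities
function scheduleEnable(opts, enable, env) {
  const startTimer = () => {
    if (opts.delayMs > 0) {
      env.setTimeout(enable, opts.delayMs);
    } else {
      enable();
    }
  };
  if (opts.waitForMouse) {
    let seen = false;
    env.onMouseMove(() => {
      if (seen) return; // only the first movement matters
      seen = true;
      startTimer();     // the delay starts after the mouse has moved
    });
  } else {
    startTimer();
  }
}

// In a browser, env might be:
//   { setTimeout: (f, ms) => window.setTimeout(f, ms),
//     onMouseMove: (f) => document.addEventListener("mousemove", f) }
```

Passing the timer and event hooks in through <tt>env</tt> is just a convenience that lets the logic be exercised without a real browser.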
See also [./server.wiki#loadmgmt|Managing Server Load] for a description
of how expensive pages can be disabled when the server is under heavy
load.
<h2>The ongoing struggle</h2>
Fossil currently does a very good job of providing easy access to humans
while keeping out troublesome robots and spiders. However, spiders and
bots continue to grow more sophisticated, requiring ever more advanced
defenses. This "arms race" is unlikely to ever end. The developers of