Python's `urllib2`: Why do I get error 403 when I `urlopen` a Wikipedia page?

I get a strange error when I try to `urlopen` a certain Wikipedia page. This is the page:

http://en.wikipedia.org/wiki/OpenCola_(drink)

This is the shell session:

>>> f = urllib2.urlopen('http://en.wikipedia.org/wiki/OpenCola_(drink)')
Traceback (most recent call last):
  File "C:\Program Files\Wing IDE 4.0\src\debug\tserver\_sandbox.py", line 1, in <module>
    # Used internally for debug sandbox under external interpreter
  File "c:\Python26\Lib\urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "c:\Python26\Lib\urllib2.py", line 397, in open
    response = meth(req, response)
  File "c:\Python26\Lib\urllib2.py", line 510, in http_response
    'http', request, response, code, msg, hdrs)
  File "c:\Python26\Lib\urllib2.py", line 435, in error
    return self._call_chain(*args)
  File "c:\Python26\Lib\urllib2.py", line 369, in _call_chain
    result = func(*args)
  File "c:\Python26\Lib\urllib2.py", line 518, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

This happened to me on two different systems on different continents. Does anyone know why it happens?

Best answer
Wikipedia's stance is:

Data retrieval: Bots may not be used
to retrieve bulk content for any use
not directly related to an approved
bot task. This includes dynamically
loading pages from another website,
which may result in the website being
blacklisted and permanently denied
access. If you would like to download
bulk content or mirror a project,
please do so by downloading or hosting
your own copy of our database.

That is why Python is blocked. You are supposed to download the data dumps instead.
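To see what the server is reacting to, it can help to look at the User-Agent header that Python sends when none is set. The short Python 3 sketch below only inspects the default opener and makes no network request; the variable names are illustrative, and the idea that the block keys on this header is an inference from the workaround, not something Wikipedia's policy text states.

```python
import urllib.request

# A fresh opener carries the headers urllib adds to every request by default.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# The User-agent value identifies the client as Python's urllib plus a version.
print(default_headers)
```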

Anyway, you can read pages like this in Python 2:

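import urllib2
url = 'http://en.wikipedia.org/wiki/OpenCola_(drink)'
# Send an explicit User-Agent instead of urllib2's default
req = urllib2.Request(url, headers={'User-Agent': "Magic Browser"})
con = urllib2.urlopen(req)
print con.read()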

Or in Python 3:

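import urllib.request
url = 'http://en.wikipedia.org/wiki/OpenCola_(drink)'
# Send an explicit User-Agent instead of urllib's default
req = urllib.request.Request(url, headers={'User-Agent': "Magic Browser"})
con = urllib.request.urlopen(req)
print(con.read())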
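If you would rather handle the 403 than let it surface as a traceback, the same request can be wrapped in a small helper. This is a minimal Python 3 sketch, not an official Wikipedia client: the names `build_request` and `fetch`, the timeout value, and the User-Agent string with its contact address are all assumptions made for illustration.

```python
import urllib.error
import urllib.request

# Hypothetical identifying string; replace with your own tool name and contact.
USER_AGENT = "ExampleFetcher/0.1 (contact: you@example.com)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Return the response body as bytes, or None on an HTTP error status."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code holds the HTTP status, e.g. 403 when the request is refused.
        print("HTTP error", err.code, err.reason)
        return None
```

Calling `fetch('http://en.wikipedia.org/wiki/OpenCola_(drink)')` would then return the page bytes or `None`. For anything beyond a few pages, the data dumps mentioned above remain the intended route.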
