Public Properties |
---|
URL-related information |
file | | The name of the requested page or file, e.g. "page.html". |
host | | The host-part of the URL of the requested page or file, e.g. "www.foo.com". |
path | | The path in the URL of the requested page or file, e.g. "/page/". |
port | | The port of the URL the request was sent to, e.g. 80. |
protocol | | The protocol-part of the URL of the page or file, e.g. "http://". |
query | | The query-part of the URL of the requested page or file, e.g. "?x=y". |
url | | The complete, fully qualified URL of the page or file, e.g. "http://www.foo.com/bar/page.html?x=y". |
Content-related information |
bytes_received | | The number of bytes the crawler received of the content of the document. |
content | | The content of the requested document (HTML source code or file content). |
content_tmp_file | | The temporary file to which the content was received. |
content_type | | The content-type of the page or file, e.g. "text/html" or "image/gif". |
cookies | | Cookies sent by the server. |
header | | The complete HTTP header the webserver sent in response to the request for this page or file. |
header_bytes_received | | The number of bytes the crawler received of the header of the document. |
http_status_code | | The HTTP status code the webserver returned for the request, e.g. 200 (OK) or 404 (file not found). |
meta_attributes | | All meta-tag attributes found in the source of the document. |
received | | Flag indicating whether content was received from the page or file. |
received_completely | | Flag indicating whether content was completely received from the page or file. |
received_to_file | | Will be true if the content was received into a temporary file. |
received_to_memory | | Will be true if the content was received into local memory. |
responseHeader | | The complete HTTP header the webserver sent in response to the request for this page or file, as a PHPCrawlerResponseHeader object. |
source | | Same as "content", the content of the requested document. |
Information about found links |
links_found | | A numeric array containing information about all links that were found in the source of the page. |
links_found_url_descriptors | | A numeric array containing a PHPCrawlerURLDescriptor object for every link that was found in the page. |
Referer information |
referer_url | | The complete URL of the page that contained the link to this document. |
refering_link_raw | | The raw link as it was found in the content of the referring URL, e.g. "../foo.html". |
refering_linkcode | | The HTML source code that contained the link to the current document. |
refering_linktext | | The link text of the link that pointed to this document. |
Error-handling |
error_code | | The code of the error, if any, that occurred while requesting/receiving the document (see the PHPCrawlerRequestErrors::ERROR_... constants). |
error_occured | | Indicates whether an error occurred while requesting/receiving the document. |
error_string | | A human-readable string describing the error, if any, that occurred while requesting/receiving the document. |
Benchmarks |
data_transfer_rate | | The approximate data transfer rate for this document. |
data_transfer_time | | The approximate time it took to receive the data of the document. |
server_connect_time | | The time it took to connect to the server. |
server_response_time | | The time it took the server to respond to the request. |
unbuffered_bytes_read | | The number of unbuffered bytes received. |
Deprecated |
received_completly | | Alias for received_completely; the name was misspelled in previous versions of phpcrawl (deprecated). |
Other |
header_send | | The complete HTTP-request-header the crawler sent to the server (debugging info). |
traffic_limit_reached | | Indicates whether the traffic-limit set by the user was reached after downloading this document. |
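
To illustrate how the properties listed above might be read together, here is a minimal sketch. It deliberately does not load phpcrawl: the `summarizeDocument()` helper is hypothetical, and `$doc` is a plain stand-in object with made-up values that only borrows the property names from the table. The value types used here (for example an integer `error_code`) are assumptions, not something the table specifies.

```php
<?php
// Hypothetical helper (not part of phpcrawl): builds a one-line summary from
// any object that exposes the property names listed in the table above.
function summarizeDocument(object $info): string
{
    // Check the error flag first; the other fields may be empty on failure.
    if ($info->error_occured) {
        return sprintf('%s: error %s (%s)', $info->url, $info->error_code, $info->error_string);
    }

    return sprintf(
        '%s: HTTP %d, %s, %d bytes, %d links, referer: %s',
        $info->url,
        $info->http_status_code,
        $info->content_type,
        $info->bytes_received,
        count($info->links_found),
        $info->referer_url ?? 'none'
    );
}

// Stand-in object with invented values; during a real crawl the library
// would supply the document-info object instead.
$doc = (object) [
    'url'              => 'http://www.foo.com/bar/page.html?x=y',
    'http_status_code' => 200,
    'content_type'     => 'text/html',
    'bytes_received'   => 1024,
    'links_found'      => [],
    'referer_url'      => null,
    'error_occured'    => false,
    'error_code'       => null,
    'error_string'     => null,
];

echo summarizeDocument($doc), PHP_EOL;
```

Checking `error_occured` before anything else keeps the summary from reporting a status code or content type for a document that was never received.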