Factory

The Factory object stores each component instance as one of its properties. The factory object must be passed into the Crawler object, which then takes every component it needs during processing from that factory.
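To make the idea concrete, here is a minimal, self-contained sketch of the pattern described above. It does not import SmoothCrawler. `ToyFactory`, `ToyCrawler`, and the three stand-in components (`FakeHTTPSender`, `TitleParser`, `UpperCaseHandler`) are hypothetical names, and the method names `request`, `parse`, and `process` are assumptions rather than the library's actual component interfaces.

```python
from dataclasses import dataclass
from typing import Any, Optional


# Hypothetical stand-in components; SmoothCrawler's real base classes differ.
class FakeHTTPSender:
    def request(self, method: str, url: str) -> str:
        # Return canned HTML instead of touching the network.
        return "<html><title>Example Domain</title></html>"


class TitleParser:
    def parse(self, response: str) -> str:
        # Extract the text between <title> and </title>.
        start = response.index("<title>") + len("<title>")
        end = response.index("</title>")
        return response[start:end]


class UpperCaseHandler:
    def process(self, data: str) -> str:
        return data.upper()


@dataclass
class ToyFactory:
    # Each component instance is stored as an attribute of the factory.
    http_factory: Optional[Any] = None
    parser_factory: Optional[Any] = None
    data_handling_factory: Optional[Any] = None


class ToyCrawler:
    def __init__(self, factory: ToyFactory) -> None:
        # The crawler only keeps a reference to the factory ...
        self._factory = factory

    def run(self, method: str, url: str) -> Any:
        # ... and looks up each component when the pipeline needs it.
        response = self._factory.http_factory.request(method, url)
        parsed = self._factory.parser_factory.parse(response)
        return self._factory.data_handling_factory.process(parsed)


factory = ToyFactory(
    http_factory=FakeHTTPSender(),
    parser_factory=TitleParser(),
    data_handling_factory=UpperCaseHandler(),
)
print(ToyCrawler(factory).run("GET", "http://www.example.com/"))
# EXAMPLE DOMAIN
```

Because the crawler depends only on the factory, swapping `UpperCaseHandler` for another handler in this sketch requires no change to `ToyCrawler`.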

Since version 0.2.0, component instances can also be registered directly through the Crawler object's register_factory function.

The two approaches are shown below:

  • Pass a Factory that holds the components into the Crawler through the factory option

First, instantiate the components and save them as properties of the factory object:

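from smoothcrawler.factory import CrawlerFactory

# YourHTTPRequest, YourHTTPResponseParser and YourDataHandler stand for your own component implementations.
_cf = CrawlerFactory()
_cf.http_factory = YourHTTPRequest()
_cf.parser_factory = YourHTTPResponseParser()
_cf.data_handling_factory = YourDataHandler()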

Pass the CrawlerFactory through the factory option, then start crawling with the run function:

sc = SimpleCrawler(factory=_cf)
data = sc.run("GET", "http://www.example.com/")
print(f"[DEBUG] data: {data}")
# [DEBUG] data: Example Domain

  • Register components with the Crawler through the register_factory function

Register the components with register_factory, then run the crawler:

sc = SimpleCrawler(factory=_cf)
sc.register_factory(
    http_req_sender=RequestsHTTPRequest(),
    http_resp_parser=RequestsExampleHTTPResponseParser(),
    data_process=ExampleDataHandler()
)

data = sc.run("GET", "http://www.example.com/")
print(f"[DEBUG] data: {data}")
# [DEBUG] data: Example Domain
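One way to picture what register_factory does is that it writes each keyword argument into the matching factory property. The sketch below is hypothetical: the mapping from `http_req_sender`, `http_resp_parser`, and `data_process` to `http_factory`, `parser_factory`, and `data_handling_factory` is inferred from the parameter and property names in this section, and `SketchFactory` and `SketchCrawler` are not SmoothCrawler classes.

```python
from types import SimpleNamespace
from typing import Any, Optional


class SketchFactory(SimpleNamespace):
    """Hypothetical factory: components live as plain attributes."""


class SketchCrawler:
    # Assumed mapping, inferred from the register_factory keyword names.
    _KEYWORD_TO_ATTR = {
        "http_req_sender": "http_factory",
        "http_resp_parser": "parser_factory",
        "data_process": "data_handling_factory",
    }

    def __init__(self, factory: Optional[SketchFactory] = None) -> None:
        # Fall back to an empty factory when none is passed in.
        self._factory = factory if factory is not None else SketchFactory()

    @property
    def factory(self) -> SketchFactory:
        return self._factory

    def register_factory(self, **components: Any) -> None:
        for keyword, component in components.items():
            attr = self._KEYWORD_TO_ATTR.get(keyword)
            if attr is None:
                raise TypeError(f"unknown component keyword: {keyword!r}")
            # Only the slots that were actually passed get overwritten.
            setattr(self._factory, attr, component)


# Strings stand in for real component instances to keep the demo short.
crawler = SketchCrawler()
crawler.register_factory(http_req_sender="sender", data_process="handler")
print(crawler.factory.http_factory)                # sender
print(crawler.factory.data_handling_factory)       # handler
print(hasattr(crawler.factory, "parser_factory"))  # False
```

In this sketch, a keyword that is left out leaves the existing factory slot untouched, so a crawler created with a factory can have individual components replaced later. Whether SmoothCrawler merges registrations in exactly this way is not stated in this section.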

CrawlerFactory

class smoothcrawler.factory.CrawlerFactory
property http_factory: HTTP

Property holding the HTTP sender component.

Returns

The HTTP sender instance, which should be an HTTP or AsyncHTTP object.

property parser_factory: BaseHTTPResponseParser

Property holding the HTTP response parser component.

Returns

The HTTP response parser instance, which should be a BaseHTTPResponseParser or BaseAsyncHTTPResponseParser object.

property data_handling_factory: BaseDataHandler

Property holding the data processing component.

Returns

The data handler instance, which should be a BaseDataHandler or BaseAsyncDataHandler object.

property persistence_factory: PersistenceFacade

Property holding the persistence component.

Returns

The persistence instance, which should be a PersistenceFacade object.
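Each Returns note above says a property should hold an object of a particular type. The following sketch shows how a property setter could enforce that with an isinstance check. Whether SmoothCrawler's own setters validate types is not stated here; `HTTP`, `AsyncHTTP`, `GuardedFactory`, and `EchoSender` below are local stand-ins, not the library's classes.

```python
from abc import ABC, abstractmethod
from typing import Optional, Union


class HTTP(ABC):
    """Local stand-in for the synchronous HTTP sender type."""

    @abstractmethod
    def request(self, method: str, url: str) -> str: ...


class AsyncHTTP(ABC):
    """Local stand-in for the asynchronous HTTP sender type."""

    @abstractmethod
    async def request(self, method: str, url: str) -> str: ...


class GuardedFactory:
    """Hypothetical factory whose setter rejects objects of the wrong type."""

    def __init__(self) -> None:
        self._http: Optional[Union[HTTP, AsyncHTTP]] = None

    @property
    def http_factory(self) -> Optional[Union[HTTP, AsyncHTTP]]:
        return self._http

    @http_factory.setter
    def http_factory(self, sender: Union[HTTP, AsyncHTTP]) -> None:
        # Accept either sender flavour, mirroring "HTTP or AsyncHTTP".
        if not isinstance(sender, (HTTP, AsyncHTTP)):
            raise TypeError(
                f"expected HTTP or AsyncHTTP, got {type(sender).__name__}"
            )
        self._http = sender


class EchoSender(HTTP):
    # Concrete sender that echoes the request instead of sending it.
    def request(self, method: str, url: str) -> str:
        return f"{method} {url}"


factory = GuardedFactory()
factory.http_factory = EchoSender()
print(factory.http_factory.request("GET", "http://www.example.com/"))
# GET http://www.example.com/

try:
    factory.http_factory = "not a sender"
except TypeError as err:
    print(err)
# expected HTTP or AsyncHTTP, got str
```

Checking in the setter means a misconfigured factory fails at assignment time rather than partway through a crawl.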

AsyncCrawlerFactory

class smoothcrawler.factory.AsyncCrawlerFactory
property http_factory: AsyncHTTP

Property holding the asynchronous HTTP sender component.

Returns

The HTTP sender instance, which should be an HTTP or AsyncHTTP object.

property parser_factory: BaseAsyncHTTPResponseParser

Property holding the asynchronous HTTP response parser component.

Returns

The HTTP response parser instance, which should be a BaseHTTPResponseParser or BaseAsyncHTTPResponseParser object.

property data_handling_factory: BaseAsyncDataHandler

Property holding the asynchronous data processing component.

Returns

The data handler instance, which should be a BaseDataHandler or BaseAsyncDataHandler object.

property persistence_factory: PersistenceFacade

Property holding the persistence component for asynchronous crawling.

Returns

The persistence instance, which should be a PersistenceFacade object.
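AsyncCrawlerFactory holds the asynchronous counterparts of the same components. The sketch below shows how a crawler could await each stage in turn and run several URLs concurrently with `asyncio.gather`. `AsyncToyFactory`, the stand-in components, and their coroutine method names are hypothetical and are not SmoothCrawler's API.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, List


class FakeAsyncSender:
    async def request(self, method: str, url: str) -> str:
        # Simulate network latency without making a real request.
        await asyncio.sleep(0.01)
        return f"<title>{url}</title>"


class FakeAsyncParser:
    async def parse(self, response: str) -> str:
        # Strip the surrounding <title> ... </title> tags.
        return response[len("<title>"):-len("</title>")]


class FakeAsyncHandler:
    async def process(self, data: str) -> str:
        return data.upper()


@dataclass
class AsyncToyFactory:
    http_factory: Any
    parser_factory: Any
    data_handling_factory: Any


async def crawl_one(factory: AsyncToyFactory, method: str, url: str) -> str:
    # Each stage is a coroutine, so the stages are awaited in order.
    response = await factory.http_factory.request(method, url)
    parsed = await factory.parser_factory.parse(response)
    return await factory.data_handling_factory.process(parsed)


async def crawl_many(factory: AsyncToyFactory, urls: List[str]) -> List[str]:
    # gather runs the per-URL pipelines concurrently and keeps input order.
    return await asyncio.gather(*(crawl_one(factory, "GET", u) for u in urls))


factory = AsyncToyFactory(FakeAsyncSender(), FakeAsyncParser(), FakeAsyncHandler())
print(asyncio.run(crawl_many(factory, ["a.example", "b.example"])))
# ['A.EXAMPLE', 'B.EXAMPLE']
```

Keeping the factory identical in shape to the synchronous one means only the components, not the crawler's wiring, have to change between the two modes in this sketch.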