API Overview
Overview
Data Pipeline implements the Decorator Pattern as a way of chaining together operations to be applied to a data stream. The bulk of the API consists of Data Readers and Data Writers.
| Class | Description |
|---|---|
| DataEndpoint | Abstract super-class for reading and writing records. Common operations like open(), close(), and getRecordCount() are defined here. |
| DataEndpointGroup | Helper class that makes it easy to work with multiple readers and writers as a single unit. |
| DataReader | Abstract super-class for reading records. The only method that a subclass must implement is readImpl(); however, most subclasses also override open() and close(). |
| DataWriter | Abstract super-class for writing records. The only method that a subclass must implement is writeImpl(DataReader, Record); however, most subclasses also override open() and close(). |
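The division of labour described for DataReader (one required readImpl() method, with open() and close() commonly overridden) can be illustrated with a minimal, self-contained sketch. It does not use the Data Pipeline library: `SketchReader`, `ListReader`, and `ReaderSketchDemo` are hypothetical names, records are plain strings rather than Record objects, and returning `null` at the end of the stream is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in for an abstract reader; NOT the library's DataReader.
abstract class SketchReader {
    private boolean opened;
    private long recordCount;

    public void open() { opened = true; }
    public void close() { opened = false; }
    public boolean isOpen() { return opened; }
    public long getRecordCount() { return recordCount; }

    // Shared bookkeeping lives here; subclasses only supply readImpl().
    public final String read() {
        if (!opened) {
            throw new IllegalStateException("reader is not open");
        }
        String record = readImpl();
        if (record != null) {
            recordCount++;
        }
        return record;
    }

    // Returns the next record, or null when the source is exhausted (sketch assumption).
    protected abstract String readImpl();
}

// A concrete reader backed by an in-memory list; it overrides open() and close()
// to acquire and release its cursor.
class ListReader extends SketchReader {
    private final List<String> source;
    private Iterator<String> cursor;

    ListReader(List<String> source) { this.source = source; }

    @Override public void open() { super.open(); cursor = source.iterator(); }
    @Override public void close() { cursor = null; super.close(); }
    @Override protected String readImpl() { return cursor.hasNext() ? cursor.next() : null; }
}

class ReaderSketchDemo {
    // Opens the reader, reads until end of stream, and always closes it.
    static List<String> drain(SketchReader reader) {
        List<String> out = new ArrayList<>();
        reader.open();
        try {
            String record;
            while ((record = reader.read()) != null) {
                out.add(record);
            }
        } finally {
            reader.close();
        }
        return out;
    }

    public static void main(String[] args) {
        SketchReader reader = new ListReader(List.of("a", "b", "c"));
        System.out.println(drain(reader));           // prints [a, b, c]
        System.out.println(reader.getRecordCount()); // prints 3
    }
}
```

Keeping the public read() method final and delegating to readImpl() is one way to keep the record count and open-state checks in a single place; whether the library does exactly this is not stated here.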
Readers and writers generally come in two forms: Primary Endpoints and Secondary Endpoints.
Primary Endpoints
Primary endpoints are used to transfer data directly to and from physical sources. They include:
- CSVReader & CSVWriter
- JdbcReader & JdbcWriter
- ExcelReader & ExcelWriter
Secondary Endpoints
Secondary endpoints act on other endpoints. They intercept and transform data as it passes through them (adding and removing records, updating fields, performing lookups, etc.). Secondary endpoints generally subclass either ProxyReader or ProxyWriter and include:
- FilteringReader
- SortingReader
- ThrottledReader & ThrottledWriter
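The relationship between primary and secondary endpoints is the Decorator Pattern in practice: a secondary endpoint wraps another endpoint and intercepts each record on its way through. The sketch below shows the idea without using the library itself. All type names (`RecordSource`, `ListSource`, `ProxySource`, `FilteringSource`, `ScalingSource`) are hypothetical, records are plain integers, and the convention that interceptRecord returns `null` to drop a record is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical minimal reader contract; read() returns null at end of stream.
interface RecordSource {
    Integer read();
}

// Primary endpoint stand-in: pulls records directly from an in-memory list.
class ListSource implements RecordSource {
    private final Iterator<Integer> it;
    ListSource(List<Integer> records) { this.it = records.iterator(); }
    @Override public Integer read() { return it.hasNext() ? it.next() : null; }
}

// Secondary endpoint stand-in: wraps another source and intercepts each record.
abstract class ProxySource implements RecordSource {
    private final RecordSource nested;
    ProxySource(RecordSource nested) { this.nested = nested; }

    @Override public Integer read() {
        Integer record;
        while ((record = nested.read()) != null) {
            Integer result = interceptRecord(record);
            if (result != null) {
                return result; // a null result means "skip this record"
            }
        }
        return null; // nested source is exhausted
    }

    // Return the (possibly modified) record, or null to drop it (sketch assumption).
    protected abstract Integer interceptRecord(Integer record);
}

// Removes records that do not match the filter.
class FilteringSource extends ProxySource {
    private final Predicate<Integer> filter;
    FilteringSource(RecordSource nested, Predicate<Integer> filter) {
        super(nested);
        this.filter = filter;
    }
    @Override protected Integer interceptRecord(Integer record) {
        return filter.test(record) ? record : null;
    }
}

// Updates each record by multiplying it by a fixed factor.
class ScalingSource extends ProxySource {
    private final int factor;
    ScalingSource(RecordSource nested, int factor) {
        super(nested);
        this.factor = factor;
    }
    @Override protected Integer interceptRecord(Integer record) {
        return record * factor;
    }
}

class ChainSketchDemo {
    static List<Integer> run() {
        // list -> keep even numbers -> multiply by 10
        RecordSource chain = new ScalingSource(
                new FilteringSource(new ListSource(List.of(1, 2, 3, 4, 5, 6)), n -> n % 2 == 0),
                10);
        List<Integer> out = new ArrayList<>();
        Integer record;
        while ((record = chain.read()) != null) {
            out.add(record);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [20, 40, 60]
    }
}
```

Because every proxy exposes the same contract as the endpoint it wraps, proxies can be stacked in any order and the consumer only ever sees a single reader.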
Data Readers
| Class | Description |
|---|---|
| AbstractReader | Abstract super-class, with some common logic, for reading records. |
| AggregateReader | A proxy that applies set functions (sum, max, etc.) to fields of records passing through. |
| AsyncReader | A proxy that reads data asynchronously using a separate thread. |
| CSVReader | Obtains records from a Comma Separated Value (CSV) stream. |
| DebugReader | A proxy that prints records passing through to a stream in a human-readable format. |
| ExcelReader | Obtains records from an Excel document. |
| FileReader | Obtains records from a binary stream previously written using a FileWriter. |
| FilteringReader | A proxy that applies filters to determine which records pass through. |
| JdbcReader | Obtains records from a database query. |
| MemoryReader | Obtains records from an in-memory RecordList. |
| MeteredReader | A proxy that measures the rate (bytes/second) at which data is read. |
| ProxyReader | Abstract super-class for obtaining records from another DataReader, possibly transforming them along the way. The only method that a subclass should implement is interceptRecord(Record). |
| RemoveDuplicatesReader | A proxy that removes duplicate records. |
| SortingReader | A proxy that sorts records. |
| TextReader | Abstract super-class for obtaining records from a text stream. |
| ThrottledReader | A proxy that limits the rate (bytes/second) at which data is read. |
| TransformingReader | A proxy that applies transformations to records passing through. |
| ValidatingReader | A proxy that validates records by applying a set of filters. |
Data Writers
| Class | Description |
|---|---|
| AbstractWriter | Abstract super-class, with some common logic, for writing records. |
| CSVWriter | Writes records to a Comma Separated Value (CSV) stream. |
| FileWriter | Writes records to a binary stream that can be later read using a FileReader. |
| JdbcWriter | Writes records to a database table. |
| MemoryWriter | Writes records to an in-memory RecordList. |
| MeteredWriter | A proxy that measures the rate (bytes/second) at which data is written. |
| MultiWriter | Writes records to multiple DataWriters. |
| NullWriter | Discards records. |
| ProxyWriter | Abstract super-class for writing records to another DataWriter, possibly transforming them along the way. The only method that a subclass should implement is interceptRecord(Record). |
| StreamWriter | Writes records to a stream in a human-readable format. |
| TextWriter | Abstract super-class for writing records to a text stream. |
| ThrottledWriter | A proxy that limits the rate (bytes/second) at which data is written. |
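The writer side composes in the same way as the reader side. As a rough illustration, the sketch below combines a fan-out writer (in the spirit of MultiWriter), a discarding writer (in the spirit of NullWriter), and a measuring proxy. It is self-contained and does not call the library: `RecordSink`, `MemorySink`, `NullSink`, `FanOutSink`, and `CountingSink` are hypothetical names, records are plain strings, and the proxy counts records rather than measuring bytes/second to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical minimal writer contract; NOT the library's DataWriter.
interface RecordSink {
    void write(String record);
}

// Collects records in an in-memory list.
class MemorySink implements RecordSink {
    final List<String> records = new ArrayList<>();
    @Override public void write(String record) { records.add(record); }
}

// Discards every record it receives.
class NullSink implements RecordSink {
    @Override public void write(String record) { /* intentionally empty */ }
}

// Forwards each record to every target, in the order the targets were given.
class FanOutSink implements RecordSink {
    private final List<RecordSink> targets;
    FanOutSink(RecordSink... targets) { this.targets = List.of(targets); }
    @Override public void write(String record) {
        for (RecordSink target : targets) {
            target.write(record);
        }
    }
}

// Proxy that counts records before passing them to the nested sink.
class CountingSink implements RecordSink {
    private final RecordSink nested;
    private long count;
    CountingSink(RecordSink nested) { this.nested = nested; }
    @Override public void write(String record) {
        count++;
        nested.write(record);
    }
    long getCount() { return count; }
}

class WriterSketchDemo {
    // Writes three records through counter -> fan-out -> (first, second, null sink).
    static long run(MemorySink first, MemorySink second) {
        CountingSink counter = new CountingSink(new FanOutSink(first, second, new NullSink()));
        for (String record : List.of("x", "y", "z")) {
            counter.write(record);
        }
        return counter.getCount();
    }

    public static void main(String[] args) {
        MemorySink first = new MemorySink();
        MemorySink second = new MemorySink();
        System.out.println(run(first, second)); // prints 3
        System.out.println(first.records);      // prints [x, y, z]
        System.out.println(second.records);     // prints [x, y, z]
    }
}
```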