Commit a326902

PLUGIN-64 added teradata plugin and tests
Added missing javadocs and fixed pom.xml versions
Changed copyright to 2020
Update for parent release version bump
1 parent cd931a9 commit a326902

34 files changed, +2623 -1 lines changed

README.md

+12-1
@@ -27,7 +27,7 @@ SAP HANA requires that password for DB is provided through url.
2727
Convenience script ```docker-compose/db-plugins-env/saphana-password-server.sh```
2828
provided for this purpose.
2929

30-
Netezza requires VMware Player for running Netezza emulator.
30+
Netezza and Teradata require VMware Player to run their emulators.
3131

3232
* [Install Docker Compose](https://docs.docker.com/compose/install/)
3333
* Build local docker images
@@ -66,6 +66,11 @@ grant all on *.* to 'root'@'%' identified by 'root' with grant option;
6666
* [Install and start Netezza emulator](http://dwgeek.com/install-vmware-player-netezza-emulator.html/)
6767
* Create database `mydb` in Netezza emulator
6868

69+
70+
* [Install and start Teradata Express](https://downloads.teradata.com/download/files/7671/200652/1/B035-5948-018K.pdf)
71+
* Create database `mydb` in Teradata Express
72+
* Create user `test` with password `test` in Teradata Express
73+
6974
### Properties
7075
#### MySQL
7176
* **mysql.host** - Server host. Default: localhost.
@@ -110,6 +115,12 @@ grant all on *.* to 'root'@'%' identified by 'root' with grant option;
110115
* **memsql.database** - Server namespace for test databases. Default: mydb.
111116
* **memsql.username** - Server username. Default: root.
112117
* **memsql.password** - Server password. Default: root.
118+
#### Teradata
119+
* **teradata.host** - Server host. Default: localhost.
120+
* **teradata.port** - Server port. Default: 1025.
121+
* **teradata.database** - Server namespace for test databases. Default: mydb.
122+
* **teradata.username** - Server username. Default: test.
123+
* **teradata.password** - Server password. Default: test.
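
As a rough illustration of how these test properties might fit together, the sketch below combines them into a Teradata JDBC URL. The `jdbc:teradata://host/NAME=value,...` form with the `DATABASE` and `DBS_PORT` parameters follows the Teradata JDBC driver's URL convention. The helper name `teradata_jdbc_url` and the dictionary layout are hypothetical, and the plugin's own connection-string code may differ.

```python
# Hypothetical helper: combine the teradata.* test properties into a JDBC URL.
# The defaults mirror the values listed in the properties above.
DEFAULTS = {
    "teradata.host": "localhost",
    "teradata.port": "1025",
    "teradata.database": "mydb",
    "teradata.username": "test",
    "teradata.password": "test",
}


def teradata_jdbc_url(props=None):
    """Build a Teradata JDBC URL, falling back to the documented defaults."""
    merged = {**DEFAULTS, **(props or {})}
    return "jdbc:teradata://{host}/DATABASE={db},DBS_PORT={port}".format(
        host=merged["teradata.host"],
        db=merged["teradata.database"],
        port=merged["teradata.port"],
    )


print(teradata_jdbc_url())
print(teradata_jdbc_url({"teradata.host": "td.example.test"}))
```

The username and password are deliberately left out of the URL in this sketch; a JDBC client would normally pass them as separate connection properties.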
113124
#### Aurora MySQL
114125
* **auroraMysql.clusterEndpoint** - Cluster endpoint.
115126
* **auroraMysql.port** - Server port.

pom.xml

+1
@@ -42,6 +42,7 @@
4242
<module>saphana-plugin</module>
4343
<module>cloudsql-mysql-plugin</module>
4444
<module>cloudsql-postgresql-plugin</module>
45+
<module>teradata-plugin</module>
4546
</modules>
4647

4748
<licenses>
+48
@@ -0,0 +1,48 @@
1+
# Teradata Action
2+
3+
4+
Description
5+
-----------
6+
Action that runs a Teradata command.
7+
8+
9+
Use Case
10+
--------
11+
The action can be used whenever you want to run a Teradata command before or after a data pipeline.
12+
For example, you may want to run a SQL UPDATE command on a database before the pipeline source pulls data from tables.
13+
14+
15+
Properties
16+
----------
17+
**Driver Name:** Name of the JDBC driver to use.
18+
19+
**Database Command:** Database command to execute.
20+
21+
**Host:** Host that Teradata is running on.
22+
23+
**Port:** Port that Teradata is running on.
24+
25+
**Database:** Teradata database name.
26+
27+
**Username:** User identity for connecting to the specified database.
28+
29+
**Password:** Password to use to connect to the specified database.
30+
31+
**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
32+
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
33+
34+
Example
35+
-------
36+
Suppose you want to execute a query against a Teradata database named "prod" that is running on "localhost",
37+
port 1025. Ensure that the driver for Teradata is installed. You can also provide a driver name for a specific driver;
38+
otherwise "teradata" will be used. Then configure the plugin with:
39+
40+
```
41+
Driver Name: "teradata"
42+
Database Command: "UPDATE table_name SET price = 20 WHERE ID = 6"
43+
Host: "localhost"
44+
Port: 1025
45+
Database: "prod"
46+
Username: "dbc"
47+
Password: "dbc"
48+
```
+93
@@ -0,0 +1,93 @@
1+
# Teradata Batch Sink
2+
3+
4+
Description
5+
-----------
6+
Writes records to a Teradata table. Each record will be written to a row in the table.
7+
8+
9+
Use Case
10+
--------
11+
This sink is used whenever you need to write to a Teradata table.
12+
Suppose you periodically build a recommendation model for products on your online store.
13+
The model is stored in a FileSet and you want to export the contents
14+
of the FileSet to a Teradata table where it can be served to your users.
15+
16+
Column names are autodetected from the input schema.
17+
18+
Properties
19+
----------
20+
**Reference Name:** Name used to uniquely identify this sink for lineage, annotating metadata, etc.
21+
22+
**Driver Name:** Name of the JDBC driver to use.
23+
24+
**Host:** Host that Teradata is running on.
25+
26+
**Port:** Port that Teradata is running on.
27+
28+
**Database:** Teradata database name.
29+
30+
**Table Name:** Name of the table to export to.
31+
32+
**Username:** User identity for connecting to the specified database.
33+
34+
**Password:** Password to use to connect to the specified database.
35+
36+
**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
37+
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
38+
39+
Example
40+
-------
41+
Suppose you want to write output records to the "users" table of a Teradata database named "prod" that is running on "localhost",
42+
port 1025, as user "dbc" with password "dbc". Ensure that the driver for Teradata is installed. You can also provide
43+
a driver name for a specific driver; otherwise "teradata" will be used. Then configure the plugin with:
44+
45+
```
46+
Reference Name: "snk1"
47+
Driver Name: "teradata"
48+
Host: "localhost"
49+
Port: 1025
50+
Database: "prod"
51+
Table Name: "users"
52+
Username: "dbc"
53+
Password: "dbc"
54+
```
55+
Data Types Mapping
56+
------
57+
Teradata-specific data types are mapped to string; they can have multiple input formats and one 'canonical' output form.
58+
Please refer to the Teradata data types documentation to determine the proper formats.
59+
60+
| Teradata Data Type | CDAP Schema Data Type | Comment |
61+
|-----------------------------------------------------|-----------------------|----------------------------------------------|
62+
| BYTEINT | INT | |
63+
| SMALLINT | INT | |
64+
| INTEGER | INT | |
65+
| BIGINT | LONG | |
66+
| DECIMAL/NUMERIC | DECIMAL | |
67+
| FLOAT/REAL/DOUBLE PRECISION | DOUBLE | |
68+
| NUMBER | DECIMAL | |
69+
| BYTE | BYTES | |
70+
| VARBYTE | BYTES | |
71+
| BLOB | BYTES | |
72+
| CHAR | STRING | |
73+
| VARCHAR | STRING | |
74+
| CLOB | STRING | |
75+
| DATE | DATE | |
76+
| TIME | TIME_MICROS | |
77+
| TIMESTAMP | TIMESTAMP_MICROS | |
78+
| TIME WITH TIME ZONE | TIME_MICROS | |
79+
| TIMESTAMP WITH TIME ZONE | TIMESTAMP_MICROS | |
80+
| INTERVAL YEAR | STRING | |
81+
| INTERVAL YEAR TO MONTH | STRING | |
82+
| INTERVAL MONTH | STRING | |
83+
| INTERVAL DAY | STRING | |
84+
| INTERVAL DAY TO HOUR | STRING | |
85+
| INTERVAL DAY TO MINUTE | STRING | |
86+
| INTERVAL DAY TO SECOND | STRING | |
87+
| INTERVAL HOUR | STRING | |
88+
| INTERVAL HOUR TO MINUTE | STRING | |
89+
| INTERVAL HOUR TO SECOND | STRING | |
90+
| INTERVAL MINUTE | STRING | |
91+
| INTERVAL MINUTE TO SECOND | STRING | |
92+
| INTERVAL SECOND | STRING | |
93+
| ST_Geometry | STRING | |
@@ -0,0 +1,121 @@
1+
# Teradata Batch Source
2+
3+
4+
Description
5+
-----------
6+
Reads from a Teradata database using a configurable SQL query.
7+
Outputs one record for each row returned by the query.
8+
9+
10+
Use Case
11+
--------
12+
The source is used whenever you need to read from a Teradata database. For example, you may want
13+
to create daily snapshots of a database table by using this source and writing to
14+
a TimePartitionedFileSet.
15+
16+
17+
Properties
18+
----------
19+
**Reference Name:** Name used to uniquely identify this source for lineage, annotating metadata, etc.
20+
21+
**Driver Name:** Name of the JDBC driver to use.
22+
23+
**Host:** Host that Teradata is running on.
24+
25+
**Port:** Port that Teradata is running on.
26+
27+
**Database:** Teradata database name.
28+
29+
**Import Query:** The SELECT query to use to import data from the specified table.
30+
You can specify an arbitrary number of columns to import, or import all columns using \*. The Query should
31+
contain the '$CONDITIONS' string. For example, 'SELECT * FROM table WHERE $CONDITIONS'.
32+
The '$CONDITIONS' string will be replaced by 'splitBy' field limits specified by the bounding query.
33+
The '$CONDITIONS' string is not required if numSplits is set to one.
34+
35+
**Bounding Query:** Bounding Query should return the min and max of the values of the 'splitBy' field.
36+
For example, 'SELECT MIN(id),MAX(id) FROM table'. Not required if numSplits is set to one.
37+
38+
**Split-By Field Name:** Field Name which will be used to generate splits. Not required if numSplits is set to one.
39+
40+
**Number of Splits to Generate:** Number of splits to generate.
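
The interplay between the import query, the bounding query, the split-by field, and the number of splits can be sketched as follows. This is only an approximation for an integer split-by field: the real ranges are computed by the underlying input format and may be divided differently. The function names `split_conditions` and `expand_import_query` are hypothetical.

```python
def split_conditions(split_by, lower, upper, num_splits):
    """Return one WHERE-clause fragment per split, together covering [lower, upper]."""
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    total = upper - lower + 1
    conditions = []
    start = lower
    for i in range(num_splits):
        # Spread any remainder over the first few splits.
        size = total // num_splits + (1 if i < total % num_splits else 0)
        if size == 0:
            continue  # more splits requested than there are values
        end = start + size - 1
        conditions.append(f"{split_by} >= {start} AND {split_by} <= {end}")
        start = end + 1
    return conditions


def expand_import_query(import_query, conditions):
    """Substitute each split condition for the $CONDITIONS placeholder."""
    return [import_query.replace("$CONDITIONS", c) for c in conditions]


# Assume the bounding query 'SELECT MIN(id),MAX(id) FROM table' returned (1, 10).
queries = expand_import_query(
    "SELECT * FROM table WHERE $CONDITIONS",
    split_conditions("id", 1, 10, 3),
)
for q in queries:
    print(q)
```

With a single split there is only one range, which is why the '$CONDITIONS' string, the bounding query, and the split-by field are not required in that case.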
41+
42+
**Username:** User identity for connecting to the specified database.
43+
44+
**Password:** Password to use to connect to the specified database.
45+
46+
**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
47+
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
48+
49+
**Schema:** The schema of records output by the source. This will be used in place of whatever schema comes
50+
back from the query. However, it must match the schema that comes back from the query,
51+
except it can mark fields as nullable and can contain a subset of the fields.
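
The schema rule above can be illustrated with a small check. This is a minimal sketch under stated assumptions, not the plugin's validation code: schemas are modeled as plain dictionaries mapping a field name to a `(type, nullable)` pair, and `validate_output_schema` is a hypothetical name.

```python
def validate_output_schema(query_schema, configured_schema):
    """Return a list of problems; an empty list means the override is acceptable.

    Both arguments map field name -> (type_name, nullable).
    The configured schema may omit fields and may mark any field as nullable,
    so nullability is not compared here.
    """
    problems = []
    for name, (type_name, _nullable) in configured_schema.items():
        if name not in query_schema:
            problems.append(f"field '{name}' is not returned by the query")
            continue
        actual_type, _ = query_schema[name]
        if type_name != actual_type:
            problems.append(
                f"field '{name}' is {actual_type} in the query but {type_name} in the schema"
            )
    return problems


# Schema as it might come back from a query over id, name, email, phone.
from_query = {
    "id": ("int", False),
    "name": ("string", False),
    "email": ("string", False),
    "phone": ("string", False),
}

# A subset of the fields, with 'email' marked nullable: accepted.
print(validate_output_schema(from_query, {"id": ("int", False), "email": ("string", True)}))

# A type mismatch and an unknown field: both reported.
print(validate_output_schema(from_query, {"phone": ("int", False), "fax": ("string", True)}))
```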
52+
53+
54+
Example
55+
------
56+
Suppose you want to read data from a Teradata database named "prod" that is running on "localhost", port 1025,
57+
as user "dbc" with password "dbc". Ensure that the driver for Teradata is installed. You can also provide
58+
a driver name for a specific driver; otherwise "teradata" will be used. Then configure the plugin with:
59+
60+
61+
```
62+
Reference Name: "src1"
63+
Driver Name: "teradata"
64+
Host: "localhost"
65+
Port: 1025
66+
Database: "prod"
67+
Import Query: "select id, name, email, phone from users;"
68+
Number of Splits to Generate: 1
69+
Username: "dbc"
70+
Password: "dbc"
71+
```
72+
73+
For example, if the 'id' column is a primary key of type int and the other columns are
74+
non-nullable varchars, output records will have this schema:
75+
76+
| field name | type |
77+
| -------------- | ------------------- |
78+
| id | int |
79+
| name | string |
80+
| email | string |
81+
| phone | string |
82+
83+
Data Types Mapping
84+
------
85+
Teradata-specific data types are mapped to string; they can have multiple input formats and one 'canonical' output form.
86+
Please refer to the Teradata data types documentation to determine the proper formats.
87+
88+
| Teradata Data Type | CDAP Schema Data Type | Comment |
89+
|-----------------------------------------------------|-----------------------|----------------------------------------------|
90+
| BYTEINT | INT | |
91+
| SMALLINT | INT | |
92+
| INTEGER | INT | |
93+
| BIGINT | LONG | |
94+
| DECIMAL/NUMERIC | DECIMAL | |
95+
| FLOAT/REAL/DOUBLE PRECISION | DOUBLE | |
96+
| NUMBER | DECIMAL | |
97+
| BYTE | BYTES | |
98+
| VARBYTE | BYTES | |
99+
| BLOB | BYTES | |
100+
| CHAR | STRING | |
101+
| VARCHAR | STRING | |
102+
| CLOB | STRING | |
103+
| DATE | DATE | |
104+
| TIME | TIME_MICROS | |
105+
| TIMESTAMP | TIMESTAMP_MICROS | |
106+
| TIME WITH TIME ZONE | TIME_MICROS | |
107+
| TIMESTAMP WITH TIME ZONE | TIMESTAMP_MICROS | |
108+
| INTERVAL YEAR | STRING | |
109+
| INTERVAL YEAR TO MONTH | STRING | |
110+
| INTERVAL MONTH | STRING | |
111+
| INTERVAL DAY | STRING | |
112+
| INTERVAL DAY TO HOUR | STRING | |
113+
| INTERVAL DAY TO MINUTE | STRING | |
114+
| INTERVAL DAY TO SECOND | STRING | |
115+
| INTERVAL HOUR | STRING | |
116+
| INTERVAL HOUR TO MINUTE | STRING | |
117+
| INTERVAL HOUR TO SECOND | STRING | |
118+
| INTERVAL MINUTE | STRING | |
119+
| INTERVAL MINUTE TO SECOND | STRING | |
120+
| INTERVAL SECOND | STRING | |
121+
| ST_Geometry | STRING | |
@@ -0,0 +1,59 @@
1+
# Teradata Query Post-run Action
2+
3+
4+
Description
5+
-----------
6+
Runs a Teradata query at the end of the pipeline run.
7+
Can be configured to run only on success, only on failure, or always at the end of the run.
8+
9+
10+
Use Case
11+
--------
12+
The action is used whenever you need to run a query at the end of a pipeline run.
13+
For example, you may have a pipeline that imports data from a database table to
14+
HDFS files. At the end of the run, you may want to run a query that deletes the data
15+
that was read from the table.
16+
17+
18+
Properties
19+
----------
20+
**Run Condition:** When to run the action. Must be 'completion', 'success', or 'failure'. Defaults to 'success'.
21+
If set to 'completion', the action will be executed regardless of whether the pipeline run succeeded or failed.
22+
If set to 'success', the action will only be executed if the pipeline run succeeded.
23+
If set to 'failure', the action will only be executed if the pipeline run failed.
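
The three run conditions described above reduce to a small decision function. The sketch below is illustrative only; `should_run` is a hypothetical name and not part of the plugin.

```python
VALID_RUN_CONDITIONS = ("completion", "success", "failure")


def should_run(pipeline_succeeded, run_condition="success"):
    """Decide whether the post-run query executes for a finished pipeline run."""
    if run_condition not in VALID_RUN_CONDITIONS:
        raise ValueError(
            f"Run Condition must be one of {VALID_RUN_CONDITIONS}, got {run_condition!r}"
        )
    if run_condition == "completion":
        return True  # runs whether the pipeline succeeded or failed
    if run_condition == "success":
        return pipeline_succeeded
    return not pipeline_succeeded  # 'failure'


for condition in VALID_RUN_CONDITIONS:
    print(condition, should_run(True, condition), should_run(False, condition))
```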
24+
25+
**Driver Name:** Name of the JDBC driver to use.
26+
27+
**Query:** Query to run.
28+
29+
**Host:** Host that Teradata is running on.
30+
31+
**Port:** Port that Teradata is running on.
32+
33+
**Database:** Teradata database name.
34+
35+
**Username:** User identity for connecting to the specified database.
36+
37+
**Password:** Password to use to connect to the specified database.
38+
39+
**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
40+
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.
41+
42+
43+
Example
44+
-------
45+
Suppose you want to delete all records from the Teradata table "userEvents" of database "prod" running on localhost,
46+
port 1025, as user "dbc" with password "dbc", using driver "teradata", if the pipeline completes successfully.
47+
Ensure that the driver for Teradata is installed. You can also provide a driver name for a specific driver;
48+
otherwise "teradata" will be used. Then configure the plugin with:
49+
50+
```
51+
Run Condition: "success"
52+
Driver Name: "teradata"
53+
Query: "DELETE FROM userEvents"
54+
Host: "localhost"
55+
Port: 1025
56+
Database: "prod"
57+
Username: "dbc"
58+
Password: "dbc"
59+
```