initial review updates

pskrbasu · pskrbasu · commit 3b74d2b75623 · 2025-06-18T17:20:31.000+05:30
diff --git a/docs/collect/configure.md b/docs/collect/configure.md
@@ -53,7 +53,7 @@ Tailpipe uses [hive partitioning](https://duckdb.org/docs/data/partitioning/hive
 
   - The data is written to Parquet files in the workspace directory, with a prescribed directory and filename structure.  Each partition is written to a separate directory.
 
-  - The `tp_index` is used to partition the data and defaults to `"default"` if not specified. You can configure the `tp_index` in your partition config to specify a different value or expression for the partition index. Be aware that defining a `tp_index` does not always increase performance and may, in fact, decrease it as it can result in many small parquet files.   
+  - The `tp_index` is used to partition the data and defaults to `"default"` if not specified. You can configure the `tp_index` in your partition config to specify a column name as the partition index. Be aware that defining a `tp_index` does not always increase performance and may, in fact, decrease it as it can result in many small parquet files.
 
 The standard partitioning/hive structure enables efficient queries that only need to read subsets of the hive filtered by index or date.  Because the data is laid out into partitions,  performance is optimized when the partition appears in a `where` or `join` clause.  The index provides a way to segment the data to optimize lookup performance in a way that is *optimal for your specific use case*.  For example, you might index on account ID for AWS tables, subscription for Azure tables, or project ID for GCP tables. 
 
diff --git a/docs/faq/index.md b/docs/faq/index.md
@@ -93,4 +93,4 @@ partition "aws_cloudtrail_log" "cloudtrail_all" {
 
 ## What partition indexes are available for a table?
 
-The `tp_index` value depends on how you have configured it in your partition config. By default, `tp_index` is set to `"default"`, but you can configure it to use any value or expression that makes sense for your data. For AWS tables, you might set it to `account_id`.
+The `tp_index` value depends on how you have configured it in your partition config. By default, `tp_index` is set to `"default"`, but you can configure it to specify a column name as the partition index that makes sense for your data. For AWS tables, you might set it to `account_id`.
diff --git a/docs/reference/config-files/partition.md b/docs/reference/config-files/partition.md
@@ -32,7 +32,7 @@ The partition has two labels:
 |----------|--------|-----------|-----------------
 | `source` | Block  | Required  | a [source](#source) from which to collect data.
 | `filter` | String | Optional  | A SQL `where` clause condition to filter log entries. Supports expressions using table columns.
-| `tp_index` | String | Optional  | The value or expression to use for the partition index. Defaults to `"default"` if not specified. This is used in the [hive partitioning](/docs/collect/configure#hive-partitioning) scheme.
+| `tp_index` | String | Optional  | The column name to use as the partition index. Defaults to `"default"` if not specified. This is used in the [hive partitioning](/docs/collect/configure#hive-partitioning) scheme.
 
 
 
@@ -179,7 +179,7 @@ partition "aws_cloudtrail_log" "s3_bucket_us_east_1" {
 }
 ```
 
-You can configure the `tp_index` to use a specific value or expression for the partition index:
+You can configure the `tp_index` to use a specific column as the partition index:
 
 ```hcl
 partition "aws_cloudtrail_log" "account_specific" {
diff --git a/docs/reference/config-files/table.md b/docs/reference/config-files/table.md
@@ -111,7 +111,7 @@ Tailpipe supports most of the [DuckDB general-purpose data types](https://duckdb
 
 Tailpipe tables include a set of common columns.  These mappings enable queries that correlate values across different logs. If you have collected both Cloudtrail and ALB logs, for example, you could query for `tp_ips` to find IP addresses in the `aws_cloudtrail_log` and `aws_alb_access_log` tables using the same syntax.
 
-When creating a custom table, `tp_timestamp` is the only required column; ***you must define a `tp_timestamp` column***.  This is because Tailpipe uses the timestamp to [organize the data files](/docs/collect/configure#hive-partitioning).  The `tp_index` is also used in the hive partitioning scheme.  By default, `tp_index` is set to `"default"`, but you can configure it in your partition config to specify a different value or expression for the partition index.
+When creating a custom table, `tp_timestamp` is the only required column; ***you must define a `tp_timestamp` column***.  This is because Tailpipe uses the timestamp to [organize the data files](/docs/collect/configure#hive-partitioning).  The `tp_index` is also used in the hive partitioning scheme.  By default, `tp_index` is set to `"default"`, but you can configure it in your partition config to specify a column name as the partition index.
 
 Some of the common columns (`tp_date`,`tp_id`,`tp_ingest_timestamp`,`tp_partition`,`tp_table`) are automatically set by the plugins - You do not need to create them.  Others are optional (but encouraged).  If you do not set an optional common column, all values will be `null`.
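
Review note: a minimal sketch of the kind of partition config the reworded text describes, where `tp_index` names a column rather than an arbitrary value or expression. The partition labels come from the `partition.md` hunk above and `account_id` is the column the FAQ text suggests for AWS tables. The `source` block arguments, the connection name, and the bucket name are illustrative assumptions, and this diff does not confirm which columns a given table actually accepts as an index.

```hcl
# Hypothetical example: the connection and bucket names are placeholders.
partition "aws_cloudtrail_log" "account_specific" {
  source "aws_s3_bucket" {
    connection = connection.aws.example    # assumed connection name
    bucket     = "example-cloudtrail-logs" # assumed bucket name
  }

  # Column name to use as the partition index.
  # When omitted, tp_index defaults to "default".
  tp_index = "account_id"
}
```

As the `configure.md` text cautions, setting a `tp_index` can produce many small Parquet files, so it is worth doing only when queries regularly filter or join on that column.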
 
