Lesson #2: Essential Redshift Utilities: Generate DDL and Search Table Metadata

A decade ago, technologists tended to specialize in a few systems. In the database realm, you might have focused on Oracle or DB2 or SQL Server. Even with massively parallel processing (MPP) databases we tended to specialize in Teradata or Netezza or Greenplum. However, over the past few years, I have worked on projects on all of these systems and more, including cloud-based systems like Hive, Spark, Redshift, Snowflake, and BigQuery. When I worked only in Oracle and only used an Oracle SQL editor, then I knew exactly where to find my store of SQL snippets for doing things like querying the database system tables.

However, as the number of databases I worked with each day expanded, so did the number of places I had to look for my old code, and the amount of time I had to spend researching the intricacies of system tables for those databases. In the well-known Workbench series of products from Aginity, each product focused on one database platform (Workbench for Netezza, Workbench for Redshift, etc.). Aginity Pro is more than a unified platform for executing SQL against many databases. Now there is an “Active Catalog” where I can store, organize, manage, and execute my hard-won SQL.

As SQL analysts and engineers, there are a few things we do every time we approach a new set of data or a new project. Many of those things involve querying the system tables of the current database. The Active Catalog included with Aginity Pro includes a “Sample Catalog” directory with, among other things, system table queries that are ready to use out of the box. Below, I’ll discuss two of these. Neither is a particularly sophisticated use of the Active Catalog, but being familiar with their location and usage means that new projects can start immediately with data investigation rather than with a Google search for how to query the system tables in [your database platform here].

Step 1 - Search Table Metadata

Gathering all of the bits and pieces that make up a DDL statement required a fairly large “system” query. Searching for column names should be simple. In fact, in most systems it is simple after you finish searching Google for that right tables to query. I have researched how to search partial column names on probably 30 database systems over the years. When I start a new project, I frequently get vague instructions like, “you should find what you need in the Current Customer table”. When I finally get a database connection, I see table names like FRDST_CST_MN. Huh?

They are all very straight-forward queries. For example, here is the definition of Search for Columns by partial name - Public schema:

1
2
3
4
  select table_catalog as database, table_schema, table_name, column_name, ordinal_position as column_sequence
  from information_schema.columns
  where lower(table_schema) = 'public'
  and lower(column_name) like $partial_column_name

Note

The $partial_column_name is a parameter and is usually used with % wildcard characters.

Step 2 - Generate Drop Table Query

In some cases you can string together SQL statements to get more value from them. For instance in a lot of cases we desire to search the database catalog for table names that match a pattern and then generate a DROP statement to clean the database up. The first query below will search for all tables in the information schema that match a name sequence.

1
2
3
  select table_catalog as database, table_schema, table_name
  from information_schema.tables
  where lower(table_name) like lower($partial_table_name);

You can now use this SQL and embed it inside of another SQL statement to generate the DROP statements

1
2
3
4
5
6
7
  select 'drop table ' + table_name + ';'
  from
    (
      select table_catalog as database, table_schema, table_name
      from information_schema.tables
      where lower(table_name) like lower($partial_table_name)
    ) a;

When executed for tables that match the name ‘%pendo%’ we will return the following results:

drop table pendo_featureevents;
drop table pendo_visitor;
drop table pendo_trackevents;

Step 3 - Generate DDL

When we sit down to a new project, we frequently need to work with the Data Definition Language (DDL). In most database systems, the actual structure of database objects is spread across several tables. Table-level properties are one place, columns another place, constraints another. Some systems provide a view to pull all of these sources together so that we can easily query the DDL of an existing table. Redshift does not provide a built-in view for this, but Amazon has provided an example query on Github.

This 230 lines of SQL provided by Amazon allows an admin to create a view that can then be queried to assemble the DDL.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
    (
    SELECT
    table_id
    ,schemaname
    ,tablename
    ,seq
    ,ddl
    FROM
    (
    SELECT
    c.oid::bigint as table_id
    ,n.nspname AS schemaname
    ,c.relname AS tablename
    ,2 AS seq
    ,'CREATE TABLE IF NOT EXISTS ' + QUOTE_IDENT(n.nspname) + '.' + QUOTE_IDENT(c.relname) + '' AS ddl
    FROM pg_namespace AS n
    INNER JOIN pg_class AS c ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
    --OPEN PARENT COLUMN LIST
    UNION SELECT c.oid::bigint as table_id,n.nspname AS schemaname, c.relname AS tablename, 5 AS seq, '(' AS ddl
    FROM pg_namespace AS n
    INNER JOIN pg_class AS c ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
    --COLUMN LIST
    UNION SELECT
    table_id
    ,schemaname
    ,tablename
    ,seq
    ,'\t' + col_delim + col_name + ' ' + col_datatype + ' ' + col_nullable + ' ' + col_default + ' ' + col_encoding AS ddl
    FROM
    (
    SELECT
    c.oid::bigint as table_id
    ,n.nspname AS schemaname
    ,c.relname AS tablename
    ,100000000 + a.attnum AS seq
    ,CASE WHEN a.attnum > 1 THEN ',' ELSE '' END AS col_delim
    ,QUOTE_IDENT(a.attname) AS col_name
    ,CASE WHEN STRPOS(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER VARYING') > 0
      THEN REPLACE(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER VARYING', 'VARCHAR')
     WHEN STRPOS(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER') > 0
      THEN REPLACE(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER', 'CHAR')
     ELSE UPPER(format_type(a.atttypid, a.atttypmod))
     END AS col_datatype
    ,CASE WHEN format_encoding((a.attencodingtype)::integer) = 'none'
     THEN 'ENCODE RAW'
     ELSE 'ENCODE ' + format_encoding((a.attencodingtype)::integer)
     END AS col_encoding
    ,CASE WHEN a.atthasdef IS TRUE THEN 'DEFAULT ' + adef.adsrc ELSE '' END AS col_default
    ,CASE WHEN a.attnotnull IS TRUE THEN 'NOT NULL' ELSE '' END AS col_nullable
    FROM pg_namespace AS n
    INNER JOIN pg_class AS c ON n.oid = c.relnamespace
    INNER JOIN pg_attribute AS a ON c.oid = a.attrelid
    LEFT OUTER JOIN pg_attrdef AS adef ON a.attrelid = adef.adrelid AND a.attnum = adef.adnum
    WHERE c.relkind = 'r'
     AND a.attnum > 0
    ORDER BY a.attnum
    )
    --CONSTRAINT LIST
    UNION (SELECT
    c.oid::bigint as table_id
    ,n.nspname AS schemaname
    ,c.relname AS tablename
    ,200000000 + CAST(con.oid AS INT) AS seq
    ,'\t,' + pg_get_constraintdef(con.oid) AS ddl
    FROM pg_constraint AS con
    INNER JOIN pg_class AS c ON c.relnamespace = con.connamespace AND c.oid = con.conrelid
    INNER JOIN pg_namespace AS n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r' AND pg_get_constraintdef(con.oid) NOT LIKE 'FOREIGN KEY%'
    ORDER BY seq)
    --CLOSE PARENT COLUMN LIST
    UNION SELECT c.oid::bigint as table_id,n.nspname AS schemaname, c.relname AS tablename, 299999999 AS seq, ')' AS ddl
    FROM pg_namespace AS n
    INNER JOIN pg_class AS c ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'

    --DISTSTYLE
    UNION SELECT
    c.oid::bigint as table_id
    ,n.nspname AS schemaname
    ,c.relname AS tablename
    ,300000001 AS seq
    ,CASE WHEN c.reldiststyle = 0 THEN 'DISTSTYLE EVEN'
    WHEN c.reldiststyle = 1 THEN 'DISTSTYLE KEY'
    WHEN c.reldiststyle = 8 THEN 'DISTSTYLE ALL'
    WHEN c.reldiststyle = 9 THEN 'DISTSTYLE AUTO'
    ELSE '<<Error - UNKNOWN DISTSTYLE>>'
    END AS ddl
    FROM pg_namespace AS n
    INNER JOIN pg_class AS c ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'
    --DISTKEY COLUMNS
    UNION SELECT
    c.oid::bigint as table_id
    ,n.nspname AS schemaname
    ,c.relname AS tablename
    ,400000000 + a.attnum AS seq
    ,' DISTKEY (' + QUOTE_IDENT(a.attname) + ')' AS ddl
    FROM pg_namespace AS n
    INNER JOIN pg_class AS c ON n.oid = c.relnamespace
    INNER JOIN pg_attribute AS a ON c.oid = a.attrelid
    WHERE c.relkind = 'r'
    AND a.attisdistkey IS TRUE
    AND a.attnum > 0
    --SORTKEY COLUMNS
    UNION select table_id,schemaname, tablename, seq,
       case when min_sort <0 then 'INTERLEAVED SORTKEY (' else ' SORTKEY (' end as ddl
    from (SELECT
    c.oid::bigint as table_id
    ,n.nspname AS schemaname
    ,c.relname AS tablename
    ,499999999 AS seq
    ,min(attsortkeyord) min_sort FROM pg_namespace AS n
    INNER JOIN  pg_class AS c ON n.oid = c.relnamespace
    INNER JOIN pg_attribute AS a ON c.oid = a.attrelid
    WHERE c.relkind = 'r'
    AND abs(a.attsortkeyord) > 0
    AND a.attnum > 0
    group by 1,2,3,4 )
    UNION (SELECT
    c.oid::bigint as table_id
    ,n.nspname AS schemaname
    ,c.relname AS tablename
    ,500000000 + abs(a.attsortkeyord) AS seq
    ,CASE WHEN abs(a.attsortkeyord) = 1
    THEN '\t' + QUOTE_IDENT(a.attname)
    ELSE '\t, ' + QUOTE_IDENT(a.attname)
    END AS ddl
    FROM  pg_namespace AS n
    INNER JOIN pg_class AS c ON n.oid = c.relnamespace
    INNER JOIN pg_attribute AS a ON c.oid = a.attrelid
    WHERE c.relkind = 'r'
    AND abs(a.attsortkeyord) > 0
    AND a.attnum > 0
    ORDER BY abs(a.attsortkeyord))
    UNION SELECT
    c.oid::bigint as table_id
    ,n.nspname AS schemaname
    ,c.relname AS tablename
    ,599999999 AS seq
    ,'\t)' AS ddl
    FROM pg_namespace AS n
    INNER JOIN  pg_class AS c ON n.oid = c.relnamespace
    INNER JOIN  pg_attribute AS a ON c.oid = a.attrelid
    WHERE c.relkind = 'r'
    AND abs(a.attsortkeyord) > 0
    AND a.attnum > 0
    --END SEMICOLON
    UNION SELECT c.oid::bigint as table_id ,n.nspname AS schemaname, c.relname AS tablename, 600000000 AS seq, ';' AS ddl
    FROM  pg_namespace AS n
    INNER JOIN pg_class AS c ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'

    )
    ORDER BY table_id,schemaname, tablename, seq
    )
    where schemaname = $schema and tablename = $table_name

Note

You will be prompted to supply an exact schema and table_name in this example.

Redshift Utilities Aginity Catalog Assets

There are seven assets you can add to your catalog. I chose to add them as shown below.

../../_images/RSUtilityCatalog.png

These queries are specific to Redshift but could be patterned after for other Database platforms. Just select a Redshift connection in the Pro/Team Editor and either double click the catalog item and execute or drag and drop the catalog item which will expose the code and run them.

Search for Columns by exact name - All schemas

This asset will search the information schema for columns with an exact name matches across all schemas

1
2
3
  select table_catalog as database, table_schema, table_name, column_name, ordinal_position as column_sequence
  from information_schema.columns
  where lower(column_name) = $column_name

Search for Columns by exact name - Public schema

This asset will search the information schema for columns with an exact name matches across just the public schema

1
2
3
4
  select table_catalog as database, table_schema, table_name, column_name, ordinal_position as column_sequence
  from information_schema.columns
  where lower(table_schema) = 'public'
  and lower(column_name) = $column_name

Search for Columns by partial name - All schemas

This asset will search the information schema for columns with a partial name matches across all schemas

1
2
3
4
  select table_catalog as database, table_schema, table_name, column_name, ordinal_position as column_sequence
  from information_schema.columns
  where lower(table_schema) = 'public'
  and lower(column_name) = $column_name

Search for Columns by partial name - Public schema

This asset will search the information schema for columns with an partial name matches across just the public schema

1
2
3
4
  select table_catalog as database, table_schema, table_name, column_name, ordinal_position as column_sequence
  from information_schema.columns
  where lower(table_schema) = 'public'
  and lower(column_name) like $partial_column_name

Search for Tables by partial name - All schemas

This asset will search the information schema for tables with a partial name matches across all schemas

1
2
3
  select table_catalog as database, table_schema, table_name
  from information_schema.tables
  where lower(table_name) like lower($partial_table_name)

Search for Tables by partial name - Public schema

This asset will search the information schema for tables with an partial name matches across just the public schema

1
2
3
4
  select table_catalog as database, table_schema, table_name
  from information_schema.tables
  where lower(table_schema) = 'public'
  and lower(table_name) like $partial_table_name