# Lesson #2: Essential Redshift Utilities: Generate DDL and Search Table Metadata¶

A decade ago, technologists tended to specialize in a few systems. In the database realm, you might have focused on Oracle or DB2 or SQL Server. Even with massively parallel processing (MPP) databases we tended to specialize in Teradata or Netezza or Greenplum. However, over the past few years, I have worked on projects on all of these systems and more, including cloud-based systems like Hive, Spark, Redshift, Snowflake, and BigQuery. When I worked only in Oracle and only used an Oracle SQL editor, then I knew exactly where to find my store of SQL snippets for doing things like querying the database system tables.

However, as the number of databases I worked with each day expanded, so did the number of places I had to look for my old code, and the amount of time I had to spend researching the intricacies of system tables for those databases. In the well-known Workbench series of products from Aginity, each product focused on one database platform (Workbench for Netezza, Workbench for Redshift, etc.). Aginity Pro is more than a unified platform for executing SQL against many databases. Now there is an “Active Catalog” where I can store, organize, manage, and execute my hard-won SQL.

As SQL analysts and engineers, there are a few things we do every time we approach a new set of data or a new project. Many of those things involve querying the system tables of the current database. The Active Catalog included with Aginity Pro includes a “Sample Catalog” directory with, among other things, system table queries that are ready to use out of the box. Below, I’ll discuss two of these. Neither is a particularly sophisticated use of the Active Catalog, but being familiar with their location and usage means that new projects can start immediately with data investigation rather than with a Google search for how to query the system tables in [your database platform here].

## Step 1 - Search Table Metadata¶

Gathering all of the bits and pieces that make up a DDL statement required a fairly large “system” query. Searching for column names should be simple. In fact, in most systems it is simple after you finish searching Google for that right tables to query. I have researched how to search partial column names on probably 30 database systems over the years. When I start a new project, I frequently get vague instructions like, “you should find what you need in the Current Customer table”. When I finally get a database connection, I see table names like FRDST_CST_MN. Huh?

They are all very straight-forward queries. For example, here is the definition of Search for Columns by partial name - Public schema:

 1 2 3 4  select table_catalog as database, table_schema, table_name, column_name, ordinal_position as column_sequence from information_schema.columns where lower(table_schema) = 'public' and lower(column_name) like $partial_column_name  Note The$partial_column_name is a parameter and is usually used with % wildcard characters.

## Step 2 - Generate Drop Table Query¶

In some cases you can string together SQL statements to get more value from them. For instance in a lot of cases we desire to search the database catalog for table names that match a pattern and then generate a DROP statement to clean the database up. The first query below will search for all tables in the information schema that match a name sequence.

 1 2 3  select table_catalog as database, table_schema, table_name from information_schema.tables where lower(table_name) like lower($partial_table_name);  You can now use this SQL and embed it inside of another SQL statement to generate the DROP statements  1 2 3 4 5 6 7  select 'drop table ' + table_name + ';' from ( select table_catalog as database, table_schema, table_name from information_schema.tables where lower(table_name) like lower($partial_table_name) ) a; 

When executed for tables that match the name ‘%pendo%’ we will return the following results:

drop table pendo_featureevents;
drop table pendo_visitor;
drop table pendo_trackevents;


## Step 3 - Generate DDL¶

When we sit down to a new project, we frequently need to work with the Data Definition Language (DDL). In most database systems, the actual structure of database objects is spread across several tables. Table-level properties are one place, columns another place, constraints another. Some systems provide a view to pull all of these sources together so that we can easily query the DDL of an existing table. Redshift does not provide a built-in view for this, but Amazon has provided an example query on Github.

This 230 lines of SQL provided by Amazon allows an admin to create a view that can then be queried to assemble the DDL.

  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158  ( SELECT table_id ,schemaname ,tablename ,seq ,ddl FROM ( SELECT c.oid::bigint as table_id ,n.nspname AS schemaname ,c.relname AS tablename ,2 AS seq ,'CREATE TABLE IF NOT EXISTS ' + QUOTE_IDENT(n.nspname) + '.' + QUOTE_IDENT(c.relname) + '' AS ddl FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace WHERE c.relkind = 'r' --OPEN PARENT COLUMN LIST UNION SELECT c.oid::bigint as table_id,n.nspname AS schemaname, c.relname AS tablename, 5 AS seq, '(' AS ddl FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace WHERE c.relkind = 'r' --COLUMN LIST UNION SELECT table_id ,schemaname ,tablename ,seq ,'\t' + col_delim + col_name + ' ' + col_datatype + ' ' + col_nullable + ' ' + col_default + ' ' + col_encoding AS ddl FROM ( SELECT c.oid::bigint as table_id ,n.nspname AS schemaname ,c.relname AS tablename ,100000000 + a.attnum AS seq ,CASE WHEN a.attnum > 1 THEN ',' ELSE '' END AS col_delim ,QUOTE_IDENT(a.attname) AS col_name ,CASE WHEN STRPOS(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER VARYING') > 0 THEN REPLACE(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER VARYING', 'VARCHAR') WHEN STRPOS(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER') > 0 THEN REPLACE(UPPER(format_type(a.atttypid, a.atttypmod)), 'CHARACTER', 'CHAR') ELSE UPPER(format_type(a.atttypid, a.atttypmod)) END AS col_datatype ,CASE WHEN format_encoding((a.attencodingtype)::integer) = 'none' THEN 'ENCODE RAW' ELSE 'ENCODE ' + format_encoding((a.attencodingtype)::integer) END AS col_encoding ,CASE WHEN a.atthasdef IS TRUE THEN 'DEFAULT ' + adef.adsrc ELSE '' END AS col_default ,CASE WHEN a.attnotnull IS TRUE THEN 'NOT NULL' ELSE '' END AS col_nullable FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace INNER JOIN pg_attribute AS a ON c.oid = a.attrelid LEFT OUTER JOIN pg_attrdef AS adef ON a.attrelid = adef.adrelid AND a.attnum = adef.adnum WHERE c.relkind = 'r' AND a.attnum > 0 ORDER BY a.attnum ) --CONSTRAINT LIST UNION (SELECT c.oid::bigint as table_id ,n.nspname AS schemaname ,c.relname AS tablename ,200000000 + CAST(con.oid AS INT) AS seq ,'\t,' + pg_get_constraintdef(con.oid) AS ddl FROM pg_constraint AS con INNER JOIN pg_class AS c ON c.relnamespace = con.connamespace AND c.oid = con.conrelid INNER JOIN pg_namespace AS n ON n.oid = c.relnamespace WHERE c.relkind = 'r' AND pg_get_constraintdef(con.oid) NOT LIKE 'FOREIGN KEY%' ORDER BY seq) --CLOSE PARENT COLUMN LIST UNION SELECT c.oid::bigint as table_id,n.nspname AS schemaname, c.relname AS tablename, 299999999 AS seq, ')' AS ddl FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace WHERE c.relkind = 'r' --DISTSTYLE UNION SELECT c.oid::bigint as table_id ,n.nspname AS schemaname ,c.relname AS tablename ,300000001 AS seq ,CASE WHEN c.reldiststyle = 0 THEN 'DISTSTYLE EVEN' WHEN c.reldiststyle = 1 THEN 'DISTSTYLE KEY' WHEN c.reldiststyle = 8 THEN 'DISTSTYLE ALL' WHEN c.reldiststyle = 9 THEN 'DISTSTYLE AUTO' ELSE '<>' END AS ddl FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace WHERE c.relkind = 'r' --DISTKEY COLUMNS UNION SELECT c.oid::bigint as table_id ,n.nspname AS schemaname ,c.relname AS tablename ,400000000 + a.attnum AS seq ,' DISTKEY (' + QUOTE_IDENT(a.attname) + ')' AS ddl FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace INNER JOIN pg_attribute AS a ON c.oid = a.attrelid WHERE c.relkind = 'r' AND a.attisdistkey IS TRUE AND a.attnum > 0 --SORTKEY COLUMNS UNION select table_id,schemaname, tablename, seq, case when min_sort <0 then 'INTERLEAVED SORTKEY (' else ' SORTKEY (' end as ddl from (SELECT c.oid::bigint as table_id ,n.nspname AS schemaname ,c.relname AS tablename ,499999999 AS seq ,min(attsortkeyord) min_sort FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace INNER JOIN pg_attribute AS a ON c.oid = a.attrelid WHERE c.relkind = 'r' AND abs(a.attsortkeyord) > 0 AND a.attnum > 0 group by 1,2,3,4 ) UNION (SELECT c.oid::bigint as table_id ,n.nspname AS schemaname ,c.relname AS tablename ,500000000 + abs(a.attsortkeyord) AS seq ,CASE WHEN abs(a.attsortkeyord) = 1 THEN '\t' + QUOTE_IDENT(a.attname) ELSE '\t, ' + QUOTE_IDENT(a.attname) END AS ddl FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace INNER JOIN pg_attribute AS a ON c.oid = a.attrelid WHERE c.relkind = 'r' AND abs(a.attsortkeyord) > 0 AND a.attnum > 0 ORDER BY abs(a.attsortkeyord)) UNION SELECT c.oid::bigint as table_id ,n.nspname AS schemaname ,c.relname AS tablename ,599999999 AS seq ,'\t)' AS ddl FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace INNER JOIN pg_attribute AS a ON c.oid = a.attrelid WHERE c.relkind = 'r' AND abs(a.attsortkeyord) > 0 AND a.attnum > 0 --END SEMICOLON UNION SELECT c.oid::bigint as table_id ,n.nspname AS schemaname, c.relname AS tablename, 600000000 AS seq, ';' AS ddl FROM pg_namespace AS n INNER JOIN pg_class AS c ON n.oid = c.relnamespace WHERE c.relkind = 'r' ) ORDER BY table_id,schemaname, tablename, seq ) where schemaname = $schema and tablename =$table_name 

Note

You will be prompted to supply an exact schema and table_name in this example.

## Redshift Utilities Aginity Catalog Assets¶

There are seven assets you can add to your catalog. I chose to add them as shown below.

These queries are specific to Redshift but could be patterned after for other Database platforms. Just select a Redshift connection in the Pro/Team Editor and either double click the catalog item and execute or drag and drop the catalog item which will expose the code and run them.

### Search for Columns by exact name - All schemas¶

This asset will search the information schema for columns with an exact name matches across all schemas

### Search for Columns by partial name - All schemas¶

This asset will search the information schema for columns with a partial name matches across all schemas

### Search for Tables by partial name - All schemas¶

This asset will search the information schema for tables with a partial name matches across all schemas