Select Page
Quick Hits: Exclude Current Week in SQL

Written by Chase Thacker

I am an adoptive father of two boys. My wife and I love West Virginia and hope to share that love of the Appalachian region with others. I do my small part by blogging about data science and Appalachian topics. For fun, I like to play hobbyist board games, read tons of books (particularly sci-fi and fantasy), and pretend to know what I am doing in my woodshop.

June 11, 2021

Welcome to the first of my “Quick Hits” entries on the blog! I am still trying to find the proper shape for my blog formats, and I hope this one will persist going forward.

Quick Hits entries will be more focused on a problem I found interesting and a solution to that problem. Other content will usually be written to explore a machine learning or statistics concept in detail, to explain tools for data science, or to examine data sets or recent projects. Unlike that long-form content which could be 5-15 minutes of reading, I aim to keep the Quick Hits in the 2-3 minute range.

Problem

When creating weekly metrics charts, including data from the current week can cause graphs to dive sharply in the current week since data collection for the current week will be incomplete. Though the incomplete data is not problematic for an analyst, end users can be concerned if their number appear to be declining sharply.

The same issue can present itself when looking at daily, monthly, yearly, etc. data if you include the current timeframe. The problem then is finding a way to exclude the current timeframe’s data.

Solution

Luckily, the solution is quite easy! I discovered a method recently when faced with this exact problem at work. My solution relies on the DATE_TRUNC function which exists in Redshift, Postgresql, and several other varieties of SQL. If your variety does not have the same function, you should still be able to find some kind of equivalent (just use whatever method you were using to collate your data into weeks anyway).

When using DATE_TRUNC on a date to truncate to the week, it returns a date object specifying midnight of the first day of the week. Therefore, you can call DATE_TRUNC on CURRENT_DATE to obtain the timestamp of the beginning of your current week.

At that point, filtering is easy. You can filter out anything where the timestamp is greater than the truncated date. Alternatively, you can run the DATE_TRUNC function on both your timestamp and on the current date. Then you can compare the resulting dates and filter anywhere where you a match.

Example

As an example, let’s consider a table recording customer visits (visits). This table has two columns, visit_time and customer_id. If you want a weekly count of visits to use in a line graph or other visualization, you could get the result using this:

select date_trunc(‘week’, visit_time) as week,
       count(*) as visit_count
    from visits
  group by 1

However, that query produces the exact downturning graph we are wanting to avoid! To remove that current week’s data, just modify the query slightly to include this:

select date_trunc(‘week’, visit_time) as week,
       count(*) as visit_count
    from visits
   where visit_time < date_trunc(‘week’, current_date)
  group by 1

With this small filtering statement, you can cut out all the current week’s data and display only completed weeks in your weekly graph.

Conclusion

Filtering out the current timeframes’s data is ultimately pretty easy once you know one or two ways to do it! By filtering out this data may not give you the most up-to-date information, but it will help ease the fears of product managers and other end users who see a sudden dip in the graphs. Try it out in your workflows, and see if it works well for your work!

You may also like…

If you enjoy my writing, you can also check out my other blog where I review books!