Working in technology, Iβve heard a number of things about ChatGPT from various news sources.
I was particularly interested hearing some teachers saying it was the end of being able to properly detect students cheating when writing papers. Then I heard a teacher talking about how this was the new βcalculatorβ that teachers need to learn how to leverage in their curriculums. Some talented developers at work were also talking about ways to use ChatGPT to write code and sharing experiences theyβve had with it.
Then a colleague mentioned that ChatGPT can write valid SQL, and I could hardly wait to try it.
The Challenge
At work, Iβve been playing around with some of MySQLβs unusual ways of handling GROUP BY, and I thought I would use the SQL Iβve been toying with there to experiment with ChatGPT.
The SQL Iβm playing with is using the Sakila database. Iβm using the rental table to demonstrate some of the unexpected behavior I get in MySQL with ONLY_FULL_GROUP_BY mode disabled. The results can be unexpected when I include a column in the SELECT clause that neither has an aggregate function specified, nor is one of the columns Iβm grouping on. For reference, Db2 will simply fail and require that every column either have an aggregation function or be included in the GROUP BY clause. MySQL behaves that way only when ONLY_FULL_GROUP_BY is enabled.
Note that Iβm using the sakila database in a docker container for this, if you want to try any of the SQL.
The SQL Iβm using is designed to display some of the surprising behavior MySQL has in this area. The goal is to list the name and rental date of the last film a user rented, using the Sakila database. This is the SQL that I wrote to display what someone might think works, but it drastically doesnβt:
select inventory_id, customer_id, max(rental_date)
from rental
where `customer_id`=1
group by customer_id
limit 1;
ERROR 1055 (42000): Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column ‘sakila.rental.inventory_id’ which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by
In this case, this SQL fails entirely as it would on Db2 because the mode ONLY_FULL_GROUP_BY is in use (which is the default). It fails because we include inventory_id in the SELECT list without either specifying it in the GROUP BY or specifying an aggregate function for it.
If I run this in an environment that isnβt using the mode ONLY_FULL_GROUP_BY, I get something like this:
+————–+————-+—————————-+
| inventory_id | customer_id | max(rental_date) |
+————–+————-+—————————-+
| 4020 | 1 | 2005-08-22 19:41:37.000000 |
+————–+————-+—————————-+
Note that I get an answer, and while that rental date is correct, the inventory_id is not. In this case, MySQL chooses any value from the column (often the first) and uses it. The value it chooses is not deterministic, and may vary.
This is SQL that actually works in 5.7, even without ONLY_FULL_GROUP_BY, to get a correct answer:
select inventory_id, customer_id, rental_date
from rental a
where rental_date=( select max(rental_date) from rental b where a.customer_id=b.customer_id and staff_id=1) and a.customer_id=1;
+————–+————-+———————+
| inventory_id | customer_id | rental_date |
+————–+————-+———————+
| 1446 | 1 | 2005-08-22 19:41:37 |
+————–+————-+———————+
1 row in set (0.00 sec)
This is a correct answer to the question, though Iβm eliminating joins so I can use the query for demonstrating the very specific point Iβm trying to illustrate. A full answer to the question is:
select title, customer_id, rental_date
from rental a join inventory i on a.inventory_id=i.inventory_id
join film f on f.film_id=i.film_id
where rental_date=( select max(rental_date) from rental b where a.customer_id=b.customer_id and staff_id=1)
and a.customer_id=1
and staff_id=1;
+———————–+————-+———————+
| title | customer_id | rental_date |
+———————–+————-+———————+
| FIREBALL PHILADELPHIA | 1 | 2005-08-22 19:41:37 |
+———————–+————-+———————+
1 row in set (0.01 sec)
Enter ChatGPT
Now keep in mind that Iβve never even played with ChatGPT before. I thought Iβd just try it. I entered this prompt:
Can you write a valid SQL statement for MySQL 5.7 that uses the Sakila database to find the name of the latest movie that customer id 1 rented from employee id 1?
And here is the result:
I figured my starting try would be too simple, that Iβd have to refine and maybe provide data model details. Nope, thatβs a correct answer from where I started:
SELECT film.title
FROM rental
JOIN inventory ON rental.inventory_id = inventory.inventory_id
JOIN film ON inventory.film_id = film.film_id
WHERE rental.customer_id = 1 AND rental.staff_id = 1
ORDER BY rental.rental_date DESC
LIMIT 1;
+———————–+
| title |
+———————–+
| FIREBALL PHILADELPHIA |
+———————–+
1 row in set (0.09 sec)
I am really impressed that it not only wrote a correct SQL statement, and knew the Sakila data model, but that it both did not make mistakes Iβve seen humans make, and also gave me the code in monospaced, easy to copy text.
I started with a simplified version of the query, and ChatGPT gave me a correct answer, but writing this for one customer for me is just a precursor to writing it for multiple customers, and thatβs why I chose the correlated subquery method of solving it (well that, and neither CTEs nor window functions work in MySQL 5.7). Let me try to get Chat GPT to write the more complicated form of the query:
Itβs a correct answer. Not only that, Iβve never seen that particular formatting for join syntax β
FROM
customer c
JOIN
rental r ON c.customer_id = r.customer_id
It really calls out the table names, and I love it. The only criticism I could come up with is that it did not enclose identifiers in tick marks, but Iβm pretty inconsistent on that myself.
Finally, I wondered if the SQL would change if I specified MySQL 8, because if I were writing this query for MySQL 8, I would use a CTE. And dang it, ChatGPT is right there with me:
More Challenges for ChatGPT
Iβm impressed. I expected minor problems, and having to give hints, including having to specify more about the data model. I donβt know that itβs a tool I would use myself to write SQL. I do think that there are limitations to ChatGPT, and some of those are human. Humans arenβt always great at formatting a request properly or frankly even at knowing what they want. I want to experiment more!
I get the impression that many responses to ChatGPT results are βWell, thatβs not wrongβ, so I decided to ask it to write a whole blog entry for me. I can see several areas I would cover differently, but most of what it has there is not wrong. Hereβs the prompt I gave it:
please write a blog entry of 500 words or more in the style of https://datageek.blog describing the pitfalls of not using ONLY_FULL_GROUP_BY with MySQL and how it can lead to unexpected query results.
And hereβs what it came up with:
As a data geek, one of the most important things to understand when working with SQL databases is the concept of a βgroup byβ clause. The group by clause is used to aggregate data, allowing you to perform calculations on multiple rows of data and return a single result. For example, you might use the group by clause to calculate the average salary of all employees in a specific department.
But what happens when youβre working with a MySQL database and you forget to include the βONLY_FULL_GROUP_BYβ mode? The answer is that you may end up with unexpected query results.
The ONLY_FULL_GROUP_BY mode is a MySQL setting that enforces the SQL standard for the group by clause. When this mode is enabled, the query must include all non-aggregated columns in the group by clause. This means that if youβre trying to group data by a specific column, you must also include that column in the group by clause.
For example, letβs say you have a table called βemployeesβ that contains the following columns: βemployee_idβ, βfirst_nameβ, βlast_nameβ, and βsalaryβ. If you want to group the data by the βlast_nameβ column, your query should look something like this:
SELECT last_name, AVG(salary)
FROM employees
GROUP BY last_name;
If you forget to include the βlast_nameβ column in the group by clause, youβll get unexpected results. Instead of calculating the average salary for each last name, youβll get an average salary for the entire table.
But the problem is not only the query but also the default mode of MySQL, before version 8.0, that is set to not enable this mode by default. This means that if youβre working with an older version of MySQL, you might not even realize that youβre not including all non-aggregated columns in the group by clause. And this can lead to unexpected results.
Another pitfall is when youβre trying to select non-aggregated columns that are not included in the group by clause. In this case, MySQL will return an arbitrary value for that column. This can lead to even more confusion and unexpected results.
So, how can you avoid these pitfalls? The first step is to make sure that you have the ONLY_FULL_GROUP_BY mode enabled. You can do this by adding the following line to your MySQL configuration file:
sql_mode = ‘ONLY_FULL_GROUP_BY’
If you are using version 8.0 or later, this mode is enabled by default.
The second step is to be mindful of the columns that youβre including in your group by clause. Make sure that youβre including all non-aggregated columns, and be aware of the potential pitfalls of selecting non-aggregated columns that are not included in the group by clause.
In conclusion, the group by clause is a powerful tool for aggregating data in MySQL. However, itβs important to understand the potential pitfalls of not using the ONLY_FULL_GROUP_BY mode. By enabling this mode and being mindful of the columns that youβre including in your group by clause, you can avoid unexpected results and ensure that your queries are returning accurate data.
Not bad. Iβd like to think itβs not as good as Iβd do, but youβll have to wait for me to finish writing my version of that one to see!
Are you using or experimenting with ChatGPT at all? Share your experiences in the comments below, or with me on Mastodon β Iβm @ember@hachyderm.io.
Ember is always curious and thrives on change. She has built internationally recognized expertise in IBM Db2, and is now pivoting to focus on learning MySQL. Ember shares both posts about her core skill set and her journey learning MySQL.
Ember lives in Denver and work from home

