how to find duplicates in a sqlite table column

these assume a database file named ./mydb, with a table named mytable, and a column named mycol.

these will print the count of duplicates:

from the sqlite shell

$ sqlite mydb
sqlite> select count(mycol) from mytable group by mycol having ( count(mycol) > 1 );
sqlite> .q

from the shell command line

$ sqlite mydb <<EOM
select count(mycol) from mytable group by mycol having ( count(mycol) > 1 );
EOM

from python

import os, sys, sqlite3
dbfile = "./mydb"
thetable = "mytable"
db = sqlite3.connect(dbfile)
cursor = db.cursor()
query = 'select count(mycol) from mytable group by mycol having ( count(mycol) > 1 );'
cursor.execute(query)
for row in cursor.fetchall():
        print row[1]
db.close()
== to delete duplicates:

== this may not work in sqlite (mysql?):

— create temp table same as original CREATE TABLE tmp (…); — insert just the unique records INSERT INTO tmp SELECT * FROM mytable GROUP BY mycol; — replace original table rename table mytable to mytable_old; rename table tmp mytable; drop table mytable_old;

== better

CREATE TEMP TABLE tmp (…); INSERT INTO tmp (mycol) SELECT mycol FROM mytable GROUP BY mycol; SELECT COUNT(mycol) FROM tmp; DELETE FROM mytable; VACUUM mytable; SELECT COUNT(mycol) FROM mytable; INSERT INTO mytable SELECT * FROM tmp; SELECT COUNT(mycol) FROM mytable; SELECT DISTINCT COUNT(mycol) FROM mytable; SELECT mycol FROM mytable LIMIT 10; DROP TABLE tmp;