I have a script that scrapes two RSS feeds for Title, Summary, Link, Published_Date, and Keyword values and imports them into a MySQL table.
I've scheduled the script to run every 59 minutes, and when I checked today there were a ton of duplicates in the database. I understand this can be fixed by setting a primary key, so that when the script runs and finds a duplicate entry the existing row is just updated, and when it finds a new entry the row is appended to my table.
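To make the update-or-append behavior concrete, here is a minimal sketch of what the script's insert could look like in MySQL. It assumes a primary or unique key already exists on Link (which is not yet the case for this table), and the values shown are made-up placeholders, not real feed data.

```sql
-- Sketch only: requires a PRIMARY KEY or UNIQUE index on Link to have any effect.
-- All literal values below are hypothetical placeholders.
INSERT INTO Rss_Feed (Title, Summary, Link, Published_Date, Keyword)
VALUES ('Example title', 'Example summary', 'https://example.com/item-1', '2020-01-01', 'example')
ON DUPLICATE KEY UPDATE
    -- If a row with the same Link exists, overwrite its other columns
    Title          = VALUES(Title),
    Summary        = VALUES(Summary),
    Published_Date = VALUES(Published_Date),
    Keyword        = VALUES(Keyword);
```

Without a key on Link, this statement behaves like a plain INSERT and keeps adding duplicates.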
My table currently does not have a primary key, so I typed this into the console:
ALTER TABLE Rss_Feed ADD PRIMARY KEY(Link);
And I got back:
ERROR 1062 (23000): Duplicate entry 'https://www.upwork.com/jobs/Tableau-Expert-needed-build-dashboar' for key 'PRIMARY'
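Since error 1062 reports a duplicate value in the column being keyed, a query along these lines should show which Link values appear more than once. This is a sketch using only the table and column names above; the alias copies is just an illustrative name.

```sql
-- List every Link that occurs more than once, most duplicated first.
SELECT Link, COUNT(*) AS copies
FROM Rss_Feed
GROUP BY Link
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```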
mysql> DESCRIBE Primary key;
When I run DESCRIBE Rss_Feed; I get the following, which shows that no primary key is set:
+----------------+--------------+------+-----+---------+-------+
| Field          | Type         | Null | Key | Default | Extra |
+----------------+--------------+------+-----+---------+-------+
| Title          | varchar(250) | YES  |     | NULL    |       |
| Summary        | varchar(300) | YES  |     | NULL    |       |
| Link           | varchar(250) | YES  |     | NULL    |       |
| Published_Date | date         | YES  |     | NULL    |       |
| Keyword        | varchar(20)  | YES  |     | NULL    |       |
+----------------+--------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
It looks like my table is somehow treating the link of one of the entries as the primary key. I read online that I should try switching the primary key to Title, but when I try that I get the same kind of error:
mysql> ALTER TABLE Rss_Feed ADD PRIMARY KEY (Title);
ERROR 1062 (23000): Duplicate entry 'Tableau Expert needed to build 2 dashboards - Upwork' for key 'PRIMARY'
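One possible way around both errors, sketched here under the assumption that keeping a single row per Link is acceptable, is to copy the data into a new table that already has the key and then swap the tables. Rss_Feed_dedup and Rss_Feed_old are hypothetical names chosen for illustration, and this has not been run against the real data.

```sql
-- Create an empty copy of the table structure and key it on Link.
CREATE TABLE Rss_Feed_dedup LIKE Rss_Feed;
ALTER TABLE Rss_Feed_dedup ADD PRIMARY KEY (Link);

-- INSERT IGNORE skips any row whose Link is already present in the new table.
-- Rows with a NULL Link are left out because a primary key cannot be NULL.
INSERT IGNORE INTO Rss_Feed_dedup
SELECT * FROM Rss_Feed
WHERE Link IS NOT NULL;

-- Swap the tables; the original data stays available as Rss_Feed_old.
RENAME TABLE Rss_Feed TO Rss_Feed_old, Rss_Feed_dedup TO Rss_Feed;
```

Which of the duplicate rows survives is not controlled here, so this only suits the case where the copies are interchangeable.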
I will keep digging online, but any help or insight on this would be greatly appreciated.