Removing duplicates from a Python list is a common task which can be quickly and easily performed if you make use of the properties of a dictionary.
A dictionary is a data collection type that stores information in key:value pairs. You refer to an entry in the dictionary by its key, which lets you retrieve the corresponding value. Dictionary keys must be unique; this is the property we use to remove duplicate values from a list.
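As a quick illustration of key uniqueness, here is a minimal sketch. The variable name pets and its values are made up for this example. Writing the same key twice does not create two entries; the later value simply replaces the earlier one.

```python
# Hypothetical example data: the key 'dog' is written twice.
pets = {'dog': 1, 'cat': 2, 'dog': 3}

# Only one 'dog' entry survives, and it holds the last value assigned.
print(pets)       # {'dog': 3, 'cat': 2}
print(len(pets))  # 2
```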
To remove duplicates we call the dict.fromkeys() method, which creates a new dictionary from a list of keys. Since a dictionary cannot contain duplicate keys, the duplicates are dropped automatically. We then turn the newly created dictionary back into a list, which now contains no duplicate values. Because dictionaries preserve insertion order (guaranteed from Python 3.7 onwards), the remaining items keep the order in which they first appeared.
>>> somelist = ['dog','cat','hamster','dog','fish']
>>> somelist = list(dict.fromkeys(somelist))
>>> print(somelist)
# duplicates removed from list
['dog', 'cat', 'hamster', 'fish']
A Case Sensitive Duplicate Removal Function
If you have a list of strings containing both upper and lower case values, you may want the filter to be either case sensitive or case insensitive, depending on the application. Since a dictionary only requires that its keys be unique, the same string in a different case is treated as a distinct key and will not be filtered out by the dict.fromkeys() method.
>>> mylist = ['dog','cat','Dog']
>>> mylist = list(dict.fromkeys(mylist))
>>> print(mylist)
# 'dog' and 'Dog' are viewed as unique keys and so are not filtered out.
['dog', 'cat', 'Dog']
If you need to control whether the filtering is case sensitive, the simplest way to proceed is to write a function that performs the filtering as required.
def remove_duplicates(alist, case_sensitive=True):
    """ Remove duplicates from a list of strings.
    Default to a filter that IS sensitive to case.
    i.e. 'dog' and 'Dog' are unique strings.
    """
    if not case_sensitive:
        # Lower-case every string first so 'dog' and 'Dog' become the same key.
        # Note: the returned strings are therefore all lower case.
        alist = [i.lower() for i in alist]
    return list(dict.fromkeys(alist))
The function will return a different result depending on whether you wish to filter with or without case sensitivity.
>>> animal_list = ['lion','zebra','Zebra','rhino','hippo','Lion']
>>> default_filtered_list = remove_duplicates(animal_list)
>>> print(default_filtered_list)
# resulting list is case sensitive
['lion', 'zebra', 'Zebra', 'rhino', 'hippo', 'Lion']
>>> remove_case_list = remove_duplicates(animal_list,case_sensitive=False)
>>> print(remove_case_list)
# resulting list is case insensitive
['lion', 'zebra', 'rhino', 'hippo']
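Note that the case-insensitive call above returns every string in lower case. If you would rather keep the spelling of the first occurrence, one possible variation is sketched below. The function name remove_duplicates_keep_case is hypothetical and not part of the original function; it assumes every item is a string.

```python
def remove_duplicates_keep_case(alist):
    """Case-insensitive duplicate removal that keeps the first-seen spelling.

    Hypothetical helper for illustration; assumes all items are strings.
    """
    first_seen = {}
    for item in alist:
        # setdefault() stores the item only if its lower-cased key is new.
        first_seen.setdefault(item.lower(), item)
    return list(first_seen.values())


animals = ['Lion', 'zebra', 'Zebra', 'rhino', 'LION']
print(remove_duplicates_keep_case(animals))
# ['Lion', 'zebra', 'rhino']
```

The lower-cased string serves only as the dictionary key, while the original string is stored as the value, so the first spelling encountered is the one that ends up in the result.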
The remove_duplicates() function will fail when called with case_sensitive=False on a list that contains data other than strings. This is because the lower() method only exists on strings; calling it on another datatype, such as an integer, raises an AttributeError. If your list contains multiple data types, you would need to modify the function to apply the lower() method only to strings.
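One way to make that modification is sketched below. The name remove_duplicates_mixed is hypothetical, and the sketch assumes every item in the list is hashable (strings, numbers, tuples and so on), since dictionary keys must be hashable.

```python
def remove_duplicates_mixed(alist, case_sensitive=True):
    """Remove duplicates from a list that may mix strings with other types.

    Hypothetical variant for illustration; assumes all items are hashable.
    """
    if not case_sensitive:
        # Lower-case only the strings; leave every other item untouched.
        alist = [i.lower() if isinstance(i, str) else i for i in alist]
    return list(dict.fromkeys(alist))


mixed = ['Dog', 'dog', 1, 2, 1, 'cat']
print(remove_duplicates_mixed(mixed, case_sensitive=False))
# ['dog', 1, 2, 'cat']
```

The isinstance() check means integers and other non-string values pass through unchanged, so the call no longer raises an AttributeError.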