Code Odour: Using Both a List and Set Instead of Just a Dict

Matt McCormick

2016/06/09

I came across something like the following code recently and ended up refactoring it to make it much simpler. Let’s take a look at what the original developer was trying to do.

result = [ {'url': obj['relative_url'], ...} for obj in objs]

added_objs = set()

for obj in result:
	url = get_absolute_url(obj['url'])
	if url in added_objs:
		url = None

	else:
		added_objs.add(url)
	obj['url'] = url

return [obj for obj in result if obj['url'] != None]

In the first statement, a new list of dictionaries is being created and stored in the result variable.

Then an empty set is created which will be used in the loop below.

Jumping to the return statement, we see that we will be returning a list but only if the ‘url’ key is not set to None.

In the for loop we see that if the ‘url’ is not in the added_objs set, we will do some modification to it and then add it to the set.

One reason you can see that this is smelly code is that the developer is looping through the original list 3 times. The first time is to set the attributes that he wants in the result. The second loop is setting the url to None if it already exists in the added_objs set or adding it to the set if it doesn’t exist. The third loop is creating a new list the same as the previous but only if the ‘url’ is not None.

Pretty confusing, eh?

In plain English, the purpose of this function is to return a unique list of objects where the url is not duplicated.

Let’s see one method of simplifying this code. As mentioned, there are three loops being used for things that we could probably do in one.

result = dict()
for obj in objs:
	url = get_absolute_url(obj['relative_url'])
	if url not in result:
		result[url] = {'url': url, ...}

return result.values()

Here, instead of using a set to track the urls we’ve added, we are instead using the url as the key for the dictionary and then adding the dict as a value. Then we can simply check that we haven’t previously added the url to the dict and just return the values.

Anytime you see code that is using both a set and a list to track data, it’s an indication that a dict could take the place instead. After all, a dict is basically a set and a list combined.