The Set
The set, also known as a hash set, is an unordered, resizable, heterogeneous data structure that has the
properties of a set in discrete mathematics.
- All items of a set are hashable.
- All items of a set are unique, based on their hash and equality.
- The intersection, union, difference, etc. can be taken between sets.
- Contains checks against a set are O(1) on average, which can make them quite fast.
- Insertions are idempotent, as all items are unique.
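The properties listed above can be checked directly. The following is a minimal sketch rather than part of the bootcamp examples; the names `items`, `size_after_adds`, and `rejected` are made up for illustration.

```python
# Heterogeneous: int, str, and tuple items are all hashable, so they can share a set.
items = {1, "two", (3, 4)}

# Insertions are idempotent: adding an existing item leaves the set unchanged.
items.add(1)
items.add(1)
size_after_adds = len(items)
print(size_after_adds)
# 3

# Contains checks use the `in` operator.
print("two" in items)
# True

# Unhashable items, such as lists, are rejected with a TypeError.
try:
    items.add([5, 6])
    rejected = False
except TypeError:
    rejected = True
print(rejected)
# True
```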
Declaring a set
Sets are declared using curly braces, {}, with commas between values.
# uci_bootcamp_2021/examples/sets.py
# Declaring a set literal
data = {1, 2, 3}
print(data)
# {1, 2, 3}
Warning: this syntax is very similar to that of dict. To declare an empty set, you need to call set().
{} is interpreted as a dictionary literal!
# uci_bootcamp_2021/examples/sets.py
# Warning: syntax is VERY similar to dict literal syntax, beware!
some_dict = {}
some_set = set()
print(type(some_dict))
# <class 'dict'>
print(type(some_set))
# <class 'set'>
Sets have unique elements
All elements of a set, by definition, are distinct and unique. This property can be quite useful for reducing duplicate data.
# uci_bootcamp_2021/examples/sets.py
# Sets only contain unique values, demonstrate this by creating a set from a list with duplicate values.
raw_data = [1, 2, 2, 2, 4, 3, 1, 2, 7]
print(len(raw_data))
# 9
# cast `raw_data` to a set
data = set(raw_data)
# demonstrate that `data` is shorter than the `raw_data` it came from.
print(len(data))
# 5
# and demonstrate that duplicate items are not in the resulting set.
print(data)
# {1, 2, 3, 4, 7}
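Because a set is unordered, the deduplicated values come back in no guaranteed order. As a small sketch (not part of the bootcamp examples), sorting the set yields a list with a predictable order; the name `unique_sorted` is made up for illustration.

```python
raw_data = [1, 2, 2, 2, 4, 3, 1, 2, 7]
# Deduplicate by casting to a set, then sort to get a list in a predictable order.
unique_sorted = sorted(set(raw_data))
print(unique_sorted)
# [1, 2, 3, 4, 7]
```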
As stated above, set operations can be done against Python's set.
Union
# uci_bootcamp_2021/examples/sets.py
a = {1, 2, 3}
b = {3, 4, 5}
print(a.union(b))
# is the same as
print(a | b)
# and it is commutative
print(b | a)
# {1, 2, 3, 4, 5}
Intersection
# uci_bootcamp_2021/examples/sets.py
print(a.intersection(b))
# is the same as
print(a & b)
# and it is commutative
print(b & a)
# {3}
Difference
Note: the difference between two sets is not commutative. In general, A - B != B - A.
# uci_bootcamp_2021/examples/sets.py
print(a.difference(b))
# is the same as
print(a - b)
# {1, 2}
print(b.difference(a))
# is the same as
print(b - a)
# {4, 5}
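The snippets above share `a` and `b`. As a self-contained sketch, the following re-declares them, checks that the difference depends on operand order, and shows one further operation, the symmetric difference (items found in exactly one of the two sets). The name `sym` is made up for illustration.

```python
a = {1, 2, 3}
b = {3, 4, 5}

# Difference depends on operand order.
print(a - b == b - a)
# False

# Symmetric difference: items in exactly one of the two sets.
sym = a.symmetric_difference(b)
# is the same as
print(sym == a ^ b)
# True
print(sorted(sym))
# [1, 2, 4, 5]
```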
Further Reading
This is not an exhaustive reference; please consult the standard library documentation for more information.