Introduction
It is very important when working with NumPy arrays to be aware of whether the array you are working with owns its own data or whether it is simply a view of another array. If the array is a view, then modifying the data in the view will translate through to the array that actually owns that data, modifying the parent array.
NumPy is designed to handle large datasets, so storing every representation of an array in a separate memory location would be computationally expensive and memory intensive. For many array operations (such as slicing or some instances of reshaping) it is more efficient to create a new array object that references the original array's data rather than duplicating it. Such an array is referred to as a view or a shallow copy.
If you are not aware of the difference between views and copies, you could unintentionally overwrite array data.
Internal Organization of NumPy Arrays
It is useful to quickly describe how NumPy arrays are structured as this leads directly to whether views or copies of the array are being accessed.
NumPy arrays are composed of two major components: the raw data array (called the data buffer) and all the metadata attached to the array. The data buffer is the contiguous and fixed memory block where the numerical information is stored.
The array also stores a lot of metadata information about itself which is used to describe how the array should be interpreted. This metadata includes such information as:
- The array dimensions and size.
- The data element's size (bytes).
- The type of data stored (via the dtype object).
- The stride of the data. Physical memory is one-dimensional and so the stride provides a map to link the elements of the array to the location of the data in memory.
- The byte order of the data.
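The metadata items listed above can be read directly from attributes of an array. The following is a minimal sketch; the array contents and the choice of a 32-bit integer dtype are assumptions made only so that the byte counts are predictable.

```python
import numpy as np

# A small 3x4 array of 32-bit integers. The dtype is set explicitly so the
# byte counts below do not depend on the platform's default integer size.
arr = np.arange(12, dtype=np.int32).reshape(3, 4)

print(arr.shape)            # dimensions: (3, 4)
print(arr.size)             # total number of elements: 12
print(arr.itemsize)         # bytes per element: 4
print(arr.dtype)            # type of data stored: int32
print(arr.strides)          # bytes to step along each axis: (16, 4)
print(arr.dtype.byteorder)  # '=' means the machine's native byte order
```

Here the strides say that moving one row down skips 16 bytes (four 4-byte elements), while moving one column across skips 4 bytes.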
By building the NumPy array in this way, simple changes to the metadata can be made to change the interpretation of the array buffer. This allows the shape of the array to be changed by creating a new metadata object but not changing the underlying data buffer.
This is the fundamental theory behind the creation of views: creating a new ndarray object but referencing the original data buffer.
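As a rough illustration of this idea, the sketch below reshapes an array and uses np.shares_memory to check whether the result still uses the original buffer. The variable names are illustrative only. It also shows one case where a reshape cannot be expressed through new strides, so NumPy falls back to copying the data.

```python
import numpy as np

base = np.arange(6, dtype=np.int64)  # owns a buffer of six 8-byte elements
reshaped = base.reshape(2, 3)        # new metadata, same data buffer

print(base.strides)                      # (8,)
print(reshaped.strides)                  # (24, 8)
print(np.shares_memory(base, reshaped))  # True

# Flattening the transpose would need the element order 0, 3, 1, 4, 2, 5,
# which no single stride over the original buffer can describe, so the
# data is copied into a new buffer instead.
flat = reshaped.T.reshape(6)
print(np.shares_memory(base, flat))      # False
```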
Array Views
An array view (or shallow copy) is a new ndarray object that references an existing data buffer, but with its own set of metadata to create a new view of the referenced array.
A view is created when elements can be addressed with offsets and strides in the original array. Basic indexing and array slicing always create a view of the original data.
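For contrast, the sketch below compares a basic slice with an integer-array index (often called fancy indexing) that selects the same elements. An arbitrary list of positions cannot in general be described by a single offset and stride, so NumPy returns a copy in that case. The example values are assumptions chosen for illustration.

```python
import numpy as np

arr = np.arange(10)

basic = arr[2:8:2]      # basic slice selecting elements 2, 4, 6: a view
fancy = arr[[2, 4, 6]]  # integer-array index for the same elements: a copy

print(np.shares_memory(arr, basic))  # True
print(np.shares_memory(arr, fancy))  # False

basic[0] = -1  # writes through to arr[2]
fancy[1] = -2  # only changes the copy; arr[4] is untouched
print(arr.tolist())  # [0, 1, -1, 3, 4, 5, 6, 7, 8, 9]
```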
Indexing and Slicing
To illustrate a view in action we'll create an array, slice it, modify the slice, and then compare the slice to the original array.
When working with a view, any element value changes made to that view will reflect in the original array.
>>> arr = np.array([[4,44,444],[5,55,555],[6,66,666]])
>>> print(arr)
[[ 4 44 444]
[ 5 55 555]
[ 6 66 666]]
# slicing an array returns a view of it.
>>> slice_arr = arr[:,:2]
>>> print(slice_arr)
[[ 4 44]
[ 5 55]
[ 6 66]]
# now modify the sliced array
>>> slice_arr[2,1] = 77
>>> print(slice_arr)
[[ 4 44]
[ 5 55]
[ 6 77]]
# now print the original array
>>> print(arr)
[[ 4 44 444]
[ 5 55 555]
[ 6 77 666]]
Flags and Base
We can use the base attribute to determine whether an array is a view or not. For an array that is not a view, the base attribute is None.
>>> print(arr)
[[ 4 44 444]
[ 5 55 555]
[ 6 77 666]]
>>> print(arr.base)
None # arr is original array
>>> print(slice_arr)
[[ 4 44]
[ 5 55]
[ 6 77]]
>>> print(slice_arr.base is None)
False # slice_arr.base is not None, therefore slice_arr is a view
You can also check whether an array is a view by inspecting the flags attribute of the array. This returns a flags object whose OWNDATA entry is True if the array owns its data, or False if it does not (and is therefore a view).
>>> print(arr.flags) # original array
C_CONTIGUOUS : True
F_CONTIGUOUS : False
OWNDATA : True
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
>>> print(slice_arr.flags) # view of original array
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
WRITEBACKIFCOPY : False
UPDATEIFCOPY : False
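The two checks can be combined in a few lines. In this sketch, is_view is a hypothetical helper written for illustration and is not part of NumPy; it relies only on the base attribute described above, while the OWNDATA flag is read through arr.flags.owndata.

```python
import numpy as np

def is_view(a):
    """Hypothetical helper: True if `a` borrows its data from another object."""
    return a.base is not None

arr = np.array([[4, 44, 444], [5, 55, 555], [6, 66, 666]])
slice_arr = arr[:, :2]

print(is_view(arr), arr.flags.owndata)              # False True
print(is_view(slice_arr), slice_arr.flags.owndata)  # True False
print(slice_arr.base is arr)                        # True
```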
Copy
If you need to create a new array with no reference to the original but that would otherwise be a view, you should use the copy method to create a copy of the array. This means that a new data buffer will be allocated in memory, and all links to the original array will be lost. A copy is of course slower and consumes more memory, but necessary in some cases.
Making changes to the copy will have no influence on the original array that the copy was originally created from.
>>> arr = np.array([[3,4,5,6,4],[2,4,3,1,3],[7,6,5,2,2],[3,4,3,2,2]])
>>> arr2 = arr[:,0:2].copy() # make a deep copy of the array
>>> arr2[0,1] = 999 # modify arr2
>>> print(arr2)
[[ 3 999]
[ 2 4]
[ 7 6]
[ 3 4]]
# now print the original array
>>> print(arr)
[[3 4 5 6 4]
[2 4 3 1 3]
[7 6 5 2 2]
[3 4 3 2 2]]
One use case for copy is slicing a very large array into a much smaller one when only the small slice needs further processing. Working with the slice as a view would still require that the large parent array be kept in memory; forcing the smaller array into a copy and then deleting the original array would free up that memory.
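A minimal sketch of this use case follows. The array sizes are arbitrary assumptions. The final comment describes the usual behaviour in CPython, where memory is released once nothing references the buffer any longer.

```python
import numpy as np

big = np.zeros((1000, 1000))       # about 8 MB of float64 data
small_view = big[:10, :10]         # view: keeps the whole large buffer alive
small_copy = big[:10, :10].copy()  # independent buffer holding 100 elements

print(small_view.base is big)   # True
print(small_copy.base is None)  # True
print(small_copy.nbytes)        # 800

# Once neither name refers to the large buffer, it can be released;
# small_copy remains usable because it owns its own data.
del big, small_view
print(small_copy.shape)         # (10, 10)
```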
We've come to the end of this short tutorial on the difference between NumPy views and copies.