pandas data structure
pandas is a library built on NumPy. This mind map briefly introduces the pandas data structures (DataFrame, Series) and how they relate to each other.
Edited at 2022-06-06 00:06:13
pandas data structures
DataFrame is the container of Series
Series: labeled, one-dimensional array
DataFrame: labeled, size-mutable, two-dimensional heterogeneous table
Series
1. Series creation method
1||| Using lists and tuples
data=list/tuple
2||| Use ndarray
data=ndarray
3||| Use a dictionary
data=dict
key: index
value: data
4||| Use scalars
data=value
s=pd.Series(data[,index=index,name=name])
data
Python object, ndarray, a scalar (fixed value)
index
Specify the index labels (list-like), default [0,1,2,...,len(data)-1]
name
Specify Series name
dtype
Specify data type
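The four creation methods above can be sketched as follows. This is a minimal illustration; the sample values, index labels, and variable names are made up for the example.

```python
import numpy as np
import pandas as pd

# 1. From a list (a tuple works the same way); the index defaults to 0..len(data)-1
s_list = pd.Series([10, 20, 30], name="from_list")

# 2. From an ndarray, with explicit index labels
s_array = pd.Series(np.array([1.5, 2.5, 3.5]), index=["a", "b", "c"])

# 3. From a dict: keys become the index, values become the data
s_dict = pd.Series({"x": 1, "y": 2, "z": 3})

# 4. From a scalar: the value is repeated once per index label
s_scalar = pd.Series(7, index=["p", "q", "r"], dtype="float64")

print(s_list.index.tolist())   # [0, 1, 2]
print(s_dict["y"])             # 2
print(s_scalar.tolist())       # [7.0, 7.0, 7.0]
```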
2. Description of the Series object
Properties of Series objects
shape
Dimensions of the Series, returned as a tuple (n,)
size
Number of elements
index
Index labels
values
Data as an ndarray
Methods of Series objects
head(x)
Return the first x elements of the object
tail(x)
Return the last x elements of the object
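A short sketch of the attributes and methods listed above, using a made-up Series named "digits":

```python
import numpy as np
import pandas as pd

s = pd.Series([3, 1, 4, 1, 5, 9], index=list("abcdef"), name="digits")

print(s.shape)    # (6,)
print(s.size)     # 6
print(s.index)    # the index labels a..f
print(isinstance(s.values, np.ndarray))  # True

first_two = s.head(2)   # elements labeled a and b
last_two = s.tail(2)    # elements labeled e and f
print(first_two.tolist(), last_two.tolist())  # [3, 1] [5, 9]
```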
DataFrame
1. DataFrame creation method
1||| dictionary
Key: column name
Value: specific data (list/tuple)
2||| Dictionary of Series
Key: column name
Value: Series
3||| list of dictionaries
Each dictionary is a row of data
4||| Series generation
A Series generates a DataFrame with only one column
df=pd.DataFrame(data=None,index=None,columns=None)
data
The data itself: structured or homogeneous ndarray, iterable object, dictionary or DataFrame
index
Specify the index, default RangeIndex(0,1,2,...)
columns
Header (column label), default RangeIndex(0,1,2,...)
dtype
Specify data type
5||| Other methods
pd.DataFrame.from_dict(dict)
pd.DataFrame.from_records(list/dict/ndarray)
pd.json_normalize(df.col)
df.col.apply(pd.Series)
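The creation methods above can be sketched as follows. All column names and values are invented for illustration. The last two lines assume a column holding dictionaries, which is the usual case for json_normalize and apply(pd.Series).

```python
import pandas as pd

# 1. Dict of lists: keys are column names
df_lists = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

# 2. Dict of Series: rows are aligned on the Series index
df_series = pd.DataFrame({
    "a": pd.Series([1, 2], index=["r1", "r2"]),
    "b": pd.Series([3, 4], index=["r1", "r2"]),
})

# 3. List of dicts: each dict is one row
df_rows = pd.DataFrame([{"name": "Ann", "age": 30}, {"name": "Bob", "age": 25}])

# 4. A single Series: one column, named after the Series
df_single = pd.DataFrame(pd.Series([1.0, 2.0], name="score"))

# 5. Other constructors
df_from_dict = pd.DataFrame.from_dict({"name": ["Ann", "Bob"], "age": [30, 25]})
df_from_records = pd.DataFrame.from_records(
    [("Ann", 30), ("Bob", 25)], columns=["name", "age"]
)

# Expanding a column of dicts into separate columns (hypothetical "info" column)
nested = pd.DataFrame({"info": [{"city": "Oslo", "zip": "0150"},
                                {"city": "Rome", "zip": "00100"}]})
expanded = pd.json_normalize(nested["info"].tolist())
expanded_apply = nested["info"].apply(pd.Series)

print(df_single.columns.tolist())   # ['score']
print(expanded.columns.tolist())    # ['city', 'zip']
```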
2. Description of the DataFrame object
df.info()
Use the info method to get df information
Type of object, row index, column index information, column label of each column of data, number and data type of non-missing values, memory size occupied, etc.
df.dtypes
Use the dtypes attribute to get the type of data in each column of df
df.shape
Use the shape attribute to get the number of rows and columns of df
Return as tuple
len(df)
Use the len function to get the number of rows or the number of columns of df
len(df)
Number of rows
len(df.columns)
Number of columns
df.index
Use the index attribute to get the row index label of df
df.columns
Use the columns attribute to get the column index label of df
df.values
Use the values attribute to get the data of df as an ndarray
df.head(n)
Use the head method to get the first n rows of data, default n=5
df.tail(n)
Use the tail method to obtain the last n rows of data, default n=5
df.describe()
Use the describe method to obtain the descriptive statistics of each column of df data.
Includes the count of non-missing values, mean, standard deviation, minimum value, 25% quantile, median, 75% quantile, maximum value, etc.
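The inspection tools above, applied to a small made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cid"],
    "age": [30, 25, 41],
    "height": [1.70, 1.82, 1.65],
})

df.info()             # prints index, columns, non-null counts, dtypes, memory usage
print(df.dtypes)      # data type of each column
print(df.shape)       # (3, 3)

n_rows = len(df)            # number of rows
n_cols = len(df.columns)    # number of columns

print(df.index)       # row labels
print(df.columns)     # column labels
print(df.values)      # the data as an ndarray

print(df.head(2))     # first 2 rows
print(df.tail(2))     # last 2 rows

summary = df.describe()   # numeric columns only by default
print(summary.index.tolist())
```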
NumPy
NumPy is a high-performance scientific computing library for matrix operations in Python.
Two basic objects of NumPy
ndarray
Multidimensional array to store data
ufunc
Functions that process arrays
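A minimal sketch of the two objects, with an arbitrary 2x3 array as sample data:

```python
import numpy as np

# ndarray: a multidimensional array holding the data
a = np.array([[1, 2, 3], [4, 5, 6]])
print(a.ndim, a.shape)   # 2 (2, 3)

# ufunc: a function applied element by element to an array
print(isinstance(np.sqrt, np.ufunc))   # True
roots = np.sqrt(a)       # square root of every element
sums = np.add(a, 10)     # adds 10 to every element
print(sums.tolist())     # [[11, 12, 13], [14, 15, 16]]
```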
DataFrame behaves like a dictionary
Key: column header
Value: column of data (Series)
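The dictionary analogy can be seen in a few lines. The column names here are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ann", "Bob"], "age": [30, 25]})

keys = list(df.keys())     # the column headers act as keys
col = df["age"]            # looking up a key returns a Series
has_age = "age" in df      # membership tests check the column headers

# Iterating over items yields (column header, Series) pairs
for label, series in df.items():
    print(label, type(series).__name__)

# Assigning to a new key adds a column, as with a dict
df["senior"] = df["age"] > 28
print(df.columns.tolist())   # ['name', 'age', 'senior']
```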