Advanced pandas: DataFrame (Part 2)

Published: 2023-01-21

Day 2 of studying. I plan to check in and keep studying every day; if you find this helpful, feel free to leave a like!

# 1. Fill missing values with the average of the values above and below

1) Import the libraries and build a DataFrame

import pandas as pd
import numpy as np
data = {
    "course": ["A", "B", "C", "D", "E", np.nan, "F", "G"],
    "grade": [22, 34, 45, 45, 67, np.nan, 53, 23],
}
df = pd.DataFrame(data)
df
 
course grade
0 A 22.0
1 B 34.0
2 C 45.0
3 D 45.0
4 E 67.0
5 NaN NaN
6 F 53.0
7 G 23.0

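2) Fill the gap

interpolate() uses linear interpolation by default, so a single missing value between two known values is filled with their average: (67 + 53) / 2 = 60. Only the grade column is filled; the missing course stays NaN.

df["grade"] = df["grade"].fillna(df["grade"].interpolate())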
df
course grade
0 A 22.0
1 B 34.0
2 C 45.0
3 D 45.0
4 E 67.0
5 NaN 60.0
6 F 53.0
7 G 23.0
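
As a quick check on why interpolation gives "the average of the values above and below", here is a minimal sketch that compares `interpolate()` with an explicit neighbor average. The names `filled`, `neighbor_mean`, and `manual` are illustrative and not part of the original notes.

```python
import numpy as np
import pandas as pd

grade = pd.Series([22, 34, 45, 45, 67, np.nan, 53, 23], dtype="float64")

# Default linear interpolation fills the gap between 67 and 53.
filled = grade.interpolate()

# Explicit version: average of the last valid value above (ffill)
# and the next valid value below (bfill).
neighbor_mean = (grade.ffill() + grade.bfill()) / 2
manual = grade.fillna(neighbor_mean)

print(filled[5], manual[5])
```

The two approaches agree here because the gap is a single value. For a run of two or more consecutive NaN values they differ: linear interpolation spaces the filled values evenly, while the neighbor average puts the same number in every slot.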

# 2. Drop duplicate rows based on the grade column

df.drop_duplicates(["grade"])
course grade
0 A 22.0
1 B 34.0
2 C 45.0
4 E 67.0
5 NaN 60.0
6 F 53.0
7 G 23.0
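
By default `drop_duplicates` keeps the first row of each duplicate group, which is why row 3 (course D) disappears above. The sketch below shows the `keep` parameter on a small stand-in frame; `df_demo` is a hypothetical name used only for this illustration.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "course": ["A", "B", "C", "D", "E"],
    "grade": [22.0, 34.0, 45.0, 45.0, 67.0],
})

# keep="first" (default): the first 45.0 row (course C) survives.
first = df_demo.drop_duplicates(subset=["grade"])

# keep="last": the last 45.0 row (course D) survives instead.
last = df_demo.drop_duplicates(subset=["grade"], keep="last")

# keep=False: every row that has a duplicate grade is dropped.
no_dupes = df_demo.drop_duplicates(subset=["grade"], keep=False)

print(first.index.tolist(), last.index.tolist(), no_dupes.index.tolist())
```

Note that `drop_duplicates` returns a new DataFrame; `df_demo` itself is unchanged unless the result is assigned back.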

# 3. Convert the grade column to a list

df["grade"].to_list()
[22.0, 34.0, 45.0, 45.0, 67.0, 60.0, 53.0, 23.0]

# 4. Compute the mean of the grade column

df["grade"].mean()
43.625
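
The 43.625 above is the mean after the gap was filled. As a small sketch of how the result depends on missing values, the snippet below computes the mean on the raw grades as well; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

raw = pd.Series([22, 34, 45, 45, 67, np.nan, 53, 23], dtype="float64")

# mean() skips NaN by default, so this averages the 7 known grades.
mean_skipping_nan = raw.mean()

# skipna=False propagates the missing value instead.
mean_strict = raw.mean(skipna=False)

# After filling the gap with 60.0, all 8 values are averaged.
mean_after_fill = raw.interpolate().mean()

print(mean_skipping_nan, mean_strict, mean_after_fill)
```

So filling before averaging is a choice that changes the number, not just a cleanup step.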

# 5. Select the grade and course columns

1) First method: attribute access


df.grade
0    22.0
1    34.0
2    45.0
3    45.0
4    67.0
5    60.0
6    53.0
7    23.0
Name: grade, dtype: float64

2) Second method: bracket indexing

df["grade"]
0    22.0
1    34.0
2    45.0
3    45.0
4    67.0
5    60.0
6    53.0
7    23.0
Name: grade, dtype: float64
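
Both methods above return a single column as a Series. A minimal sketch of selecting both columns at once, and of one case where attribute access does not work, is given below. The frame `df_demo` and its `pop` column are hypothetical and exist only to show the name clash.

```python
import pandas as pd

df_demo = pd.DataFrame({
    "course": ["A", "B", "C"],
    "grade": [22.0, 34.0, 45.0],
    "pop": [1, 2, 3],  # hypothetical column whose name clashes with DataFrame.pop
})

one_column = df_demo["grade"]                # a Series
two_columns = df_demo[["grade", "course"]]   # a DataFrame, columns in the order requested

# Attribute access only works when the name is a valid identifier and does
# not clash with an existing DataFrame attribute or method.
collides = callable(df_demo.pop)             # this is the method, not the column
pop_column = df_demo["pop"]                  # bracket indexing always gets the column

print(type(one_column).__name__, type(two_columns).__name__, collides)
```

For that reason bracket indexing is usually the safer habit, and it is the only form that accepts a list of column names.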

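Some readers messaged me asking how to remember all of this, so here is a quick summary:

Note 1:

Commonly used methods (they generally end with '()'), using an existing DataFrame df2 as the example:

## 1. Peek at the data: head, tail, sample, take
df2.head()
df2.tail()
df2.loc[:,'pop']
df2.iloc[:,1]
df2.sample(n=3)
sample_idx = np.random.permutation(3)
df2.take(sample_idx)

## 2. drop: delete rows / columns
data = pd.DataFrame(np.arange(16).reshape((4,4)),index=['Ohio', 'Colorado', 'Utah', 'New York'], columns=['one','two','three','four'])
data.drop(['New York'],inplace=False)         # inplace=False: the original object is left untouched and a new object is returned
data.drop(['Ohio'],inplace=True)              # inplace=True: the original object is modified and None is returned
data.drop(['one'],axis=1,inplace=True)        # axis=1 drops a column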
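
The difference between `inplace=False` and `inplace=True` is easy to get wrong, so here is a small sketch that checks what each call returns. The variable names `without_ny`, `result`, and `trimmed` are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(16).reshape((4, 4)),
    index=["Ohio", "Colorado", "Utah", "New York"],
    columns=["one", "two", "three", "four"],
)

# Default (inplace=False): a new DataFrame comes back, data keeps all 4 rows.
without_ny = data.drop(["New York"])

# inplace=True: data itself loses the row and the call returns None.
result = data.drop(["Ohio"], inplace=True)

# columns=[...] is an alternative spelling of drop([...], axis=1).
trimmed = data.drop(columns=["one"])

print(result, data.index.tolist(), trimmed.columns.tolist())
```

A common mistake is writing `data = data.drop(..., inplace=True)`, which replaces `data` with None.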
## 3. Function application and mapping: applymap, apply
data = pd.DataFrame(np.random.randn(3,4), columns=list('abcd'))
np.abs(data)

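### 1) Apply a function to every element: format to two decimal places
fmt = lambda x: '%.2f' % x    # named fmt rather than format to avoid shadowing the built-in
display(data.applymap(fmt))   # element-wise on a DataFrame; pandas 2.1+ deprecates applymap in favor of DataFrame.map
display(data.a.map(fmt))      # element-wise on a single column (Series); display() is available in Jupyter/IPython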
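
Because the element-wise DataFrame method has different names across pandas versions, the sketch below uses `Series.map` inside `apply`, which I believe behaves the same on old and new releases. It uses fixed numbers instead of random ones so the result is predictable; `fmt`, `formatted`, and `rounded` are illustrative names.

```python
import pandas as pd

data = pd.DataFrame({"a": [1.234, -0.5], "b": [2.0, 3.14159]})

def fmt(x):
    """Format one number as a string with two decimal places."""
    return "%.2f" % x

# Element-wise on one column.
col_a = data["a"].map(fmt)

# Element-wise on the whole frame: map each column in turn.
formatted = data.apply(lambda col: col.map(fmt))

# If the values should stay numeric, round() avoids converting to strings.
rounded = data.round(2)

print(formatted)
print(rounded.dtypes.tolist())
```

Keep in mind that formatting with `'%.2f'` turns every cell into a string, so the result can no longer be used for arithmetic; `round(2)` keeps floats.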

### 2) Apply a function to each row or column: the result is a scalar or an array
f = lambda x:x.max() - x.mean()
data.apply(f,axis=0)

def f1(x):
    return pd.Series([x.min(),x.max()],index=['min','max'])

data.apply(f1)
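
To make the `axis` argument concrete, here is a sketch of the same two functions on a small fixed frame, so the numbers can be checked by hand. The names `spread` and `min_max` stand in for `f` and `f1` above.

```python
import pandas as pd

data = pd.DataFrame({"a": [1.0, 2.0, 6.0], "b": [0.0, 4.0, 8.0]})

def spread(x):
    """Distance between the maximum and the mean of a row or column."""
    return x.max() - x.mean()

# axis=0 (default): the function receives each column, one result per column.
per_column = data.apply(spread, axis=0)

# axis=1: the function receives each row, one result per row.
per_row = data.apply(spread, axis=1)

def min_max(x):
    """Return two values per column, labeled 'min' and 'max'."""
    return pd.Series([x.min(), x.max()], index=["min", "max"])

# Returning a Series from the function yields a DataFrame.
summary = data.apply(min_max)

print(per_column.tolist(), per_row.tolist())
print(summary)
```

When the applied function returns a scalar the result is a Series; when it returns a Series, the labels of that Series become the row index of the resulting DataFrame.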

Note 2: Names without parentheses are attributes, e.g. df2.index, df2.values, df2.columns.
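
One way to tell the two apart without memorizing them is to check whether the name is callable. The sketch below does that on a small stand-in frame; this `df2` is hypothetical, since the original `df2` is not shown in these notes.

```python
import pandas as pd

# Hypothetical stand-in for df2.
df2 = pd.DataFrame({"course": ["A", "B"], "grade": [22.0, 34.0]})

# Attributes: read directly, no parentheses.
columns = list(df2.columns)
shape = df2.shape
values = df2.values            # underlying NumPy array

# Methods: callable, so they need parentheses to run.
head_is_method = callable(df2.head)
shape_is_method = callable(df2.shape)

print(columns, shape, head_is_method, shape_is_method)
```

Writing `df2.head` without parentheses does not raise an error; it just returns the bound method object, which is a frequent source of confusion for beginners.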

 

