Before/After Data Diff
Data Transformation Diff
Comparing the raw data with the cleaned data
Legend: Trim, Imputed (NA), Type Cast, Outlier Dropped, Duplicate
Raw Data
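| ID | Name | Age | Income | Status |
| --- | --- | --- | --- | --- |
| 1 | John Doe | 25 | 15000 | Active |
| 2 | Alice Smith | NaN | 22000 | Active |
| 3 | Bob | 30 | 99999999 | Inactive |
| 4 | Eve | 22 | 18000 | ACTIVE |
| 5 | Charlie | "45" | 35000 | Active |
| 5 | Charlie | "45" | 35000 | Active |
| 7 | Dave | 38 | | Pending |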
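Several of the problems named in the legend can be spotted in the raw rows before any cleaning is applied. The following is a minimal sketch, not the page's own implementation: the function name `profile`, the issue labels, and the use of `None` for the blank Income cell in row 7 are all assumptions made for illustration. It does not try to detect outliers or inconsistent capitalization.

```python
import math

COLUMNS = ("ID", "Name", "Age", "Income", "Status")
NUMERIC = ("Age", "Income")

# Raw rows transcribed from the table above.
# Assumption: None stands for the blank Income cell in row 7.
RAW = [
    (1, "John Doe", 25, 15000, "Active"),
    (2, "Alice Smith", float("nan"), 22000, "Active"),
    (3, "Bob", 30, 99999999, "Inactive"),
    (4, "Eve", 22, 18000, "ACTIVE"),
    (5, "Charlie", "45", 35000, "Active"),
    (5, "Charlie", "45", 35000, "Active"),
    (7, "Dave", 38, None, "Pending"),
]


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def profile(rows):
    """Return (row_index, issue) pairs for problems found in the raw rows."""
    issues, seen = [], set()
    for index, row in enumerate(rows):
        key = repr(row)  # repr sidesteps NaN never comparing equal to itself
        if key in seen:
            issues.append((index, "duplicate row"))
        seen.add(key)
        for column, value in zip(COLUMNS, row):
            if is_missing(value):
                issues.append((index, f"{column} missing"))
            elif column in NUMERIC and isinstance(value, str):
                issues.append((index, f"{column} stored as text"))
            elif isinstance(value, str) and value != value.strip():
                issues.append((index, f"{column} has stray whitespace"))
    return issues


for index, issue in profile(RAW):
    print(index, issue)
```

On the rows as transcribed here, this flags the missing Age for Alice Smith, the text-typed Age and the repeated row for Charlie, and the missing Income for Dave. Row indices are zero-based positions in `RAW`, not the ID column.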
Cleaned Data
| ID | Name | Age | Income | Status |
| --- | --- | --- | --- | --- |
| 1 | John Doe | 25 | 15000 | Active |
| 2 | Alice Smith | 32 | 22000 | Active |
| Row dropped: Outlier detected | | | | |
| 4 | Eve | 22 | 18000 | Active |
| 5 | Charlie | 45 | 35000 | Active |
| Row dropped: Duplicate detected | | | | |
| 7 | Dave | 38 | 28500 | Pending |
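The transformations named in the legend can be sketched as one small pipeline. This is an illustration under stated assumptions, not the method used to produce the cleaned table: the page does not say how outliers are detected or how missing values are filled. The sketch assumes a hypothetical outlier rule (Income above 10 times the median Income) and median imputation, and it treats the `ACTIVE` to `Active` change as part of string normalization. With these rules it keeps the same rows as the cleaned table and fills Age with 32, but it fills Dave's Income with 20000 rather than the 28500 shown above.

```python
import math
from statistics import median

COLUMNS = ("ID", "Name", "Age", "Income", "Status")
NUMERIC = ("Age", "Income")

# Raw rows transcribed from the raw table.
# Assumption: None stands for the blank Income cell in row 7.
RAW = [
    (1, "John Doe", 25, 15000, "Active"),
    (2, "Alice Smith", float("nan"), 22000, "Active"),
    (3, "Bob", 30, 99999999, "Inactive"),
    (4, "Eve", 22, 18000, "ACTIVE"),
    (5, "Charlie", "45", 35000, "Active"),
    (5, "Charlie", "45", 35000, "Active"),
    (7, "Dave", 38, None, "Pending"),
]


def is_missing(value):
    """True for None and for float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def normalize(row):
    """Trim strings, cast numeric columns to int, map missing values to None."""
    rec = dict(zip(COLUMNS, row))
    for key, value in rec.items():
        if is_missing(value):
            rec[key] = None
        elif isinstance(value, str):
            rec[key] = value.strip()  # trim
    for key in NUMERIC:
        if rec[key] is not None:
            rec[key] = int(rec[key])  # type cast: "45" -> 45
    if rec["Status"] is not None:
        rec["Status"] = rec["Status"].capitalize()  # "ACTIVE" -> "Active"
    return rec


def clean(rows, outlier_factor=10):
    """Normalize, drop exact duplicates, drop income outliers, impute gaps."""
    records, seen = [], set()
    for row in rows:
        rec = normalize(row)
        fingerprint = tuple(rec[c] for c in COLUMNS)
        if fingerprint in seen:  # exact duplicate of an earlier row
            continue
        seen.add(fingerprint)
        records.append(rec)

    # Hypothetical outlier rule: Income above outlier_factor times the median.
    incomes = [r["Income"] for r in records if r["Income"] is not None]
    limit = outlier_factor * median(incomes)
    records = [r for r in records if r["Income"] is None or r["Income"] <= limit]

    # Hypothetical imputation rule: median of the surviving observed values.
    # round() rounds halves to the nearest even integer, so 31.5 becomes 32.
    for key in NUMERIC:
        observed = [r[key] for r in records if r[key] is not None]
        fill = round(median(observed))
        for r in records:
            if r[key] is None:
                r[key] = fill
    return records


cleaned = clean(RAW)
for rec in cleaned:
    print(rec)
```

The order of the steps matters in this sketch: duplicates and the outlier are removed before imputation so that neither the repeated Charlie row nor the 99999999 income influences the fill values.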