Here's what the dirty dataframe looks like.

11 NaN
12 NaN

Notice how there's an entire row of null values (row 7), as well as two columns of null values (`do not edit! --->` and `Certification.2`).

Cleaning Column Names
---------------------

There are several problems with this data. First, the column names are not lowercase, and they contain spaces, which makes them cumbersome to use programmatically. To solve this, we can use the :py:meth:`clean_names` method. We first pass the dataframe to the :py:class:`janitor.DataFrame()` constructor (just a thin wrapper, really), then call the :py:meth:`clean_names()` method on the result.

.. code-block:: python

    # Wrap the pandas dataframe, then normalize its column names.
    df_clean = jn.DataFrame(df).clean_names()
    print(df_clean.head(2))

Notice how the column names are now lowercase, with underscores in place of spaces.
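
To make the effect concrete, here is a rough pandas-only sketch of the kind of renaming described above. It is not the :py:meth:`clean_names` implementation, which may handle more cases. `df_demo` and its column names are hypothetical stand-ins for the messy headers.

.. code-block:: python

    import pandas as pd

    # Hypothetical stand-in data; the headers imitate the messy originals.
    df_demo = pd.DataFrame(
        {"First Name": ["Jason"], "Full time?": ["Yes"], "% Allocated": [0.75]}
    )

    # Approximation only: trim, lowercase, and swap spaces for underscores.
    df_demo.columns = (
        df_demo.columns.str.strip().str.lower().str.replace(" ", "_", regex=False)
    )

    print(list(df_demo.columns))

Characters such as `%` and `?` survive this step, which is why some columns still need renaming afterwards.
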
If you squint at the unclean dataset, you'll notice one row and one column in which every value is missing. We can fix this too! Building on the code block above, let's remove the empty row and column using the :py:meth:`remove_empty()` method:
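
As a hedged illustration of what dropping empty rows and columns amounts to, the sketch below uses plain pandas `dropna` calls rather than :py:meth:`remove_empty()` itself. The toy dataframe `df_demo` and its column names are assumptions made up for this example.

.. code-block:: python

    import pandas as pd

    # Hypothetical data: row 1 is entirely empty, and so is "do_not_edit".
    df_demo = pd.DataFrame(
        {
            "first_name": ["Jason", None, "Alicia"],
            "subject": ["PE", None, "Music"],
            "do_not_edit": [None, None, None],
        }
    )

    # Drop rows where every value is missing, then columns where every value is missing.
    df_no_empty = df_demo.dropna(axis="index", how="all").dropna(
        axis="columns", how="all"
    )

    print(df_no_empty)

Using `how="all"` matters here: a row or column is dropped only when all of its values are missing, so partially filled rows are kept.
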

Next, let's rename some of the columns. `%_allocated` and `full_time?` contain non-alphanumeric characters, which makes them a bit harder to work with. We can rename them using the :py:meth:`rename_column()` method:
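
The same renaming can be sketched with the standard pandas `rename` method. The replacement names `percent_allocated` and `full_time` are assumptions chosen for illustration, and `df_demo` is a hypothetical toy dataframe.

.. code-block:: python

    import pandas as pd

    # Hypothetical data with the two awkward column names mentioned above.
    df_demo = pd.DataFrame({"%_allocated": [0.75, 1.0], "full_time?": ["Yes", "Yes"]})

    # Map each old name to a purely alphanumeric-plus-underscore name.
    df_renamed = df_demo.rename(
        columns={"%_allocated": "percent_allocated", "full_time?": "full_time"}
    )

    print(list(df_renamed.columns))
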

Note how we now have really nice column names! You might be wondering why I'm not modifying the two certification columns; that is the next thing we'll tackle.

Coalescing Columns
------------------

If we look more closely at the two `certification` columns, we'll see that they look like this:

Rows 8 and 11 have NaN in the left certification column, but have a value in the right certification column. Let's assume for a moment that the left certification column is intended to record the first certification that a teacher obtained. In this case, the values in the right certification column on rows 8 and 11 should be moved to the first column. Let's do that with Janitor, using the :py:meth:`coalesce()` method, which does the following:
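
As a minimal sketch of the idea, assuming coalescing means "take the left value when present, otherwise fall back to the right one", the behaviour can be approximated with the pandas `combine_first` method. This is not the :py:meth:`coalesce()` implementation. The data, the `certification_1` column name, and the `certification_combined` column are hypothetical.

.. code-block:: python

    import pandas as pd

    # Hypothetical data: the left column is missing a value that the right one has.
    df_demo = pd.DataFrame(
        {
            "certification": ["Physical ed", None, "Vocal music"],
            "certification_1": ["Theater", "Science 6-12", None],
        }
    )

    # Keep the left value where it exists; otherwise use the right value.
    df_demo["certification_combined"] = df_demo["certification"].combine_first(
        df_demo["certification_1"]
    )

    print(df_demo["certification_combined"].tolist())

Column order matters in this sketch: swapping the two columns would make the right-hand certification win whenever both are present.
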