A concise function to convert text to numbers in a DataFrame

Annie Wang
2 min read · Jun 26, 2021

Today I want to share a concise function that converts text to numbers. Let’s look at the original and target data first to get a direct impression.

The original dataset looks like this:

The target dataset looks like this:

The dataset analysis

It shows which customer has purchased which products. The dataset could of course be expanded with more columns, such as purchase time and purchase amount, to show more detail, but I removed everything else to focus on the main problem.

The customer ids have been anonymized, which makes them hard to work with, so I would like to convert them into numbers. From the datasets, we can see that neither the product id nor the customer id is unique.

The function:

The function analysis

This function uses a dictionary (coded_dict) to store each customer id in text form (val) together with its corresponding number (counter). The text customer id is the dictionary key, and the numeric customer id is the value.

It then uses a list (customer_encoded) to collect the numeric customer ids.

If the customer id in text (val) is already in the dictionary, coded_dict is left unchanged. The function only:

  • looks up the value stored under the key val (coded_dict[val])
  • appends that value to the list (customer_encoded.append(coded_dict[val]))

If the customer id in text (val) is not yet in the dictionary, a new record is added to the dictionary first:

    if val not in coded_dict:
        coded_dict[val] = counter
        counter += 1

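The lookup and append steps described above are then carried out for the new entry as well, and the loop moves on to the next customer id.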

Finally, the function returns the resulting list (return customer_encoded).
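
The steps above can be sketched roughly as follows. This is a reconstruction from the description, not the original listing: the function name customer_mapper, the values parameter, the starting value of counter (1), and the sample ids are all assumptions made for illustration.

```python
def customer_mapper(values):
    """Map each text id in `values` to an integer, in order of first appearance."""
    coded_dict = {}        # text id -> number
    counter = 1            # next unused number (starting at 1 is an assumption)
    customer_encoded = []  # one number per input row

    for val in values:
        if val not in coded_dict:
            # First time this id is seen: register it under a new number.
            coded_dict[val] = counter
            counter += 1
        # Seen before or just added: look the number up and store it.
        customer_encoded.append(coded_dict[val])

    return customer_encoded


# Hypothetical anonymized ids; repeated ids get the same number.
sample_ids = ["a9f3", "7c1e", "a9f3", "d402", "7c1e"]
print(customer_mapper(sample_ids))  # [1, 2, 1, 3, 2]
```

Because the dictionary lookup is done once per row, the whole column is encoded in a single pass.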

Once the function is defined, run the execution step. It first assigns the returned list to customer_id_num, then deletes the text customer id column, and finally adds the numeric customer id column.
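
As a rough sketch of that execution step, assume a DataFrame df with hypothetical columns product_id and customer_id (the real column names and data may differ) and a mapper written as described above. The function name customer_mapper and the sample rows are illustrative assumptions.

```python
import pandas as pd


def customer_mapper(values):
    """Encode text ids as integers in order of first appearance."""
    coded_dict = {}
    counter = 1  # assumed starting number
    customer_encoded = []
    for val in values:
        if val not in coded_dict:
            coded_dict[val] = counter
            counter += 1
        customer_encoded.append(coded_dict[val])
    return customer_encoded


# Hypothetical data standing in for the original dataset.
df = pd.DataFrame({
    "product_id": [101, 102, 101, 103],
    "customer_id": ["a9f3", "7c1e", "7c1e", "a9f3"],
})

customer_id_num = customer_mapper(df["customer_id"])  # encode the text ids
del df["customer_id"]                                 # drop the text column
df["customer_id"] = customer_id_num                   # add the numeric column
print(df)
```

The list returned by the mapper has one entry per row, so it can be assigned straight back as a column.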

Concise and efficient, isn’t it? The same approach can also be applied to similar situations.
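
For a similar situation, such as encoding another text column, pandas also offers pd.factorize, which assigns integer codes in order of first appearance. It is shown here only as an alternative to the hand-written function, not as the method used in this post. The column name and values are hypothetical, and the codes are shifted by one on the assumption that numbering should start at 1 (pd.factorize itself starts at 0).

```python
import pandas as pd

# Hypothetical text column to encode.
df = pd.DataFrame({"product_id": ["p-x", "p-y", "p-x", "p-z"]})

# codes: integer label per row; uniques: the distinct values in first-seen order.
codes, uniques = pd.factorize(df["product_id"])

df["product_id_num"] = codes + 1  # shift so numbering starts at 1 (assumption)
print(df)
```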

Thank you for your time.
