Cleaning only with Pandas

Photo by the author

Today’s story will include:

  1. Based on step 1, how to traverse batch Excel workbooks and consolidate the report
  2. How to save the Excel data to a MySQL database and interact with it.

Reading and writing Excel files is a very common operation in daily business. It is not complicated, but it is cumbersome: there are various cases you might face, and handling them takes patience and attention to detail. Today I focus on reading Excel files without modifying them, cleaning the data only with Pandas, and interacting with MySQL.
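A minimal sketch of the consolidation step, assuming every workbook in a folder shares the same columns. The function name consolidate_workbooks, the source_file column, and the reader parameter are hypothetical, introduced only for illustration:

```python
from pathlib import Path

import pandas as pd


def consolidate_workbooks(folder, pattern="*.xlsx", reader=pd.read_excel):
    """Read every workbook in `folder` that matches `pattern` and stack the rows."""
    frames = []
    for path in sorted(Path(folder).glob(pattern)):
        frame = reader(path)
        # Remember which workbook each row came from (hypothetical column name).
        frame["source_file"] = path.name
        frames.append(frame)
    if not frames:
        # No workbook matched the pattern: return an empty report.
        return pd.DataFrame()
    return pd.concat(frames, ignore_index=True)
```

The consolidated frame could then be written to MySQL with DataFrame.to_sql and a SQLAlchemy engine; the connection details depend on your own setup and are left out here.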

Introduction to datasets


And other challenges for a CNN project

Photo by Aphex34 on Wiki

Today I would like to show an example of how to calculate the output shape and number of parameters for a simple convolutional neural network, and to share some other experience. It uses the dog classification project as an example. It is aimed at entry-level rather than tech-savvy readers, but your professional comments are welcome and will help me understand the topic better.

Let’s start.

Question 1: Calculate the shape and number of parameters of the CNN model

It is assumed that you know the architecture of a CNN. …
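As a rough sketch of the arithmetic involved, the standard formulas for a square 2D convolution can be written as two small functions. The layer sizes below are hypothetical and are not taken from the dog classification project:

```python
def conv2d_output_size(size, kernel, stride=1, padding=0):
    """Output height (or width) of a convolution over a square input."""
    return (size + 2 * padding - kernel) // stride + 1


def conv2d_param_count(in_channels, out_channels, kernel, bias=True):
    """Trainable parameters: one kernel x kernel x in_channels filter per output channel."""
    weights = kernel * kernel * in_channels * out_channels
    return weights + (out_channels if bias else 0)


# Hypothetical layer: 224x224 RGB input, 16 filters of 3x3, stride 1, padding 1.
side = conv2d_output_size(224, kernel=3, stride=1, padding=1)
params = conv2d_param_count(in_channels=3, out_channels=16, kernel=3)
print(side, params)  # 224 448
```

The output shape of this hypothetical layer is therefore 16 x 224 x 224, with 3 * 3 * 3 * 16 weights plus 16 biases.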

Photo by vadim kaipov on Unsplash

One of the advantages of a decorator in Python is that it can extend what a function does without modifying the original function.

Today let’s take an example to check it step by step.

Let’s say I have defined a lot of original functions as below:

The problem: After several months, I want all of the calculation functions (add, minus, and multiply) to print some words before and after the calculation to make the process clearer. What should I do?

Method 1: Change the original functions: add the printed words to the original…
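The alternative to editing each function is a decorator. Here is a minimal sketch, assuming simple two-argument versions of add, minus, and multiply (the original definitions are not shown here) and a hypothetical decorator named announce:

```python
import functools


def announce(func):
    """Print a message before and after calling `func`, leaving its body untouched."""
    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        print(f"Start {func.__name__}")
        result = func(*args, **kwargs)
        print(f"End {func.__name__}")
        return result
    return wrapper


# Assumed stand-ins for the original calculation functions.
@announce
def add(a, b):
    return a + b


@announce
def minus(a, b):
    return a - b


@announce
def multiply(a, b):
    return a * b
```

Calling add(2, 3) prints "Start add" and "End add" around the calculation and still returns 5.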

Understand basic FunkSVD and focus on validation cases

Photo by Solé Bicycles on Unsplash

Today I would like to include the below parts:

  1. Based on 1, make predictions and validate them, with emphasis on the different cases to help explain the validation process

Let’s get started.

FunkSVD: a brief intro

  1. The predicted ratings can be computed as R=HW, where R is the user-item rating matrix, H contains the user’s latent factors and W is the…
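To make R=HW concrete, here is a minimal sketch that fits H and W by stochastic gradient descent on the observed ratings only. The function name funk_svd, the hyperparameters, and the toy rating matrix are all assumptions for illustration:

```python
import numpy as np


def funk_svd(ratings, k=2, lr=0.01, epochs=500, seed=0):
    """Fit H (users x k) and W (k x items) on the non-NaN entries of `ratings`."""
    rng = np.random.default_rng(seed)
    n_users, n_items = ratings.shape
    H = rng.random((n_users, k))
    W = rng.random((k, n_items))
    observed = np.argwhere(~np.isnan(ratings))
    for _ in range(epochs):
        for u, i in observed:
            error = ratings[u, i] - H[u] @ W[:, i]
            h_old = H[u].copy()
            # Gradient step on the squared error of this single rating.
            H[u] += lr * 2 * error * W[:, i]
            W[:, i] += lr * 2 * error * h_old
    return H, W


# Toy matrix: rows are users, columns are items, NaN means "not rated".
ratings = np.array([
    [5.0, 3.0, np.nan],
    [4.0, np.nan, 1.0],
    [np.nan, 2.0, 5.0],
])
H, W = funk_svd(ratings)
predicted = H @ W  # every cell is filled, including the unrated ones
```

Unlike plain SVD, the missing ratings are simply skipped during training, so the NaN entries do not block the fit.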

How to convert SVD to k dimension and make the prediction

Photo by Georg-Johann from Wiki

We know that SVD has drawbacks for making predictions in practice; for example, it cannot predict if there is even a single NaN in the dataset. But it is the starting point of collaborative filtering, and I want to reproduce the procedure with SVD to see how it works, with some compromises in the dataset.

This story will focus on implementing SVD in code, has no offline testing (no train/test split of the dataset), and includes some basic linear algebra terminology.
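As a rough sketch of what reducing SVD to k dimensions looks like with NumPy, assuming a small made-up rating matrix with no missing values:

```python
import numpy as np

# Hypothetical user-item matrix with every rating present (SVD cannot handle NaN).
ratings = np.array([
    [5.0, 4.0, 1.0, 1.0],
    [4.0, 5.0, 1.0, 2.0],
    [1.0, 1.0, 5.0, 4.0],
    [2.0, 1.0, 4.0, 5.0],
])

# Full decomposition: ratings = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(ratings, full_matrices=False)

# Keep only the k largest singular values and their vectors.
k = 2
U_k, s_k, Vt_k = U[:, :k], s[:k], Vt[:k, :]

# Rank-k reconstruction, used as the predicted ratings.
predicted = U_k @ np.diag(s_k) @ Vt_k
```

The reconstruction error is determined by the discarded singular values, so a larger k reproduces the original matrix more closely.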

Linear algebra basics

SVD stands for singular value decomposition. A detailed explanation can be found on Wiki. Here…

Photo by the author

Today I want to share a concise function that can convert text to numbers. Let’s look at the original and target data first to get a direct impression.

The original dataset looks like this:

squeeze(), unsqueeze(), tensor[None], max(), argmax(), and view()

Pic from github

When I started to learn PyTorch, I found various functions that were hard for me to understand. Today I would like to summarize them with examples, which I find very helpful.

The functions are:

  • squeeze()
  • unsqueeze() and a[None]
  • max()
  • argmax()
  • view()

You can also check the explanations from the official website one by one, but summarizing them together helps me.

1. squeeze()

squeeze(i) is a kind of dimension reduction: if dimension i of the tensor has size 1, that dimension is removed; otherwise the tensor is left unchanged. Let’s check an example:
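A minimal sketch, assuming a tensor of shape (2, 1, 3) chosen only for illustration:

```python
import torch

a = torch.zeros(2, 1, 3)

squeezed_dim1 = a.squeeze(1)  # dimension 1 has size 1, so it is removed
squeezed_dim0 = a.squeeze(0)  # dimension 0 has size 2, so nothing changes
squeezed_all = a.squeeze()    # no argument: every size-1 dimension is removed

print(squeezed_dim1.shape)  # torch.Size([2, 3])
print(squeezed_dim0.shape)  # torch.Size([2, 1, 3])
print(squeezed_all.shape)   # torch.Size([2, 3])
```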

Content-based, weighted content-based, Numpy functions

Photo from Michal Matlon on Unsplash

Today I would like to discuss two examples of content-based recommendation systems and some efficient array functions I learned from them. The two examples are:

1: Recommendation based on item content

2: Recommendation based on weighted content

I use a simple movie set as an example and focus on the main process, ignoring other processes and special cases. Let’s get started.
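To give a rough idea of the two approaches before the real datasets, here is a minimal sketch with NumPy. The titles, genre flags, and ratings are made up and are not the movie_df or review_df used in this story:

```python
import numpy as np

titles = ["Movie A", "Movie B", "Movie C", "Movie D"]

# Rows are movies; columns are hypothetical genre flags (action, comedy, drama).
content = np.array([
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [1, 1, 1],
])

# 1: item content. Similarity is the number of genres two movies share.
similarity = content @ content.T


def most_similar(index, top_n=2):
    """Titles of the movies that share the most content with movie `index`."""
    scores = similarity[index].copy()
    scores[index] = -1  # never recommend the movie itself
    order = np.argsort(-scores, kind="stable")[:top_n]
    return [titles[i] for i in order]


# 2: weighted content. One user's ratings (0 = not seen) weight the genres.
user_ratings = np.array([5, 0, 1, 0])
profile = user_ratings @ content   # how much the user likes each genre
weighted_scores = content @ profile
unseen = np.flatnonzero(user_ratings == 0)
best_unseen = titles[unseen[np.argmax(weighted_scores[unseen])]]

print(most_similar(0))  # ['Movie D', 'Movie B']
print(best_unseen)      # Movie D
```

Both steps are plain matrix products, which is why array functions keep this kind of code short.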

Preparing the datasets:

Use the code below to generate two datasets: movie_df and review_df

The two tables are:

Focus on the backend, without frontend and authentication

Flask is a tool that can be used to create an API server. It is a micro-framework, which means that its core functionality is kept simple, but there are numerous extensions that allow developers to add other functionality (such as authentication and database support).
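A minimal sketch of what such an API server can look like. The /health route and its response are hypothetical and only illustrate the shape of a Flask backend:

```python
from flask import Flask, jsonify

app = Flask(__name__)


@app.route("/health")
def health():
    """Hypothetical endpoint that reports the server is up."""
    return jsonify(status="ok")


# Flask's built-in test client calls the route without starting a real server.
response = app.test_client().get("/health")
print(response.get_json())  # {'status': 'ok'}
```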

Heroku is a cloud platform where developers host applications, databases, and other services in several languages. Developers can use Heroku to deploy, manage, and scale applications. Heroku is also free, with paid specialized memberships, and most services such as a database offer a free tier.


This story will focus on application deployment and database interaction without…

Photo designed by vectorjuice / Freepik

Now that big data has become a reality, people focus on how to use data to realize commercial value. One fairly mature area is profiling potential customers or predicting customer behavior in order to target the market or customers more precisely.

Problem statement

Bertelsmann Arvato Financial Solutions provided a real-world challenge through Udacity. Arvato provided four demographics datasets. They are:

  1. Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features…

Xue Wang

passionate about data analysis and data science
