Recognizing Handwritten Digits with Scikit-learn under Data Analytics using Python

Dhwani Panchal
4 min read · Feb 21, 2021

Data analytics is the science of analyzing raw data in order to draw conclusions from it. Many of the techniques and processes of data analytics have been automated into mechanical processes and algorithms that turn raw data into a form fit for human consumption. It is used for the discovery, interpretation, and communication of meaningful patterns in data. Data analytics techniques can reveal trends and metrics that would otherwise be lost in the mass of information. This information can then be used to optimize processes and increase the overall efficiency of a business or system. Data analysis is not limited to numbers and strings: images and sounds can also be analyzed and classified.

Scikit-learn (formerly scikits.learn and also known as sklearn) is a free software machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms, including support vector machines, random forests, gradient boosting, k-means, and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. Here we are going to analyze the digits dataset of the Scikit-learn library. We are going to train a Support Vector Machine and then use it to predict the values of unknown handwritten digits.

We use a Jupyter notebook to perform the operations. So let's get started by first importing the required libraries.

Import the libraries: numpy, pandas, pyplot, and the dataset

There are 1,797 images in total in the dataset.
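The original notebook cell is not reproduced here, so the following is only a minimal sketch of the loading step. It assumes the dataset comes from `datasets.load_digits()`. Pandas and pyplot, which the article also imports, are left out because this fragment does not use them.

```python
import numpy as np
from sklearn import datasets

# Load the bundled handwritten digits dataset.
digits = datasets.load_digits()

# Number of images, and the shape of the flattened feature matrix.
n_images = len(digits.images)
print("Number of images:", n_images)
print("Feature matrix shape:", digits.data.shape)

# The distinct class labels present in the targets.
classes = np.unique(digits.target)
print("Classes:", classes)
```

Here `digits.images` holds the 8x8 matrices, `digits.data` holds the same pixels flattened to 64 values per image, and `digits.target` holds the labels.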

The whole dataset is stored in digits. The following is an example of a digit in our dataset. It consists of 64 pixels (8x8). The dataset contains images of handwritten digits in 10 classes, where each class refers to a digit from 0 to 9. Each image is stored as an 8x8 matrix, as follows (for digit 0):

digits.images[0]
Matrix Value for Digit
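As a rough sketch of how that matrix can be inspected, the snippet below prints the first image together with its label. The variable names `first_image` and `flat_pixels` are ours, not from the original notebook.

```python
from sklearn import datasets

digits = datasets.load_digits()

# The first image as an 8x8 matrix of grayscale intensities (0 to 16).
first_image = digits.images[0]
print(first_image)
print("Shape:", first_image.shape)

# The label stored for this image.
print("Label:", digits.target[0])

# digits.data holds the same pixels flattened into a 64-element row.
flat_pixels = digits.data[0]
print("Flattened length:", flat_pixels.shape[0])
```

The flattened rows in `digits.data` are what a classifier actually consumes. The 8x8 form is mainly convenient for viewing.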

The 1,797 images in the dataset are 8x8 pixels in size. Each image is a handwritten digit in grayscale, as shown below.

Pixel Image in Dataset
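One way to draw such a pixel image is with pyplot's `imshow`. This is a sketch under our own assumptions: the choice of the first image, the `gray_r` colormap, and the figure size are not taken from the original notebook.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Draw the first digit as a small grayscale picture.
fig, ax = plt.subplots(figsize=(2, 2))
ax.imshow(digits.images[0], cmap="gray_r", interpolation="nearest")
ax.set_title(f"Label: {digits.target[0]}")
ax.axis("off")

# A notebook renders the figure inline; in a script, call plt.show().
```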

The test data consists of six images of 64 pixels each (8x8), one per handwritten number. The output for the above test data is produced as below:
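The exact slice used for these six images is not shown, so the sketch below assumes they are the last six images of the dataset. It plots them in a 2x3 grid and prints their target values.

```python
import matplotlib.pyplot as plt
from sklearn import datasets

digits = datasets.load_digits()

# Assumption: the six test images are the last six in the dataset.
validation_images = digits.images[-6:]
validation_targets = digits.target[-6:]

# Show the six digits in a 2x3 grid, titled with their target values.
fig, axes = plt.subplots(2, 3, figsize=(6, 4))
for ax, image, label in zip(axes.ravel(), validation_images, validation_targets):
    ax.imshow(image, cmap="gray_r", interpolation="nearest")
    ax.set_title(f"Target: {label}")
    ax.axis("off")

print("Targets:", validation_targets)
# A notebook renders the figure inline; in a script, call plt.show().
```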

Let's fit our model using an SVM classifier. Here we use the first 1,790 images for training the model, and the remaining images are for validation.
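The fitting step might look like the sketch below. The hyperparameters `gamma=0.001` and `C=100.0` are our assumption (values often seen in digit-classification examples), not settings confirmed by this article. Holding out everything after the first 1,790 images leaves seven images for validation.

```python
from sklearn import datasets, svm

digits = datasets.load_digits()

# First 1,790 images for training, the rest for validation.
n_train = 1790
X_train, y_train = digits.data[:n_train], digits.target[:n_train]
X_val, y_val = digits.data[n_train:], digits.target[n_train:]

# Assumed hyperparameters; tune them for your own experiments.
svc = svm.SVC(gamma=0.001, C=100.0)
svc.fit(X_train, y_train)

print("Training samples:", X_train.shape[0])
print("Validation samples:", X_val.shape[0])
```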

Prediction by Model

As we can see, the predicted and target values are the same for this data. Let's check the model's predictions for some other values in the dataset.

As we can see, we have achieved 100% accuracy: the target and predicted values are the same. Hence the estimator is able to recognize and interpret all six handwritten digits of the validation set.
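A comparison like this can be sketched by printing the predictions next to the targets and computing the share of matches with `accuracy_score`. The split and the SVC hyperparameters are the same assumptions as before (first 1,790 images for training, `gamma=0.001`, `C=100.0`), so the printed accuracy depends on those choices.

```python
from sklearn import datasets, svm
from sklearn.metrics import accuracy_score

digits = datasets.load_digits()

# Assumed split: first 1,790 images train, the rest validate.
n_train = 1790
X_train, y_train = digits.data[:n_train], digits.target[:n_train]
X_val, y_val = digits.data[n_train:], digits.target[n_train:]

# Assumed hyperparameters, not confirmed by the article.
svc = svm.SVC(gamma=0.001, C=100.0)
svc.fit(X_train, y_train)

# Compare predictions with the true labels.
predicted = svc.predict(X_val)
print("Predicted:", predicted)
print("Target:   ", y_val)

# Fraction of validation digits classified correctly.
accuracy = accuracy_score(y_val, predicted)
print(f"Accuracy: {accuracy:.2%}")
```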

Now we will use a KNN classifier and check the accuracy of its predictions.
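A minimal sketch of the KNN step, assuming the same split as for the SVM and `n_neighbors=5` (the scikit-learn default; the value used in the original notebook is not shown):

```python
from sklearn import datasets
from sklearn.metrics import accuracy_score
from sklearn.neighbors import KNeighborsClassifier

digits = datasets.load_digits()

# Assumed split: first 1,790 images train, the rest validate.
n_train = 1790
X_train, y_train = digits.data[:n_train], digits.target[:n_train]
X_val, y_val = digits.data[n_train:], digits.target[n_train:]

# k-nearest neighbors with an assumed k of 5.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)

knn_predicted = knn.predict(X_val)
knn_accuracy = accuracy_score(y_val, knn_predicted)

print("Predicted:", knn_predicted)
print("Target:   ", y_val)
print(f"KNN accuracy: {knn_accuracy:.2%}")
```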

Now let us consider two more cases to check our model further:

Last case:

Now let’s build a confusion matrix and classification report.
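The sketch below builds both reports with `confusion_matrix` and `classification_report` from `sklearn.metrics`. It reuses the assumed SVM setup (first 1,790 images for training, `gamma=0.001`, `C=100.0`). With so few validation images most rows of the matrix will be zero, so a larger hold-out set gives a more informative report.

```python
from sklearn import datasets, svm
from sklearn.metrics import classification_report, confusion_matrix

digits = datasets.load_digits()

# Assumed split: first 1,790 images train, the rest validate.
n_train = 1790
X_train, y_train = digits.data[:n_train], digits.target[:n_train]
X_val, y_val = digits.data[n_train:], digits.target[n_train:]

# Assumed hyperparameters, not confirmed by the article.
svc = svm.SVC(gamma=0.001, C=100.0)
svc.fit(X_train, y_train)
predicted = svc.predict(X_val)

# Rows are true digits, columns are predicted digits (0 to 9).
cm = confusion_matrix(y_val, predicted, labels=list(range(10)))
print(cm)

# Per-class precision, recall, and F1 for the labels that occur.
report = classification_report(y_val, predicted, zero_division=0)
print(report)
```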

Conclusion

Thus we successfully imported the dataset and built a model using Scikit-learn. We were successful in training the model and making predictions with it.

As we can clearly see, for the above cases we achieved 98.33% accuracy. Hence we can conclude that our model is accurate more than 95% of the time. By using the Scikit-learn library in Python, data analysis becomes easy, effective, and less time-consuming.

I am thankful to the mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a coding internship experience. Thank you, www.suvenconsultants.com
