A Data Scientist's Playground
technical
A Simple Implementation of Word2Vec
I have always been puzzled and amazed by the idea of “embedding”. A high-dimensional space, such as a corpus for a language, can be represented using, say, only 50 dimensions. How amazing!
Last updated on Apr 17, 2022
4 min read
Building My First Desktop Computer
I have been a Mac user for more than 6 years now, yet occasionally I have been attracted to Linux. Compared with Mac, I like the complete openness of Linux systems and the communication within the Linux user community.
Last updated on Apr 17, 2022
4 min read
Tensorflow for Statisticians (3)
Another important type of data is time-to-event survival data, which I worked on throughout my PhD study. Unlike the data we usually see, survival data may be censored, which adds one more layer of complication to modeling.
Last updated on Apr 17, 2022
4 min read
Tensorflow for Statisticians (2)
Following my last post on using tensorflow for linear regression, in this post I am going to extend the scope to generalized linear models. >>> import tensorflow as tf >>> import tensorflow_probability as tfp >>> tfd = tfp.
Last updated on Apr 17, 2022
2 min read
Docker in a Nutshell
Docker has been a really convenient tool for me to play with, and it has helped me both in terms of doing research and having fun. Recently I have been advertising Docker to my friends but my explanations are sometimes vague and not well-defined.
Last updated on Apr 17, 2022
6 min read
Using BaiduMap’s Search API in R
Recently a friend asked me for help. The background is that people in large cities, such as Shanghai, sometimes want to “escape from the city” and go to a resort in the suburbs or the countryside.
Last updated on Apr 17, 2022
8 min read
On Academic Writing
It was only recently that I came to realize the importance of professional academic writing. After revising one of my own manuscripts over and over again with my advisors, and reviewing two papers for journals, I feel it is better if I write down some thoughts.
Last updated on Apr 17, 2022
3 min read
A Glossary of Parallel Computing Packages in R
There are many packages in R that facilitate parallel computing. In this post, I intend to summarize some of the most popular functions/packages for this purpose. 1. The built-in parallel package The package parallel is included in R.
Last updated on Apr 17, 2022
4 min read
Measuring the Performance of Classification Models
Nearly all companies use machine learning to do classification now. When it comes to deciding which model best suits the data, we need to employ some performance measures. An inappropriate measure would lead to a poorly chosen model.
Last updated on Apr 17, 2022
6 min read
A Collection of Docker Images that I Find Useful
Updated 04/14: Tensorflow Updated 04/12: Ubuntu This image is particularly useful when you want to utilize some of the command line tools that come conveniently with linux systems. Also, I manage my Python packages using conda, and there are times when I just need a package temporarily yet it is not available via conda.
Last updated on Oct 1, 2023
3 min read