All posts by Anna Panyu Peng

A passionate young researcher with a strong interest in technology and solid research experience in machine learning; a veteran of developing and managing complex schedules and strategic planning; a consistently high-quality multitasker, and a proactive initiative-taker and doer under pressure and unexpected change; an effective trilingual communicator with experience in international negotiations and conferences, and a clear presenter and reporter with experience in research conferences and talks.

WeChat, the “Super APP”

The very product that took my breath away and keeps me using it at least 50 times a day is WeChat. WeChat is a multi-purpose Chinese social media mobile application developed by Tencent. It was first released in 2011, and by 2017 it was one of the largest standalone mobile apps, with over 980 million monthly active users (902 million daily active users). The reasons I love this app are analyzed below from two perspectives.

First, WeChat’s various features are designed to accurately satisfy users’ daily needs and build convenience. As an international student from China studying in the United States, I can chat, voice call, or video call my parents and friends in China for free on WeChat. According to the Pew Research Center, nearly 820,000 international students were enrolled at U.S. colleges and universities in the 2012-13 academic year. China is the largest sender of international students to the U.S., with Chinese students accounting for 31.5% (328,547) of all international enrollments (1,043,839) in the U.S., according to the Institute of International Education (IIE).[1] The free video and voice call features make Chinese students’ communication with their families in China much easier and much more frequent. WeChat has also closely integrated its features with the physical world. Apart from being a Chinese counterpart of Skype, WhatsApp, and GroupMe, WeChat is a social media application where each user has, besides a username, a QR code for fast friending. On social networking occasions, instead of handing out paper business cards, people can simply pull out their QR codes for scanning and quickly friend each other.

WeChat has a high retention rate because of the network effect: it gains additional value as more and more Chinese people, and even foreigners, use it. In addition, WeChat is a news medium that supports bloggers covering sports, fashion, women’s health, and more. Anyone can set up an Official Account, start writing articles, and gather followers. Instead of having notifications flood their phones, users follow such bloggers and receive push notifications within the app. Lastly, WeChat is called an “Everything App” partly because it supports online payment. Users link a mainland China bank card to WeChat Pay to confirm their identity, and can then transfer money to friends or receive money from them.

Nowadays in China, people leave their wallets and credit cards at home; all they need for shopping and payment is a phone with WeChat Quick Pay. The payer shows the barcode or QR code on WeChat’s Quick Pay page for the vendor to scan, and the payment goes through directly. This payment solution makes money transactions faster and smoother.

Second, I love WeChat because it gives me a nearly perfect user experience. The user interface is clear and simple. When you open WeChat, you are presented with four sections at the bottom: Chats, Contacts, Discover, and Me. All four sections are self-explanatory, so a user can easily navigate through them and fully explore the available features. WeChat has a relatively high System Usability Scale (SUS) score and a relatively low error rate compared to traditional websites.

WeChat is my favorite app, and many other people’s favorite, because it truly builds convenience for users. It offers a wide variety of services and features that make my daily life easier. Furthermore, it has integrated these features, especially online payment and QR codes, with the physical world. Lastly, it has a simple and consistent user interface across all of its features.

[1] http://time.com/4569564/international-us-students/

Cannot Feel Much Stronger

I wrote this blog post during my epic voyage from Honolulu to American Samoa in the summer of 2016: 4,000 nautical miles sailed in five weeks aboard a brigantine, with four island stops. I couldn’t be more proud of the crew and myself. Missing Cap Rick, Will, Chief Scientist Jan, Scientist Assistant Janice, Nick, and everyone else. I miss my salty crewmates so, so much.

Noon Position
1°28.7’ S x 170°24.7’W

Description of location
Already inside PIPA and about 30 nautical miles from Enderbury Island (so ready to see land and say “land ho!” after two weeks in the middle of the Pacific Ocean!)

Ship Heading
160°

Ship Speed
1.10 kts

Taffrail Log
1624 nm

Weather / Wind / Sail Plan
Cloudy, using all four lowers

Souls on Board

This is Panyu logging in here. After being physically “tortured” by a Super Station deployment (all three nets deployed: a Neuston tow, deep and shallow Tucker trawls, and yet another hydrocast), a hectic hour-long lab practical exam, a totally unexpected fire drill on the boat, getting stranded and soaked in a sudden squall while on watch, and finally finishing the first policy draft for the Conservation and Management class, I am now EXHAUSTINGLY HAPPY and CANNOT FEEL MUCH STRONGER!

Today officially marks a total of two weeks aboard the Robert C. Seamans. Two weeks ago, on the 3rd of July, our group of young people excitedly boarded this brigantine at Pier 9 in Honolulu. These two weeks feel to me like a long span of two years, yet also like a fast two years of repeating the same errands every single day: being woken up for either breakfast or dinner forty minutes before the start of a six-hour watch, routinely putting on my harness like a soldier buckling on his gun and sword, half awake and half asleep, dragging my body to the quarterdeck for a quick turnover with the previous watch, then either heading to the head rig for a bow watch, leaving the deck for the galley to begin the daily cleanup, or diving into the buzzing engine room to log all those fluctuating numbers… If you are still reading, I really appreciate your patience, just as I appreciate my own, considering that I have not yet jumped off the boat to attempt swimming back home and have kept myself on task for fourteen days in a row. Yes, patience is the keyword for today, for tomorrow, and for the rest of the expedition.

My first dream on the boat was of me lying on a peaceful beach, fruit margarita in hand, with a magnificent Pacific sunset in the background. As a recent college graduate who finished her undergraduate degree with honors in mathematics and computational engineering in three years and who has a decent job lined up in September, I stubbornly thought I deserved a much better post-graduation summer holiday than being stranded on a boat doing tedious labor (well, it can be tediously interesting).

However, the reason I am still mentally upbeat through all the repetition and daily hustle and bustle, and the reason I have not yet gone overboard, is that I firmly believe there is something out there calling me to stay stronger and more patient. There is something subtly beautiful waiting for me to discover.

I can never forget the ecstasy, after days of disappointment, of beholding the first-ever appendicularian under the microscope after hours of sorting the 40 ml deep Tucker trawl samples. Many times I was assigned to steer the ship during dawn watch (1:00 a.m. to 7:00 a.m.), but I was so exhausted and sleepy that I ended up hugging the helm as if it were my pillow. My watch officer Ashley would then pat my shoulder, point to the east, and say, “Hey, Panyu, see the sky over there? The sunrise is happening!” The horizon far, far away was dyed in layers of colors I had never seen in my life. I can never forget the relief of showering in the first rays of sunlight and beholding the newborn sun rising slowly after hours of rushing in and out of the doghouse and working on deck. The other day I was lying on the net near the bow, suspended in the air above the roaring ocean. I felt as if I were flying with the boat like a tropicbird. And I had the best view of the ship from where I sat.

The Robert C. Seamans is a quietly brilliant warrior who steadily marches toward his destination day and night. Tomorrow morning we will be close enough to see Enderbury Island! I have high expectations for the first PIPA atoll we will see. The moral lesson learned so far is that beautiful things are the fruit of patience and discovery. Stay strong and stay patient!

Before I log off, I want to let my family and friends know that I miss you all so much! I want to tell you that I am doing very well and please do not worry about me. I will be back safe and sound very soon.

To every best version of you,
Panyu Peng

What Do Successful Chinese Corporate Entrepreneurs Think and Do?

Today I watched a couple of interviews with successful Chinese corporate leaders, such as Ms. Mingzhu Dong, CEO of GREE Electric, and Mr. Jianlin Wang, head of the Wanda Group, a multinational conglomerate.

I was deeply touched by their thought leadership: ponder and pursue things that are bigger than yourself.

I jotted down below a few sentences that struck and inspired me.

1. A person’s capability is not the most important thing. Being loyal to principles and discipline determines your success.

2. There is no absolute justice or fairness. Some people choose to sacrifice themselves to make other people’s lives better, and these are choices they make willingly.

3. Be tough on yourself.

4. When you are in a cozy and comfortable environment, your thinking slackens. You had better live in a harsher environment to best examine yourself and keep your thoughts clear.

5. People who are not hated by others are not perfect people.

A Web-based MVP Demo for BitTiger

Exciting! This was my first time leading a team of four students (two in UI/UX design and two in full-stack software engineering) to deliver a minimum viable product (MVP) for BitTiger, an online lifelong e-learning platform. This preliminary set of features aims to facilitate smoother and more efficient communication between course representatives and students (customers or potential customers).

In the demo, I presented three features:

  • Pop-up Box
  • Online Course Chatroom
  • Built-in Push Notification

Check out the video; all feedback is welcome 😊

Updates on the Parallelization of the Greedy Coordinate Descent Method

I have parallelized the main for loop of the greedy coordinate descent method for kernel SVM (an L2 empirical risk minimization problem): I randomly partition the gradient array into omp_get_num_threads() subarrays, let each thread choose the best variable in its own subarray to update, and apply an atomic mechanism to the gradient updates to avoid conflicting writes and loss of information. However, the parallelized DCD (dual coordinate descent) does not converge as fast as I expected in the multi-threaded setting (four threads are only about 0.0001 seconds faster than a single thread, and the code does not converge to the same optimum on every run).
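
Below is a minimal sketch of the scheme described above (illustrative only, not my actual research code): each OpenMP thread scans its own slice of the gradient array, greedily picks the coordinate with the largest gradient magnitude in that slice, updates the corresponding dual variable, and applies the gradient correction with atomic writes. The names here (Q for the precomputed kernel matrix, C for the box constraint, step for the step size) are assumptions made for the sketch.

#include <algorithm>
#include <cmath>
#include <vector>
#include <omp.h>

// One parallel greedy coordinate descent step for the dual of kernel SVM.
// alpha: dual variables, grad: current gradient, Q: precomputed kernel matrix.
void greedy_cd_step(std::vector<double>& alpha,
                    std::vector<double>& grad,
                    const std::vector<std::vector<double>>& Q,
                    double C, double step) {
    const int n = static_cast<int>(grad.size());
    #pragma omp parallel
    {
        // Partition the coordinates into contiguous slices, one per thread.
        const int nt  = omp_get_num_threads();
        const int tid = omp_get_thread_num();
        const int lo  = tid * n / nt, hi = (tid + 1) * n / nt;

        if (lo < hi) {
            // Greedy selection: the coordinate with the largest gradient magnitude
            // in this thread's slice (reads of grad are unsynchronized, so the
            // choice is only approximately greedy once other threads start writing).
            int best = lo;
            for (int j = lo; j < hi; ++j)
                if (std::fabs(grad[j]) > std::fabs(grad[best])) best = j;

            // Projected update of the chosen dual variable onto the box [0, C].
            double delta = -step * grad[best];
            double new_alpha = std::min(C, std::max(0.0, alpha[best] + delta));
            delta = new_alpha - alpha[best];
            alpha[best] = new_alpha;

            // Rank-one gradient correction; atomic writes prevent conflicting
            // updates from different threads being lost.
            for (int i = 0; i < n; ++i) {
                #pragma omp atomic
                grad[i] += delta * Q[best][i];
            }
        }
    }
}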

My diagnosis is that something is wrong with the way I compute the kernel matrix in the multi-core setting. Hopefully I will fix this by the end of the week.

Cheers.


Gaussian Kernel

I found an intuitive post on Gaussian kernels on Jesse Johnson‘s blog. Enjoy!

Gaussian kernels

In order to give a proper introduction to Gaussian kernels, this week’s post is going to start out a little bit more abstract than usual. This level of abstraction isn’t strictly necessary to understand how Gaussian kernels work, but the abstract perspective can be extremely useful as a source of intuition when trying to understand probability distributions in general. So here’s the deal: I’ll try to build up the abstraction slowly, but if you ever get hopelessly lost, or just can’t take it any more, you can skip down to the heading that says “The practical part” in bold – That’s where I’ll switch to a more concrete description of the Gaussian kernel algorithm. Also, if you’re still having trouble, don’t worry too much – Most of the later posts on this blog won’t require that you understand Gaussian kernels, so you can just wait for next week’s post (or skip to it if you’re reading this later on).

Recall that a kernel is a way of placing a data space into a higher dimensional vector space so that the intersections of the data space with hyperplanes in the higher dimensional space determine more complicated, curved decision boundaries in the data space. The main example that we looked at was the kernel that sends a two-dimensional data space to a five-dimensional space by sending each point with coordinates (x,y) to the five-dimensional point with coordinates (x, y, x^2, y^2, xy). If we wanted to give ourselves even more flexibility, we could pick an even higher dimensional kernel, for example by sending the point (x,y) to the point (x, y, x^2, y^2, xy, x^3, x^2y, xy^2, y^3) in a nine-dimensional space.

This week, we’re going to go beyond higher dimensional vector spaces to infinite-dimensional vector spaces. You can see how the nine-dimensional kernel above is an extension of the five-dimensional kernel – we’ve essentially just tacked on four more dimensions at the end. If we keep tacking on more dimensions in this way, we’ll get higher and higher dimensional kernels. If we were to keep doing this “forever”, we would end up with infinitely many dimensions. Note that we can only do this in the abstract. Computers can only deal with finite things, so they can’t store and process computations in infinite dimensional vector spaces. But we’ll pretend for a minute that we can, just to see what happens. Then we’ll translate things back into the finite world.

In this hypothetical infinite-dimensional vector space, we can add vectors the same way that we do with regular vectors, by just adding corresponding coordinates. However, in this case, we have to add infinitely many coordinates. Similarly, we can multiply by scalars, by multiplying each of the (infinitely many) coordinates by a given number. We’ll define the infinite polynomial kernel by sending each point (x,y) to the infinite vector (x, y, x^2, y^2, xy, x^3, x^2y, xy^2, y^3, x^4, \ldots). In particular, every monomial in the variables x and y, such as x^7y^{42} or y^{10,000}, will appear in one of the entries of this kernel, possibly very far down the sequence.

In order to get back to the computational world, we can recover our original five-dimensional kernel by just forgetting all but the first five of the entries. In fact, the original five-dimensional space is contained in this infinite dimensional space. (The original five-dimensional kernel is what we get by projecting the infinite polynomial kernel into this five-dimensional space.)

Now take a deep breath, because we’re going to take this one step further. Consider, for a moment, what a vector is. If you ever took a mathematical linear algebra class, you may remember that vectors are officially defined in terms of their addition and multiplication properties. But I’m going to temporarily ignore that (with apologies to any mathematicians who are reading this.) In the computing world, we usually think of a vector as being a list of numbers. If you’ve read this far, you may be willing to let that list be infinite. But I want you to think of a vector as being a collection of numbers in which each number is assigned to a particular thing. For example, each number in our usual type of vector is assigned to one of the coordinates/features. In one of our infinite vectors, each number is assigned to a spot in our infinitely long list.

But how about this: What would happen if we defined a vector by assigning a number to each point in our (finite dimensional) data space? Such a vector doesn’t pick out a single point in the data space; rather, once you pick this vector, if you point to any point in the data space, the vector tells you a number. Well, actually, we already have a name for that: Something that assigns a number to each point in the data space is a function. In fact, we’ve been looking at functions a lot on this blog, in the form of density functions that define probability distributions. But the point is, we can think of these density functions as vectors in an infinite-dimensional vector space.

How can a function be a vector? Well, we can add two functions by just adding their values at each point. This was the first scheme we discussed for combining distributions in last week’s post on mixture models. The density functions for two vectors (Gaussian blobs) and the result of adding them are shown in the Figure below. We can multiply a function by a number in a similar way,  which would result in making the overall density lighter or darker. In fact, these are both operations that you’ve probably had lots of practice with in algebra class and calculus. So we’re not doing anything new yet, we’re just thinking about things in a different way.

[Figure: the density functions of two Gaussian blobs and the result of adding them]

The next step is to define a kernel from our original data space into this infinite-dimensional space, and here we have a lot of choices. One of the most common choices is the Gaussian blob function which we’ve seen a few times in past posts. For this kernel, we’ll choose a standard size for the Gaussian blobs, i.e. a fixed value for the deviation \sigma. Then we’ll send each data point to the Gaussian function centered at that point. Remember, we’re thinking of each of these functions as a vector, so this kernel does what all kernels do: it places each point in our original data space into a higher (in fact, infinite) dimensional vector space.
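
In symbols (my notation, not the original post’s; \Phi is the kernel map and \sigma the fixed deviation), the map just described sends each data point p to the Gaussian function centered at p:

\Phi(p) = f_p, \qquad f_p(x) = \exp\left(-\frac{\|x - p\|^2}{2\sigma^2}\right)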

Now, here’s the problem: in order to bring things back to the computational world, we need to pick out a finite-dimensional vector space sitting in this infinite-dimensional vector space and “project” the infinite-dimensional space onto the finite-dimensional subspace. We’ll choose a finite-dimensional space by choosing a (finite) number of points in the data space, then taking the vector space spanned by the Gaussian blobs centered at those points. This is the equivalent of the vector space defined by the first five coordinates of the infinite polynomial kernel, as above. The choice of these points is important, but we’ll return to that later. For now, the question is: how do we project?

For finite dimensional vectors, the most common way to define a projection is by using the dot product: This is the number that we get by multiplying corresponding coordinates of two vectors, then adding them all together. So, for example the dot product of the three-dimensional vectors (1,2,3) and (2,.5,4) is 1 \cdot 2 + 2 \cdot .5 + 3 \cdot 4 = 15.

We could do something similar with functions, by multiplying the values that they take on corresponding points in the data set. (In other words, we multiply the two functions together.) But we can’t then add all these numbers together because there are infinitely many of them. Instead, we will take an integral! (Note that I’m glossing over a ton of details here, and I again apologize to any mathematicians who are reading this.) The nice thing here is that if we multiply two Gaussian functions and integrate, the number is equal to a Gaussian function of the distance between the center points. (Though the new Gaussian function will have a different deviation value.)
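
To make that last claim concrete (this working-out and its constants are mine, not the original post’s): take the “dot product” of two functions to be the integral of their product, and let f_p and f_q be Gaussian blobs of deviation \sigma centered at points p and q in the d-dimensional data space. Completing the square gives

\langle f_p, f_q \rangle = \int \exp\left(-\frac{\|x - p\|^2}{2\sigma^2}\right) \exp\left(-\frac{\|x - q\|^2}{2\sigma^2}\right) dx = \left(\sigma\sqrt{\pi}\right)^d \exp\left(-\frac{\|p - q\|^2}{4\sigma^2}\right),

which is indeed a Gaussian function of the distance \|p - q\|, now with the larger deviation \sigma\sqrt{2}.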

In other words, the Gaussian kernel transforms the dot product in the infinite dimensional space into the Gaussian function of the distance between points in the data space: If two points in the data space are nearby then the angle between the vectors that represent them in the kernel space will be small. If the points are far apart then the corresponding vectors will be close to “perpendicular”.

The practical part

So, let’s review what we have so far: to define an N-dimensional Gaussian kernel, we first choose N points in the data space. We can then calculate the kernel coordinates of any point in the data space by calculating its distance to each of these chosen data points and taking the Gaussian function of the distances.

To better understand how this kernel works, let’s figure out what the intersection of a hyperplane with the data space looks like. (This is what is done with kernels most of the time, anyway.) Recall that a hyperplane is defined by an equation of the form a_1 x_1 + a_2 x_2 + \cdots + a_N x_N = b, where (x_1,\ldots,x_N) are the coordinates of the point (in the higher-dimensional kernel space) and a_1,\ldots,a_N are parameters that define the hyperplane. If we’re using a Gaussian kernel then, thanks to our version of the dot product, the values (x_1,\ldots,x_N) measure (Gaussian functions of) the distances to our N chosen points. The decision boundary is thus the set of points for which the Gaussian functions of the distances to these N points satisfy this equation.
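
As a concrete illustration of this recipe (my own sketch, not code from the original post; the chosen centers, the deviation sigma, the coefficients a_i, and the offset b are all placeholder inputs), computing the kernel coordinates of a point and testing which side of the hyperplane it falls on could look like this:

#include <cmath>
#include <vector>

// Kernel coordinates of a point x: the Gaussian function of its distance
// to each of the N chosen center points.
std::vector<double> kernel_coords(const std::vector<double>& x,
                                  const std::vector<std::vector<double>>& centers,
                                  double sigma) {
    std::vector<double> phi;
    for (const auto& c : centers) {
        double dist2 = 0.0;
        for (std::size_t k = 0; k < x.size(); ++k)
            dist2 += (x[k] - c[k]) * (x[k] - c[k]);
        phi.push_back(std::exp(-dist2 / (2.0 * sigma * sigma)));
    }
    return phi;
}

// Which side of the hyperplane a_1 x_1 + ... + a_N x_N = b the point falls on.
int classify(const std::vector<double>& x,
             const std::vector<std::vector<double>>& centers,
             const std::vector<double>& a, double b, double sigma) {
    const std::vector<double> phi = kernel_coords(x, centers, sigma);
    double sum = -b;
    for (std::size_t i = 0; i < phi.size(); ++i) sum += a[i] * phi[i];
    return sum >= 0.0 ? +1 : -1;
}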

That’s still pretty hard to unpack, so let’s look at an example where each of the values a_1,\ldots,a_N is either 1 or -1. Then near each chosen point with label a_i = 1, the value x_i will be very close to 1, while the other values x_j will be small, so the sum a_1 x_1 + a_2 x_2 + \cdots + a_N x_N will be positive. Similarly, near a point with a_i = -1, the sum will be negative. Thus if b = 0 then the decision boundary will separate the positive points from the negative points. In fact, it will carve out a region reminiscent of the Gaussian balls that define the kernel. One example is indicated on the left in the Figure below, where the colors indicate whether the coefficients are positive or negative. As you can see, the result looks something like a smooth version of the nearest neighbors algorithm.

[Figure: Gaussian kernel classification decision boundaries; left, coefficients fixed at plus or minus 1; right, adjusted coefficients]

If we adjust the parameters a_1,\ldots,a_N, this has the effect of changing the sizes of the Gaussian balls around the points, and thus moves the decision boundary towards or away from them, as on the right of the Figure. If a coefficient switches from positive to negative, the decision boundary will move from one side of a point to the other. If we have a labeled data set (which may or may not coincide with the N points that define the Gaussian kernel) then training a linear classification algorithm (such as SVM or logistic regression) in the kernel space corresponds to moving this decision boundary around, within the constraints defined above, to maximize how many of the data points are on the correct side.

So, this gives us more flexibility for choosing the decision boundary (or, at least, a different kind of flexibility), but the final result will be very dependent on the N points that we choose. If we choose too many (such as letting the N points that define the kernel be the same as the data points) then we will risk overfitting, similar to how the nearest neighbor algorithm tends to lead to overfitting. What we really want is a small number of points that are evenly distributed throughout the set, ideally such that each of the N points is close mostly to points in the same class.

Finding such a collection of points is a very different problem from what we’ve been focusing on in the posts so far on this blog, and falls under the category of unsupervised learning/descriptive analytics. (In the context of kernels, it can also be thought of as feature selection/engineering.) In the next few posts, we’ll switch gears and start to explore ideas along these lines.

Living in the Office

I moved into my cubicle at the GDC, the CS department building, last night. I slept in the conference room, as it was the only open room with the lights off. I find the arrangement very efficient in the sense that it sets a regular schedule for me: I am able to go to bed (a sleeping bag) on time, since the main lights at the Dell complex are turned off every night at midnight, and I get up very early in the morning and quickly get back to my cubicle to be immersed in the working and learning environment.


[Summer Reading]: Matrix Computation and Numerical Stability/Accuracy

During my short stay at Stanford University last week, I visited Jack Poulson, a young assistant professor known for innovative algorithmic design and expertise in fast algorithms. He has recently added support for distributed dense and sparse-direct Linear, Quadratic, and Second-Order Cone Programs to Elemental and begun integrating it into more user-friendly tools (e.g., CVXPY).

During our conversations, he kindly recommended great books on linear algebra and numerical analysis. They are as follows:

1) Trefethen and Bau, “Numerical Linear Algebra” (a friendly advanced undergraduate level book on numerical linear algebra)

2) Higham, “Accuracy and Stability of Numerical Algorithms” (a graduate-level book on numerical linear algebra; I recommend reading this *after* Trefethen and Bau)

3) Golub and van Loan, “Matrix Computations” (*the* reference for numerical linear algebra; it’s a bit too terse for an introduction)

4) Horn and Johnson, “Matrix Analysis” (a great reference for non-numerical linear algebra; it would be okay to read this in tandem with Trefethen and Bau)

Today I bought “Matrix Computations” by Gene Golub and “Accuracy and Stability of Numerical Algorithms” by Nicholas Higham. I plan to start reading the book on numerical algorithms first, as soon as I get it. (Hopefully I can earn a scholarship next semester to cover the expense!)

Happy summer reading!