philosopher bagpiper

3 years of CouchSurfing: safety, creepiness and the power of virtual rituals

a galician tune this time, a “muiñeira”, a traditional dance song on a 6/8 beat.

i finally did it! i finally merged and collected the data from the 3 houses. i still haven’t done the repeat visits (very few) and the cleaning/cooking score. but as for the rest of the data, it’s all there, in full relational-database beauty. here is a small analysis on these three years, the truth and the fiction around what CS claims to be proof of safety. this one is going to be like a small paper again, since the findings are very unrewarding.

abstract

CouchSurfing (CS) is the biggest hospitality network online. its website offers a series of tips that it claims increase safety. i decided to test these claims, more specifically whether verification or vouching are relevant to the safety of the experience as a host. in my data, neither verification nor vouching showed any relation to security.

methodology

during the past 3 years i hosted over 800 guests and collected their CS pages, length of stay and several other variables. almost all of them were hosted without any selection. about 100 were selected during the messaging stage, but a previous analysis showed that this selection did not change the data distribution. from the whole data set i kept only the people whose profiles list their data publicly, which left a total of 477 guests for the analysis. since this data is public and i don’t use personal information, i disregarded any further privacy issues. the data was registered on several google calendars, which were parsed and fed into an SQLite database with a schema designed for this experiment. i include the schema and anonymized data at the end of this article for repeatability.

results

the table below groups the data by place and verification level. here is what each column means:

  • place: the house where the data was collected. the three houses were previously analyzed on this and other websites
  • sample size: the sample size for each grouping
  • verification level: CS allows for 4 verification levels: 0 not verified, 1 name locked, 2 postcard sent, 3 postcard confirmed
  • vouches: vouches can only be given by people who have vouches themselves. i’ve found them rather useless, since the only people with multiple vouches are usually also highly involved in the community and have many other “reliability symbols”
  • violence: whether there was a physical violence incident
  • theft: whether there was a theft incident
  • creepy: whether there was sexually creepy behavior (e.g., unwanted groping)
  • average, max and min rating: my personal rating of the experience, from 0 to 10, 0 being me not liking the person at all, and 10 loving that person (not necessarily romantic love)

it should be noted that the only creepy instance was from a verified member, but there weren’t enough creepy guests to have statistically significant data.

it should also be noted that the street guests, not featured here, would also add 1 theft and 1 creep, but these guests were part of another social network (the street and the local anarchist punk network) and therefore not relevant for this study.

Place  Sample size  Verification level  Average vouches  Violence  Theft  Creepy  Average rating  Max rating  Min rating
e8     142          -                   1.17             -         -      -       5.55            10.0        2.5
e8     145          1                   2.31             -         -      -       5.56            10.0        2.5
e8     14           2                   2.79             -         -      -       4.79            7.5         1.0
e8     50           3                   4.0              -         -      1       5.28            10.0        0.0
SPCC   8            -                   2.0              -         -      -       6.06            7.5         5.0
SPCC   20           1                   1.2              -         -      -       5.8             10.0        4.0
SPCC   2            2                   4.0              -         -      -       5.25            5.5         5.0
SPCC   1            3                   19.0             -         -      -       6.0             6.0         6.0
_42    34           -                   0.71             -         -      -       5.43            7.0         2.5
_42    44           1                   0.55             -         -      -       5.31            7.5         1.5
_42    4            2                   0.5              -         -      -       7.38            9.0         5.0
_42    13           3                   8.0              -         -      -       5.35            7.0         4.5

(- means the cell was empty in the query output)
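the remark that there weren’t enough creepy guests for statistical significance can be checked with a quick fisher exact test on the sample sizes in the table. this is only a minimal python sketch under my own assumptions: rows with an empty verification level are treated as level 0 (not verified), “verified” means levels 1 to 3, and the function name `fisher_two_sided` is made up for this example.

```python
from math import comb

def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # hypergeometric probability of seeing x incidents in the first row
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # add up every possible table that is no more likely than the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= observed * (1 + 1e-9))

# sample sizes copied from the results table
verified = 145 + 14 + 50 + 20 + 2 + 1 + 44 + 4 + 13  # levels 1-3 (assumption)
unverified = 142 + 8 + 34                             # empty level, read as 0 (assumption)

# 1 creepy incident among verified guests, 0 among unverified guests
p = fisher_two_sided(1, verified - 1, 0, unverified)
print(verified, unverified, round(p, 3))
```

with a single incident, the test can only ask which group that one guest fell into, and since verified guests are the majority, the p-value comes out at 1: the data cannot tell the two groups apart in either direction.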

conclusions

i conclude that safety on CS is unrelated to verification status and to vouch status. no difference was found between the verified and unverified guests in terms of safety and creepiness (except the one case mentioned).

verification and vouching serve, in my opinion, as virtual rituals that give the impression that the network is safe. while they are irrelevant for security itself, they give the website’s users a feeling of safety. they also give CS the bulk of its income (verification is paid).

according to these results (and i shall post more), safety is entirely unrelated to any of the factors that one can scrutinize from a CS page: pictures, references, vouching, verification, age, gender, etc. security seems to be a consequence of the negative space of CS. this negative space is the people who don’t join because they are afraid, untrusting or don’t believe in the concept, and the people who left because of bad experiences. in this sense, security comes from the fact that CS guests are a very small subset of the general population: people who have a computer, can read a minimum of english and believe a system like CS can work. anyone outside this frame is very hard to find on CS (i have never found one in the whole group of 800+ people that i hosted).

this implies that the claims of safety by CS are false and that verification serves more as an income source than as a security feature. it can also easily be taken advantage of, as we saw with the single incident of creepiness reported. but these features create a sense of security (unrelated to real security) that is key for the good functioning of the website. this is what i call the power of virtual ritual: even online, irrational beliefs tend to be perpetuated and gain social value, even when there is no evidence behind them.

it is also interesting to see that no place had a significant effect on the security, even considering SPCC was an illegal squat. it hints that security must come from some unaccounted variable and certainly, a variable outside what can be obtained by visiting a CS profile

bonus charts

i now have hundreds of possible charts to do, so i’ll leave two charts that i found interesting around the distribution of the guest ratings. ratings were done using some software that allowed me to rate from 0 to 10 each guest using a continuous slider with .5 steps. this gave me a better “resolution” so i could add more detail to the choices. what i found was what one would expect: the shape is basically a bell curve, which means that people are best modeled by a random variable (a stranger can be someone we love, hate, or anything in between). i also leave the relationship between rating and age to show that age is also irrelevant for the rating, except when the guests are really young, which gives them a slightly lower rating (annoying teenagers, you know what i mean). the high variability of the older guests is due to low sample sizes. i am still working on the software so i get error bars with these charts. i’ll post more goodness soon. if anyone wants any specific charts, just let me know
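since the error bars are still missing, here is a rough python sketch of how the standard error of the mean could be computed per age bracket, and why a bracket with few guests looks so much noisier. the ratings, the bracket names and the function name `mean_with_error` are all made up for illustration; none of this is taken from the real database.

```python
from math import sqrt
from statistics import mean, stdev

def mean_with_error(ratings):
    """Return (mean, standard error); the error is None for fewer than 2 ratings."""
    m = mean(ratings)
    if len(ratings) < 2:
        return m, None
    # standard error of the mean: sample standard deviation / sqrt(n)
    return m, stdev(ratings) / sqrt(len(ratings))

# made-up ratings on the 0-10 slider with .5 steps, keyed by a hypothetical age bracket
ratings_by_age = {
    "20-29": [5.0, 5.5, 6.0, 4.5, 5.5, 6.5, 5.0, 4.0, 6.0, 5.5],
    "60+": [3.0, 8.0],
}

for bracket, ratings in ratings_by_age.items():
    m, se = mean_with_error(ratings)
    print(bracket, len(ratings), round(m, 2), se if se is None else round(se, 2))
```

the bracket with two guests ends up with a far larger error bar than the bracket with ten, which is the same effect as the high variability of the older guests in the chart.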

data for repeatability

note: my license applies, if you want to use this data please respect it. the database was anonymized and can be downloaded here in sqlite3 format. sqlite3 is free software and if you don’t know SQL i can’t help you. if you want a spreadsheet or something like that please request it

SQL query for the table (outputs html-ready rows):
sqlite3 -separator "</td><td>" guests.db "select place.name,count(guest.id),verified,round(avg(vouches),2),sum(violence),sum(theft),sum(unrequited_hitting),round(avg(personal_rating),2),max(personal_rating),min(personal_rating) from guest left join stay on stay.person=guest.id left join place on place_id=place.id where guest.age>0 group by place.name,verified"

whenever something is null or empty, that means the guest did not provide that information, so my recommendation is to leave it out of any analysis
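for anyone who prefers python to the sqlite3 command line, here is a minimal sketch of the same kind of grouped query run through python’s built-in sqlite3 module. the table layout is only my guess from the column names used in the query (which column lives in which table is an assumption), and the rows and house names are invented, so treat it as an illustration and not as the real schema. the `group by` and `order by` clauses give one row per place and verification level.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# hypothetical schema and invented rows, guessed from the query's column names
con.executescript("""
create table place (id integer primary key, name text);
create table guest (id integer primary key, age integer, verified integer, vouches integer);
create table stay  (person integer, place_id integer, violence integer, theft integer,
                    unrequited_hitting integer, personal_rating real);
insert into place values (1, 'house_a'), (2, 'house_b');
insert into guest values (1, 25, 0, 0), (2, 31, 3, 4), (3, 19, 0, 2), (4, 40, 1, NULL);
insert into stay  values (1, 1, 0, 0, 0, 5.5), (2, 1, 0, 0, 1, 3.0),
                         (3, 1, 0, 0, 0, 7.0), (4, 2, 0, 0, 0, 6.5);
""")

rows = con.execute("""
select place.name, count(guest.id), verified, round(avg(vouches), 2),
       sum(violence), sum(theft), sum(unrequited_hitting),
       round(avg(personal_rating), 2), max(personal_rating), min(personal_rating)
from guest
left join stay  on stay.person = guest.id
left join place on place_id = place.id
where guest.age > 0
group by place.name, verified
order by place.name, verified
""").fetchall()

for row in rows:
    print(row)
```

guest 4 has no vouch count in this made-up data, so the average for that group comes back as None instead of 0: SQLite’s avg() skips nulls, which fits the advice to leave missing values out of the analysis.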

thanks for reading! and yes, science confirms verification is a scam, CS is safe regardless of it!