State of Data Engineering 2024 Q2

Data Engineering and AI

Chip Huyen, who came out of Stanford and is active in the AI space, recently wrote an article on what she learned by looking at the 900 most popular open source AI tools.

https://huyenchip.com/2024/03/14/ai-oss.html

Image Credit: huyenchip’s blog

In data engineering, one of our primary uses of AI is really just prompt engineering.

Use Case 1: Data Migration

Before LLMs, when we did data migrations, we would first use the AWS Schema Conversion Tool (SCT) to help convert source schemas to a new target schema. Let us say we are going from SQL Server to Postgres, which is a major language change.

From there, the hard part begins: you need to manually convert the SQL Server business logic code to Postgres. Some converters do exist out there, and I assume they work by mapping one language grammar to another (fun fact – I almost pursued a PhD in compiler optimization, but bailed from the program).

Now what we can do is use LLMs to convert a huge set of code from a source language to a target language using prompt engineering. Despite a lot of the new open source models out there, ChatGPT (GPT-4) still seems to be outperforming the competitors for the time being at this type of code conversion.

The crazy thing is that with LLMs, we can convert from really any source language to any target language. If you try it out, Java to C# and SQL to Spark SQL both work somewhat reasonably well. In terms of predictions for our field, I see a couple of things progressing:

Phase 1 Now:

Productivity gains in code conversions using LLMs
Productivity gains in coding itself, using tools like Amazon CodeWhisperer, Amazon Q, or the LLM of your choice for faster coding
Productivity gains in learning a new language with LLMs
Debugging stack traces by having LLMs analyze them

Phase 2: Near Future

Tweaks of LLMs to make them more deterministic for prompt engineering. We already have the ability to control creativity with the ‘temperature’ parameter, but we generally have to give really tight prompt conditions to get some of the code conversions to work. In some of our experiments with SQL to Spark SQL, doing things like passing in the DDLs has pushed the LLMs to generate more accurate output.

An interesting paper about chain-of-thought prompting (eliciting a series of intermediate reasoning steps) might help us move towards this
Arxiv paper here – https://arxiv.org/abs/2201.11903
In latent.space’s latest newsletter, they cited a paper showing that adding “Let’s think step by step” improved zero-shot reasoning accuracy from 17% to 79%. If you happen to DM me and say that in an introduction, I will raise an eyebrow.
latent.space citation link

Being able to use LLMs to create data quality tests based on schemas or create unit tests based off existing ETL code.
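The DDL trick from the Phase 2 notes can be sketched as a small prompt builder. This is a minimal illustration, not the exact prompts we use: the `ConversionRequest` class, the `build_conversion_prompt` function, the rule wording, and the sample table are all hypothetical, and no particular LLM API is called. The only idea it shows is keeping the prompt tight by pinning the model to the real schema and asking for a low temperature.

```python
from dataclasses import dataclass, field


@dataclass
class ConversionRequest:
    """Inputs for one code conversion (hypothetical structure)."""
    source_dialect: str
    target_dialect: str
    source_sql: str
    ddls: list[str] = field(default_factory=list)


# Assumption: whichever LLM client you use accepts a temperature setting.
# A low value keeps the output as repeatable as the model allows.
GENERATION_SETTINGS = {"temperature": 0.0}


def build_conversion_prompt(req: ConversionRequest) -> str:
    """Build a tightly constrained prompt that includes the table DDLs."""
    ddl_block = "\n\n".join(d.strip() for d in req.ddls) or "(no DDL provided)"
    return "\n".join([
        f"Convert the following {req.source_dialect} code to {req.target_dialect}.",
        "Rules:",
        "- Use only the tables and columns defined in the DDL below.",
        "- Preserve the business logic exactly; do not add or drop filters.",
        "- Return only the converted code, with no commentary.",
        "",
        "DDL:",
        ddl_block,
        "",
        f"{req.source_dialect} code:",
        req.source_sql.strip(),
    ])


example = ConversionRequest(
    source_dialect="SQL Server T-SQL",
    target_dialect="Spark SQL",
    source_sql="SELECT TOP 10 order_id, ISNULL(total, 0) AS total FROM orders;",
    ddls=["CREATE TABLE orders (order_id INT, total DECIMAL(10, 2));"],
)
prompt = build_conversion_prompt(example)
print(prompt)
```

The resulting string, together with `GENERATION_SETTINGS`, would then be handed to the LLM of your choice; the converted code still needs to be reviewed and tested like any other migration output.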

Phase 3: Future

The far scary future is where we tell LLMs how to create our data engineering systems. Imagine telling it to ingest data from S3 into an Open Table Format (OTF) and to write business code on top of this. I kind of don’t see this for at least 10ish years though.

Open Table Format Wars – Continued

The OTF wars continue to rage with no end in sight. As a refresher, there are 3 players:

Apache Hudi – which came out of Uber
Apache Iceberg – which came out of Netflix
Delta Lake – which came out of Databricks

As a reminder, OTFs provide features such as time travel, incremental ETL, deletion capability, and schema evolution-ish capability, depending on which one you use.
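To make those features a bit more concrete, here is a toy in-memory model of a versioned table. It is only a sketch of the general idea (every commit produces a new snapshot that can be read later) and is not how Hudi, Iceberg, or Delta Lake are actually implemented; the real formats track data files and metadata rather than copying rows. The `ToyVersionedTable` class and its method names are made up for illustration.

```python
class ToyVersionedTable:
    """Toy model: each commit stores a full snapshot of the rows, keyed by id."""

    def __init__(self):
        self._snapshots = [{}]  # version 0 is the empty table

    @property
    def latest_version(self):
        return len(self._snapshots) - 1

    def _commit(self, rows):
        self._snapshots.append(rows)
        return self.latest_version

    def upsert(self, key, row):
        """Insert or overwrite one row and return the new version number."""
        rows = dict(self._snapshots[-1])
        rows[key] = row
        return self._commit(rows)

    def delete(self, key):
        """Delete one row (if present) and return the new version number."""
        rows = dict(self._snapshots[-1])
        rows.pop(key, None)
        return self._commit(rows)

    def read(self, version=None):
        """Time travel: read the table as of a given version (default: latest)."""
        if version is None:
            version = self.latest_version
        return dict(self._snapshots[version])

    def changed_keys(self, since_version):
        """Incremental ETL: keys added, changed, or removed since a version."""
        old, new = self._snapshots[since_version], self._snapshots[-1]
        return {k for k in old.keys() | new.keys() if old.get(k) != new.get(k)}


table = ToyVersionedTable()
v1 = table.upsert("c1", {"name": "Ann"})
v2 = table.upsert("c2", {"name": "Bob"})
v3 = table.delete("c1")

print(table.read())            # latest snapshot: only c2 remains
print(table.read(v2))          # time travel: c1 and c2 are both present
print(table.changed_keys(v1))  # keys touched after v1
```

One thing the toy makes visible: after the delete, the older snapshots still contain the deleted row. The real table formats behave similarly until old snapshots are expired or cleaned up, which matters if the delete was made for privacy reasons.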

Perhaps one of the biggest subtle changes to happen recently is that the OneTable project is now Apache XTable.

https://xtable.apache.org

Apache XTable is a framework to seamlessly do cross-table work between any of the OTFs. I still think this is ahead of its time because I haven’t seen any projects that need to combine multiple OTFs in one organization. My prediction though is that in 5-10 years this will become a standard to allow vendor interoperability, but it will take a while.

Apache Hudi Updates

Newsletter – https://hudinewsletter.substack.com/ – because we all can’t get enough Substack in our lives, Hudi now has a newsletter you can check for updates
For those who want a high-level overview of Hudi, this blog is a nice quick read – https://defogdata.substack.com/p/table-format-series-apache-hudi-hadoop
Our favorite data influencer Soumil Shah has an article about Apache XTable (which I don’t think is relevant for now, but it is still an interesting read) – https://www.linkedin.com/pulse/advanced-data-management-building-multi-modal-indexing-soumil-shah-ux95f/

Apache Iceberg Updates

Views are now available! – https://github.com/apache/iceberg/releases/tag/apache-iceberg-1.5.0
AWS Blog on how to do schema evolution – https://aws.amazon.com/blogs/big-data/use-aws-glue-etl-to-perform-merge-partition-evolution-and-schema-evolution-on-apache-iceberg/
Tabular article on mirroring CDC data – https://tabular.io/blog/mirroring-data-cdc-in-apache-iceberg-with-debezium-and-kafka-connect/
There is still no consensus on what to call these OTFs (lakehouses? transactional data lakes? I dunno), but here is a pro-Iceberg article – https://hightouch.com/blog/iceberg-rise-of-the-lakehouse

Lake Formation

Lake Formation still is a bit weird to me: one part of it is blueprints, which we really don’t use, and the other part deals with access control. It rolled out some new changes with OTF integration and ACLs.

OTF Lake Formation Integration – https://aws.amazon.com/blogs/big-data/enforce-fine-grained-access-control-on-open-table-formats-via-amazon-emr-integrated-with-aws-lake-formation/

Summary about state of the OTF Market

It is still kind of a mess, and there still really aren’t any clear winners. There are also multiple options: you can go the open source route or go with a hosted provider such as Onehouse or Tabular.

The false promises of AWS announcements – S3 Express One Zone

Around re:Invent, there is always a huge set of announcements, and one stood out: S3 Express One Zone. This feature allows retrieval of data in S3 in single-digit milliseconds, with the tradeoff that storage lives in one availability zone (so no HA). You can imagine that if this actually works, data lakes could hypothetically start competing with databases, as we wouldn’t need to worry about the SLA time penalty you usually get with S3.

Looking at the restrictions, there are some pretty significant drawbacks.

https://docs.aws.amazon.com/athena/latest/ug/querying-express-one-zone.html

As you can see here, Hudi isn’t supported (not sure why Iceberg tables aren’t there), and Delta Lake has partial support. The other consideration is that this lives in one zone, so you have to make sure there is a replicated copy in a standard S3 bucket.

I kind of feel that Amazon tests the waters by launching not fully formed products to get feedback from us. Unfortunately, that makes us the guinea pigs.

TLDR – This service works for Glue jobs, but for OTFs, it is dead in the water for the time being.

Amazon Q

I remember being in an AWS roundtable representing data consulting companies at re:Invent, and a complaint from other firms was that Amazon had too many confusing products. As we are all guinea pigs in their early services, Amazon Q is no exception.

Product	Use Case	Features
Amazon Q for Business	Chatbot for internal enterprise data that is managed by Amazon. No dev work required	Chatbot
Amazon Q for Developers	Best for doing basic coding and coding with AWS-specific services. Broader coding is probably better with a foundation model	Code completion (CodeWhisperer); Chat (Amazon Q)

TLDR

Amazon Q for Business is a managed product where you click to add data sources and get a chatbot on top of them
Amazon Q for Developers contains code completion (CodeWhisperer) AND a chat in the Visual Studio IDE with, yes, Amazon Q again as the chat. Confused yet?

QuickSight Q

I’d like to confuse you one more time with the history of QuickSight Q. Before the ChatGPT and LLM craze, QuickSight Q went Generally Available (GA) in 2021, powered by machine learning

https://aws.amazon.com/about-aws/whats-new/2021/09/amazon-quicksight-q-generally-available

After ChatGPT came out, QuickSight Q went back into Preview with LLM integration, but they kept the same name.

One of the things to really keep in mind as you do your solutions architecture is whether a service is in preview or GA. Things in preview typically only support a couple of regions and don’t have production support. If you are interested in a service in preview (like Amazon Q), it is advisable to wait a bit.

A Framework for Processing Uber Uber Large Sets of Data – Theseus

I show this diagram very often, and as a refresher, a lot of the work we do in data engineering is in yellow and in red, and often involves OTFs.

Voltron Data, who created a GPU Query Engine called Theseus, put out these benchmarks comparing their framework Theseus vs Spark

https://voltrondata.com/benchmarks/theseus

Image Credit: Voltron Data’s blog

Their guidance is also quite interesting:

For less than 2TBs: We believe DuckDB and Arrow backed projects, DataFusion, and Polars make a lot of sense. This is probably the majority of datasets in the world and can be run most efficiently leveraging these state-of-the-art query systems.

For up to 30TBs: Well-known data warehouses like Snowflake, Google BigQuery, Databricks, and distributed processing frameworks like Spark and Trino work wonders at this scale.

For anything over 30TBs: This is where Theseus makes sense. Our minimum threshold to move forward requires 10TB queries (not datasets), but we prefer to operate when queries exceed 100TBs. This is an incredibly rare class of problem, but if you are feeling it, you know how quickly costs balloon, SLAs are missed, and tenuously the data pipeline is held together.

I mostly work in the AWS space, but it is interesting to peek at what innovations are going on outside of it.

The author of Apache Arrow also made this observation:

<= 1TB – DuckDB, Snowflake, DataFusion, Athena, Trino, Presto, etc.
1-10TB – Spark, Dask, Ray, etc.
> 10TB – hardware-accelerated processing (e.g., Theseus).
(citation credit link)

You might ask what my guidance would be for the Amazon space:

Less than 100 GB – your run-of-the-mill RDS or Aurora
100 GB to 30 TB – Redshift or an OTF
More than 30 TB – we haven’t really played in this space, but things like Apache Iceberg are probably better candidates
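As a rough sketch, that rule of thumb fits in a few lines of Python. The function name and the use of decimal units (1 TB = 1,000 GB) are assumptions made for illustration, and the cutoffs are the loose guidance above rather than hard limits.

```python
GB_PER_TB = 1_000  # assumption: decimal units, 1 TB = 1,000 GB


def suggest_aws_store(dataset_gb: float) -> str:
    """Map a dataset size in GB to the rough AWS guidance in this post."""
    if dataset_gb < 0:
        raise ValueError("dataset size cannot be negative")
    if dataset_gb < 100:
        return "RDS or Aurora"
    if dataset_gb <= 30 * GB_PER_TB:
        return "Redshift or an OTF"
    # Beyond 30 TB the guidance is speculative.
    return "Probably an OTF such as Apache Iceberg"


for size_gb in (20, 5_000, 80_000):
    print(f"{size_gb} GB -> {suggest_aws_store(size_gb)}")
```

Size is only one input, of course; query patterns, concurrency, and cost matter just as much when picking a store.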

TLDR – you probably will never use Theseus, so this is just a fun article.

American Privacy Rights Act (APRA)

There was a bit of surprising news coming out of the US Congress: there is now draft legislation for national data privacy rights for Americans. In the United States, data privacy has consisted of an odd patchwork of legislation from state to state (like the CCPA in California or the Colorado Privacy Act). The US really is quite behind here, as much of the rest of the world already has some type of privacy legislation.

https://energycommerce.house.gov/posts/committee-chairs-rodgers-cantwell-unveil-historic-draft-comprehensive-data-privacy-legislation

Here are some draft highlights

Deletion Requests: Companies are required to delete personal data upon an individual’s request and must notify any third parties who have received this data to do the same.
Third-Party Notifications: Companies must inform third parties of any deletion requests, ensuring that these third parties also delete the relevant data.
Verification of Requests: Companies need to verify the identity of individuals who request data deletion or correction to ensure the legitimacy of these requests.
Exceptions to Deletion: There are specific conditions under which a company may refuse a deletion request, such as legal restrictions, implications for data security, or if it would affect the rights of others.
Technological and Cost Constraints: If it is technologically impossible or prohibitively expensive to comply with a deletion request, companies may decline the request but must provide a detailed explanation to the individual.
Frequency and Cost of Requests: Companies can allow individuals to exercise their deletion rights free of charge up to three times per year; additional requests may incur a reasonable fee.
Timely Response: Companies must respond to deletion requests within specified time frames, generally within 15 to 30 days, depending on whether they qualify as large data holders or not.

Who is this applicable for?

Large Data Holders: The Act defines a “large data holder” as a covered entity that, in the most recent calendar year, had annual gross revenue of not less than $250 million and, depending on the context, meets certain thresholds related to the volume of covered data processed. These thresholds include handling the covered data of more than 5 million individuals, 15 million portable connected devices identifying individuals, or 35 million connected devices that can be linked to individuals. Additionally, for handling sensitive covered data, the thresholds are more than 200,000 individuals, 300,000 portable connected devices, or 700,000 connected devices.
Small Business Exemptions: The Act specifies exemptions for small businesses. A small business is defined based on its average annual gross revenues over the past three years not exceeding $40 million and not collecting, processing, retaining, or transferring the covered data of more than 200,000 individuals annually for purposes other than payment collection. Furthermore, all covered data for such purposes must be deleted or de-identified within 90 days unless retention is necessary for fraud investigations or consistent with a return or warranty policy. A small business also must not transfer covered data to a third party in exchange for revenue or other considerations.
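For a solution architect trying to get a first read on whether a client might be in scope, the thresholds above can be turned into a small checker. This is only a sketch of the draft numbers as summarized here, and it is not legal advice: the class, field, and function names are hypothetical, the "purposes other than payment collection" nuance and the 90-day retention rule are not modeled, and the final legislation may differ.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    LARGE_DATA_HOLDER = "large data holder"
    SMALL_BUSINESS = "small business (exempt)"
    OTHER_COVERED_ENTITY = "other covered entity"


@dataclass
class EntityProfile:
    """Hypothetical inputs; every field name here is an assumption."""
    annual_gross_revenue_usd: float       # most recent calendar year
    avg_revenue_3yr_usd: float            # average over the past three years
    individuals: int = 0                  # individuals whose covered data is handled
    portable_devices: int = 0
    connected_devices: int = 0
    sensitive_individuals: int = 0        # same counts for sensitive covered data
    sensitive_portable_devices: int = 0
    sensitive_connected_devices: int = 0
    transfers_data_for_revenue: bool = False


def is_large_data_holder(e: EntityProfile) -> bool:
    """Revenue of at least $250M plus any one of the volume thresholds."""
    exceeds_volume = (
        e.individuals > 5_000_000
        or e.portable_devices > 15_000_000
        or e.connected_devices > 35_000_000
        or e.sensitive_individuals > 200_000
        or e.sensitive_portable_devices > 300_000
        or e.sensitive_connected_devices > 700_000
    )
    return e.annual_gross_revenue_usd >= 250_000_000 and exceeds_volume


def is_small_business(e: EntityProfile) -> bool:
    """Simplified small business test; ignores the payment-collection carve-out."""
    return (
        e.avg_revenue_3yr_usd <= 40_000_000
        and e.individuals <= 200_000
        and not e.transfers_data_for_revenue
    )


def classify(e: EntityProfile) -> Category:
    if is_large_data_holder(e):
        return Category.LARGE_DATA_HOLDER
    if is_small_business(e):
        return Category.SMALL_BUSINESS
    return Category.OTHER_COVERED_ENTITY


retailer = EntityProfile(300_000_000, 280_000_000, individuals=6_000_000)
print(classify(retailer).value)
```

Anything that comes back as a large data holder or other covered entity is a cue to bring in counsel early, not a final determination.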

A while back I worked on a data engineering project which was exposed to the European GDPR. It was interesting because we had meetings with in-house counsel to discuss what kind of data policies they had in place. One of the facets of GDPR which is similar here is the ‘right to remove data.’

We entered some gray areas when talking with the lawyers, as the debate was over which data would have to be removed. Removing customer data from a database or data lake is clear enough, but what if it was deeply nestled in Amazon Glacier?

I don’t really have any great answers, but if this legislation actually does pan out, it makes a strong case for large companies to use OTFs for their data lakes; otherwise it would be extremely difficult to delete the data.

TLDR – if you are a solutions architect, do ask what kind of data policy exposure your clients have. If this legislation does pass, please pay attention when you start projects based in the USA to whether it applies to them based on the final legislation.

Citation Link and Credit For Talking About This – Hard Fork Podcast

Everything Else

Glue: Observability

The AWS team recently put out a blog series on monitoring and debugging AWS Glue jobs using observability metrics.

https://aws.amazon.com/blogs/big-data/enhance-monitoring-and-debugging-for-aws-glue-jobs-using-new-job-observability-metrics-part-3-visualization-and-trend-analysis-using-amazon-quicksight/ (part 3)

dbt

The dbt team also released their 2024 State of Analytics Engineering report (PDF here)

TLDR, data quality is still a big concern
I’m surprised data mesh is still a thing, although it seems like it is only for big orgs according to the survey

AWS Exams:

AWS released a free training course on the Data Engineer Associate Exam

https://explore.skillbuilder.aws/learn/course/external/view/elearning/18546/exam-prep-standard-course-aws-certified-data-engineer-associate-dea-c01?trk=e6934e10-170d-4c94-bf7b-b88f95ed0f47&sc_channel=el

Also note the AWS Data Analytics Specialty and Database Specialty exams are being retired this month.

YADT (Yet Another Data Tool)
As if there weren’t enough tools on the market...

Kestra – Airflow competitor – https://github.com/kestra-io/kestra
StarRocks – An open source data warehouse (Redshift competitor) – https://www.starrocks.io/
Apache Superset – An alternative to paid BI (like QuickSight) – https://docs.google.com/presentation/d/1GaIN0p6msfYm3ZzwPoV6q4HqARXi0003AxVyxqDs_jU/edit?ref=blef.fr#slide=id.g2ca5e1ff3e2_0_236
PuppyGraph – Query your data as a graph – https://www.puppygraph.com/

Devin:

Fortunately, uhh, I don’t think anyone on our team is named Devin, but this video has been making the rounds on the Internet as the first ‘AI software engineer’

https://www.youtube.com/watch?v=fjHtjT7GO1c

Just remember, Devin hasn’t taken our jobs... yet.

Vietnam

Grandma and the Vietnam War
When I was young, friends would visit, and there was one photo on the shelf in my room that caught their attention. It was a photo of an elderly Caucasian lady, and their first question to me was, “How come you didn’t take the stock photo out of the frame?” I replied that she was my grandma, and they became even more confused because they thought I was 100% Vietnamese, so why would my grandma be white?

In 1975, my mom was working in the Saigon Adventist Hospital in Vietnam, and around April 20, conditions were deteriorating quickly in the capital with rumors that the communists would take over soon. She witnessed firsthand the horrors of the war while working in the emergency room, with one memory of treating an 8-year-old boy after a grenade had exploded near his head. Due to the severity of the injuries, the child passed away and she grieved heavily with her family.

Similar to the fall of Afghanistan in 2021, people became desperate to get out of the country, especially if they were associated with the Americans. Charter flights were leaving around the clock organized by the US State Department to evacuate as many people out of Vietnam as possible.

There was one charter flight where one lady was a no-show, and my mom took her place. At that moment, she left everything behind, her family and her possessions, and was left with only a US $20 bill that had been given to her.

On the other side of the Pacific Ocean, my foster grandma, Beryl Bason, heard calls from the Loma Linda Adventist church about sponsoring Vietnamese refugees to help get them on their feet. My grandma ended up hosting my mom and two of her nursing school classmates in San Diego for a couple of months, where they all went back to nursing to become certified nurses to work in the US.

My mom met my dad after immigrating to the US, and they settled in Orange County, California, which would end up having one of the biggest populations of Vietnamese people outside of Vietnam.

Return to the homeland part 1

It is a bit strange, but according to my parents, my first language was actually Vietnamese. They were afraid I would be confused learning two languages, so they switched to speaking to me in English when I was young. Since then new research has shown kids can learn multiple languages without issue. Because I never learned Vietnamese formally, my proficiency was stunted, unlike my Spanish which I consider myself semi-fluent in due to four great years of education in high school.

There was a running joke that I really didn’t look Vietnamese, so my friends bought me a 23andMe genetic test to settle the issue once and for all. Funnily enough, the first result of the test showed 1% speculative European, but the results eventually tightened up to confirm that my origins are indeed 100% Vietnamese.

Most kids of immigrants make some type of ‘return to the homeland’ type journey when they are young, and for me it was when I was 13. A priority for my parents was to meet my grandparents while they were still alive and to meet my extended family.

I have to admit, that was a rough trip. My mom’s hometown of Tam Kỳ wasn’t really well developed, and I remember that when I had to go to the bathroom, it was in an outhouse, not unlike a camping trip. Hardly anyone spoke much English, so I struggled talking with my cousins and just about everyone else.

For some reason, my parents also had wacky expectations that after I went to Vietnam I would become fluent in Vietnamese. Let’s just say that didn’t happen; becoming fluent in something requires understanding the foundational basics of grammar, language, and some schooling which I didn’t get.

As an adult, Vietnam was never really on my radar to visit. I befriended some Europeans at a previous job, and they told me they spent months in the country visiting every nook and cranny. Because of my last trip, my memories of Vietnam were primarily associated with seeing family, extended family, and more family so I didn’t get to see the country on my own terms (although I would have been too young anyways to make my own decisions).

Return to the homeland part 2 – 27 years later

In 2014, I took a trip to Mexico with my partner and parents since they had a timeshare in Cabo San Lucas. It was the first international-ish trip with my parents and I was a bit nervous as I had never traveled with them as an adult.

I was pleasantly surprised that we all had a great time with each other, and the trip went really well. There was even a time when I said, let’s go snorkeling in Cabo Pulmo (about 2 hours away), and they were okay with the drive. On the way the road seemed to end, and I took the right fork when I should have taken the left fork and got the car stuck in a sandy ditch. There was nobody around, and I didn’t have any cell phone data (at that time North America plans didn’t exist yet). I don’t know how exactly, but my partner and my dad pushed the car while I hit the accelerator and we got the car out of the ditch. I’m happy my parents didn’t disown me after such a scary incident. Happy to say, that snorkeling was probably some of the best I’ve ever seen, with a travel adventure to back it up.

Since then I’ve been intentional about traveling with my parents as much as possible, as I know they are getting older and there will be a time when they won’t be able to travel anymore due to their age or health. Unfortunately this type of thinking proved true: my dad has since passed away, and I am grateful that I could travel with him quite a bit.

In March 2020, we scheduled a trip to visit Vietnam all together, and it would have been 27 years since I had last visited. However, around February 2020, we were getting news that schools in Vietnam were getting shut down due to Covid. At the time there weren’t hard shutdowns, but out of an abundance of caution we cancelled the trip.

In early 2023, my partner proposed we go to Vietnam all together around Thanksgiving time. My parents initially were going to go, but then declined so they could hang out with the grandkids. I had been relying on my parents to do a lot of the planning, but now that we were on our own we began doing research on what to do. My parents not going ended up being fortunate, as my dad got seriously sick around the same time, so it was good he was in North America.

With the Lonely Planet book, I began researching things to do and oddly enough realized that I actually didn’t know much about Vietnam in terms of the cities, regions, or even things to do. After much research (and talking to my cousins and friends), we decided to stick mainly to the north because it was drier; the central area was in its rainy season, so we didn’t spend too much time there.

Vietnamese – Forked

We started our trip in the city of Hanoi, which is the capital in the north. Probably the most important expression everyone learns when visiting a foreign country is, “where is the bathroom?”. My proficiency in Vietnamese is kind of like a bunch of Lego blocks in my head with limited ability to build certain structures. Everything kind of comes out in bits and pieces, but at the least I know how to say:

Cầu tiêu ở đâu? (Where is the bathroom – but literal translation is ‘where is the toilet’)

Of course the waiter in the restaurant gives me a really confused look, and says, you mean

Nhà vệ sinh ở đâu? (Where is the bathroom – but literal translation is ‘where is the hygienic house’)

Now I give the puzzled look as I don’t understand. I’ve never heard of a bathroom called “Nhà vệ sinh” in my life growing up in Southern California at all.

One of the things which is important to realize is that when the refugees came from Vietnam to North America and other parts of the world in 1975, two Vietnamese communities now existed in different parts of the world: the main branch in Vietnam, and another branch in North America.

Talking to my mom and some relatives about this, my guess is that the common word for bathroom in Vietnam pre-1975 was probably “Cầu tiêu”, and that at some time post-1975 it changed to “Nhà vệ sinh”. Even the English language has changed radically in 40 years. When I talk with the newest generation in high school, there are a bunch of words that I have no idea what they mean.

Soleil Ho talks about how Vietnamese food in North America is basically food from Vietnam from the 1970s. Again, it makes sense because the food traditions came from the initial wave of refugees.

I talked to a good friend living in Vietnam about this, and she mentioned how Vietnamese people living in North America now have a Vietnamese – North American accent. The Vietnamese spoken in North America comes off as a lighter tone and people in the north consider this tone as someone who has studied formally.

This kind of explains a weird situation I had in a grocery market. I was asking what was in the center of this candy, and the lady remarked in Vietnamese, “Your Vietnamese is so good, how many years did you study for?”. Inside I was dying because I didn’t have the heart to tell her that I was an overseas Vietnamese (Việt kiều Mỹ). I’m sure if she knew that she probably would have instead asked, “why is your Vietnamese so bad?”

The other surprising fork which I hadn’t really considered is how different the northern dialects are from the southern dialects. I distinctly recall several friends learning Vietnamese through Duolingo, and their parents asking why they were learning the northern accent. In the eyes of the north, their dialect is often viewed as the gold standard of speaking. Below are examples of English Word – Northern Vietnamese Dialect – Southern Vietnamese Dialect as explained by one of my friends in Vietnam.

English Word	Northern Vietnamese Dialect	Southern Vietnamese Dialect
Cup	cốc	ly
Fruit	hoa quả	trái cây
10 thousand	10 nghìn	10 ngàn
Pineapple	dứa	thơm
Passion fruit	chanh leo	chanh dây
Bowl	bát	chén

You don’t have to be proficient in Vietnamese to see these are totally different words. I am happy to report that when I later went to my parents’ hometown of Đà Nẵng (central Vietnam), I did have an easier time understanding and speaking to people.

Reconciliation

Maybe it is me not giving Vietnam enough credit, but the museum scene in Hanoi was unexpectedly stellar. There is the Hoa Lo Prison Museum (famous as the place where John McCain was held as a POW), the Ethnology Museum (about minority populations), the Women’s Museum, the Ho Chi Minh Museum, and the list actually does go on and on quite a bit.

One day we were kind of tired so we went to a museum right next to the hotel, aptly called the Hanoi Museum. On the second floor was an exhibit on the American War. That’s right, in Vietnam, it is not called the Vietnam War, it is called the American War. Many of the panels were spent talking about Americans as the aggressors and the breaking of the Paris Peace Accords. However, the last panel discussed a lot about reconciliation between Vietnam and America.

It is amazing to me that 40 years later, formerly bitter enemies are now actually allies. In the midst of some of the grand global conflicts occurring now, it gives me hope, and I truly believe anything is possible in terms of peace.

Hà Giang Loop Tour

From family friends we heard about a trek that was not exactly off the beaten path, but not the first thing tourists do. There is a 3-day loop tour where most people rent motorcycles and drive up the Ma Pi Leng pass bordering China. Given we didn’t want to be riding motorcycles on mountain roads for 3 days, we booked a 3-day car trip.

We booked with the Yesd travel agency (huge plug for them here, I do recommend this agency), where the tour guides consist of ethnic minorities of the region. Talking to the guide, there are over 54 ethnic minorities there. I was kind of shocked, and humbled that I really knew so little of Vietnam. The drive consisted of driving through spectacular greenery on the mountain roads and we took many stops for photos.

Our accommodations were at houses belonging to the ethnic tribes in the region. We first stayed with the Tay people, and I was pretty surprised that our accommodation, despite being on the second floor of a traditional wood structure, had a shower, heat pump, and pretty fast wifi.

For dinner, we would eat with the families, and it was nice in a way that it wasn’t performative. The families didn’t talk about their lives or their minority status. It was just a regular dinner you would eat with a family. With the eco-tourism booming there, it’s just easier to act like you normally do when you have visitors and not need to put on a show.

On the second day we did a tour through a Hmong village. It was rural in its location, but not in the stereotypical sense because everyone there had fancy cell phones. The Hmong people there still adhered to their traditions of marrying early and their agricultural routines.

We sat down for tea with one of the Hmong shop owners when I noticed he was wearing a French beret. I kind of asked why he was wearing a beret, and he explained that this clothing item was introduced during French colonial rule. Ever since that time, the beret has been traditional wear in that region.

Thinking about this further, it really has helped me alter my thinking of the word ‘authentic’. There is this Vietnamese Instant Pot Facebook group which constantly argues about recipes not being ‘authentic’. Some claim it is only authentic when it is from the source home country. Others argue a recipe is authentic when you cater towards the spirit of it.

Rachael Ray stirred up a bit of controversy a little while back with her pho video. 2 years ago people were raging against her, “how dare she change the recipe, that’s not authentic!”.

Coming back to the Hmong shop owner, I wonder, who is the arbiter of when something is authentic and traditional? After hearing his story, I think it’s difficult to nail down authenticity to a static period of time. The reality is that traditions change through time, and perhaps there is no such thing as authenticity.

Ha Long Bay

After Ha Giang, we went to Ha Long Bay, a large bay of water where cruise ships sail around for about 1-2 days. We took a shuttle and from Hanoi the trip was only 2 hours. Expecting the trip to be a straight shot, I was surprised we stopped half way for a ‘restroom stop’. The restroom stop was next to a pretty fancy gift shop.

At the check-in area of Ha Long Bay, there were tons of people from all over the world, similar to an airport. The check-in area was a bit chaotic, and you board a small boat to join the big boat.

On our particular cruise boat, we somehow joined everyone from Portugal as there were probably only two other couples who weren’t from Europe. We befriended one couple who was from New York while watching a cooking demonstration. The guy was half Vietnamese and half Indian, which was pretty fascinating to me as I have never met anyone of that mix. He was willing to entertain some of my questions on which side he felt most comfortable with. I imagine he probably gets asked this question quite a bit being mixed.

After dinner I stood towards the front of the boat looking at the view and the cruise manager was out there also. He looked pretty young, probably early 20s, so I greeted him with “Chào em” (hello little brother). In Vietnamese, when you talk to anybody you address them relative to the age of your parents, or relative to your own age. Snafus always happen when people incorrectly assume someone’s age and get corrected on how they should address them.

I don’t know if that was a good or bad thing, but he took that as a cue that I was in Vietnamese and began speaking to me in Vietnamese. I explained to him my Vietnamese wasn’t that great and it was my first time visiting Vietnam since I was 13. He said something really surprising during the conversation said, “Welcome home, even though you forgot a lot of your Vietnamese, it will come back to you.”

We had mixed feelings about the experience, especially coming right after an awesome cultural and nature experience in Ha Giang. Oddly enough, Ha Long Bay was one of our least favorite parts of the trip, mostly because of the Disneyland-type feeling of tons of people and long lines. In the next couple of years Ha Giang will have a faster road built from Hanoi, cutting the drive time to about 2 to 2.5 hours, so expect tourism in that region to increase soon.

Da Nang

The last part of our trip was to the central area of Da Nang. My uncle (my dad’s brother) still lives in Da Nang, and my cousins flew from Saigon to meet us up there. On our first night there, they took us out to eat at a famous Banh Xeo place.

Bánh Xèo is a crispy turmeric rice crepe filled with either chicken, pork, or shrimp. The name of the dish is a fun play on words, roughly translating to sizzling item. Bánh is kind of a weird word, as it could refer to a lot of things depending on the word it is paired with. My cousins know I don’t eat pork and shrimp, so I was surprised when they mentioned we could order Beef Banh Xeo.

I asked why there is a beef banh xeo, as I had never heard of that before. Apparently, because of the Korean influence on the city through tourism and business, the people of Da Nang began changing their food to cater to Koreans. Because of that, a couple of new dishes emerged, like Banh Xeo Bo and Mi Quang Bo. The latter dish is traditionally made of chicken, turmeric, and shrimp, but now there is a beef version. What was previously an authentic dish of only chicken, pork, and shrimp now has beef as an option.

Next to Da Nang is Hoi An, the famous town known for its lanterns at night and on the river. On the way there, my cousins wanted me to visit a street named after my grandfather, Quach Xan. I asked what he was known for, and my cousin replied that he was a great leader for the country. Putting things together, I know that my dad and his brother had a classic story of joining different sides of the war, and that would mean my grandfather was a communist.

I had kind of mixed feelings standing there on the street, taking pictures by the street sign with an ode to my family name. As a great leader to Vietnam, I wonder what he did? What is interesting about Vietnam is that half of the population was born after the war, so it is something that isn’t talked about much.

In Hanoi, we did a walking tour with a student from Hanoi University. We asked what his generation thought about politics, and he said that mostly everybody was apolitical and that most were more concerned about their economic futures than the political state of the country.

When we think of wars, history tends to paint everything in black and white strokes. There were the good guys and the bad guys. I asked my mom a little about my grandfather and the war in general, and she said that back then both sides were rather corrupt, and in her opinion neither side really had the people’s interest at heart.

During my time in Vietnam I had a sense that even my cousins didn’t want to talk about the war, so it is a topic I didn’t approach.

My cousin gave me a quick history of Da Nang, explaining how the city managed to develop quickly and how, in general, Vietnam has managed to raise the living standards of most people around the country. Da Nang used to be a poorer city, but it has now attracted a lot of foreign business and tourists.

After the visit to my grandfather’s street we arrived in Hoi An. Half the city had been flooded (which apparently is pretty typical), so we walked around the areas we could. Everyone recommends visiting Hoi An, people and guidebooks alike, but I found the city a bit touristy, almost akin to Venice in Italy. It was overly crowded by day and in the early evening, but as night set in further and the crowds dissipated, there was a charm to the city.

Áo Dài

On the last day of the trip there, I wanted to buy an áo dài (literally translates to long dress). It is a traditional Vietnamese dress that has gone through its own evolution. Originally it was only available in blue and red, but as time has progressed different fabrics, styles, and colors have emerged to make it more modern.

My cousin took me to a mini mall to shop around, and after great negotiation, I bought one that was beautifully hand decorated with a bit more of a modern sensibility. I asked my cousins how often they wear it, and they said not often at all. People used to wear it regularly growing up, but now it really is worn on very rare occasions. They continued to tell me that people now don’t even own any, they just rent it when they need it for pictures on occasions such as weddings.

Growing up I never wore an áo dài, and I kind of wonder why I now feel the need to reconnect with this item. I wore one for the first time two years ago at international day at church, when I asked to borrow my dad’s áo dài. It was a very traditional blue, and I think it was made of silk, with a hat. It was a bit janky and needed a bunch of pins to hold it in place. I still have it at home next to the new one I bought in Vietnam, and I feel a bit of sadness when I look at the blue áo dài. Since my dad is no longer around, I’m not really sure what to do with it. My intention was to give it back to him last year, but that never came to fruition.

I wore the new áo dài during Lunar New Year at church last year. Having grown up in an all-Vietnamese church, I’ve adjusted surprisingly quickly to being one of only a few Vietnamese people in my church in Vancouver. This transition makes me wonder: am I simply flexible with my identity, or is it still core to my heart?

Some Closing Thoughts

Overall Vietnam was not what I initially expected. English for the most part was readily accessible, and everyone was super kind. Food scene wise, we mostly stuck to what was recommended in the Vietnam Lonely Planet book (which was a good foundation for exploring) and friends’ recommendations. We found the high-end food scene in Hanoi spectacular.

Food in Hanoi

Cồ Đàm Chay – Vegetarian Tasting Menu
Luk Lak Restaurant
Marou Chocolate

Hanoi Things to Do In Advance

Book the water puppet show as early as possible. The reservation is free, so there is no harm in booking ahead.

Customs

Be aware that travelers from North America need a visa in advance. You can do this through an e-portal, although the site looked like it was made during the GeoCities era. Taking the photo with one of the passport photo apps on our phone worked for us. If you plan to do multiple visits, there is a better multiple-entry visa you can get in your local country; otherwise a single entry works.
https://evisa.xuatnhapcanh.gov.vn/trang-chu-ttdt

Transit
In terms of getting around, the Grab app is essential. It is pretty much the Uber of Vietnam, but we had a lot of problems using our foreign North American cards, so be sure to bring multiple cards to get the app to work.

Sim Cards

This is a pretty big problem in the touristy cities. SIM cards at the airport can be about 4-5x more expensive, so you have to go a little bit outside of the main tourist areas to get regular rates. Be comforted, though, that they rip everyone off equally, locals and foreigners.

Tours

The most memorable trip we did was with Yesd. We booked the 4 days, 3 nights tour by car through Ha Giang.

https://yesd.org/ha-giang-comfort-car-ride/

Money

As in all SE Asian countries, you get a better exchange rate if you bring brand new crisp bills. Any bills with rips or tears get a reduced rate at money exchangers.

Fish Sauce

The Vincom markets have a totally different selection of fish sauce than what you get in North America. Here the fanciest one we have is Red Boat 40, but there they have several similar brands that are about half the price. However, bringing fish sauce home is quite the risky endeavor.

Hope
I was happy to hear that there was a sense of hope throughout the country. What I mean is that most people seem optimistic about future economic opportunities, and living standards have gradually improved over the years.